Sample records for throughput functional genomics

  1. Genome-scale deletion screening of human long non-coding RNAs using a paired-guide RNA CRISPR library

    PubMed Central

    Zhu, Shiyou; Li, Wei; Liu, Jingze; Chen, Chen-Hao; Liao, Qi; Xu, Ping; Xu, Han; Xiao, Tengfei; Cao, Zhongzheng; Peng, Jingyu; Yuan, Pengfei; Brown, Myles; Liu, Xiaole Shirley; Wei, Wensheng

    2017-01-01

    CRISPR/Cas9 screens have been widely adopted to analyse coding gene functions, but high throughput screening of non-coding elements using this method is more challenging, because indels caused by a single cut in non-coding regions are unlikely to produce a functional knockout. A high-throughput method to produce deletions of non-coding DNA is needed. Herein, we report a high throughput genomic deletion strategy to screen for functional long non-coding RNAs (lncRNAs) that is based on a lentiviral paired-guide RNA (pgRNA) library. Applying our screening method, we identified 51 lncRNAs that can positively or negatively regulate human cancer cell growth. We individually validated 9 lncRNAs using CRISPR/Cas9-mediated genomic deletion and functional rescue, CRISPR activation or inhibition, and gene expression profiling. Our high-throughput pgRNA genome deletion method should enable rapid identification of functional mammalian non-coding elements. PMID:27798563

  2. Next-Generation High-Throughput Functional Annotation of Microbial Genomes.

    PubMed

    Baric, Ralph S; Crosson, Sean; Damania, Blossom; Miller, Samuel I; Rubin, Eric J

    2016-10-04

    Host infection by microbial pathogens cues global changes in microbial and host cell biology that facilitate microbial replication and disease. The complete maps of thousands of bacterial and viral genomes have recently been defined; however, the rate at which physiological or biochemical functions have been assigned to genes has greatly lagged. The National Institute of Allergy and Infectious Diseases (NIAID) addressed this gap by creating functional genomics centers dedicated to developing high-throughput approaches to assign gene function. These centers require broad-based and collaborative research programs to generate and integrate diverse data to achieve a comprehensive understanding of microbial pathogenesis. High-throughput functional genomics can lead to new therapeutics and better understanding of the next generation of emerging pathogens by rapidly defining new general mechanisms by which organisms cause disease and replicate in host tissues and by facilitating the rate at which functional data reach the scientific community. Copyright © 2016 Baric et al.

  3. Characterization of noncoding regulatory DNA in the human genome.

    PubMed

    Elkon, Ran; Agami, Reuven

    2017-08-08

    Genetic variants associated with common diseases are usually located in noncoding parts of the human genome. Delineation of the full repertoire of functional noncoding elements, together with efficient methods for probing their biological roles, is therefore of crucial importance. Over the past decade, DNA accessibility and various epigenetic modifications have been associated with regulatory functions. Mapping these features across the genome has enabled researchers to begin to document the full complement of putative regulatory elements. High-throughput reporter assays to probe the functions of regulatory regions have also been developed but these methods separate putative regulatory elements from the chromosome so that any effects of chromatin context and long-range regulatory interactions are lost. Definitive assignment of function(s) to putative cis-regulatory elements requires perturbation of these elements. Genome-editing technologies are now transforming our ability to perturb regulatory elements across entire genomes. Interpretation of high-throughput genetic screens that incorporate genome editors might enable the construction of an unbiased map of functional noncoding elements in the human genome.

  4. NCBI GEO: archive for high-throughput functional genomic data.

    PubMed

    Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Rudnev, Dmitry; Evangelista, Carlos; Kim, Irene F; Soboleva, Alexandra; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Muertter, Rolf N; Edgar, Ron

    2009-01-01

    The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput gene expression data. Additionally, GEO hosts other categories of high-throughput functional genomic data, including those that examine genome copy number variations, chromatin structure, methylation status and transcription factor binding. These data are generated by the research community using high-throughput technologies like microarrays and, more recently, next-generation sequencing. The database has a flexible infrastructure that can capture fully annotated raw and processed data, enabling compliance with major community-derived scientific reporting standards such as 'Minimum Information About a Microarray Experiment' (MIAME). In addition to serving as a centralized data storage hub, GEO offers many tools and features that allow users to effectively explore, analyze and download expression data from both gene-centric and experiment-centric perspectives. This article summarizes the GEO repository structure, content and operating procedures, as well as recently introduced data mining features. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.

  5. ISRNA: an integrative online toolkit for short reads from high-throughput sequencing data.

    PubMed

    Luo, Guan-Zheng; Yang, Wei; Ma, Ying-Ke; Wang, Xiu-Jie

    2014-02-01

    Integrative Short Reads NAvigator (ISRNA) is an online toolkit for analyzing high-throughput small RNA sequencing data. Besides the high-speed genome mapping function, ISRNA provides statistics for genomic location, length distribution and nucleotide composition bias analysis of sequence reads. Number of reads mapped to known microRNAs and other classes of short non-coding RNAs, coverage of short reads on genes, expression abundance of sequence reads as well as some other analysis functions are also supported. The versatile search functions enable users to select sequence reads according to their sub-sequences, expression abundance, genomic location, relationship to genes, etc. A specialized genome browser is integrated to visualize the genomic distribution of short reads. ISRNA also supports management and comparison among multiple datasets. ISRNA is implemented in Java/C++/Perl/MySQL and can be freely accessed at http://omicslab.genetics.ac.cn/ISRNA/.
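
    The length-distribution and nucleotide-composition statistics mentioned above can be illustrated with a minimal sketch. This is not ISRNA's implementation (which the abstract describes only as Java/C++/Perl/MySQL); the function name `read_stats` and the example reads are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def read_stats(reads):
    """Return (length distribution, nucleotide fractions) for a list of reads.

    Hypothetical helper for illustration; not part of ISRNA.
    """
    # Number of reads observed at each read length.
    lengths = Counter(len(r) for r in reads)
    # Pooled base counts across all reads (case-insensitive).
    bases = Counter(b for r in reads for b in r.upper())
    total = sum(bases.values())
    # Fraction of each base; empty dict when there are no bases at all.
    composition = {b: n / total for b, n in bases.items()} if total else {}
    return dict(lengths), composition


# Made-up short reads, for illustration only.
example_reads = ["TGAGGTAGTAGGTTGTATAGTT", "TGAGGTAGTAGGTTGTGTGGTT", "ACGT"]
length_dist, composition = read_stats(example_reads)
```

    A real small RNA pipeline would typically compute these statistics per position and per mapped locus as well; the sketch pools all bases to keep the idea visible.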

  6. A Perspective on the Future of High-Throughput RNAi Screening: Will CRISPR Cut Out the Competition or Can RNAi Help Guide the Way?

    PubMed

    Taylor, Jessica; Woodcock, Simon

    2015-09-01

    For more than a decade, RNA interference (RNAi) has brought about an entirely new approach to functional genomics screening. Enabling high-throughput loss-of-function (LOF) screens against the human genome, identifying new drug targets, and significantly advancing experimental biology, RNAi is a fast, flexible technology that is compatible with existing high-throughput systems and processes; however, the recent advent of clustered regularly interspaced short palindromic repeats (CRISPR)-Cas, a powerful new precise genome-editing (PGE) technology, has opened up vast possibilities for functional genomics. CRISPR-Cas is novel in its simplicity: one piece of easily engineered guide RNA (gRNA) is used to target a gene sequence, and Cas9 expression is required in the cells. The targeted double-strand break introduced by the gRNA-Cas9 complex is highly effective at removing gene expression compared to RNAi. Together with the reduced cost and complexity of CRISPR-Cas, there is the realistic opportunity to use PGE to screen for phenotypic effects in a total gene knockout background. This review summarizes the exciting development of CRISPR-Cas as a high-throughput screening tool, comparing its future potential to that of well-established RNAi screening techniques, and highlighting future challenges and opportunities within these disciplines. We conclude that the two technologies actually complement rather than compete with each other, enabling greater understanding of the genome in relation to drug discovery. © 2015 Society for Laboratory Automation and Screening.

  7. Assaying gene function by growth competition experiment.

    PubMed

    Merritt, Joshua; Edwards, Jeremy S

    2004-07-01

    High-throughput screening and analysis is one of the emerging paradigms in biotechnology. In particular, high-throughput methods are essential in the field of functional genomics because of the vast amount of data generated in recent and ongoing genome sequencing efforts. In this report we discuss integrated functional analysis methodologies which incorporate both a growth competition component and a highly parallel assay used to quantify results of the growth competition. Several applications of the two most widely used technologies in the field, i.e., transposon mutagenesis and deletion strain library growth competition, and individual applications of several developing or less widely reported technologies are presented.
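
    As a rough illustration of how a growth competition can be quantified from a highly parallel readout, the sketch below computes a log2 fold change in relative abundance for each strain between two time points. It is an assumption-laden example, not the method of the report above: the function name `log2_fold_changes`, the pseudocount of 1, and the count dictionaries are all hypothetical.

```python
import math


def log2_fold_changes(before, after, pseudocount=1.0):
    """Log2 change in relative abundance per strain (hypothetical helper).

    `before` and `after` map strain identifiers to raw counts (for example
    barcode or tag reads). A pseudocount avoids taking the log of zero.
    """
    strains = list(before)
    n = len(strains)
    total_before = sum(before[s] for s in strains) + pseudocount * n
    total_after = sum(after.get(s, 0) for s in strains) + pseudocount * n
    result = {}
    for s in strains:
        f0 = (before[s] + pseudocount) / total_before      # starting fraction
        f1 = (after.get(s, 0) + pseudocount) / total_after  # final fraction
        result[s] = math.log2(f1 / f0)
    return result


# Made-up counts: strain "mutB" drops out of the pool during competition.
scores = log2_fold_changes({"mutA": 99, "mutB": 99}, {"mutA": 199, "mutB": 0})
```

    Negative scores mark strains that were depleted during the competition and positive scores mark strains that were enriched; real analyses would add replicates and a statistical test.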

  8. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Leung, Elo; Huang, Amy; Cadag, Eithon

    In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.

  9. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

    DOE PAGES

    Leung, Elo; Huang, Amy; Cadag, Eithon; ...

    2016-01-20

    In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.

  10. Mining high-throughput experimental data to link gene and function

    PubMed Central

    Blaby-Haas, Crysten E.; de Crécy-Lagard, Valérie

    2011-01-01

    Nearly 2200 genomes encoding some 6 million proteins have now been sequenced. Around 40% of these proteins are of unknown function even when function is loosely and minimally defined as “belonging to a superfamily”. In addition to in silico methods, the swelling stream of high-throughput experimental data can give valuable clues for linking these “unknowns” with precise biological roles. The goal is to develop integrative data-mining platforms that allow the scientific community at large to access and utilize this rich source of experimental knowledge. To this end, we review recent advances in generating whole-genome experimental datasets, where this data can be accessed, and how it can be used to drive prediction of gene function. PMID:21310501

  11. Target Discovery for Precision Medicine Using High-Throughput Genome Engineering.

    PubMed

    Guo, Xinyi; Chitale, Poonam; Sanjana, Neville E

    2017-01-01

    Over the past few years, programmable RNA-guided nucleases such as the CRISPR/Cas9 system have ushered in a new era of precision genome editing in diverse model systems and in human cells. Functional screens using large libraries of RNA guides can interrogate a large hypothesis space to pinpoint particular genes and genetic elements involved in fundamental biological processes and disease-relevant phenotypes. Here, we review recent high-throughput CRISPR screens (e.g. loss-of-function, gain-of-function, and targeting noncoding elements) and highlight their potential for uncovering novel therapeutic targets, such as those involved in cancer resistance to small molecular drugs and immunotherapies, tumor evolution, infectious disease, inborn genetic disorders, and other therapeutic challenges.

  12. Computational Prediction of the Global Functional Genomic Landscape: Applications, Methods and Challenges

    PubMed Central

    Zhou, Weiqiang; Sherwood, Ben; Ji, Hongkai

    2017-01-01

    Technological advances have led to an explosive growth of high-throughput functional genomic data. Exploiting the correlation among different data types, it is possible to predict one functional genomic data type from other data types. Prediction tools are valuable in understanding the relationship among different functional genomic signals. They also provide a cost-efficient solution to inferring the unknown functional genomic profiles when experimental data are unavailable due to resource or technological constraints. The predicted data may be used for generating hypotheses, prioritizing targets, interpreting disease variants, facilitating data integration, quality control, and many other purposes. This article reviews various applications of prediction methods in functional genomics, discusses analytical challenges, and highlights some common and effective strategies used to develop prediction methods for functional genomic data. PMID:28076869

  13. CTD² Dashboard: a searchable web interface to connect validated results from the Cancer Target Discovery and Development Network* | Office of Cancer Genomics

    Cancer.gov

    The Cancer Target Discovery and Development (CTD²) Network aims to use functional genomics to accelerate the translation of high-throughput and high-content genomic and small-molecule data towards use in precision oncology.

  14. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource

    USDA-ARS's Scientific Manuscript database

    The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic mode...

  15. Application of resequencing to rice genomics, functional genomics and evolutionary analysis

    PubMed Central

    2014-01-01

    Rice is a model system used for crop genomics studies. The completion of the rice genome draft sequences in 2002 not only accelerated functional genome studies, but also initiated a new era of resequencing rice genomes. Based on the reference genome in rice, next-generation sequencing (NGS) using the high-throughput sequencing system can efficiently accomplish whole genome resequencing of various genetic populations and diverse germplasm resources. Resequencing technology has been effectively utilized in evolutionary analysis, rice genomics and functional genomics studies. This technique is beneficial for both bridging the knowledge gap between genotype and phenotype and facilitating molecular breeding via gene design in rice. Here, we also discuss the limitations, applications and future prospects of rice resequencing. PMID:25006357

  16. MIPS plant genome information resources.

    PubMed

    Spannagl, Manuel; Haberer, Georg; Ernst, Rebecca; Schoof, Heiko; Mayer, Klaus F X

    2007-01-01

    The Munich Institute for Protein Sequences (MIPS) has been involved in maintaining plant genome databases since the Arabidopsis thaliana genome project. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable data sets for model plant genomes as a backbone against which experimental data, for example from high-throughput functional genomics, can be organized and evaluated. In addition, model genomes also form a scaffold for comparative genomics, and much can be learned from genome-wide evolutionary studies.

  17. Mining high-throughput experimental data to link gene and function.

    PubMed

    Blaby-Haas, Crysten E; de Crécy-Lagard, Valérie

    2011-04-01

    Nearly 2200 genomes that encode around 6 million proteins have now been sequenced. Around 40% of these proteins are of unknown function, even when function is loosely and minimally defined as 'belonging to a superfamily'. In addition to in silico methods, the swelling stream of high-throughput experimental data can give valuable clues for linking these unknowns with precise biological roles. The goal is to develop integrative data-mining platforms that allow the scientific community at large to access and utilize this rich source of experimental knowledge. To this end, we review recent advances in generating whole-genome experimental datasets, where this data can be accessed, and how it can be used to drive prediction of gene function. Copyright © 2011 Elsevier Ltd. All rights reserved.

  18. A Review of the Accomplishments of the CTD² Network | Office of Cancer Genomics

    Cancer.gov

    The Office of Cancer Genomics (OCG) Cancer Target Discovery and Development or CTD² initiative was established by the National Cancer Institute (NCI) to accelerate the “translation” of high-throughput, high-content genomic data to the bedside through functional genomics. The CTD² initiative is a collaborative network of 13 different research teams, or Centers.

  19. Emory University: High-Throughput Protein-Protein Interaction Analysis for Hippo Pathway Profiling | Office of Cancer Genomics

    Cancer.gov

    The CTD² Center at Emory University used high-throughput protein-protein interaction (PPI) mapping for Hippo signaling pathway profiling to rapidly unveil promising PPIs as potential therapeutic targets and advance functional understanding of signaling circuitry in cells.

  20. Genome editing in the mushroom-forming basidiomycete Coprinopsis cinerea, optimized by a high-throughput transformation system.

    PubMed

    Sugano, Shigeo S; Suzuki, Hiroko; Shimokita, Eisuke; Chiba, Hirofumi; Noji, Sumihare; Osakabe, Yuriko; Osakabe, Keishi

    2017-04-28

    Mushroom-forming basidiomycetes produce a wide range of metabolites and have great value not only as food but also as an important global natural resource. Here, we demonstrate CRISPR/Cas9-based genome editing in the model species Coprinopsis cinerea. Using a high-throughput reporter assay with cryopreserved protoplasts, we identified a novel promoter, CcDED1pro, with seven times stronger activity in this assay than the conventional promoter GPD2. To develop highly efficient genome editing using CRISPR/Cas9 in C. cinerea, we used CcDED1pro to express Cas9 and a U6-snRNA promoter from C. cinerea to express gRNA. Finally, CRISPR/Cas9-mediated GFP mutagenesis was performed in a stable GFP expression line. Individual genome-edited lines were isolated, and loss of GFP function was detected in hyphae and fruiting body primordia. This novel method of high-throughput CRISPR/Cas9-based genome editing using cryopreserved protoplasts should be a powerful tool in the study of edible mushrooms.

  1. 'PACLIMS': a component LIM system for high-throughput functional genomic analysis.

    PubMed

    Donofrio, Nicole; Rajagopalon, Ravi; Brown, Douglas; Diener, Stephen; Windham, Donald; Nolin, Shelly; Floyd, Anna; Mitchell, Thomas; Galadima, Natalia; Tucker, Sara; Orbach, Marc J; Patel, Gayatri; Farman, Mark; Pampanwar, Vishal; Soderlund, Cari; Lee, Yong-Hwan; Dean, Ralph A

    2005-04-12

    Recent advances in sequencing techniques leading to cost reduction have resulted in the generation of a growing number of sequenced eukaryotic genomes. Computational tools greatly assist in defining open reading frames and assigning tentative annotations. However, gene functions cannot be asserted without biological support through, among other things, mutational analysis. In taking a genome-wide approach to functionally annotate an entire organism, in this application the approximately 11,000 predicted genes in the rice blast fungus (Magnaporthe grisea), an effective platform for tracking and storing both the biological materials created and the data produced across several participating institutions was required. The platform designed, named PACLIMS, was built to support our high throughput pipeline for generating 50,000 random insertion mutants of Magnaporthe grisea. To be a useful tool for materials and data tracking and storage, PACLIMS was designed to be simple to use, modifiable to accommodate refinement of research protocols, and cost-efficient. Data entry into PACLIMS was simplified through the use of barcodes and scanners, thus reducing the potential human error, time constraints, and labor. This platform was designed in concert with our experimental protocol so that it leads the researchers through each step of the process from mutant generation through phenotypic assays, thus ensuring that every mutant produced is handled in an identical manner and all necessary data is captured. Many sequenced eukaryotes have reached the point where computational analyses are no longer sufficient and require biological support for their predicted genes. Consequently, there is an increasing need for platforms that support high throughput genome-wide mutational analyses. While PACLIMS was designed specifically for this project, the source and ideas present in its implementation can be used as a model for other high throughput mutational endeavors.

  2. 'PACLIMS': A component LIM system for high-throughput functional genomic analysis

    PubMed Central

    Donofrio, Nicole; Rajagopalon, Ravi; Brown, Douglas; Diener, Stephen; Windham, Donald; Nolin, Shelly; Floyd, Anna; Mitchell, Thomas; Galadima, Natalia; Tucker, Sara; Orbach, Marc J; Patel, Gayatri; Farman, Mark; Pampanwar, Vishal; Soderlund, Cari; Lee, Yong-Hwan; Dean, Ralph A

    2005-01-01

    Background: Recent advances in sequencing techniques leading to cost reduction have resulted in the generation of a growing number of sequenced eukaryotic genomes. Computational tools greatly assist in defining open reading frames and assigning tentative annotations. However, gene functions cannot be asserted without biological support through, among other things, mutational analysis. In taking a genome-wide approach to functionally annotate an entire organism, in this application the ~11,000 predicted genes in the rice blast fungus (Magnaporthe grisea), an effective platform for tracking and storing both the biological materials created and the data produced across several participating institutions was required. Results: The platform designed, named PACLIMS, was built to support our high throughput pipeline for generating 50,000 random insertion mutants of Magnaporthe grisea. To be a useful tool for materials and data tracking and storage, PACLIMS was designed to be simple to use, modifiable to accommodate refinement of research protocols, and cost-efficient. Data entry into PACLIMS was simplified through the use of barcodes and scanners, thus reducing the potential human error, time constraints, and labor. This platform was designed in concert with our experimental protocol so that it leads the researchers through each step of the process from mutant generation through phenotypic assays, thus ensuring that every mutant produced is handled in an identical manner and all necessary data is captured. Conclusion: Many sequenced eukaryotes have reached the point where computational analyses are no longer sufficient and require biological support for their predicted genes. Consequently, there is an increasing need for platforms that support high throughput genome-wide mutational analyses. While PACLIMS was designed specifically for this project, the source and ideas present in its implementation can be used as a model for other high throughput mutational endeavors. PMID:15826298

  3. Novel genetic tools for studying food-borne Salmonella.

    PubMed

    Andrews-Polymenis, Helene L; Santiviago, Carlos A; McClelland, Michael

    2009-04-01

    Nontyphoidal Salmonellae are highly prevalent food-borne pathogens. High-throughput sequencing of Salmonella genomes is expanding our knowledge of the evolution of serovars and epidemic isolates. Genome sequences have also allowed the creation of complete microarrays. Microarrays have improved the throughput of in vivo expression technology (IVET) used to uncover promoters active during infection. In another method, signature tagged mutagenesis (STM), pools of mutants are subjected to selection. Changes in the population are monitored on a microarray, revealing genes under selection. Complete genome sequences permit the construction of pools of targeted in-frame deletions that have improved STM by minimizing the number of clones and the polarity of each mutant. Together, genome sequences and the continuing development of new tools for functional genomics will drive a revolution in the understanding of Salmonellae in many different niches that are critical for food safety.

  4. Enabling systematic interrogation of protein-protein interactions in live cells with a versatile ultra-high-throughput biosensor platform | Office of Cancer Genomics

    Cancer.gov

    The vast datasets generated by next generation gene sequencing and expression profiling have transformed biological and translational research. However, technologies to produce large-scale functional genomics datasets, such as high-throughput detection of protein-protein interactions (PPIs), are still in early development. While a number of powerful technologies have been employed to detect PPIs, a singular PPI biosensor platform featured with both high sensitivity and robustness in a mammalian cell environment remains to be established.

  5. 04-ERD-052-Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Loots, G G; Ovcharenko, I; Collette, N

    2007-02-26

    Generating the sequence of the human genome represents a colossal achievement for science and mankind. The technical use of the human genome project information holds great promise for curing disease, preventing bioterror threats, and learning about human origins. Yet converting the sequence data into biologically meaningful information has not been immediately obvious, and we are still in the preliminary stages of understanding how the genome is organized, what the functional building blocks are, and how these sequences mediate complex biological processes. The overarching goal of this program was to develop novel methods and high throughput strategies for determining the functions of "anonymous" human genes that are evolutionarily deeply conserved in other vertebrates. We coupled analytical tool development and computational predictions regarding gene function with novel high throughput experimental strategies and tested biological predictions in the laboratory. The tools required for comparative genomic data-mining are fundamentally the same whether they are applied to scientific studies of related microbes or the search for functions of novel human genes. For this reason the tools, conceptual framework and the coupled informatics-experimental biology paradigm we developed in this LDRD have many potential scientific applications relevant to LLNL multidisciplinary research in bio-defense, bioengineering, bionanosciences and microbial and environmental genomics.

  6. Functional mapping of yeast genomes by saturated transposition

    PubMed Central

    Michel, Agnès H; Hatakeyama, Riko; Kimmig, Philipp; Arter, Meret; Peter, Matthias; Matos, Joao; De Virgilio, Claudio; Kornmann, Benoît

    2017-01-01

    Yeast is a powerful model for systems genetics. We present a versatile, time- and labor-efficient method to functionally explore the Saccharomyces cerevisiae genome using saturated transposon mutagenesis coupled to high-throughput sequencing. SAturated Transposon Analysis in Yeast (SATAY) allows one-step mapping of all genetic loci in which transposons can insert without disrupting essential functions. SATAY is particularly suited to discover loci important for growth under various conditions. SATAY (1) reveals positive and negative genetic interactions in single and multiple mutant strains, (2) can identify drug targets, (3) detects not only essential genes, but also essential protein domains, (4) generates both null and other informative alleles. In a SATAY screen for rapamycin-resistant mutants, we identify Pib2 (PhosphoInositide-Binding 2) as a master regulator of TORC1. We describe two antagonistic TORC1-activating and -inhibiting activities located on opposite ends of Pib2. Thus, SATAY allows easy exploration of the yeast genome at unprecedented resolution and throughput. DOI: http://dx.doi.org/10.7554/eLife.23570.001 PMID:28481201
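
    The idea of reading gene essentiality from where transposons do and do not insert can be sketched as a per-gene insertion count. This is a simplified illustration and not the SATAY authors' pipeline: it assumes a single chromosome, 1-based inclusive gene coordinates, and already-mapped insertion positions, and the function name `insertions_per_gene` and the example coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right


def insertions_per_gene(positions, genes):
    """Count mapped transposon insertions inside each gene interval.

    `positions` is an iterable of insertion coordinates on one chromosome;
    `genes` maps a gene name to an inclusive (start, end) interval.
    Hypothetical helper for illustration only.
    """
    sorted_pos = sorted(positions)
    counts = {}
    for name, (start, end) in genes.items():
        # Insertions with start <= position <= end, found by binary search.
        counts[name] = bisect_right(sorted_pos, end) - bisect_left(sorted_pos, start)
    return counts


# Made-up data: "geneB" receives no insertions in a dense library.
counts = insertions_per_gene([5, 10, 15, 100], {"geneA": (1, 20), "geneB": (30, 90)})
```

    In a saturated library, a gene or domain with far fewer insertions than expected for its length is a candidate for essentiality; a real analysis would normalize by interval length and local insertion density rather than use raw counts.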

  7. Modelling Human Regulatory Variation in Mouse: Finding the Function in Genome-Wide Association Studies and Whole-Genome Sequencing

    PubMed Central

    Schmouth, Jean-François; Bonaguro, Russell J.; Corso-Diaz, Ximena; Simpson, Elizabeth M.

    2012-01-01

    An increasing body of literature from genome-wide association studies and human whole-genome sequencing highlights the identification of large numbers of candidate regulatory variants of potential therapeutic interest in numerous diseases. Our relatively poor understanding of the functions of non-coding genomic sequence, and the slow and laborious process of experimental validation of the functional significance of human regulatory variants, limits our ability to fully benefit from this information in our efforts to comprehend human disease. Humanized mouse models (HuMMs), in which human genes are introduced into the mouse, suggest an approach to this problem. In the past, HuMMs have been used successfully to study human disease variants; e.g., the complex genetic condition arising from Down syndrome, common monogenic disorders such as Huntington disease and β-thalassemia, and cancer susceptibility genes such as BRCA1. In this commentary, we highlight a novel method for high-throughput single-copy site-specific generation of HuMMs entitled High-throughput Human Genes on the X Chromosome (HuGX). This method can be applied to most human genes for which a bacterial artificial chromosome (BAC) construct can be derived and a mouse-null allele exists. This strategy comprises (1) the use of recombineering technology to create a human variant–harbouring BAC, (2) knock-in of this BAC into the mouse genome using Hprt docking technology, and (3) allele comparison by interspecies complementation. We demonstrate the throughput of the HuGX method by generating a series of seven different alleles for the human NR2E1 gene at Hprt. In future challenges, we consider the current limitations of experimental approaches and call for a concerted effort by the genetics community, for both human and mouse, to solve the challenge of the functional analysis of human regulatory variation. PMID:22396661

  8. Challenges in NMR-based structural genomics

    NASA Astrophysics Data System (ADS)

    Sue, Shih-Che; Chang, Chi-Fon; Huang, Yao-Te; Chou, Ching-Yu; Huang, Tai-huang

    2005-05-01

    Understanding the functions of the vast number of proteins encoded in the many genomes that have recently been completely sequenced is the main challenge for biologists in the post-genomics era. Since the function of a protein is determined by its exact three-dimensional structure, it is paramount to determine the 3D structures of all proteins. This need has driven structural biologists to undertake the structural genomics project aimed at determining the structures of all known proteins. Several centers for structural genomics studies have been established throughout the world. Nuclear magnetic resonance (NMR) spectroscopy has played a major role in determining protein structures in atomic detail and in a physiologically relevant solution state. Since the number of new genes being discovered daily far exceeds the number of structures determined by both NMR and X-ray crystallography, a high-throughput method for speeding up the process of protein structure determination is essential for the success of the structural genomics effort. In this article we will describe NMR methods currently being employed for protein structure determination. We will also describe methods under development which may drastically increase the throughput, as well as point out areas where opportunities exist for biophysicists to make significant contributions in this important field.

  9. Human genetics and genomics a decade after the release of the draft sequence of the human genome.

    PubMed

    Naidoo, Nasheen; Pawitan, Yudi; Soong, Richie; Cooper, David N; Ku, Chee-Seng

    2011-10-01

    Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.

  10. Human genetics and genomics a decade after the release of the draft sequence of the human genome

    PubMed Central

    2011-01-01

    Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade. PMID:22155605

  11. Epigenetics and Epigenomics of Plants.

    PubMed

    Yadav, Chandra Bhan; Pandey, Garima; Muthamilarasan, Mehanathan; Prasad, Manoj

    2018-01-23

    The genetic material DNA in association with histone proteins forms the complex structure called chromatin, which is prone to undergo modification through certain epigenetic mechanisms including cytosine DNA methylation, histone modifications, and small RNA-mediated methylation. Alterations in chromatin structure lead to inaccessibility of genomic DNA to various regulatory proteins such as transcription factors, which eventually modulates gene expression. Advancements in high-throughput sequencing technologies have provided the opportunity to study the epigenetic mechanisms at genome-wide levels. Epigenomic studies using high-throughput technologies will widen the understanding of mechanisms as well as functions of regulatory pathways in plant genomes, which will further help in manipulating these pathways using genetic and biochemical approaches. This technology could be a potential research tool for displaying the systematic associations of genetic and epigenetic variations, especially in terms of cytosine methylation onto the genomic region in a specific cell or tissue. A comprehensive study of plant populations to correlate genotype to epigenotype and to phenotype, and also the study of methyl quantitative trait loci (QTL) or epiGWAS, is possible by using high-throughput sequencing methods, which will further accelerate molecular breeding programs for crop improvement.

  12. Automatic Segmentation of High-Throughput RNAi Fluorescent Cellular Images

    PubMed Central

    Yan, Pingkum; Zhou, Xiaobo; Shah, Mubarak; Wong, Stephen T. C.

    2010-01-01

    High-throughput genome-wide RNA interference (RNAi) screening is emerging as an essential tool to assist biologists in understanding complex cellular processes. The large number of images produced in each study make manual analysis intractable; hence, automatic cellular image analysis becomes an urgent need, where segmentation is the first and one of the most important steps. In this paper, a fully automatic method for segmentation of cells from genome-wide RNAi screening images is proposed. Nuclei are first extracted from the DNA channel by using a modified watershed algorithm. Cells are then extracted by modeling the interaction between them as well as combining both gradient and region information in the Actin and Rac channels. A new energy functional is formulated based on a novel interaction model for segmenting tightly clustered cells with significant intensity variance and specific phenotypes. The energy functional is minimized by using a multiphase level set method, which leads to a highly effective cell segmentation method. Promising experimental results demonstrate that automatic segmentation of high-throughput genome-wide multichannel screening can be achieved by using the proposed method, which may also be extended to other multichannel image segmentation problems. PMID:18270043

  13. [Current applications of high-throughput DNA sequencing technology in antibody drug research].

    PubMed

    Yu, Xin; Liu, Qi-Gang; Wang, Ming-Rong

    2012-03-01

    Since the 2005 publication of a high-throughput DNA sequencing technology based on PCR carried out in oil emulsions, high-throughput DNA sequencing platforms have evolved into a robust technology for sequencing genomes and diverse DNA libraries. Antibody libraries with vast numbers of members currently serve as a foundation for discovering novel antibody drugs, and high-throughput DNA sequencing technology makes it possible to rapidly identify functional antibody variants with desired properties. Herein we present a review of current applications of high-throughput DNA sequencing technology in the analysis of antibody library diversity, sequencing of CDR3 regions, identification of potent antibodies based on sequence frequency, discovery of functional genes, and combination with various display technologies, so as to provide an alternative approach to the discovery and development of antibody drugs.

  14. High-Throughput Cloning and Expression Library Creation for Functional Proteomics

    PubMed Central

    Festa, Fernanda; Steel, Jason; Bian, Xiaofang; Labaer, Joshua

    2013-01-01

    The study of protein function usually requires the use of a cloned version of the gene for protein expression and functional assays. This strategy is particularly important when the information available regarding function is limited. The functional characterization of the thousands of newly identified proteins revealed by genomics requires faster methods than traditional single gene experiments, creating the need for fast, flexible and reliable cloning systems. These collections of open reading frame (ORF) clones can be coupled with high-throughput proteomics platforms, such as protein microarrays and cell-based assays, to answer biological questions. In this tutorial we provide the background for DNA cloning, discuss the major high-throughput cloning systems (Gateway® Technology, Flexi® Vector Systems, and Creator™ DNA Cloning System) and compare them side-by-side. We also report an example of a high-throughput cloning study and its application in functional proteomics. This Tutorial is part of the International Proteomics Tutorial Programme (IPTP12). Details can be found at http://www.proteomicstutorials.org. PMID:23457047

  15. High-throughput cloning and expression library creation for functional proteomics.

    PubMed

    Festa, Fernanda; Steel, Jason; Bian, Xiaofang; Labaer, Joshua

    2013-05-01

    The study of protein function usually requires the use of a cloned version of the gene for protein expression and functional assays. This strategy is particularly important when the information available regarding function is limited. The functional characterization of the thousands of newly identified proteins revealed by genomics requires faster methods than traditional single-gene experiments, creating the need for fast, flexible, and reliable cloning systems. These collections of ORF clones can be coupled with high-throughput proteomics platforms, such as protein microarrays and cell-based assays, to answer biological questions. In this tutorial, we provide the background for DNA cloning, discuss the major high-throughput cloning systems (Gateway® Technology, Flexi® Vector Systems, and Creator™ DNA Cloning System) and compare them side-by-side. We also report an example of a high-throughput cloning study and its application in functional proteomics. This tutorial is part of the International Proteomics Tutorial Programme (IPTP12). © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. High-throughput metagenomic technologies for complex microbial community analysis: open and closed formats.

    PubMed

    Zhou, Jizhong; He, Zhili; Yang, Yunfeng; Deng, Ye; Tringe, Susannah G; Alvarez-Cohen, Lisa

    2015-01-27

    Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied "open-format" and "closed-format" detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions. Copyright © 2015 Zhou et al.

  17. High-Throughput Metagenomic Technologies for Complex Microbial Community Analysis: Open and Closed Formats

    PubMed Central

    Zhou, Jizhong; He, Zhili; Yang, Yunfeng; Deng, Ye; Tringe, Susannah G.; Alvarez-Cohen, Lisa

    2015-01-01

    Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied “open-format” and “closed-format” detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions. PMID:25626903

  18. High-throughput metagenomic technologies for complex microbial community analysis. Open and closed formats

    DOE PAGES

    Zhou, Jizhong; He, Zhili; Yang, Yunfeng; ...

    2015-01-27

    Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied “open-format” and “closed-format” detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions.

  19. A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data.

    PubMed

    Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu

    2015-05-27

    Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.

  20. Functional annotation of rare gene aberration drivers of pancreatic cancer | Office of Cancer Genomics

    Cancer.gov

    As we enter the era of precision medicine, characterization of cancer genomes will directly influence therapeutic decisions in the clinic. Here we describe a platform enabling functionalization of rare gene mutations through their high-throughput construction, molecular barcoding and delivery to cancer models for in vivo tumour driver screens. We apply these technologies to identify oncogenic drivers of pancreatic ductal adenocarcinoma (PDAC).

  1. FSPP: A Tool for Genome-Wide Prediction of smORF-Encoded Peptides and Their Functions

    PubMed Central

    Li, Hui; Xiao, Li; Zhang, Lili; Wu, Jiarui; Wei, Bin; Sun, Ninghui; Zhao, Yi

    2018-01-01

    smORFs are small open reading frames of fewer than 100 codons. Recent low-throughput experiments showed that many smORF-encoded peptides (SEPs) play crucial roles in processes such as regulation of transcription or translation, transport through membranes, and antimicrobial activity. To identify more functional SEPs, genome-wide prediction tools are needed to guide low-throughput experiments. In this study, we put forward a functional smORF-encoded peptide predictor (FSPP) that predicts authentic SEPs and their functions in a high-throughput manner. FSPP used the overlap of detected SEPs from Ribo-seq and mass spectrometry as target objects. With expression data at the transcription and translation levels, FSPP built two co-expression networks. Combining these with co-location relations, FSPP constructed a compound network and then annotated SEPs with the functions of adjacent nodes. Tested on 38 sequenced samples of 5 human cell lines, FSPP successfully predicted 856 out of 960 annotated proteins. Interestingly, FSPP also highlighted 568 functional SEPs from these samples. After comparison, the roles predicted by FSPP were consistent with known functions. These results suggest that FSPP is a reliable tool for the identification of functional small peptides. FSPP source code can be acquired at https://www.bioinfo.org/FSPP. PMID:29675032
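
    The annotation step described in this abstract (labelling a peptide with the functions of adjacent network nodes) can be illustrated with a minimal sketch. This is not FSPP's code: the function name annotate_by_neighbors, the toy network, and the gene and function labels are all hypothetical, and the real tool combines co-expression and co-location evidence that is not modelled here. Only the 856-of-960 figure comes from the abstract.

```python
from collections import Counter

def annotate_by_neighbors(network, known_functions, node, top_n=1):
    """Return the most frequent function labels among a node's annotated neighbors."""
    counts = Counter()
    for neighbor in network.get(node, ()):
        for function in known_functions.get(neighbor, ()):
            counts[function] += 1
    return [function for function, _ in counts.most_common(top_n)]

# Hypothetical toy network: one unannotated SEP linked to three annotated genes.
network = {"SEP_1": ["GENE_A", "GENE_B", "GENE_C"]}
known_functions = {
    "GENE_A": ["translation regulation"],
    "GENE_B": ["translation regulation", "membrane transport"],
    "GENE_C": ["antimicrobial activity"],
}
predicted = annotate_by_neighbors(network, known_functions, "SEP_1")

# Fraction of annotated proteins recovered, using the counts quoted in the abstract.
recovered_fraction = 856 / 960
print(predicted, f"{recovered_fraction:.1%}")
```

    Majority voting over neighbors is only one possible scoring rule; weighting edges by co-expression strength would be a natural variation.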

  2. UCLA's Molecular Screening Shared Resource: enhancing small molecule discovery with functional genomics and new technology.

    PubMed

    Damoiseaux, Robert

    2014-05-01

    The Molecular Screening Shared Resource (MSSR) offers a comprehensive range of leading-edge high throughput screening (HTS) services including drug discovery, chemical and functional genomics, and novel methods for nano and environmental toxicology. The MSSR is an open access environment with investigators from UCLA as well as from the entire globe. Industrial clients are equally welcome as are non-profit entities. The MSSR is a fee-for-service entity and does not retain intellectual property. In conjunction with the Center for Environmental Implications of Nanotechnology, the MSSR is unique in its dedicated and ongoing efforts towards high throughput toxicity testing of nanomaterials. In addition, the MSSR engages in technology development eliminating bottlenecks from the HTS workflow and enabling novel assays and readouts currently not available.

  3. Metagenomic analysis and functional characterization of the biogas microbiome using high throughput shotgun sequencing and a novel binning strategy.

    PubMed

    Campanaro, Stefano; Treu, Laura; Kougias, Panagiotis G; De Francisci, Davide; Valle, Giorgio; Angelidaki, Irini

    2016-01-01

    Biogas production is an economically attractive technology that has gained momentum worldwide over the past years. Biogas is produced by a biologically mediated process, widely known as "anaerobic digestion." This process is performed by a specialized and complex microbial community, in which different members have distinct roles in the establishment of a collective organization. Deciphering the complex microbial community engaged in this process is interesting both for unraveling the network of bacterial interactions and for the potential applicability of the derived knowledge. In this study, we dissect the biome involved in anaerobic digestion by means of high throughput Illumina sequencing (~51 gigabases of sequence data), disclosing nearly one million genes and extracting 106 microbial genomes by a novel strategy combining two binning processes. Microbial phylogeny and putative taxonomy performed using >400 proteins revealed that the biogas community is a trove of new species. A new approach based on functional properties as per network representation was developed to assign roles to the microbial species. The organization of the anaerobic digestion microbiome resembles a funnel, in which the microbial consortium presents a progressive functional specialization while reaching the final step of the process (i.e., methanogenesis). Key microbial genomes encoding enzymes involved in specific metabolic pathways, such as carbohydrates utilization, fatty acids degradation, amino acids fermentation, and syntrophic acetate oxidation, were identified. Additionally, the analysis identified a new uncultured archaeon that was putatively related to Methanomassiliicoccales but, surprisingly, has a methylotrophic methanogenic pathway. This study is a pioneering investigation of the phylogenetic and functional characterization of the microbial community populating biogas reactors. By applying for the first time high-throughput sequencing and a novel binning strategy, the identified genes were anchored to single genomes providing a clear understanding of their metabolic pathways and highlighting their involvement in anaerobic digestion. The overall research established a reference catalog of biogas microbial genomes that will greatly simplify future genomic studies.

  4. Systems biology definition of the core proteome of metabolism and expression is consistent with high-throughput data.

    PubMed

    Yang, Laurence; Tan, Justin; O'Brien, Edward J; Monk, Jonathan M; Kim, Donghyuk; Li, Howard J; Charusanti, Pep; Ebrahim, Ali; Lloyd, Colton J; Yurkovich, James T; Du, Bin; Dräger, Andreas; Thomas, Alex; Sun, Yuekai; Saunders, Michael A; Palsson, Bernhard O

    2015-08-25

    Finding the minimal set of gene functions needed to sustain life is of both fundamental and practical importance. Minimal gene lists have been proposed by using comparative genomics-based core proteome definitions. A definition of a core proteome that is supported by empirical data, is understood at the systems-level, and provides a basis for computing essential cell functions is lacking. Here, we use a systems biology-based genome-scale model of metabolism and expression to define a functional core proteome consisting of 356 gene products, accounting for 44% of the Escherichia coli proteome by mass based on proteomics data. This systems biology core proteome includes 212 genes not found in previous comparative genomics-based core proteome definitions, accounts for 65% of known essential genes in E. coli, and has 78% gene function overlap with minimal genomes (Buchnera aphidicola and Mycoplasma genitalium). Based on transcriptomics data across environmental and genetic backgrounds, the systems biology core proteome is significantly enriched in nondifferentially expressed genes and depleted in differentially expressed genes. Compared with the noncore, core gene expression levels are also similar across genetic backgrounds (two times higher Spearman rank correlation) and exhibit significantly more complex transcriptional and posttranscriptional regulatory features (40% more transcription start sites per gene, 22% longer 5'UTR). Thus, genome-scale systems biology approaches rigorously identify a functional core proteome needed to support growth. This framework, validated by using high-throughput datasets, facilitates a mechanistic understanding of systems-level core proteome function through in silico models; it de facto defines a paleome.

  5. Information management systems for pharmacogenomics.

    PubMed

    Thallinger, Gerhard G; Trajanoski, Slave; Stocker, Gernot; Trajanoski, Zlatko

    2002-09-01

    The value of high-throughput genomic research is dramatically enhanced by association with key patient data. These data are generally available but of disparate quality and not typically directly associated. A system that could bring these disparate data sources into a common resource connected with functional genomic data would be tremendously advantageous. However, the integration of clinical data and the accurate interpretation of the generated functional genomic data require the development of information management systems capable of effectively capturing the data as well as tools to make that data accessible to the laboratory scientist or to the clinician. In this review these challenges and current information technology solutions associated with the management, storage and analysis of high-throughput data are highlighted. It is suggested that the development of a pharmacogenomic data management system which integrates public and proprietary databases, clinical datasets, and data mining tools embedded in a high-performance computing environment should include the following components: parallel processing systems, storage technologies, network technologies, databases and database management systems (DBMS), and application services.

  6. Identification and role of regulatory non-coding RNAs in Listeria monocytogenes.

    PubMed

    Izar, Benjamin; Mraheil, Mobarak Abu; Hain, Torsten

    2011-01-01

    Bacterial regulatory non-coding RNAs control numerous mRNA targets that direct a plethora of biological processes, such as the adaptation to environmental changes, growth and virulence. Recently developed high-throughput techniques, such as genomic tiling arrays and RNA-Seq, have allowed investigating prokaryotic cis- and trans-acting regulatory RNAs, including sRNAs, asRNAs, untranslated regions (UTR) and riboswitches. As a result, we obtained a more comprehensive view on the complexity and plasticity of the prokaryotic genome biology. Listeria monocytogenes was utilized as a model system for intracellular pathogenic bacteria in several studies, which revealed the presence of about 180 regulatory RNAs in the listerial genome. A regulatory role of non-coding RNAs in survival, virulence and adaptation mechanisms of L. monocytogenes was confirmed in subsequent experiments, thus providing insight into a multifaceted modulatory function of RNA/mRNA interference. In this review, we discuss the identification of regulatory RNAs by high-throughput techniques and their functional role in L. monocytogenes.

  7. Variant-aware saturating mutagenesis using multiple Cas9 nucleases identifies regulatory elements at trait-associated loci.

    PubMed

    Canver, Matthew C; Lessard, Samuel; Pinello, Luca; Wu, Yuxuan; Ilboudo, Yann; Stern, Emily N; Needleman, Austen J; Galactéros, Frédéric; Brugnara, Carlo; Kutlar, Abdullah; McKenzie, Colin; Reid, Marvin; Chen, Diane D; Das, Partha Pratim; Cole, Mitchel A; Zeng, Jing; Kurita, Ryo; Nakamura, Yukio; Yuan, Guo-Cheng; Lettre, Guillaume; Bauer, Daniel E; Orkin, Stuart H

    2017-04-01

    Cas9-mediated, high-throughput, saturating in situ mutagenesis permits fine-mapping of function across genomic segments. Disease- and trait-associated variants identified in genome-wide association studies largely cluster at regulatory loci. Here we demonstrate the use of multiple designer nucleases and variant-aware library design to interrogate trait-associated regulatory DNA at high resolution. We developed a computational tool for the creation of saturating-mutagenesis libraries with single or multiple nucleases with incorporation of variants. We applied this methodology to the HBS1L-MYB intergenic region, which is associated with red-blood-cell traits, including fetal hemoglobin levels. This approach identified putative regulatory elements that control MYB expression. Analysis of genomic copy number highlighted potential false-positive regions, thus emphasizing the importance of off-target analysis in the design of saturating-mutagenesis experiments. Together, these data establish a widely applicable high-throughput and high-resolution methodology to identify minimal functional sequences within large disease- and trait-associated regions.
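
    As a rough illustration of how candidate guides for more than one nuclease might be enumerated over a target region, the sketch below scans the forward strand for 20-nt protospacers followed by a PAM. It is not the authors' tool: find_guides and the toy sequence are hypothetical, the two PAMs (NGG and NNGRRT) are chosen here as examples of commonly cited Cas9 PAMs rather than taken from the abstract, and reverse-strand sites and variant incorporation are left out.

```python
import re

# IUPAC codes needed for the example PAM patterns.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}

def find_guides(seq, pam, guide_len=20):
    """List (start, protospacer) pairs on the forward strand that are followed by the PAM."""
    pam_regex = "".join(IUPAC[base] for base in pam)
    # A lookahead lets overlapping sites be reported, which saturating designs rely on.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}}){pam_regex})")
    return [(match.start(), match.group(1)) for match in pattern.finditer(seq)]

# Hypothetical toy region: one 20-nt protospacer followed by a TGG PAM.
region = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
library = {pam: find_guides(region, pam) for pam in ("NGG", "NNGRRT")}
print(library)
```

    Pooling the sites found for several PAMs increases the density of cut positions across a region, which is the motivation for using multiple nucleases.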

  8. Chemical genomics in plant biology.

    PubMed

    Sadhukhan, Ayan; Sahoo, Lingaraj; Panda, Sanjib Kumar

    2012-06-01

    Chemical genomics is a newly emerged and rapidly progressing field in biology, where small chemical molecules bind specifically and reversibly to protein(s) to modulate their function(s), leading to the delineation and subsequent unravelling of biological processes. This approach overcomes problems like lethality and redundancy of classical genetics. Armed with the powerful techniques of combinatorial synthesis, high-throughput screening and target discovery, chemical genomics expands its scope to diverse areas in biology. The well-established genetic system of the Arabidopsis model allows chemical genomics to enter into the realm of plant biology, exploring signaling pathways of growth regulators, endomembrane signaling cascades, plant defense mechanisms and many more events.

  9. University of Texas MD Anderson Cancer Center: High-Throughput Screening Identifying Driving Mutations in Endometrial Cancer | Office of Cancer Genomics

    Cancer.gov

    Recent advances in next-generation sequencing technology have enabled the unprecedented characterization of a full spectrum of somatic alterations in cancer genomes. Given the large numbers of somatic mutations typically detected by this approach, a key challenge in the downstream analysis is to distinguish “drivers” that functionally contribute to tumorigenesis from “passengers” that occur as the consequence of genomic instability.

  10. Genome-scale measurement of off-target activity using Cas9 toxicity in high-throughput screens.

    PubMed

    Morgens, David W; Wainberg, Michael; Boyle, Evan A; Ursu, Oana; Araya, Carlos L; Tsui, C Kimberly; Haney, Michael S; Hess, Gaelen T; Han, Kyuho; Jeng, Edwin E; Li, Amy; Snyder, Michael P; Greenleaf, William J; Kundaje, Anshul; Bassik, Michael C

    2017-05-05

    CRISPR-Cas9 screens are powerful tools for high-throughput interrogation of genome function, but can be confounded by nuclease-induced toxicity at both on- and off-target sites, likely due to DNA damage. Here, to test potential solutions to this issue, we design and analyse a CRISPR-Cas9 library with 10 variable-length guides per gene and thousands of negative controls targeting non-functional, non-genic regions (termed safe-targeting guides), in addition to non-targeting controls. We find this library has excellent performance in identifying genes affecting growth and sensitivity to the ricin toxin. The safe-targeting guides allow for proper control of toxicity from on-target DNA damage. Using this toxicity as a proxy to measure off-target cutting, we demonstrate with tens of thousands of guides both the nucleotide position-dependent sensitivity to single mismatches and the reduction of off-target cutting using truncated guides. Our results demonstrate a simple strategy for high-throughput evaluation of target specificity and nuclease toxicity in Cas9 screens.
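
    The role of safe-targeting guides as a baseline can be sketched with a simple gene-level score: the median log2 fold change of a gene's guides, centered on the median of the safe-targeting controls. This is a simplified illustration, not the statistical method used in the study; gene_effect is a hypothetical helper and all numbers are made-up toy values.

```python
from statistics import median

def gene_effect(guide_lfc, safe_lfc):
    """Median guide log2 fold change for a gene, centered on safe-targeting controls."""
    return median(guide_lfc) - median(safe_lfc)

# Hypothetical log2 fold changes (not data from the study).
safe_controls = [-0.2, -0.1, -0.3, -0.2, -0.1]
# Ten guides for one gene, mirroring the library design of 10 guides per gene.
gene_guides = [-1.9, -2.1, -1.8, -2.4, -2.0, -1.7, -2.2, -2.0, -1.6, -2.3]

effect = gene_effect(gene_guides, safe_controls)
print(round(effect, 2))
```

    Centering on guides that cut at non-functional sites, rather than on non-targeting guides, subtracts the generic cost of a DNA break from the apparent gene effect.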

  11. Genome-scale measurement of off-target activity using Cas9 toxicity in high-throughput screens

    PubMed Central

    Morgens, David W.; Wainberg, Michael; Boyle, Evan A.; Ursu, Oana; Araya, Carlos L.; Tsui, C. Kimberly; Haney, Michael S.; Hess, Gaelen T.; Han, Kyuho; Jeng, Edwin E.; Li, Amy; Snyder, Michael P.; Greenleaf, William J.; Kundaje, Anshul; Bassik, Michael C.

    2017-01-01

    CRISPR-Cas9 screens are powerful tools for high-throughput interrogation of genome function, but can be confounded by nuclease-induced toxicity at both on- and off-target sites, likely due to DNA damage. Here, to test potential solutions to this issue, we design and analyse a CRISPR-Cas9 library with 10 variable-length guides per gene and thousands of negative controls targeting non-functional, non-genic regions (termed safe-targeting guides), in addition to non-targeting controls. We find this library has excellent performance in identifying genes affecting growth and sensitivity to the ricin toxin. The safe-targeting guides allow for proper control of toxicity from on-target DNA damage. Using this toxicity as a proxy to measure off-target cutting, we demonstrate with tens of thousands of guides both the nucleotide position-dependent sensitivity to single mismatches and the reduction of off-target cutting using truncated guides. Our results demonstrate a simple strategy for high-throughput evaluation of target specificity and nuclease toxicity in Cas9 screens. PMID:28474669

  12. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

    PubMed

    Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

    2018-05-31

    In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Developments in high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.

  13. Advancements in zebrafish applications for 21st century toxicology.

    PubMed

    Garcia, Gloria R; Noyes, Pamela D; Tanguay, Robert L

    2016-05-01

    The zebrafish model is the only available high-throughput vertebrate assessment system, and it is uniquely suited for studies of in vivo cell biology. A sequenced and annotated genome has revealed a large degree of evolutionary conservation in comparison to the human genome. Due to our shared evolutionary history, the anatomical and physiological features of fish are highly homologous to humans, which facilitates studies relevant to human health. In addition, zebrafish provide a unique vertebrate data stream that allows researchers to anchor hypotheses at the biochemical, genetic, and cellular levels to observations at the structural, functional, and behavioral level in a high-throughput format. In this review, we will draw heavily from toxicological studies to highlight advances in zebrafish high-throughput systems. Breakthroughs in transgenic/reporter lines and methods for genetic manipulation, such as the CRISPR-Cas9 system, will be drawn from reports across diverse disciplines. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. Advancements in zebrafish applications for 21st century toxicology

    PubMed Central

    Garcia, Gloria R.; Noyes, Pamela D.; Tanguay, Robert L.

    2016-01-01

    The zebrafish model is the only available high-throughput vertebrate assessment system, and it is uniquely suited for studies of in vivo cell biology. A sequenced and annotated genome has revealed a large degree of evolutionary conservation in comparison to the human genome. Due to our shared evolutionary history, the anatomical and physiological features of fish are highly homologous to humans, which facilitates studies relevant to human health. In addition, zebrafish provide a unique vertebrate data stream that allows researchers to anchor hypotheses at the biochemical, genetic, and cellular levels to observations at the structural, functional, and behavioral level in a high-throughput format. In this review, we will draw heavily from toxicological studies to highlight advances in zebrafish high-throughput systems. Breakthroughs in transgenic/reporter lines and methods for genetic manipulation, such as the CRISPR-Cas9 system, will be drawn from reports across diverse disciplines. PMID:27016469

  15. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    PubMed Central

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin

    2016-01-01

    ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. 
The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including streptothricins, borrelidin, two novel lipopeptides, and one unknown antibiotic from Streptomyces rochei Sal35. The transfer, expression, and screening of the library were all performed in a high-throughput way, so that this approach is scalable and adaptable to industrial automation for next-generation antibiotic discovery. PMID:27451447

  16. NCBI GEO: archive for functional genomics data sets--10 years on.

    PubMed

    Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Muertter, Rolf N; Holko, Michelle; Ayanbule, Oluwabukunmi; Yefanov, Andrey; Soboleva, Alexandra

    2011-01-01

    A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20,000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.

  17. University of Texas MD Anderson Cancer Center (UT-MDACC): High-Throughput Screening Identifying Driving Mutations in Endometrial Cancer | Office of Cancer Genomics

    Cancer.gov

    Recent advances in next-generation sequencing technology have enabled the unprecedented characterization of a full spectrum of somatic alterations in cancer genomes. Given the large numbers of somatic mutations typically detected by this approach, a key challenge in the downstream analysis is to distinguish “drivers” that functionally contribute to tumorigenesis from “passengers” that occur as the consequence of genomic instability.

  18. Genetic resources offer efficient tools for rice functional genomics research.

    PubMed

    Lo, Shuen-Fang; Fan, Ming-Jen; Hsing, Yue-Ie; Chen, Liang-Jwu; Chen, Shu; Wen, Ien-Chie; Liu, Yi-Lun; Chen, Ku-Ting; Jiang, Mirng-Jier; Lin, Ming-Kuang; Rao, Meng-Yen; Yu, Lin-Chih; Ho, Tuan-Hua David; Yu, Su-May

    2016-05-01

    Rice is an important crop and major model plant for monocot functional genomics studies. With the establishment of various genetic resources for rice genomics, the next challenge is to systematically assign functions to predicted genes in the rice genome. Compared with the robustness of genome sequencing and bioinformatics techniques, progress in understanding the function of rice genes has lagged, hampering the utilization of rice genes for cereal crop improvement. The use of transfer DNA (T-DNA) insertional mutagenesis offers the advantage of uniform distribution throughout the rice genome, but preferentially in gene-rich regions, resulting in direct gene knockout or activation of genes within 20-30 kb up- and downstream of the T-DNA insertion site and high gene tagging efficiency. Here, we summarize the recent progress in functional genomics using the T-DNA-tagged rice mutant population. We also discuss important features of T-DNA activation- and knockout-tagging and promoter-trapping of the rice genome in relation to mutant and candidate gene characterizations and how to more efficiently utilize rice mutant populations and datasets for high-throughput functional genomics and phenomics studies by forward and reverse genetics approaches. These studies may facilitate the translation of rice functional genomics research to improvements of rice and other cereal crops. © 2015 John Wiley & Sons Ltd.

  19. OncoBinder facilitates interpretation of proteomic interaction data by capturing coactivation pairs in cancer.

    PubMed

    Van Coillie, Samya; Liang, Lunxi; Zhang, Yao; Wang, Huanbin; Fang, Jing-Yuan; Xu, Jie

    2016-04-05

    High-throughput methods such as co-immunoprecipitation-mass spectrometry (coIP-MS) and yeast two-hybrid (Y2H) have suggested a broad range of unannotated protein-protein interactions (PPIs), and interpretation of these PPIs remains a challenging task. Advances in cancer genomic research allow for the inference of "coactivation pairs" in cancer, which may facilitate the identification of PPIs involved in cancer. Here we present OncoBinder as a tool for the assessment of proteomic interaction data based on the functional synergy of oncoproteins in cancer. This decision tree-based method combines gene mutation, copy number and mRNA expression information to infer the functional status of protein-coding genes. We applied OncoBinder to evaluate the potential binders of EGFR and ERK2 proteins based on the gastric cancer dataset of The Cancer Genome Atlas (TCGA). As a result, OncoBinder identified high-confidence interactions (annotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) or validated by low-throughput assays) more efficiently than a co-expression-based method. Taken together, our results suggest that evaluation of gene functional synergy in cancer may facilitate the interpretation of proteomic interaction data. The OncoBinder toolbox for Matlab is freely accessible online.

  20. Deciphering the genomic targets of alkylating polyamide conjugates using high-throughput sequencing

    PubMed Central

    Chandran, Anandhakumar; Syed, Junetha; Taylor, Rhys D.; Kashiwazaki, Gengo; Sato, Shinsuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi

    2016-01-01

    Chemically engineered small molecules targeting specific genomic sequences play an important role in drug development research. Pyrrole-imidazole polyamides (PIPs) are a group of molecules that can bind to the DNA minor-groove and can be engineered to target specific sequences. Their biological effects rely primarily on their selective DNA binding. However, the binding mechanism of PIPs at the chromatinized genome level is poorly understood. Herein, we report a method using high-throughput sequencing to identify the DNA-alkylating sites of PIP-indole-seco-CBI conjugates. High-throughput sequencing analysis of conjugate 2 showed highly similar DNA-alkylating sites on synthetic oligos (histone-free DNA) and on human genomes (chromatinized DNA context). To our knowledge, this is the first report identifying alkylation sites across genomic DNA by alkylating PIP conjugates using high-throughput sequencing. PMID:27098039

  1. High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells.

    PubMed

    Zhou, Yuexin; Zhu, Shiyou; Cai, Changzu; Yuan, Pengfei; Li, Chunmei; Huang, Yanyi; Wei, Wensheng

    2014-05-22

    Targeted genome editing technologies are powerful tools for studying biology and disease, and have a broad range of research applications. In contrast to the rapid development of toolkits to manipulate individual genes, large-scale screening methods based on the complete loss of gene expression are only now beginning to be developed. Here we report the development of a focused CRISPR/Cas-based (clustered regularly interspaced short palindromic repeats/CRISPR-associated) lentiviral library in human cells and a method of gene identification based on functional screening and high-throughput sequencing analysis. Using knockout library screens, we successfully identified the host genes essential for the intoxication of cells by anthrax and diphtheria toxins, which were confirmed by functional validation. The broad application of this powerful genetic screening strategy will not only facilitate the rapid identification of genes important for bacterial toxicity but will also enable the discovery of genes that participate in other biological processes.

  2. Read count-based method for high-throughput allelic genotyping of transposable elements and structural variants.

    PubMed

    Kuhn, Alexandre; Ong, Yao Min; Quake, Stephen R; Burkholder, William F

    2015-07-08

    Like other structural variants, transposable element insertions can be highly polymorphic across individuals. Their functional impact, however, remains poorly understood. Current genome-wide approaches for genotyping insertion-site polymorphisms based on targeted or whole-genome sequencing remain very expensive and can lack accuracy, hence new large-scale genotyping methods are needed. We describe a high-throughput method for genotyping transposable element insertions and other types of structural variants that can be assayed by breakpoint PCR. The method relies on next-generation sequencing of multiplex, site-specific PCR amplification products and read count-based genotype calls. We show that this method is flexible, efficient (it does not require rounds of optimization), cost-effective and highly accurate. This method can benefit a wide range of applications from the routine genotyping of animal and plant populations to the functional study of structural variants in humans.
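
    The idea of a read count-based genotype call can be sketched as follows. This is a minimal illustration rather than the published method: the function name `call_genotype`, the minimum depth, and the allele-fraction cutoffs are assumed values chosen for the example.

```python
def call_genotype(ins_reads, empty_reads, min_depth=20, hom_cutoff=0.9):
    """Call a genotype at one breakpoint-PCR site from two read counts.

    ins_reads:   reads supporting the insertion (or variant) allele.
    empty_reads: reads supporting the empty (reference) allele.
    min_depth and hom_cutoff are illustrative thresholds, not published values.
    """
    depth = ins_reads + empty_reads
    if depth < min_depth:
        return "no_call"  # too few reads to make a reliable call
    frac = ins_reads / depth  # fraction of reads from the insertion allele
    if frac >= hom_cutoff:
        return "ins/ins"
    if frac <= 1.0 - hom_cutoff:
        return "ref/ref"
    return "ins/ref"

# Illustrative counts for three samples at the same site.
calls = [call_genotype(95, 5), call_genotype(48, 52), call_genotype(2, 98)]
```

    In practice the cutoffs would need calibrating per site, since amplification efficiency can differ between the two allele-specific products.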

  3. Reverse Genetics and High Throughput Sequencing Methodologies for Plant Functional Genomics

    PubMed Central

    Ben-Amar, Anis; Daldoul, Samia; Reustle, Götz M.; Krczal, Gabriele; Mliki, Ahmed

    2016-01-01

    In the post-genomic era, increasingly sophisticated genetic tools are being developed with the long-term goal of understanding how the coordinated activity of genes gives rise to a complex organism. With the advent of next-generation sequencing and effective computational approaches, a wide variety of plant species have been fully sequenced, giving a wealth of sequence data on the structure and organization of plant genomes. Since thousands of gene sequences are already known, recently developed functional genomics approaches provide powerful tools to analyze plant gene functions through various gene manipulation technologies. Integration of different omics platforms along with gene annotation and computational analysis may provide a complete view at the systems biology level. Reverse genetics methodologies have been extensively deployed to assign biological function to a specific gene or gene product. We provide here an updated overview of these high-throughput strategies, highlighting recent advances in the knowledge of functional genomics in plants. PMID:28217003

  4. Oncogenomics and the development of new cancer therapies.

    PubMed

    Strausberg, Robert L; Simpson, Andrew J G; Old, Lloyd J; Riggins, Gregory J

    2004-05-27

    Scientists have sequenced the human genome and identified most of its genes. Now it is time to use these genomic data, and the high-throughput technology developed to generate them, to tackle major health problems such as cancer. To accelerate our understanding of this disease and to produce targeted therapies, further basic mutational and functional genomic information is required. A systematic and coordinated approach, with the results freely available, should speed up progress. This will best be accomplished through an international academic and pharmaceutical oncogenomics initiative.

  5. Human Genomic Loci Important in Common Infectious Diseases: Role of High-Throughput Sequencing and Genome-Wide Association Studies

    PubMed Central

    Sserwadda, Ivan; Amujal, Marion; Namatovu, Norah

    2018-01-01

    HIV/AIDS, tuberculosis (TB), and malaria are 3 major global public health threats that undermine development in many resource-poor settings. Recently, the notion that positive selection during epidemics or longer periods of exposure to common infectious diseases may have had a major effect in modifying the constitution of the human genome is being interrogated at a large scale in many populations around the world. This positive selection from infectious diseases increases power to detect associations in genome-wide association studies (GWASs). High-throughput sequencing (HTS) has transformed the management of infectious diseases and continues to enable large-scale functional characterization of host resistance/susceptibility alleles and loci, a paradigm shift from single candidate gene studies. Application of genome sequencing technologies and genomics has enabled us to interrogate the host-pathogen interface for improving human health. Human populations are constantly locked in evolutionary arms races with pathogens; therefore, identification of common infectious disease-associated genomic variants/markers is important for therapeutic and vaccine development and for screening susceptible individuals in a population. This review describes a range of host-pathogen genomic loci that have been associated with disease susceptibility and resistance patterns in the era of HTS. We further highlight potential opportunities for these genetic markers. PMID:29755620

  6. NCBI GEO: archive for functional genomics data sets—10 years on

    PubMed Central

    Barrett, Tanya; Troup, Dennis B.; Wilhite, Stephen E.; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F.; Tomashevsky, Maxim; Marshall, Kimberly A.; Phillippy, Katherine H.; Sherman, Patti M.; Muertter, Rolf N.; Holko, Michelle; Ayanbule, Oluwabukunmi; Yefanov, Andrey; Soboleva, Alexandra

    2011-01-01

    A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20 000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/. PMID:21097893

  7. Web-based visual analysis for high-throughput genomics

    PubMed Central

    2013-01-01

    Background Visualization plays an essential role in genomics research by making it possible to observe correlations and trends in large datasets as well as communicate findings to others. Visual analysis, which combines visualization with analysis tools to enable seamless use of both approaches for scientific investigation, offers a powerful method for performing complex genomic analyses. However, there are numerous challenges that arise when creating rich, interactive Web-based visualizations/visual analysis applications for high-throughput genomics. These challenges include managing data flow from Web server to Web browser, integrating analysis tools and visualizations, and sharing visualizations with colleagues. Results We have created a platform that simplifies the creation of Web-based visualization/visual analysis applications for high-throughput genomics. This platform provides components that make it simple to efficiently query very large datasets, draw common representations of genomic data, integrate with analysis tools, and share or publish fully interactive visualizations. Using this platform, we have created a Circos-style genome-wide viewer, a generic scatter plot for correlation analysis, an interactive phylogenetic tree, a scalable genome browser for next-generation sequencing data, and an application for systematically exploring tool parameter spaces to find good parameter values. All visualizations are interactive and fully customizable. The platform is integrated with the Galaxy (http://galaxyproject.org) genomics workbench, making it easy to integrate new visual applications into Galaxy. Conclusions Visualization and visual analysis play an important role in high-throughput genomics experiments, and approaches are needed to make it easier to create applications for these activities. Our framework provides a foundation for creating Web-based visualizations and integrating them into Galaxy. Finally, the visualizations we have created using the framework are useful tools for high-throughput genomics experiments. PMID:23758618

  8. Purdue ionomics information management system. An integrated functional genomics platform.

    PubMed

    Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S; Salt, David E

    2007-02-01

    The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics.

  9. A High Throughput Barley Stripe Mosaic Virus Vector for Virus Induced Gene Silencing in Monocots and Dicots

    PubMed Central

    Yan, Lijie; Jackson, Andrew O.; Liu, Zhiyong; Han, Chenggui; Yu, Jialin; Li, Dawei

    2011-01-01

    Barley stripe mosaic virus (BSMV) is a single-stranded RNA virus with three genome components designated alpha, beta, and gamma. BSMV vectors have previously been shown to be efficient virus induced gene silencing (VIGS) vehicles in barley and wheat and have provided important information about host genes functioning during pathogenesis as well as various aspects of genes functioning in development. To permit more effective use of BSMV VIGS for functional genomics experiments, we have developed an Agrobacterium delivery system for BSMV and have coupled this with a ligation independent cloning (LIC) strategy to mediate efficient cloning of host genes. Infiltrated Nicotiana benthamiana leaves provided excellent sources of virus for secondary BSMV infections and VIGS in cereals. The Agro/LIC BSMV VIGS vectors were able to function in high efficiency down regulation of phytoene desaturase (PDS), magnesium chelatase subunit H (ChlH), and plastid transketolase (TK) gene silencing in N. benthamiana and in the monocots, wheat, barley, and the model grass, Brachypodium distachyon. Suppression of an Arabidopsis orthologue cloned from wheat (TaPMR5) also interfered with wheat powdery mildew (Blumeria graminis f. sp. tritici) infections in a manner similar to that of the A. thaliana PMR5 loss-of-function allele. These results imply that the PMR5 gene has maintained similar functions across monocot and dicot families. Our BSMV VIGS system provides substantial advantages in expense, cloning efficiency, ease of manipulation and ability to apply VIGS for high throughput genomics studies. PMID:22031834

  10. Competitive Genomic Screens of Barcoded Yeast Libraries

    PubMed Central

    Urbanus, Malene; Proctor, Michael; Heisler, Lawrence E.; Giaever, Guri; Nislow, Corey

    2011-01-01

    By virtue of advances in next generation sequencing technologies, we have access to new genome sequences almost daily. The tempo of these advances is accelerating, promising greater depth and breadth. In light of these extraordinary advances, the need for fast, parallel methods to define gene function becomes ever more important. Collections of genome-wide deletion mutants in yeasts and E. coli have served as workhorses for functional characterization of gene function, but this approach is not scalable: current gene-deletion approaches require each of the thousands of genes that comprise a genome to be deleted and verified. Only after this work is complete can we pursue high-throughput phenotyping. Over the past decade, our laboratory has refined a portfolio of competitive, miniaturized, high-throughput genome-wide assays that can be performed in parallel. This parallelization is possible because of the inclusion of DNA 'tags', or 'barcodes', into each mutant; the barcode serves as a proxy for the mutation, so barcode abundance can be measured to assess mutant fitness. In this study, we seek to fill the gap between DNA sequence and barcoded mutant collections. To accomplish this we introduce a combined transposon disruption-barcoding approach that opens up parallel barcode assays to newly sequenced, but poorly characterized microbes. To illustrate this approach we present a new Candida albicans barcoded disruption collection and describe how both microarray-based and next generation sequencing-based platforms can be used to collect 10,000 - 1,000,000 gene-gene and drug-gene interactions in a single experiment. PMID:21860376
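
    How barcode abundance can stand in for mutant fitness is sketched below. This is a generic illustration, not the authors' protocol: the function name `barcode_log2_fold_change`, the pseudocount, and the example counts are assumptions.

```python
import math

def barcode_log2_fold_change(before, after, pseudocount=1.0):
    """Score each barcoded mutant by its change in relative abundance.

    before, after: dicts mapping barcode -> read (or signal) count in the pool
    before and after competitive growth. The pseudocount (an assumed value)
    avoids taking the log of zero for barcodes that drop out.
    """
    n = len(before)
    total_before = sum(before.values()) + pseudocount * n
    total_after = sum(after.get(b, 0) for b in before) + pseudocount * n
    scores = {}
    for barcode, count_before in before.items():
        freq_before = (count_before + pseudocount) / total_before
        freq_after = (after.get(barcode, 0) + pseudocount) / total_after
        # Negative scores mean the mutant was depleted during competition.
        scores[barcode] = math.log2(freq_after / freq_before)
    return scores

# Illustrative counts: mutant B is depleted relative to mutant A.
scores = barcode_log2_fold_change({"A": 100, "B": 100}, {"A": 100, "B": 25})
```

    Because all mutants are grown in one pool and scored relative to each other, a single sequencing or array readout yields a fitness estimate for every barcode at once.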

  11. Research progress of plant population genomics based on high-throughput sequencing.

    PubMed

    Wang, Yun-sheng

    2016-08-01

    Population genomics, a new paradigm for population genetics, combines the concepts and techniques of genomics with the theoretical system of population genetics and improves our understanding of microevolution through identification of site-specific and genome-wide effects using genotyping of genome-wide polymorphic sites. With the appearance and improvement of next-generation high-throughput sequencing technology, the number of plant species with complete genome sequences has increased rapidly, and large-scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of the genetic diversity, level of linkage disequilibrium, selection effects, demographic history and molecular mechanisms of complex traits of relevant plant populations at a genomic level. In this review, I briefly introduce the concept and research methods of population genomics and summarize the research progress of plant population genomics based on high-throughput sequencing. I also discuss the prospects as well as existing problems of plant population genomics in order to provide references for related studies.

  12. Protein Structures Revealed at Record Pace

    ScienceCinema

    Hura, Greg

    2017-12-11

    The structure of a protein in days -- not months or years -- ushers in a new era in genomics research. Berkeley Lab scientists have developed a high-throughput protein pipeline that could expedite the development of biofuels and elucidate how proteins carry out life's vital functions.

  13. Protein Structures Revealed at Record Pace

    ScienceCinema

    Greg Hura

    2017-12-09

    The structure of a protein in days -- not months or years -- ushers in a new era in genomics research. Berkeley Lab scientists have developed a high-throughput protein pipeline that could expedite the development of biofuels and elucidate how proteins carry out life's vital functions.

  14. Customizing the Connectivity Map Approach for Functional Evaluation in Toxicogenomics Studies (SOT)

    EPA Science Inventory

    Evaluating effects on the transcriptome can provide insight on putative chemical-specific mechanisms of action (MOAs). With whole genome transcriptomics technologies becoming more amenable to high-throughput screening, libraries of chemicals can be evaluated in vitro to produce l...

  15. Development of a high-throughput SNP resource to advance genomic, genetic and breeding research in carrot (Daucus carota L.)

    USDA-ARS's Scientific Manuscript database

    The rapid advancement in high-throughput SNP genotyping technologies along with next generation sequencing (NGS) platforms has decreased the cost, improved the quality of large-scale genome surveys, and allowed specialty crops with limited genomic resources such as carrot (Daucus carota) to access t...

  16. The Functional Genomics Network in the evolution of biological text mining over the past decade.

    PubMed

    Blaschke, Christian; Valencia, Alfonso

    2013-03-25

    Different programs of The European Science Foundation (ESF) have contributed significantly to connecting researchers in Europe and beyond through several initiatives. This support was particularly relevant for the development of the areas related to extracting information from papers (text-mining) because it supported the field in its early phases long before it was recognized by the community. We review the historical development of text mining research and how it was introduced in bioinformatics. Specific applications in (functional) genomics are described, such as its integration in genome annotation pipelines and the support to the analysis of high-throughput genomics experimental data, and we highlight the activities of evaluation of methods and benchmarking for which the ESF programme support was instrumental. Copyright © 2013 Elsevier B.V. All rights reserved.

  17. Identification of functional modules using network topology and high-throughput data.

    PubMed

    Ulitsky, Igor; Shamir, Ron

    2007-01-26

    With the advent of systems biology, biological knowledge is often represented today by networks. These include regulatory and metabolic networks, protein-protein interaction networks, and many others. At the same time, high-throughput genomics and proteomics techniques generate very large data sets, which require sophisticated computational analysis. Usually, separate and different analysis methodologies are applied to each of the two data types. An integrated investigation of network and high-throughput information together can improve the quality of the analysis by accounting simultaneously for topological network properties alongside intrinsic features of the high-throughput data. We describe a novel algorithmic framework for this challenge. We first transform the high-throughput data into similarity values (e.g., by computing pairwise similarity of gene expression patterns from microarray data). Then, given a network of genes or proteins and similarity values between some of them, we seek connected sub-networks (or modules) that manifest high similarity. We develop algorithms for this problem and evaluate their performance on the osmotic shock response network in S. cerevisiae and on the human cell cycle network. We demonstrate that focused, biologically meaningful and relevant functional modules are obtained. In comparison with extant algorithms, our approach has higher sensitivity and higher specificity. We have demonstrated that our method can accurately identify functional modules. Hence, it carries the promise to be highly useful in analysis of high throughput data.
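
    The two steps named in this abstract (turning expression profiles into pairwise similarity values, then looking for connected sub-networks with high similarity) can be sketched as follows. This is a simple greedy illustration, not the algorithm developed by the authors: the use of Pearson correlation, the seed-and-grow strategy, the threshold, and all function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)


def grow_module(seed, network, profiles, min_similarity=0.8):
    """Greedily grow a connected module around a seed gene.

    network maps gene -> iterable of neighbouring genes; profiles maps
    gene -> expression profile. A neighbour joins the module only if its
    mean similarity to the current members is at least min_similarity.
    """
    module = {seed}
    while True:
        # Candidates are network neighbours of the module, so it stays connected.
        frontier = {
            neighbour
            for gene in module
            for neighbour in network.get(gene, ())
            if neighbour not in module and neighbour in profiles
        }
        best_gene, best_score = None, min_similarity
        for candidate in sorted(frontier):
            score = sum(
                pearson(profiles[candidate], profiles[member]) for member in module
            ) / len(module)
            if score >= best_score:
                best_gene, best_score = candidate, score
        if best_gene is None:
            return module
        module.add(best_gene)
```

    Restricting candidates to network neighbours is what ties the expression similarity to the network topology in this sketch.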

  18. Genome-wide characterization of mammalian promoters with distal enhancer functions.

    PubMed

    Dao, Lan T M; Galindo-Albarrán, Ariel O; Castro-Mondragon, Jaime A; Andrieu-Soler, Charlotte; Medina-Rivera, Alejandra; Souaid, Charbel; Charbonnier, Guillaume; Griffon, Aurélien; Vanhille, Laurent; Stephen, Tharshana; Alomairi, Jaafar; Martin, David; Torres, Magali; Fernandez, Nicolas; Soler, Eric; van Helden, Jacques; Puthier, Denis; Spicuglia, Salvatore

    2017-07-01

    Gene expression in mammals is precisely regulated by the combination of promoters and gene-distal regulatory regions, known as enhancers. Several studies have suggested that some promoters might have enhancer functions. However, the extent of this type of promoters and whether they actually function to regulate the expression of distal genes have remained elusive. Here, by exploiting a high-throughput enhancer reporter assay, we unravel a set of mammalian promoters displaying enhancer activity. These promoters have distinct genomic and epigenomic features and frequently interact with other gene promoters. Extensive CRISPR-Cas9 genomic manipulation demonstrated the involvement of these promoters in the cis regulation of expression of distal genes in their natural loci. Our results have important implications for the understanding of complex gene regulation in normal development and disease.

  19. BiQ Analyzer HT: locus-specific analysis of DNA methylation by high-throughput bisulfite sequencing

    PubMed Central

    Lutsik, Pavlo; Feuerbach, Lars; Arand, Julia; Lengauer, Thomas; Walter, Jörn; Bock, Christoph

    2011-01-01

    Bisulfite sequencing is a widely used method for measuring DNA methylation in eukaryotic genomes. The assay provides single-base pair resolution and, given sufficient sequencing depth, its quantitative accuracy is excellent. High-throughput sequencing of bisulfite-converted DNA can be applied either genome wide or targeted to a defined set of genomic loci (e.g. using locus-specific PCR primers or DNA capture probes). Here, we describe BiQ Analyzer HT (http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de/), a user-friendly software tool that supports locus-specific analysis and visualization of high-throughput bisulfite sequencing data. The software facilitates the shift from time-consuming clonal bisulfite sequencing to the more quantitative and cost-efficient use of high-throughput sequencing for studying locus-specific DNA methylation patterns. In addition, it is useful for locus-specific visualization of genome-wide bisulfite sequencing data. PMID:21565797

  20. New Era of Studying RNA Secondary Structure and Its Influence on Gene Regulation in Plants.

    PubMed

    Yang, Xiaofei; Yang, Minglei; Deng, Hongjing; Ding, Yiliang

    2018-01-01

    The dynamic structure of RNA plays a central role in post-transcriptional regulation of gene expression such as RNA maturation, degradation, and translation. With the rise of next-generation sequencing, the study of RNA structure has been transformed from in vitro low-throughput RNA structure probing methods to in vivo high-throughput RNA structure profiling. The development of these methods enables incremental studies on the function of RNA structure to be performed, revealing new insights into novel regulatory mechanisms of RNA structure in plants. Genome-wide scale RNA structure profiling allows us to investigate general RNA structural features over tens of thousands of mRNAs and to compare RNA structuromes between plant species. Here, we provide a comprehensive and up-to-date overview of: (i) RNA structure probing methods; (ii) the biological functions of RNA structure; (iii) genome-wide RNA structural features corresponding to their regulatory mechanisms; and (iv) RNA structurome evolution in plants.

  1. HTP-OligoDesigner: An Online Primer Design Tool for High-Throughput Gene Cloning and Site-Directed Mutagenesis.

    PubMed

    Camilo, Cesar M; Lima, Gustavo M A; Maluf, Fernando V; Guido, Rafael V C; Polikarpov, Igor

    2016-01-01

    Following burgeoning genomic and transcriptomic sequencing data, biochemical and molecular biology groups worldwide are implementing high-throughput cloning and mutagenesis facilities in order to obtain a large number of soluble proteins for structural and functional characterization. Since manual primer design can be a time-consuming and error-generating step, particularly when working with hundreds of targets, the automation of primer design process becomes highly desirable. HTP-OligoDesigner was created to provide the scientific community with a simple and intuitive online primer design tool for both laboratory-scale and high-throughput projects of sequence-independent gene cloning and site-directed mutagenesis and a Tm calculator for quick queries.
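
    The abstract mentions a Tm calculator but does not say which formula HTP-OligoDesigner uses. As a rough illustration only, the sketch below applies the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, a common rule of thumb for short oligonucleotides; the function name is hypothetical and the rule is an assumption, not the tool's documented method.

```python
def wallace_tm(primer):
    """Estimate the melting temperature (deg C) of a short primer.

    Uses the Wallace rule: 2 deg C per A/T base and 4 deg C per G/C base.
    """
    seq = primer.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected bases: {sorted(invalid)}")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count
```

    For longer primers, nearest-neighbour thermodynamic models are generally preferred over this approximation.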

  2. Decoding genes with coexpression networks and metabolomics - 'majority report by precogs'.

    PubMed

    Saito, Kazuki; Hirai, Masami Y; Yonekura-Sakakibara, Keiko

    2008-01-01

    Following the sequencing of whole genomes of model plants, high-throughput decoding of gene function is a major challenge in modern plant biology. In view of remarkable technical advances in transcriptomics and metabolomics, integrated analysis of these 'omics' by data-mining informatics is an excellent tool for prediction and identification of gene function, particularly for genes involved in complicated metabolic pathways. The availability of Arabidopsis public transcriptome datasets containing data of >1000 microarrays reinforces the potential for prediction of gene function by transcriptome coexpression analysis. Here, we review the strategy of combining transcriptome and metabolome as a powerful technology for studying the functional genomics of model plants and also crop and medicinal plants.

  3. Genomics Portals: integrative web-platform for mining genomics data.

    PubMed

    Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario

    2010-01-13

    A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

  4. Genomics Portals: integrative web-platform for mining genomics data

    PubMed Central

    2010-01-01

    Background A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Results Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. Conclusion The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org. PMID:20070909

  5. The Gene Expression Omnibus Database.

    PubMed

    Clough, Emily; Barrett, Tanya

    2016-01-01

    The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics data sets. Created in 2000 as a worldwide resource for gene expression studies, GEO has evolved with rapidly changing technologies and now accepts high-throughput data for many other data applications, including those that examine genome methylation, chromatin structure, and genome-protein interactions. GEO supports community-derived reporting standards that specify provision of several critical study elements including raw data, processed data, and descriptive metadata. The database not only provides access to data for tens of thousands of studies, but also offers various Web-based tools and strategies that enable users to locate data relevant to their specific interests, as well as to visualize and analyze the data. This chapter includes detailed descriptions of methods to query and download GEO data and use the analysis and visualization tools. The GEO homepage is at http://www.ncbi.nlm.nih.gov/geo/.

  6. MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing

    PubMed Central

    Diroma, Maria Angela; Santorsola, Mariangela; Guttà, Cristiano; Gasparre, Giuseppe; Picardi, Ernesto; Pesole, Graziano; Attimonelli, Marcella

    2014-01-01

    Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. For this purpose, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied to 1000 Genomes WXS datasets. Availability and implementation: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/. Contact: marcella.attimonelli@uniba.it Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25028726
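
    The abstract does not describe how MToolBox computes allele-specific heteroplasmy. A common way to estimate it, assumed here purely for illustration, is the fraction of reads at a mitochondrial position that support each allele. The function name and input layout below are hypothetical.

```python
def heteroplasmy_fractions(allele_depths):
    """Return the fraction of reads supporting each allele at one position.

    allele_depths maps allele -> number of reads supporting it
    (an assumed input layout, not MToolBox's file format).
    """
    total = sum(allele_depths.values())
    if total == 0:
        return {}  # no coverage, so no estimate can be made
    return {allele: depth / total for allele, depth in allele_depths.items()}
```

    With 90 reads supporting A and 10 supporting G at a site, this estimate would put the heteroplasmy of the G allele at 0.1.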

  7. High Throughput Computing Impact on Meta Genomics (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Gore, Brooklin

    2018-02-01

    This presentation includes a brief background on High Throughput Computing, correlating gene transcription factors, optical mapping, genotype to phenotype mapping via QTL analysis, and current work on next gen sequencing.

  8. University of Texas Southwestern Medical Center: High-Throughput siRNA Screening of a Non-Small Cell Lung Cancer (NSCLC) Cell Line Panel | Office of Cancer Genomics

    Cancer.gov

    The goal of this project is to use siRNA screens to identify NSCLC-selective siRNAs from two genome-wide libraries that will allow us to functionally define genetic dependencies of subtypes of NSCLC. Using bioinformatics tools, the CTD2 center at the University of Texas Southwestern Medical Center is discovering associations between this functional data (siRNAs) and NSCLC mutational status, methylation arrays, gene expression arrays, and copy number variation data that will help us identify new targets and enrollment biomarkers.

  9. University of Texas Southwestern Medical Center (UTSW): High-Throughput siRNA Screening of a Non-Small Cell Lung Cancer (NSCLC) Cell Line Panel | Office of Cancer Genomics

    Cancer.gov

    The goal of this project is to use siRNA screens to identify NSCLC-selective siRNAs from two genome-wide libraries that will allow us to functionally define genetic dependencies of subtypes of NSCLC. Using bioinformatics tools, the CTD2 center at the University of Texas Southwestern Medical Center is discovering associations between this functional data (siRNAs) and NSCLC mutational status, methylation arrays, gene expression arrays, and copy number variation data that will help us identify new targets and enrollment biomarkers.

  10. A computational genomics pipeline for prokaryotic sequencing projects.

    PubMed

    Kislyuk, Andrey O; Katz, Lee S; Agrawal, Sonia; Hagen, Matthew S; Conley, Andrew B; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C; Sammons, Scott A; Govil, Dhwani; Mair, Raydel D; Tatti, Kathleen M; Tondella, Maria L; Harcourt, Brian H; Mayer, Leonard W; Jordan, I King

    2010-08-01

    New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.

  11. Genomic and Epigenomic Alterations in Cancer.

    PubMed

    Chakravarthi, Balabhadrapatruni V S K; Nepal, Saroj; Varambally, Sooryanarayana

    2016-07-01

    Multiple genetic and epigenetic events characterize tumor progression and define the identity of the tumors. Advances in high-throughput technologies, like gene expression profiling, next-generation sequencing, proteomics, and metabolomics, have enabled detailed molecular characterization of various tumors. The integration and analyses of these high-throughput data have unraveled many novel molecular aberrations and network alterations in tumors. These molecular alterations include multiple cancer-driving mutations, gene fusions, amplification, deletion, and post-translational modifications, among others. Many of these genomic events are being used in cancer diagnosis, whereas others are therapeutically targeted with small-molecule inhibitors. Multiple genes/enzymes that play a role in DNA and histone modifications are also altered in various cancers, changing the epigenomic landscape during cancer initiation and progression. Apart from protein-coding genes, studies are uncovering the critical regulatory roles played by noncoding RNAs and noncoding regions of the genome during cancer progression. Many of these genomic and epigenetic events function in tandem to drive tumor development and metastasis. Concurrent advances in genome-modulating technologies, like gene silencing and genome editing, are providing the ability to understand in detail the process of cancer initiation, progression, and signaling as well as opening up avenues for therapeutic targeting. In this review, we discuss some of the recent advances in cancer genomic and epigenomic research. Copyright © 2016 American Society for Investigative Pathology. Published by Elsevier Inc. All rights reserved.

  12. A genome-wide CRISPR library for high-throughput genetic screening in Drosophila cells.

    PubMed

    Bassett, Andrew R; Kong, Lesheng; Liu, Ji-Long

    2015-06-20

    The simplicity of the CRISPR/Cas9 system of genome engineering has opened up the possibility of performing genome-wide targeted mutagenesis in cell lines, enabling screening for cellular phenotypes resulting from genetic aberrations. Drosophila cells have proven to be highly effective in identifying genes involved in cellular processes through similar screens using partial knockdown by RNAi. This is in part due to the lower degree of redundancy between genes in this organism, whilst still maintaining highly conserved gene networks and orthologs of many human disease-causing genes. The ability of CRISPR to generate genetic loss of function mutations not only increases the magnitude of any effect over currently employed RNAi techniques, but allows analysis over longer periods of time which can be critical for certain phenotypes. In this study, we have designed and built a genome-wide CRISPR library covering 13,501 genes, among which 8989 genes are targeted by three or more independent single guide RNAs (sgRNAs). Moreover, we describe strategies to monitor the population of guide RNAs by high throughput sequencing (HTS). We hope that this library will provide an invaluable resource for the community to screen loss of function mutations for cellular phenotypes, and as a source of guide RNA designs for future studies. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
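
    Monitoring a guide RNA population by sequencing comes down to counting how many reads carry each guide. The sketch below is one simple way to do that and is not the strategy described by the authors: it assumes the guide sits at a fixed offset in every read, requires an exact match, and uses hypothetical names throughout.

```python
def count_guides(reads, library, guide_start=0, guide_length=20):
    """Count reads per sgRNA by exact match at a fixed read position.

    library maps guide sequence -> sgRNA name (assumed layout).
    Returns (counts per sgRNA name, number of unmatched reads).
    """
    counts = {name: 0 for name in library.values()}
    unmatched = 0
    for read in reads:
        # Extract the window where the guide is assumed to be located.
        candidate = read[guide_start:guide_start + guide_length].upper()
        name = library.get(candidate)
        if name is None:
            unmatched += 1
        else:
            counts[name] += 1
    return counts, unmatched
```

    Guides with zero counts stay in the result, which makes dropout from the population easy to spot.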

  13. High-throughput crystal-optimization strategies in the South Paris Yeast Structural Genomics Project: one size fits all?

    PubMed

    Leulliot, Nicolas; Trésaugues, Lionel; Bremang, Michael; Sorel, Isabelle; Ulryck, Nathalie; Graille, Marc; Aboulfath, Ilham; Poupon, Anne; Liger, Dominique; Quevillon-Cheruel, Sophie; Janin, Joël; van Tilbeurgh, Herman

    2005-06-01

    Crystallization has long been regarded as one of the major bottlenecks in high-throughput structural determination by X-ray crystallography. Structural genomics projects have addressed this issue by using robots to set up automated crystal screens using nanodrop technology. This has moved the bottleneck from obtaining the first crystal hit to obtaining diffraction-quality crystals, as crystal optimization is a notoriously slow process that is difficult to automatize. This article describes the high-throughput optimization strategies used in the Yeast Structural Genomics project, with selected successful examples.

  14. Purdue Ionomics Information Management System. An Integrated Functional Genomics Platform

    PubMed Central

    Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S.; Salt, David E.

    2007-01-01

    The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics. PMID:17189337

  15. Strategies to explore functional genomics data sets in NCBI's GEO database.

    PubMed

    Wilhite, Stephen E; Barrett, Tanya

    2012-01-01

    The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze, and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries.

  16. Strategies to Explore Functional Genomics Data Sets in NCBI’s GEO Database

    PubMed Central

    Wilhite, Stephen E.; Barrett, Tanya

    2012-01-01

    The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries. PMID:22130872

  17. FlyRNAi.org—the database of the Drosophila RNAi screening center and transgenic RNAi project: 2017 update

    PubMed Central

    Hu, Yanhui; Comjean, Aram; Roesel, Charles; Vinayagam, Arunachalam; Flockhart, Ian; Zirin, Jonathan; Perkins, Lizabeth; Perrimon, Norbert; Mohr, Stephanie E.

    2017-01-01

    The FlyRNAi database of the Drosophila RNAi Screening Center (DRSC) and Transgenic RNAi Project (TRiP) at Harvard Medical School and associated DRSC/TRiP Functional Genomics Resources website (http://fgr.hms.harvard.edu) serve as a reagent production tracking system, screen data repository, and portal to the community. Through this portal, we make available protocols, online tools, and other resources useful to researchers at all stages of high-throughput functional genomics screening, from assay design and reagent identification to data analysis and interpretation. In this update, we describe recent changes and additions to our website, database and suite of online tools. Recent changes reflect a shift in our focus from a single technology (RNAi) and model species (Drosophila) to the application of additional technologies (e.g. CRISPR) and support of integrated, cross-species approaches to uncovering gene function using functional genomics and other approaches. PMID:27924039

  18. From cancer genomes to cancer models: bridging the gaps

    PubMed Central

    Baudot, Anaïs; Real, Francisco X.; Izarzugaza, José M. G.; Valencia, Alfonso

    2009-01-01

    Cancer genome projects are now being expanded in an attempt to provide complete landscapes of the mutations that exist in tumours. Although the importance of cataloguing genome variations is well recognized, there are obvious difficulties in bridging the gaps between high-throughput resequencing information and the molecular mechanisms of cancer evolution. Here, we describe the current status of the high-throughput genomic technologies, and the current limitations of the associated computational analysis and experimental validation of cancer genetic variants. We emphasize how the current cancer-evolution models will be influenced by the high-throughput approaches, in particular through efforts devoted to monitoring tumour progression, and how, in turn, the integration of data and models will be translated into mechanistic knowledge and clinical applications. PMID:19305388

  19. Generalized schemes for high throughput manipulation of the Desulfovibrio vulgaris Hildenborough genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chhabra, S.R.; Butland, G.; Elias, D.

    The ability to conduct advanced functional genomic studies of the thousands of sequenced bacteria has been hampered by the lack of available tools for making high-throughput chromosomal manipulations in a systematic manner that can be applied across diverse species. In this work, we highlight the use of synthetic biological tools to assemble custom suicide vectors with reusable and interchangeable DNA “parts” to facilitate chromosomal modification at designated loci. These constructs enable an array of downstream applications including gene replacement and creation of gene fusions with affinity purification or localization tags. We employed this approach to engineer chromosomal modifications in a bacterium that has previously proven difficult to manipulate genetically, Desulfovibrio vulgaris Hildenborough, to generate a library of over 700 strains. Furthermore, we demonstrate how these modifications can be used for examining metabolic pathways, protein-protein interactions, and protein localization. The ubiquity of suicide constructs in gene replacement throughout biology suggests that this approach can be applied to engineer a broad range of species for a diverse array of systems biological applications and is amenable to high-throughput implementation.

  20. Inferring Selective Constraint from Population Genomic Data Suggests Recent Regulatory Turnover in the Human Brain

    PubMed Central

    Schrider, Daniel R.; Kern, Andrew D.

    2015-01-01

    The comparative genomics revolution of the past decade has enabled the discovery of functional elements in the human genome via sequence comparison. While that is so, an important class of elements, those specific to humans, is entirely missed by searching for sequence conservation across species. Here we present an analysis based on variation data among human genomes that utilizes a supervised machine learning approach for the identification of human-specific purifying selection in the genome. Using only allele frequency information from the complete low-coverage 1000 Genomes Project data set in conjunction with a support vector machine trained from known functional and nonfunctional portions of the genome, we are able to accurately identify portions of the genome constrained by purifying selection. Our method identifies previously known human-specific gains or losses of function and uncovers many novel candidates. Candidate targets for gain and loss of function along the human lineage include numerous putative regulatory regions of genes essential for normal development of the central nervous system, including a significant enrichment of gain of function events near neurotransmitter receptor genes. These results are consistent with regulatory turnover being a key mechanism in the evolution of human-specific characteristics of brain development. Finally, we show that the majority of the genome is unconstrained by natural selection currently, in agreement with what has been estimated from phylogenetic methods but in sharp contrast to estimates based on transcriptomics or other high-throughput functional methods. PMID:26590212

  1. [Three-dimensional genome organization: a lesson from the Polycomb-Group proteins].

    PubMed

    Bantignies, Frédéric

    2013-01-01

    As more and more genomes are being explored and annotated, important features of three-dimensional (3D) genome organization are just being uncovered. In the light of what we know about Polycomb group (PcG) proteins, we will present the latest findings on this topic. The PcG proteins are well-conserved chromatin factors that repress transcription of numerous target genes. They bind the genome at specific sites, forming chromatin domains of associated histone modifications as well as higher-order chromatin structures. These 3D chromatin structures involve the interactions between PcG-bound regulatory regions at short- and long-range distances, and may significantly contribute to PcG function. Recent high throughput "Chromosome Conformation Capture" (3C) analyses have revealed many other higher order structures along the chromatin fiber, partitioning the genomes into well demarcated topological domains. This revealed an unprecedented link between linear epigenetic domains and chromosome architecture, which might be intimately connected to genome function. © Société de Biologie, 2013.

  2. Dana-Farber Cancer Institute: Identification of Therapeutic Targets Across Cancer Types | Office of Cancer Genomics

    Cancer.gov

    The Dana Farber Cancer Institute CTD2 Center focuses on the use of high-throughput genetic and bioinformatic approaches to identify and credential oncogenes and co-dependencies in cancers. This Center aims to provide the cancer research community with information that will facilitate the prioritization of targets based on both genomic and functional evidence, inform the most appropriate genetic context for downstream mechanistic and validation studies, and enable the translation of this information into therapeutics and diagnostics.

  3. The Giardia genome project database.

    PubMed

    McArthur, A G; Morrison, H G; Nixon, J E; Passamaneck, N Q; Kim, U; Hinkle, G; Crocker, M K; Holder, M E; Farr, R; Reich, C I; Olsen, G E; Aley, S B; Adam, R D; Gillin, F D; Sogin, M L

    2000-08-15

    The Giardia genome project database provides an online resource for Giardia lamblia (WB strain, clone C6) genome sequence information. The database includes edited single-pass reads, the results of BLASTX searches, and details of progress towards sequencing the entire 12 million-bp Giardia genome. Pre-sorted BLASTX results can be retrieved based on keyword searches and BLAST searches of the high throughput Giardia data can be initiated from the web site or through NCBI. Descriptions of the genomic DNA libraries, project protocols and summary statistics are also available. Although the Giardia genome project is ongoing, new sequences are made available on a bi-monthly basis to ensure that researchers have access to information that may assist them in the search for genes and their biological function. The current URL of the Giardia genome project database is www.mbl.edu/Giardia.

  4. Harnessing CRISPR-Cas systems for bacterial genome editing.

    PubMed

    Selle, Kurt; Barrangou, Rodolphe

    2015-04-01

    Manipulation of genomic sequences facilitates the identification and characterization of key genetic determinants in the investigation of biological processes. Genome editing via clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) constitutes a next-generation method for programmable and high-throughput functional genomics. CRISPR-Cas systems are readily reprogrammed to induce sequence-specific DNA breaks at target loci, resulting in fixed mutations via host-dependent DNA repair mechanisms. Although bacterial genome editing is a relatively unexplored and underrepresented application of CRISPR-Cas systems, recent studies provide valuable insights for the widespread future implementation of this technology. This review summarizes recent progress in bacterial genome editing and identifies fundamental genetic and phenotypic outcomes of CRISPR targeting in bacteria, in the context of tool development, genome homeostasis, and DNA repair. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Editor's Highlight: High-Throughput Functional Genomics Identifies Modulators of TCE Metabolite Genotoxicity and Candidate Susceptibility Genes.

    PubMed

    De La Rosa, Vanessa Y; Asfaha, Jonathan; Fasullo, Michael; Loguinov, Alex; Li, Peng; Moore, Lee E; Rothman, Nathaniel; Nakamura, Jun; Swenberg, James A; Scelo, Ghislaine; Zhang, Luoping; Smith, Martyn T; Vulpe, Chris D

    2017-11-01

    Trichloroethylene (TCE), an industrial chemical and environmental contaminant, is a human carcinogen. Reactive metabolites are implicated in renal carcinogenesis associated with TCE exposure, yet the toxicity mechanisms of these metabolites and their contribution to cancer and other adverse effects remain unclear. We employed an integrated functional genomics approach that combined functional profiling studies in yeast and avian DT40 cell models to provide new insights into the specific mechanisms contributing to toxicity associated with TCE metabolites. Genome-wide profiling studies in yeast identified the error-prone translesion synthesis (TLS) pathway as an important mechanism in response to TCE metabolites. The role of TLS DNA repair was further confirmed by functional profiling in DT40 avian cell lines, which also revealed that TLS and homologous recombination DNA repair likely play competing roles in cellular susceptibility to TCE metabolites in higher eukaryotes. These DNA repair pathways are highly conserved between yeast, DT40, and humans. We propose that in humans, mutagenic TLS is favored over homologous recombination repair in response to TCE metabolites. The results of these studies contribute to the body of evidence supporting a mutagenic mode of action for TCE-induced renal carcinogenesis mediated by reactive metabolites in humans. Our approach illustrates the potential for high-throughput in vitro functional profiling in yeast to elucidate toxicity pathways (molecular initiating events, key events) and candidate susceptibility genes for focused study. © The Author 2017. Published by Oxford University Press on behalf of the Society of Toxicology. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  6. Molecular characterization of a novel Nucleorhabdovirus from black currant identified by high-throughput sequencing

    USDA-ARS's Scientific Manuscript database

    Contigs with sequence similarities to several nucleorhabdoviruses were identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genomic sequence of this new nucleorhabdovirus is 14,432 nucleotides. Its genomic organization is typical of nucleorh...

  7. iScreen: Image-Based High-Content RNAi Screening Analysis Tools.

    PubMed

    Zhong, Rui; Dong, Xiaonan; Levine, Beth; Xie, Yang; Xiao, Guanghua

    2015-09-01

    High-throughput RNA interference (RNAi) screening has opened up a path to investigating functional genomics in a genome-wide pattern. However, such studies are often restricted to assays that have a single readout format. Recently, advanced image technologies have been coupled with high-throughput RNAi screening to develop high-content screening, in which one or more cell image(s), instead of a single readout, were generated from each well. This image-based high-content screening technology has led to genome-wide functional annotation in a wider spectrum of biological research studies, as well as in drug and target discovery, so that complex cellular phenotypes can be measured in a multiparametric format. Despite these advances, data analysis and visualization tools are still largely lacking for these types of experiments. Therefore, we developed iScreen (image-Based High-content RNAi Screening Analysis Tool), an R package for the statistical modeling and visualization of image-based high-content RNAi screening. Two case studies were used to demonstrate the capability and efficiency of the iScreen package. iScreen is available for download on CRAN (http://cran.cnr.berkeley.edu/web/packages/iScreen/index.html). The user manual is also available as a supplementary document. © 2014 Society for Laboratory Automation and Screening.

  8. Large-Scale Comparative Phenotypic and Genomic Analyses Reveal Ecological Preferences of Shewanella Species and Identify Metabolic Pathways Conserved at the Genus Level

    PubMed Central

    Rodrigues, Jorge L. M.; Serres, Margrethe H.; Tiedje, James M.

    2011-01-01

    The use of comparative genomics for the study of different microbiological species has increased substantially as sequence technologies become more affordable. However, efforts to fully link a genotype to its phenotype remain limited to the development of one mutant at a time. In this study, we provided a high-throughput alternative to this limiting step by coupling comparative genomics to the use of phenotype arrays for five sequenced Shewanella strains. Positive phenotypes were obtained for 441 nutrients (C, N, P, and S sources), with N-based compounds being the most utilized for all strains. Many genes and pathways predicted by genome analyses were confirmed with the comparative phenotype assay, and three degradation pathways believed to be missing in Shewanella were confirmed as missing. A number of previously unknown gene products were predicted to be parts of pathways or to have a function, expanding the number of gene targets for future genetic analyses. Ecologically, the comparative high-throughput phenotype analysis provided insights into niche specialization among the five different strains. For example, Shewanella amazonensis strain SB2B, isolated from the Amazon River delta, was capable of utilizing 60 C compounds, whereas Shewanella sp. strain W3-18-1, isolated from deep marine sediment, utilized only 25 of them. In spite of the large number of nutrient sources yielding positive results, our study indicated that except for the N sources, they were not sufficiently informative to predict growth phenotypes from increasing evolutionary distances. Our results indicate the importance of phenotypic evaluation for confirming genome predictions. This strategy will accelerate the functional discovery of genes and provide an ecological framework for microbial genome sequencing projects. PMID:21642407

  9. Combining high-throughput phenotyping and genome-wide association studies to reveal natural genetic variation in rice

    PubMed Central

    Yang, Wanneng; Guo, Zilong; Huang, Chenglong; Duan, Lingfeng; Chen, Guoxing; Jiang, Ni; Fang, Wei; Feng, Hui; Xie, Weibo; Lian, Xingming; Wang, Gongwei; Luo, Qingming; Zhang, Qifa; Liu, Qian; Xiong, Lizhong

    2014-01-01

    Even as the study of plant genomics rapidly develops through the use of high-throughput sequencing techniques, traditional plant phenotyping lags far behind. Here we develop a high-throughput rice phenotyping facility (HRPF) to monitor 13 traditional agronomic traits and 2 newly defined traits during the rice growth period. Using genome-wide association studies (GWAS) of the 15 traits, we identify 141 associated loci, 25 of which contain known genes such as the Green Revolution semi-dwarf gene, SD1. Based on a performance evaluation of the HRPF and GWAS results, we demonstrate that high-throughput phenotyping has the potential to replace traditional phenotyping techniques and can provide valuable gene identification information. The combination of the multifunctional phenotyping tools HRPF and GWAS provides deep insights into the genetic architecture of important traits. PMID:25295980

  10. Plant functional genomics

    NASA Astrophysics Data System (ADS)

    Holtorf, Hauke; Guitton, Marie-Christine; Reski, Ralf

    2002-04-01

    Functional genome analysis of plants has entered the high-throughput stage. The complete genome information from key species such as Arabidopsis thaliana and rice is now available and will further boost the application of a range of new technologies to functional plant gene analysis. To broadly assign functions to unknown genes, different fast and multiparallel approaches are currently used and developed. These new technologies are based on known methods but are adapted and improved to accommodate for comprehensive, large-scale gene analysis, i.e. such techniques are novel in the sense that their design allows researchers to analyse many genes at the same time and at an unprecedented pace. Such methods allow analysis of the different constituents of the cell that help to deduce gene function, namely the transcripts, proteins and metabolites. Similarly the phenotypic variations of entire mutant collections can now be analysed in a much faster and more efficient way than before. The different methodologies have developed to form their own fields within the functional genomics technological platform and are termed transcriptomics, proteomics, metabolomics and phenomics. Gene function, however, cannot solely be inferred by using only one such approach. Rather, it is only by bringing together all the information collected by different functional genomic tools that one will be able to unequivocally assign functions to unknown plant genes. This review focuses on current technical developments and their impact on the field of plant functional genomics. The lower plant Physcomitrella is introduced as a new model system for gene function analysis, owing to its high rate of homologous recombination.

  11. Pediatric Glioblastoma Therapies Based on Patient-Derived Stem Cell Resources

    DTIC Science & Technology

    2014-11-01

    genomic DNA and then subjected to Illumina high-throughput sequencing. In this analysis, shRNAs lost in the GSC population represent candidate gene...PRISM 7900 Sequence Detection System (Genomics Resource, FHCRC). Relative transcript abundance was analyzed using the 2^(−ΔΔCt) method. TRIzol (Invitrogen
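
    The ΔΔCt relative quantification named in the snippet above can be illustrated with a short worked example. This is a minimal sketch, not the report's own analysis code: the function name and the Ct values are hypothetical, and it assumes roughly 100% amplification efficiency (template doubling per PCR cycle) and a single reference gene.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target transcript by the delta-delta-Ct method.

    Assumes ~100% PCR efficiency, i.e. one cycle corresponds to a 2-fold
    difference in template. All Ct inputs are illustrative values.
    """
    # Normalize the target to the reference gene within each condition.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated/sample condition against the control condition.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    # A lower Ct means more template, hence the negative exponent.
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: target 22 vs reference 18 in the sample,
# target 24 vs reference 18 in the control.
fold_change = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold_change)
```

    With these made-up inputs the sample ΔCt is 4 and the control ΔCt is 6, so ΔΔCt is −2 and the target is estimated to be four-fold more abundant in the sample.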

  12. High-throughput Methods Redefine the Rumen Microbiome and Its Relationship with Nutrition and Metabolism

    PubMed Central

    McCann, Joshua C.; Wickersham, Tryon A.; Loor, Juan J.

    2014-01-01

    Diversity in the forestomach microbiome is one of the key features of ruminant animals. The diverse microbial community adapts to a wide array of dietary feedstuffs and management strategies. Understanding rumen microbiome composition, adaptation, and function has global implications ranging from climatology to applied animal production. Classical knowledge of rumen microbiology was based on anaerobic, culture-dependent methods. Next-generation sequencing and other molecular techniques have uncovered novel features of the rumen microbiome. For instance, pyrosequencing of the 16S ribosomal RNA gene has revealed the taxonomic identity of bacteria and archaea to the genus level, and when complemented with barcoding adds multiple samples to a single run. Whole genome shotgun sequencing generates true metagenomic sequences to predict the functional capability of a microbiome, and can also be used to construct genomes of isolated organisms. Integration of high-throughput data describing the rumen microbiome with classic fermentation and animal performance parameters has produced meaningful advances and opened additional areas for study. In this review, we highlight recent studies of the rumen microbiome in the context of cattle production focusing on nutrition, rumen development, animal efficiency, and microbial function. PMID:24940050

  13. Rice-Map: a new-generation rice genome browser.

    PubMed

    Wang, Jun; Kong, Lei; Zhao, Shuqi; Zhang, He; Tang, Liang; Li, Zhe; Gu, Xiaocheng; Luo, Jingchu; Gao, Ge

    2011-03-30

    The concurrent release of rice genome sequences for two subspecies (Oryza sativa L. ssp. japonica and Oryza sativa L. ssp. indica) facilitates rice studies at the whole genome level. Since the advent of high-throughput analysis, huge amounts of functional genomics data have been delivered rapidly, making an integrated online genome browser indispensable for scientists to visualize and analyze these data. Based on next-generation web technologies and high-throughput experimental data, we have developed Rice-Map, a novel genome browser for researchers to navigate, analyze and annotate rice genome interactively. More than one hundred annotation tracks (81 for japonica and 82 for indica) have been compiled and loaded into Rice-Map. These pre-computed annotations cover gene models, transcript evidences, expression profiling, epigenetic modifications, inter-species and intra-species homologies, genetic markers and other genomic features. In addition to these pre-computed tracks, registered users can interactively add comments and research notes to Rice-Map as User-Defined Annotation entries. By smoothly scrolling, dragging and zooming, users can browse various genomic features simultaneously at multiple scales. On-the-fly analysis for selected entries could be performed through dedicated bioinformatic analysis platforms such as WebLab and Galaxy. Furthermore, a BioMart-powered data warehouse "Rice Mart" is offered for advanced users to fetch bulk datasets based on complex criteria. Rice-Map delivers abundant up-to-date japonica and indica annotations, providing a valuable resource for both computational and bench biologists. Rice-Map is publicly accessible at http://www.ricemap.org/, with all data available for free downloading.

  14. A computational genomics pipeline for prokaryotic sequencing projects

    PubMed Central

    Kislyuk, Andrey O.; Katz, Lee S.; Agrawal, Sonia; Hagen, Matthew S.; Conley, Andrew B.; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C.; Sammons, Scott A.; Govil, Dhwani; Mair, Raydel D.; Tatti, Kathleen M.; Tondella, Maria L.; Harcourt, Brian H.; Mayer, Leonard W.; Jordan, I. King

    2010-01-01

    Motivation: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. Results: We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. Availability and implementation: The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems. Contact: king.jordan@biology.gatech.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20519285

  15. NCBI GEO: archive for functional genomics data sets--update.

    PubMed

    Barrett, Tanya; Wilhite, Stephen E; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Holko, Michelle; Yefanov, Andrey; Lee, Hyeseung; Zhang, Naigong; Robertson, Cynthia L; Serova, Nadezhda; Davis, Sean; Soboleva, Alexandra

    2013-01-01

    The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data.

  16. Identifying Bacterial Immune Evasion Proteins Using Phage Display.

    PubMed

    Fevre, Cindy; Scheepmaker, Lisette; Haas, Pieter-Jan

    2017-01-01

    Methods aimed at identification of immune evasion proteins mainly rely on in silico prediction of sequence or structural homology to known evasion proteins, or use a proteomics-driven approach. Although proven successful, these methods are limited by low efficiency and/or a lack of functional identification. Here we describe a high-throughput genomic strategy to functionally identify bacterial immune evasion proteins using phage display technology. Genomic bacterial DNA is randomly fragmented and ligated into a phage display vector that is used to create a phage display library expressing bacterial secreted and membrane-bound proteins. This library is used to select displayed bacterial secretome proteins that interact with host immune components.

  17. A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.

    PubMed

    Swain, Martin T; Tsai, Isheng J; Assefa, Samual A; Newbold, Chris; Berriman, Matthew; Otto, Thomas D

    2012-06-07

    Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) in order to improve scaffolding and generate annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes ∼24 h: it doubles the average contig size and annotates over 4,300 gene models.
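
    A claim such as "doubles the average contig size" can be checked with simple assembly statistics. The sketch below is not part of PAGIT; the function names and contig lengths are hypothetical, and it shows only how mean contig length and N50 might be computed from a list of contig lengths before and after an improvement step.

```python
from typing import Sequence


def mean_contig_length(lengths: Sequence[int]) -> float:
    """Arithmetic mean of contig lengths (in bases)."""
    return sum(lengths) / len(lengths)


def n50(lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical draft assembly, and the same bases after gaps are closed
# so that neighbouring contigs are joined.
before = [100, 200, 300, 400]
after = [300, 700]

print(mean_contig_length(before), n50(before))
print(mean_contig_length(after), n50(after))
```

    In this toy case the total assembly size is unchanged (1,000 bases) while the mean contig length rises from 250 to 500, which is the kind of comparison the abstract's statement implies.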

  18. Molecular characterization of a novel Luteovirus from peach identified by high-throughput sequencing

    USDA-ARS's Scientific Manuscript database

    Contigs with sequence homologies to Cherry-associated luteovirus were identified by high-throughput sequencing analysis of two peach accessions undergoing quarantine testing. The complete genomic sequences of the two isolates of this virus are 5,819 and 5,814 nucleotides. Their genome organization i...

  19. Genomics tools available for unravelling mechanisms underlying agronomical traits in strawberry with more to come

    USDA-ARS's Scientific Manuscript database

    In the last few years, high-throughput genomics promised to bridge the gap between plant physiology and plant sciences. In addition, high-throughput genotyping technologies facilitate marker-based selection for better performing genotypes. In strawberry, Fragaria vesca was the first reference sequen...

  20. Perspectives on the mechanism of transcriptional regulation by long non-coding RNAs.

    PubMed

    Roberts, Thomas C; Morris, Kevin V; Weinberg, Marc S

    2014-01-01

    Long non-coding RNAs (lncRNAs) are increasingly being recognized as epigenetic regulators of gene transcription. The diversity and complexity of lncRNA genes means that they exert their regulatory effects by a variety of mechanisms. Although there is still much to be learned about the mechanism of lncRNA function, general principles are starting to emerge. In particular, the application of high throughput (deep) sequencing methodologies has greatly advanced our understanding of lncRNA gene function. lncRNAs function as adaptors that link specific chromatin loci with chromatin-remodeling complexes and transcription factors. lncRNAs can act in cis or trans to guide epigenetic-modifier complexes to distinct genomic sites, or act as scaffolds which recruit multiple proteins simultaneously, thereby coordinating their activities. In this review we discuss the genomic organization of lncRNAs, the importance of RNA secondary structure to lncRNA functionality, the multitude of ways in which they interact with the genome, and what evolutionary conservation tells us about their function.

  1. The high throughput biomedicine unit at the institute for molecular medicine Finland: high throughput screening meets precision medicine.

    PubMed

    Pietiainen, Vilja; Saarela, Jani; von Schantz, Carina; Turunen, Laura; Ostling, Paivi; Wennerberg, Krister

    2014-05-01

    The High Throughput Biomedicine (HTB) unit at the Institute for Molecular Medicine Finland FIMM was established in 2010 to serve as a national and international academic screening unit providing access to state of the art instrumentation for chemical and RNAi-based high throughput screening. The initial focus of the unit was multiwell plate based chemical screening and high content microarray-based siRNA screening. However, over the first four years of operation, the unit has moved to a more flexible service platform where both chemical and siRNA screening is performed at different scales primarily in multiwell plate-based assays with a wide range of readout possibilities with a focus on ultraminiaturization to allow for affordable screening for the academic users. In addition to high throughput screening, the equipment of the unit is also used to support miniaturized, multiplexed and high throughput applications for other types of research such as genomics, sequencing and biobanking operations. Importantly, with the translational research goals at FIMM, an increasing part of the operations at the HTB unit is being focused on high throughput systems biological platforms for functional profiling of patient cells in personalized and precision medicine projects.

  2. cncRNAs: Bi-functional RNAs with protein coding and non-coding functions

    PubMed Central

    Kumari, Pooja; Sampath, Karuna

    2015-01-01

    For many decades, the major function of mRNA was thought to be to provide protein-coding information embedded in the genome. The advent of high-throughput sequencing has led to the discovery of pervasive transcription of eukaryotic genomes and opened the world of RNA-mediated gene regulation. Many regulatory RNAs have been found to be incapable of protein coding and are hence termed non-coding RNAs (ncRNAs). However, studies in recent years have shown that several previously annotated non-coding RNAs have the potential to encode proteins, and conversely, some coding RNAs have regulatory functions independent of the protein they encode. Such bi-functional RNAs, with both protein coding and non-coding functions, which we term ‘cncRNAs’, have emerged as new players in cellular systems. Here, we describe the functions of some cncRNAs identified from bacteria to humans. Because the functions of many RNAs across genomes remain unclear, we propose that RNAs be classified as coding, non-coding or both only after careful analysis of their functions. PMID:26498036

  3. Mapping DNA Methylation with High Throughput Nanopore Sequencing

    PubMed Central

    Rand, Arthur C.; Jain, Miten; Eizenga, Jordan M.; Musselman-Brown, Audrey; Olsen, Hugh E.; Akeson, Mark

    2017-01-01

    Chemical modifications to DNA regulate its biological function. We present a framework for mapping methylation to cytosine and adenosine with the Oxford Nanopore Technologies MinION using its ionic current signal. We map three cytosine variants and two adenine variants. The results show that our model is sensitive enough to detect changes in genomic DNA methylation levels as a function of growth phase in E. coli. PMID:28218897

  4. Decoding transcriptional enhancers: Evolving from annotation to functional interpretation

    PubMed Central

    Engel, Krysta L.; Mackiewicz, Mark; Hardigan, Andrew A.; Myers, Richard M.; Savic, Daniel

    2016-01-01

    Deciphering the intricate molecular processes that orchestrate the spatial and temporal regulation of genes has become an increasingly major focus of biological research. The differential expression of genes by diverse cell types with a common genome is a hallmark of complex cellular functions, as well as the basis for multicellular life. Importantly, a more coherent understanding of gene regulation is critical for defining developmental processes, evolutionary principles and disease etiologies. Here we present our current understanding of gene regulation by focusing on the role of enhancer elements in these complex processes. Although functional genomic methods have provided considerable advances to our understanding of gene regulation, these assays, which are usually performed on a genome-wide scale, typically provide correlative observations that lack functional interpretation. Recent innovations in genome editing technologies have placed gene regulatory studies at an exciting crossroads, as systematic, functional evaluation of enhancers and other transcriptional regulatory elements can now be performed in a coordinated, high-throughput manner across the entire genome. This review provides insights on transcriptional enhancer function, their role in development and disease, and catalogues experimental tools commonly used to study these elements. Additionally, we discuss the crucial role of novel techniques in deciphering the complex gene regulatory landscape and how these studies will shape future research. PMID:27224938

  5. Decoding transcriptional enhancers: Evolving from annotation to functional interpretation.

    PubMed

    Engel, Krysta L; Mackiewicz, Mark; Hardigan, Andrew A; Myers, Richard M; Savic, Daniel

    2016-09-01

    Deciphering the intricate molecular processes that orchestrate the spatial and temporal regulation of genes has become an increasingly major focus of biological research. The differential expression of genes by diverse cell types with a common genome is a hallmark of complex cellular functions, as well as the basis for multicellular life. Importantly, a more coherent understanding of gene regulation is critical for defining developmental processes, evolutionary principles and disease etiologies. Here we present our current understanding of gene regulation by focusing on the role of enhancer elements in these complex processes. Although functional genomic methods have provided considerable advances to our understanding of gene regulation, these assays, which are usually performed on a genome-wide scale, typically provide correlative observations that lack functional interpretation. Recent innovations in genome editing technologies have placed gene regulatory studies at an exciting crossroads, as systematic, functional evaluation of enhancers and other transcriptional regulatory elements can now be performed in a coordinated, high-throughput manner across the entire genome. This review provides insights on transcriptional enhancer function, their role in development and disease, and catalogues experimental tools commonly used to study these elements. Additionally, we discuss the crucial role of novel techniques in deciphering the complex gene regulatory landscape and how these studies will shape future research. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering.

    PubMed

    Garst, Andrew D; Bassalo, Marcelo C; Pines, Gur; Lynch, Sean A; Halweg-Edwards, Andrea L; Liu, Rongming; Liang, Liya; Wang, Zhiwen; Zeitoun, Ramsey; Alexander, William G; Gill, Ryan T

    2017-01-01

    Improvements in DNA synthesis and sequencing have underpinned comprehensive assessment of gene function in bacteria and eukaryotes. Genome-wide analyses require high-throughput methods to generate mutations and analyze their phenotypes, but approaches to date have been unable to efficiently link the effects of mutations in coding regions or promoter elements in a highly parallel fashion. We report that CRISPR-Cas9 gene editing in combination with massively parallel oligomer synthesis can enable trackable editing on a genome-wide scale. Our method, CRISPR-enabled trackable genome engineering (CREATE), links each guide RNA to homologous repair cassettes that both edit loci and function as barcodes to track genotype-phenotype relationships. We apply CREATE to site saturation mutagenesis for protein engineering, reconstruction of adaptive laboratory evolution experiments, and identification of stress tolerance and antibiotic resistance genes in bacteria. We provide preliminary evidence that CREATE will work in yeast. We also provide a webtool to design multiplex CREATE libraries.

  7. Genome sequence analysis of a flocculant-producing bacterium, Paenibacillus shenyangensis.

    PubMed

    Fu, Lili; Jiang, Binhui; Liu, Jinliang; Zhao, Xin; Liu, Qian; Hu, Xiaomin

    2016-03-01

    This study explores the metabolic processes of Paenibacillus shenyangensis, an efficient bioflocculant-producing bacterium. The biosynthesis mechanism of bioflocculation was used to enrich the genome of Paenibacillus shenyangensis and provide a basis for molecular genetics and functional genomics analyses. According to the de novo assembly analysis, a total of 5,501,467 bp of clean reads were generated and assembled into 92 contigs. A total of 4800 unigenes were predicted, of which 4393 were annotated with a specific gene function in the NCBI-Nr database, and 3423 genes were found in the Clusters of Orthologous Groups database. Among the 168 Kyoto Encyclopedia of Genes and Genomes database, cell growth and metabolism were the main biological processes, and a potential metabolic pathway was predicted from glucose to exopolysaccharide within the starch and sucrose metabolism pathway. Using high-throughput sequencing technology, we provide a genome analysis of Paenibacillus shenyangensis that predicts the main metabolic processes and a potential pathway of exopolysaccharide biosynthesis.

  8. The most common technologies and tools for functional genome analysis.

    PubMed

    Gasperskaja, Evelina; Kučinskas, Vaidutis

    2017-01-01

    Since the sequence of the human genome is complete, the main issue is how to understand the information written in the DNA sequence. Despite numerous genome-wide studies that have already been performed, the challenge to determine the function of genes, gene products, and also their interaction is still open. As changes in the human genome are highly likely to cause pathological conditions, functional analysis is vitally important for human health. For many years there have been a variety of technologies and tools used in functional genome analysis. However, only in the past decade there has been rapid revolutionizing progress and improvement in high-throughput methods, which are ranging from traditional real-time polymerase chain reaction to more complex systems, such as next-generation sequencing or mass spectrometry. Furthermore, not only laboratory investigation, but also accurate bioinformatic analysis is required for reliable scientific results. These methods give an opportunity for accurate and comprehensive functional analysis that involves various fields of studies: genomics, epigenomics, proteomics, and interactomics. This is essential for filling the gaps in the knowledge about dynamic biological processes at both cellular and organismal level. However, each method has both advantages and limitations that should be taken into account before choosing the right method for particular research in order to ensure successful study. For this reason, the present review paper aims to describe the most frequent and widely-used methods for the comprehensive functional analysis.

  9. Ion channel drug discovery and research: the automated Nano-Patch-Clamp technology.

    PubMed

    Brueggemann, A; George, M; Klau, M; Beckler, M; Steindl, J; Behrends, J C; Fertig, N

    2004-01-01

    Unlike the genomics revolution, which was largely enabled by a single technological advance (high throughput sequencing), rapid advancement in proteomics will require a broader effort to increase the throughput of a number of key tools for functional analysis of different types of proteins. In the case of ion channels, a class of (membrane) proteins of great physiological importance and potential as drug targets, the lack of adequate assay technologies is felt particularly strongly. The available, indirect, high throughput screening methods for ion channels clearly generate insufficient information. The best technology to study ion channel function and screen for compound interaction is the patch clamp technique, but patch clamping suffers from low throughput, which is not acceptable for drug screening. A first step towards a solution is presented here. The nano patch clamp technology, which is based on a planar, microstructured glass chip, enables automatic whole cell patch clamp measurements. The Port-a-Patch is an automated electrophysiology workstation, which uses planar patch clamp chips. This approach enables high quality and high content ion channel and compound evaluation on a one-cell-at-a-time basis. The presented automation of the patch process and its scalability to an array format are the prerequisites for any higher throughput electrophysiology instruments.

  10. Fulfilling the promise of the materials genome initiative with high-throughput experimental methodologies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Green, Martin L.; Choi, C. L.; Hattrick-Simpers, J. R.

    The Materials Genome Initiative, a national effort to introduce new materials into the market faster and at lower cost, has made significant progress in computational simulation and modeling of materials. To build on this progress, a large amount of experimental data for validating these models, and informing more sophisticated ones, will be required. High-throughput experimentation generates large volumes of experimental data using combinatorial materials synthesis and rapid measurement techniques, making it an ideal experimental complement to bring the Materials Genome Initiative vision to fruition. This paper reviews the state-of-the-art results, opportunities, and challenges in high-throughput experimentation for materials design. As a result, a major conclusion is that an effort to deploy a federated network of high-throughput experimental (synthesis and characterization) tools, which are integrated with a modern materials data infrastructure, is needed.

  11. Fulfilling the promise of the materials genome initiative with high-throughput experimental methodologies

    DOE PAGES

    Green, Martin L.; Choi, C. L.; Hattrick-Simpers, J. R.; ...

    2017-03-28

    The Materials Genome Initiative, a national effort to introduce new materials into the market faster and at lower cost, has made significant progress in computational simulation and modeling of materials. To build on this progress, a large amount of experimental data for validating these models, and informing more sophisticated ones, will be required. High-throughput experimentation generates large volumes of experimental data using combinatorial materials synthesis and rapid measurement techniques, making it an ideal experimental complement to bring the Materials Genome Initiative vision to fruition. This paper reviews the state-of-the-art results, opportunities, and challenges in high-throughput experimentation for materials design. As a result, a major conclusion is that an effort to deploy a federated network of high-throughput experimental (synthesis and characterization) tools, which are integrated with a modern materials data infrastructure, is needed.

  12. Mutant power: using mutant allele collections for yeast functional genomics.

    PubMed

    Norman, Kaitlyn L; Kumar, Anuj

    2016-03-01

    The budding yeast has long served as a model eukaryote for the functional genomic analysis of highly conserved signaling pathways, cellular processes and mechanisms underlying human disease. The collection of reagents available for genomics in yeast is extensive, encompassing a growing diversity of mutant collections beyond gene deletion sets in the standard wild-type S288C genetic background. We review here three main types of mutant allele collections: transposon mutagen collections, essential gene collections and overexpression libraries. Each collection provides unique and identifiable alleles that can be utilized in genome-wide, high-throughput studies. These genomic reagents are particularly informative in identifying synthetic phenotypes and functions associated with essential genes, including those modeled most effectively in complex genetic backgrounds. Several examples of genomic studies in filamentous/pseudohyphal backgrounds are provided here to illustrate this point. Additionally, the limitations of each approach are examined. Collectively, these mutant allele collections in Saccharomyces cerevisiae and the related pathogenic yeast Candida albicans promise insights toward an advanced understanding of eukaryotic molecular and cellular biology. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  13. Characterization and complete genome sequence of a previously uncharacterized panicovirus from Bermuda grass detected by high throughput sequencing

    USDA-ARS's Scientific Manuscript database

    Bermuda grass samples were examined by transmission electron microscopy and 28-30 nm spherical virus particles were observed. Total RNA from these plants was subjected to high throughput sequencing (HTS). The nearly full genome sequence of a previously uncharacterized Panicovirus was identified from...

  14. Multitrait, random regression, or simple repeatability model in high-throughput phenotyping data improve genomic prediction for wheat grain yield

    USDA-ARS's Scientific Manuscript database

    High-throughput phenotyping (HTP) platforms can be used to measure traits that are genetically correlated with wheat (Triticum aestivum L.) grain yield across time. Incorporating such secondary traits in the multivariate pedigree and genomic prediction models would be desirable to improve indirect s...

  15. A high-throughput next-generation sequencing-based method for detecting the mutational fingerprint of carcinogens

    PubMed Central

    Besaratinia, Ahmad; Li, Haiqing; Yoon, Jae-In; Zheng, Albert; Gao, Hanlin; Tommasi, Stella

    2012-01-01

    Many carcinogens leave a unique mutational fingerprint in the human genome. These mutational fingerprints manifest as specific types of mutations often clustering at certain genomic loci in tumor genomes from carcinogen-exposed individuals. To develop a high-throughput method for detecting the mutational fingerprint of carcinogens, we have devised a cost-, time- and labor-effective strategy, in which the widely used transgenic Big Blue® mouse mutation detection assay is made compatible with the Roche/454 Genome Sequencer FLX Titanium next-generation sequencing technology. As proof of principle, we have used this novel method to establish the mutational fingerprints of three prominent carcinogens with varying mutagenic potencies, including sunlight ultraviolet radiation, 4-aminobiphenyl and secondhand smoke that are known to be strong, moderate and weak mutagens, respectively. For verification purposes, we have compared the mutational fingerprints of these carcinogens obtained by our newly developed method with those obtained by parallel analyses using the conventional low-throughput approach, that is, standard mutation detection assay followed by direct DNA sequencing using a capillary DNA sequencer. We demonstrate that this high-throughput next-generation sequencing-based method is highly specific and sensitive to detect the mutational fingerprints of the tested carcinogens. The method is reproducible, and its accuracy is comparable with that of the currently available low-throughput method. In conclusion, this novel method has the potential to move the field of carcinogenesis forward by allowing high-throughput analysis of mutations induced by endogenous and/or exogenous genotoxic agents. PMID:22735701

  16. A high-throughput next-generation sequencing-based method for detecting the mutational fingerprint of carcinogens.

    PubMed

    Besaratinia, Ahmad; Li, Haiqing; Yoon, Jae-In; Zheng, Albert; Gao, Hanlin; Tommasi, Stella

    2012-08-01

    Many carcinogens leave a unique mutational fingerprint in the human genome. These mutational fingerprints manifest as specific types of mutations often clustering at certain genomic loci in tumor genomes from carcinogen-exposed individuals. To develop a high-throughput method for detecting the mutational fingerprint of carcinogens, we have devised a cost-, time- and labor-effective strategy, in which the widely used transgenic Big Blue mouse mutation detection assay is made compatible with the Roche/454 Genome Sequencer FLX Titanium next-generation sequencing technology. As proof of principle, we have used this novel method to establish the mutational fingerprints of three prominent carcinogens with varying mutagenic potencies, including sunlight ultraviolet radiation, 4-aminobiphenyl and secondhand smoke that are known to be strong, moderate and weak mutagens, respectively. For verification purposes, we have compared the mutational fingerprints of these carcinogens obtained by our newly developed method with those obtained by parallel analyses using the conventional low-throughput approach, that is, standard mutation detection assay followed by direct DNA sequencing using a capillary DNA sequencer. We demonstrate that this high-throughput next-generation sequencing-based method is highly specific and sensitive to detect the mutational fingerprints of the tested carcinogens. The method is reproducible, and its accuracy is comparable with that of the currently available low-throughput method. In conclusion, this novel method has the potential to move the field of carcinogenesis forward by allowing high-throughput analysis of mutations induced by endogenous and/or exogenous genotoxic agents.

  17. Construction of a minimal genome as a chassis for synthetic biology.

    PubMed

    Sung, Bong Hyun; Choe, Donghui; Kim, Sun Chang; Cho, Byung-Kwan

    2016-11-30

    Microbial diversity and complexity pose challenges in understanding the voluminous genetic information produced from whole-genome sequences, bioinformatics and high-throughput '-omics' research. These challenges can be overcome by a core blueprint of a genome drawn with a minimal gene set, which is essential for life. Systems biology and large-scale gene inactivation studies have estimated the number of essential genes to be ∼300-500 in many microbial genomes. On the basis of the essential gene set information, minimal-genome strains have been generated using sophisticated genome engineering techniques, such as genome reduction and chemical genome synthesis. Current size-reduced genomes are not perfect minimal genomes, but chemically synthesized genomes have just been constructed. Some minimal genomes provide various desirable functions for bioindustry, such as improved genome stability, increased transformation efficacy and improved production of biomaterials. The minimal genome as a chassis genome for synthetic biology can be used to construct custom-designed genomes for various practical and industrial applications. © 2016 The Author(s). published by Portland Press Limited on behalf of the Biochemical Society.

  18. Computational Approaches to Phenotyping

    PubMed Central

    Lussier, Yves A.; Liu, Yang

    2007-01-01

    The recent completion of the Human Genome Project has made possible a high-throughput “systems approach” for accelerating the elucidation of molecular underpinnings of human diseases, and subsequent derivation of molecular-based strategies to more effectively prevent, diagnose, and treat these diseases. Although altered phenotypes are among the most reliable manifestations of altered gene functions, research using systematic analysis of phenotype relationships to study human biology is still in its infancy. This article focuses on the emerging field of high-throughput phenotyping (HTP) phenomics research, which aims to capitalize on novel high-throughput computation and informatics technology developments to derive genomewide molecular networks of genotype–phenotype associations, or “phenomic associations.” The HTP phenomics research field faces the challenge of technological research and development to generate novel tools in computation and informatics that will allow researchers to amass, access, integrate, organize, and manage phenotypic databases across species and enable genomewide analysis to associate phenotypic information with genomic data at different scales of biology. Key state-of-the-art technological advancements critical for HTP phenomics research are covered in this review. In particular, we highlight the power of computational approaches to conduct large-scale phenomics studies. PMID:17202287

  19. Generalized Schemes for High Throughput Manipulation of the Desulfovibrio vulgaris Hildenborough Genome.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chhabra, Swapnil; Butland, Gareth; Elias, Dwayne A

    The ability to conduct advanced functional genomic studies of the thousands of sequenced bacteria has been hampered by the lack of available tools for making high-throughput chromosomal manipulations in a systematic manner that can be applied across diverse species. In this work, we highlight the use of synthetic biological tools to assemble custom suicide vectors with reusable and interchangeable DNA parts to facilitate chromosomal modification at designated loci. These constructs enable an array of downstream applications including gene replacement and creation of gene fusions with affinity purification or localization tags. We employed this approach to engineer chromosomal modifications in a bacterium that has previously proven difficult to manipulate genetically, Desulfovibrio vulgaris Hildenborough, to generate a library of 662 strains. Furthermore, we demonstrate how these modifications can be used for examining metabolic pathways, protein-protein interactions, and protein localization. The ubiquity of suicide constructs in gene replacement throughout biology suggests that this approach can be applied to engineer a broad range of species for a diverse array of systems biological applications and is amenable to high-throughput implementation.

  20. Functional Assays to Screen and Dissect Genomic Hits: Doubling Down on the National Investment in Genomic Research.

    PubMed

    Musunuru, Kiran; Bernstein, Daniel; Cole, F Sessions; Khokha, Mustafa K; Lee, Frank S; Lin, Shin; McDonald, Thomas V; Moskowitz, Ivan P; Quertermous, Thomas; Sankaran, Vijay G; Schwartz, David A; Silverman, Edwin K; Zhou, Xiaobo; Hasan, Ahmed A K; Luo, Xiao-Zhong James

    2018-04-01

    The National Institutes of Health have made substantial investments in genomic studies and technologies to identify DNA sequence variants associated with human disease phenotypes. The National Heart, Lung, and Blood Institute has been at the forefront of these commitments to ascertain genetic variation associated with heart, lung, blood, and sleep diseases and related clinical traits. Genome-wide association studies, exome- and genome-sequencing studies, and exome-genotyping studies of the National Heart, Lung, and Blood Institute-funded epidemiological and clinical case-control studies are identifying large numbers of genetic variants associated with heart, lung, blood, and sleep phenotypes. However, investigators face challenges in identification of genomic variants that are functionally disruptive among the myriad of computationally implicated variants. Studies to define mechanisms of genetic disruption encoded by computationally identified genomic variants require reproducible, adaptable, and inexpensive methods to screen candidate variant and gene function. High-throughput strategies will permit a tiered variant discovery and genetic mechanism approach that begins with rapid functional screening of a large number of computationally implicated variants and genes for discovery of those that merit mechanistic investigation. As such, improved variant-to-gene and gene-to-function screens, and adequate support for such studies, are critical to accelerating the translation of genomic findings. In this White Paper, we outline the variety of novel technologies, assays, and model systems that are making such screens faster, cheaper, and more accurate, referencing published work and ongoing work supported by the National Heart, Lung, and Blood Institute's R21/R33 Functional Assays to Screen Genomic Hits program. We discuss priorities that can accelerate the impressive but incomplete progress represented by big data genomic research. © 2018 American Heart Association, Inc.

  1. Extensive Local Gene Duplication and Functional Divergence among Paralogs in Atlantic Salmon

    PubMed Central

    Warren, Ian A.; Ciborowski, Kate L.; Casadei, Elisa; Hazlerigg, David G.; Martin, Sam; Jordan, William C.; Sumner, Seirian

    2014-01-01

    Many organisms can generate alternative phenotypes from the same genome, enabling individuals to exploit diverse and variable environments. A prevailing hypothesis is that such adaptation has been favored by gene duplication events, which generate redundant genomic material that may evolve divergent functions. Vertebrate examples of recent whole-genome duplications are sparse although one example is the salmonids, which have undergone a whole-genome duplication event within the last 100 Myr. The life-cycle of the Atlantic salmon, Salmo salar, depends on the ability to produce alternating phenotypes from the same genome, to facilitate migration and maintain its anadromous life history. Here, we investigate the hypothesis that genome-wide and local gene duplication events have contributed to the salmonid adaptation. We used high-throughput sequencing to characterize the transcriptomes of three key organs involved in regulating migration in S. salar: Brain, pituitary, and olfactory epithelium. We identified over 10,000 undescribed S. salar sequences and designed an analytic workflow to distinguish between paralogs originating from local gene duplication events or from whole-genome duplication events. These data reveal that substantial local gene duplications took place shortly after the whole-genome duplication event. Many of the identified paralog pairs have either diverged in function or become noncoding. Future functional genomics studies will reveal to what extent this rich source of divergence in genetic sequence is likely to have facilitated the evolution of extreme phenotypic plasticity required for an anadromous life-cycle. PMID:24951567

  2. Employing genome-wide SNP discovery and genotyping strategy to extrapolate the natural allelic diversity and domestication patterns in chickpea

    PubMed Central

    Kujur, Alice; Bajaj, Deepak; Upadhyaya, Hari D.; Das, Shouvik; Ranjan, Rajeev; Shree, Tanima; Saxena, Maneesha S.; Badoni, Saurabh; Kumar, Vinod; Tripathi, Shailesh; Gowda, C. L. L.; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K.; Parida, Swarup K.

    2015-01-01

    The genome-wide discovery and high-throughput genotyping of SNPs in chickpea natural germplasm lines is indispensable to extrapolate their natural allelic diversity, domestication, and linkage disequilibrium (LD) patterns leading to the genetic enhancement of this vital legume crop. We discovered 44,844 high-quality SNPs by sequencing of 93 diverse cultivated desi, kabuli, and wild chickpea accessions using reference genome- and de novo-based GBS (genotyping-by-sequencing) assays that were physically mapped across eight chromosomes of desi and kabuli. Of these, 22,542 SNPs were structurally annotated in different coding and non-coding sequence components of genes. Genes with 3296 non-synonymous and 269 regulatory SNPs could functionally differentiate accessions based on their contrasting agronomic traits. A high experimental validation success rate (92%) and reproducibility (100%) along with strong sensitivity (93–96%) and specificity (99%) of GBS-based SNPs was observed. This infers the robustness of GBS as a high-throughput assay for rapid large-scale mining and genotyping of genome-wide SNPs in chickpea with sub-optimal use of resources. With 23,798 genome-wide SNPs, a relatively high intra-specific polymorphic potential (49.5%) and broader molecular diversity (13–89%)/functional allelic diversity (18–77%) was apparent among 93 chickpea accessions, suggesting their tremendous applicability in rapid selection of desirable diverse accessions/inter-specific hybrids in chickpea crossbred varietal improvement program. The genome-wide SNPs revealed complex admixed domestication pattern, extensive LD estimates (0.54–0.68) and extended LD decay (400–500 kb) in a structured population inclusive of 93 accessions. These findings reflect the utility of our identified SNPs for subsequent genome-wide association study (GWAS) and selective sweep-based domestication trait dissection analysis to identify potential genomic loci (gene-associated targets) specifically regulating important complex quantitative agronomic traits in chickpea. The numerous informative genome-wide SNPs, natural allelic diversity-led domestication pattern, and LD-based information generated in our study have got multidimensional applicability with respect to chickpea genomics-assisted breeding. PMID:25873920

  3. Dana-Farber Cancer Institute: Identification of Therapeutic Targets in KRAS Driven Lung Cancer | Office of Cancer Genomics

    Cancer.gov

    The CTD2 Center at Dana Farber Cancer Institute focuses on the use of high-throughput genetic and bioinformatic approaches to identify and credential oncogenes and co-dependencies in cancers. This Center aims to provide the cancer research community with information that will facilitate the prioritization of targets based on both genomic and functional evidence, inform the most appropriate genetic context for downstream mechanistic and validation studies, and enable the translation of this information into therapeutics and diagnostics.

  4. Systems genetics: a paradigm to improve discovery of candidate genes and mechanisms underlying complex traits.

    PubMed

    Feltus, F Alex

    2014-06-01

    Understanding the control of any trait optimally requires the detection of causal genes, gene interaction, and mechanism of action to discover and model the biochemical pathways underlying the expressed phenotype. Functional genomics techniques, including RNA expression profiling via microarray and high-throughput DNA sequencing, allow for the precise genome localization of biological information. Powerful genetic approaches, including quantitative trait locus (QTL) and genome-wide association study mapping, link phenotype with genome positions, yet genetics is less precise in localizing the relevant mechanistic information encoded in DNA. The coupling of salient functional genomic signals with genetically mapped positions is an appealing approach to discover meaningful gene-phenotype relationships. Techniques used to define this genetic-genomic convergence comprise the field of systems genetics. This short review will address an application of systems genetics where RNA profiles are associated with genetically mapped genome positions of individual genes (eQTL mapping) or as gene sets (co-expression network modules). Both approaches can be applied for knowledge independent selection of candidate genes (and possible control mechanisms) underlying complex traits where multiple, likely unlinked, genomic regions might control specific complex traits. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  5. Evaluation of Methods for de novo Genome assembly from High-throughput Sequencing Reads Reveals Dependencies that Affect the Quality of the Results

    USDA-ARS's Scientific Manuscript database

    Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole...

  6. NCBI GEO: archive for functional genomics data sets—update

    PubMed Central

    Barrett, Tanya; Wilhite, Stephen E.; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F.; Tomashevsky, Maxim; Marshall, Kimberly A.; Phillippy, Katherine H.; Sherman, Patti M.; Holko, Michelle; Yefanov, Andrey; Lee, Hyeseung; Zhang, Naigong; Robertson, Cynthia L.; Serova, Nadezhda; Davis, Sean; Soboleva, Alexandra

    2013-01-01

    The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data. PMID:23193258

  7. Host gene targets for novel influenza therapies elucidated by high-throughput RNA interference screens

    PubMed Central

    Meliopoulos, Victoria A.; Andersen, Lauren E.; Birrer, Katherine F.; Simpson, Kaylene J.; Lowenthal, John W.; Bean, Andrew G. D.; Stambas, John; Stewart, Cameron R.; Tompkins, S. Mark; van Beusechem, Victor W.; Fraser, Iain; Mhlanga, Musa; Barichievy, Samantha; Smith, Queta; Leake, Devin; Karpilow, Jon; Buck, Amy; Jona, Ghil; Tripp, Ralph A.

    2012-01-01

    Influenza virus encodes only 11 viral proteins but replicates in a broad range of avian and mammalian species by exploiting host cell functions. Genome-wide RNA interference (RNAi) has proven to be a powerful tool for identifying the host molecules that participate in each step of virus replication. Meta-analysis of findings from genome-wide RNAi screens has shown influenza virus to be dependent on functional nodes in host cell pathways, requiring a wide variety of molecules and cellular proteins for replication. Because rapid evolution of the influenza A viruses persistently complicates the effectiveness of vaccines and therapeutics, a further understanding of the complex host cell pathways coopted by influenza virus for replication may provide new targets and strategies for antiviral therapy. RNAi genome screening technologies together with bioinformatics can provide the ability to rapidly identify specific host factors involved in resistance and susceptibility to influenza virus, allowing for novel disease intervention strategies. PMID:22247330

  8. Transcriptionally active PCR for antigen identification and vaccine development: in vitro genome-wide screening and in vivo immunogenicity

    PubMed Central

    Regis, David P.; Dobaño, Carlota; Quiñones-Olson, Paola; Liang, Xiaowu; Graber, Norma L.; Stefaniak, Maureen E.; Campo, Joseph J.; Carucci, Daniel J.; Roth, David A.; He, Huaping; Felgner, Philip L.; Doolan, Denise L.

    2009-01-01

    We have evaluated a technology called Transcriptionally Active PCR (TAP) for high throughput identification and prioritization of novel target antigens from genomic sequence data using the Plasmodium parasite, the causative agent of malaria, as a model. First, we adapted the TAP technology for the highly AT-rich Plasmodium genome, using well-characterized P. falciparum and P. yoelii antigens and a small panel of uncharacterized open reading frames from the P. falciparum genome sequence database. We demonstrated that TAP fragments encoding six well-characterized P. falciparum antigens and five well-characterized P. yoelii antigens could be amplified in an equivalent manner from both plasmid DNA and genomic DNA templates, and that uncharacterized open reading frames could also be amplified from genomic DNA template. Second, we showed that the in vitro expression of the TAP fragments was equivalent or superior to that of supercoiled plasmid DNA encoding the same antigen. Third, we evaluated the in vivo immunogenicity of TAP fragments encoding a subset of the model P. falciparum and P. yoelii antigens. We found that antigen-specific antibody and cellular immune responses induced by the TAP fragments in mice were equivalent or superior to those induced by the corresponding plasmid DNA vaccines. Finally, we developed and demonstrated proof-of-principle for an in vitro humoral immunoscreening assay for down-selection of novel target antigens. These data support the potential of a TAP approach for rapid high throughput functional screening and identification of potential candidate vaccine antigens from genomic sequence data. PMID:18164079

  9. Transcriptionally active PCR for antigen identification and vaccine development: in vitro genome-wide screening and in vivo immunogenicity.

    PubMed

    Regis, David P; Dobaño, Carlota; Quiñones-Olson, Paola; Liang, Xiaowu; Graber, Norma L; Stefaniak, Maureen E; Campo, Joseph J; Carucci, Daniel J; Roth, David A; He, Huaping; Felgner, Philip L; Doolan, Denise L

    2008-03-01

    We have evaluated a technology called transcriptionally active PCR (TAP) for high throughput identification and prioritization of novel target antigens from genomic sequence data using the Plasmodium parasite, the causative agent of malaria, as a model. First, we adapted the TAP technology for the highly AT-rich Plasmodium genome, using well-characterized P. falciparum and P. yoelii antigens and a small panel of uncharacterized open reading frames from the P. falciparum genome sequence database. We demonstrated that TAP fragments encoding six well-characterized P. falciparum antigens and five well-characterized P. yoelii antigens could be amplified in an equivalent manner from both plasmid DNA and genomic DNA templates, and that uncharacterized open reading frames could also be amplified from genomic DNA template. Second, we showed that the in vitro expression of the TAP fragments was equivalent or superior to that of supercoiled plasmid DNA encoding the same antigen. Third, we evaluated the in vivo immunogenicity of TAP fragments encoding a subset of the model P. falciparum and P. yoelii antigens. We found that antigen-specific antibody and cellular immune responses induced by the TAP fragments in mice were equivalent or superior to those induced by the corresponding plasmid DNA vaccines. Finally, we developed and demonstrated proof-of-principle for an in vitro humoral immunoscreening assay for down-selection of novel target antigens. These data support the potential of a TAP approach for rapid high throughput functional screening and identification of potential candidate vaccine antigens from genomic sequence data.

  10. High-Throughput Silencing Using the CRISPR-Cas9 System: A Review of the Benefits and Challenges.

    PubMed

    Wade, Mark

    2015-09-01

    The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas system has been seized upon with a fervor enjoyed previously by small interfering RNA (siRNA) and short hairpin RNA (shRNA) technologies and has enormous potential for high-throughput functional genomics studies. The decision to use this approach must be balanced with respect to adoption of existing platforms versus awaiting the development of more "mature" next-generation systems. Here, experience from siRNA and shRNA screening plays an important role, as issues such as targeting efficiency, pooling strategies, and off-target effects with those technologies are already framing debates in the CRISPR field. CRISPR/Cas can be exploited not only to knock out genes but also to up- or down-regulate gene transcription, in some cases in a multiplex fashion. This provides a powerful tool for studying the interaction among multiple signaling cascades in the same genetic background. Furthermore, the documented success of CRISPR/Cas-mediated gene correction (or the corollary, introduction of disease-specific mutations) provides proof of concept for the rapid generation of isogenic cell lines for high-throughput screening. In this review, the advantages and limitations of CRISPR/Cas are discussed and current and future applications are highlighted. It is envisaged that complementarities between CRISPR, siRNA, and shRNA will ensure that all three technologies remain critical to the success of future functional genomics projects. © 2015 Society for Laboratory Automation and Screening.

  11. Genomics and transcriptomics in drug discovery.

    PubMed

    Dopazo, Joaquin

    2014-02-01

    The popularization of genomic high-throughput technologies is causing a revolution in biomedical research and, particularly, is transforming the field of drug discovery. Systems biology offers a framework to understand the extensive human genetic heterogeneity revealed by genomic sequencing in the context of the network of functional, regulatory and physical protein-drug interactions. Thus, approaches to find biomarkers and therapeutic targets will have to take into account the complex system nature of the relationships of the proteins with the disease. Pharmaceutical companies will have to reorient their drug discovery strategies considering the human genetic heterogeneity. Consequently, modeling and computational data analysis will have an increasingly important role in drug discovery. Copyright © 2013 Elsevier Ltd. All rights reserved.

  12. Compartmental genomics in living cells revealed by single-cell nanobiopsy.

    PubMed

    Actis, Paolo; Maalouf, Michelle M; Kim, Hyunsung John; Lohith, Akshar; Vilozny, Boaz; Seger, R Adam; Pourmand, Nader

    2014-01-28

    The ability to study the molecular biology of living single cells in heterogeneous cell populations is essential for next generation analysis of cellular circuitry and function. Here, we developed a single-cell nanobiopsy platform based on scanning ion conductance microscopy (SICM) for continuous sampling of intracellular content from individual cells. The nanobiopsy platform uses electrowetting within a nanopipette to extract cellular material from living cells with minimal disruption of the cellular milieu. We demonstrate the subcellular resolution of the nanobiopsy platform by isolating small subpopulations of mitochondria from single living cells, and quantify mutant mitochondrial genomes in those single cells with high throughput sequencing technology. These findings may provide the foundation for dynamic subcellular genomic analysis.

  13. Evaluation of Sequencing Approaches for High-Throughput Transcriptomics - (BOSC)

    EPA Science Inventory

    Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. The generation of high-throughput global gene expression...

  14. CRISPR-Cas9 for medical genetic screens: applications and future perspectives.

    PubMed

    Xue, Hui-Ying; Ji, Li-Juan; Gao, Ai-Mei; Liu, Ping; He, Jing-Dong; Lu, Xiao-Jie

    2016-02-01

    CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats-CRISPR associated nuclease 9) systems have emerged as versatile and convenient (epi)genome editing tools and have become an important player in medical genetic research. CRISPR-Cas9 and its variants such as catalytically inactivated Cas9 (dead Cas9, dCas9) and scaffold-incorporating single guide sgRNA (scRNA) have been applied in various genomic screen studies. CRISPR screens enable high-throughput interrogation of gene functions in health and diseases. Compared with conventional RNAi screens, CRISPR screens incur fewer off-target effects and are more versatile in that they can be used in multiple formats such as knockout, knockdown and activation screens, and can target coding and non-coding regions throughout the genome. This powerful screen platform holds the potential of revolutionising functional genomic studies in the near future. Herein, we introduce the mechanisms of (epi)genome editing mediated by CRISPR-Cas9 and its variants, introduce the procedures and applications of CRISPR screen in functional genomics, compare it with conventional screen tools, and finally discuss current challenges and opportunities and propose future directions. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  15. Fine definition of the pedigree haplotypes of closely related rice cultivars by means of genome-wide discovery of single-nucleotide polymorphisms.

    PubMed

    Yamamoto, Toshio; Nagasaki, Hideki; Yonemaru, Jun-ichi; Ebana, Kaworu; Nakajima, Maiko; Shibaya, Taeko; Yano, Masahiro

    2010-04-27

    To create useful gene combinations in crop breeding, it is necessary to clarify the dynamics of the genome composition created by breeding practices. A large quantity of single-nucleotide polymorphism (SNP) data is required to permit discrimination of chromosome segments among modern cultivars, which are genetically related. Here, we used a high-throughput sequencer to conduct whole-genome sequencing of an elite Japanese rice cultivar, Koshihikari, which is closely related to Nipponbare, whose genome sequencing has been completed. Then we designed a high-throughput typing array based on the SNP information by comparison of the two sequences. Finally, we applied this array to analyze historical representative rice cultivars to understand the dynamics of their genome composition. The total 5.89-Gb sequence for Koshihikari, equivalent to 15.7 x the entire rice genome, was mapped using the Pseudomolecules 4.0 database for Nipponbare. The resultant Koshihikari genome sequence corresponded to 80.1% of the Nipponbare sequence and led to the identification of 67,051 SNPs. A high-throughput typing array consisting of 1917 SNP sites distributed throughout the genome was designed to genotype 151 representative Japanese cultivars that have been grown during the past 150 years. We could identify the ancestral origin of the pedigree haplotypes in 60.9% of the Koshihikari genome and 18 consensus haplotype blocks which are inherited from traditional landraces to current improved varieties. Moreover, it was predicted that modern breeding practices have generally decreased genetic diversity. Detection of genome-wide SNPs by both high-throughput sequencer and typing array made it possible to evaluate genomic composition of genetically related rice varieties. With the aid of their pedigree information, we clarified the dynamics of chromosome recombination during the historical rice breeding process. We also found several genomic regions decreasing genetic diversity which might be caused by a recent human selection in rice breeding. The definition of pedigree haplotypes by means of genome-wide SNPs will facilitate next-generation breeding of rice and other crops.
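
    The sequencing figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the SNP density assumes that the 67,051 SNPs fall within the 80.1% of the reference that was covered, which the abstract does not state explicitly. Dividing 5.89 Gb by 15.7-fold coverage implies a reference genome of roughly 375 Mb.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Function names are hypothetical; inputs are taken from the abstract text.

def implied_genome_size_mb(total_sequence_gb: float, fold_coverage: float) -> float:
    """Genome size (Mb) implied by total sequenced bases and fold coverage."""
    return total_sequence_gb * 1000.0 / fold_coverage


def snp_density_per_mb(snp_count: int, genome_size_mb: float, covered_fraction: float) -> float:
    """SNPs per Mb, ASSUMING all SNPs lie in the covered fraction of the genome."""
    return snp_count / (genome_size_mb * covered_fraction)


genome_mb = implied_genome_size_mb(5.89, 15.7)
density = snp_density_per_mb(67_051, genome_mb, 0.801)
print(f"implied genome size: {genome_mb:.0f} Mb")
print(f"approximate SNP density: {density:.0f} SNPs per Mb")
```

    Under these assumptions the density comes out at a little over 200 SNPs per Mb, which illustrates why whole-genome sequencing was needed to discriminate such closely related cultivars.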

  16. YeATS- a tool suite for analyzing RNA-seq derived transcriptome identifies a highly transcribed putative extensin in heartwood/sapwood transition zone in black walnut

    USDA-ARS's Scientific Manuscript database

    The transcriptome provides a functional footprint of the genome by enumerating the molecular components of cells and tissues. The field of transcript discovery has been revolutionized through high-throughput mRNA sequencing (RNA-seq). Here, we present a methodology that replicates and improves exist...

  17. Genome-derived vaccines.

    PubMed

    De Groot, Anne S; Rappuoli, Rino

    2004-02-01

    Vaccine research entered a new era when the complete genome of a pathogenic bacterium was published in 1995. Since then, more than 97 bacterial pathogens have been sequenced and at least 110 additional projects are now in progress. Genome sequencing has also dramatically accelerated: high-throughput facilities can draft the sequence of an entire microbe (two to four megabases) in 1 to 2 days. Vaccine developers are using microarrays, immunoinformatics, proteomics and high-throughput immunology assays to reduce the truly unmanageable volume of information available in genome databases to a manageable size. Vaccines composed of novel antigens discovered from genome mining are already in clinical trials. Within 5 years we can expect to see a novel class of vaccines composed of genome-predicted, assembled and engineered T- and B-cell epitopes. This article addresses the convergence of three forces--microbial genome sequencing, computational immunology and new vaccine technologies--that are shifting genome mining for vaccines onto the forefront of immunology research.

  18. Defect Genome of Cubic Perovskites for Fuel Cell Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balachandran, Janakiraman; Lin, Lianshan; Anchell, Jonathan S.

    Heterogeneities such as point defects, inherent to material systems, can profoundly influence material functionalities critical for numerous energy applications. This influence in principle can be identified and quantified through development of large defect data sets which we call the defect genome, employing high-throughput ab initio calculations. However, high-throughput screening of material models with point defects dramatically increases the computational complexity and chemical search space, creating major impediments toward developing a defect genome. In this paper, we overcome these impediments by employing computationally tractable ab initio models driven by highly scalable workflows, to study formation and interaction of various point defects (e.g., O vacancies, H interstitials, and Y substitutional dopant), in over 80 cubic perovskites, for potential proton-conducting ceramic fuel cell (PCFC) applications. The resulting defect data sets identify several promising perovskite compounds that can exhibit high proton conductivity. Furthermore, the data sets also enable us to identify and explain, insightful and novel correlations among defect energies, material identities, and defect-induced local structural distortions. Finally, such defect data sets and resultant correlations are necessary to build statistical machine learning models, which are required to accelerate discovery of new materials.

  19. Defect Genome of Cubic Perovskites for Fuel Cell Applications

    DOE PAGES

    Balachandran, Janakiraman; Lin, Lianshan; Anchell, Jonathan S.; ...

    2017-10-10

    Heterogeneities such as point defects, inherent to material systems, can profoundly influence material functionalities critical for numerous energy applications. This influence in principle can be identified and quantified through development of large defect data sets which we call the defect genome, employing high-throughput ab initio calculations. However, high-throughput screening of material models with point defects dramatically increases the computational complexity and chemical search space, creating major impediments toward developing a defect genome. In this paper, we overcome these impediments by employing computationally tractable ab initio models driven by highly scalable workflows, to study formation and interaction of various point defects (e.g., O vacancies, H interstitials, and Y substitutional dopant), in over 80 cubic perovskites, for potential proton-conducting ceramic fuel cell (PCFC) applications. The resulting defect data sets identify several promising perovskite compounds that can exhibit high proton conductivity. Furthermore, the data sets also enable us to identify and explain, insightful and novel correlations among defect energies, material identities, and defect-induced local structural distortions. Finally, such defect data sets and resultant correlations are necessary to build statistical machine learning models, which are required to accelerate discovery of new materials.

  20. High-Throughput Cryopreservation of Plant Cell Cultures for Functional Genomics

    PubMed Central

    Ogawa, Yoichi; Sakurai, Nozomu; Oikawa, Akira; Kai, Kosuke; Morishita, Yoshihiko; Mori, Kumiko; Moriya, Kanami; Fujii, Fumiko; Aoki, Koh; Suzuki, Hideyuki; Ohta, Daisaku; Saito, Kazuki; Shibata, Daisuke

    2012-01-01

    Suspension-cultured cell lines from plant species are useful for genetic engineering. However, maintenance of these lines is laborious, involves routine subculturing and hampers wider use of transgenic lines, especially when many lines are required for a high-throughput functional genomics application. Cryopreservation of these lines may reduce the need for subculturing. Here, we established a simple protocol for cryopreservation of cell lines from five commonly used plant species, Arabidopsis thaliana, Daucus carota, Lotus japonicus, Nicotiana tabacum and Oryza sativa. The LSP solution (2 M glycerol, 0.4 M sucrose and 86.9 mM proline) protected cells from damage during freezing and was only mildly toxic to cells kept at room temperature for at least 2 h. More than 100 samples were processed for freezing simultaneously. Initially, we determined the conditions for cryopreservation using a programmable freezer; we then developed a modified simple protocol that did not require a programmable freezer. In the simple protocol, a thick expanded polystyrene (EPS) container containing the vials with the cell–LSP solution mixtures was kept at −30°C for 6 h to cool the cells slowly (pre-freezing); samples from the EPS containers were then plunged into liquid nitrogen before long-term storage. Transgenic Arabidopsis cells were subjected to cryopreservation, thawed and then re-grown in culture; transcriptome and metabolome analyses indicated that there was no significant difference in gene expression or metabolism between cryopreserved cells and control cells. The simplicity of the protocol will accelerate the pace of research in functional plant genomics. PMID:22437846

  1. KEGG Bioinformatics Resource for Plant Genomics and Metabolomics.

    PubMed

    Kanehisa, Minoru

    2016-01-01

    In the era of high-throughput biology it is necessary to develop not only elaborate computational methods but also well-curated databases that can be used as reference for data interpretation. KEGG (http://www.kegg.jp/) is such a reference knowledge base with two specific aims. One is to compile knowledge on high-level functions of the cell and the organism in terms of the molecular interaction and reaction networks, which is implemented in KEGG pathway maps, BRITE functional hierarchies, and KEGG modules. The other is to expand knowledge on genes and proteins involved in the molecular networks from experimentally observed organisms to other organisms using the concept of orthologs, which is implemented in the KEGG Orthology (KO) system. Thus, KEGG is a generic resource applicable to all organisms and enables interpretation of high-level functions from genomic and molecular data. Here we first present a brief overview of the entire KEGG resource, and then give an introduction of how to use KEGG in plant genomics and metabolomics research.

  2. Detecting DNA double-stranded breaks in mammalian genomes by linear amplification-mediated high-throughput genome-wide translocation sequencing.

    PubMed

    Hu, Jiazhi; Meyers, Robin M; Dong, Junchao; Panchakshari, Rohit A; Alt, Frederick W; Frock, Richard L

    2016-05-01

    Unbiased, high-throughput assays for detecting and quantifying DNA double-stranded breaks (DSBs) across the genome in mammalian cells will facilitate basic studies of the mechanisms that generate and repair endogenous DSBs. They will also enable more applied studies, such as those to evaluate the on- and off-target activities of engineered nucleases. Here we describe a linear amplification-mediated high-throughput genome-wide translocation sequencing (LAM-HTGTS) method for the detection of genome-wide 'prey' DSBs via their translocation in cultured mammalian cells to a fixed 'bait' DSB. Bait-prey junctions are cloned directly from isolated genomic DNA using LAM-PCR and unidirectionally ligated to bridge adapters; subsequent PCR steps amplify the single-stranded DNA junction library in preparation for Illumina MiSeq paired-end sequencing. A custom bioinformatics pipeline identifies prey sequences that contribute to junctions and maps them across the genome. LAM-HTGTS differs from related approaches because it detects a wide range of broken end structures with nucleotide-level resolution. Familiarity with nucleic acid methods and next-generation sequencing analysis is necessary for library generation and data interpretation. LAM-HTGTS assays are sensitive, reproducible, relatively inexpensive, scalable and straightforward to implement with a turnaround time of <1 week.

  3. RNA regulatory networks in animals and plants: a long noncoding RNA perspective.

    PubMed

    Bai, Youhuang; Dai, Xiaozhuan; Harrison, Andrew P; Chen, Ming

    2015-03-01

    A recent highlight of genomics research has been the discovery of many families of transcripts which have function but do not code for proteins. An important group is long noncoding RNAs (lncRNAs), which are typically longer than 200 nt, and whose members originate from thousands of loci across genomes. We review progress in understanding the biogenesis and regulatory mechanisms of lncRNAs. We describe diverse computational and high throughput technologies for identifying and studying lncRNAs. We discuss the current knowledge of functional elements embedded in lncRNAs as well as insights into the lncRNA-based regulatory network in animals. We also describe genome-wide studies of large numbers of lncRNAs in plants, as well as knowledge of selected plant lncRNAs with a focus on biotic/abiotic stress-responsive lncRNAs. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  4. A High-Throughput Arabidopsis Reverse Genetics System

    PubMed Central

    Sessions, Allen; Burke, Ellen; Presting, Gernot; Aux, George; McElver, John; Patton, David; Dietrich, Bob; Ho, Patrick; Bacwaden, Johana; Ko, Cynthia; Clarke, Joseph D.; Cotton, David; Bullis, David; Snell, Jennifer; Miguel, Trini; Hutchison, Don; Kimmerly, Bill; Mitzel, Theresa; Katagiri, Fumiaki; Glazebrook, Jane; Law, Marc; Goff, Stephen A.

    2002-01-01

    A collection of Arabidopsis lines with T-DNA insertions in known sites was generated to increase the efficiency of functional genomics. A high-throughput modified thermal asymmetric interlaced (TAIL)-PCR protocol was developed and used to amplify DNA fragments flanking the T-DNA left borders from ∼100,000 transformed lines. A total of 85,108 TAIL-PCR products from 52,964 T-DNA lines were sequenced and compared with the Arabidopsis genome to determine the positions of T-DNAs in each line. Predicted T-DNA insertion sites, when mapped, showed a bias against predicted coding sequences. Predicted insertion mutations in genes of interest can be identified using Arabidopsis Gene Index name searches or by BLAST (Basic Local Alignment Search Tool) search. Insertions can be confirmed by simple PCR assays on individual lines. Predicted insertions were confirmed in 257 of 340 lines tested (76%). This resource has been named SAIL (Syngenta Arabidopsis Insertion Library) and is available to the scientific community at www.tmri.org. PMID:12468722

  5. The Gene Expression Omnibus database

    PubMed Central

    Clough, Emily; Barrett, Tanya

    2016-01-01

    The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics data sets. Created in 2000 as a worldwide resource for gene expression studies, GEO has evolved with rapidly changing technologies and now accepts high-throughput data for many other data applications, including those that examine genome methylation, chromatin structure, and genome–protein interactions. GEO supports community-derived reporting standards that specify provision of several critical study elements including raw data, processed data, and descriptive metadata. The database not only provides access to data for tens of thousands of studies, but also offers various Web-based tools and strategies that enable users to locate data relevant to their specific interests, as well as to visualize and analyze the data. This chapter includes detailed descriptions of methods to query and download GEO data and use the analysis and visualization tools. The GEO homepage is at http://www.ncbi.nlm.nih.gov/geo/. PMID:27008011
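
    As an illustration of programmatic querying, the sketch below builds a search URL for GEO DataSets. It is a minimal example that assumes the NCBI E-utilities `esearch` endpoint with `db=gds`, which is not mentioned in the abstract; the function name and the example search term are hypothetical. The code only constructs the URL and performs no network request.

```python
# Minimal sketch: build (but do not send) an NCBI E-utilities search URL
# for GEO DataSets. The endpoint and db=gds are assumptions on our part;
# the abstract itself describes the web-based tools, not this interface.
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(term: str, retmax: int = 20) -> str:
    """Return an esearch URL that looks up `term` in the GEO DataSets database."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Hypothetical example query; any Entrez search expression could be used.
print(geo_search_url("CRISPR screen", retmax=5))
```

    The returned URL can be opened in a browser or fetched with any HTTP client; the response lists matching record identifiers rather than the data files themselves.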

  6. Genome measures used for quality control are dependent on gene function and ancestry.

    PubMed

    Wang, Jing; Raskin, Leon; Samuels, David C; Shyr, Yu; Guo, Yan

    2015-02-01

    The transition/transversion (Ti/Tv) ratio and heterozygous/nonreference-homozygous (het/nonref-hom) ratio have been commonly computed in genetic studies as a quality control (QC) measurement. Additionally, these two ratios are helpful in our understanding of the patterns of DNA sequence evolution. To thoroughly understand these two genomic measures, we performed a study using 1000 Genomes Project (1000G) released genotype data (N=1092). An additional two datasets (N=581 and N=6) were used to validate our findings from the 1000G dataset. We compared the two ratios among continental ancestry, genome regions and gene functionality. We found that the Ti/Tv ratio can be used as a quality indicator for single nucleotide polymorphisms inferred from high-throughput sequencing data. The Ti/Tv ratio varies greatly by genome region and functionality, but not by ancestry. The het/nonref-hom ratio varies greatly by ancestry, but not by genome regions and functionality. Furthermore, extreme guanine + cytosine content (either high or low) is negatively associated with the Ti/Tv ratio magnitude. Thus, when performing QC assessment using these two measures, care must be taken to apply the correct thresholds based on ancestry and genome region. Failure to take these considerations into account at the QC stage will bias any following analysis. Contact: yan.guo@vanderbilt.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
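
    The two QC measures named in this abstract are simple count ratios. The sketch below shows one way to compute them from toy data; it is not the authors' pipeline. The input layout is an assumption: SNPs are given as (reference base, alternate base) pairs, and genotypes as pairs of allele indices where 0 denotes the reference allele.

```python
# Minimal sketch of the two QC ratios, using assumed input formats:
#   snps      - list of (ref_base, alt_base) pairs for biallelic SNPs
#   genotypes - list of (allele1, allele2) index pairs, 0 = reference allele

# Transitions swap purine with purine (A<->G) or pyrimidine with pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """Number of transitions divided by number of transversions."""
    ti = sum(1 for ref, alt in snps if (ref, alt) in TRANSITIONS)
    tv = len(snps) - ti
    return ti / tv if tv else float("inf")


def het_nonref_hom_ratio(genotypes):
    """Heterozygous calls divided by homozygous non-reference calls."""
    het = sum(1 for a, b in genotypes if a != b)
    nonref_hom = sum(1 for a, b in genotypes if a == b and a != 0)
    return het / nonref_hom if nonref_hom else float("inf")


# Toy data, invented for illustration only.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
toy_genotypes = [(0, 1), (0, 1), (1, 1), (0, 0), (0, 1), (1, 1)]
print(ti_tv_ratio(toy_snps))                # 3 transitions / 2 transversions
print(het_nonref_hom_ratio(toy_genotypes))  # 3 het / 2 nonref-hom
```

    In practice these ratios would be computed per sample or per genomic region from variant calls, and, as the abstract cautions, compared against thresholds appropriate to the ancestry and region in question.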

  7. The detailed 3D multi-loop aggregate/rosette chromatin architecture and functional dynamic organization of the human and mouse genomes.

    PubMed

    Knoch, Tobias A; Wachsmuth, Malte; Kepper, Nick; Lesnussa, Michael; Abuseiris, Anis; Ali Imam, A M; Kolovos, Petros; Zuin, Jessica; Kockx, Christel E M; Brouwer, Rutger W W; van de Werken, Harmen J G; van IJcken, Wilfred F J; Wendt, Kerstin S; Grosveld, Frank G

    2016-01-01

    The dynamic three-dimensional chromatin architecture of genomes and its co-evolutionary connection to its function (the storage, expression, and replication of genetic information) is still one of the central issues in biology. Here, we describe the much debated 3D architecture of the human and mouse genomes from the nucleosomal to the megabase pair level by a novel approach combining selective high-throughput high-resolution chromosomal interaction capture (T2C), polymer simulations, and scaling analysis of the 3D architecture and the DNA sequence. The genome is compacted into a chromatin quasi-fibre with ~5 ± 1 nucleosomes/11 nm, folded into stable ~30-100 kbp loops forming stable loop aggregates/rosettes connected by similar sized linkers. Minor but significant variations in the architecture are seen between cell types and functional states. The architecture and the DNA sequence show very similar fine-structured multi-scaling behaviour confirming their co-evolution and the above. This architecture, its dynamics, and accessibility, balance stability and flexibility ensuring genome integrity and variation enabling gene expression/regulation by self-organization of (in)active units already in proximity. Our results agree with the heuristics of the field and allow "architectural sequencing" at a genome mechanics level to understand the inseparable systems genomic properties.

  8. Insights into structural variations and genome rearrangements in prokaryotic genomes.

    PubMed

    Periwal, Vinita; Scaria, Vinod

    2015-01-01

    Structural variations (SVs) are genomic rearrangements that affect fairly large fragments of DNA. Most of the SVs such as inversions, deletions and translocations have been largely studied in context of genetic diseases in eukaryotes. However, recent studies demonstrate that genome rearrangements can also have profound impact on prokaryotic genomes, leading to altered cell phenotype. In contrast to single-nucleotide variations, SVs provide a much deeper insight into organization of bacterial genomes at a much better resolution. SVs can confer change in gene copy number, creation of new genes, altered gene expression and many other functional consequences. High-throughput technologies have now made it possible to explore SVs at a much refined resolution in bacterial genomes. Through this review, we aim to highlight the importance of the less explored field of SVs in prokaryotic genomes and their impact. We also discuss its potential applicability in the emerging fields of synthetic biology and genome engineering where targeted SVs could serve to create sophisticated and accurate genome editing. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  9. The National Microbial Pathogen Database Resource (NMPDR): a genomics platform based on subsystem annotation.

    PubMed

    McNeil, Leslie Klis; Reich, Claudia; Aziz, Ramy K; Bartels, Daniela; Cohoon, Matthew; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Hwang, Kaitlyn; Kubal, Michael; Margaryan, Gohar Rem; Meyer, Folker; Mihalo, William; Olsen, Gary J; Olson, Robert; Osterman, Andrei; Paarmann, Daniel; Paczian, Tobias; Parrello, Bruce; Pusch, Gordon D; Rodionov, Dmitry A; Shi, Xinghua; Vassieva, Olga; Vonstein, Veronika; Zagnitko, Olga; Xia, Fangfang; Zinner, Jenifer; Overbeek, Ross; Stevens, Rick

    2007-01-01

    The National Microbial Pathogen Data Resource (NMPDR) (http://www.nmpdr.org) is a National Institute of Allergy and Infectious Diseases (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of approximately 50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic Domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the set of genes in common or that differentiates two groups of organisms. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development.

  10. Functional genomic analysis of drug sensitivity pathways to guide adjuvant strategies in breast cancer

    PubMed Central

    Swanton, Charles; Szallasi, Zoltan; Brenton, James D; Downward, Julian

    2008-01-01

    The widespread introduction of high throughput RNA interference screening technology has revealed tumour drug sensitivity pathways to common cytotoxics such as paclitaxel, doxorubicin and 5-fluorouracil, targeted agents such as trastuzumab and inhibitors of AKT and Poly(ADP-ribose) polymerase (PARP) as well as endocrine therapies such as tamoxifen. Given the limited power of microarray signatures to predict therapeutic response in associative studies of small clinical trial cohorts, the use of functional genomic data combined with expression or sequence analysis of genes and microRNAs implicated in drug response in human tumours may provide a more robust method to guide adjuvant treatment strategies in breast cancer that are transferable across different expression platforms and patient cohorts. PMID:18986507

  11. PGSB/MIPS Plant Genome Information Resources and Concepts for the Analysis of Complex Grass Genomes.

    PubMed

    Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X

    2016-01-01

    PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.

  12. A genome resource to address mechanisms of developmental programming: determination of the fetal sheep heart transcriptome.

    PubMed

    Cox, Laura A; Glenn, Jeremy P; Spradling, Kimberly D; Nijland, Mark J; Garcia, Roy; Nathanielsz, Peter W; Ford, Stephen P

    2012-06-15

    The pregnant sheep has provided seminal insights into reproduction related to animal and human development (ovarian function, fertility, implantation, fetal growth, parturition and lactation). Fetal sheep physiology has been extensively studied since 1950, contributing significantly to the basis for our understanding of many aspects of fetal development and behaviour that remain in use in clinical practice today. Understanding mechanisms requires the combination of systems approaches uniquely available in fetal sheep with the power of genomic studies. Absence of the full range of sheep genomic resources has limited the full realization of the power of this model, impeding progress in emerging areas of pregnancy biology such as developmental programming. We have examined the expressed fetal sheep heart transcriptome using high-throughput sequencing technologies. In so doing we identified 36,737 novel transcripts and describe genes, gene variants and pathways relevant to fundamental developmental mechanisms. Genes with the highest expression levels and with novel exons in the fetal heart transcriptome are known to play central roles in muscle development. We show that high-throughput sequencing methods can generate extensive transcriptome information in the absence of an assembled and annotated genome for that species. The gene sequence data obtained provide a unique genomic resource for sheep specific genetic technology development and, combined with the polymorphism data, augment annotation and assembly of the sheep genome. In addition, identification and pathway analysis of novel fetal sheep heart transcriptome splice variants is a first step towards revealing mechanisms of genetic variation and gene environment interactions during fetal heart development.

  13. A genome resource to address mechanisms of developmental programming: determination of the fetal sheep heart transcriptome

    PubMed Central

    Cox, Laura A; Glenn, Jeremy P; Spradling, Kimberly D; Nijland, Mark J; Garcia, Roy; Nathanielsz, Peter W; Ford, Stephen P

    2012-01-01

    The pregnant sheep has provided seminal insights into reproduction related to animal and human development (ovarian function, fertility, implantation, fetal growth, parturition and lactation). Fetal sheep physiology has been extensively studied since 1950, contributing significantly to the basis for our understanding of many aspects of fetal development and behaviour that remain in use in clinical practice today. Understanding mechanisms requires the combination of systems approaches uniquely available in fetal sheep with the power of genomic studies. Absence of the full range of sheep genomic resources has limited the full realization of the power of this model, impeding progress in emerging areas of pregnancy biology such as developmental programming. We have examined the expressed fetal sheep heart transcriptome using high-throughput sequencing technologies. In so doing we identified 36,737 novel transcripts and describe genes, gene variants and pathways relevant to fundamental developmental mechanisms. Genes with the highest expression levels and with novel exons in the fetal heart transcriptome are known to play central roles in muscle development. We show that high-throughput sequencing methods can generate extensive transcriptome information in the absence of an assembled and annotated genome for that species. The gene sequence data obtained provide a unique genomic resource for sheep specific genetic technology development and, combined with the polymorphism data, augment annotation and assembly of the sheep genome. In addition, identification and pathway analysis of novel fetal sheep heart transcriptome splice variants is a first step towards revealing mechanisms of genetic variation and gene environment interactions during fetal heart development. PMID:22508961

  14. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource

    PubMed Central

    Seaver, Samuel M. D.; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M. T.; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D.; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D.; Henry, Christopher S.

    2014-01-01

    The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today’s annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed. PMID:24927599

  15. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource.

    PubMed

    Seaver, Samuel M D; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M T; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D; Henry, Christopher S

    2014-07-01

    The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.

  16. DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data.

    PubMed

    Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu

    2013-08-01

    High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.

  17. DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data

    PubMed Central

    Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu

    2013-01-01

    High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/. PMID:23657089

  18. Life in the fast lane for protein crystallization and X-ray crystallography

    NASA Technical Reports Server (NTRS)

    Pusey, Marc L.; Liu, Zhi-Jie; Tempel, Wolfram; Praissman, Jeremy; Lin, Dawei; Wang, Bi-Cheng; Gavira, Jose A.; Ng, Joseph D.

    2005-01-01

    The common goal for structural genomic centers and consortiums is to decipher as quickly as possible the three-dimensional structures for a multitude of recombinant proteins derived from known genomic sequences. Since X-ray crystallography is the foremost method to acquire atomic resolution for macromolecules, the limiting step is obtaining protein crystals that can be useful for structure determination. High-throughput methods have been developed in recent years to clone, express, purify, crystallize and determine the three-dimensional structure of a protein gene product rapidly using automated devices, commercialized kits and consolidated protocols. However, the average number of protein structures obtained for most structural genomic groups has been very low compared to the total number of proteins purified. As more entire genomic sequences are obtained for different organisms from the three kingdoms of life, only the proteins that can be crystallized and whose structures can be obtained easily are studied. Consequently, an astonishing number of genomic proteins remain unexamined. In the era of high-throughput processes, traditional methods in molecular biology, protein chemistry and crystallization are eclipsed by automation and pipeline practices. The necessity for high-rate production of protein crystals and structures has prevented the usage of more intellectual strategies and creative approaches in experimental executions. Fundamental principles and personal experiences in protein chemistry and crystallization are minimally exploited only to obtain "low-hanging fruit" protein structures. We review the practical aspects of today's high-throughput manipulations and discuss the challenges in fast-paced protein crystallization and tools for crystallography. Structural genomic pipelines can be improved with information gained from low-throughput tactics that may help us reach the higher-bearing fruits. Examples of recent developments in this area are reported from the efforts of the Southeast Collaboratory for Structural Genomics (SECSG).

  19. Life in the Fast Lane for Protein Crystallization and X-Ray Crystallography

    NASA Technical Reports Server (NTRS)

    Pusey, Marc L.; Liu, Zhi-Jie; Tempel, Wolfram; Praissman, Jeremy; Lin, Dawei; Wang, Bi-Cheng; Gavira, Jose A.; Ng, Joseph D.

    2004-01-01

    The common goal for structural genomic centers and consortiums is to decipher as quickly as possible the three-dimensional structures for a multitude of recombinant proteins derived from known genomic sequences. Since X-ray crystallography is the foremost method to acquire atomic resolution for macromolecules, the limiting step is obtaining protein crystals that can be useful for structure determination. High-throughput methods have been developed in recent years to clone, express, purify, crystallize and determine the three-dimensional structure of a protein gene product rapidly using automated devices, commercialized kits and consolidated protocols. However, the average number of protein structures obtained for most structural genomic groups has been very low compared to the total number of proteins purified. As more entire genomic sequences are obtained for different organisms from the three kingdoms of life, only the proteins that can be crystallized and whose structures can be obtained easily are studied. Consequently, an astonishing number of genomic proteins remain unexamined. In the era of high-throughput processes, traditional methods in molecular biology, protein chemistry and crystallization are eclipsed by automation and pipeline practices. The necessity for high-rate production of protein crystals and structures has prevented the usage of more intellectual strategies and creative approaches in experimental executions. Fundamental principles and personal experiences in protein chemistry and crystallization are minimally exploited only to obtain "low-hanging fruit" protein structures. We review the practical aspects of today's high-throughput manipulations and discuss the challenges in fast-paced protein crystallization and tools for crystallography. Structural genomic pipelines can be improved with information gained from low-throughput tactics that may help us reach the higher-bearing fruits. Examples of recent developments in this area are reported from the efforts of the Southeast Collaboratory for Structural Genomics (SECSG).

  20. A Primer on High-Throughput Computing for Genomic Selection

    PubMed Central

    Wu, Xiao-Lin; Beissinger, Timothy M.; Bauck, Stewart; Woodward, Brent; Rosa, Guilherme J. M.; Weigel, Kent A.; Gatti, Natalia de Leon; Gianola, Daniel

    2011-01-01

    High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of achieving high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provides a new-generation approach to massive parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet increasing computing demands posed by unprecedented genomic data that we have today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans. PMID:22303303
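
The batch-processing idea in the abstract above (evaluating several traits as independent jobs rather than one after another) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names `fit_trait` and `run_batch`, the toy per-marker least-squares "model", and the use of a local thread pool as a stand-in for a cluster scheduler. In pure Python a thread pool does not give real CPU parallelism; an actual HTC setup would dispatch each job to a separate node or process.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_trait(trait_name, genotypes, phenotypes):
    """Toy stand-in for a genomic-selection model (hypothetical):
    estimate one least-squares slope per marker for a single trait."""
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    effects = []
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        mean_x = sum(column) / n
        sxx = sum((x - mean_x) ** 2 for x in column)
        sxy = sum((x - mean_x) * (y - mean_y)
                  for x, y in zip(column, phenotypes))
        # A marker with no variation carries no information; report 0.0.
        effects.append(sxy / sxx if sxx else 0.0)
    return trait_name, effects


def run_batch(genotypes, traits, max_workers=4):
    """Submit one independent job per trait and gather the results.

    `traits` maps a trait name to its list of phenotype values.
    The thread pool only stands in for a cluster job scheduler.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fit_trait, name, genotypes, values)
                   for name, values in traits.items()]
        return dict(future.result() for future in futures)


# Made-up example data: three individuals, two markers, two traits.
genotypes = [[0, 1], [1, 1], [2, 0]]
traits = {"trait_a": [1.0, 2.0, 3.0], "trait_b": [3.0, 2.0, 1.0]}
results = run_batch(genotypes, traits)
```

Because each trait is fitted without reference to the others, the jobs can be scheduled in any order, which is the property that makes this kind of workload suitable for batch submission.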

  1. Expanded microbial genome coverage and improved protein family annotation in the COG database

    PubMed Central

    Galperin, Michael Y.; Makarova, Kira S.; Wolf, Yuri I.; Koonin, Eugene V.

    2015-01-01

    Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the COGs is expected to become an important tool for microbial genomics. PMID:25428365

  2. [Modes of action of agrochemicals against plant pathogenic organisms].

    PubMed

    Leroux, Pierre

    2003-01-01

    The chemical control of plant pathogens concerns mainly fungal diseases of crops. Most of the available fungicides act directly on essential fungal functions such as respiration, sterol biosynthesis or cell division. Consequently, these compounds can exhibit undesirable toxicological and environmental effects and sometimes select fungal resistant strains. Plant activators are expected to provide sustainable disease management in several crops because the development of resistance is not expected. Considering the future, the discovery of novel antifungal molecules will reap advantage from throughput screening methodologies and functional genomics.

  3. Global Organization of a Positive-strand RNA Virus Genome

    PubMed Central

    Wu, Baodong; Grigull, Jörg; Ore, Moriam O.; Morin, Sylvie; White, K. Andrew

    2013-01-01

    The genomes of plus-strand RNA viruses contain many regulatory sequences and structures that direct different viral processes. The traditional view of these RNA elements is as local structures present in non-coding regions. However, this view is changing due to the discovery of regulatory elements in coding regions and functional long-range intra-genomic base pairing interactions. The ∼4.8 kb long RNA genome of the tombusvirus tomato bushy stunt virus (TBSV) contains these types of structural features, including six different functional long-distance interactions. We hypothesized that to achieve these multiple interactions this viral genome must utilize a large-scale organizational strategy and, accordingly, we sought to assess the global conformation of the entire TBSV genome. Atomic force micrographs of the genome indicated a mostly condensed structure composed of interconnected protrusions extending from a central hub. This configuration was consistent with the genomic secondary structure model generated using high-throughput selective 2′-hydroxyl acylation analysed by primer extension (i.e. SHAPE), which predicted different sized RNA domains originating from a central region. Known RNA elements were identified in both domain and inter-domain regions, and novel structural features were predicted and functionally confirmed. Interestingly, only two of the six long-range interactions known to form were present in the structural model. However, for those interactions that did not form, complementary partner sequences were positioned relatively close to each other in the structure, suggesting that the secondary structure level of viral genome structure could provide a basic scaffold for the formation of different long-range interactions. The higher-order structural model for the TBSV RNA genome provides a snapshot of the complex framework that allows multiple functional components to operate in concert within a confined context. PMID:23717202

  4. The Proteome Folding Project: Proteome-scale prediction of structure and function

    PubMed Central

    Drew, Kevin; Winters, Patrick; Butterfoss, Glenn L.; Berstis, Viktors; Uplinger, Keith; Armstrong, Jonathan; Riffle, Michael; Schweighofer, Erik; Bovermann, Bill; Goodlett, David R.; Davis, Trisha N.; Shasha, Dennis; Malmström, Lars; Bonneau, Richard

    2011-01-01

    The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions. PMID:21824995

  5. Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding.

    PubMed

    Lan, Freeman; Demaree, Benjamin; Ahmed, Noorsher; Abate, Adam R

    2017-07-01

    The application of single-cell genome sequencing to large cell populations has been hindered by technical challenges in isolating single cells during genome preparation. Here we present single-cell genomic sequencing (SiC-seq), which uses droplet microfluidics to isolate, fragment, and barcode the genomes of single cells, followed by Illumina sequencing of pooled DNA. We demonstrate ultra-high-throughput sequencing of >50,000 cells per run in a synthetic community of Gram-negative and Gram-positive bacteria and fungi. The sequenced genomes can be sorted in silico based on characteristic sequences. We use this approach to analyze the distributions of antibiotic-resistance genes, virulence factors, and phage sequences in microbial communities from an environmental sample. The ability to routinely sequence large populations of single cells will enable the de-convolution of genetic heterogeneity in diverse cell populations.

  6. High throughput SNP discovery and genotyping in hexaploid wheat.

    PubMed

    Rimbert, Hélène; Darrier, Benoît; Navarro, Julien; Kitt, Jonathan; Choulet, Frédéric; Leveugle, Magalie; Duarte, Jorge; Rivière, Nathalie; Eversole, Kellye; Le Gouis, Jacques; Davassi, Alessandro; Balfourier, François; Le Paslier, Marie-Christine; Berard, Aurélie; Brunel, Dominique; Feuillet, Catherine; Poncet, Charles; Sourdille, Pierre; Paux, Etienne

    2018-01-01

    Because of their abundance and their amenability to high-throughput genotyping techniques, Single Nucleotide Polymorphisms (SNPs) are powerful tools for efficient genetics and genomics studies, including characterization of genetic resources, genome-wide association studies and genomic selection. In wheat, most of the previous SNP discovery initiatives targeted the coding fraction, leaving almost 98% of the wheat genome largely unexploited. Here we report on the use of whole-genome resequencing data from eight wheat lines to mine for SNPs in the genic, the repetitive and non-repetitive intergenic fractions of the wheat genome. Eventually, we identified 3.3 million SNPs, 49% being located on the B-genome, 41% on the A-genome and 10% on the D-genome. We also describe the development of the TaBW280K high-throughput genotyping array containing 280,226 SNPs. Performance of this chip was examined by genotyping a set of 96 wheat accessions representing the worldwide diversity. Sixty-nine percent of the SNPs can be efficiently scored, half of them showing a diploid-like clustering. The TaBW280K was proven to be a very efficient tool for diversity analyses, as well as for breeding as it can discriminate between closely related elite varieties. Finally, the TaBW280K array was used to genotype a population derived from a cross between Chinese Spring and Renan, leading to the construction of a dense genetic map comprising 83,721 markers. The results described here will provide the wheat community with powerful tools for both basic and applied research.

  7. Compartmental Genomics in Living Cells Revealed by Single-Cell Nanobiopsy

    PubMed Central

    Actis, Paolo; Maalouf, Michelle; Kim, Hyunsung John; Lohith, Akshar; Vilozny, Boaz; Seger, R. Adam; Pourmand, Nader

    2014-01-01

    The ability to study the molecular biology of living single cells in heterogeneous cell populations is essential for next generation analysis of cellular circuitry and function. Here, we developed a single-cell nanobiopsy platform based on scanning ion conductance microscopy (SICM) for continuous sampling of intracellular content from individual cells. The nanobiopsy platform uses electrowetting within a nanopipette to extract cellular material from living cells with minimal disruption of the cellular milieu. We demonstrate the subcellular resolution of the nanobiopsy platform by isolating small subpopulations of mitochondria from single living cells, and quantify mutant mitochondrial genomes in those single cells with high throughput sequencing technology. These findings may provide the foundation for dynamic subcellular genomic analysis. PMID:24279711

  8. Emory University: High-Throughput Protein-Protein Interaction Dataset for Lung Cancer-Associated Genes | Office of Cancer Genomics

    Cancer.gov

    To discover novel PPI signaling hubs for lung cancer, the CTD2 Center at Emory utilized large-scale genomics datasets and literature to compile a set of lung cancer-associated genes. A library of expression vectors was generated for these genes and utilized for detecting pairwise PPIs with cell lysate-based TR-FRET assays in high-throughput screening format.

  9. Flow cytometry sorting of nuclei enables the first global characterization of Paramecium germline DNA and transposable elements.

    PubMed

    Guérin, Frédéric; Arnaiz, Olivier; Boggetto, Nicole; Denby Wilkes, Cyril; Meyer, Eric; Sperling, Linda; Duharcourt, Sandra

    2017-04-26

    DNA elimination is developmentally programmed in a wide variety of eukaryotes, including unicellular ciliates, and leads to the generation of distinct germline and somatic genomes. The ciliate Paramecium tetraurelia harbors two types of nuclei with different functions and genome structures. The transcriptionally inactive micronucleus contains the complete germline genome, while the somatic macronucleus contains a reduced genome streamlined for gene expression. During development of the somatic macronucleus, the germline genome undergoes massive and reproducible DNA elimination events. Availability of both the somatic and germline genomes is essential to examine the genome changes that occur during programmed DNA elimination and ultimately decipher the mechanisms underlying the specific removal of germline-limited sequences. We developed a novel experimental approach that uses flow cell imaging and flow cytometry to sort subpopulations of nuclei to high purity. We sorted vegetative micronuclei and macronuclei during development of P. tetraurelia. We validated the method by flow cell imaging and by high throughput DNA sequencing. Our work establishes the proof of principle that developing somatic macronuclei can be sorted from a complex biological sample to high purity based on their size, shape and DNA content. This method enabled us to sequence, for the first time, the germline DNA from pure micronuclei and to identify novel transposable elements. Sequencing the germline DNA confirms that the Pgm domesticated transposase is required for the excision of all ~45,000 Internal Eliminated Sequences. Comparison of the germline DNA and unrearranged DNA obtained from PGM-silenced cells reveals that the latter does not provide a faithful representation of the germline genome. We developed a flow cytometry-based method to purify P. tetraurelia nuclei to high purity and provided quality control with flow cell imaging and high throughput DNA sequencing. We identified 61 germline transposable elements including the first Paramecium retrotransposons. This approach paves the way to sequence the germline genomes of P. aurelia sibling species for future comparative genomic studies.

  10. GeoChip 3.0: A High Throughput Tool for Analyzing Microbial Community, Composition, Structure, and Functional Activity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    He, Zhili; Deng, Ye; Nostrand, Joy Van

    2010-05-17

    Microarray-based genomic technology has been widely used for microbial community analysis, and it is expected that microarray-based genomic technologies will revolutionize the analysis of microbial community structure, function and dynamics. A new generation of functional gene arrays (GeoChip 3.0) has been developed, with 27,812 probes covering 56,990 gene variants from 292 functional gene families involved in carbon, nitrogen, phosphorus and sulfur cycles, energy metabolism, antibiotic resistance, metal resistance, and organic contaminant degradation. Those probes were derived from 2,744, 140, and 262 species for bacteria, archaea, and fungi, respectively. GeoChip 3.0 has several other distinct features, such as a common oligo reference standard (CORS) for data normalization and comparison, a software package for data management and future updating, and the gyrB gene for phylogenetic analysis. Our computational evaluation of probe specificity indicated that all designed probes had a high specificity to their corresponding targets. Also, experimental analysis with synthesized oligonucleotides and genomic DNAs showed that only 0.0036%-0.025% false positive rates were observed, suggesting that the designed probes are highly specific under the experimental conditions examined. In addition, GeoChip 3.0 was applied to analyze soil microbial communities in a multifactor grassland ecosystem in Minnesota, USA, which demonstrated that the structure, composition, and potential activity of soil microbial communities significantly changed with the plant species diversity. All results indicate that GeoChip 3.0 is a high throughput powerful tool for studying microbial community functional structure, and linking microbial communities to ecosystem processes and functioning. To our knowledge, GeoChip 3.0 is the most comprehensive microarray currently available for studying microbial communities associated with geobiochemical cycling, global climate change, bioenergy, agriculture, land use, ecosystem management, environmental cleanup and restoration, bioreactor systems, and human health.

  11. Preparation of Protein Samples for NMR Structure, Function, and Small Molecule Screening Studies

    PubMed Central

    Acton, Thomas B.; Xiao, Rong; Anderson, Stephen; Aramini, James; Buchwald, William A.; Ciccosanti, Colleen; Conover, Ken; Everett, John; Hamilton, Keith; Huang, Yuanpeng Janet; Janjua, Haleema; Kornhaber, Gregory; Lau, Jessica; Lee, Dong Yup; Liu, Gaohua; Maglaqui, Melissa; Ma, Lichung; Mao, Lei; Patel, Dayaban; Rossi, Paolo; Sahdev, Seema; Shastry, Ritu; Swapna, G.V.T.; Tang, Yeufeng; Tong, Saichiu; Wang, Dongyan; Wang, Huang; Zhao, Li; Montelione, Gaetano T.

    2014-01-01

    In this chapter, we concentrate on the production of high quality protein samples for NMR studies. In particular, we provide an in-depth description of recent advances in the production of NMR samples and their synergistic use with recent advancements in NMR hardware. We describe the protein production platform of the Northeast Structural Genomics Consortium, and outline our high-throughput strategies for producing high quality protein samples for nuclear magnetic resonance (NMR) studies. Our strategy is based on the cloning, expression and purification of 6X-His-tagged proteins using T7-based Escherichia coli systems and isotope enrichment in minimal media. We describe 96-well ligation-independent cloning and analytical expression systems, parallel preparative scale fermentation, and high-throughput purification protocols. The 6X-His affinity tag allows for a similar two-step purification procedure implemented in a parallel high-throughput fashion that routinely results in purity levels sufficient for NMR studies (> 97% homogeneity). Using this platform, the protein open reading frames of over 17,500 different targeted proteins (or domains) have been cloned as over 28,000 constructs. Nearly 5,000 of these proteins have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html), resulting in more than 950 new protein structures, including more than 400 NMR structures, deposited in the Protein Data Bank. The Northeast Structural Genomics Consortium pipeline has been effective in producing protein samples of both prokaryotic and eukaryotic origin. Although this paper describes our entire pipeline for producing isotope-enriched protein samples, it focuses on the major updates introduced during the last 5 years (Phase 2 of the National Institute of General Medical Sciences Protein Structure Initiative). Our advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are suitable for implementation in a large individual laboratory or by a small group of collaborating investigators for structural biology, functional proteomics, ligand screening and structural genomics research. PMID:21371586

  12. Integrated genome browser: visual analytics platform for genomics.

    PubMed

    Freese, Nowlan H; Norris, David C; Loraine, Ann E

    2016-07-15

    Genome browsers that support fast navigation through vast datasets and provide interactive visual analytics functions can help scientists achieve deeper insight into biological systems. Toward this end, we developed Integrated Genome Browser (IGB), a highly configurable, interactive and fast open source desktop genome browser. Here we describe multiple updates to IGB, including all-new capabilities to display and interact with data from high-throughput sequencing experiments. To demonstrate, we describe example visualizations and analyses of datasets from RNA-Seq, ChIP-Seq and bisulfite sequencing experiments. Understanding results from genome-scale experiments requires viewing the data in the context of reference genome annotations and other related datasets. To facilitate this, we enhanced IGB's ability to consume data from diverse sources, including Galaxy, Distributed Annotation and IGB-specific Quickload servers. To support future visualization needs as new genome-scale assays enter wide use, we transformed the IGB codebase into a modular, extensible platform for developers to create and deploy all-new visualizations of genomic data. IGB is open source and is freely available from http://bioviz.org/igb. Contact: aloraine@uncc.edu. © The Author 2016. Published by Oxford University Press.

  13. Assembly and diploid architecture of an individual human genome via single-molecule technologies

    PubMed Central

    Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali

    2015-01-01

    We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality. PMID:26121404
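
    The scaffold N50 quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of the calculation follows; the scaffold lengths are hypothetical toy values, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Walk the lengths from longest to shortest and return the length at
    which the running total first reaches half of the overall total.
    """
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths in Mb (illustration only).
toy_scaffolds = [40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
```

    For the toy lengths, the two longest scaffolds (40 and 30) cover 70 of 100 Mb, so the N50 is 30.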

  14. Assembly and diploid architecture of an individual human genome via single-molecule technologies.

    PubMed

    Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali

    2015-08-01

    We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.

  15. Silicon Era of Carbon-Based Life: Application of Genomics and Bioinformatics in Crop Stress Research

    PubMed Central

    Li, Man-Wah; Qi, Xinpeng; Ni, Meng; Lam, Hon-Ming

    2013-01-01

    Abiotic and biotic stresses lead to massive reprogramming of different life processes and are the major limiting factors hampering crop productivity. Omics-based research platforms allow for a holistic and comprehensive survey on crop stress responses and hence may bring forth better crop improvement strategies. Since high-throughput approaches generate considerable amounts of data, bioinformatics tools will play an essential role in storing, retrieving, sharing, processing, and analyzing them. Genomic and functional genomic studies in crops still lag far behind similar studies in humans and other animals. In this review, we summarize some useful genomics and bioinformatics resources available to crop scientists. In addition, we also discuss the major challenges and advancements in the “-omics” studies, with an emphasis on their possible impacts on crop stress research and crop improvement. PMID:23759993

  16. Genome Engineering and Agriculture: Opportunities and Challenges.

    PubMed

    Baltes, Nicholas J; Gil-Humanes, Javier; Voytas, Daniel F

    2017-01-01

    In recent years, plant biotechnology has witnessed unprecedented technological change. Advances in high-throughput sequencing technologies have provided insight into the location and structure of functional elements within plant DNA. At the same time, improvements in genome engineering tools have enabled unprecedented control over genetic material. These technologies, combined with a growing understanding of plant systems biology, will irrevocably alter the way we create new crop varieties. As the first wave of genome-edited products emerge, we are just getting a glimpse of the immense opportunities the technology provides. We are also seeing its challenges and limitations. It is clear that genome editing will play an increased role in crop improvement and will help us to achieve food security in the coming decades; however, certain challenges and limitations must be overcome to realize the technology's full potential. © 2017 Elsevier Inc. All rights reserved.

  17. Advances in CRISPR-Cas9 genome engineering: lessons learned from RNA interference

    PubMed Central

    Barrangou, Rodolphe; Birmingham, Amanda; Wiemann, Stefan; Beijersbergen, Roderick L.; Hornung, Veit; Smith, Anja van Brabant

    2015-01-01

    The discovery that the machinery of the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas9 bacterial immune system can be re-purposed to easily create deletions, insertions and replacements in the mammalian genome has revolutionized the field of genome engineering and re-invigorated the field of gene therapy. Many parallels have been drawn between the newly discovered CRISPR-Cas9 system and the RNA interference (RNAi) pathway in terms of their utility for understanding and interrogating gene function in mammalian cells. Given this similarity, the CRISPR-Cas9 field stands to benefit immensely from lessons learned during the development of RNAi technology. We examine how the history of RNAi can inform today's challenges in CRISPR-Cas9 genome engineering such as efficiency, specificity, high-throughput screening and delivery for in vivo and therapeutic applications. PMID:25800748

  18. Bacterial CRISPR: Accomplishments and Prospects

    PubMed Central

    Peters, Jason M.; Silvis, Melanie R.; Zhao, Dehua; Hawkins, John S.; Gross, Carol A.; Qi, Lei S.

    2015-01-01

    In this review we briefly describe the development of CRISPR tools for genome editing and control of transcription in bacteria. We focus on the Type II CRISPR/Cas9 system, provide specific examples for use of the system, and highlight the advantages and disadvantages of CRISPR versus other techniques. We suggest potential strategies for combining CRISPR tools with high-throughput approaches to elucidate gene function in bacteria. PMID:26363124

  19. Genome-wide high-throughput SNP discovery and genotyping for understanding natural (functional) allelic diversity and domestication patterns in wild chickpea

    PubMed Central

    Bajaj, Deepak; Das, Shouvik; Badoni, Saurabh; Kumar, Vinod; Singh, Mohar; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.

    2015-01-01

    We identified 82489 high-quality genome-wide SNPs from 93 wild and cultivated Cicer accessions through integrated reference genome- and de novo-based GBS assays. High intra- and inter-specific polymorphic potential (66–85%) and broader natural allelic diversity (6–64%) detected by genome-wide SNPs among accessions signify their efficacy for monitoring introgression and transferring target trait-regulating genomic (gene) regions/allelic variants from wild to cultivated Cicer gene pools for genetic improvement. The population-specific assignment of wild Cicer accessions pertaining to the primary gene pool are more influenced by geographical origin/phenotypic characteristics than species/gene-pools of origination. The functional significance of allelic variants (non-synonymous and regulatory SNPs) scanned from transcription factors and stress-responsive genes in differentiating wild accessions (with potential known sources of yield-contributing and stress tolerance traits) from cultivated desi and kabuli accessions, fine-mapping/map-based cloning of QTLs and determination of LD patterns across wild and cultivated gene-pools are suitably elucidated. The correlation between phenotypic (agromorphological traits) and molecular diversity-based admixed domestication patterns within six structured populations of wild and cultivated accessions via genome-wide SNPs was apparent. This suggests utility of whole genome SNPs as a potential resource for identifying naturally selected trait-regulating genomic targets/functional allelic variants adaptive to diverse agroclimatic regions for genetic enhancement of cultivated gene-pools. PMID:26208313

  20. Mobile element biology – new possibilities with high-throughput sequencing

    PubMed Central

    Xing, Jinchuan; Witherspoon, David J.; Jorde, Lynn B.

    2014-01-01

    Mobile elements compose more than half of the human genome, but until recently their large-scale detection was time-consuming and challenging. With the development of new high-throughput sequencing technologies, the complete spectrum of mobile element variation in humans can now be identified and analyzed. Thousands of new mobile element insertions have been discovered, yielding new insights into mobile element biology, evolution, and genomic variation. We review several high-throughput methods, with an emphasis on techniques that specifically target mobile element insertions in humans, and we highlight recent applications of these methods in evolutionary studies and in the analysis of somatic alterations in human cancers. PMID:23312846

  1. Advances in high throughput DNA sequence data compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz

    2016-06-01

    Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
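
    As a point of reference for the reference-free category, the simplest baseline is to store each base in two bits rather than one byte. The sketch below shows that baseline only; it is not one of the algorithms surveyed in the review, and it assumes the input contains nothing but A, C, G and T (no N or quality scores). The function names are hypothetical.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into bytes at 2 bits per base.

    Returns (packed_bytes, length); the length is needed because the
    final byte may be only partly filled. Raises KeyError on any
    character other than A, C, G or T.
    """
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out), len(seq)


def unpack(data, length):
    """Invert pack(): recover the first `length` bases from `data`."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


packed, n_bases = pack("ACGTTGCA")
restored = unpack(packed, n_bases)
```

    Eight bases fit into two bytes here, a fixed fourfold reduction over one byte per base. Referential methods instead store only the differences from a known reference sequence.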

  2. Molecular characterization of a novel rhabdovirus infecting blackcurrant identified by high-throughput sequencing.

    PubMed

    Wu, L-P; Yang, T; Liu, H-W; Postman, J; Li, R

    2018-05-01

    A large contig with sequence similarities to several nucleorhabdoviruses was identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genome sequence of this new nucleorhabdovirus is 14,432 nucleotides long. Its genomic organization is very similar to those of unsegmented plant rhabdoviruses, containing six open reading frames in the order 3'-N-P-P3-M-G-L-5'. The virus, which is provisionally named "black currant-associated rhabdovirus", is 41-52% identical in its genome nucleotide sequence to other nucleorhabdoviruses and may represent a new species in the genus Nucleorhabdovirus.

  3. High throughput platforms for structural genomics of integral membrane proteins.

    PubMed

    Mancia, Filippo; Love, James

    2011-08-01

    Structural genomics approaches on integral membrane proteins have been postulated for over a decade, yet specific efforts are lagging years behind their soluble counterparts. Indeed, high throughput methodologies for production and characterization of prokaryotic integral membrane proteins are only now emerging, while large-scale efforts for eukaryotic ones are still in their infancy. Presented here is a review of recent literature on actively ongoing structural genomics of membrane protein initiatives, with a focus on those aimed at implementing interesting techniques aimed at increasing our rate of success for this class of macromolecules. Copyright © 2011 Elsevier Ltd. All rights reserved.

  4. Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs).

    PubMed

    Cantsilieris, Stuart; Stessman, Holly A; Shendure, Jay; Eichler, Evan E

    2017-01-01

    Molecular inversion probes (MIPs) in combination with massively parallel DNA sequencing represent a versatile, yet economical tool for targeted sequencing of genomic DNA. Several thousand genomic targets can be selectively captured using long oligonucleotides containing unique targeting arms and universal linkers. The ability to append sequencing adaptors and sample-specific barcodes allows large-scale pooling and subsequent high-throughput sequencing at relatively low cost per sample. Here, we describe a "wet bench" protocol detailing the capture and subsequent sequencing of >2000 genomic targets from 192 samples, representative of a single lane on the Illumina HiSeq 2000 platform.

  5. Complete Genome Sequence of Treponema paraluiscuniculi, Strain Cuniculi A: The Loss of Infectivity to Humans Is Associated with Genome Decay

    PubMed Central

    Šmajs, David; Zobaníková, Marie; Strouhal, Michal; Čejková, Darina; Dugan-Rocha, Shannon; Pospíšilová, Petra; Norris, Steven J.; Albert, Tom; Qin, Xiang; Hallsworth-Pepin, Kym; Buhay, Christian; Muzny, Donna M.; Chen, Lei; Gibbs, Richard A.; Weinstock, George M.

    2011-01-01

    Treponema paraluiscuniculi is the causative agent of rabbit venereal spirochetosis. It is not infectious to humans, although its genome structure is very closely related to other pathogenic Treponema species including Treponema pallidum subspecies pallidum, the etiological agent of syphilis. In this study, the genome sequence of Treponema paraluiscuniculi, strain Cuniculi A, was determined by a combination of several high-throughput sequencing strategies. Whereas the overall size (1,133,390 bp), arrangement, and gene content of the Cuniculi A genome closely resembled those of the T. pallidum genome, the T. paraluiscuniculi genome contained a markedly higher number of pseudogenes and gene fragments (51). In addition to pseudogenes, 33 divergent genes were also found in the T. paraluiscuniculi genome. A set of 32 (out of 84) affected genes encoded proteins of known or predicted function in the Nichols genome. These proteins included virulence factors, gene regulators and components of DNA repair and recombination. The majority (52 or 61.9%) of the Cuniculi A pseudogenes and divergent genes were of unknown function. Our results indicate that T. paraluiscuniculi has evolved from a T. pallidum-like ancestor and adapted to a specialized host-associated niche (rabbits) during loss of infectivity to humans. The genes that are inactivated or altered in T. paraluiscuniculi are candidates for virulence factors important in the infectivity and pathogenesis of T. pallidum subspecies. PMID:21655244

  6. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Yu-Wei; Simmons, Blake A.; Singer, Steven W.

    The recovery of genomes from metagenomic datasets is a critical step to defining the functional roles of the underlying uncultivated populations. We previously developed MaxBin, an automated binning approach for high-throughput recovery of microbial genomes from metagenomes. Here, we present an expanded binning algorithm, MaxBin 2.0, which recovers genomes from co-assembly of a collection of metagenomic datasets. Tests on simulated datasets revealed that MaxBin 2.0 is highly accurate in recovering individual genomes, and the application of MaxBin 2.0 to several metagenomes from environmental samples demonstrated that it could achieve two complementary goals: recovering more bacterial genomes compared to binning a single sample as well as comparing the microbial community composition between different sampling environments. Availability and implementation: MaxBin 2.0 is freely available at http://sourceforge.net/projects/maxbin/ under BSD license. Supplementary information: Supplementary data are available at Bioinformatics online.

  7. A genome-wide 3C-method for characterizing the three-dimensional architectures of genomes.

    PubMed

    Duan, Zhijun; Andronescu, Mirela; Schutz, Kevin; Lee, Choli; Shendure, Jay; Fields, Stanley; Noble, William S; Anthony Blau, C

    2012-11-01

    Accumulating evidence demonstrates that the three-dimensional (3D) organization of chromosomes within the eukaryotic nucleus reflects and influences genomic activities, including transcription, DNA replication, recombination and DNA repair. In order to uncover structure-function relationships, it is necessary first to understand the principles underlying the folding and the 3D arrangement of chromosomes. Chromosome conformation capture (3C) provides a powerful tool for detecting interactions within and between chromosomes. A high throughput derivative of 3C, chromosome conformation capture on chip (4C), executes a genome-wide interrogation of interaction partners for a given locus. We recently developed a new method, a derivative of 3C and 4C, which, similar to Hi-C, is capable of comprehensively identifying long-range chromosome interactions throughout a genome in an unbiased fashion. Hence, our method can be applied to decipher the 3D architectures of genomes. Here, we provide a detailed protocol for this method. Published by Elsevier Inc.

  8. Genome-Wide SNP Detection, Validation, and Development of an 8K SNP Array for Apple

    PubMed Central

    Chagné, David; Crowhurst, Ross N.; Troggio, Michela; Davey, Mark W.; Gilmore, Barbara; Lawley, Cindy; Vanderzande, Stijn; Hellens, Roger P.; Kumar, Satish; Cestaro, Alessandro; Velasco, Riccardo; Main, Dorrie; Rees, Jasper D.; Iezzoni, Amy; Mockler, Todd; Wilhelm, Larry; Van de Weg, Eric; Gardiner, Susan E.; Bassil, Nahla; Peace, Cameron

    2012-01-01

    As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide evaluation of allelic variation in apple (Malus×domestica) breeding germplasm. For genome-wide SNP discovery, 27 apple cultivars were chosen to represent worldwide breeding germplasm and re-sequenced at low coverage with the Illumina Genome Analyzer II. Following alignment of these sequences to the whole genome sequence of ‘Golden Delicious’, SNPs were identified using SoapSNP. A total of 2,113,120 SNPs were detected, corresponding to one SNP to every 288 bp of the genome. The Illumina GoldenGate® assay was then used to validate a subset of 144 SNPs with a range of characteristics, using a set of 160 apple accessions. This validation assay enabled fine-tuning of the final subset of SNPs for the Illumina Infinium® II system. The set of stringent filtering criteria developed allowed choice of a set of SNPs that not only exhibited an even distribution across the apple genome and a range of minor allele frequencies to ensure utility across germplasm, but also were located in putative exonic regions to maximize genotyping success rate. A total of 7867 apple SNPs was established for the IRSC apple 8K SNP array v1, of which 5554 were polymorphic after evaluation in segregating families and a germplasm collection. This publicly available genomics resource will provide an unprecedented resolution of SNP haplotypes, which will enable marker-locus-trait association discovery, description of the genetic architecture of quantitative traits, investigation of genetic variation (neutral and functional), and genomic selection in apple. PMID:22363718
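
    Two of the figures in this abstract can be cross-checked with simple arithmetic. The sketch below derives the genome span implied by the reported SNP density and the fraction of array SNPs that proved polymorphic. Both derived values are inferences from the quoted numbers, not figures stated by the authors, and the function names are hypothetical.

```python
def implied_span_bp(n_snps, bp_per_snp):
    """Sequence span implied by an average density of one SNP per bp_per_snp bases."""
    return n_snps * bp_per_snp


def percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Figures quoted in the abstract.
detected_snps = 2_113_120   # SNPs detected genome-wide
density_bp = 288            # one SNP per 288 bp
array_snps = 7867           # SNPs on the IRSC apple 8K array v1
polymorphic_snps = 5554     # array SNPs found polymorphic

span = implied_span_bp(detected_snps, density_bp)            # 608,578,560 bp
polymorphic_rate = percent(polymorphic_snps, array_snps)     # 70.6
```

    On these numbers the density corresponds to roughly 609 Mb of sequence, and about 70.6% of the array SNPs were polymorphic.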

  9. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas

    The number of genomes from uncultivated microbes will soon surpass the number of isolate genomes in public databases (Hugenholtz, Skarshewski, & Parks, 2016). Technological advancements in high-throughput sequencing and assembly, including single-cell genomics and the computational extraction of genomes from metagenomes (GFMs), are largely responsible. Here we propose community standards for reporting the Minimum Information about a Single-Cell Genome (MIxS-SCG) and Minimum Information about Genomes extracted From Metagenomes (MIxS-GFM) specific for Bacteria and Archaea. The standards have been developed in the context of the International Genomics Standards Consortium (GSC) community (Field et al., 2014) and can be viewed as a supplement to other GSC checklists including the Minimum Information about a Genome Sequence (MIGS), Minimum information about a Metagenomic Sequence(s) (MIMS) (Field et al., 2008) and Minimum Information about a Marker Gene Sequence (MIMARKS) (P. Yilmaz et al., 2011). Community-wide acceptance of MIxS-SCG and MIxS-GFM for Bacteria and Archaea will enable broad comparative analyses of genomes from the majority of taxa that remain uncultivated, improving our understanding of microbial function, ecology, and evolution.

  10. Synthetic biology: Novel approaches for microbiology.

    PubMed

    Padilla-Vaca, Felipe; Anaya-Velázquez, Fernando; Franco, Bernardo

    2015-06-01

    In the past twenty years, molecular genetics has created powerful tools for genetic manipulation of living organisms. Whole genome sequencing has provided the information needed to assess knowledge of gene function and protein networks. In addition, new tools make it possible to modify organisms to perform desired tasks. Gene function analysis is sped up by novel approaches that couple high-throughput data generation with data mining. Synthetic biology is an emerging field that uses tools for generating novel gene networks, whole genome synthesis and engineering. New applications in biotechnological, pharmaceutical and biomedical research are envisioned for synthetic biology. In recent years these new strategies have opened up possibilities for studying gene and genome editing and for creating novel tools for functional studies in viruses, parasites and pathogenic bacteria. There is also the possibility of re-designing organisms to generate vaccine subunits or produce new pharmaceuticals to combat multi-drug resistant pathogens. In this review we provide our opinion on the applicability of synthetic biology strategies for functional studies of pathogenic organisms and on some applications, such as genome editing and gene network studies, to further comprehend virulence factors and determinants in pathogenic organisms. We also discuss what we consider important ethical issues for this field of molecular biology, especially the potential misuse of the new technologies. Copyright© by the Spanish Society for Microbiology and Institute for Catalan Studies.

  11. Functional Profiling Using the Saccharomyces Genome Deletion Project Collections.

    PubMed

    Nislow, Corey; Wong, Lai Hong; Lee, Amy Huei-Yi; Giaever, Guri

    2016-09-01

    The ability to measure and quantify the fitness of an entire organism requires considerably more complex approaches than simply using traditional "omic" methods that examine, for example, the abundance of RNA transcripts, proteins, or metabolites. The yeast deletion collections represent the only systematic, comprehensive set of null alleles for any organism in which such fitness measurements can be assayed. Generated by the Saccharomyces Genome Deletion Project, these collections allow the systematic and parallel analysis of gene functions using any measurable phenotype. The unique 20-bp molecular barcodes engineered into the genome of each deletion strain facilitate the massively parallel analysis of individual fitness. Here, we present functional genomic protocols for use with the yeast deletion collections. We describe how to maintain, propagate, and store the deletion collections and how to perform growth fitness assays on single and parallel screening platforms. Phenotypic fitness analyses of the yeast mutants, described in brief here, provide important insights into biological functions, mechanisms of drug action, and response to environmental stresses. It is important to bear in mind that the specific assays described in this protocol represent some of the many ways in which these collections can be assayed, and in this description particular attention is paid to maximizing throughput using growth as the phenotypic measure. © 2016 Cold Spring Harbor Laboratory Press.
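
    The barcode-based fitness readout described above can be illustrated with a minimal sketch. It assumes barcode read counts are already available per strain for a control and a treated pool, and it uses a log2 ratio of normalized abundances as the fitness score; that scoring choice, the pseudocount, and the strain names are illustrative assumptions, not the protocol's published analysis.

```python
import math


def normalize(counts, pseudocount=1.0):
    """Convert raw barcode counts to relative abundances.

    A pseudocount (an assumption of this sketch) avoids division by zero
    for strains that drop out of a pool entirely.
    """
    total = sum(counts.values()) + pseudocount * len(counts)
    return {strain: (n + pseudocount) / total for strain, n in counts.items()}


def log2_fitness(control, treatment, pseudocount=1.0):
    """Score each strain as log2(treated abundance / control abundance).

    Negative scores suggest the deletion strain is depleted under treatment;
    positive scores suggest it is enriched.
    """
    strains = set(control) | set(treatment)
    ctrl = normalize({s: control.get(s, 0) for s in strains}, pseudocount)
    trt = normalize({s: treatment.get(s, 0) for s in strains}, pseudocount)
    return {s: math.log2(trt[s] / ctrl[s]) for s in strains}


# Hypothetical strain names and counts, for illustration only.
control_counts = {"strain_a": 100, "strain_b": 100}
treated_counts = {"strain_a": 25, "strain_b": 175}
scores = log2_fitness(control_counts, treated_counts)
```

    Normalizing each pool to relative abundance before taking the ratio keeps the score from depending on sequencing depth, which usually differs between pools.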

  12. Microbial Ecology and Evolution in the Acid Mine Drainage Model System.

    PubMed

    Huang, Li-Nan; Kuang, Jia-Liang; Shu, Wen-Sheng

    2016-07-01

    Acid mine drainage (AMD) is a unique ecological niche for acid- and toxic-metals-adapted microorganisms. These low-complexity systems offer a special opportunity for the ecological and evolutionary analyses of natural microbial assemblages. The last decade has witnessed an unprecedented interest in the study of AMD communities using 16S rRNA high-throughput sequencing and community genomic and postgenomic methodologies, significantly advancing our understanding of microbial diversity, community function, and evolution in acidic environments. This review describes new data on AMD microbial ecology and evolution, especially dynamics of microbial diversity, community functions, and population genomes, and further identifies gaps in our current knowledge that future research, with integrated applications of meta-omics technologies, will fill. Copyright © 2016 Elsevier Ltd. All rights reserved.

  13. High throughput SNP discovery and genotyping in hexaploid wheat

    PubMed Central

    Navarro, Julien; Kitt, Jonathan; Choulet, Frédéric; Leveugle, Magalie; Duarte, Jorge; Rivière, Nathalie; Eversole, Kellye; Le Gouis, Jacques; Davassi, Alessandro; Balfourier, François; Le Paslier, Marie-Christine; Berard, Aurélie; Brunel, Dominique; Feuillet, Catherine; Poncet, Charles; Sourdille, Pierre

    2018-01-01

    Because of their abundance and their amenability to high-throughput genotyping techniques, Single Nucleotide Polymorphisms (SNPs) are powerful tools for efficient genetics and genomics studies, including characterization of genetic resources, genome-wide association studies and genomic selection. In wheat, most of the previous SNP discovery initiatives targeted the coding fraction, leaving almost 98% of the wheat genome largely unexploited. Here we report on the use of whole-genome resequencing data from eight wheat lines to mine for SNPs in the genic, the repetitive and non-repetitive intergenic fractions of the wheat genome. Eventually, we identified 3.3 million SNPs, 49% being located on the B-genome, 41% on the A-genome and 10% on the D-genome. We also describe the development of the TaBW280K high-throughput genotyping array containing 280,226 SNPs. Performance of this chip was examined by genotyping a set of 96 wheat accessions representing the worldwide diversity. Sixty-nine percent of the SNPs can be efficiently scored, half of them showing a diploid-like clustering. The TaBW280K was proven to be a very efficient tool for diversity analyses, as well as for breeding as it can discriminate between closely related elite varieties. Finally, the TaBW280K array was used to genotype a population derived from a cross between Chinese Spring and Renan, leading to the construction of a dense genetic map comprising 83,721 markers. The results described here will provide the wheat community with powerful tools for both basic and applied research. PMID:29293495

  14. Ionomics: The functional genomics of elements.

    PubMed

    Baxter, Ivan

    2010-03-01

    Ionomics is the study of elemental accumulation in living systems using high-throughput elemental profiling. This approach has been applied extensively in plants for forward and reverse genetics, screening diversity panels, and modeling of physiological states. In this review, I will discuss some of the advantages and limitations of the ionomics approach as well as the important parameters to consider when designing ionomics experiments, and how to evaluate ionomics data.

  15. Candidate Cancer Allele cDNA Collection | Office of Cancer Genomics

    Cancer.gov

    CTD2 researchers at the Broad Institute/DFCI have developed a collection of plasmids including mutant alleles found in sequencing studies of cancer. It includes somatic variants found in lung adenocarcinoma and across other cancer types. The clones enable researchers to characterize the function of the cancer variants in high-throughput experiments. These plasmids are collectively called the “Broad Target Accelerator Plasmid Collections”.

  16. cDNA Clones with Rare and Recurrent Mutations Found in Cancers | Office of Cancer Genomics

    Cancer.gov

    The CTD2 Center at UT-MD Anderson Cancer Center has developed a High-Throughput Mutagenesis and Molecular Barcoding (HiTMMoB) pipeline to construct open reading frame expression clones of mutant alleles that are either recurrent or rare in cancers. These barcoded genes can be used for context-specific functional validation, detection of novel biomarkers (pathway activation) and targets (drug sensitivity).

  17. Breeding nursery tissue collection for possible genomic analysis

    USDA-ARS's Scientific Manuscript database

    Phenotyping is considered a major bottleneck in breeding programs. With new genomic technologies, high throughput genotype schemes are constantly being developed. However, every genomic technology requires phenotypic data to inform prediction models generated from the technology. Forage breeders con...

  18. Translational bioinformatics in the cloud: an affordable alternative

    PubMed Central

    2010-01-01

    With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine. We find that the cloud-based analysis compares favorably in both performance and cost in comparison to a local computational cluster, suggesting that cloud computing technologies might be a viable resource for facilitating large-scale translational research in genomic medicine. PMID:20691073

  19. VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data.

    PubMed

    Peterson, Elena S; McCue, Lee Ann; Schrimpe-Rutledge, Alexandra C; Jensen, Jeffrey L; Walker, Hyunjoo; Kobold, Markus A; Webb, Samantha R; Payne, Samuel H; Ansong, Charles; Adkins, Joshua N; Cannon, William R; Webb-Robertson, Bobbie-Jo M

    2012-04-05

    The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates. VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to show the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data. VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.

  20. VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data

    PubMed Central

    2012-01-01

    Background: The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates. Results: VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to show the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data. Conclusions: VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php. PMID:22480257

  1. High-Throughput resequencing of maize landraces at genomic regions associated with flowering time

    USDA-ARS's Scientific Manuscript database

    Despite the reduction in the price of sequencing, it remains expensive to sequence and assemble whole, complex genomes of multiple samples for population studies, particularly for large genomes like those of many crop species. Enrichment of target genome regions coupled with next generation sequenci...

  2. Characterizing visible and invisible cell wall mutant phenotypes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carpita, Nicholas C.; McCann, Maureen C.

    2015-04-06

    About 10% of a plant's genome is devoted to generating the protein machinery to synthesize, remodel, and deconstruct the cell wall. High-throughput genome sequencing technologies have enabled a reasonably complete inventory of wall-related genes that can be assembled into families of common evolutionary origin. Assigning function to each gene family member has been aided immensely by identification of mutants with visible phenotypes or by chemical and spectroscopic analysis of mutants with ‘invisible’ phenotypes of modified cell wall composition and architecture that do not otherwise affect plant growth or development. This review connects the inference of gene function on the basis of deviation from the wild type in genetic functional analyses to insights provided by modern analytical techniques that have brought us ever closer to elucidating the sequence structures of the major polysaccharide components of the plant cell wall.

  3. Virus-induced gene silencing offers a functional genomics platform for studying plant cell wall formation.

    PubMed

    Zhu, Xiaohong; Pattathil, Sivakumar; Mazumder, Koushik; Brehm, Amanda; Hahn, Michael G; Dinesh-Kumar, S P; Joshi, Chandrashekhar P

    2010-09-01

    Virus-induced gene silencing (VIGS) is a powerful genetic tool for rapid assessment of plant gene functions in the post-genomic era. Here, we successfully implemented a Tobacco Rattle Virus (TRV)-based VIGS system to study functions of genes involved in either primary or secondary cell wall formation in Nicotiana benthamiana plants. A 3-week post-VIGS time frame is sufficient to observe phenotypic alterations in the anatomical structure of stems and chemical composition of the primary and secondary cell walls. We used cell wall glycan-directed monoclonal antibodies to demonstrate that alteration of cell wall polymer synthesis during the secondary growth phase of VIGS plants has profound effects on the extractability of components from woody stem cell walls. Therefore, TRV-based VIGS together with cell wall component profiling methods provide a high-throughput gene discovery platform for studying plant cell wall formation from a bioenergy perspective.

  4. Epigenetics: the language of the cell?

    PubMed

    Huang, Biao; Jiang, Cizhong; Zhang, Rongxin

    2014-02-01

    Epigenetics is one of the most rapidly developing fields of biological research. Breakthroughs in several technologies have made genome-wide epigenetic research possible, for example the mapping of human genome-wide DNA methylation. In addition, with the development of various high-throughput and high-resolution sequencing technologies, a large number of functional noncoding RNAs (ncRNAs) have been identified. Numerous studies have indicated that these functional ncRNAs also play an important role in epigenetics. In this review, we gain inspiration from the recent proposal of the ceRNA hypothesis. This hypothesis proposes that miRNAs act as a language of communication. Accordingly, we further deduce that all of epigenetics may functionally acquire such a unique language characteristic. In summary, various epigenetic markers may not only participate in regulating cellular processes, but may also act as the intracellular 'language' of communication and be involved in extensive information exchange within the cell.

  5. Functional genomics of lipid metabolism in the oleaginous yeast Rhodosporidium toruloides

    PubMed Central

    Geiselman, Gina M; Ito, Masakazu; Mondo, Stephen J; Reilly, Morgann C; Cheng, Ya-Fang; Bauer, Stefan; Grigoriev, Igor V; Gladden, John M; Simmons, Blake A; Brem, Rachel B

    2018-01-01

    The basidiomycete yeast Rhodosporidium toruloides (also known as Rhodotorula toruloides) accumulates high concentrations of lipids and carotenoids from diverse carbon sources. It has great potential as a model for the cellular biology of lipid droplets and for sustainable chemical production. We developed a method for high-throughput genetics (RB-TDNAseq), using sequence-barcoded Agrobacterium tumefaciens T-DNA insertions. We identified 1,337 putative essential genes with low T-DNA insertion rates. We functionally profiled genes required for fatty acid catabolism and lipid accumulation, validating results with 35 targeted deletion strains. We identified a high-confidence set of 150 genes affecting lipid accumulation, including genes with predicted function in signaling cascades, gene expression, protein modification and vesicular trafficking, autophagy, amino acid synthesis and tRNA modification, and genes of unknown function. These results greatly advance our understanding of lipid metabolism in this oleaginous species and demonstrate a general approach for barcoded mutagenesis that should enable functional genomics in diverse fungi. PMID:29521624

  6. The genomic landscape of chronic lymphocytic leukaemia: biological and clinical implications.

    PubMed

    Strefford, Jonathan C

    2015-04-01

    Chronic lymphocytic leukaemia (CLL) remains at the forefront of the genetic analysis of human tumours, principally due to its prevalence, protracted natural history and accessibility to suitable material for analysis. With the application of high-throughput genetic technologies, we have an unbridled view of the architecture of the CLL genome, including a comprehensive description of the copy number and mutational landscape of the disease, a detailed picture of clonal evolution during pathogenesis, and the molecular mechanisms that drive genomic instability and therapeutic resistance. This work has nuanced the prognostic importance of established copy number alterations, and identified novel prognostically relevant gene mutations that function within biological pathways that are attractive treatment targets. Herein, recent genomic discoveries are reviewed, together with their biological and clinical implications and a view into how clinical implementation may be facilitated. © 2014 John Wiley & Sons Ltd.

  7. Issues with RNA-seq analysis in non-model organisms: A salmonid example.

    PubMed

    Sundaram, Arvind; Tengs, Torstein; Grimholt, Unni

    2017-10-01

    High throughput sequencing (HTS) is useful for many purposes as exemplified by the other topics included in this special issue. The purpose of this paper is to look into the unique challenges of using this technology in non-model organisms where resources such as genomes, functional genome annotations or genome complexity provide obstacles not met in model organisms. To describe these challenges, we narrow our scope to RNA sequencing used to study differential gene expression in response to pathogen challenge. As a demonstration species we chose Atlantic salmon, which has a sequenced genome with poor annotation and an added complexity due to many duplicated genes. We find that our RNA-seq analysis pipeline distinguishes between duplicates despite high sequence identity. However, annotation issues cause problems in linking differentially expressed genes to pathways. Also, comparing results between approaches and species is complicated by the lack of standardized annotation. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Signal Processing for Metagenomics: Extracting Information from the Soup

    PubMed Central

    Rosen, Gail L.; Sokhansanj, Bahrad A.; Polikar, Robi; Bruns, Mary Ann; Russell, Jacob; Garbarine, Elaine; Essinger, Steve; Yok, Non

    2009-01-01

    Traditionally, studies in microbial genomics have focused on single-genomes from cultured species, thereby limiting their focus to the small percentage of species that can be cultured outside their natural environment. Fortunately, recent advances in high-throughput sequencing and computational analyses have ushered in the new field of metagenomics, which aims to decode the genomes of microbes from natural communities without the need for cultivation. Although metagenomic studies have shed a great deal of insight into bacterial diversity and coding capacity, several computational challenges remain due to the massive size and complexity of metagenomic sequence data. Current tools and techniques are reviewed in this paper which address challenges in 1) genomic fragment annotation, 2) phylogenetic reconstruction, 3) functional classification of samples, and 4) interpreting complementary metaproteomics and metametabolomics data. Also surveyed are important applications of metagenomic studies, including microbial forensics and the roles of microbial communities in shaping human health and soil ecology. PMID:20436876

  9. An integrative model for in-silico clinical-genomics discovery science.

    PubMed

    Lussier, Yves A; Sarkar, Indra Nell; Cantor, Michael

    2002-01-01

    Human Genome discovery research has set the pace for Post-Genomic Discovery Research. While post-genomic fields focused at the molecular level are intensively pursued, little effort is being deployed in the later stages of molecular medicine discovery research, such as clinical-genomics. The objective of this study is to demonstrate the relevance and significance of integrating mainstream clinical informatics decision support systems with current bioinformatics genomic discovery science. This paper presents a feasibility study of an original model that enables novel "in-silico" clinical-genomic discovery science. This model is designed to mediate queries among clinical and genomic knowledge bases with relevant bioinformatic analytic tools (e.g. gene clustering). Briefly, trait-disease-gene relationships were successfully illustrated using QMR, OMIM, SNOMED-RT, GeneCluster and TreeView. The analyses were visualized as two-dimensional dendrograms of clinical observations clustered around genes. To our knowledge, this is the first study using knowledge bases of clinical decision support systems for genomic discovery. Although this study is a proof of principle, it provides a framework for the development of clinical decision-support-system driven, high-throughput clinical-genomic technologies which could potentially unveil significant high-level functions of genes.

  10. Using a Fluorescent PCR-capillary Gel Electrophoresis Technique to Genotype CRISPR/Cas9-mediated Knockout Mutants in a High-throughput Format.

    PubMed

    Ramlee, Muhammad Khairul; Wang, Jing; Cheung, Alice M S; Li, Shang

    2017-04-08

    The development of programmable genome-editing tools has facilitated the use of reverse genetics to understand the roles specific genomic sequences play in the functioning of cells and whole organisms. This cause has been tremendously aided by the recent introduction of the CRISPR/Cas9 system, a versatile tool that allows researchers to manipulate the genome and transcriptome in order to, among other things, knock out, knock down, or knock in genes in a targeted manner. For the purpose of knocking out a gene, CRISPR/Cas9-mediated double-strand breaks recruit the non-homologous end-joining DNA repair pathway to introduce a frameshift-causing insertion or deletion of nucleotides at the break site. However, an individual guide RNA may cause undesirable off-target effects, and to rule these out, the use of multiple guide RNAs is necessary. This multiplicity of targets also means that high-volume screening of clones is required, which in turn calls for an efficient high-throughput technique to genotype the knockout clones. Current genotyping techniques either suffer from inherent limitations or incur high cost, rendering them unsuitable for high-throughput purposes. Here, we detail the protocol for using fluorescent PCR, which uses genomic DNA from crude cell lysate as a template, and then resolving the PCR fragments via capillary gel electrophoresis. This technique is accurate enough to differentiate a one-base-pair difference between fragments and hence is adequate for indicating the presence or absence of a frameshift in the coding sequence of the targeted gene. This precise knowledge effectively precludes the need for a confirmatory sequencing step and allows users to save time and cost in the process. Moreover, this technique has proven to be versatile in genotyping various mammalian cells of various tissue origins targeted by guide RNAs against numerous genes, as shown here and elsewhere.
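
    The frameshift reasoning in this abstract rests on simple arithmetic: an indel whose length is not a multiple of three shifts the reading frame. The sketch below assumes fragment sizes (in bp) have already been called from the capillary electrophoresis trace and that the wild-type amplicon size is known; the function names and the rule of calling a clone a knockout only when every allele is frameshifted are illustrative assumptions, not the authors' published criteria.

```python
def classify_fragment(wild_type_bp, observed_bp):
    """Classify one PCR fragment by its size difference from wild type."""
    indel = observed_bp - wild_type_bp
    if indel == 0:
        # Size alone cannot exclude substitutions or balanced changes.
        call = "no size change"
    elif indel % 3 == 0:
        call = "in-frame indel"
    else:
        call = "frameshift"
    return {"indel_bp": indel, "call": call}


def genotype_clone(wild_type_bp, peaks_bp):
    """Summarize a clone from the fragment sizes of its alleles.

    Assumption of this sketch: a clone is flagged as a candidate knockout
    only if at least one peak is present and all peaks are frameshifts.
    """
    calls = [classify_fragment(wild_type_bp, p) for p in peaks_bp]
    knockout = bool(calls) and all(c["call"] == "frameshift" for c in calls)
    return {"alleles": calls, "candidate_knockout": knockout}


# Hypothetical clone: 300 bp wild-type amplicon, alleles at 299 bp and 304 bp.
result = genotype_clone(300, [299, 304])
```

    Python's `%` operator returns a non-negative remainder for a positive modulus, so deletions (negative `indel`) are handled by the same test as insertions.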

  11. Precision Medicine: Functional Advancements.

    PubMed

    Caskey, Thomas

    2018-01-29

    Precision medicine was conceptualized on the strength of genomic sequence analysis. High-throughput functional metrics have enhanced sequence interpretation and clinical precision. These technologies include metabolomics, magnetic resonance imaging, and I rhythm (cardiac monitoring), among others. These technologies are discussed and placed in clinical context for the medical specialties of internal medicine, pediatrics, obstetrics, and gynecology. Publications in these fields support the concept of a higher level of precision in identifying disease risk. Precise disease risk identification has the potential to enable intervention with greater specificity, resulting in disease prevention, an important goal of precision medicine.

  12. The ChIP-exo Method: Identifying Protein-DNA Interactions with Near Base Pair Precision.

    PubMed

    Perreault, Andrea A; Venters, Bryan J

    2016-12-23

    Chromatin immunoprecipitation (ChIP) is an indispensable tool in the fields of epigenetics and gene regulation that isolates specific protein-DNA interactions. ChIP coupled to high throughput sequencing (ChIP-seq) is commonly used to determine the genomic location of proteins that interact with chromatin. However, ChIP-seq is hampered by relatively low mapping resolution of several hundred base pairs and high background signal. The ChIP-exo method is a refined version of ChIP-seq that substantially improves upon both resolution and noise. The key distinction of the ChIP-exo methodology is the incorporation of lambda exonuclease digestion in the library preparation workflow to effectively footprint the left and right 5' DNA borders of the protein-DNA crosslink site. The ChIP-exo libraries are then subjected to high throughput sequencing. The resulting data can be leveraged to provide unique and ultra-high resolution insights into the functional organization of the genome. Here, we describe the ChIP-exo method that we have optimized and streamlined for mammalian systems and a next-generation sequencing-by-synthesis platform.

  13. Comparative Microbial Modules Resource: Generation and Visualization of Multi-species Biclusters

    PubMed Central

    Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard

    2011-01-01

    The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures – results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. PMID:22144874

  14. Comparative microbial modules resource: generation and visualization of multi-species biclusters.

    PubMed

    Kacmarczyk, Thadeous; Waltman, Peter; Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard

    2011-12-01

    The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures - results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. © 2011 Kacmarczyk et al.

  15. Identification and profiling of novel microRNAs in the Brassica rapa genome based on small RNA deep sequencing

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) are one of the functional non-coding small RNAs involved in the epigenetic control of the plant genome. Although plants contain both evolutionary conserved miRNAs and species-specific miRNAs within their genomes, computational methods often only identify evolutionary conserved miRNAs. The recent sequencing of the Brassica rapa genome enables us to identify miRNAs and their putative target genes. In this study, we sought to provide a more comprehensive prediction of B. rapa miRNAs based on high throughput small RNA deep sequencing. Results We sequenced small RNAs from five types of tissue: seedlings, roots, petioles, leaves, and flowers. By analyzing 2.75 million unique reads that mapped to the B. rapa genome, we identified 216 novel and 196 conserved miRNAs that were predicted to target approximately 20% of the genome’s protein coding genes. Quantitative analysis of miRNAs from the five types of tissue revealed that novel miRNAs were expressed in diverse tissues but their expression levels were lower than those of the conserved miRNAs. Comparative analysis of the miRNAs between the B. rapa and Arabidopsis thaliana genomes demonstrated that redundant copies of conserved miRNAs in the B. rapa genome may have been deleted after whole genome triplication. Novel miRNA members seemed to have spontaneously arisen from the B. rapa and A. thaliana genomes, suggesting the species-specific expansion of miRNAs. We have made this data publicly available in a miRNA database of B. rapa called BraMRs. The database allows the user to retrieve miRNA sequences, their expression profiles, and a description of their target genes from the five tissue types investigated here. Conclusions This is the first report to identify novel miRNAs from Brassica crops using genome-wide high throughput techniques. The combination of computational methods and small RNA deep sequencing provides robust predictions of miRNAs in the genome. 
The finding of numerous novel miRNAs, many with few target genes and low expression levels, suggests the rapid evolution of miRNA genes. The development of a miRNA database, BraMRs, enables us to integrate miRNA identification, target prediction, and functional annotation of target genes. BraMRs will represent a valuable public resource with which to study the epigenetic control of B. rapa and other closely related Brassica species. The database is available at the following link: http://bramrs.rna.kr [1]. PMID:23163954

  16. Rapid construction of a whole-genome transposon insertion collection for Shewanella oneidensis by Knockout Sudoku.

    PubMed

    Baym, Michael; Shaket, Lev; Anzai, Isao A; Adesina, Oluwakemi; Barstow, Buz

    2016-11-10

    Whole-genome knockout collections are invaluable for connecting gene sequence to function, yet traditionally, their construction has required an extraordinary technical effort. Here we report a method for the construction and purification of a curated whole-genome collection of single-gene transposon disruption mutants termed Knockout Sudoku. Using simple combinatorial pooling, a highly oversampled collection of mutants is condensed into a next-generation sequencing library in a single day, a 30- to 100-fold improvement over prior methods. The identities of the mutants in the collection are then solved by a probabilistic algorithm that uses internal self-consistency within the sequencing data set, followed by rapid algorithmically guided condensation to a minimal representative set of mutants, validation, and curation. Starting from a progenitor collection of 39,918 mutants, we compile a quality-controlled knockout collection of the electroactive microbe Shewanella oneidensis MR-1 containing representatives for 3,667 genes that is functionally validated by high-throughput kinetic measurements of quinone reduction.
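
    The combinatorial pooling idea in this abstract can be illustrated with a toy two-axis version: every well contributes to one row pool and one column pool, and a mutant seen in exactly one row pool and one column pool is assigned to their intersection. This is only a simplified sketch. The function names and the plate layout are hypothetical, and the deterministic decoder below stands in for, and is not, the probabilistic algorithm the authors describe.

```python
from collections import defaultdict


def build_pools(plate):
    """plate maps (row, col) -> mutant id; returns row pools and column pools."""
    row_pools, col_pools = defaultdict(set), defaultdict(set)
    for (row, col), mutant in plate.items():
        row_pools[row].add(mutant)
        col_pools[col].add(mutant)
    return row_pools, col_pools


def decode(row_pools, col_pools):
    """Locate mutants that occur in exactly one row pool and one column pool."""
    rows_seen, cols_seen = defaultdict(list), defaultdict(list)
    for row, mutants in row_pools.items():
        for mutant in mutants:
            rows_seen[mutant].append(row)
    for col, mutants in col_pools.items():
        for mutant in mutants:
            cols_seen[mutant].append(col)

    located, ambiguous = {}, set()
    for mutant in rows_seen:
        if len(rows_seen[mutant]) == 1 and len(cols_seen[mutant]) == 1:
            located[mutant] = (rows_seen[mutant][0], cols_seen[mutant][0])
        else:
            # The mutant occupies several wells, so pool membership alone
            # cannot pin it to a single location.
            ambiguous.add(mutant)
    return located, ambiguous


# Hypothetical 2 x 2 plate in which mutant "m1" was picked twice.
plate = {("A", 1): "m1", ("A", 2): "m2", ("B", 1): "m3", ("B", 2): "m1"}
located, ambiguous = decode(*build_pools(plate))
```

    Mutants that land in more than one well are the cases that pool membership alone cannot resolve, which is one reason a probabilistic solver is useful for an oversampled collection.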

  17. Disease modeling in genetic kidney diseases: zebrafish.

    PubMed

    Schenk, Heiko; Müller-Deile, Janina; Kinast, Mark; Schiffer, Mario

    2017-07-01

    Growing numbers of translational genomics studies are based on the highly efficient and versatile zebrafish (Danio rerio) vertebrate model. The increasing types of zebrafish models have improved our understanding of inherited kidney diseases, since they not only display pathophysiological changes but also give us the opportunity to develop and test novel treatment options in a high-throughput manner. New paradigms in inherited kidney diseases have been developed on the basis of the distinct genome conservation of approximately 70 % between zebrafish and humans in terms of existing gene orthologs. Several options are available to determine the functional role of a specific gene or gene sets. Permanent genome editing can be induced via complete gene knockout by using the CRISPR/Cas-system, among others, or via transient modification by using various morpholino techniques. Cross-species rescues succeeding knockdown techniques are employed to determine the functional significance of a target gene or a specific mutation. This article summarizes the current techniques and discusses their perspectives.

  18. Outlook for Development of High-throughput Cryopreservation for Small-bodied Biomedical Model Fishes

    PubMed Central

    Tiersch, Terrence R.; Yang, Huiping; Hu, E.

    2011-01-01

    With the development of genomic research technologies, comparative genome studies among vertebrate species are becoming commonplace for human biomedical research. Fish offer unlimited versatility for biomedical research. Extensive studies are done using these fish models, yielding tens of thousands of specific strains and lines, and the number is increasing every day. Thus, high-throughput sperm cryopreservation is urgently needed to preserve these genetic resources. Although high-throughput processing has been widely applied for sperm cryopreservation in livestock for decades, application in biomedical model fishes is still in the concept-development stage because of the limited sample volumes and the biological characteristics of fish sperm. High-throughput processing in livestock was developed based on advances made in the laboratory and was scaled up for increased processing speed, capability for mass production, and uniformity and quality assurance. Cryopreserved germplasm combined with high-throughput processing constitutes an independent industry encompassing animal breeding, preservation of genetic diversity, and medical research. Currently, there is no specifically engineered system available for high-throughput of cryopreserved germplasm for aquatic species. This review is to discuss the concepts and needs for high-throughput technology for model fishes, propose approaches for technical development, and overview future directions of this approach. PMID:21440666

  19. Bioconductor | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. R/Bioconductor will be enhanced to meet the increasing complexity of multiassay cancer genomics experiments.

  20. GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies

    PubMed Central

    Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay

    2004-01-01

    Background Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. Results We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . Conclusion GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175
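
    The abstract does not say which statistical test is used to rank Gene Ontology categories. As an assumption for illustration only, the sketch below uses an upper-tail hypergeometric test, a common choice for asking whether a category is over-represented in a gene set. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, overlap):
    """Probability of drawing at least `overlap` annotated genes by chance.

    population: genes in the reference set
    annotated:  reference genes annotated to the category
    sample:     genes in the input gene set
    overlap:    input genes annotated to the category
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# Hypothetical numbers: 10 reference genes, 5 in the category,
# and an input set of 5 genes that all fall in the category.
p = enrichment_p_value(population=10, annotated=5, sample=5, overlap=5)
```

    A small p-value suggests the category occurs in the gene set more often than random sampling would explain. When many categories are tested, a multiple-testing correction is normally applied as well.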

  1. Creating a RAW264.7 CRISPR-Cas9 Genome Wide Library

    PubMed Central

    Napier, Brooke A; Monack, Denise M

    2017-01-01

    The bacterial clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 genome editing tools are used in mammalian cells to knock-out specific genes of interest to elucidate gene function. The CRISPR-Cas9 system requires that the mammalian cell expresses Cas9 endonuclease, guide RNA (gRNA) to lead the endonuclease to the gene of interest, and the PAM sequence that links the Cas9 to the gRNA. CRISPR-Cas9 genome wide libraries are used to screen the effect of each gene in the genome on the cellular phenotype of interest, in an unbiased high-throughput manner. In this protocol, we describe our method of creating a CRISPR-Cas9 genome wide library in a transformed murine macrophage cell-line (RAW264.7). We have employed this library to identify novel mediators in the caspase-11 cell death pathway (Napier et al., 2016); however, this library can then be used to screen the importance of specific genes in multiple murine macrophage cellular pathways. PMID:28868328

  2. Multi-scale structural community organisation of the human genome.

    PubMed

    Boulos, Rasha E; Tremblay, Nicolas; Arneodo, Alain; Borgnat, Pierre; Audit, Benjamin

    2017-04-11

    Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This ensues a new methodological challenge for computational biology which consists in objectively extracting from these data the structural motifs characteristic of genome organisation. We deployed the fast multi-scale community mining algorithm based on spectral graph wavelets to characterise the networks of intra-chromosomal interactions in human cell lines. We observed that there exist structural domains of all sizes up to chromosome length and demonstrated that the set of structural communities forms a hierarchy of chromosome segments. Hence, at all scales, chromosome folding predominantly involves interactions between neighbouring sites rather than the formation of links between distant loci. Multi-scale structural decomposition of human chromosomes provides an original framework to question structural organisation and its relationship to functional regulation across the scales. By construction the proposed methodology is independent of the precise assembly of the reference genome and is thus directly applicable to genomes whose assembly is not fully determined.

  3. Genomic and oncogenic preference of HBV integration in hepatocellular carcinoma

    PubMed Central

    Zhao, Ling-Hao; Liu, Xiao; Yan, He-Xin; Li, Wei-Yang; Zeng, Xi; Yang, Yuan; Zhao, Jie; Liu, Shi-Ping; Zhuang, Xue-Han; Lin, Chuan; Qin, Chen-Jie; Zhao, Yi; Pan, Ze-Ya; Huang, Gang; Liu, Hui; Zhang, Jin; Wang, Ruo-Yu; Yang, Yun; Wen, Wen; Lv, Gui-Shuai; Zhang, Hui-Lu; Wu, Han; Huang, Shuai; Wang, Ming-Da; Tang, Liang; Cao, Hong-Zhi; Wang, Ling; Lee, Tin-Lap; Jiang, Hui; Tan, Ye-Xiong; Yuan, Sheng-Xian; Hou, Guo-Jun; Tao, Qi-Fei; Xu, Qin-Guo; Zhang, Xiu-Qing; Wu, Meng-Chao; Xu, Xun; Wang, Jun; Yang, Huan-Ming; Zhou, Wei-Ping; Wang, Hong-Yang

    2016-01-01

    Hepatitis B virus (HBV) can integrate into the human genome, contributing to genomic instability and hepatocarcinogenesis. Here by conducting high-throughput viral integration detection and RNA sequencing, we identify 4,225 HBV integration events in tumour and adjacent non-tumour samples from 426 patients with HCC. We show that HBV is prone to integrate into rare fragile sites and functional genomic regions including CpG islands. We observe a distinct pattern in the preferential sites of HBV integration between tumour and non-tumour tissues. HBV insertional sites are significantly enriched in the proximity of telomeres in tumours. Recurrent HBV target genes are identified with few that overlap. The overall HBV integration frequency is much higher in tumour genomes of males than in females, with a significant enrichment of integration into chromosome 17. Furthermore, a cirrhosis-dependent HBV integration pattern is observed, affecting distinct targeted genes. Our data suggest that HBV integration has a high potential to drive oncogenic transformation. PMID:27703150

  4. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes.

    PubMed

    Lowe, Todd M; Chan, Patricia P

    2016-07-08

    High-throughput genome sequencing continues to grow the need for rapid, accurate genome annotation and tRNA genes constitute the largest family of essential, ever-present non-coding RNA genes. Newly developed tRNAscan-SE 2.0 has advanced the state-of-the-art methodology in tRNA gene detection and functional prediction, captured by rich new content of the companion Genomic tRNA Database. Previously, web-server tRNA detection was isolated from knowledge of existing tRNAs and their annotation. In this update of the tRNAscan-SE On-line resource, we tie together improvements in tRNA classification with greatly enhanced biological context via dynamically generated links between web server search results, the most relevant genes in the GtRNAdb and interactive, rich genome context provided by UCSC genome browsers. The tRNAscan-SE On-line web server can be accessed at http://trna.ucsc.edu/tRNAscan-SE/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Ecological roles of dominant and rare prokaryotes in acid mine drainage revealed by metagenomics and metatranscriptomics.

    PubMed

    Hua, Zheng-Shuang; Han, Yu-Jiao; Chen, Lin-Xing; Liu, Jun; Hu, Min; Li, Sheng-Jin; Kuang, Jia-Liang; Chain, Patrick S G; Huang, Li-Nan; Shu, Wen-Sheng

    2015-06-01

    High-throughput sequencing is expanding our knowledge of microbial diversity in the environment. Still, understanding the metabolic potentials and ecological roles of rare and uncultured microbes in natural communities remains a major challenge. To this end, we applied a 'divide and conquer' strategy that partitioned a massive metagenomic data set (>100 Gbp) into subsets based on K-mer frequency in sequence assembly to a low-diversity acid mine drainage (AMD) microbial community and, by integrating with an additional metatranscriptomic assembly, successfully obtained 11 draft genomes most of which represent yet uncultured and/or rare taxa (relative abundance <1%). We report the first genome of a naturally occurring Ferrovum population (relative abundance >90%) and its metabolic potentials and gene expression profile, providing initial molecular insights into the ecological role of these lesser known, but potentially important, microorganisms in the AMD environment. Gene transcriptional analysis of the active taxa revealed major metabolic capabilities executed in situ, including carbon- and nitrogen-related metabolisms associated with syntrophic interactions, iron and sulfur oxidation, which are key in energy conservation and AMD generation, and the mechanisms of adaptation and response to the environmental stresses (heavy metals, low pH and oxidative stress). Remarkably, nitrogen fixation and sulfur oxidation were performed by the rare taxa, indicating their critical roles in the overall functioning and assembly of the AMD community. Our study demonstrates the potential of the 'divide and conquer' strategy in high-throughput sequencing data assembly for genome reconstruction and functional partitioning analysis of both dominant and rare species in natural microbial assemblages.

  6. Global Mapping of Transcription Factor Binding Sites by Sequencing Chromatin Surrogates: a Perspective on Experimental Design, Data Analysis, and Open Problems.

    PubMed

    Wei, Yingying; Wu, George; Ji, Hongkai

    2013-05-01

    Mapping genome-wide binding sites of all transcription factors (TFs) in all biological contexts is a critical step toward understanding gene regulation. The state-of-the-art technologies for mapping transcription factor binding sites (TFBSs) couple chromatin immunoprecipitation (ChIP) with high-throughput sequencing (ChIP-seq) or tiling array hybridization (ChIP-chip). These technologies have limitations: they are low-throughput with respect to surveying many TFs. Recent advances in genome-wide chromatin profiling, including development of technologies such as DNase-seq, FAIRE-seq and ChIP-seq for histone modifications, make it possible to predict in vivo TFBSs by analyzing chromatin features at computationally determined DNA motif sites. This promising new approach may allow researchers to monitor the genome-wide binding sites of many TFs simultaneously. In this article, we discuss various experimental design and data analysis issues that arise when applying this approach. Through a systematic analysis of the data from the Encyclopedia Of DNA Elements (ENCODE) project, we compare the predictive power of individual and combinations of chromatin marks using supervised and unsupervised learning methods, and evaluate the value of integrating information from public ChIP and gene expression data. We also highlight the challenges and opportunities for developing novel analytical methods, such as resolving the one-motif-multiple-TF ambiguity and distinguishing functional and non-functional TF binding targets from the predicted binding sites. The online version of this article (doi:10.1007/s12561-012-9066-5) contains supplementary material, which is available to authorized users.

  7. Orchestrating high-throughput genomic analysis with Bioconductor

    PubMed Central

    Huber, Wolfgang; Carey, Vincent J.; Gentleman, Robert; Anders, Simon; Carlson, Marc; Carvalho, Benilton S.; Bravo, Hector Corrada; Davis, Sean; Gatto, Laurent; Girke, Thomas; Gottardo, Raphael; Hahne, Florian; Hansen, Kasper D.; Irizarry, Rafael A.; Lawrence, Michael; Love, Michael I.; MacDonald, James; Obenchain, Valerie; Oleś, Andrzej K.; Pagès, Hervé; Reyes, Alejandro; Shannon, Paul; Smyth, Gordon K.; Tenenbaum, Dan; Waldron, Levi; Morgan, Martin

    2015-01-01

    Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors. PMID:25633503

  8. Multiplexed fragaria chloroplast genome sequencing

    Treesearch

    W. Njuguna; A. Liston; R. Cronn; N.V. Bassil

    2010-01-01

    A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...

  9. solGS: a web-based tool for genomic selection

    USDA-ARS's Scientific Manuscript database

    Genomic selection (GS) promises to improve accuracy in estimating breeding values and genetic gain for quantitative traits compared to traditional breeding methods. Its reliance on high-throughput genome-wide markers and statistical complexity, however, is a serious challenge in data management, ana...

  10. High-throughput materials discovery and development: breakthroughs and challenges in the mapping of the materials genome

    NASA Astrophysics Data System (ADS)

    Buongiorno Nardelli, Marco

    High-Throughput Quantum-Mechanics computation of materials properties by ab initio methods has become the foundation of an effective approach to materials design, discovery and characterization. This data-driven approach to materials science currently presents the most promising path to the development of advanced technological materials that could solve or mitigate important social and economic challenges of the 21st century. In particular, the rapid proliferation of computational data on materials properties presents the possibility to complement and extend materials property databases where the experimental data is lacking and difficult to obtain. Enhanced repositories such as AFLOWLIB open novel opportunities for structure discovery and optimization, including uncovering of unsuspected compounds, metastable structures and correlations between various properties. The practical realization of these opportunities depends almost exclusively on the design of efficient algorithms for electronic structure simulations of realistic material systems beyond the limitations of the current standard theories. In this talk, I will review recent progress in theoretical and computational tools, and in particular, discuss the development and validation of novel functionals within Density Functional Theory and of local basis representations for effective ab-initio tight-binding schemes. Marco Buongiorno Nardelli is a pioneer in the development of computational platforms for theory/data/applications integration rooted in his profound and extensive expertise in the design of electronic structure codes and in his vision for sustainable and innovative software development for high-performance materials simulations. 
His research activities range from the design and discovery of novel materials for 21st century applications in renewable energy, environment, nano-electronics and devices, the development of advanced electronic structure theories and high-throughput techniques in materials genomics and computational materials design, to an active role as community scientific software developer (QUANTUM ESPRESSO, WanT, AFLOWpi)

  11. First TILLING Platform in Cucurbita pepo: A New Mutant Resource for Gene Function and Crop Improvement

    PubMed Central

    Vicente-Dólera, Nelly; Troadec, Christelle; Moya, Manuel; del Río-Celestino, Mercedes; Pomares-Viciana, Teresa; Bendahmane, Abdelhafid; Picó, Belén; Román, Belén; Gómez, Pedro

    2014-01-01

    Although the availability of genetic and genomic resources for Cucurbita pepo has increased significantly, functional genomic resources are still limited for this crop. In this direction, we have developed a high throughput reverse genetic tool: the first TILLING (Targeting Induced Local Lesions IN Genomes) resource for this species. Additionally, we have used this resource to demonstrate that the previous EMS mutant population we developed has the highest mutation density compared with other cucurbits mutant populations. The overall mutation density in this first C. pepo TILLING platform was estimated to be 1/133 Kb by screening five additional genes. In total, 58 mutations confirmed by sequencing were identified in the five targeted genes, thirteen of which were predicted to have an impact on the function of the protein. The genotype/phenotype correlation was studied in a peroxidase gene, revealing that the phenotype of seedling homozygous for one of the isolated mutant alleles was albino. These results indicate that the TILLING approach in this species was successful at providing new mutations and can address the major challenge of linking sequence information to biological function and also the identification of novel variation for crop breeding. PMID:25386735
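
    Mutation density in a TILLING screen is commonly estimated as the total number of bases screened divided by the number of mutations found. The abstract reports 58 mutations and a density of 1/133 Kb but gives neither the amplicon lengths nor the number of plants screened, so the inputs below are hypothetical and chosen only to show the arithmetic.

```python
def mutation_density_kb(total_amplicon_bp, individuals_screened, mutations_found):
    """Return kilobases screened per mutation (a value of 133 means 1 per 133 kb)."""
    if mutations_found <= 0:
        raise ValueError("at least one mutation is needed to estimate a density")
    screened_bp = total_amplicon_bp * individuals_screened
    return screened_bp / mutations_found / 1000.0


# Hypothetical screen: 5,000 bp of amplicons across 1,000 plants, 50 mutations.
density = mutation_density_kb(
    total_amplicon_bp=5_000,
    individuals_screened=1_000,
    mutations_found=50,
)
```

    Published estimates often adjust the screened length, for example to discount amplicon ends where mutations are hard to detect. This sketch applies no such correction.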

  12. Analysis of Illumina Microbial Assemblies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clum, Alicia; Foster, Brian; Froula, Jeff

    2010-05-28

    Since the emergence of second generation sequencing technologies, the evaluation of different sequencing approaches and their assembly strategies for different types of genomes has become an important undertaking. Next generation sequencing technologies dramatically increase sequence throughput while decreasing cost, making them an attractive tool for whole genome shotgun sequencing. To compare different approaches for de-novo whole genome assembly, appropriate tools and a solid understanding of both quantity and quality of the underlying sequence data are crucial. Here, we performed an in-depth analysis of short-read Illumina sequence assembly strategies for bacterial and archaeal genomes. Different types of Illumina libraries as well as different trim parameters and assemblers were evaluated. Results of the comparative analysis and sequencing platforms will be presented. The goal of this analysis is to develop a cost-effective approach for the increased throughput of the generation of high quality microbial genomes.

  13. Expanded microbial genome coverage and improved protein family annotation in the COG database.

    PubMed

    Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

    2015-01-01

    Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. 
The new version of the COGs is expected to become an important tool for microbial genomics. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by US Government employees and is in the public domain in the US.

  14. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE PAGES

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; ...

    2015-05-12

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. 
However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.« less
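    The idea of monitoring mutant abundance by bar code counts can be illustrated with a minimal sketch. It assumes, for illustration only, that fitness for one bar-coded strain is scored as a log2 ratio of its read counts after versus before growth, with a pseudocount to avoid division by zero. The function name, the pseudocount, and the omission of any depth normalization are assumptions of this sketch, not details taken from the abstract.

```python
import math

def strain_fitness(count_start, count_end, pseudocount=1.0):
    """Hypothetical per-strain fitness score: log2 ratio of bar code
    read counts after growth versus at the start of the experiment.
    The pseudocount (an assumption) keeps the ratio finite for zero counts."""
    return math.log2((count_end + pseudocount) / (count_start + pseudocount))

# A strain that becomes relatively more abundant scores above zero,
# one that is depleted scores below zero.
enriched = strain_fitness(99, 399)
depleted = strain_fitness(99, 24)
```

    In a real analysis the counts would also need to be normalized for sequencing depth and combined across the strains that carry insertions in the same gene; both steps are left out here.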

  15. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.

  16. The CRISPR/Cas Genome-Editing Tool: Application in Improvement of Crops

    PubMed Central

    Khatodia, Surender; Bhatotia, Kirti; Passricha, Nishat; Khurana, S. M. P.; Tuteja, Narendra

    2016-01-01

    The Clustered Regularly Interspaced Short Palindromic Repeats associated Cas9/sgRNA system is a novel targeted genome-editing technique derived from the bacterial immune system. It is an inexpensive, easy, user-friendly and rapidly adopted genome-editing tool that is becoming a revolutionary paradigm. This technique enables precise genomic modifications in many different organisms and tissues. Cas9 protein is an RNA-guided endonuclease used to create targeted double-stranded breaks, with only a short RNA sequence needed to confer recognition of the target in animals and plants. The development of genetically edited (GE) crops similar to those developed by conventional or mutation breeding makes this technique a promising and extremely versatile tool for providing sustainable, productive agriculture to better feed a rapidly growing population in a changing climate. The emerging areas of research for genome editing in plants include interrogating gene function, rewiring regulatory signaling networks and building sgRNA libraries for high-throughput loss-of-function screening. In this review, we have described the broad applicability of Cas9 nuclease-mediated targeted plant genome editing for the development of designer crops. The regulatory uncertainty and social acceptance of plant breeding by Cas9 genome editing have also been described. With this powerful and innovative technique, designer GE non-GM plants could further advance climate-resilient and sustainable agriculture in the future and maximize yield by combating abiotic and biotic stresses. PMID:27148329

  17. Structuring intuition with theory: The high-throughput way

    NASA Astrophysics Data System (ADS)

    Fornari, Marco

    2015-03-01

    First principles methodologies have grown in accuracy and applicability to the point where large databases can be built, shared, and analyzed with the goal of predicting novel compositions, optimizing functional properties, and discovering unexpected relationships between the data. In order to be useful to a large community of users, data should be standardized, validated, and distributed. In addition, tools to easily manage large datasets should be made available to effectively lead to materials development. Within the AFLOW consortium we have developed a simple frame to expand, validate, and mine data repositories: the MTFrame. Our minimalistic approach complements AFLOW and other existing high-throughput infrastructures and aims to integrate data generation with data analysis. We present a few examples from our work on materials for energy conversion. Our intent is to pinpoint the usefulness of high-throughput methodologies to guide the discovery process by quantitatively structuring the scientific intuition. This work was supported by ONR-MURI under Contract N00014-13-1-0635 and the Duke University Center for Materials Genomics.

  18. Robustness encoded across essential and accessory replicons of the ecologically versatile bacterium Sinorhizobium meliloti

    PubMed Central

    Walker, Graham C.; Finan, Turlough M.; Mengoni, Alessio; Griffitts, Joel S.

    2018-01-01

    Bacterial genome evolution is characterized by gains, losses, and rearrangements of functional genetic segments. The extent to which large-scale genomic alterations influence genotype-phenotype relationships has not been investigated in a high-throughput manner. In the symbiotic soil bacterium Sinorhizobium meliloti, the genome is composed of a chromosome and two large extrachromosomal replicons (pSymA and pSymB, which together constitute 45% of the genome). Massively parallel transposon insertion sequencing (Tn-seq) was employed to evaluate the contributions of chromosomal genes to growth fitness in both the presence and absence of these extrachromosomal replicons. Ten percent of chromosomal genes from diverse functional categories are shown to genetically interact with pSymA and pSymB. These results demonstrate the pervasive robustness provided by the extrachromosomal replicons, which is further supported by constraint-based metabolic modeling. A comprehensive picture of core S. meliloti metabolism was generated through a Tn-seq-guided in silico metabolic network reconstruction, producing a core network encompassing 726 genes. This integrated approach facilitated functional assignments for previously uncharacterized genes, while also revealing that Tn-seq alone missed over a quarter of wild-type metabolism. This work highlights the many functional dependencies and epistatic relationships that may arise between bacterial replicons and across a genome, while also demonstrating how Tn-seq and metabolic modeling can be used together to yield insights not obtainable by either method alone. PMID:29672509

  19. Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants

    PubMed Central

    Conte, Matthieu G; Gaillard, Sylvain; Droc, Gaetan; Perin, Christophe

    2008-01-01

    Background: Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results: We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions. Conclusion: Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods. PMID:18426584

  20. Functional genomics of lipid metabolism in the oleaginous yeast Rhodosporidium toruloides

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coradetti, Samuel T.; Pinel, Dominic; Geiselman, Gina M.

    The basidiomycete yeast Rhodosporidium toruloides (also known as Rhodotorula toruloides) accumulates high concentrations of lipids and carotenoids from diverse carbon sources. It has great potential as a model for the cellular biology of lipid droplets and for sustainable chemical production. We developed a method for high-throughput genetics (RB-TDNAseq), using sequence-barcoded Agrobacterium tumefaciens T-DNA insertions. We identified 1,337 putative essential genes with low T-DNA insertion rates. We functionally profiled genes required for fatty acid catabolism and lipid accumulation, validating results with 35 targeted deletion strains. We identified a high-confidence set of 150 genes affecting lipid accumulation, including genes with predicted function in signaling cascades, gene expression, protein modification and vesicular trafficking, autophagy, amino acid synthesis and tRNA modification, and genes of unknown function. Lastly, these results greatly advance our understanding of lipid metabolism in this oleaginous species and demonstrate a general approach for barcoded mutagenesis that should enable functional genomics in diverse fungi.

  1. Functional genomics of lipid metabolism in the oleaginous yeast Rhodosporidium toruloides

    DOE PAGES

    Coradetti, Samuel T.; Pinel, Dominic; Geiselman, Gina M.; ...

    2018-03-09

    The basidiomycete yeast Rhodosporidium toruloides (also known as Rhodotorula toruloides) accumulates high concentrations of lipids and carotenoids from diverse carbon sources. It has great potential as a model for the cellular biology of lipid droplets and for sustainable chemical production. We developed a method for high-throughput genetics (RB-TDNAseq), using sequence-barcoded Agrobacterium tumefaciens T-DNA insertions. We identified 1,337 putative essential genes with low T-DNA insertion rates. We functionally profiled genes required for fatty acid catabolism and lipid accumulation, validating results with 35 targeted deletion strains. We identified a high-confidence set of 150 genes affecting lipid accumulation, including genes with predicted function in signaling cascades, gene expression, protein modification and vesicular trafficking, autophagy, amino acid synthesis and tRNA modification, and genes of unknown function. Lastly, these results greatly advance our understanding of lipid metabolism in this oleaginous species and demonstrate a general approach for barcoded mutagenesis that should enable functional genomics in diverse fungi.

  2. High-throughput gene mapping in Caenorhabditis elegans.

    PubMed

    Swan, Kathryn A; Curtis, Damian E; McKusick, Kathleen B; Voinov, Alexander V; Mapa, Felipa A; Cancilla, Michael R

    2002-07-01

    Positional cloning of mutations in model genetic systems is a powerful method for the identification of targets of medical and agricultural importance. To facilitate the high-throughput mapping of mutations in Caenorhabditis elegans, we have identified a further 9602 putative new single nucleotide polymorphisms (SNPs) between two C. elegans strains, Bristol N2 and the Hawaiian mapping strain CB4856, by sequencing inserts from a CB4856 genomic DNA library and using an informatics pipeline to compare sequences with the canonical N2 genomic sequence. When combined with data from other laboratories, our marker set of 17,189 SNPs provides even coverage of the complete worm genome. To date, we have confirmed >1099 evenly spaced SNPs (one every 91 +/- 56 kb) across the six chromosomes and validated the utility of our SNP marker set and new fluorescence polarization-based genotyping methods for systematic and high-throughput identification of genes in C. elegans by cloning several proprietary genes. We illustrate our approach by recombination mapping and confirmation of the mutation in the cloned gene, dpy-18.

  3. Harvesting Legume Genomes: Plant Genetic Resources

    USDA-ARS's Scientific Manuscript database

    Genomics and high-throughput phenotyping are ushering in a new era of accessing genetic diversity held in plant genetic resources, the cornerstone of both traditional and genomics-assisted breeding efforts of food legume crops. Acknowledged or not, yield plateaus must be broken given the daunting ...

  4. Evaluating imputation algorithms for low-depth genotyping-by-sequencing (GBS) data

    USDA-ARS's Scientific Manuscript database

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordabl...

  5. GAP Final Technical Report 12-14-04

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andrew J. Bordner, PhD, Senior Research Scientist

    2004-12-14

    The Genomics Annotation Platform (GAP) was designed to develop new tools for high throughput functional annotation and characterization of protein sequences and structures resulting from genomics and structural proteomics, benchmarking and application of those tools. Furthermore, this platform integrated the genomic scale sequence and structural analysis and prediction tools with the advanced structure prediction and bioinformatics environment of ICM. The development of GAP was primarily oriented towards the annotation of new biomolecular structures using both structural and sequence data. Even though the amount of protein X-ray crystal data is growing exponentially, the volume of sequence data is growing even more rapidly. This trend was exploited by leveraging the wealth of sequence data to provide functional annotation for protein structures. The additional information provided by GAP is expected to assist the majority of the commercial users of ICM, who are involved in drug discovery, in identifying promising drug targets as well as in devising strategies for the rational design of therapeutics directed at the protein of interest. The GAP also provided valuable tools for biochemistry education and structural genomics centers. In addition, GAP incorporates many novel prediction and analysis methods not available in other molecular modeling packages. This development led to signing the first Molsoft agreement in the structural genomics annotation area with the University of Oxford Structural Genomics Center. This commercial agreement validated the Molsoft efforts under the GAP project and provided the basis for further development of the large scale functional annotation platform.

  6. Genetic Complexity and Quantitative Trait Loci Mapping of Yeast Morphological Traits

    PubMed Central

    Nogami, Satoru; Ohya, Yoshikazu; Yvert, Gaël

    2007-01-01

    Functional genomics relies on two essential parameters: the sensitivity of phenotypic measures and the power to detect genomic perturbations that cause phenotypic variations. In model organisms, two types of perturbations are widely used. Artificial mutations can be introduced in virtually any gene and allow the systematic analysis of gene function via mutant fitness. Alternatively, natural genetic variations can be associated with particular phenotypes via genetic mapping. However, the access to genome manipulation and breeding provided by model organisms is sometimes counterbalanced by phenotyping limitations. Here we investigated the natural genetic diversity of Saccharomyces cerevisiae cellular morphology using a very sensitive high-throughput imaging platform. We quantified 501 morphological parameters in over 50,000 yeast cells from a cross between two wild-type divergent backgrounds. Extensive morphological differences were found between these backgrounds. The genetic architecture of the traits was complex, with evidence of both epistasis and transgressive segregation. We mapped quantitative trait loci (QTL) for 67 traits and discovered 364 correlations between trait segregation and inheritance of gene expression levels. We validated one QTL by the replacement of a single base in the genome. This study illustrates the natural diversity and complexity of cellular traits among natural yeast strains and provides an ideal framework for a genetical genomics dissection of multiple traits. Our results did not overlap with results previously obtained from systematic deletion strains, showing that both approaches are necessary for the functional exploration of genomes. PMID:17319748

  7. Silencing GhNDR1 and GhMKK2 compromised cotton resistance to Verticillium wilt

    PubMed Central

    Gao, Xiquan; Wheeler, Terry; Li, Zhaohu; Kenerley, Charles M.; He, Ping; Shan, Libo

    2011-01-01

    Summary: Cotton is an important cash crop worldwide and serves as a significant source of fiber, feed, foodstuff, oil and biofuel products. Considerable effort in genetics and genomics has been expended to increase sustainable yield and quality through molecular breeding and genetic engineering of new cotton cultivars. With the effort of whole genome sequencing of cotton, it is essential to develop molecular tools and resources for large-scale analysis of gene functions at the genome-wide level. We have successfully established an Agrobacterium-mediated virus-induced gene silencing (VIGS) assay in several cotton cultivars with different genetic backgrounds. The genes of interest were potently and readily silenced within 2 weeks after inoculation at the seedling stage. Importantly, we showed that silencing GhNDR1 and GhMKK2 compromised cotton resistance to the infection by Verticillium dahliae, a fungal pathogen causing Verticillium wilt. Furthermore, we established a cotton protoplast system for transient gene expression to study gene functions by a gain-of-function approach. The viable protoplasts were isolated from green cotyledons, etiolated cotyledons, and true leaves, and responded to a wide range of pathogen elicitors and phytohormones. Remarkably, cotton plants possess conserved, but also distinct MAP kinase activation with Arabidopsis upon bacterial elicitor flagellin perception. Thus, we demonstrated that GhNDR1 and GhMKK2 are required for Verticillium resistance in cotton using gene silencing assays, and established the high throughput loss-of-function and gain-of-function assays for functional genomic studies in cotton. PMID:21219508

  8. A multi-tissue type genome-scale metabolic network for analysis of whole-body systems physiology

    PubMed Central

    2011-01-01

    Background: Genome-scale metabolic reconstructions provide a biologically meaningful mechanistic basis for the genotype-phenotype relationship. The global human metabolic network, termed Recon 1, has recently been reconstructed allowing the systems analysis of human metabolic physiology and pathology. Utilizing high-throughput data, Recon 1 has recently been tailored to different cells and tissues, including the liver, kidney, brain, and alveolar macrophage. These models have shown utility in the study of systems medicine. However, no integrated analysis between human tissues has been done. Results: To describe tissue-specific functions, Recon 1 was tailored to describe metabolism in three human cells: adipocytes, hepatocytes, and myocytes. These cell-specific networks were manually curated and validated based on known cellular metabolic functions. To study intercellular interactions, a novel multi-tissue type modeling approach was developed to integrate the metabolic functions for the three cell types, and subsequently used to simulate known integrated metabolic cycles. In addition, the multi-tissue model was used to study diabetes: a pathology with systemic properties. High-throughput data was integrated with the network to determine differential metabolic activity between obese and type II obese gastric bypass patients in a whole-body context. Conclusion: The multi-tissue type modeling approach presented provides a platform to study integrated metabolic states. As more cell and tissue-specific models are released, it is critical to develop a framework in which to study their interdependencies. PMID:22041191

  9. Precise, High-throughput Analysis of Bacterial Growth.

    PubMed

    Kurokawa, Masaomi; Ying, Bei-Wen

    2017-09-19

    Bacterial growth is a central concept in the development of modern microbial physiology, as well as in the investigation of cellular dynamics at the systems level. Recent studies have reported correlations between bacterial growth and genome-wide events, such as genome reduction and transcriptome reorganization. Correctly analyzing bacterial growth is crucial for understanding the growth-dependent coordination of gene functions and cellular components. Accordingly, the precise quantitative evaluation of bacterial growth in a high-throughput manner is required. Emerging technological developments offer new experimental tools that allow updates of the methods used for studying bacterial growth. The protocol introduced here employs a microplate reader with a highly optimized experimental procedure for the reproducible and precise evaluation of bacterial growth. This protocol was used to evaluate the growth of several previously described Escherichia coli strains. The main steps of the protocol are as follows: the preparation of a large number of cell stocks in small vials for repeated tests with reproducible results, the use of 96-well plates for high-throughput growth evaluation, and the manual calculation of two major parameters (i.e., maximal growth rate and population density) representing the growth dynamics. In comparison to the traditional colony-forming unit (CFU) assay, which counts the cells that are cultured in glass tubes over time on agar plates, the present method is more efficient and provides more detailed temporal records of growth changes, but has a stricter detection limit at low population densities. In summary, the described method is advantageous for the precise and reproducible high-throughput analysis of bacterial growth, which can be used to draw conceptual conclusions or to make theoretical observations.
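    The two growth parameters named in this abstract can be illustrated with a short sketch. It assumes, purely for illustration, that the maximal growth rate is taken as the largest specific growth rate between consecutive optical density (OD) readings and that population density is taken as the highest OD reached. The function name, the sample readings, and these definitions are assumptions of this sketch and are not taken from the protocol itself.

```python
import math

def growth_parameters(times_h, od_values):
    """Return (maximal growth rate per hour, maximal OD) for one well.

    The rate between two consecutive readings is ln(OD2 / OD1) / (t2 - t1);
    the maximum over all consecutive pairs is reported (an assumed definition).
    """
    rates = [
        math.log(od_values[i + 1] / od_values[i]) / (times_h[i + 1] - times_h[i])
        for i in range(len(od_values) - 1)
    ]
    return max(rates), max(od_values)

# Hypothetical hourly readings from a single well of a 96-well plate.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
ods = [0.01, 0.02, 0.04, 0.06, 0.07]
max_rate, max_od = growth_parameters(times, ods)
```

    With real plate-reader data the readings are noisier, so a sliding-window average of the rate is often preferable to a single pair of points.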

  10. Characterization and complete genome sequence of a panicovirus from Bermuda grass by high-throughput sequencing.

    PubMed

    Tahir, Muhammad N; Lockhart, Ben; Grinstead, Samuel; Mollov, Dimitre

    2017-04-01

    Bermuda grass samples were examined by transmission electron microscopy and 28-30 nm spherical virus particles were observed. Total RNA from these plants was subjected to high-throughput sequencing (HTS). The nearly full genome sequence of a panicovirus was identified from one HTS scaffold. Sanger sequencing was used to confirm the HTS results and complete the genome sequence of 4404 nt. This virus was provisionally named Bermuda grass latent virus (BGLV). Its predicted open reading frames follow the typical arrangement of the genus Panicovirus. Based on sequence comparisons and phylogenetic analyses BGLV differs from other viruses and therefore taxonomically it is a new member of the genus Panicovirus, family Tombusviridae.

  11. High-Throughput Sequencing Reveals Principles of Adeno-Associated Virus Serotype 2 Integration

    PubMed Central

    Janovitz, Tyler; Klein, Isaac A.; Oliveira, Thiago; Mukherjee, Piali; Nussenzweig, Michel C.; Sadelain, Michel

    2013-01-01

    Viral integrations are important in human biology, yet genome-wide integration profiles have not been determined for many viruses. Adeno-associated virus (AAV) infects most of the human population and is a prevalent gene therapy vector. AAV integrates into the human genome with preference for a single locus, termed AAVS1. However, the genome-wide integration of AAV has not been defined, and the principles underlying this recombination remain unclear. Using a novel high-throughput approach, integrant capture sequencing, nearly 12 million AAV junctions were recovered from a human cell line, providing five orders of magnitude more data than were previously available. Forty-five percent of integrations occurred near AAVS1, and several thousand novel integration hotspots were identified computationally. Most of these occurred in genes, with dozens of hotspots targeting known oncogenes. Viral replication protein binding sites (RBS) and transcriptional activity were major factors favoring integration. In a first for eukaryotic viruses, the data reveal a unique asymmetric integration profile with distinctive directional orientation of viral genomes. These studies provide a new understanding of AAV integration biology through the use of unbiased high-throughput data acquisition and bioinformatics. PMID:23720718

  12. MPD: a pathogen genome and metagenome database

    PubMed Central

    Zhang, Tingting; Miao, Jiaojiao; Han, Na; Qiang, Yujun; Zhang, Wen

    2018-01-01

    Advances in high-throughput sequencing have led to unprecedented growth in the amount of available genome sequencing data, especially for bacterial genomes, which has been accompanied by a challenge for the storage and management of such huge datasets. To facilitate bacterial research and related studies, we have developed the Mypathogen database (MPD), which provides access to users for searching, downloading, storing and sharing bacterial genomics data. The MPD represents the first pathogenic database for microbial genomes and metagenomes, and currently covers pathogenic microbial genomes (6604 genera, 11 071 species, 41 906 strains) and metagenomic data from host, air, water and other sources (28 816 samples). The MPD also functions as a management system for statistical and storage data that can be used by different organizations, thereby facilitating data sharing among different organizations and research groups. A user-friendly local client tool is provided to maintain the steady transmission of big sequencing data. The MPD is a useful tool for analysis and management in genomic research, especially for clinical Centers for Disease Control and epidemiological studies, and is expected to contribute to advancing knowledge on pathogenic bacteria genomes and metagenomes. Database URL: http://data.mypathogen.org PMID:29917040

  13. Curated protein information in the Saccharomyces genome database.

    PubMed

    Hellerstedt, Sage T; Nash, Robert S; Weng, Shuai; Paskov, Kelley M; Wong, Edith D; Karra, Kalpana; Engel, Stacia R; Cherry, J Michael

    2017-01-01

    Due to recent advancements in the production of experimental proteomic data, the Saccharomyces genome database (SGD; www.yeastgenome.org) has been expanding our protein curation activities to make new data types available to our users. Because of broad interest in post-translational modifications (PTM) and their importance to protein function and regulation, we have recently started incorporating expertly curated PTM information on individual protein pages. Here we also present the inclusion of new abundance and protein half-life data obtained from high-throughput proteome studies. These new data types have been included with the aim to facilitate cellular biology research. Database URL: www.yeastgenome.org. © The Author(s) 2017. Published by Oxford University Press.

  14. Identification of structural variation in mouse genomes.

    PubMed

    Keane, Thomas M; Wong, Kim; Adams, David J; Flint, Jonathan; Reymond, Alexandre; Yalcin, Binnaz

    2014-01-01

    Structural variation is variation in structure of DNA regions affecting DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variation and their respective association to human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of molecular architecture and functional consequences of structural variation.

  15. Single-cell genomics for the masses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tringe, Susannah G.

    In this issue of Nature Biotechnology, Lan et al. describe a new tool in the toolkit for studying uncultivated microbial communities, enabling orders of magnitude higher single cell genome throughput than previous methods. This is achieved by a complex droplet microfluidics workflow encompassing steps from physical cell isolation through genome sequencing, producing tens of thousands of low-coverage genomes from individual cells.

  16. Single-cell genomics for the masses

    DOE PAGES

    Tringe, Susannah G.

    2017-07-12

    In this issue of Nature Biotechnology, Lan et al. describe a new tool in the toolkit for studying uncultivated microbial communities, enabling orders of magnitude higher single cell genome throughput than previous methods. This is achieved by a complex droplet microfluidics workflow encompassing steps from physical cell isolation through genome sequencing, producing tens of thousands of low-coverage genomes from individual cells.

  17. GenomicTools: a computational platform for developing high-throughput analytics in genomics.

    PubMed

    Tsirigos, Aristotelis; Haiminen, Niina; Bilal, Erhan; Utro, Filippo

    2012-01-15

    Recent advances in sequencing technology have resulted in the dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time, memory requirements as well as prototyping of computational pipelines. We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze large datasets of any size by minimizing memory requirements. In practical applications, where comparable, GenomicTools outperforms existing tools in terms of both time and memory usage. The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools.
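
    The "mathematical operations between sets of genomic regions" mentioned above can be illustrated with a minimal sketch of one such operation, the intersection of two region sets. This is not GenomicTools code or its API: the `Region` type and the `intersect` function are hypothetical names, and the sweep assumes that regions within each input set do not overlap one another.

```python
from typing import List, Tuple

# Hypothetical region type: (chromosome, start, end), half-open [start, end).
Region = Tuple[str, int, int]


def intersect(a: List[Region], b: List[Region]) -> List[Region]:
    """Return the overlaps between two region sets with a sorted two-pointer sweep.

    Assumption: regions inside each set do not overlap each other, so sorting
    by (chromosome, start) also sorts by end within a chromosome.
    """
    a_sorted = sorted(a)
    b_sorted = sorted(b)
    overlaps: List[Region] = []
    i = j = 0
    while i < len(a_sorted) and j < len(b_sorted):
        chrom_a, start_a, end_a = a_sorted[i]
        chrom_b, start_b, end_b = b_sorted[j]
        if chrom_a < chrom_b:
            i += 1
        elif chrom_b < chrom_a:
            j += 1
        else:
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:
                overlaps.append((chrom_a, start, end))
            # Advance the region that ends first; the other may still
            # overlap the next region of the opposite set.
            if end_a <= end_b:
                i += 1
            else:
                j += 1
    return overlaps
```

    Because only two cursors and the output list are kept, memory use beyond the inputs stays small, which is in the spirit of (but not taken from) the memory-minimizing design the abstract describes.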

  18. StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data.

    PubMed

    Stavrovskaya, Elena D; Niranjan, Tejasvi; Fertig, Elana J; Wheelan, Sarah J; Favorov, Alexander V; Mironov, Andrey A

    2017-10-15

    Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding data binarization and the subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples of the broad applicability of StereoGene for regulatory genomics. The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. Contact: favorov@sensi.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
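
    To make the idea of correlating neighboring positions through a kernel more concrete, here is a toy sketch. It is not the StereoGene algorithm or its code: it simply smooths one track with a Gaussian kernel before taking a Pearson correlation, so that signals offset by a few positions still register as related. All function names and the `radius` and `sigma` parameters are assumptions made for illustration.

```python
import math
from typing import List


def gaussian_kernel(radius: int, sigma: float) -> List[float]:
    """Normalized Gaussian weights for offsets -radius..+radius."""
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma))
               for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(track: List[float], kernel: List[float]) -> List[float]:
    """Convolve a track with the kernel, treating out-of-range positions as zero."""
    radius = len(kernel) // 2
    n = len(track)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * track[j]
        out.append(acc)
    return out


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation; returns 0.0 if either track is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def kernel_correlation(f: List[float], g: List[float],
                       radius: int = 2, sigma: float = 1.0) -> float:
    """Correlate f with a kernel-smoothed copy of g (illustrative only)."""
    return pearson(f, smooth(g, gaussian_kernel(radius, sigma)))
```

    With two single-peak tracks whose peaks sit one position apart, the position-by-position Pearson correlation is slightly negative, while the kernel-weighted version is positive, which is the kind of spatial relationship a kernel approach is meant to capture.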

  19. Evaluation of sequencing approaches for high-throughput toxicogenomics (SOT)

    EPA Science Inventory

    Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. We present the evaluation of three toxicogenomics platfo...

  20. A high-quality annotated transcriptome of swine peripheral blood

    USDA-ARS's Scientific Manuscript database

    Background: High throughput gene expression profiling assays of peripheral blood are widely used in biomedicine, as well as in animal genetics and physiology research. Accurate, comprehensive, and precise interpretation of such high throughput assays relies on well-characterized reference genomes an...

  1. Picking Cell Lines for High-Throughput Transcriptomic Toxicity Screening (SOT)

    EPA Science Inventory

    High throughput, whole genome transcriptomic profiling is a promising approach to comprehensively evaluate chemicals for potential biological effects. To be useful for in vitro toxicity screening, gene expression must be quantified in a set of representative cell types that captu...

  2. TotalReCaller: improved accuracy and performance via integrated alignment and base-calling.

    PubMed

    Menges, Fabian; Narzisi, Giuseppe; Mishra, Bud

    2011-09-01

    Currently, re-sequencing approaches use multiple modules serially to interpret raw sequencing data from next-generation sequencing platforms, while remaining oblivious to the genomic information until the final alignment step. Such approaches fail to exploit the full information from both raw sequencing data and the reference genome that can yield better quality sequence reads, SNP-calls, variant detection, as well as an alignment at the best possible location in the reference genome. Thus, there is a need for novel reference-guided bioinformatics algorithms for interpreting analog signals representing sequences of the bases ({A, C, G, T}), while simultaneously aligning possible sequence reads to a source reference genome whenever available. Here, we propose a new base-calling algorithm, TotalReCaller, to achieve improved performance. A linear error model for the raw intensity data and Burrows-Wheeler transform (BWT) based alignment are combined utilizing a Bayesian score function, which is then globally optimized over all possible genomic locations using an efficient branch-and-bound approach. The algorithm has been implemented in soft- and hardware [field-programmable gate array (FPGA)] to achieve real-time performance. Empirical results on real high-throughput Illumina data were used to evaluate TotalReCaller's performance relative to its peers (Bustard, BayesCall, Ibis and Rolexa) based on several criteria, particularly those important in clinical and scientific applications. Namely, it was evaluated for (i) its base-calling speed and throughput, (ii) its read accuracy and (iii) its specificity and sensitivity in variant calling. A software implementation of TotalReCaller, as well as additional information, is available at: http://bioinformatics.nyu.edu/wordpress/projects/totalrecaller/ Contact: fabian.menges@nyu.edu.
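
    For readers unfamiliar with BWT-based alignment, the following minimal sketch shows the Burrows-Wheeler transform and exact-match counting by backward search. It is a textbook illustration only, not TotalReCaller's implementation: the Bayesian scoring, intensity error model and branch-and-bound search described in the abstract are not shown, and the function names are hypothetical. The naive rotation sort and prefix counting are chosen for clarity, not speed.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations ('$' marks the end)."""
    assert "$" not in text, "'$' is reserved as the end-of-text sentinel"
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of a non-empty pattern using backward search."""
    # c_table[ch]: number of characters in the text that sort before ch.
    first_column = sorted(bwt_str)
    c_table = {}
    for index, ch in enumerate(first_column):
        c_table.setdefault(ch, index)

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_str[:i]; a real index would precompute this.
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ(ch, lo)
        hi = c_table[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

    The search consumes the pattern from its last character to its first, narrowing the interval of matching suffixes at each step; an interval that becomes empty means the pattern does not occur in the text.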

  3. Using Genome-Wide Expression Profiling to Define Gene Networks Relevant to the Study of Complex Traits: From RNA Integrity to Network Topology

    PubMed Central

    O'Brien, M.A.; Costin, B.N.; Miles, M.F.

    2014-01-01

    Postgenomic studies of the function of genes and their role in disease have now become an area of intense study since efforts to define the raw sequence material of the genome have largely been completed. The use of whole-genome approaches such as microarray expression profiling and, more recently, RNA-sequence analysis of transcript abundance has allowed an unprecedented look at the workings of the genome. However, the accurate derivation of such high-throughput data and their analysis in terms of biological function has been critical to truly leveraging the postgenomic revolution. This chapter will describe an approach that focuses on the use of gene networks to both organize and interpret genomic expression data. Such networks, derived from statistical analysis of large genomic datasets and the application of multiple bioinformatics data resources, potentially allow the identification of key control elements for networks associated with human disease, and thus may lead to derivation of novel therapeutic approaches. However, as discussed in this chapter, the leveraging of such networks cannot occur without a thorough understanding of the technical and statistical factors influencing the derivation of genomic expression data. Thus, while the catch phrase may be “it's the network … stupid,” the understanding of factors extending from RNA isolation to genomic profiling technique, multivariate statistics, and bioinformatics are all critical to defining fully useful gene networks for study of complex biology. PMID:23195313

  4. Identifying genes that extend life span using a high-throughput screening system.

    PubMed

    Chen, Cuiying; Contreras, Roland

    2007-01-01

    We developed a high-throughput functional genomic screening system that allows identification of genes prolonging lifespan in the baker's yeast Saccharomyces cerevisiae. The method is based on isolating yeast mother cells with a higher than average number of cell divisions as indicated by the number of bud scars on their surface. Fluorescently labeled wheat germ agglutinin (WGA) was used for specific staining of chitin, a major component of bud scars. The critical new steps in our bud-scar-sorting system are the use of small microbeads, which allows successive rounds of purification and regrowth of the mother cells (M-cell), and utilization of flow cytometry to sort and isolate cells with a longer lifespan based on the number of bud scars specifically labeled with WGA.

  5. Lessons from high-throughput protein crystallization screening: 10 years of practical experience

    PubMed Central

    Luft, JR; Snell, EH; DeTitta, GT

    2011-01-01

    Introduction: X-ray crystallography provides the majority of our structural biological knowledge at a molecular level and in terms of pharmaceutical design is a valuable tool to accelerate discovery. It is the premier technique in the field, but its usefulness is significantly limited by the need to grow well-diffracting crystals. It is for this reason that high-throughput crystallization has become a key technology that has matured over the past 10 years through the field of structural genomics. Areas covered: The authors describe their experiences in high-throughput crystallization screening in the context of structural genomics and the general biomedical community. They focus on the lessons learnt from the operation of a high-throughput crystallization screening laboratory, which to date has screened over 12,500 biological macromolecules. They also describe the approaches taken to maximize the success while minimizing the effort. Through this, the authors hope that the reader will gain an insight into the efficient design of a laboratory and protocols to accomplish high-throughput crystallization on a single-, multiuser-laboratory or industrial scale. Expert opinion: High-throughput crystallization screening is readily available but, despite the power of the crystallographic technique, getting crystals is still not a solved problem. High-throughput approaches can help when used skillfully; however, they still require human input in the detailed analysis and interpretation of results to be more successful. PMID:22646073

  6. Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

    PubMed Central

    Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

    2007-01-01

    While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434

  7. ARTS: a web-based tool for the set-up of high-throughput genome-wide mapping panels for the SNP genotyping of mouse mutants.

    PubMed

    Klaften, Matthias; Hrabé de Angelis, Martin

    2005-07-01

    Genome-wide mapping in the identification of novel candidate genes has always been the standard method in genetics and genomics to correlate a clinically interesting phenotypic trait with a genotype. However, the performance of a mapping experiment using classical microsatellite approaches can be very time consuming. The high-throughput analysis of single-nucleotide polymorphisms (SNPs) has the potential of being the successor of microsatellite analysis routinely used for these mapping approaches, where one of the major obstacles is the design of the appropriate SNP marker set itself. Here we report on ARTS, an advanced retrieval tool for SNPs, which allows researchers to comb freely the public mouse dbSNP database for multiple reference and test strains. Several filters can be applied in order to improve the sensitivity and the specificity of the search results. By employing the panel generator function of this program, it is possible to abbreviate the extraction of reliable sequence data for a large marker panel including several different mouse strains from days to minutes. The concept of ARTS is easily adaptable to other species for which SNP databases are available, making it a versatile tool for the use of SNPs as markers for genotyping. The web interface is accessible at http://andromeda.gsf.de/arts.

  8. High-throughput molecular analysis in lung cancer: insights into biology and potential clinical applications.

    PubMed

    Ocak, S; Sos, M L; Thomas, R K; Massion, P P

    2009-08-01

    During the last decade, high-throughput technologies including genomic, epigenomic, transcriptomic and proteomic have been applied to further our understanding of the molecular pathogenesis of this heterogeneous disease, and to develop strategies that aim to improve the management of patients with lung cancer. Ultimately, these approaches should lead to sensitive, specific and noninvasive methods for early diagnosis, and facilitate the prediction of response to therapy and outcome, as well as the identification of potential novel therapeutic targets. Genomic studies were the first to move this field forward by providing novel insights into the molecular biology of lung cancer and by generating candidate biomarkers of disease progression. Lung carcinogenesis is driven by genetic and epigenetic alterations that cause aberrant gene function; however, the challenge remains to pinpoint the key regulatory control mechanisms and to distinguish driver from passenger alterations that may have a small but additive effect on cancer development. Epigenetic regulation by DNA methylation and histone modifications modulate chromatin structure and, in turn, either activate or silence gene expression. Proteomic approaches critically complement these molecular studies, as the phenotype of a cancer cell is determined by proteins and cannot be predicted by genomics or transcriptomics alone. The present article focuses on the technological platforms available and some proposed clinical applications. We illustrate herein how the "-omics" have revolutionised our approach to lung cancer biology and hold promise for personalised management of lung cancer.

  9. Cell Lines Models of Drug Response: Successes and Lessons from this Pharmacogenomic Model

    PubMed Central

    Jack, J.; Rotroff, D.; Motsinger-Reif, A.

    2015-01-01

    A new standard for medicine is emerging that aims to improve individual drug responses through studying associations with genetic variations. This field, pharmacogenomics, is undergoing a rapid expansion due to a variety of technological advancements that are enabling higher throughput with reductions in cost. Here we review the advantages, limitations, and opportunities for using lymphoblastoid cell lines (LCL) as a model system for human pharmacogenomic studies. There are a wide range of publicly available resources with genome-wide data available for LCLs from both related and unrelated populations, removing the cost of genotyping the data for drug response studies. Furthermore, in contrast to human clinical trials or in vivo model systems, with high-throughput in vitro screening technologies, pharmacogenomics studies can easily be scaled to accommodate large sample sizes. An important component to leveraging genome-wide data in LCL models is association mapping. Several methods are discussed herein, and include multivariate concentration response modeling, issues with multiple testing, and successful examples of the ‘triangle model’ to identify candidate variants. Once candidate gene variants have been determined, their biological roles can be elucidated using pathway analyses and functionally confirmed using siRNA knockdown experiments. The wealth of genomics data being produced using related and unrelated populations is creating many exciting opportunities leading to new insights into the genetic contribution and heritability of drug response. PMID:25109794

  10. High-throughput single-molecule telomere characterization.

    PubMed

    McCaffrey, Jennifer; Young, Eleanor; Lassahn, Katy; Sibert, Justin; Pastor, Steven; Riethman, Harold; Xiao, Ming

    2017-11-01

    We have developed a novel method that enables global subtelomere and haplotype-resolved analysis of telomere lengths at the single-molecule level. An in vitro CRISPR/Cas9 RNA-directed nickase system directs the specific labeling of human (TTAGGG)n DNA tracts in genomes that have also been barcoded using a separate nickase enzyme that recognizes a 7-bp motif genome-wide. High-throughput imaging and analysis of large DNA single molecules from genomes labeled in this fashion using a nanochannel array system permits mapping through subtelomere repeat element (SRE) regions to unique chromosomal DNA while simultaneously measuring the (TTAGGG)n tract length at the end of each large telomere-terminal DNA segment. The methodology also permits subtelomere and haplotype-resolved analyses of SRE organization and variation, providing a window into the population dynamics and potential functions of these complex and structurally variant telomere-adjacent DNA regions. At its current stage of development, the assay can be used to identify and characterize telomere length distributions of 30-35 discrete telomeres simultaneously and accurately. The assay's utility is demonstrated using early versus late passage and senescent human diploid fibroblasts, documenting the anticipated telomere attrition on a global telomere-by-telomere basis as well as identifying subtelomere-specific biases for critically short telomeres. Similarly, we present the first global single-telomere-resolved analyses of two cancer cell lines. © 2017 McCaffrey et al.; Published by Cold Spring Harbor Laboratory Press.

  11. SMM-system: A mining tool to identify specific markers in Salmonella enterica.

    PubMed

    Yu, Shuijing; Liu, Weibing; Shi, Chunlei; Wang, Dapeng; Dan, Xianlong; Li, Xiao; Shi, Xianming

    2011-03-01

    This report presents SMM-system, a software package that implements various personalized pre- and post-BLASTN tasks for mining specific markers of microbial pathogens. The main functionalities of SMM-system are summarized as follows: (i) converting multi-FASTA file, (ii) cutting interesting genomic sequence, (iii) automatic high-throughput BLASTN searches, and (iv) screening target sequences. The utility of SMM-system was demonstrated by using it to identify 214 Salmonella enterica-specific protein-coding sequences (CDSs). Eighteen primer pairs were designed based on eighteen S. enterica-specific CDSs, respectively. Seven of these primer pairs were validated with PCR assay, which showed 100% inclusivity for the 101 S. enterica genomes and 100% exclusivity of 30 non-S. enterica genomes. Three specific primer pairs were chosen to develop a multiplex PCR assay, which generated specific amplicons with a size of 180bp (SC1286), 238bp (SC1598) and 405bp (SC4361), respectively. This study demonstrates that SMM-system is a high-throughput specific marker generation tool that can be used to identify genus-, species-, serogroup- and even serovar-specific DNA sequences of microbial pathogens, which has a potential to be applied in food industries, diagnostics and taxonomic studies. SMM-system is freely available and can be downloaded from http://foodsafety.sjtu.edu.cn/SMM-system.html. Copyright © 2011 Elsevier B.V. All rights reserved.
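
    The first two functionalities listed above, converting a multi-FASTA file and cutting out a genomic sequence of interest, can be sketched as follows. This is a generic illustration and not SMM-system code: the function names are hypothetical, and the 1-based inclusive coordinate convention in `cut_region` is an assumption.

```python
from typing import Dict, Iterable, List, Optional


def parse_multi_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Split multi-FASTA text into {record_id: sequence}.

    The record id is taken as the first whitespace-delimited token after '>'.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def cut_region(sequence: str, start: int, end: int) -> str:
    """Extract a subsequence using 1-based, inclusive coordinates (assumed convention)."""
    return sequence[start - 1:end]
```

    Each extracted record or region could then be written to its own file and passed to a BLASTN search, which is the step the abstract describes as automated.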

  12. Structural analysis of the α subunit of Na(+)/K(+) ATPase genes in invertebrates.

    PubMed

    Thabet, Rahma; Rouault, J-D; Ayadi, Habib; Leignel, Vincent

    2016-01-01

    The Na(+)/K(+) ATPase is a ubiquitous pump coordinating the transport of Na(+) and K(+) across the cell membrane, and its role is fundamental to cellular functions. In eukaryotes it is a heteromer comprising two or three subunits (α, β and γ, the last being specific to vertebrates). The catalytic functions of the enzyme have been attributed to the α subunit. Several complete α protein sequences are available, but only a few gene structures have been characterized. We identified the genomic sequences coding the α subunit of the Na(+)/K(+) ATPase from the whole-genome shotgun contigs (WGS), NCBI Genomes (chromosome), Genomic Survey Sequences (GSS) and High Throughput Genomic Sequences (HTGS) databases across distinct phyla. One copy of the α subunit gene was found in Annelida, Arthropoda, Cnidaria, Echinodermata, Hemichordata, Mollusca, Placozoa, Porifera, Platyhelminthes and Urochordata, but the nematodes seem to possess 2 to 4 copies. The number of introns varied from 0 (Platyhelminthes) to 26 (Porifera), and their localization and length are also highly variable. Molecular phylogenies (Maximum Likelihood and Maximum Parsimony methods) showed some clusters constituted by (Chordata/(Echinodermata/Hemichordata)) or (Platyhelminthes/(Annelida/Mollusca)) and a basal position for Porifera. These structural analyses increase our knowledge about the evolutionary events of the α subunit genes in the invertebrates. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. GenoQuery: a new querying module for functional annotation in a genomic warehouse

    PubMed Central

    Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

    2008-01-01

    Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve the functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731

  14. DArT Markers Effectively Target Gene Space in the Rye Genome

    PubMed Central

    Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

    2016-01-01

    Large genome size and complexity considerably hamper genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops and repetitive sequences account for over 90% of its length. Diversity Arrays Technology is a high-throughput genotyping method, in which a preferential sampling of gene-rich regions is achieved through the use of methylation sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and following a redundancy analysis assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total 515 DArT sequences could be incorporated into publicly available rye genome zippers providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using the Blast2Go pipeline we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high density consensus maps of rye. Furthermore we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. The data obtained demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes we obtained information that will complement the results of studies in which DArT genotyping was deployed, by simplifying the gene ontology and microcolinearity based identification of candidate genes. PMID:27833625

  15. DArT Markers Effectively Target Gene Space in the Rye Genome.

    PubMed

    Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

    2016-01-01

    Large genome size and complexity considerably hamper genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops and repetitive sequences account for over 90% of its length. Diversity Arrays Technology is a high-throughput genotyping method, in which a preferential sampling of gene-rich regions is achieved through the use of methylation sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and following a redundancy analysis assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total 515 DArT sequences could be incorporated into publicly available rye genome zippers providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using the Blast2Go pipeline we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high density consensus maps of rye. Furthermore we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. The data obtained demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes we obtained information that will complement the results of studies in which DArT genotyping was deployed, by simplifying the gene ontology and microcolinearity based identification of candidate genes.

  16. Systematically Differentiating Functions for Alternatively Spliced Isoforms through Integrating RNA-seq Data

    PubMed Central

    Menon, Rajasree; Wen, Yuchen; Omenn, Gilbert S.; Kretzler, Matthias; Guan, Yuanfang

    2013-01-01

    Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires ‘ground-truth’ functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the ‘responsible’ isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. Through cross-validation, we demonstrated that our algorithm is effective in assigning functions to genes, especially the ones with multiple isoforms, and robust to gene expression levels and removal of homologous gene pairs. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the ‘responsible’ isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6. Our generic framework is the first to predict and differentiate functions for alternatively spliced isoforms, instead of genes, using genomic data. It is extendable to any base machine learner and other species with alternatively spliced isoforms, and shifts the current gene-centered function prediction to isoform-level predictions. PMID:24244129

  17. Maize HapMap2 identifies extant variation from a genome in flux

    USDA-ARS's Scientific Manuscript database

    The maize genome is the largest, most diverse and complex plant genome sequenced to date. Using high-throughput sequencing to access genetic variation and a population genetics model to score the polymorphisms, we characterize and unite the diversity of the world’s key breeding germplasm, wild rela...

  18. A Systems Biology Approach to Link Nuclear Factor Kappa B Activation with Lethal Prostate Cancer

    DTIC Science & Technology

    2014-05-01

    developed as a routine clinical assay. 12 Task 1B: Perform protein profiling of circulating blood proteins and determine whether a protein...or set of proteins indicative of NFκB activation are associated with lethal prostate cancer. Circulating proteins will be assessed in two cohorts of...throughput functional genomic data. Nucleic acids research 2009;37:D885-90. 3. Parkinson H, Kapushesky M, Kolesnikov N, et al. ArrayExpress update--from

  19. Development of a novel set of Gateway-compatible vectors for live imaging in insect cells.

    PubMed

    Maroniche, G A; Mongelli, V C; Alfonso, V; Llauger, G; Taboga, O; del Vas, Mariana

    2011-10-01

    Insect genomics is a growing area of research. To exploit fully the genomic data that are being generated, high-throughput systems for the functional characterization of insect proteins and their interactomes are required. In this work, a Gateway-compatible vector set for expression of fluorescent fusion proteins in insect cells was developed. The vector set was designed to express a protein of interest fused to any of four different fluorescent proteins [green fluorescent protein (GFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP) and mCherry] by either the C-terminal or the N-terminal ends. Additionally, a collection of organelle-specific fluorescent markers was assembled for colocalization with fluorescent recombinant proteins of interest. Moreover, the vector set was proven to be suitable for simultaneously detecting up to three proteins by multiple labelling. The use of the vector set was exemplified by defining the subcellular distribution of Mal de Río Cuarto virus (MRCV) outer coat protein P10 and by analysing the in vivo self-interaction of the MRCV viroplasm matrix protein P9-1 in Förster resonance energy transfer (FRET) experiments. In conclusion, we have developed a valuable tool for high-throughput studies of protein subcellular localization that will aid in the elucidation of the function of newly described insect and virus proteins. © 2011 The Authors. Insect Molecular Biology © 2011 The Royal Entomological Society.

  20. A high-throughput and quantitative method to assess the mutagenic potential of translesion DNA synthesis

    PubMed Central

    Taggart, David J.; Camerlengo, Terry L.; Harrison, Jason K.; Sherrer, Shanen M.; Kshetry, Ajay K.; Taylor, John-Stephen; Huang, Kun; Suo, Zucai

    2013-01-01

    Cellular genomes are constantly damaged by endogenous and exogenous agents that covalently and structurally modify DNA to produce DNA lesions. Although most lesions are mended by various DNA repair pathways in vivo, a significant number of damage sites persist during genomic replication. Our understanding of the mutagenic outcomes derived from these unrepaired DNA lesions has been hindered by the low throughput of existing sequencing methods. Therefore, we have developed a cost-effective high-throughput short oligonucleotide sequencing assay that uses next-generation DNA sequencing technology for the assessment of the mutagenic profiles of translesion DNA synthesis catalyzed by any error-prone DNA polymerase. The vast amount of sequencing data produced were aligned and quantified by using our novel software. As an example, the high-throughput short oligonucleotide sequencing assay was used to analyze the types and frequencies of mutations upstream, downstream and at a site-specifically placed cis–syn thymidine–thymidine dimer generated individually by three lesion-bypass human Y-family DNA polymerases. PMID:23470999

  1. Extensive genome rearrangements and multiple horizontal gene transfers in a population of Pyrococcus isolates from Vulcano Island, Italy.

    PubMed

    White, James R; Escobar-Paramo, Patricia; Mongodin, Emmanuel F; Nelson, Karen E; DiRuggiero, Jocelyne

    2008-10-01

    The extent of chromosome rearrangements in Pyrococcus isolates from marine hydrothermal vents in Vulcano Island, Italy, was evaluated by high-throughput genomic methods. The results illustrate the dynamic nature of the genomes of the genus Pyrococcus and raise the possibility of a connection between rapidly changing environmental conditions and adaptive genomic properties.

  2. Comparison of Burrows-Wheeler transform-based mapping algorithms used in high-throughput whole-genome sequencing: application to Illumina data for livestock genomes

    USDA-ARS's Scientific Manuscript database

    Ongoing developments and cost decreases in next-generation sequencing (NGS) technologies have led to an increase in their application, which has greatly enhanced the fields of genetics and genomics. Mapping sequence reads onto a reference genome is a fundamental step in the analysis of NGS data. Eff...
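    For readers unfamiliar with the transform named in the title above, the following is a minimal sketch of the Burrows-Wheeler transform and its inverse. It is an illustration only, built from sorted rotations, and is not taken from any of the mapping tools compared in the record; the function names and the `$` sentinel are assumptions made for this example. Real read mappers build the transform from suffix arrays and query it through an FM-index rather than sorting rotations.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; illustration only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    s = text + sentinel
    # Every cyclic rotation of the sentinel-terminated string, in sorted order.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the row that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)
    print(inverse_bwt(transformed))
```

    The transform tends to group identical characters together, which is what makes the compressed indexes used by such mappers compact.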

  3. Evolutionary dynamics of retrotransposons assessed by high-throughput sequencing in wild relatives of wheat.

    PubMed

    Senerchia, Natacha; Wicker, Thomas; Felber, François; Parisod, Christian

    2013-01-01

    Transposable elements (TEs) represent a major fraction of plant genomes and drive their evolution. An improved understanding of genome evolution requires the dynamics of a large number of TE families to be considered. We put forward an approach bypassing the required step of a complete reference genome to assess the evolutionary trajectories of high copy number TE families from genome snapshot with high-throughput sequencing. Low coverage sequencing of the complex genomes of Aegilops cylindrica and Ae. geniculata using 454 identified more than 70% of the sequences as known TEs, mainly long terminal repeat (LTR) retrotransposons. Comparing the abundance of reads as well as patterns of sequence diversity and divergence within and among genomes assessed the dynamics of 44 major LTR retrotransposon families of the 165 identified. In particular, molecular population genetics on individual TE copies distinguished recently active from quiescent families and highlighted different evolutionary trajectories of retrotransposons among related species. This work presents a suite of tools suitable for current sequencing data, allowing to address the genome-wide evolutionary dynamics of TEs at the family level and advancing our understanding of the evolution of nonmodel genomes.

  4. High-throughput microscopy must re-invent the microscope rather than speed up its functions

    PubMed Central

    Oheim, M

    2007-01-01

    Knowledge gained from the revolutions in genomics and proteomics has helped to identify many of the key molecules involved in cellular signalling. Researchers, both in academia and in the pharmaceutical industry, now screen, at a sub-cellular level, where and when these proteins interact. Fluorescence imaging and molecular labelling combine to provide a powerful tool for real-time functional biochemistry with molecular resolution. However, they traditionally have been work-intensive, required trained personnel, and suffered from low through-put due to sample preparation, loading and handling. The need for speeding up microscopy is apparent from the tremendous complexity of cellular signalling pathways, the inherent biological variability, as well as the possibility that the same molecule plays different roles in different sub-cellular compartments. Research institutes and companies have teamed up to develop imaging cytometers of ever-increasing complexity. However, to truly go high-speed, sub-cellular imaging must free itself from the rigid framework of current microscopes. PMID:17603553

  5. Computer applications making rapid advances in high throughput microbial proteomics (HTMP).

    PubMed

    Anandkumar, Balakrishna; Haga, Steve W; Wu, Hui-Fen

    2014-02-01

    The last few decades have seen the rise of widely available proteomics tools. From new data acquisition devices, such as MALDI-MS and 2DE, to new database-searching software, these new products have paved the way for high throughput microbial proteomics (HTMP). These tools are enabling researchers to gain new insights into microbial metabolism, and are opening up new areas of study, such as protein-protein interactions (interactomics) discovery. Computer software is a key part of these emerging fields. This review considers: 1) software tools for identifying the proteome, such as MASCOT or PDQuest, 2) online databases of proteomes, such as SWISS-PROT, Proteome Web, or the Proteomics Facility of the Pathogen Functional Genomics Resource Center, and 3) software tools for applying proteomic data, such as PSI-BLAST or VESPA. These tools allow for research in network biology, protein identification, functional annotation, target identification/validation, protein expression, protein structural analysis, metabolic pathway engineering and drug discovery.

  6. [Complete genome sequencing of polymalic acid-producing strain Aureobasidium pullulans CCTCC M2012223].

    PubMed

    Wang, Yongkang; Song, Xiaodan; Li, Xiaorong; Yang, Sang-tian; Zou, Xiang

    2017-01-04

    To explore the genome sequence of Aureobasidium pullulans CCTCC M2012223, analyze the key genes related to the biosynthesis of important metabolites, and provide genetic background for metabolic engineering. The complete genome of A. pullulans CCTCC M2012223 was sequenced on the Illumina HiSeq high throughput sequencing platform. Then, fragment assembly, gene prediction, functional annotation, and GO/COG clustering were performed and compared with those of five other A. pullulans varieties. The complete genome sequence of A. pullulans CCTCC M2012223 was 30756831 bp with an average GC content of 47.49%, and 9452 genes were successfully predicted. Genome-wide analysis showed that A. pullulans CCTCC M2012223 had the largest genome assembly size. Protein sequences involved in the pullulan and polymalic acid pathways were highly conserved in all six A. pullulans varieties. Although A. pullulans CCTCC M2012223 and A. pullulans var. melanogenum are closely related, some point mutations and insertions occurred in protein sequences involved in melanin biosynthesis. The genome of A. pullulans CCTCC M2012223 was annotated and genes involved in the melanin, pullulan and polymalic acid pathways were compared, which would provide a theoretical basis for genetic modification of metabolic pathways in A. pullulans.

  7. Genomic and metagenomic challenges and opportunities for bioleaching: a mini-review.

    PubMed

    Cárdenas, Juan Pablo; Quatrini, Raquel; Holmes, David S

    2016-09-01

    High-throughput genomic technologies are accelerating progress in understanding the diversity of microbial life in many environments. Here we highlight advances in genomics and metagenomics of microorganisms from bioleaching heaps and related acidic mining environments. Bioleaching heaps used for copper recovery provide significant opportunities to study the processes and mechanisms underlying microbial successions and the influence of community composition on ecosystem functioning. Obtaining quantitative and process-level knowledge of these dynamics is pivotal for understanding how microorganisms contribute to the solubilization of copper for industrial recovery. Advances in DNA sequencing technology provide unprecedented opportunities to obtain information about the genomes of bioleaching microorganisms, allowing predictive models of metabolic potential and ecosystem-level interactions to be constructed. These approaches are enabling predictive phenotyping of organisms many of which are recalcitrant to genetic approaches or are unculturable. This mini-review describes current bioleaching genomic and metagenomic projects and addresses the use of genome information to: (i) build metabolic models; (ii) predict microbial interactions; (iii) estimate genetic diversity; and (iv) study microbial evolution. Key challenges and perspectives of bioleaching genomics/metagenomics are addressed. Copyright © 2016 The Author(s). Published by Elsevier Masson SAS. All rights reserved.

  8. Target genes discovery through copy number alteration analysis in human hepatocellular carcinoma.

    PubMed

    Gu, De-Leung; Chen, Yen-Hsieh; Shih, Jou-Ho; Lin, Chi-Hung; Jou, Yuh-Shan; Chen, Chian-Feng

    2013-12-21

    High-throughput short-read sequencing of exomes and whole cancer genomes in multiple human hepatocellular carcinoma (HCC) cohorts confirmed previously identified frequently mutated somatic genes, such as TP53, CTNNB1 and AXIN1, and identified several novel genes with moderate mutation frequencies, including ARID1A, ARID2, MLL, MLL2, MLL3, MLL4, IRF2, ATM, CDKN2A, FGF19, PIK3CA, RPS6KA3, JAK1, KEAP1, NFE2L2, C16orf62, LEPR, RAC2, and IL6ST. Functional classification of these mutated genes suggested that alterations in pathways participating in chromatin remodeling, Wnt/β-catenin signaling, JAK/STAT signaling, and oxidative stress play critical roles in HCC tumorigenesis. Nevertheless, because there are few druggable genes used in HCC therapy, the identification of new therapeutic targets through integrated genomic approaches remains an important task. Because a large amount of HCC genomic data genotyped by high density single nucleotide polymorphism arrays is deposited in the public domain, copy number alteration (CNA) analyses of these arrays is a cost-effective way to reveal target genes through profiling of recurrent and overlapping amplicons, homozygous deletions and potentially unbalanced chromosomal translocations accumulated during HCC progression. Moreover, integration of CNAs with other high-throughput genomic data, such as aberrantly coding transcriptomes and non-coding gene expression in human HCC tissues and rodent HCC models, provides lines of evidence that can be used to facilitate the identification of novel HCC target genes with the potential of improving the survival of HCC patients.

  9. Ecological roles of dominant and rare prokaryotes in acid mine drainage revealed by metagenomics and metatranscriptomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hua, Zheng-Shuang; Han, Yu-Jiao; Chen, Lin-Xing

    Here we report that high-throughput sequencing is expanding our knowledge of microbial diversity in the environment. Still, understanding the metabolic potentials and ecological roles of rare and uncultured microbes in natural communities remains a major challenge. To this end, we applied a ‘divide and conquer’ strategy that partitioned a massive metagenomic data set (>100 Gbp) into subsets based on K-mer frequency in sequence assembly to a low-diversity acid mine drainage (AMD) microbial community and, by integrating with an additional metatranscriptomic assembly, successfully obtained 11 draft genomes most of which represent yet uncultured and/or rare taxa (relative abundance <1%). We report the first genome of a naturally occurring Ferrovum population (relative abundance >90%) and its metabolic potentials and gene expression profile, providing initial molecular insights into the ecological role of these lesser known, but potentially important, microorganisms in the AMD environment. Gene transcriptional analysis of the active taxa revealed major metabolic capabilities executed in situ, including carbon- and nitrogen-related metabolisms associated with syntrophic interactions, iron and sulfur oxidation, which are key in energy conservation and AMD generation, and the mechanisms of adaptation and response to the environmental stresses (heavy metals, low pH and oxidative stress). Remarkably, nitrogen fixation and sulfur oxidation were performed by the rare taxa, indicating their critical roles in the overall functioning and assembly of the AMD community. Finally, our study demonstrates the potential of the ‘divide and conquer’ strategy in high-throughput sequencing data assembly for genome reconstruction and functional partitioning analysis of both dominant and rare species in natural microbial assemblages.

  10. Vaccine candidate discovery for the next generation of malaria vaccines.

    PubMed

    Tuju, James; Kamuyu, Gathoni; Murungi, Linda M; Osier, Faith H A

    2017-10-01

    Although epidemiological observations, IgG passive transfer studies and experimental infections in humans all support the feasibility of developing highly effective malaria vaccines, the precise antigens that induce protective immunity remain uncertain. Here, we review the methodologies applied to vaccine candidate discovery for Plasmodium falciparum malaria from the pre- to post-genomic era. Probing of genomic and cDNA libraries with antibodies of defined specificities or functional activity predominated the former, whereas reverse vaccinology encompassing high throughput in silico analyses of genomic, transcriptomic or proteomic parasite data sets is the mainstay of the latter. Antibody-guided vaccine design spanned both eras but currently benefits from technological advances facilitating high-throughput screening and downstream applications. We make the case that although we have exponentially increased our ability to identify numerous potential vaccine candidates in a relatively short space of time, a significant bottleneck remains in their validation and prioritization for evaluation in clinical trials. Longitudinal cohort studies provide supportive evidence but results are often conflicting between studies. Demonstration of antigen-specific antibody function is valuable but the relative importance of one mechanism over another with regards to protection remains undetermined. Animal models offer useful insights but may not accurately reflect human disease. Challenge studies in humans are preferable but prohibitively expensive. In the absence of reliable correlates of protection, suitable animal models or a better understanding of the mechanisms underlying protective immunity in humans, vaccine candidate discovery per se may not be sufficient to provide the paradigm shift necessary to develop the next generation of highly effective subunit malaria vaccines. © 2017 The Authors. Immunology Published by John Wiley & Sons Ltd.

  11. Ecological roles of dominant and rare prokaryotes in acid mine drainage revealed by metagenomics and metatranscriptomics

    DOE PAGES

    Hua, Zheng-Shuang; Han, Yu-Jiao; Chen, Lin-Xing; ...

    2014-11-07

    Here we report that high-throughput sequencing is expanding our knowledge of microbial diversity in the environment. Still, understanding the metabolic potentials and ecological roles of rare and uncultured microbes in natural communities remains a major challenge. To this end, we applied a ‘divide and conquer’ strategy that partitioned a massive metagenomic data set (>100 Gbp) into subsets based on K-mer frequency in sequence assembly to a low-diversity acid mine drainage (AMD) microbial community and, by integrating with an additional metatranscriptomic assembly, successfully obtained 11 draft genomes most of which represent yet uncultured and/or rare taxa (relative abundance <1%). We report the first genome of a naturally occurring Ferrovum population (relative abundance >90%) and its metabolic potentials and gene expression profile, providing initial molecular insights into the ecological role of these lesser known, but potentially important, microorganisms in the AMD environment. Gene transcriptional analysis of the active taxa revealed major metabolic capabilities executed in situ, including carbon- and nitrogen-related metabolisms associated with syntrophic interactions, iron and sulfur oxidation, which are key in energy conservation and AMD generation, and the mechanisms of adaptation and response to the environmental stresses (heavy metals, low pH and oxidative stress). Remarkably, nitrogen fixation and sulfur oxidation were performed by the rare taxa, indicating their critical roles in the overall functioning and assembly of the AMD community. Finally, our study demonstrates the potential of the ‘divide and conquer’ strategy in high-throughput sequencing data assembly for genome reconstruction and functional partitioning analysis of both dominant and rare species in natural microbial assemblages.
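    The abstract above does not give the details of its k-mer-based partitioning, so the following is only a rough sketch of the general idea: reads whose k-mers are seen often are likely to come from abundant populations, and reads whose k-mers are seen rarely from rare ones. The function names, the use of the median k-mer count per read, and the values of `k` and `threshold` are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def partition_by_kmer_frequency(reads, k=4, threshold=2):
    """Split reads into (high, low) abundance bins by median k-mer count.

    Hypothetical helper: `k` and `threshold` are illustrative values only.
    """
    # Count every k-mer across the whole read set.
    counts = Counter(km for read in reads for km in kmers(read, k))
    high, low = [], []
    for read in reads:
        kms = kmers(read, k)
        if not kms:
            low.append(read)  # read shorter than k: no evidence of abundance
            continue
        med = median(counts[km] for km in kms)
        (high if med >= threshold else low).append(read)
    return high, low


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
    high, low = partition_by_kmer_frequency(reads)
    print("high:", high)
    print("low:", low)
```

    Each bin could then be assembled separately, which is the sense in which such a split lets a large data set be handled in smaller pieces.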

  12. New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era (2010 JGI/ANL HPC Workshop)

    ScienceCinema

    Notredame, Cedric

    2018-05-02

    Cedric Notredame from the Centre for Genomic Regulation gives a presentation on New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era at the JGI/Argonne HPC Workshop on January 26, 2010.

  13. Prunus transcription factors: breeding perspectives

    PubMed Central

    Bianchi, Valmor J.; Rubio, Manuel; Trainotti, Livio; Verde, Ignazio; Bonghi, Claudio; Martínez-Gómez, Pedro

    2015-01-01

    Many plant processes depend on differential gene expression, which is generally controlled by complex proteins called transcription factors (TFs). In peach, 1533 TFs have been identified, accounting for about 5.5% of the 27,852 protein-coding genes. These TFs are the reference for the rest of the Prunus species. TF studies in Prunus have been performed on the gene expression analysis of different agronomic traits, including control of the flowering process, fruit quality, and biotic and abiotic stress resistance. These studies, using quantitative RT-PCR, have mainly been performed in peach, and to a lesser extent in other species, including almond, apricot, black cherry, Fuji cherry, Japanese apricot, plum, and sour and sweet cherry. Other tools have also been used in TF studies, including cDNA-AFLP, LC-ESI-MS, RNA, and DNA blotting or mapping. More recently, new tools assayed include microarray and high-throughput DNA sequencing (DNA-Seq) and RNA sequencing (RNA-Seq). New functional genomics opportunities include genome resequencing and the well-known synteny among Prunus genomes and transcriptomes. These new functional studies should be applied in breeding programs in the development of molecular markers. With the genome sequences available, some strategies that have been used in model systems (such as SNP genotyping assays and genotyping-by-sequencing) may be applicable in the functional analysis of Prunus TFs as well. In addition, the knowledge of the gene functions and position in the peach reference genome of the TFs represents an additional advantage. These facts could greatly facilitate the isolation of genes via QTL (quantitative trait loci) map-based cloning in the different Prunus species, following the association of these TFs with the identified QTLs using the peach reference genome. PMID:26124770

  14. CMS: A Web-Based System for Visualization and Analysis of Genome-Wide Methylation Data of Human Cancers

    PubMed Central

    Huang, Yi-Wen; Roa, Juan C.; Goodfellow, Paul J.; Kizer, E. Lynette; Huang, Tim H. M.; Chen, Yidong

    2013-01-01

    Background: DNA methylation of promoter CpG islands is associated with gene suppression, and its unique genome-wide profiles have been linked to tumor progression. Coupled with high-throughput sequencing technologies, it can now efficiently determine genome-wide methylation profiles in cancer cells. Also, experimental and computational technologies make it possible to find the functional relationship between cancer-specific methylation patterns and their clinicopathological parameters. Methodology/Principal Findings: Cancer methylome system (CMS) is a web-based database application designed for the visualization, comparison and statistical analysis of human cancer-specific DNA methylation. Methylation intensities were obtained from MBDCap-sequencing, pre-processed and stored in the database. 191 patient samples (169 tumor and 22 normal specimen) and 41 breast cancer cell-lines are deposited in the database, comprising about 6.6 billion uniquely mapped sequence reads. This provides comprehensive and genome-wide epigenetic portraits of human breast cancer and endometrial cancer to date. Two views are proposed for users to better understand methylation structure at the genomic level or systemic methylation alteration at the gene level. In addition, a variety of annotation tracks are provided to cover genomic information. CMS includes important analytic functions for interpretation of methylation data, such as the detection of differentially methylated regions, statistical calculation of global methylation intensities, multiple gene sets of biologically significant categories, interactivity with UCSC via custom-track data. We also present examples of discoveries utilizing the framework. Conclusions/Significance: CMS provides visualization and analytic functions for cancer methylome datasets. A comprehensive collection of datasets, a variety of embedded analytic functions and extensive applications with biological and translational significance make this system powerful and unique in cancer methylation research. CMS is freely accessible at: http://cbbiweb.uthscsa.edu/KMethylomes/. PMID:23630576

  15. CMS: a web-based system for visualization and analysis of genome-wide methylation data of human cancers.

    PubMed

    Gu, Fei; Doderer, Mark S; Huang, Yi-Wen; Roa, Juan C; Goodfellow, Paul J; Kizer, E Lynette; Huang, Tim H M; Chen, Yidong

    2013-01-01

    DNA methylation of promoter CpG islands is associated with gene suppression, and its unique genome-wide profiles have been linked to tumor progression. Coupled with high-throughput sequencing technologies, it can now efficiently determine genome-wide methylation profiles in cancer cells. Also, experimental and computational technologies make it possible to find the functional relationship between cancer-specific methylation patterns and their clinicopathological parameters. Cancer methylome system (CMS) is a web-based database application designed for the visualization, comparison and statistical analysis of human cancer-specific DNA methylation. Methylation intensities were obtained from MBDCap-sequencing, pre-processed and stored in the database. 191 patient samples (169 tumor and 22 normal specimen) and 41 breast cancer cell-lines are deposited in the database, comprising about 6.6 billion uniquely mapped sequence reads. This provides comprehensive and genome-wide epigenetic portraits of human breast cancer and endometrial cancer to date. Two views are proposed for users to better understand methylation structure at the genomic level or systemic methylation alteration at the gene level. In addition, a variety of annotation tracks are provided to cover genomic information. CMS includes important analytic functions for interpretation of methylation data, such as the detection of differentially methylated regions, statistical calculation of global methylation intensities, multiple gene sets of biologically significant categories, interactivity with UCSC via custom-track data. We also present examples of discoveries utilizing the framework. CMS provides visualization and analytic functions for cancer methylome datasets. A comprehensive collection of datasets, a variety of embedded analytic functions and extensive applications with biological and translational significance make this system powerful and unique in cancer methylation research. CMS is freely accessible at: http://cbbiweb.uthscsa.edu/KMethylomes/.

  16. Evaluation of high throughput gene expression platforms using a genomic biomarker signature for prediction of skin sensitization.

    PubMed

    Forreryd, Andy; Johansson, Henrik; Albrekt, Ann-Sofie; Lindstedt, Malin

    2014-05-16

    Allergic contact dermatitis (ACD) develops upon exposure to certain chemical compounds termed skin sensitizers. To reduce the occurrence of skin sensitizers, chemicals are regularly screened for their capacity to induce sensitization. The recently developed Genomic Allergen Rapid Detection (GARD) assay is an in vitro alternative to animal testing for identification of skin sensitizers, classifying chemicals by evaluating transcriptional levels of a genomic biomarker signature. During assay development and biomarker identification, genome-wide expression analysis was applied using microarrays covering approximately 30,000 transcripts. However, the microarray platform suffers from drawbacks in terms of low sample throughput, high cost per sample and time consuming protocols and is a limiting factor for adaption of GARD into a routine assay for screening of potential sensitizers. With the purpose to simplify assay procedures, improve technical parameters and increase sample throughput, we assessed the performance of three high throughput gene expression platforms--nCounter®, BioMark HD™ and OpenArray®--and correlated their performance metrics against our previously generated microarray data. We measured the levels of 30 transcripts from the GARD biomarker signature across 48 samples. Detection sensitivity, reproducibility, correlations and overall structure of gene expression measurements were compared across platforms. Gene expression data from all of the evaluated platforms could be used to classify most of the sensitizers from non-sensitizers in the GARD assay. Results also showed high data quality and acceptable reproducibility for all platforms but only medium to poor correlations of expression measurements across platforms. In addition, evaluated platforms were superior to the microarray platform in terms of cost efficiency, simplicity of protocols and sample throughput. 
    We evaluated the performance of three non-array based platforms using a limited set of transcripts from the GARD biomarker signature. We demonstrated that it was possible to achieve acceptable discriminatory power in terms of separation between sensitizers and non-sensitizers in the GARD assay while reducing assay costs, simplifying assay procedures and increasing sample throughput by using an alternative platform, providing a first step towards the goal of preparing GARD for formal validation and adaptation of the assay for industrial screening of potential sensitizers.

  17. Genome-wide mapping of autonomous promoter activity in human cells

    PubMed Central

    van Arensbergen, Joris; FitzPatrick, Vincent D.; de Haas, Marcel; Pagie, Ludo; Sluimer, Jasper; Bussemaker, Harmen J.; van Steensel, Bas

    2017-01-01

    Previous methods to systematically characterize sequence-intrinsic activity of promoters have been limited by relatively low throughput and the length of sequences that could be tested. Here we present Survey of Regulatory Elements (SuRE), a method to assay more than 10^8 DNA fragments, each 0.2–2 kb in size, for their ability to drive transcription autonomously. In SuRE, a plasmid library is constructed of random genomic fragments upstream of a 20 bp barcode and decoded by paired-end sequencing. This library is then transfected into cells and transcribed barcodes are quantified in the RNA by high throughput sequencing. When applied to the human genome, we achieved a 55-fold genome coverage, allowing us to map autonomous promoter activity genome-wide. By computational modeling we delineated subregions within promoters that are relevant for their activity. For instance, we show that antisense promoter transcription is generally dependent on the sense core promoter sequences, and that most enhancers and several families of repetitive elements act as autonomous transcription initiation sites. PMID:28024146
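    The barcode-counting step described in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the published SuRE pipeline: the function name, the tuple representation of a fragment, and the choice to score activity as the ratio of transcribed-barcode reads to plasmid-library reads are all assumptions made for this example.

```python
from collections import Counter


def promoter_activity(cdna_barcodes, plasmid_barcodes, barcode_to_fragment):
    """Score each fragment as (barcode reads in RNA) / (barcode reads in plasmid library).

    Hypothetical helper: barcodes absent from the plasmid library are skipped
    because their input abundance is unknown.
    """
    cdna = Counter(cdna_barcodes)
    plasmid = Counter(plasmid_barcodes)
    activity = {}
    for barcode, fragment in barcode_to_fragment.items():
        if plasmid[barcode] == 0:
            continue  # cannot normalise without an input count
        activity[fragment] = cdna[barcode] / plasmid[barcode]
    return activity


if __name__ == "__main__":
    # Fragments are (chromosome, start, end) tuples; all values are made up.
    barcode_to_fragment = {
        "AAAA": ("chr1", 100, 600),
        "CCCC": ("chr1", 900, 1500),
        "GGGG": ("chr2", 10, 400),
    }
    cdna_reads = ["AAAA"] * 6 + ["CCCC"]
    plasmid_reads = ["AAAA"] * 2 + ["CCCC"] * 2
    for fragment, score in promoter_activity(
        cdna_reads, plasmid_reads, barcode_to_fragment
    ).items():
        print(fragment, score)
```

    Normalising by the plasmid count is one simple way to keep a fragment from looking active merely because it is over-represented in the library.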

  18. Application of chemical biology in target identification and drug discovery.

    PubMed

    Zhu, Yue; Xiao, Ting; Lei, Saifei; Zhou, Fulai; Wang, Ming-Wei

    2015-09-01

    Drug discovery and development is vital to the well-being of mankind and sustainability of the pharmaceutical industry. Using chemical biology approaches to discover drug leads has become a widely accepted path partially because of the completion of the Human Genome Project. Chemical biology mainly solves biological problems through searching previously unknown targets for pharmacologically active small molecules or finding ligands for well-defined drug targets. It is a powerful tool to study how these small molecules interact with their respective targets, as well as their roles in signal transduction, molecular recognition and cell functions. There have been an increasing number of new therapeutic targets being identified and subsequently validated as a result of advances in functional genomics, which in turn led to the discovery of numerous active small molecules via a variety of high-throughput screening initiatives. In this review, we highlight some applications of chemical biology in the context of drug discovery.

  19. Identification of genetic elements in metabolism by high-throughput mouse phenotyping.

    PubMed

    Rozman, Jan; Rathkolb, Birgit; Oestereicher, Manuela A; Schütt, Christine; Ravindranath, Aakash Chavan; Leuchtenberger, Stefanie; Sharma, Sapna; Kistler, Martin; Willershäuser, Monja; Brommage, Robert; Meehan, Terrence F; Mason, Jeremy; Haselimashhadi, Hamed; Hough, Tertius; Mallon, Ann-Marie; Wells, Sara; Santos, Luis; Lelliott, Christopher J; White, Jacqueline K; Sorg, Tania; Champy, Marie-France; Bower, Lynette R; Reynolds, Corey L; Flenniken, Ann M; Murray, Stephen A; Nutter, Lauryl M J; Svenson, Karen L; West, David; Tocchini-Valentini, Glauco P; Beaudet, Arthur L; Bosch, Fatima; Braun, Robert B; Dobbie, Michael S; Gao, Xiang; Herault, Yann; Moshiri, Ala; Moore, Bret A; Kent Lloyd, K C; McKerlie, Colin; Masuya, Hiroshi; Tanaka, Nobuhiko; Flicek, Paul; Parkinson, Helen E; Sedlacek, Radislav; Seong, Je Kyung; Wang, Chi-Kuang Leo; Moore, Mark; Brown, Steve D; Tschöp, Matthias H; Wurst, Wolfgang; Klingenspor, Martin; Wolf, Eckhard; Beckers, Johannes; Machicao, Fausto; Peter, Andreas; Staiger, Harald; Häring, Hans-Ulrich; Grallert, Harald; Campillos, Monica; Maier, Holger; Fuchs, Helmut; Gailus-Durner, Valerie; Werner, Thomas; Hrabe de Angelis, Martin

    2018-01-18

    Metabolic diseases are a worldwide problem but the underlying genetic factors and their relevance to metabolic disease remain incompletely understood. Genome-wide research is needed to characterize so-far unannotated mammalian metabolic genes. Here, we generate and analyze metabolic phenotypic data of 2016 knockout mouse strains under the aegis of the International Mouse Phenotyping Consortium (IMPC) and find 974 gene knockouts with strong metabolic phenotypes. 429 of those had no previous link to metabolism and 51 genes remain functionally completely unannotated. We compared human orthologues of these uncharacterized genes in five GWAS consortia and indeed 23 candidate genes are associated with metabolic disease. We further identify common regulatory elements in promoters of candidate genes. As each regulatory element is composed of several transcription factor binding sites, our data reveal an extensive metabolic phenotype-associated network of co-regulated genes. Our systematic mouse phenotype analysis thus paves the way for full functional annotation of the genome.

  20. Biodiversity and Functional Genomics in the Human Microbiome

    PubMed Central

    Morgan, Xochitl C.; Segata, Nicola; Huttenhower, Curtis

    2012-01-01

    Over the course of our lives, humans are colonized by a tremendous diversity of commensal microbes, which comprise the human microbiome. The collective genetic potential (metagenome) of the human microbiome is orders of magnitude more than the human genome, and it profoundly affects human health and disease in ways we are only beginning to understand. Advances in computing and high-throughput sequencing have enabled population-level surveys such as MetaHIT and the recently-released Human Microbiome Project, detailed investigations of the microbiome in human disease, and mechanistic studies employing gnotobiotic model organisms. The resulting knowledge of human microbiome composition, function, and range of variation across multiple body sites has begun to assemble a rich picture of commensal host-microbe and microbe-microbe interactions as well as their roles in human health and disease and their potential as diagnostic and therapeutic tools. PMID:23140990

  1. Small molecules enhance CRISPR genome editing in pluripotent stem cells.

    PubMed

    Yu, Chen; Liu, Yanxia; Ma, Tianhua; Liu, Kai; Xu, Shaohua; Zhang, Yu; Liu, Honglei; La Russa, Marie; Xie, Min; Ding, Sheng; Qi, Lei S

    2015-02-05

    The bacterial CRISPR-Cas9 system has emerged as an effective tool for sequence-specific gene knockout through non-homologous end joining (NHEJ), but it remains inefficient for precise editing of genome sequences. Here we develop a reporter-based screening approach for high-throughput identification of chemical compounds that can modulate precise genome editing through homology-directed repair (HDR). Using our screening method, we have identified small molecules that can enhance CRISPR-mediated HDR efficiency, 3-fold for large fragment insertions and 9-fold for point mutations. Interestingly, we have also observed that a small molecule that inhibits HDR can enhance frame shift insertion and deletion (indel) mutations mediated by NHEJ. The identified small molecules function robustly in diverse cell types with minimal toxicity. The use of small molecules provides a simple and effective strategy to enhance precise genome engineering applications and facilitates the study of DNA repair mechanisms in mammalian cells. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. TARGETED CAPTURE IN EVOLUTIONARY AND ECOLOGICAL GENOMICS

    PubMed Central

    Jones, Matthew R.; Good, Jeffrey M.

    2016-01-01

    The rapid expansion of next-generation sequencing has yielded a powerful array of tools to address fundamental biological questions at a scale that was inconceivable just a few years ago. Various genome partitioning strategies to sequence select subsets of the genome have emerged as powerful alternatives to whole genome sequencing in ecological and evolutionary genomic studies. High throughput targeted capture is one such strategy that involves the parallel enrichment of pre-selected genomic regions of interest. The growing use of targeted capture demonstrates its potential power to address a range of research questions, yet these approaches have yet to expand broadly across labs focused on evolutionary and ecological genomics. In part, the use of targeted capture has been hindered by the logistics of capture design and implementation in species without established reference genomes. Here we aim to 1) increase the accessibility of targeted capture to researchers working in non-model taxa by discussing capture methods that circumvent the need of a reference genome, 2) highlight the evolutionary and ecological applications where this approach is emerging as a powerful sequencing strategy, and 3) discuss the future of targeted capture and other genome partitioning approaches in light of the increasing accessibility of whole genome sequencing. Given the practical advantages and increasing feasibility of high-throughput targeted capture, we anticipate an ongoing expansion of capture-based approaches in evolutionary and ecological research, synergistic with an expansion of whole genome sequencing. PMID:26137993

  3. INDIGO – INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles

    PubMed Central

    Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B.

    2013-01-01

    Background: The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. Results: We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. Conclusions: We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo. PMID:24324765

  4. INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles.

    PubMed

    Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba Alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B

    2013-01-01

    The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.

  5. Illumina GA IIx & HiSeq 2000 Production Sequencing and QC Analysis Pipelines at the DOE Joint Genome Institute

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daum, Christopher; Zane, Matthew; Han, James

    2011-01-31

    The U.S. Department of Energy (DOE) Joint Genome Institute's (JGI) Production Sequencing group is committed to the generation of high-quality genomic DNA sequence to support the mission areas of renewable energy generation, global carbon management, and environmental characterization and clean-up. Within the JGI's Production Sequencing group, a robust Illumina Genome Analyzer and HiSeq pipeline has been established. Optimization of the sequencer pipelines has been ongoing with the aim of continual process improvement of the laboratory workflow, reducing operational costs and project cycle times to increase sample throughput, and improving the overall quality of the sequence generated. A sequence QC analysis pipeline has been implemented to automatically generate read and assembly level quality metrics. The foremost of these optimization projects, along with sequencing and operational strategies, throughput numbers, and sequencing quality results will be presented.

  6. Characterizing visible and invisible cell wall mutant phenotypes.

    PubMed

    Carpita, Nicholas C; McCann, Maureen C

    2015-07-01

    About 10% of a plant's genome is devoted to generating the protein machinery to synthesize, remodel, and deconstruct the cell wall. High-throughput genome sequencing technologies have enabled a reasonably complete inventory of wall-related genes that can be assembled into families of common evolutionary origin. Assigning function to each gene family member has been aided immensely by identification of mutants with visible phenotypes or by chemical and spectroscopic analysis of mutants with 'invisible' phenotypes of modified cell wall composition and architecture that do not otherwise affect plant growth or development. This review connects the inference of gene function on the basis of deviation from the wild type in genetic functional analyses to insights provided by modern analytical techniques that have brought us ever closer to elucidating the sequence structures of the major polysaccharide components of the plant cell wall. © The Author 2015. Published by Oxford University Press on behalf of the Society for Experimental Biology. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  7. G23D: Online tool for mapping and visualization of genomic variants on 3D protein structures.

    PubMed

    Solomon, Oz; Kunik, Vered; Simon, Amos; Kol, Nitzan; Barel, Ortal; Lev, Atar; Amariglio, Ninette; Somech, Raz; Rechavi, Gidi; Eyal, Eran

    2016-08-26

    Evaluation of the possible implications of genomic variants is an increasingly important task in the current high throughput sequencing era. Structural information however is still not routinely exploited during this evaluation process. The main reasons can be attributed to the partial structural coverage of the human proteome and the lack of tools which conveniently convert genomic positions, which are the frequent output of genomic pipelines, to proteins and structure coordinates. We present G23D, a tool for conversion of human genomic coordinates to protein coordinates and protein structures. G23D allows mapping of genomic positions/variants on evolutionary related (and not only identical) protein three dimensional (3D) structures as well as on theoretical models. By doing so it significantly extends the space of variants for which structural insight is feasible. To facilitate interpretation of the variant consequence, pathogenic variants, functional sites and polymorphism sites are displayed on protein sequence and structure diagrams alongside the input variants. G23D also provides modeling of the mutant structure, analysis of intra-protein contacts and instant access to functional predictions and predictions of thermo-stability changes. G23D is available at http://www.sheba-cancer.org.il/G23D. G23D extends the fraction of variants for which structural analysis is applicable and provides better and faster accessibility for structural data to biologists and geneticists who routinely work with genomic information.

  8. Marine Invertebrate Xenobiotic-Activated Nuclear Receptors: Their Application as Sensor Elements in High-Throughput Bioassays for Marine Bioactive Compounds

    PubMed Central

    Richter, Ingrid; Fidler, Andrew E.

    2014-01-01

    Developing high-throughput assays to screen marine extracts for bioactive compounds presents both conceptual and technical challenges. One major challenge is to develop assays that have well-grounded ecological and evolutionary rationales. In this review we propose that a specific group of ligand-activated transcription factors are particularly well-suited to act as sensors in such bioassays. More specifically, xenobiotic-activated nuclear receptors (XANRs) regulate transcription of genes involved in xenobiotic detoxification. XANR ligand-binding domains (LBDs) may adaptively evolve to bind those bioactive, and potentially toxic, compounds to which organisms are normally exposed through their specific diets. A brief overview of the function and taxonomic distribution of both vertebrate and invertebrate XANRs is first provided. Proof-of-concept experiments are then described which confirm that a filter-feeding marine invertebrate XANR LBD is activated by marine bioactive compounds. We speculate that increasing access to marine invertebrate genome sequence data, in combination with the expression of functional recombinant marine invertebrate XANR LBDs, will facilitate the generation of high-throughput bioassays/biosensors of widely differing specificities, but all based on activation of XANR LBDs. Such assays may find application in screening marine extracts for bioactive compounds that could act as drug lead compounds. PMID:25421319

  9. Next Generation Sequencing at the University of Chicago Genomics Core

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Faber, Pieter

    2013-04-24

    The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to State-of-the-Art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.

  10. Community standards for genomic resources, genetic conservation, and data integration

    Treesearch

    Jill Wegrzyn; Meg Staton; Emily Grau; Richard Cronn; C. Dana Nelson

    2017-01-01

    Genetics and genomics are increasingly important in forestry management and conservation. Next generation sequencing can increase analytical power, but still relies on building on the structure of previously acquired data. Data standards and data sharing allow the community to maximize the analytical power of high throughput genomics data. The landscape of incomplete...

  11. Ultra-barcoding in cacao (Theobroma spp.; malvaceae) using whole chloroplast genomes and nuclear ribosomal DNA

    USDA-ARS's Scientific Manuscript database

    High-throughput next-generation sequencing was used to scan the genome and generate reliable sequence of high copy number regions. Using this method, we examined whole plastid genomes as well as nearly 6000 bases of nuclear ribosomal DNA sequences for nine genotypes of Theobroma cacao and an indivi...

  12. The draft genome sequence of cork oak

    PubMed Central

    Ramos, António Marcos; Usié, Ana; Barbosa, Pedro; Barros, Pedro M.; Capote, Tiago; Chaves, Inês; Simões, Fernanda; Abreu, Isabl; Carrasquinho, Isabel; Faro, Carlos; Guimarães, Joana B.; Mendonça, Diogo; Nóbrega, Filomena; Rodrigues, Leandra; Saibo, Nelson J. M.; Varela, Maria Carolina; Egas, Conceição; Matos, José; Miguel, Célia M.; Oliveira, M. Margarida; Ricardo, Cândido P.; Gonçalves, Sónia

    2018-01-01

    Cork oak (Quercus suber) is native to southwest Europe and northwest Africa where it plays a crucial environmental and economical role. To tackle the cork oak production and industrial challenges, advanced research is imperative but dependent on the availability of a sequenced genome. To address this, we produced the first draft version of the cork oak genome. We followed a de novo assembly strategy based on high-throughput sequence data, which generated a draft genome comprising 23,347 scaffolds and 953.3 Mb in size. A total of 79,752 genes and 83,814 transcripts were predicted, including 33,658 high-confidence genes. An InterPro signature assignment was detected for 69,218 transcripts, which represented 82.6% of the total. Validation studies demonstrated the genome assembly and annotation completeness and highlighted the usefulness of the draft genome for read mapping of high-throughput sequence data generated using different protocols. All data generated is available through the public databases where it was deposited, being therefore ready to use by the academic and industry communities working on cork oak and/or related species. PMID:29786699

  13. The draft genome sequence of cork oak.

    PubMed

    Ramos, António Marcos; Usié, Ana; Barbosa, Pedro; Barros, Pedro M; Capote, Tiago; Chaves, Inês; Simões, Fernanda; Abreu, Isabl; Carrasquinho, Isabel; Faro, Carlos; Guimarães, Joana B; Mendonça, Diogo; Nóbrega, Filomena; Rodrigues, Leandra; Saibo, Nelson J M; Varela, Maria Carolina; Egas, Conceição; Matos, José; Miguel, Célia M; Oliveira, M Margarida; Ricardo, Cândido P; Gonçalves, Sónia

    2018-05-22

    Cork oak (Quercus suber) is native to southwest Europe and northwest Africa where it plays a crucial environmental and economical role. To tackle the cork oak production and industrial challenges, advanced research is imperative but dependent on the availability of a sequenced genome. To address this, we produced the first draft version of the cork oak genome. We followed a de novo assembly strategy based on high-throughput sequence data, which generated a draft genome comprising 23,347 scaffolds and 953.3 Mb in size. A total of 79,752 genes and 83,814 transcripts were predicted, including 33,658 high-confidence genes. An InterPro signature assignment was detected for 69,218 transcripts, which represented 82.6% of the total. Validation studies demonstrated the genome assembly and annotation completeness and highlighted the usefulness of the draft genome for read mapping of high-throughput sequence data generated using different protocols. All data generated is available through the public databases where it was deposited, being therefore ready to use by the academic and industry communities working on cork oak and/or related species.

  14. Noncoding RNAs in DNA Repair and Genome Integrity

    PubMed Central

    Wan, Guohui; Liu, Yunhua; Han, Cecil; Zhang, Xinna

    2014-01-01

    Abstract Significance: The well-studied sequences in the human genome are those of protein-coding genes, which account for only 1%–2% of the total genome. However, with the advent of high-throughput transcriptome sequencing technology, we now know that about 90% of our genome is extensively transcribed and that the vast majority of them are transcribed into noncoding RNAs (ncRNAs). It is of great interest and importance to decipher the functions of these ncRNAs in humans. Recent Advances: In the last decade, it has become apparent that ncRNAs play a crucial role in regulating gene expression in normal development, in stress responses to internal and environmental stimuli, and in human diseases. Critical Issues: In addition to those constitutively expressed structural RNA, such as ribosomal and transfer RNAs, regulatory ncRNAs can be classified as microRNAs (miRNAs), Piwi-interacting RNAs (piRNAs), small interfering RNAs (siRNAs), small nucleolar RNAs (snoRNAs), and long noncoding RNAs (lncRNAs). However, little is known about the biological features and functional roles of these ncRNAs in DNA repair and genome instability, although a number of miRNAs and lncRNAs are regulated in the DNA damage response. Future Directions: A major goal of modern biology is to identify and characterize the full profile of ncRNAs with regard to normal physiological functions and roles in human disorders. Clinically relevant ncRNAs will also be evaluated and targeted in therapeutic applications. Antioxid. Redox Signal. 20, 655–677. PMID:23879367

  15. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes.

    PubMed

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes.

  16. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)—A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes

    PubMed Central

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes. PMID:29250096

  17. Cell-Free Expression and In Situ Immobilization of Parasite Proteins from Clonorchis sinensis for Rapid Identification of Antigenic Candidates

    PubMed Central

    Ju, Jung Won; Kim, Ho-Cheol; Shin, Hyun-Il; Kim, Yu Jung; Kim, Dong-Myung

    2015-01-01

    Progress towards genetic sequencing of human parasites has provided the groundwork for a post-genomic approach to develop novel antigens for the diagnosis and treatment of parasite infections. To fully utilize the genomic data, however, high-throughput methodologies are required for functional analysis of the proteins encoded in the genomic sequences. In this study, we investigated cell-free expression and in situ immobilization of parasite proteins as a novel platform for the discovery of antigenic proteins. PCR-amplified parasite DNA was immobilized on microbeads that were also functionalized to capture synthesized proteins. When the microbeads were incubated in a reaction mixture for cell-free synthesis, proteins expressed from the microbead-immobilized DNA were instantly immobilized on the same microbeads, providing a physical linkage between the genetic information and encoded proteins. This approach of in situ expression and isolation enables streamlined recovery and analysis of cell-free synthesized proteins and also allows facile identification of the genes coding antigenic proteins through direct PCR of the microbead-bound DNA. PMID:26599101

  18. FMLRC: Hybrid long read error correction using an FM-index.

    PubMed

    Wang, Jeremy R; Holt, James; McMillan, Leonard; Jones, Corbin D

    2018-02-09

    Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging "hybrid" assemblies that use long reads for scaffolding and short reads for accuracy. We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read only de novo assembly methods. Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Higher throughput and computational efficiency than existing methods will help make more economical use of emerging long read sequencing technologies.
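
    The data structure named in this abstract can be illustrated with a small sketch. This is not the FMLRC implementation: it is a naive FM-index over a few short reads joined by "$" separators, with hypothetical function names, showing only how backward search counts the occurrences of a k-mer in the short-read set. Such counts are the kind of evidence a corrector could consult when deciding whether a k-mer in a long read is supported.

```python
def build_fm_index(reads):
    """Build a naive FM-index over reads joined by '$' (toy-sized inputs only)."""
    text = "$".join(reads) + "$"
    # Suffix array by direct sorting; fine for a sketch, far too slow for real data.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # first_col[c] = number of characters in the text strictly smaller than c.
    first_col, total = {}, 0
    for c in alphabet:
        first_col[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return first_col, occ, len(bwt)

def count_occurrences(index, pattern):
    """Count occurrences of pattern in the indexed reads by backward search."""
    first_col, occ, n = index
    lo, hi = 0, n  # half-open range of matching rows in the sorted suffixes
    for ch in reversed(pattern):
        if ch not in first_col:
            return 0
        lo = first_col[ch] + occ[ch][lo]
        hi = first_col[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo

short_reads = ["ACGTACG", "TTACG"]
index = build_fm_index(short_reads)
for kmer in ["ACG", "CGT", "GTT"]:
    print(kmer, count_occurrences(index, kmer))
```

    Because reads are separated by "$", a k-mer that would only arise by running one read into the next (here "GTT") is not counted.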

  19. HITS-CLIP yields genome-wide insights into brain alternative RNA processing

    NASA Astrophysics Data System (ADS)

    Licatalosi, Donny D.; Mele, Aldo; Fak, John J.; Ule, Jernej; Kayikci, Melis; Chi, Sung Wook; Clark, Tyson A.; Schweitzer, Anthony C.; Blume, John E.; Wang, Xuning; Darnell, Jennifer C.; Darnell, Robert B.

    2008-11-01

    Protein-RNA interactions have critical roles in all aspects of gene expression. However, applying biochemical methods to understand such interactions in living tissues has been challenging. Here we develop a genome-wide means of mapping protein-RNA binding sites in vivo, by high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP). HITS-CLIP analysis of the neuron-specific splicing factor Nova revealed extremely reproducible RNA-binding maps in multiple mouse brains. These maps provide genome-wide in vivo biochemical footprints confirming the previous prediction that the position of Nova binding determines the outcome of alternative splicing; moreover, they are sufficiently powerful to predict Nova action de novo. HITS-CLIP revealed a large number of Nova-RNA interactions in 3' untranslated regions, leading to the discovery that Nova regulates alternative polyadenylation in the brain. HITS-CLIP, therefore, provides a robust, unbiased means to identify functional protein-RNA interactions in vivo.

  20. TabSQL: a MySQL tool to facilitate mapping user data to public databases.

    PubMed

    Xia, Xiao-Qin; McClelland, Michael; Wang, Yipeng

    2010-06-23

    With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data.
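
    The kind of query described above, user data joined against a downloaded annotation table, can be sketched as follows. This is an assumption-laden illustration, not TabSQL itself: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names, column names and gene identifiers are invented for the example.

```python
import sqlite3


def annotate(user_rows, annotation_rows):
    """Left-join user measurements to an annotation table by gene ID.

    user_rows: iterable of (gene_id, fold_change)
    annotation_rows: iterable of (gene_id, go_term)
    Both schemas are hypothetical.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user_data (gene_id TEXT, fold_change REAL)")
    con.execute("CREATE TABLE annotation (gene_id TEXT, go_term TEXT)")
    con.executemany("INSERT INTO user_data VALUES (?, ?)", user_rows)
    con.executemany("INSERT INTO annotation VALUES (?, ?)", annotation_rows)
    # LEFT JOIN keeps user rows that have no annotation (go_term is None)
    rows = con.execute(
        "SELECT u.gene_id, u.fold_change, a.go_term "
        "FROM user_data AS u "
        "LEFT JOIN annotation AS a ON u.gene_id = a.gene_id "
        "ORDER BY u.gene_id, a.go_term"
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    user = [("geneA", 2.5), ("geneB", -1.0)]  # made-up values
    annotation = [("geneA", "term1")]         # made-up annotation
    for row in annotate(user, annotation):
        print(row)
```

    A left join is used so that unannotated genes remain visible in the result rather than silently dropping out.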

  1. TabSQL: a MySQL tool to facilitate mapping user data to public databases

    PubMed Central

    2010-01-01

    Background: With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. Results: We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. Conclusions: TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data. PMID:20573251

  2. Extensive Genome Rearrangements and Multiple Horizontal Gene Transfers in a Population of Pyrococcus Isolates from Vulcano Island, Italy

    PubMed Central

    White, James R.; Escobar-Paramo, Patricia; Mongodin, Emmanuel F.; Nelson, Karen E.; DiRuggiero, Jocelyne

    2008-01-01

    The extent of chromosome rearrangements in Pyrococcus isolates from marine hydrothermal vents in Vulcano Island, Italy, was evaluated by high-throughput genomic methods. The results illustrate the dynamic nature of the genomes of the genus Pyrococcus and raise the possibility of a connection between rapidly changing environmental conditions and adaptive genomic properties. PMID:18723649

  3. Canopy Temperature and Vegetation Indices from High-Throughput Phenotyping Improve Accuracy of Pedigree and Genomic Selection for Grain Yield in Wheat

    PubMed Central

    Rutkoski, Jessica; Poland, Jesse; Mondal, Suchismita; Autrique, Enrique; Pérez, Lorena González; Crossa, José; Reynolds, Matthew; Singh, Ravi

    2016-01-01

    Genomic selection can be applied prior to phenotyping, enabling shorter breeding cycles and greater rates of genetic gain relative to phenotypic selection. Traits measured using high-throughput phenotyping based on proximal or remote sensing could be useful for improving pedigree and genomic prediction model accuracies for traits not yet possible to phenotype directly. We tested if using aerial measurements of canopy temperature, and green and red normalized difference vegetation index as secondary traits in pedigree and genomic best linear unbiased prediction models could increase accuracy for grain yield in wheat, Triticum aestivum L., using 557 lines in five environments. Secondary traits on training and test sets, and grain yield on the training set were modeled as multivariate, and compared to univariate models with grain yield on the training set only. Cross validation accuracies were estimated within and across-environment, with and without replication, and with and without correcting for days to heading. We observed that, within environment, with unreplicated secondary trait data, and without correcting for days to heading, secondary traits increased accuracies for grain yield by 56% in pedigree, and 70% in genomic prediction models, on average. Secondary traits increased accuracy slightly more when replicated, and considerably less when models corrected for days to heading. In across-environment prediction, trends were similar but less consistent. These results show that secondary traits measured in high-throughput could be used in pedigree and genomic prediction to improve accuracy. This approach could improve selection in wheat during early stages if validated in early-generation breeding plots. PMID:27402362
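
    The secondary traits above include red and green normalized difference vegetation indices. The standard definitions are NDVI = (NIR - Red) / (NIR + Red) and green NDVI = (NIR - Green) / (NIR + Green), where each term is a band reflectance. A minimal sketch follows; the function names and reflectance values are illustrative assumptions, not data from the study.

```python
def normalized_difference(nir, band):
    """Return (nir - band) / (nir + band) for two reflectance values."""
    total = nir + band
    if total == 0:
        raise ValueError("nir + band must be non-zero")
    return (nir - band) / total


def red_ndvi(nir, red):
    """Red NDVI from near-infrared and red reflectance."""
    return normalized_difference(nir, red)


def green_ndvi(nir, green):
    """Green NDVI from near-infrared and green reflectance."""
    return normalized_difference(nir, green)


if __name__ == "__main__":
    # Made-up reflectance values for a single plot
    print(red_ndvi(0.5, 0.1), green_ndvi(0.5, 0.2))
```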

  4. Emerging insights on intestinal dysbiosis during bacterial infections

    PubMed Central

    Pham, Tu Anh N; Lawley, Trevor D

    2014-01-01

    Infection of the gastrointestinal tract is commonly linked to pathological imbalances of the resident microbiota, termed dysbiosis. In recent years, advanced high-throughput genomic approaches have allowed us to examine the microbiota in an unprecedented manner, revealing novel biological insights about infection-associated dysbiosis at the community and individual species levels. A dysbiotic microbiota is typically reduced in taxonomic diversity and metabolic function, and can harbour pathobionts that exacerbate intestinal inflammation or manifest systemic disease. Dysbiosis can also promote pathogen genome evolution, while allowing the pathogens to persist at high density and transmit to new hosts. A deeper understanding of bacterial pathogenicity in the context of the intestinal microbiota should unveil new approaches for developing diagnostics and therapies for enteropathogens. PMID:24581695

  5. The re-emergence of natural products for drug discovery in the genomics era.

    PubMed

    Harvey, Alan L; Edrada-Ebel, RuAngelie; Quinn, Ronald J

    2015-02-01

    Natural products have been a rich source of compounds for drug discovery. However, their use has diminished in the past two decades, in part because of technical barriers to screening natural products in high-throughput assays against molecular targets. Here, we review strategies for natural product screening that harness the recent technical advances that have reduced these barriers. We also assess the use of genomic and metabolomic approaches to augment traditional methods of studying natural products, and highlight recent examples of natural products in antimicrobial drug discovery and as inhibitors of protein-protein interactions. The growing appreciation of functional assays and phenotypic screens may further contribute to a revival of interest in natural products for drug discovery.

  6. Family genome browser: visualizing genomes with pedigree information.

    PubMed

    Juan, Liran; Liu, Yongzhuang; Wang, Yongtian; Teng, Mingxiang; Zang, Tianyi; Wang, Yadong

    2015-07-15

    Families with inherited diseases are widely used in Mendelian/complex disease studies. Owing to advances in high-throughput sequencing technologies, family genome sequencing is becoming increasingly prevalent. Visualizing family genomes can greatly facilitate human genetics studies and personalized medicine. However, due to the complex genetic relationships and high similarities among genomes of consanguineous family members, family genomes are difficult to visualize in traditional genome visualization frameworks. How to visualize family genome variants and their functions with integrated pedigree information remains a critical challenge. We developed the Family Genome Browser (FGB) to provide comprehensive analysis and visualization for family genomes. The FGB can visualize family genomes effectively at both the individual level and the variant level by integrating genome data with pedigree information. Family genome analysis, including determination of parental origin of the variants, detection of de novo mutations, identification of potential recombination events and identical-by-descent segments, etc., can be performed flexibly. Diverse annotations for the family genome variants, such as dbSNP memberships, linkage disequilibrium, genes, variant effects, potential phenotypes, etc., are illustrated as well. Moreover, the FGB can automatically search for de novo mutations and compound heterozygous variants for a selected individual, and guide investigators to find high-risk genes with flexible navigation options. These features enable users to investigate and understand family genomes intuitively and systematically. The FGB is available at http://mlg.hit.edu.cn/FGB/. © The Author 2015. Published by Oxford University Press.

  7. Applying Genomic and Genetic Tools to Understand and Mitigate Damage from Exposure to Toxins

    DTIC Science & Technology

    2013-10-01

    sequences to the human genome. Genome Biol 10, R25 (2009). 26 Award number: W81XWH-09-1-0715 Title: Applying Genomic and Genetic Tools to Understand...utilizing the high-throughput technology of mRNA-seq. BODY The goal of our research program (W81XWH-09-1-0715) was to utilize genetic and genomic...also acquired the achetf222a * * * * * 5 Award number: W81XWH-09-1-0715 Title: Applying Genomic and Genetic Tools to Understand and Mitigate

  8. OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid.

    PubMed

    Poehlman, William L; Rynge, Mats; Branton, Chris; Balamurugan, D; Feltus, Frank A

    2016-01-01

    High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments.
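
    A gene expression matrix (GEM) of the kind described above has one row per gene and one column per sample. The sketch below covers only the final assembly step, merging per-sample count tables into a matrix. It is not the OSG-GEM workflow, which also handles read mapping and quantification; the function name, input layout and zero-fill choice are assumptions made for illustration.

```python
def build_gem(sample_counts):
    """Merge per-sample gene counts into a genes x samples matrix.

    sample_counts: dict mapping sample name -> dict of gene -> count.
    Genes missing from a sample are filled with 0 (an assumption).
    Returns (genes, samples, matrix) with rows and columns sorted by name.
    """
    samples = sorted(sample_counts)
    genes = sorted({gene for counts in sample_counts.values() for gene in counts})
    matrix = [
        [sample_counts[sample].get(gene, 0) for sample in samples]
        for gene in genes
    ]
    return genes, samples, matrix


if __name__ == "__main__":
    # Made-up counts for two hypothetical samples
    counts = {
        "sample1": {"geneA": 10, "geneB": 3},
        "sample2": {"geneA": 7, "geneC": 1},
    }
    genes, samples, matrix = build_gem(counts)
    print("gene", *samples, sep="\t")
    for gene, row in zip(genes, matrix):
        print(gene, *row, sep="\t")
```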

  9. OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid

    PubMed Central

    Poehlman, William L.; Rynge, Mats; Branton, Chris; Balamurugan, D.; Feltus, Frank A.

    2016-01-01

    High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments. PMID:27499617

  10. A technological update of molecular diagnostics for infectious diseases

    PubMed Central

    Liu, Yu-Tsueng

    2008-01-01

    Identification of a causative pathogen is essential for the choice of treatment for most infectious diseases. Many FDA approved molecular assays, usually more sensitive and specific than traditional tests, have been developed in the last decade. A new trend of high throughput and multiplexing assays is emerging thanks to technological developments for the human genome sequencing project. The applications of microarray and ultra high throughput sequencing technologies for diagnostic microbiology are reviewed. The race for the $1000 genome technology by 2014 will have a profound impact on the diagnosis and treatment of infectious diseases in the near future. PMID:18782035

  11. Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing

    PubMed Central

    Shafer, Aaron B. A.; Northrup, Joseph M.; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B. W.

    2016-01-01

    Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations. PMID:26745372

  12. Yeast for virus research

    PubMed Central

    Zhao, Richard Yuqi

    2017-01-01

    Budding yeast (Saccharomyces cerevisiae) and fission yeast (Schizosaccharomyces pombe) are two popular model organisms for virus research. They are natural hosts for viruses as they carry their own indigenous viruses. Both yeasts have been used for studies of plant, animal and human viruses. Many positive sense (+) RNA viruses and some DNA viruses replicate at various levels in yeasts, thus allowing study of those viral activities during the viral life cycle. Yeasts are single cell eukaryotic organisms. Hence, many of the fundamental cellular functions such as cell cycle regulation or programmed cell death are highly conserved from yeasts to higher eukaryotes. Therefore, they are particularly suited to study the impact of those viral activities on related cellular activities during virus-host interactions. Yeasts present many unique advantages in virus research over higher eukaryotes. Yeast cells are easy to maintain in the laboratory and have relatively short doubling times. They are non-biohazardous and genetically amenable, with small genomes that permit genome-wide analysis of virologic and cellular functions. In this review, similarities and differences of these two yeasts are described. Studies of virologic activities such as viral translation, viral replication and genome-wide study of virus-cell interactions in yeasts are highlighted. Impacts of viral proteins on basic cellular functions such as cell cycle regulation and programmed cell death are discussed. Potential applications of using yeasts as hosts to carry out functional analysis of small viral genomes and to develop high throughput drug screening platforms for the discovery of antiviral drugs are presented. PMID:29082230

  13. High-throughput and targeted in-depth mass spectrometry-based approaches for biofluid profiling and biomarker discovery.

    PubMed

    Jimenez, Connie R; Piersma, Sander; Pham, Thang V

    2007-12-01

    Proteomics aims to create a link between genomic information, biological function and disease through global studies of protein expression, modification and protein-protein interactions. Recent advances in key proteomics tools, such as mass spectrometry (MS) and (bio)informatics, provide tremendous opportunities for biomarker-related clinical applications. In this review, we focus on two complementary MS-based approaches with high potential for the discovery of biomarker patterns and low-abundant candidate biomarkers in biofluids: high-throughput matrix-assisted laser desorption/ionization time-of-flight mass spectroscopy-based methods for peptidome profiling and label-free liquid chromatography-based methods coupled to MS for in-depth profiling of biofluids with a focus on subproteomes, including the low-molecular-weight proteome, carrier-bound proteome and N-linked glycoproteome. The two approaches differ in their aims, throughput and sensitivity. We discuss recent progress and challenges in the analysis of plasma/serum and proximal fluids using these strategies and highlight the potential of liquid chromatography-MS-based proteomics of cancer cell and tumor secretomes for the discovery of candidate blood-based biomarkers. Strategies for candidate validation are also described.

  14. Development and use of molecular markers: past and present.

    PubMed

    Grover, Atul; Sharma, P C

    2016-01-01

    Molecular markers, due to their stability, cost-effectiveness and ease of use, provide an immensely popular tool for a variety of applications including genome mapping, gene tagging, genetic diversity, phylogenetic analysis and forensic investigations. In the last three decades, a number of molecular marker techniques have been developed and exploited worldwide in different systems. However, only a handful of these techniques, namely RFLPs, RAPDs, AFLPs, ISSRs, SSRs and SNPs, have received global acceptance. A recent revolution in DNA sequencing techniques has taken the discovery and application of molecular markers to high-throughput and ultrahigh-throughput levels. Although the choice of marker will obviously depend on the targeted use, microsatellites, SNPs and genotyping by sequencing (GBS) largely fulfill most user requirements. Further, modern transcriptomic and functional markers, in combination with other high throughput techniques, will lead future efforts in high-density genetic map construction, identification of QTLs, and breeding and conservation strategies. This review presents an overview of different marker technologies and their variants with a comparative account of their characteristic features and applications.

  15. High-Throughput Sequencing, a Versatile Weapon to Support Genome-Based Diagnosis in Infectious Diseases: Applications to Clinical Bacteriology

    PubMed Central

    Caboche, Ségolène; Audebert, Christophe; Hot, David

    2014-01-01

    Recent progress in high-throughput sequencing (HTS) technologies enables easy and cost-reduced access to whole genome sequencing (WGS) or re-sequencing. HTS associated with adapted, automatic and fast bioinformatics solutions for sequencing applications promises an accurate and timely identification and characterization of pathogenic agents. Many studies have demonstrated that data obtained from HTS analysis have allowed genome-based diagnosis, which has been consistent with phenotypic observations. These proofs of concept are probably the first steps toward the future of clinical microbiology. From concept to routine use, many parameters need to be considered to promote HTS as a powerful tool to help physicians and clinicians in microbiological investigations. This review highlights the milestones to be completed toward this purpose. PMID:25437800

  16. Form and function of topologically associating genomic domains in budding yeast.

    PubMed

    Eser, Umut; Chandler-Brown, Devon; Ay, Ferhat; Straight, Aaron F; Duan, Zhijun; Noble, William Stafford; Skotheim, Jan M

    2017-04-11

    The genome of metazoan cells is organized into topologically associating domains (TADs) that have similar histone modifications, transcription level, and DNA replication timing. Although similar structures appear to be conserved in fission yeast, computational modeling and analysis of high-throughput chromosome conformation capture (Hi-C) data have been used to argue that the small, highly constrained budding yeast chromosomes could not have these structures. In contrast, herein we analyze Hi-C data for budding yeast and identify 200-kb scale TADs, whose boundaries are enriched for transcriptional activity. Furthermore, these boundaries separate regions of similarly timed replication origins connecting the long-known effect of genomic context on replication timing to genome architecture. To investigate the molecular basis of TAD formation, we performed Hi-C experiments on cells depleted for the Forkhead transcription factors, Fkh1 and Fkh2, previously associated with replication timing. Forkhead factors do not regulate TAD formation, but do promote longer-range genomic interactions and control interactions between origins near the centromere. Thus, our work defines spatial organization within the budding yeast nucleus, demonstrates the conserved role of genome architecture in regulating DNA replication, and identifies a molecular mechanism specifically regulating interactions between pericentric origins.

  17. Development and application of a novel genome-wide SNP array reveals domestication history in soybean

    PubMed Central

    Wang, Jiao; Chu, Shanshan; Zhang, Huairen; Zhu, Ying; Cheng, Hao; Yu, Deyue

    2016-01-01

    Domestication of soybeans occurred under intense human-directed selection aimed at developing high-yielding lines. Tracing the domestication history and identifying the genes underlying soybean domestication require further exploration. Here, we developed a high-throughput NJAU 355 K SoySNP array and used this array to study the genetic variation patterns in 367 soybean accessions, including 105 wild soybeans and 262 cultivated soybeans. The population genetic analysis suggests that cultivated soybeans have tended to originate from northern and central China, from where they spread to other regions, accompanied by a gradual increase in seed weight. Genome-wide scanning for evidence of artificial selection revealed signs of selective sweeps involving genes controlling domestication-related agronomic traits including seed weight. To further identify genomic regions related to seed weight, a genome-wide association study (GWAS) was conducted across multiple environments in wild and cultivated soybeans. As a result, a strong linkage disequilibrium region on chromosome 20 was found to be significantly correlated with seed weight in cultivated soybeans. Collectively, these findings should provide an important basis for genomic-enabled breeding and advance the study of functional genomics in soybean. PMID:26856884

  18. Development and application of a novel genome-wide SNP array reveals domestication history in soybean.

    PubMed

    Wang, Jiao; Chu, Shanshan; Zhang, Huairen; Zhu, Ying; Cheng, Hao; Yu, Deyue

    2016-02-09

    Domestication of soybeans occurred under intense human-directed selection aimed at developing high-yielding lines. Tracing the domestication history and identifying the genes underlying soybean domestication require further exploration. Here, we developed a high-throughput NJAU 355 K SoySNP array and used this array to study the genetic variation patterns in 367 soybean accessions, including 105 wild soybeans and 262 cultivated soybeans. The population genetic analysis suggests that cultivated soybeans have tended to originate from northern and central China, from where they spread to other regions, accompanied by a gradual increase in seed weight. Genome-wide scanning for evidence of artificial selection revealed signs of selective sweeps involving genes controlling domestication-related agronomic traits including seed weight. To further identify genomic regions related to seed weight, a genome-wide association study (GWAS) was conducted across multiple environments in wild and cultivated soybeans. As a result, a strong linkage disequilibrium region on chromosome 20 was found to be significantly correlated with seed weight in cultivated soybeans. Collectively, these findings should provide an important basis for genomic-enabled breeding and advance the study of functional genomics in soybean.

  19. A re-sequencing based assessment of genomic heterogeneity and fast neutron-induced deletions in a common bean cultivar

    USDA-ARS's Scientific Manuscript database

    A small fast neutron mutant population has been established from Phaseolus vulgaris cv. Red Hawk. We leveraged the available P. vulgaris genome sequence and high throughput next generation DNA sequencing to examine the genomic structure of five Phaseolus vulgaris cv. Red Hawk fast neutron mutants wi...

  20. Coding Complete Genome for the Mogiana Tick Virus, a Jingmenvirus Isolated from Ticks in Brazil

    DTIC Science & Technology

    2017-05-04

    sequences for all four genome segments. We downloaded the raw Illumina sequence reads from the NCBI Short Read Archive (GenBank...MGTV genome segments through sequence similarity (BLASTN) to the published genome of Jingmen tick virus (JMTV) isolate SY84 (GenBank: KJ001579-KJ001582...2014. Standards for sequencing viral genomes in the era of high-throughput sequencing. MBio 5:e01360–14. 8. Bankevich A, Nurk S, Antipov

  1. Crop 3D-a LiDAR based platform for 3D high-throughput crop phenotyping.

    PubMed

    Guo, Qinghua; Wu, Fangfang; Pang, Shuxin; Zhao, Xiaoqian; Chen, Linhai; Liu, Jin; Xue, Baolin; Xu, Guangcai; Li, Le; Jing, Haichun; Chu, Chengcai

    2018-03-01

    With a growing population and shrinking arable land, breeding has been considered an effective way to address the food crisis. As an important part of breeding, high-throughput phenotyping can accelerate the breeding process effectively. Light detection and ranging (LiDAR) is an active remote sensing technology that is capable of acquiring three-dimensional (3D) data accurately, and has great potential in crop phenotyping. Given that crop phenotyping based on LiDAR technology is not common in China, we developed a high-throughput crop phenotyping platform, named Crop 3D, which integrates a LiDAR sensor, a high-resolution camera, a thermal camera and a hyperspectral imager. Compared with traditional crop phenotyping techniques, Crop 3D can acquire multi-source phenotypic data throughout the crop growing period and extract plant height, plant width, leaf length, leaf width, leaf area, leaf inclination angle and other parameters for plant biology and genomics analysis. In this paper, we describe the design, functions and testing results of the Crop 3D platform, and briefly discuss the potential applications and future development of the platform in phenotyping. We conclude that platforms integrating LiDAR and traditional remote sensing techniques might be the future trend of high-throughput crop phenotyping.

  2. Efficient high-throughput sequencing of a laser microdissected chromosome arm

    PubMed Central

    2013-01-01

    Background: Genomic sequence assemblies are key tools for a broad range of gene function and evolutionary studies. The diploid amphibian Xenopus tropicalis plays a pivotal role in these fields due to its combination of experimental flexibility, diploid genome, and early-branching tetrapod taxonomic position, having diverged from the amniote lineage ~360 million years ago. A genome assembly and a genetic linkage map have recently been made available. Unfortunately, large gaps in the linkage map attenuate long-range integrity of the genome assembly. Results: We laser dissected the short arm of X. tropicalis chromosome 7 for next generation sequencing and computational mapping to the reference genome. This arm is of particular interest as it encodes the sex determination locus, but its genetic map contains large gaps which undermine available genome assemblies. Whole genome amplification of 15 laser-microdissected 7p arms followed by next generation sequencing yielded ~35 million reads, over four million of which uniquely mapped to the X. tropicalis genome. Our analysis placed more than 200 previously unmapped scaffolds on the analyzed chromosome arm, providing valuable low-resolution physical map information for de novo genome assembly. Conclusion: We present a new approach for improving and validating genetic maps and sequence assemblies. Whole genome amplification of 15 microdissected chromosome arms provided sufficient high-quality material for localizing previously unmapped scaffolds and genes as well as recognizing mislocalized scaffolds. PMID:23714049

  3. High Throughput Sequence Analysis for Disease Resistance in Maize

    USDA-ARS's Scientific Manuscript database

    Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...

  4. Epigenetic regulation of gene expression in cancer: techniques, resources and analysis

    PubMed Central

    Kagohara, Luciane T; Stein-O’Brien, Genevieve L; Kelley, Dylan; Flam, Emily; Wick, Heather C; Danilova, Ludmila V; Easwaran, Hariharan; Favorov, Alexander V; Qian, Jiang; Gaykalova, Daria A; Fertig, Elana J

    2018-01-01

    Cancer is a complex disease, driven by aberrant activity in numerous signaling pathways in even individual malignant cells. Epigenetic changes are critical mediators of these functional changes that drive and maintain the malignant phenotype. Changes in DNA methylation, histone acetylation and methylation, noncoding RNAs and posttranslational modifications are all epigenetic drivers in cancer, independent of changes in the DNA sequence. These epigenetic alterations were once thought to be crucial only for the maintenance of the malignant phenotype. Now, epigenetic alterations are also recognized as critical for disrupting essential pathways that protect the cells from uncontrolled growth, longer survival and establishment in distant sites from the original tissue. In this review, we focus on DNA methylation and chromatin structure in cancer. The precise functional role of these alterations is an area of active research using emerging high-throughput approaches and bioinformatics analysis tools. Therefore, this review also describes these high-throughput measurement technologies, public domain databases for high-throughput epigenetic data in tumors and model systems and bioinformatics algorithms for their analysis. Advances in bioinformatics data that combine these epigenetic data with genomics data are essential to infer the function of specific epigenetic alterations in cancer. These integrative algorithms are also a focus of this review. Future studies using these emerging technologies will elucidate how alterations in the cancer epigenome cooperate with genetic aberrations during tumor initiation and progression. This deeper understanding is essential to future studies with epigenetics biomarkers and precision medicine using emerging epigenetic therapies. PMID:28968850

  5. WholePathwayScope: a comprehensive pathway-based analysis tool for high-throughput data

    PubMed Central

    Yi, Ming; Horton, Jay D; Cohen, Jonathan C; Hobbs, Helen H; Stephens, Robert M

    2006-01-01

    Background: Analysis of High Throughput (HTP) Data such as microarray and proteomics data has provided a powerful methodology to study patterns of gene regulation at genome scale. A major unresolved problem in the post-genomic era is to assemble the large amounts of data generated into a meaningful biological context. We have developed a comprehensive software tool, WholePathwayScope (WPS), for deriving biological insights from analysis of HTP data. Results: WPS extracts gene lists with shared biological themes through color cue templates. WPS statistically evaluates global functional category enrichment of gene lists and pathway-level pattern enrichment of data. WPS incorporates well-known biological pathways from KEGG (Kyoto Encyclopedia of Genes and Genomes) and Biocarta, GO (Gene Ontology) terms as well as user-defined pathways or relevant gene clusters or groups, and explores gene-term relationships within the derived gene-term association networks (GTANs). WPS simultaneously compares multiple datasets within biological contexts either as pathways or as association networks. WPS also integrates Genetic Association Database and Partial MedGene Database for disease-association information. We have used this program to analyze and compare microarray and proteomics datasets derived from a variety of biological systems. Application examples demonstrated the capacity of WPS to significantly facilitate the analysis of HTP data for integrative discovery. Conclusion: This tool represents a pathway-based platform for discovery integration to maximize analysis power. The tool is freely available at . PMID:16423281
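
    The abstract above mentions statistical evaluation of functional category enrichment without naming the test. One widely used choice for this kind of question is the one-sided hypergeometric test, sketched below. This is an assumption about the general technique, not a description of what WPS computes; the function name and parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(population, category_size, list_size, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    population: total number of genes considered
    category_size: genes in the population that belong to the category
    list_size: genes in the user's gene list
    overlap: genes in the list that belong to the category
    """
    upper = min(category_size, list_size)
    total = comb(population, list_size)
    # Sum the probability of every overlap at least as large as observed
    tail = sum(
        comb(category_size, k) * comb(population - category_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up numbers: 20 of 100 listed genes fall in a 500-gene category
    # drawn from a population of 20000 genes
    print(enrichment_p_value(20000, 500, 100, 20))
```

    A small p-value suggests the category is over-represented in the gene list relative to chance; in practice such values are usually corrected for the number of categories tested.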

  6. Generation of genetically modified mice using CRISPR/Cas9 and haploid embryonic stem cell systems

    PubMed Central

    JIN, Li-Fang; LI, Jin-Song

    2016-01-01

    With the development of high-throughput sequencing technology in the post-genomic era, researchers have concentrated their efforts on elucidating the relationships between genes and their corresponding functions. Recently, important progress has been achieved in the generation of genetically modified mice based on CRISPR/Cas9 and haploid embryonic stem cell (haESC) approaches, which provide new platforms for gene function analysis, human disease modeling, and gene therapy. Here, we review the CRISPR/Cas9 and haESC technology for the generation of genetically modified mice and discuss the key challenges in the application of these approaches. PMID:27469251

  7. MEGANTE: A Web-Based System for Integrated Plant Genome Annotation

    PubMed Central

    Numa, Hisataka; Itoh, Takeshi

    2014-01-01

    The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demands for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and knowledge of bioinformatics. Here we present MEGANTE, a web-based annotation system that makes plant genome annotation easy for researchers unfamiliar with bioinformatics. Without any complicated configuration, users can perform genomic sequence annotations simply by uploading a sequence and selecting the species to query. MEGANTE automatically runs several analysis programs and integrates the results to select the appropriate consensus exon–intron structures and to predict open reading frames (ORFs) at each locus. Functional annotation, including a similarity search against known proteins and a functional domain search, are also performed for the predicted ORFs. The resultant annotation information is visualized with a widely used genome browser, GBrowse. For ease of analysis, the results can be downloaded in Microsoft Excel format. All of the query sequences and annotation results are stored on the server side so that users can access their own data from virtually anywhere on the web. The current release of MEGANTE targets 24 plant species from the Brassicaceae, Fabaceae, Musaceae, Poaceae, Salicaceae, Solanaceae, Rosaceae and Vitaceae families, and it allows users to submit a sequence up to 10 Mb in length and to save up to 100 sequences with the annotation information on the server. The MEGANTE web service is available at https://megante.dna.affrc.go.jp/. PMID:24253915

  8. Construction of a dairy microbial genome catalog opens new perspectives for the metagenomic analysis of dairy fermented products.

    PubMed

    Almeida, Mathieu; Hébert, Agnès; Abraham, Anne-Laure; Rasmussen, Simon; Monnet, Christophe; Pons, Nicolas; Delbès, Céline; Loux, Valentin; Batto, Jean-Michel; Leonard, Pierre; Kennedy, Sean; Ehrlich, Stanislas Dusko; Pop, Mihai; Montel, Marie-Christine; Irlinger, Françoise; Renault, Pierre

    2014-12-13

    Microbial communities of traditional cheeses are complex and insufficiently characterized. The origin, safety and functional role in cheese making of these microbial communities are still not well understood. Metagenomic analysis of these communities by high throughput shotgun sequencing is a promising approach to characterize their genomic and functional profiles. Such analyses, however, critically depend on the availability of appropriate reference genome databases against which the sequencing reads can be aligned. We built a reference genome catalog suitable for short read metagenomic analysis using a low-cost sequencing strategy. We selected 142 bacteria isolated from dairy products belonging to 137 different species and 67 genera, and succeeded in reconstructing the draft genomes of 117 of them at a standard or high quality level, including isolates from the genera Kluyvera, Luteococcus and Marinilactibacillus, still missing from public databases. To demonstrate the potential of this catalog, we analysed the microbial composition of the surface of two smear cheeses and one blue-veined cheese, and showed that a significant part of the microbiota of these traditional cheeses was composed of microorganisms newly sequenced in our study. Our study provides data, which combined with publicly available genome references, represents the most expansive catalog to date of cheese-associated bacteria. Using this extended dairy catalog, we revealed the presence in traditional cheese of dominant microorganisms not deliberately inoculated, mainly Gram-negative genera such as Pseudoalteromonas haloplanktis or Psychrobacter immobilis, that may contribute to the characteristics of cheese produced through traditional methods.

  9. [The ENCODE project and functional genomics studies].

    PubMed

    Ding, Nan; Qu, Hongzhu; Fang, Xiangdong

    2014-03-01

    Upon the completion of the Human Genome Project, scientists have been trying to interpret the underlying genomic code for human biology. Since 2003, the National Human Genome Research Institute (NHGRI) has invested nearly $0.3 billion and gathered over 440 scientists from more than 32 institutions in the United States, China, United Kingdom, Japan, Spain and Singapore to initiate the Encyclopedia of DNA Elements (ENCODE) project, aiming to identify and analyze all regulatory elements in the human genome. Taking advantage of the development of next-generation sequencing technologies and continuous improvement of experimental methods, ENCODE has made remarkable achievements: it identified methylation and histone modification of DNA sequences and their regulatory effects on gene expression through altering chromatin structures, categorized binding sites of various transcription factors and constructed their regulatory networks, further revised and updated databases for pseudogenes and non-coding RNA, and identified SNPs in regulatory sequences associated with diseases. These findings help to comprehensively understand information embedded in gene and genome sequences, the function of regulatory elements as well as the molecular mechanism underlying the transcriptional regulation by noncoding regions, and provide extensive data resources for life sciences, particularly for translational medicine. We reviewed the contributions of high-throughput sequencing platform development and bioinformatical technology improvement to the ENCODE project, the association between epigenetics studies and the ENCODE project, and the major achievements of the ENCODE project. We also provided our perspective on the role of the ENCODE project in promoting the development of basic and clinical medicine.

  10. Genome sequencing in microfabricated high-density picolitre reactors.

    PubMed

    Margulies, Marcel; Egholm, Michael; Altman, William E; Attiya, Said; Bader, Joel S; Bemben, Lisa A; Berka, Jan; Braverman, Michael S; Chen, Yi-Ju; Chen, Zhoutao; Dewell, Scott B; Du, Lei; Fierro, Joseph M; Gomes, Xavier V; Godwin, Brian C; He, Wen; Helgesen, Scott; Ho, Chun Heen; Ho, Chun He; Irzyk, Gerard P; Jando, Szilveszter C; Alenquer, Maria L I; Jarvie, Thomas P; Jirage, Kshama B; Kim, Jong-Bum; Knight, James R; Lanza, Janna R; Leamon, John H; Lefkowitz, Steven M; Lei, Ming; Li, Jing; Lohman, Kenton L; Lu, Hong; Makhijani, Vinod B; McDade, Keith E; McKenna, Michael P; Myers, Eugene W; Nickerson, Elizabeth; Nobile, John R; Plant, Ramona; Puc, Bernard P; Ronan, Michael T; Roth, George T; Sarkis, Gary J; Simons, Jan Fredrik; Simpson, John W; Srinivasan, Maithreyan; Tartaro, Karrie R; Tomasz, Alexander; Vogt, Kari A; Volkmer, Greg A; Wang, Shally H; Wang, Yong; Weiner, Michael P; Yu, Pengguang; Begley, Richard F; Rothberg, Jonathan M

    2005-09-15

    The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.
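
    The throughput figures quoted in the abstract above can be turned into per-hour rates with simple arithmetic. The sketch below does only that. It assumes the "approximately 100-fold" increase compares bases per unit time, which the abstract does not state explicitly, so the implied Sanger baseline is a rough order-of-magnitude figure rather than a reported value. The function name is hypothetical.

```python
def throughput_summary(bases_per_run=25_000_000, run_hours=4, fold_increase=100):
    """Convert the abstract's run-level figures into rates.

    Defaults come from the abstract: 25 million bases in one four-hour run
    and an approximately 100-fold gain over Sanger sequencing. Treating the
    fold increase as a per-hour ratio is an assumption made for illustration.
    """
    bases_per_hour = bases_per_run / run_hours
    bases_per_second = bases_per_hour / 3600
    implied_baseline_per_hour = bases_per_hour / fold_increase
    return {
        "bases_per_hour": bases_per_hour,
        "bases_per_second": bases_per_second,
        "implied_baseline_per_hour": implied_baseline_per_hour,
    }


if __name__ == "__main__":
    for name, value in throughput_summary().items():
        print(f"{name}: {value:,.0f}")
```

    With the default figures this gives 6.25 million bases per hour for the described instrument and, under the stated assumption, a baseline of roughly 62,500 bases per hour.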

  11. Exploration of Panviral Proteome: High-Throughput Cloning and Functional Implications in Virus-host Interactions

    PubMed Central

    Yu, Xiaobo; Bian, Xiaofang; Throop, Andrea; Song, Lusheng; Moral, Lerys Del; Park, Jin; Seiler, Catherine; Fiacco, Michael; Steel, Jason; Hunter, Preston; Saul, Justin; Wang, Jie; Qiu, Ji; Pipas, James M.; LaBaer, Joshua

    2014-01-01

    Throughout the long history of virus-host co-evolution, viruses have developed delicate strategies to facilitate their invasion and replication of their genome, while silencing the host immune responses through various mechanisms. The systematic characterization of viral protein-host interactions would yield invaluable information in the understanding of viral invasion/evasion, diagnosis and therapeutic treatment of a viral infection, and mechanisms of host biology. With more than 2,000 viral genomes sequenced, only a small percent of them are well investigated. The access of these viral open reading frames (ORFs) in a flexible cloning format would greatly facilitate both in vitro and in vivo virus-host interaction studies. However, the overall progress of viral ORF cloning has been slow. To facilitate viral studies, we are releasing the initiation of our panviral proteome collection of 2,035 ORF clones from 830 viral genes in the Gateway® recombinational cloning system. Here, we demonstrate several uses of our viral collection including highly efficient production of viral proteins using human cell-free expression system in vitro, global identification of host targets for rubella virus using Nucleic Acid Programmable Protein Arrays (NAPPA) containing 10,000 unique human proteins, and detection of host serological responses using micro-fluidic multiplexed immunoassays. The studies presented here begin to elucidate host-viral protein interactions with our systemic utilization of viral ORFs, high-throughput cloning, and proteomic technologies. These valuable plasmid resources will be available to the research community to enable continued viral functional studies. PMID:24955142

  12. Exploration of panviral proteome: high-throughput cloning and functional implications in virus-host interactions.

    PubMed

    Yu, Xiaobo; Bian, Xiaofang; Throop, Andrea; Song, Lusheng; Moral, Lerys Del; Park, Jin; Seiler, Catherine; Fiacco, Michael; Steel, Jason; Hunter, Preston; Saul, Justin; Wang, Jie; Qiu, Ji; Pipas, James M; LaBaer, Joshua

    2014-01-01

    Throughout the long history of virus-host co-evolution, viruses have developed delicate strategies to facilitate their invasion and replication of their genome, while silencing the host immune responses through various mechanisms. The systematic characterization of viral protein-host interactions would yield invaluable information in the understanding of viral invasion/evasion, diagnosis and therapeutic treatment of a viral infection, and mechanisms of host biology. With more than 2,000 viral genomes sequenced, only a small percent of them are well investigated. The access of these viral open reading frames (ORFs) in a flexible cloning format would greatly facilitate both in vitro and in vivo virus-host interaction studies. However, the overall progress of viral ORF cloning has been slow. To facilitate viral studies, we are releasing the initiation of our panviral proteome collection of 2,035 ORF clones from 830 viral genes in the Gateway® recombinational cloning system. Here, we demonstrate several uses of our viral collection including highly efficient production of viral proteins using human cell-free expression system in vitro, global identification of host targets for rubella virus using Nucleic Acid Programmable Protein Arrays (NAPPA) containing 10,000 unique human proteins, and detection of host serological responses using micro-fluidic multiplexed immunoassays. The studies presented here begin to elucidate host-viral protein interactions with our systemic utilization of viral ORFs, high-throughput cloning, and proteomic technologies. These valuable plasmid resources will be available to the research community to enable continued viral functional studies.

  13. Isolation, genome sequencing and functional analysis of two T7-like coliphages of avian pathogenic Escherichia coli.

    PubMed

    Chen, Mianmian; Xu, Juntian; Yao, Huochun; Lu, Chengping; Zhang, Wei

    2016-05-10

    Avian pathogenic Escherichia coli (APEC) causes colibacillosis, which results in significant economic losses to the poultry industry worldwide. Due to the drug residues and increased antibiotic resistance caused by antibiotic use, bacteriophages and other alternative therapeutic agents are expected to control APEC infection in poultry. Two APEC phages, named P483 and P694, were isolated from the feces from the farmers market in China. We then studied their biological properties, and carried out high-throughput genome sequencing and homology analyses of these phages. Assembly results of high-throughput sequencing showed that the structures of both P483 and P694 genomes consist of linear and double-stranded DNA. Results of the electron microscopy and homology analysis revealed that both P483 and P694 belong to T7-like virus which is a member of the Podoviridae family of the Caudovirales order. Comparative genomic analysis showed that most of the predicted proteins of these two phages showed strongest sequence similarity to the Enterobacteria phages BA14 and 285P, Erwinia phage FE44, and Kluyvera phage Kvp1; however, some proteins such as gp0.6a, gp1.7 and gp17 showed lower similarity (<85%) with the homologs of other phages in the T7 subgroup. We also found some unique characteristics of P483 and P694, such as the two types of the genes of P694 and no lytic activity of P694 against its host bacteria in liquid medium. Our results serve to further our understanding of phage evolution of T7-like coliphages and provide the potential application of the phages as therapeutic agents for the treatment of diseases. Copyright © 2016 Elsevier B.V. All rights reserved.

  14. Genome-wide RNAi Screening to Identify Host Factors That Modulate Oncolytic Virus Therapy.

    PubMed

    Allan, Kristina J; Mahoney, Douglas J; Baird, Stephen D; Lefebvre, Charles A; Stojdl, David F

    2018-04-03

    High-throughput genome-wide RNAi (RNA interference) screening technology has been widely used for discovering host factors that impact virus replication. Here we present the application of this technology to uncovering host targets that specifically modulate the replication of Maraba virus, an oncolytic rhabdovirus, and vaccinia virus with the goal of enhancing therapy. While the protocol has been tested for use with oncolytic Maraba virus and oncolytic vaccinia virus, this approach is applicable to other oncolytic viruses and can also be utilized for identifying host targets that modulate virus replication in mammalian cells in general. This protocol describes the development and validation of an assay for high-throughput RNAi screening in mammalian cells, the key considerations and preparation steps important for conducting a primary high-throughput RNAi screen, and a step-by-step guide for conducting a primary high-throughput RNAi screen; in addition, it broadly outlines the methods for conducting secondary screen validation and tertiary validation studies. The benefit of high-throughput RNAi screening is that it allows one to catalogue, in an extensive and unbiased fashion, host factors that modulate any aspect of virus replication for which one can develop an in vitro assay such as infectivity, burst size, and cytotoxicity. It has the power to uncover biotherapeutic targets unforeseen based on current knowledge.

  15. High-throughput transformation of Saccharomyces cerevisiae using liquid handling robots.

    PubMed

    Liu, Guangbo; Lanham, Clayton; Buchan, J Ross; Kaplan, Matthew E

    2017-01-01

    Saccharomyces cerevisiae (budding yeast) is a powerful eukaryotic model organism ideally suited to high-throughput genetic analyses, which time and again has yielded insights that further our understanding of cell biology processes conserved in humans. Lithium Acetate (LiAc) transformation of yeast with DNA for the purposes of exogenous protein expression (e.g., plasmids) or genome mutation (e.g., gene mutation, deletion, epitope tagging) is a useful and long established method. However, a reliable and optimized high throughput transformation protocol that runs almost no risk of human error has not been described in the literature. Here, we describe such a method that is broadly transferable to most liquid handling high-throughput robotic platforms, which are now commonplace in academic and industry settings. Using our optimized method, we are able to comfortably transform approximately 1200 individual strains per day, allowing complete transformation of typical genomic yeast libraries within 6 days. In addition, use of our protocol for gene knockout purposes also provides a potentially quicker, easier and more cost-effective approach to generating collections of double mutants than the popular and elegant synthetic genetic array methodology. In summary, our methodology will be of significant use to anyone interested in high throughput molecular and/or genetic analysis of yeast.

  16. Privacy Challenges of Genomic Big Data.

    PubMed

    Shen, Hong; Ma, Jian

    2017-01-01

    With the rapid advancement of high-throughput DNA sequencing technologies, genomics has become a big data discipline where large-scale genetic information of human individuals can be obtained efficiently at low cost. However, such a massive amount of personal genomic data creates a tremendous challenge for privacy, especially given the emergence of the direct-to-consumer (DTC) industry that provides genetic testing services. Here we review recent developments in genomic big data and their implications for privacy. We also discuss the current dilemmas and future challenges of genomic privacy.

  17. High-Throughput Next-Generation Sequencing of Polioviruses

    PubMed Central

    Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.

    2016-01-01

    ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929
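
    The abstract above defines sequencing efficiency as the average viral reads per total reads and reports it as a mean with a spread (84.2% ± 15.6%). The sketch below shows one way such a summary could be computed from per-sample read counts. The read counts are invented placeholders, and treating the "±" value as a sample standard deviation is an assumption, since the abstract does not say which measure of spread was used.

```python
from statistics import mean, stdev


def viral_read_fraction(viral_reads, total_reads):
    """Fraction of reads in one sample that map to the virus."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= viral_reads <= total_reads:
        raise ValueError("viral_reads must lie in [0, total_reads]")
    return viral_reads / total_reads


def summarize_efficiency(samples):
    """Return (mean %, sample SD %) over (viral_reads, total_reads) pairs.

    Using the sample standard deviation is an assumption for illustration.
    At least two samples are needed for the SD to be defined.
    """
    percents = [100 * viral_read_fraction(v, t) for v, t in samples]
    return mean(percents), stdev(percents)


if __name__ == "__main__":
    # Hypothetical per-sample counts, not data from the study.
    counts = [(90_000, 100_000), (70_000, 100_000), (190_000, 200_000)]
    avg, sd = summarize_efficiency(counts)
    print(f"{avg:.1f}% ± {sd:.1f}%")
```

    Averaging per-sample percentages weights every sample equally. Pooling all reads before dividing would instead weight samples by sequencing depth, and the two approaches can give different numbers.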

  18. Systematic Analysis of Zn2Cys6 Transcription Factors Required for Development and Pathogenicity by High-Throughput Gene Knockout in the Rice Blast Fungus

    PubMed Central

    Huang, Pengyun; Lin, Fucheng

    2014-01-01

    Because of great challenges and workload in deleting genes on a large scale, the functions of most genes in pathogenic fungi are still unclear. In this study, we developed a high-throughput gene knockout system using a novel yeast-Escherichia-Agrobacterium shuttle vector, pKO1B, in the rice blast fungus Magnaporthe oryzae. Using this method, we deleted 104 fungal-specific Zn2Cys6 transcription factor (TF) genes in M. oryzae. We then analyzed the phenotypes of these mutants with regard to growth, asexual and infection-related development, pathogenesis, and 9 abiotic stresses. The resulting data provide new insights into how this rice pathogen of global significance regulates important traits in the infection cycle through Zn2Cys6TF genes. A large variation in biological functions of Zn2Cys6TF genes was observed under the conditions tested. Sixty-one of 104 Zn2Cys6 TF genes were found to be required for fungal development. In-depth analysis of TF genes revealed that TF genes involved in pathogenicity frequently tend to function in multiple development stages, and disclosed many highly conserved but unidentified functional TF genes of importance in the fungal kingdom. We further found that the virulence-required TF genes GPF1 and CNF2 have similar regulation mechanisms in the gene expression involved in pathogenicity. These experimental validations clearly demonstrated the value of a high-throughput gene knockout system in understanding the biological functions of genes on a genome scale in fungi, and provided a solid foundation for elucidating the gene expression network that regulates the development and pathogenicity of M. oryzae. PMID:25299517

  19. Emerging Genomic Tools for Legume Breeding: Current Status and Future Prospects

    PubMed Central

    Pandey, Manish K.; Roorkiwal, Manish; Singh, Vikas K.; Ramalingam, Abirami; Kudapa, Himabindu; Thudi, Mahendar; Chitikineni, Anu; Rathore, Abhishek; Varshney, Rajeev K.

    2016-01-01

    Legumes play a vital role in ensuring global nutritional food security and improving soil quality through nitrogen fixation. Accelerated, higher genetic gains are required to meet the demand of an ever-increasing global population. In recent years, speedy developments have been witnessed in legume genomics due to advancements in next-generation sequencing (NGS) and high-throughput genotyping technologies. Reference genome sequences for many legume crops have been reported in the last 5 years. The availability of the draft genome sequences and re-sequencing of elite genotypes for several important legume crops have made it possible to identify structural variations at large scale. Availability of large-scale genomic resources and low-cost and high-throughput genotyping technologies are enhancing the efficiency and resolution of genetic mapping and marker-trait association studies. Most importantly, deployment of molecular breeding approaches has resulted in development of improved lines in some legume crops such as chickpea and groundnut. In order to support genomics-driven crop improvement at a fast pace, the deployment of breeder-friendly genomics and decision support tools appears to be critical in breeding programs in developing countries. This review provides an overview of emerging genomics and informatics tools/approaches that will be the key driving force for accelerating genomics-assisted breeding and ultimately ensuring nutritional and food security in developing countries. PMID:27199998

  20. The emerging genomics and systems biology research lead to systems genomics studies.

    PubMed

    Yang, Mary Qu; Yoshigoe, Kenji; Yang, William; Tong, Weida; Qin, Xiang; Dunker, A; Chen, Zhongxue; Arbania, Hamid R; Liu, Jun S; Niemierko, Andrzej; Yang, Jack Y

    2014-01-01

    Synergistically integrating multi-layer genomic data at systems level not only can lead to deeper insights into the molecular mechanisms related to disease initiation and progression, but also can guide pathway-based biomarker and drug target identification. With the advent of high-throughput next-generation sequencing technologies, sequencing both DNA and RNA has generated multi-layer genomic data that can provide DNA polymorphism, non-coding RNA, messenger RNA, gene expression, isoform and alternative splicing information. Systems biology on the other hand studies complex biological systems, particularly systematic study of complex molecular interactions within specific cells or organisms. Genomics and molecular systems biology can be merged into the study of genomic profiles and implicated biological functions at cellular or organism level. The prospectively emerging field can be referred to as systems genomics or genomic systems biology. The Mid-South Bioinformatics Centre (MBC) and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and University of Arkansas for Medical Sciences are particularly interested in promoting education and research advancement in this prospectively emerging field. Based on past investigations and research outcomes, MBC is further utilizing differential gene and isoform/exon expression from RNA-seq and co-regulation from ChIP-seq specific for different phenotypes, in combination with protein-protein interactions and protein-DNA interactions, to construct high-level gene networks for an integrative genome-phenome investigation at systems biology level.

  1. Single nucleotide polymorphism (SNP) discovery in rainbow trout using restriction site associated DNA (RAD) sequencing of doubled haploids and assessment of polymorphism in a population survey

    USDA-ARS's Scientific Manuscript database

    Background: Our goal is to produce a high-throughput SNP genotyping platform for genomic analyses in rainbow trout that will enable fine mapping of QTL, whole genome association studies, genomic selection for improved aquaculture production traits, and genetic analyses of wild populations that aid ...

  2. Rolling circle amplification of metazoan mitochondrial genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simison, W. Brian; Lindberg, D.R.; Boore, J.L.

    2005-07-31

    Here we report the successful use of rolling circle amplification (RCA) for the amplification of complete metazoan mt genomes to make a product that is amenable to high-throughput genome sequencing techniques. The benefits of RCA over PCR are many, and with further development and refinement of RCA, the sequencing of organellar genomes will require far less time and effort than current long PCR approaches.

  3. High-Throughput SNP Discovery through Deep Resequencing of a Reduced Representation Library to Anchor and Orient Scaffolds in the Soybean Whole Genome Sequence

    USDA-ARS's Scientific Manuscript database

    The soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy but only properly oriented 66% of the sequence scaffolds. To find additional single nucleotide polymorphism (SNP) markers for additiona...

  4. Genome-Wide siRNA-Based Functional Genomics of Pigmentation Identifies Novel Genes and Pathways That Impact Melanogenesis in Human Cells

    PubMed Central

    Bodemann, Brian; Petersen, Sean; Aruri, Jayavani; Koshy, Shiney; Richardson, Zachary; Le, Lu Q.; Krasieva, Tatiana; Roth, Michael G.; Farmer, Pat; White, Michael A.

    2008-01-01

    Melanin protects the skin and eyes from the harmful effects of UV irradiation, protects neural cells from toxic insults, and is required for sound conduction in the inner ear. Aberrant regulation of melanogenesis underlies skin disorders (melasma and vitiligo), neurologic disorders (Parkinson's disease), auditory disorders (Waardenburg's syndrome), and ophthalmologic disorders (age-related macular degeneration). Much of the core synthetic machinery driving melanin production has been identified; however, the spectrum of gene products participating in melanogenesis in different physiological niches is poorly understood. Functional genomics based on RNA-mediated interference (RNAi) provides the opportunity to derive unbiased comprehensive collections of pharmaceutically tractable single gene targets supporting melanin production. In this study, we have combined a high-throughput, cell-based, one-well/one-gene screening platform with a genome-wide arrayed synthetic library of chemically synthesized, small interfering RNAs to identify novel biological pathways that govern melanin biogenesis in human melanocytes. Ninety-two novel genes that support pigment production were identified with a low false discovery rate. Secondary validation and preliminary mechanistic studies identified a large panel of targets that converge on tyrosinase expression and stability. Small molecule inhibition of a family of gene products in this class was sufficient to impair chronic tyrosinase expression in pigmented melanoma cells and UV-induced tyrosinase expression in primary melanocytes. Isolation of molecular machinery known to support autophagosome biosynthesis from this screen, together with in vitro and in vivo validation, exposed a close functional relationship between melanogenesis and autophagy. In summary, these studies illustrate the power of RNAi-based functional genomics to identify novel genes, pathways, and pharmacologic agents that impact a biological phenotype and operate outside of preconceived mechanistic relationships. PMID:19057677

  5. Single-cell genomic profiling of acute myeloid leukemia for clinical use: A pilot study

    PubMed Central

    Yan, Benedict; Hu, Yongli; Ban, Kenneth H.K.; Tiang, Zenia; Ng, Christopher; Lee, Joanne; Tan, Wilson; Chiu, Lily; Tan, Tin Wee; Seah, Elaine; Ng, Chin Hin; Chng, Wee-Joo; Foo, Roger

    2017-01-01

    Although bulk high-throughput genomic profiling studies have led to a significant increase in the understanding of cancer biology, there is increasing awareness that bulk profiling approaches do not completely elucidate tumor heterogeneity. Single-cell genomic profiling enables the distinction of tumor heterogeneity, and may improve clinical diagnosis through the identification and characterization of putative subclonal populations. In the present study, the challenges associated with a single-cell genomics profiling workflow for clinical diagnostics were investigated. Single-cell RNA-sequencing (RNA-seq) was performed on 20 cells from an acute myeloid leukemia bone marrow sample. Putative blasts were identified based on their gene expression profiles and principal component analysis was performed to identify outlier cells. Variant calling was performed on the single-cell RNA-seq data. The present pilot study demonstrates a proof of concept for clinical single-cell genomic profiling. The recognized limitations include significant stochastic RNA loss and the relatively low throughput of the current proposed platform. Although the results of the present study are promising, further technological advances and protocol optimization are necessary for single-cell genomic profiling to be clinically viable. PMID:28454300

  6. Molecular Markers and Cotton Genetic Improvement: Current Status and Future Prospects

    PubMed Central

    Malik, Waqas; Iqbal, Muhammad Zaffar; Ali Khan, Asif; Qayyum, Abdul; Ali Abid, Muhammad; Noor, Etrat; Qadir Ahmad, Muhammad; Hasan Abbasi, Ghulam

    2014-01-01

    The narrow genetic base and complex allotetraploid genome of cotton (Gossypium hirsutum L.) are stimulating efforts to obtain the polymorphism required for marker-based breeding. The availability of draft genome sequences of G. raimondii and G. arboreum and of next generation sequencing (NGS) technologies has facilitated the development of high-throughput marker technologies in cotton. The concepts of genetic diversity, QTL mapping, and marker assisted selection (MAS) are evolving into more efficient concepts of linkage disequilibrium, association mapping, and genomic selection, respectively. The objective of the current review is to analyze the pace of evolution in the molecular marker technologies in cotton during the last ten years in the following four areas: (i) comparative analysis of low- and high-throughput marker technologies available in cotton, (ii) genetic diversity in the available wild and improved gene pools of cotton, (iii) identification of the genomic regions within the cotton genome underlying economic traits, and (iv) marker based selection methodologies. Moreover, the applications of marker technologies to enhance the breeding efficiency in cotton are also summarized. The aforementioned genomic technologies and the integration of several other omics resources are expected to enhance cotton productivity and meet the global fiber quantity and quality demands. PMID:25401149

  7. Molecular Pathways: Extracting Medical Knowledge from High Throughput Genomic Data

    PubMed Central

    Goldstein, Theodore; Paull, Evan O.; Ellis, Matthew J.; Stuart, Joshua M.

    2013-01-01

    High-throughput genomic data that measure RNA expression, DNA copy number, mutation status and protein levels provide us with insights into the molecular pathway structure of cancer. Genomic lesions (amplifications, deletions, mutations) and epigenetic modifications disrupt biochemical cellular pathways. While the number of possible lesions is vast, different genomic alterations may result in concordant expression and pathway activities, producing common tumor subtypes that share similar phenotypic outcomes. How can these data be translated into medical knowledge that provides prognostic and predictive information? First generation mRNA expression signatures such as Genomic Health's Oncotype DX already provide prognostic information, but do not provide therapeutic guidance beyond the current standard of care – which is often inadequate in high-risk patients. Rather than building molecular signatures based on gene expression levels, evidence is growing that signatures based on higher-level quantities such as from genetic pathways may provide important prognostic and diagnostic cues. We provide examples of how activities for molecular entities can be predicted from pathway analysis and how the composite of all such activities, referred to here as the “activitome,” helps connect genomic events to clinical factors in order to predict the drivers of poor outcome. PMID:23430023

  8. Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics.

    PubMed

    Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko

    2017-07-12

    Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell-specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals that of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.

  9. Year 2 Report: Protein Function Prediction Platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, C E

    2012-04-27

    Upon completion of our second year of development in a 3-year development cycle, we have completed a prototype protein structure-function annotation and function prediction system: Protein Function Prediction (PFP) platform (v.0.5). We have met our milestones for Years 1 and 2 and are positioned to continue development in completion of our original statement of work, or a reasonable modification thereof, in service to DTRA Programs involved in diagnostics and medical countermeasures research and development. The PFP platform is a multi-scale computational modeling system for protein structure-function annotation and function prediction. As of this writing, PFP is the only existing fully automated, high-throughput, multi-scale modeling, whole-proteome annotation platform, and represents a significant advance in the field of genome annotation (Fig. 1). PFP modules perform protein functional annotations at the sequence, systems biology, protein structure, and atomistic levels of biological complexity (Fig. 2). Because these approaches provide orthogonal means of characterizing proteins and suggesting protein function, PFP processing maximizes the protein functional information that can currently be gained by computational means. Comprehensive annotation of pathogen genomes is essential for bio-defense applications in pathogen characterization, threat assessment, and medical countermeasure design and development in that it can short-cut the time and effort required to select and characterize protein biomarkers.

  10. A family-based probabilistic method for capturing de novo mutations from high-throughput short-read sequencing data.

    PubMed

    Cartwright, Reed A; Hussin, Julie; Keebler, Jonathan E M; Stone, Eric A; Awadalla, Philip

    2012-01-06

    Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somatic mutational processes and to simultaneously estimate the effect of sequencing error and the initial genetic variation in the population from which the founders of the pedigree arise. This approach is examined in detail through simulations and areas for method improvement are noted. By applying this method to data from members of a well-defined nuclear family with accurate pedigree information, the stage is set to make the most direct estimates of the human mutation rate to date.
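    The Mendelian-violation signal mentioned in the abstract above can be illustrated with a minimal sketch. It assumes hard genotype calls at a biallelic site for a father-mother-child trio; the function names and the sample data are hypothetical. The method in the abstract is probabilistic and returns posterior probabilities, which this deterministic check does not attempt to reproduce.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child genotype can be formed from one paternal and one maternal allele."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f, m))) == child_sorted
        for f, m in product(father, mother)
    )


def candidate_de_novo_sites(trio_calls):
    """List sites whose child genotype violates Mendelian inheritance (hypothetical helper)."""
    return [
        site
        for site, (father, mother, child) in trio_calls.items()
        if not is_mendelian_consistent(father, mother, child)
    ]


# Hypothetical hard calls: site -> (father, mother, child) genotypes.
calls = {
    "site_1": (("A", "A"), ("A", "G"), ("A", "G")),  # consistent
    "site_2": (("A", "A"), ("A", "A"), ("A", "G")),  # G is absent from both parents
    "site_3": (("C", "T"), ("C", "T"), ("T", "T")),  # consistent
}
print(candidate_de_novo_sites(calls))  # ['site_2']
```

    A violation found this way is only a candidate: sequencing error produces the same pattern, which is why the abstract describes modelling error and mutation jointly.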

  11. Integrated analysis of RNA-binding protein complexes using in vitro selection and high-throughput sequencing and sequence specificity landscapes (SEQRS).

    PubMed

    Lou, Tzu-Fang; Weidmann, Chase A; Killingsworth, Jordan; Tanaka Hall, Traci M; Goldstrohm, Aaron C; Campbell, Zachary T

    2017-04-15

    RNA-binding proteins (RBPs) collaborate to control virtually every aspect of RNA function. Tremendous progress has been made in the area of global assessment of RBP specificity using next-generation sequencing approaches both in vivo and in vitro. Understanding how protein-protein interactions enable precise combinatorial regulation of RNA remains a significant problem. Addressing this challenge requires tools that can quantitatively determine the specificities of both individual proteins and multimeric complexes in an unbiased and comprehensive way. One approach utilizes in vitro selection, high-throughput sequencing, and sequence-specificity landscapes (SEQRS). We outline a SEQRS experiment focused on obtaining the specificity of a multi-protein complex between Drosophila RBPs Pumilio (Pum) and Nanos (Nos). We discuss the necessary controls in this type of experiment and examine how the resulting data can be complemented with structural and cell-based reporter assays. Additionally, SEQRS data can be integrated with functional genomics data to uncover biological function. Finally, we propose extensions of the technique that will enhance our understanding of multi-protein regulatory complexes assembled onto RNA. Copyright © 2016 Elsevier Inc. All rights reserved.

  12. Machine Learning Analysis Identifies Drosophila Grunge/Atrophin as an Important Learning and Memory Gene Required for Memory Retention and Social Learning.

    PubMed

    Kacsoh, Balint Z; Greene, Casey S; Bosco, Giovanni

    2017-11-06

    High-throughput experiments are becoming increasingly common, and scientists must balance hypothesis-driven experiments with genome-wide data acquisition. We sought to predict novel genes involved in Drosophila learning and long-term memory from existing public high-throughput data. We performed an analysis using PILGRM, which analyzes public gene expression compendia using machine learning. We evaluated the top prediction alongside genes involved in learning and memory in IMP, an interface for functional relationship networks. We identified Grunge/Atrophin (Gug/Atro), a transcriptional repressor, histone deacetylase, as our top candidate. We find, through multiple, distinct assays, that Gug has an active role as a modulator of memory retention in the fly and its function is required in the adult mushroom body. Depletion of Gug specifically in neurons of the adult mushroom body, after cell division and neuronal development is complete, suggests that Gug function is important for memory retention through regulation of neuronal activity, and not by altering neurodevelopment. Our study provides a previously uncharacterized role for Gug as a possible regulator of neuronal plasticity at the interface of memory retention and memory extinction. Copyright © 2017 Kacsoh et al.

  13. Recent advance in carrot genomics

    USDA-ARS's Scientific Manuscript database

    In recent years there has been an effort towards the development of genomic resources in carrot. The number of available sequences for carrot in public databases has increased recently. This has allowed the design of SSRs markers, COS markers and a high-throughput SNP assay for genotyping. Additiona...

  14. Germplasm Management in the Post-genomics Era-a case study with lettuce

    USDA-ARS's Scientific Manuscript database

    High-throughput genotyping platforms and next-generation sequencing technologies revolutionized our ways in germplasm characterization. In collaboration with UC Davis Genome Center, we completed a project of genotyping the entire cultivated lettuce (Lactuca sativa L.) collection of 1,066 accessions ...

  15. Systematic Identification of Combinatorial Drivers and Targets in Cancer Cell Lines

    PubMed Central

    Tabchy, Adel; Eltonsy, Nevine; Housman, David E.; Mills, Gordon B.

    2013-01-01

    There is an urgent need to elicit and validate highly efficacious targets for combinatorial intervention from large scale ongoing molecular characterization efforts of tumors. We established an in silico bioinformatic platform in concert with a high throughput screening platform evaluating 37 novel targeted agents in 669 extensively characterized cancer cell lines reflecting the genomic and tissue-type diversity of human cancers, to systematically identify combinatorial biomarkers of response and co-actionable targets in cancer. Genomic biomarkers discovered in a 141 cell line training set were validated in an independent 359 cell line test set. We identified co-occurring and mutually exclusive genomic events that represent potential drivers and combinatorial targets in cancer. We demonstrate multiple cooperating genomic events that predict sensitivity to drug intervention independent of tumor lineage. The coupling of scalable in silico and biologic high throughput cancer cell line platforms for the identification of co-events in cancer delivers rational combinatorial targets for synthetic lethal approaches with a high potential to pre-empt the emergence of resistance. PMID:23577104

  16. A tag-based approach for high-throughput analysis of CCWGG methylation.

    PubMed

    Denisova, Oksana V; Chernov, Andrei V; Koledachkina, Tatyana Y; Matvienko, Nicholas I

    2007-10-15

    Non-CpG methylation occurring in the context of CNG sequences is found in plants at a large number of genomic loci. However, there is still little information available about non-CpG methylation in mammals. Efficient methods that would allow detection of scarcely localized methylated sites in small quantities of DNA are required to elucidate the biological role of non-CpG methylation in both plants and animals. In this study, we tested a new whole genome approach to identify sites of CCWGG methylation (W is A or T), a particular case of CNG methylation, in genomic DNA. This technique is based on digestion of DNAs with methylation-sensitive restriction endonucleases EcoRII-C and AjnI. Short DNAs flanking methylated CCWGG sites (tags) are selectively purified and assembled in tandem arrays of up to nine tags. This allows high-throughput sequencing of tags, identification of flanking regions, and their exact positions in the genome. In this study, we tested specificity and efficiency of the approach.

  17. Systematic identification of combinatorial drivers and targets in cancer cell lines.

    PubMed

    Tabchy, Adel; Eltonsy, Nevine; Housman, David E; Mills, Gordon B

    2013-01-01

    There is an urgent need to elicit and validate highly efficacious targets for combinatorial intervention from large scale ongoing molecular characterization efforts of tumors. We established an in silico bioinformatic platform in concert with a high throughput screening platform evaluating 37 novel targeted agents in 669 extensively characterized cancer cell lines reflecting the genomic and tissue-type diversity of human cancers, to systematically identify combinatorial biomarkers of response and co-actionable targets in cancer. Genomic biomarkers discovered in a 141 cell line training set were validated in an independent 359 cell line test set. We identified co-occurring and mutually exclusive genomic events that represent potential drivers and combinatorial targets in cancer. We demonstrate multiple cooperating genomic events that predict sensitivity to drug intervention independent of tumor lineage. The coupling of scalable in silico and biologic high throughput cancer cell line platforms for the identification of co-events in cancer delivers rational combinatorial targets for synthetic lethal approaches with a high potential to pre-empt the emergence of resistance.

  18. CRISPR-enabled tools for engineering microbial genomes and phenotypes.

    PubMed

    Tarasava, Katia; Oh, Eun Joong; Eckert, Carrie A; Gill, Ryan T

    2018-06-19

    In recent years CRISPR-Cas technologies have revolutionized microbial engineering approaches. Genome editing and non-editing applications of various CRISPR-Cas systems have expanded the throughput and scale of engineering efforts, as well as opened up new avenues for manipulating genomes of non-model organisms. As we expand the range of organisms used for biotechnological applications, we need to develop better, more versatile tools for manipulation of these systems. Here we summarize the current advances in microbial gene editing using CRISPR-Cas based tools, and highlight state-of-the-art methods for high-throughput, efficient genome-scale engineering in model organisms Escherichia coli and Saccharomyces cerevisiae. We also review non-editing CRISPR-Cas applications available for gene expression manipulation, epigenetic remodeling, RNA editing, labeling and synthetic gene circuit design. Finally, we point out the areas of research that need further development in order to expand the range of applications and increase the utility of these new methods. This article is protected by copyright. All rights reserved.

  19. Genome-wide generation and use of informative intron-spanning and intron-length polymorphism markers for high-throughput genetic analysis in rice

    PubMed Central

    Badoni, Saurabh; Das, Sweta; Sayal, Yogesh K.; Gopalakrishnan, S.; Singh, Ashok K.; Rao, Atmakuri R.; Agarwal, Pinky; Parida, Swarup K.; Tyagi, Akhilesh K.

    2016-01-01

    We developed genome-wide 84634 ISM (intron-spanning marker) and 16510 InDel-fragment length polymorphism-based ILP (intron-length polymorphism) markers from genes physically mapped on 12 rice chromosomes. These genic markers revealed much higher amplification-efficiency (80%) and polymorphic-potential (66%) among rice accessions even by a cost-effective agarose gel-based assay. A wider level of functional molecular diversity (17–79%) and well-defined precise admixed genetic structure was assayed by 3052 genome-wide markers in a structured population of indica, japonica, aromatic and wild rice. Six major grain weight QTLs (11.9–21.6% phenotypic variation explained) were mapped on five rice chromosomes of a high-density (inter-marker distance: 0.98 cM) genetic linkage map (IR 64 x Sonasal) anchored with 2785 known/candidate gene-derived ISM and ILP markers. The designing of multiple ISM and ILP markers (2 to 4 markers/gene) in an individual gene will broaden the user-preference to select suitable primer combination for efficient assaying of functional allelic variation/diversity and realistic estimation of differential gene expression profiles among rice accessions. The genomic information generated in our study is made publicly accessible through a user-friendly web-resource, “Oryza ISM-ILP marker” database. The known/candidate gene-derived ISM and ILP markers can be enormously deployed to identify functionally relevant trait-associated molecular tags by optimal-resource expenses, leading towards genomics-assisted crop improvement in rice. PMID:27032371

  20. PARALLEL ASSAY OF OXYGEN EQUILIBRIA OF HEMOGLOBIN

    PubMed Central

    Lilly, Laura E.; Blinebry, Sara K.; Viscardi, Chelsea M.; Perez, Luis; Bonaventura, Joe; McMahon, Tim J.

    2013-01-01

    Methods to systematically analyze in parallel the function of multiple protein or cell samples in vivo or ex vivo (i.e. functional proteomics) in a controlled gaseous environment have thus far been limited. Here we describe an apparatus and procedure that enables, for the first time, parallel assay of oxygen equilibria in multiple samples. Using this apparatus, numerous simultaneous oxygen equilibrium curves (OECs) can be obtained under truly identical conditions from blood cell samples or purified hemoglobins (Hbs). We suggest that the ability to obtain these parallel datasets under identical conditions can be of immense value, both to biomedical researchers and clinicians who wish to monitor blood health, and to physiologists studying non-human organisms and the effects of climate change on these organisms. Parallel monitoring techniques are essential in order to better understand the functions of critical cellular proteins. The procedure can be applied to human studies, wherein an OEC can be analyzed in light of an individual’s entire genome. Here, we analyzed intraerythrocytic Hb, a protein that operates at the organism’s environmental interface and then comes into close contact with virtually all of the organism’s cells. The apparatus is theoretically scalable, and establishes a functional proteomic screen that can be correlated with genomic information on the same individuals. This new method is expected to accelerate our general understanding of protein function, an increasingly challenging objective as advances in proteomic and genomic throughput outpace the ability to study proteins’ functional properties. PMID:23827235

  1. Comparative aerial- and ground-based high-throughput phenotyping for the genetic dissection of NDVI as a proxy for drought-adaptive traits in durum wheat

    USDA-ARS's Scientific Manuscript database

    High-throughput phenotyping platforms (HTPPs) provide novel opportunities to more effectively dissect the genetic basis of drought-adaptive traits. This genome-wide association study (GWAS) compares the results obtained with two Unmanned Aerial Vehicles (UAVs) and a ground-based platform used to mea...

  2. DOSE RESPONSE FROM HIGH THROUGHPUT GENE EXPRESSION STUDIES AND THE INFLUENCE OF TIME AND CELL LINE ON INFERRED MODE OF ACTION BY ONTOLOGIC ENRICHMENT (SOT)

    EPA Science Inventory

    Gene expression with ontologic enrichment and connectivity mapping tools is widely used to infer modes of action (MOA) for therapeutic drugs. Despite progress in high-throughput (HT) genomic systems, strategies suitable to identify industrial chemical MOA are needed. The L1000 is...

  3. Moving Toward Integrating Gene Expression Profiling into High-throughput Testing: A Gene Expression Biomarker Accurately Predicts Estrogen Receptor α Modulation in a Microarray Compendium

    EPA Science Inventory

    Microarray profiling of chemical-induced effects is being increasingly used in medium and high-throughput formats. In this study, we describe computational methods to identify molecular targets from whole-genome microarray data using as an example the estrogen receptor α (ERα), ...

  4. GermOnline 4.0 is a genomics gateway for germline development, meiosis and the mitotic cell cycle.

    PubMed

    Lardenois, Aurélie; Gattiker, Alexandre; Collin, Olivier; Chalmel, Frédéric; Primig, Michael

    2010-01-01

    GermOnline 4.0 is a cross-species database portal focusing on high-throughput expression data relevant for germline development, the meiotic cell cycle and mitosis in healthy versus malignant cells. It is thus a source of information for life scientists as well as clinicians who are interested in gene expression and regulatory networks. The GermOnline gateway provides unlimited access to information produced with high-density oligonucleotide microarrays (3'-UTR GeneChips), genome-wide protein-DNA binding assays and protein-protein interaction studies in the context of Ensembl genome annotation. Samples used to produce high-throughput expression data and to carry out genome-wide in vivo DNA binding assays are annotated via the MIAME-compliant Multiomics Information Management and Annotation System (MIMAS 3.0). Furthermore, the Saccharomyces Genomics Viewer (SGV) was developed and integrated into the gateway. SGV is a visualization tool that outputs genome annotation and DNA-strand specific expression data produced with high-density oligonucleotide tiling microarrays (Sc_tlg GeneChips) which cover the complete budding yeast genome on both DNA strands. It facilitates the interpretation of expression levels and transcript structures determined for various cell types cultured under different growth and differentiation conditions. Database URL: www.germonline.org/

  5. GermOnline 4.0 is a genomics gateway for germline development, meiosis and the mitotic cell cycle

    PubMed Central

    Lardenois, Aurélie; Gattiker, Alexandre; Collin, Olivier; Chalmel, Frédéric; Primig, Michael

    2010-01-01

    GermOnline 4.0 is a cross-species database portal focusing on high-throughput expression data relevant for germline development, the meiotic cell cycle and mitosis in healthy versus malignant cells. It is thus a source of information for life scientists as well as clinicians who are interested in gene expression and regulatory networks. The GermOnline gateway provides unlimited access to information produced with high-density oligonucleotide microarrays (3′-UTR GeneChips), genome-wide protein–DNA binding assays and protein–protein interaction studies in the context of Ensembl genome annotation. Samples used to produce high-throughput expression data and to carry out genome-wide in vivo DNA binding assays are annotated via the MIAME-compliant Multiomics Information Management and Annotation System (MIMAS 3.0). Furthermore, the Saccharomyces Genomics Viewer (SGV) was developed and integrated into the gateway. SGV is a visualization tool that outputs genome annotation and DNA-strand specific expression data produced with high-density oligonucleotide tiling microarrays (Sc_tlg GeneChips) which cover the complete budding yeast genome on both DNA strands. It facilitates the interpretation of expression levels and transcript structures determined for various cell types cultured under different growth and differentiation conditions. Database URL: www.germonline.org/ PMID:21149299

  6. The Eukaryotic Pathogen Databases: a functional genomic resource integrating data from human and veterinary parasites.

    PubMed

    Harb, Omar S; Roos, David S

    2015-01-01

    Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets "big data." While such data are readily available for download and personal use and analysis from a variety of repositories, often such analysis requires access to seldom-available computational skills. As a result a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods.

  7. Emerging approaches in predictive toxicology.

    PubMed

    Zhang, Luoping; McHale, Cliona M; Greene, Nigel; Snyder, Ronald D; Rich, Ivan N; Aardema, Marilyn J; Roy, Shambhu; Pfuhler, Stefan; Venkatactahalam, Sundaresan

    2014-12-01

    Predictive toxicology plays an important role in the assessment of toxicity of chemicals and the drug development process. While there are several well-established in vitro and in vivo assays that are suitable for predictive toxicology, recent advances in high-throughput analytical technologies and model systems are expected to have a major impact on the field of predictive toxicology. This commentary provides an overview of the state of the current science and a brief discussion on future perspectives for the field of predictive toxicology for human toxicity. Computational models for predictive toxicology, needs for further refinement and obstacles to expand computational models to include additional classes of chemical compounds are highlighted. Functional and comparative genomics approaches in predictive toxicology are discussed with an emphasis on successful utilization of recently developed model systems for high-throughput analysis. The advantages of three-dimensional model systems and stem cells and their use in predictive toxicology testing are also described. © 2014 Wiley Periodicals, Inc.

  8. Identification of several high-risk HPV inhibitors and drug targets with a novel high-throughput screening assay

    PubMed Central

    Toots, Mart; Ustav, Mart; Männik, Andres; Mumm, Karl; Tämm, Kaido; Tamm, Tarmo; Ustav, Mart

    2017-01-01

    Human papillomaviruses (HPVs) are oncogenic viruses that cause numerous different cancers as well as benign lesions in the epithelia. To date, there is no effective cure for an ongoing HPV infection. Here, we describe the generation process of a platform for the development of anti-HPV drugs. This system consists of engineered full-length HPV genomes that express reporter genes for evaluation of the viral copy number in all three HPV replication stages. We demonstrate the usefulness of this system by conducting high-throughput screens to identify novel high-risk HPV-specific inhibitors. At least five of the inhibitors block the function of Tdp1 and PARP1, which have been identified as essential cellular proteins for HPV replication and promising candidates for the development of antivirals against HPV and possibly against HPV-related cancers. PMID:28182794

  9. Emerging Approaches in Predictive Toxicology

    PubMed Central

    Zhang, Luoping; McHale, Cliona M.; Greene, Nigel; Snyder, Ronald D.; Rich, Ivan N.; Aardema, Marilyn J.; Roy, Shambhu; Pfuhler, Stefan; Venkatactahalam, Sundaresan

    2016-01-01

    Predictive toxicology plays an important role in the assessment of toxicity of chemicals and the drug development process. While there are several well-established in vitro and in vivo assays that are suitable for predictive toxicology, recent advances in high-throughput analytical technologies and model systems are expected to have a major impact on the field of predictive toxicology. This commentary provides an overview of the state of the current science and a brief discussion on future perspectives for the field of predictive toxicology for human toxicity. Computational models for predictive toxicology, needs for further refinement and obstacles to expand computational models to include additional classes of chemical compounds are highlighted. Functional and comparative genomics approaches in predictive toxicology are discussed with an emphasis on successful utilization of recently developed model systems for high-throughput analysis. The advantages of three-dimensional model systems and stem cells and their use in predictive toxicology testing are also described. PMID:25044351

  10. Reverse Ecology: from systems to environments and back.

    PubMed

    Levy, Roie; Borenstein, Elhanan

    2012-01-01

    The structure of complex biological systems reflects not only their function but also the environments in which they evolved and are adapted to. Reverse Ecology-an emerging new frontier in Evolutionary Systems Biology-aims to extract this information and to obtain novel insights into an organism's ecology. The Reverse Ecology framework facilitates the translation of high-throughput genomic data into large-scale ecological data, and has the potential to transform ecology into a high-throughput field. In this chapter, we describe some of the pioneering work in Reverse Ecology, demonstrating how system-level analysis of complex biological networks can be used to predict the natural habitats of poorly characterized microbial species, their interactions with other species, and universal patterns governing the adaptation of organisms to their environments. We further present several studies that applied Reverse Ecology to elucidate various aspects of microbial ecology, and lay out exciting future directions and potential future applications in biotechnology, biomedicine, and ecological engineering.

  11. BIOREL: the benchmark resource to estimate the relevance of the gene networks.

    PubMed

    Antonov, Alexey V; Mewes, Hans W

    2006-02-06

    The progress of high-throughput methodologies in functional genomics has led to the development of statistical procedures to infer gene networks from various types of high-throughput data. However, due to the lack of common standards, the biological significance of the results of the different studies is hard to compare. To overcome this problem we propose a benchmark procedure and have developed a web resource (BIOREL), which is useful for estimating the biological relevance of any genetic network by integrating different sources of biological information. The associations of each gene from the network are classified as biologically relevant or not. The proportion of genes in the network classified as "relevant" is used as the overall network relevance score. Employing synthetic data we demonstrated that such a score ranks the networks fairly with respect to the relevance level. Using BIOREL as the benchmark resource we compared the quality of experimental and theoretically predicted protein interaction data.
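
    The relevance score described in this abstract is a simple proportion, which can be illustrated with a minimal Python sketch. The function name, the input dictionary, and the gene labels below are hypothetical and are not taken from BIOREL itself; how each gene is classified as relevant is assumed to have been decided upstream.

```python
def network_relevance_score(gene_is_relevant):
    """Return the fraction of genes in a network flagged as relevant.

    gene_is_relevant maps a gene identifier to True/False. The flags are
    assumed to come from an upstream classification step (not shown here).
    """
    if not gene_is_relevant:
        raise ValueError("network has no genes")
    relevant = sum(1 for flag in gene_is_relevant.values() if flag)
    return relevant / len(gene_is_relevant)


# Hypothetical four-gene network: three genes classified as relevant.
example_network = {"geneA": True, "geneB": False, "geneC": True, "geneD": True}
score = network_relevance_score(example_network)
print(score)  # 0.75
```

    Under this reading, two networks of different sizes can be compared directly because the score is normalized by the number of genes.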

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gentry, T.; Schadt, C.; Zhou, J.

    Microarray technology has the unparalleled potential to simultaneously determine the dynamics and/or activities of most, if not all, of the microbial populations in complex environments such as soils and sediments. Researchers have developed several types of arrays that characterize the microbial populations in these samples based on their phylogenetic relatedness or functional genomic content. Several recent studies have used these microarrays to investigate ecological issues; however, most have only analyzed a limited number of samples with relatively few experiments utilizing the full high-throughput potential of microarray analysis. This is due in part to the unique analytical challenges that these samples present with regard to sensitivity, specificity, quantitation, and data analysis. This review discusses specific applications of microarrays to microbial ecology research along with some of the latest studies addressing the difficulties encountered during analysis of complex microbial communities within environmental samples. With continued development, microarray technology may ultimately achieve its potential for comprehensive, high-throughput characterization of microbial populations in near real-time.

  13. On-chip Magnetic Separation and Cell Encapsulation in Droplets

    NASA Astrophysics Data System (ADS)

    Chen, A.; Byvank, T.; Bharde, A.; Miller, B. L.; Chalmers, J. J.; Sooryakumar, R.; Chang, W.-J.; Bashir, R.

    2012-02-01

    The demand for high-throughput single cell assays is gaining importance because of the heterogeneity of many cell suspensions, even after significant initial sorting. These suspensions may display cell-to-cell variability at the gene expression level that could impact single cell functional genomics, cancer, stem-cell research and drug screening. The on-chip monitoring of individual cells in an isolated environment could prevent cross-contamination, provide high recovery yield and the ability to study biological traits at a single cell level. These advantages of on-chip biological experiments contrast with conventional methods, which require bulk samples that provide only averaged information on cell metabolism. We report on a device that integrates microfluidic technology with a magnetic tweezers array to combine the functionality of separation and encapsulation of objects such as immunomagnetically labeled cells or magnetic beads into pico-liter droplets on the same chip. The ability to control the separation throughput independently of the hydrodynamic droplet generation rate allows the encapsulation efficiency to be optimized. The device can potentially be integrated with on-chip labeling and/or bio-detection to become a powerful single-cell analysis device.

  14. Microplate-based platform for combined chromatin and DNA methylation immunoprecipitation assays

    PubMed Central

    2011-01-01

    Background The processes that compose expression of a given gene are far more complex than previously thought, presenting unprecedented conceptual and mechanistic challenges that require development of new tools. Chromatin structure, which is regulated by DNA methylation and histone modification, is at the center of gene regulation. Immunoprecipitations of chromatin (ChIP) and methylated DNA (MeDIP) represent a major achievement in this area that allow researchers to probe chromatin modifications as well as specific protein-DNA interactions in vivo and to estimate the density of proteins at specific sites genome-wide. Although a critical component of chromatin structure, DNA methylation has often been studied independently of other chromatin events and transcription. Results To allow simultaneous measurements of DNA methylation with other genomic processes, we developed and validated a simple and easy-to-use high throughput microplate-based platform for analysis of DNA methylation. Compared to the traditional beads-based MeDIP the microplate MeDIP was more sensitive and had lower non-specific binding. We integrated the MeDIP method with a microplate ChIP assay which allows measurements of both DNA methylation and histone marks at the same time, the Matrix ChIP-MeDIP platform. We illustrated several applications of this platform to relate DNA methylation with chromatin and transcription events at selected genes in cultured cells, human cancer and in a model of diabetic kidney disease. Conclusion The high throughput capacity of Matrix ChIP-MeDIP to profile tens and potentially hundreds of different genomic events at the same time as DNA methylation represents a powerful platform to explore complex genomic mechanisms at selected genes in cultured cells and in whole tissues. In this regard, Matrix ChIP-MeDIP should be useful to complement genome-wide studies where the rich chromatin and transcription database resources provide fruitful foundation to pursue mechanistic, functional and diagnostic information at genes of interest in health and disease. PMID:22098709

  15. Ancient genomics

    PubMed Central

    Der Sarkissian, Clio; Allentoft, Morten E.; Ávila-Arcos, María C.; Barnett, Ross; Campos, Paula F.; Cappellini, Enrico; Ermini, Luca; Fernández, Ruth; da Fonseca, Rute; Ginolhac, Aurélien; Hansen, Anders J.; Jónsson, Hákon; Korneliussen, Thorfinn; Margaryan, Ashot; Martin, Michael D.; Moreno-Mayar, J. Víctor; Raghavan, Maanasa; Rasmussen, Morten; Velasco, Marcela Sandoval; Schroeder, Hannes; Schubert, Mikkel; Seguin-Orlando, Andaine; Wales, Nathan; Gilbert, M. Thomas P.; Willerslev, Eske; Orlando, Ludovic

    2015-01-01

    The past decade has witnessed a revolution in ancient DNA (aDNA) research. Although the field's focus was previously limited to mitochondrial DNA and a few nuclear markers, whole genome sequences from the deep past can now be retrieved. This breakthrough is tightly connected to the massive sequence throughput of next generation sequencing platforms and the ability to target short and degraded DNA molecules. Many ancient specimens previously unsuitable for DNA analyses because of extensive degradation can now successfully be used as source materials. Additionally, the analytical power obtained by increasing the number of sequence reads to billions effectively means that contamination issues that have haunted aDNA research for decades, particularly in human studies, can now be efficiently and confidently quantified. At present, whole genomes have been sequenced from ancient anatomically modern humans, archaic hominins, ancient pathogens and megafaunal species. Those have revealed important functional and phenotypic information, as well as unexpected adaptation, migration and admixture patterns. As such, the field of aDNA has entered the new era of genomics and has provided valuable information when testing specific hypotheses related to the past. PMID:25487338

  16. Optogenetic Approaches to Drug Discovery in Neuroscience and Beyond.

    PubMed

    Zhang, Hongkang; Cohen, Adam E

    2017-07-01

    Recent advances in optogenetics have opened new routes to drug discovery, particularly in neuroscience. Physiological cellular assays probe functional phenotypes that connect genomic data to patient health. Optogenetic tools, in particular tools for all-optical electrophysiology, now provide a means to probe cellular disease models with unprecedented throughput and information content. These techniques promise to identify functional phenotypes associated with disease states and to identify compounds that improve cellular function regardless of whether the compound acts directly on a target or through a bypass mechanism. This review discusses opportunities and unresolved challenges in applying optogenetic techniques throughout the discovery pipeline - from target identification and validation, to target-based and phenotypic screens, to clinical trials. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Messenger RNA exchange between scions and rootstocks in grafted grapevines

    USDA-ARS's Scientific Manuscript database

    We demonstrated the existence of genome-scale mRNA exchange in grafted grapevines, a woody fruit species with significant economic importance. By using diagnostic SNPs derived from high throughput genome sequencing, we identified more than three thousand genes transporting mRNAs across graft junctio...

  18. 5C-ID: Increased resolution Chromosome-Conformation-Capture-Carbon-Copy with in situ 3C and double alternating primer design.

    PubMed

    Kim, Ji Hun; Titus, Katelyn R; Gong, Wanfeng; Beagan, Jonathan A; Cao, Zhendong; Phillips-Cremins, Jennifer E

    2018-05-14

    Mammalian genomes are folded in a hierarchy of compartments, topologically associating domains (TADs), subTADs, and looping interactions. Currently, there is a great need to evaluate the link between chromatin topology and genome function across many biological conditions and genetic perturbations. Hi-C can generate genome-wide maps of looping interactions but is intractable for high-throughput comparison of loops across multiple conditions due to the enormous number of reads (>6 Billion) required per library. Here, we describe 5C-ID, a new version of Chromosome-Conformation-Capture-Carbon-Copy (5C) with restriction digest and ligation performed in the nucleus (in situ Chromosome-Conformation-Capture (3C)) and ligation-mediated amplification performed with a double alternating primer design. We demonstrate that 5C-ID produces higher-resolution 3D genome folding maps with reduced spatial noise using markedly lower cell numbers than canonical 5C. 5C-ID enables the creation of high-resolution, high-coverage maps of chromatin loops in up to a 30 Megabase subset of the genome at a fraction of the cost of Hi-C. Copyright © 2018 Elsevier Inc. All rights reserved.

  19. The Genomic Basis of Evolutionary Innovation in Pseudomonas aeruginosa

    PubMed Central

    Wagner, Andreas; MacLean, R. Craig

    2016-01-01

    Novel traits play a key role in evolution, but their origins remain poorly understood. Here we address this problem by using experimental evolution to study bacterial innovation in real time. We allowed 380 populations of Pseudomonas aeruginosa to adapt to 95 different carbon sources that challenged bacteria with either evolving novel metabolic traits or optimizing existing traits. Whole genome sequencing of more than 80 clones revealed profound differences in the genetic basis of innovation and optimization. Innovation was associated with the rapid acquisition of mutations in genes involved in transcription and metabolism. Mutations in pre-existing duplicate genes in the P. aeruginosa genome were common during innovation, but not optimization. These duplicate genes may have been acquired by P. aeruginosa due to either spontaneous gene amplification or horizontal gene transfer. High throughput phenotype assays revealed that novelty was associated with increased pleiotropic costs that are likely to constrain innovation. However, mutations in duplicate genes with close homologs in the P. aeruginosa genome were associated with low pleiotropic costs compared to mutations in duplicate genes with distant homologs in the P. aeruginosa genome, suggesting that functional redundancy between duplicates facilitates innovation by buffering pleiotropic costs. PMID:27149698

  20. Next-generation mammalian genetics toward organism-level systems biology.

    PubMed

    Susaki, Etsuo A; Ukai, Hideki; Ueda, Hiroki R

    2017-01-01

    Organism-level systems biology in mammals aims to identify, analyze, control, and design molecular and cellular networks executing various biological functions in mammals. In particular, system-level identification and analysis of molecular and cellular networks can be accelerated by next-generation mammalian genetics. Mammalian genetics without crossing, in which all production and phenotyping studies of genome-edited animals are completed within a single generation, drastically reduces the time, space, and effort of conducting the systems research. Next-generation mammalian genetics is based on recent technological advancements in genome editing and developmental engineering. The process begins with introduction of double-strand breaks into genomic DNA by using site-specific endonucleases, which results in highly efficient genome editing in mammalian zygotes or embryonic stem cells. By using nuclease-mediated genome editing in zygotes, or ~100% embryonic stem cell-derived mouse technology, whole-body knock-out and knock-in mice can be produced within a single generation. These emerging technologies allow us to produce multiple knock-out or knock-in strains in a high-throughput manner. In this review, we discuss the basic concepts and related technologies as well as current challenges and future opportunities for next-generation mammalian genetics in organism-level systems biology.

  1. Robustness of Massively Parallel Sequencing Platforms

    PubMed Central

    Kavak, Pınar; Yüksel, Bayram; Aksu, Soner; Kulekci, M. Oguzhan; Güngör, Tunga; Hach, Faraz; Şahinalp, S. Cenk; Alkan, Can; Sağıroğlu, Mahmut Şamil

    2015-01-01

    The improvements in high throughput sequencing technologies (HTS) made clinical sequencing projects such as ClinSeq and Genomics England feasible. Although there are significant improvements in accuracy and reproducibility of HTS based analyses, the usability of these types of data for diagnostic and prognostic applications necessitates near perfect data generation. To assess the usability of a widely used HTS platform for accurate and reproducible clinical applications in terms of robustness, we generated whole genome shotgun (WGS) sequence data from the genomes of two human individuals in two different genome sequencing centers. After analyzing the data to characterize SNPs and indels using the same tools (BWA, SAMtools, and GATK), we observed a significant number of discrepancies in the call sets. As expected, most of the disagreements between the call sets were found within genomic regions containing common repeats and segmental duplications, albeit only a small fraction of the discordant variants were within the exons and other functionally relevant regions such as promoters. We conclude that although HTS platforms are sufficiently powerful for providing data for first-pass clinical tests, the variant predictions still need to be confirmed using orthogonal methods before use in clinical applications. PMID:26382624
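
    The comparison of call sets described in this abstract can be illustrated with a minimal Python sketch. It assumes each variant has already been reduced to a (chromosome, position, reference allele, alternate allele) tuple; the function name, the variants, and the use of shared-over-union as a concordance measure are illustrative assumptions, not the authors' pipeline.

```python
def compare_call_sets(calls_a, calls_b):
    """Split two sets of variant calls into shared and discordant calls.

    Each call is a (chrom, pos, ref, alt) tuple. Concordance is defined
    here, as an assumption, as shared calls divided by the union of calls.
    """
    shared = calls_a & calls_b
    union = calls_a | calls_b
    return {
        "shared": shared,
        "only_a": calls_a - calls_b,
        "only_b": calls_b - calls_a,
        "concordance": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical calls for the same individual from two sequencing centers.
center_1 = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "A")}
center_2 = {("chr1", 100, "A", "G"), ("chr2", 75, "G", "A"), ("chr3", 10, "T", "C")}

result = compare_call_sets(center_1, center_2)
print(result["concordance"])  # 0.5
```

    The discordant calls in only_a and only_b would then be the candidates for confirmation by an orthogonal method.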

  2. Using DNase Hi-C techniques to map global and local three-dimensional genome architecture at high resolution.

    PubMed

    Ma, Wenxiu; Ay, Ferhat; Lee, Choli; Gulsoy, Gunhan; Deng, Xinxian; Cook, Savannah; Hesson, Jennifer; Cavanaugh, Christopher; Ware, Carol B; Krumm, Anton; Shendure, Jay; Blau, C Anthony; Disteche, Christine M; Noble, William S; Duan, ZhiJun

    2018-06-01

    The folding and three-dimensional (3D) organization of chromatin in the nucleus critically impacts genome function. The past decade has witnessed rapid advances in genomic tools for delineating 3D genome architecture. Among them, chromosome conformation capture (3C)-based methods such as Hi-C are the most widely used techniques for mapping chromatin interactions. However, traditional Hi-C protocols rely on restriction enzymes (REs) to fragment chromatin and are therefore limited in resolution. We recently developed DNase Hi-C for mapping 3D genome organization, which uses DNase I for chromatin fragmentation. DNase Hi-C overcomes RE-related limitations associated with traditional Hi-C methods, leading to improved methodological resolution. Furthermore, combining this method with DNA capture technology provides a high-throughput approach (targeted DNase Hi-C) that allows for mapping fine-scale chromatin architecture at exceptionally high resolution. Hence, targeted DNase Hi-C will be valuable for delineating the physical landscapes of cis-regulatory networks that control gene expression and for characterizing phenotype-associated chromatin 3D signatures. Here, we provide a detailed description of method design and step-by-step working protocols for these two methods. Copyright © 2018 Elsevier Inc. All rights reserved.

  3. A field ornithologist’s guide to genomics: Practical considerations for ecology and conservation

    USGS Publications Warehouse

    Oyler-McCance, Sara J.; Oh, Kevin; Langin, Kathryn; Aldridge, Cameron L.

    2016-01-01

    Vast improvements in sequencing technology have made it practical to simultaneously sequence millions of nucleotides distributed across the genome, opening the door for genomic studies in virtually any species. Ornithological research stands to benefit in three substantial ways. First, genomic methods enhance our ability to parse and simultaneously analyze both neutral and non-neutral genomic regions, thus providing insight into adaptive evolution and divergence. Second, the sheer quantity of sequence data generated by current sequencing platforms allows increased precision and resolution in analyses. Third, high-throughput sequencing can benefit applications that focus on a small number of loci that are otherwise prohibitively expensive, time-consuming, and technically difficult using traditional sequencing methods. These advances have improved our ability to understand evolutionary processes like speciation and local adaptation, but they also offer many practical applications in the fields of population ecology, migration tracking, conservation planning, diet analyses, and disease ecology. This review provides a guide for field ornithologists interested in incorporating genomic approaches into their research program, with an emphasis on techniques related to ecology and conservation. We present a general overview of contemporary genomic approaches and methods, as well as important considerations when selecting a genomic technique. We also discuss research questions that are likely to benefit from utilizing high-throughput sequencing instruments, highlighting select examples from recent avian studies.

  4. High-throughput physical mapping of chromosomes using automated in situ hybridization.

    PubMed

    George, Phillip; Sharakhova, Maria V; Sharakhov, Igor V

    2012-06-28

    Projects to obtain whole-genome sequences for 10,000 vertebrate species and for 5,000 insect and related arthropod species are expected to take place over the next 5 years. For example, the sequencing of the genomes for 15 malaria mosquito species is currently being done using an Illumina platform. This Anopheles species cluster includes both vectors and non-vectors of malaria. When the genome assemblies become available, researchers will have the unique opportunity to perform comparative analysis for inferring evolutionary changes relevant to vector ability. However, it has proven difficult to use next-generation sequencing reads to generate high-quality de novo genome assemblies. Moreover, the existing genome assemblies for Anopheles gambiae, although obtained using the Sanger method, are gapped or fragmented. Success of comparative genomic analyses will be limited if researchers deal with numerous sequencing contigs, rather than with chromosome-based genome assemblies. Fragmented, unmapped sequences create problems for genomic analyses because: (i) unidentified gaps cause incorrect or incomplete annotation of genomic sequences; (ii) unmapped sequences lead to confusion between paralogous genes and genes from different haplotypes; and (iii) the lack of chromosome assignment and orientation of the sequencing contigs does not allow for reconstructing rearrangement phylogeny and studying chromosome evolution. Developing high-resolution physical maps for species with newly sequenced genomes is a timely and cost-effective investment that will facilitate genome annotation, evolutionary analysis, and re-sequencing of individual genomes from natural populations. Here, we present innovative approaches to chromosome preparation, fluorescent in situ hybridization (FISH), and imaging that facilitate rapid development of physical maps. Using An. gambiae as an example, we demonstrate that the development of physical chromosome maps can potentially improve genome assemblies and, thus, the quality of genomic analyses. First, we use a high-pressure method to prepare polytene chromosome spreads. This method, originally developed for Drosophila, allows the user to visualize more details on chromosomes than the regular squashing technique. Second, a fully automated, front-end system for FISH is used for high-throughput physical genome mapping. The automated slide staining system runs multiple assays simultaneously and dramatically reduces hands-on time. Third, an automatic fluorescent imaging system, which includes a motorized slide stage, automatically scans and photographs labeled chromosomes after FISH. This system is especially useful for identifying and visualizing multiple chromosomal plates on the same slide. In addition, the scanning process captures a more uniform FISH result. Overall, the automated high-throughput physical mapping protocol is more efficient than a standard manual protocol.

  5. Satellite DNA: An Evolving Topic

    PubMed Central

    Garrido-Ramos, Manuel A.

    2017-01-01

    Satellite DNA represents one of the most fascinating parts of the repetitive fraction of the eukaryotic genome. Since the discovery of highly repetitive tandem DNA in the 1960s, a large body of literature has covered various topics related to the structure, organization, function, and evolution of such sequences. Today, with the advent of genomic tools, the study of satellite DNA has regained great interest. Thus, Next-Generation Sequencing (NGS), together with high-throughput in silico analysis of the information contained in NGS reads, has revolutionized the analysis of the repetitive fraction of the eukaryotic genomes. The whole of the historical and current approaches to the topic gives us a broad view of the function and evolution of satellite DNA and its role in chromosomal evolution. Currently, we have extensive information on the molecular, chromosomal, biological, and population factors that affect the evolutionary fate of satellite DNA, knowledge that gives rise to a series of mutually compatible hypotheses about the origin, spreading, and evolution of satellite DNA. In this paper, I review these hypotheses from a methodological, conceptual, and historical perspective and frame them in the context of chromosomal organization and evolution. PMID:28926993

  6. De novo characterization of Lentinula edodes C(91-3) transcriptome by deep Solexa sequencing.

    PubMed

    Zhong, Mintao; Liu, Ben; Wang, Xiaoli; Liu, Lei; Lun, Yongzhi; Li, Xingyun; Ning, Anhong; Cao, Jing; Huang, Min

    2013-02-01

    Lentinula edodes has been utilized as food as well as in popular medicine; moreover, extracts isolated from its mycelium and fruiting body have shown several therapeutic properties. Yet little is understood about the genes involved in these properties, and the absence of L. edodes genomes has been a barrier to the development of functional genomics research. However, high throughput sequencing technologies are now being widely applied to non-model species. To facilitate research on L. edodes, we leveraged Solexa sequencing technology in de novo assembly of the L. edodes C(91-3) transcriptome. In a single run, we produced more than 57 million sequencing reads. These reads were assembled into 28,923 unigene sequences (mean size = 689 bp) including 18,120 unigenes with coding sequence (CDS). Based on similarity search with known proteins, assembled unigene sequences were annotated with gene descriptions, gene ontology (GO) and clusters of orthologous group (COG) terms. Our data provide the first comprehensive sequence resource available for functional genomics studies in L. edodes, and demonstrate the utility of Illumina/Solexa sequencing for de novo transcriptome characterization and gene discovery in a non-model mushroom. Copyright © 2012 Elsevier Inc. All rights reserved.

  7. AnnotCompute: annotation-based exploration and meta-analysis of genomics experiments

    PubMed Central

    Zheng, Jie; Stoyanovich, Julia; Manduchi, Elisabetta; Liu, Junmin; Stoeckert, Christian J.

    2011-01-01

    The ever-increasing scale of biological data sets, particularly those arising in the context of high-throughput technologies, requires the development of rich data exploration tools. In this article, we present AnnotCompute, an information discovery platform for repositories of functional genomics experiments such as ArrayExpress. Our system leverages semantic annotations of functional genomics experiments with controlled vocabulary and ontology terms, such as those from the MGED Ontology, to compute conceptual dissimilarities between pairs of experiments. These dissimilarities are then used to support two types of exploratory analysis—clustering and query-by-example. We show that our proposed dissimilarity measures correspond to a user's intuition about conceptual dissimilarity, and can be used to support effective query-by-example. We also evaluate the quality of clustering based on these measures. While AnnotCompute can support a richer data exploration experience, its effectiveness is limited in some cases, due to the quality of available annotations. Nonetheless, tools such as AnnotCompute may provide an incentive for richer annotations of experiments. Code is available for download at http://www.cbil.upenn.edu/downloads/AnnotCompute. Database URL: http://www.cbil.upenn.edu/annotCompute/ PMID:22190598

  8. Establishment of an efficient virus-induced gene silencing (VIGS) assay in Arabidopsis by Agrobacterium-mediated rubbing infection.

    PubMed

    Manhães, Ana Marcia E de A; de Oliveira, Marcos V V; Shan, Libo

    2015-01-01

    Several VIGS protocols have been established for high-throughput functional genomic screens because VIGS bypasses the time-consuming and laborious process of generating transgenic plants. The silencing efficiency in this approach is largely hindered by a technically demanding step in which the first pair of newly emerged true leaves at the 2-week-old stage are infiltrated with a needleless syringe. To further optimize VIGS efficiency and achieve rapid inoculation for a large-scale functional genomic study, here we describe a protocol of an efficient VIGS assay in Arabidopsis using Agrobacterium-mediated rubbing infection. The Agrobacterium inoculation is performed by simply rubbing the leaves with Filter Agent Celite® 545. The highly efficient and uniform silencing effect was indicated by the development of a visibly albino phenotype due to silencing of the Cloroplastos alterados 1 (CLA1) gene in the newly emerged leaves. In addition, the albino phenotype could be observed in stems and flowers, indicating its potential application for gene functional studies in the late vegetative development and flowering stages.

  9. Analyzing and interpreting genome data at the network level with ConsensusPathDB.

    PubMed

    Herwig, Ralf; Hardt, Christopher; Lienhard, Matthias; Kamburov, Atanas

    2016-10-01

    ConsensusPathDB consists of a comprehensive collection of human (as well as mouse and yeast) molecular interaction data integrated from 32 different public repositories and a web interface featuring a set of computational methods and visualization tools to explore these data. This protocol describes the use of ConsensusPathDB (http://consensuspathdb.org) with respect to the functional and network-based characterization of biomolecules (genes, proteins and metabolites) that are submitted to the system either as a priority list or together with associated experimental data such as RNA-seq. The tool reports interaction network modules, biochemical pathways and functional information that are significantly enriched by the user's input, applying computational methods for statistical over-representation, enrichment and graph analysis. The results of this protocol can be observed within a few minutes, even with genome-wide data. The resulting network associations can be used to interpret high-throughput data mechanistically, to characterize and prioritize biomarkers, to integrate different omics levels, to design follow-up functional assay experiments and to generate topology for kinetic models at different scales.
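
    The statistical over-representation mentioned in this abstract is commonly assessed with a hypergeometric test. The Python sketch below shows that general idea only; it is an assumption that this test applies here, the function and parameter names are hypothetical, and it is not ConsensusPathDB's implementation or API.

```python
from math import comb


def overrepresentation_pvalue(universe, in_set, submitted, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    universe:  number of genes in the background
    in_set:    background genes annotated to the pathway or gene set
    submitted: number of genes in the user's list
    overlap:   submitted genes that fall in the pathway or gene set
    """
    upper = min(in_set, submitted)
    # Count the ways to draw k annotated and (submitted - k) other genes.
    tail = sum(
        comb(in_set, k) * comb(universe - in_set, submitted - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, submitted)


# Hypothetical toy case: 10 background genes, 5 in the pathway,
# 5 submitted, and all 5 submitted genes hit the pathway.
p_value = overrepresentation_pvalue(universe=10, in_set=5, submitted=5, overlap=5)
print(p_value)  # 1/252, about 0.00397
```

    In a genome-wide analysis many gene sets are tested at once, so p-values of this kind would normally be corrected for multiple testing before interpretation.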

  10. De novo assembly, characterization and functional annotation of pineapple fruit transcriptome through massively parallel sequencing.

    PubMed

    Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

    2012-01-01

    Pineapple (Ananas comosus var. comosus) is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to improve quality traits such as flavor, texture, appearance and fruit sweetness. Although the pineapple is an important fruit, there is insufficient transcriptomic or genomic information available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 million Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against the non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have rapidly increased the number of pineapple fruit mRNA transcripts now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.

  11. De Novo Assembly, Characterization and Functional Annotation of Pineapple Fruit Transcriptome through Massively Parallel Sequencing

    PubMed Central

    Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

    2012-01-01

    Background Pineapple (Ananas comosus var. comosus) is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to improve quality traits such as flavor, texture, appearance and fruit sweetness. Although the pineapple is an important fruit, there is insufficient transcriptomic or genomic information available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. Methodology/Principal Findings To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 million Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against the non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. Conclusions The unique transcripts derived from this work have rapidly increased the number of pineapple fruit mRNA transcripts now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple. PMID:23091603

  12. Automated detection system of single nucleotide polymorphisms using two kinds of functional magnetic nanoparticles

    NASA Astrophysics Data System (ADS)

    Liu, Hongna; Li, Song; Wang, Zhifei; Li, Zhiyang; Deng, Yan; Wang, Hua; Shi, Zhiyang; He, Nongyue

    2008-11-01

    Single nucleotide polymorphisms (SNPs) comprise the most abundant source of genetic variation in the human genome. Large-scale identification of codominant SNPs, especially those associated with complex diseases, has therefore created the need for a fully high-throughput and automated SNP genotyping method. Herein, we present an automated SNP detection system based on two kinds of functional magnetic nanoparticles (MNPs) and dual-color hybridization. The amido-modified MNPs (NH2-MNPs), modified with APTES, were used to extract DNA directly from whole blood by electrostatic interaction, and the subsequent PCR was performed successfully. Furthermore, biotinylated PCR products were captured on the streptavidin-coated MNPs (SA-MNPs) and interrogated by hybridization with a pair of dual-color probes to determine the SNP; the genotype of each sample can then be identified simultaneously by scanning the microarray printed with the denatured fluorescent probes. This system provides a rapid, sensitive and highly versatile automated procedure that will greatly facilitate the analysis of different known SNPs in the human genome.

  13. Effective gene delivery to Trypanosoma cruzi epimastigotes through nucleofection.

    PubMed

    Pacheco-Lugo, Lisandro; Díaz-Olmos, Yirys; Sáenz-García, José; Probst, Christian Macagnan; DaRocha, Wanderson Duarte

    2017-06-01

    New opportunities to study gene function in Trypanosoma cruzi have arisen since its genome was sequenced in 2005. Functional genomic approaches in Trypanosoma cruzi are challenging because few tools are available for genetic manipulation and because transient transfection by conventional methods is inefficient. In the present study, the Amaxa nucleofector device was systematically tested in order to improve the electroporation conditions for the epimastigote forms of T. cruzi. Transfection efficiency was quantified using green fluorescent protein (GFP) as a reporter gene, followed by cell survival assessment. The nucleofection parameters used here increased the survival rates (>90%) and the transfection efficiency by approximately 35%. The small amounts of epimastigotes and DNA required for nucleofection can make the method adopted here an attractive tool for high throughput screening (HTS) applications, and for gene editing in parasites where genetic manipulation tools remain relatively scarce. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. What is biodiversity? Stepping forward from barcoding to understanding biological differences.

    PubMed

    Nikinmaa, Mikko

    2014-10-01

    This opinion paper gives personal views of the direction that cataloguing biodiversity should be going in. Although molecular taxonomy enables rapid and high throughput identification of species, it needs to be anchored to traditional taxonomy, because without information of actual biological properties of species, DNA barcoding just reports differences in selected DNA sequences, which need not have anything to do with the biological properties of the organisms, and the reasons for the development of the species. Since functional differences are the most common reason behind species differences, the future of cataloguing biodiversity and biodiversity research is, in my opinion, in trying to integrate genomic research to comparative physiology in order to be able to evaluate which functional properties have likely been important in generating biodiversity. This task is overwhelming, and requires forgetting the traditional disciplines. Further, major problems associated with the present-day treatment of genomic data are presented from my viewpoint. Copyright © 2014 Elsevier B.V. All rights reserved.

  15. Exploring FlyBase Data Using QuickSearch.

    PubMed

    Marygold, Steven J; Antonazzo, Giulia; Attrill, Helen; Costa, Marta; Crosby, Madeline A; Dos Santos, Gilberto; Goodman, Joshua L; Gramates, L Sian; Matthews, Beverley B; Rey, Alix J; Thurmond, Jim

    2016-12-08

    FlyBase (flybase.org) is the primary online database of genetic, genomic, and functional information about Drosophila species, with a major focus on the model organism Drosophila melanogaster. The long and rich history of Drosophila research, combined with recent surges in genomic-scale and high-throughput technologies, means that FlyBase now houses a huge quantity of data. Researchers need to be able to rapidly and intuitively query these data, and the QuickSearch tool has been designed to meet these needs. This tool is conveniently located on the FlyBase homepage and is organized into a series of simple tabbed interfaces that cover the major data and annotation classes within the database. This unit describes the functionality of all aspects of the QuickSearch tool. With this knowledge, FlyBase users will be equipped to take full advantage of all QuickSearch features and thereby gain improved access to data relevant to their research. © 2016 by John Wiley & Sons, Inc.

  16. Sequencing and annotation of mitochondrial genomes from individual parasitic helminths.

    PubMed

    Jex, Aaron R; Littlewood, D Timothy; Gasser, Robin B

    2015-01-01

    Mitochondrial (mt) genomics has significant implications in a range of fundamental areas of parasitology, including evolution, systematics, and population genetics as well as explorations of mt biochemistry, physiology, and function. Mt genomes also provide a rich source of markers to aid molecular epidemiological and ecological studies of key parasites. However, there is still a paucity of information on mt genomes for many metazoan organisms, particularly parasitic helminths, which has often related to challenges linked to sequencing from tiny amounts of material. The advent of next-generation sequencing (NGS) technologies has paved the way for low cost, high-throughput mt genomic research, but there have been obstacles, particularly in relation to post-sequencing assembly and analyses of large datasets. In this chapter, we describe protocols for the efficient amplification and sequencing of mt genomes from small portions of individual helminths, and highlight the utility of NGS platforms to expedite mt genomics. In addition, we recommend approaches for manual or semi-automated bioinformatic annotation and analyses to overcome the bioinformatic "bottleneck" to research in this area. Taken together, these approaches have demonstrated applicability to a range of parasites and provide prospects for using complete mt genomic sequence datasets for large-scale molecular systematic and epidemiological studies. In addition, these methods have broader utility and might be readily adapted to a range of other medium-sized molecular regions (i.e., 10-100 kb), including large genomic operons, and other organellar (e.g., plastid) and viral genomes.

  17. MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands

    PubMed Central

    Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Rajakumar, Kumar

    2007-01-01

    MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands. PMID:17537813

  18. Fungal proteomics: from identification to function.

    PubMed

    Doyle, Sean

    2011-08-01

    Some fungi cause disease in humans and plants, while others have demonstrable potential for the control of insect pests. In addition, fungi are also a rich reservoir of therapeutic metabolites and industrially useful enzymes. Detailed analysis of fungal biochemistry is now enabled by multiple technologies including protein mass spectrometry, genome and transcriptome sequencing and advances in bioinformatics. Yet, the assignment of function to fungal proteins, encoded either by in silico annotated, or unannotated genes, remains problematic. The purpose of this review is to describe the strategies used by many researchers to reveal protein function in fungi, and more importantly, to consolidate the nomenclature of 'unknown function protein' as opposed to 'hypothetical protein' - once any protein has been identified by protein mass spectrometry. A combination of approaches including comparative proteomics, pathogen-induced protein expression and immunoproteomics are outlined, which, when used in combination with a variety of other techniques (e.g. functional genomics, microarray analysis, immunochemical and infection model systems), appear to yield comprehensive and definitive information on protein function in fungi. The relative advantages of proteomic, as opposed to transcriptomic-only, analyses are also described. In the future, combined high-throughput, quantitative proteomics, allied to transcriptomic sequencing, are set to reveal much about protein function in fungi. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  19. ISOL@: an Italian SOLAnaceae genomics resource.

    PubMed

    Chiusano, Maria Luisa; D'Agostino, Nunzio; Traini, Alessandra; Licciardello, Concetta; Raimondo, Enrico; Aversano, Mario; Frusciante, Luigi; Monti, Luigi

    2008-03-26

    Present-day '-omics' technologies produce overwhelming amounts of data which include genome sequences, information on gene expression (transcripts and proteins) and on cell metabolic status. These data represent multiple aspects of a biological system and need to be investigated as a whole to shed light on the mechanisms which underpin the system functionality. The gathering and convergence of data generated by high-throughput technologies, the effective integration of different data-sources and the analysis of the information content based on comparative approaches are key methods for meaningful biological interpretations. In the frame of the International Solanaceae Genome Project, we propose here ISOLA, an Italian SOLAnaceae genomics resource. ISOLA (available at http://biosrv.cab.unina.it/isola) represents a trial platform and it is conceived as a multi-level computational environment. ISOLA currently consists of two main levels: the genome and the expression level. The cornerstone of the genome level is represented by the Solanum lycopersicum genome draft sequences generated by the International Tomato Genome Sequencing Consortium. Instead, the basic element of the expression level is the transcriptome information from different Solanaceae species, mainly in the form of species-specific comprehensive collections of Expressed Sequence Tags (ESTs). The cross-talk between the genome and the expression levels is based on data source sharing and on tools that enhance data quality, that extract information content from the levels' under parts and produce value-added biological knowledge. ISOLA is the result of a bioinformatics effort that addresses the challenges of the post-genomics era. It is designed to exploit '-omics' data based on effective integration to acquire biological knowledge and to approach a systems biology view. Beyond providing experimental biologists with a preliminary annotation of the tomato genome, this effort aims to produce a trial computational environment where different aspects and details are maintained as they are relevant for the analysis of the organization, the functionality and the evolution of the Solanaceae family.

  20. Emerging techniques for the discovery and validation of therapeutic targets for skeletal diseases.

    PubMed

    Cho, Christine H; Nuttall, Mark E

    2002-12-01

    Advances in genomics and proteomics have revolutionised the drug discovery process and target validation. Identification of novel therapeutic targets for chronic skeletal diseases is an extremely challenging process based on the difficulty of obtaining high-quality human diseased versus normal tissue samples. The quality of tissue and genomic information obtained from the sample is critical to identifying disease-related genes. Using a genomics-based approach, novel genes or genes with similar homology to existing genes can be identified from cDNA libraries generated from normal versus diseased tissue. High-quality cDNA libraries are prepared from uncontaminated homogeneous cell populations harvested from tissue sections of interest. Localised gene expression analysis and confirmation are obtained through in situ hybridisation or immunohistochemical studies. Cells overexpressing the recombinant protein are subsequently designed for primary cell-based high-throughput assays that are capable of screening large compound banks for potential hits. Afterwards, secondary functional assays are used to test promising compounds. The same overexpressing cells are used in the secondary assay to test protein activity and functionality as well as screen for small-molecule agonists or antagonists. Once a hit is generated, a structure-activity relationship of the compound is optimised for better oral bioavailability and pharmacokinetics allowing the compound to progress into development. Parallel efforts from proteomics, as well as genetics/transgenics, bioinformatics and combinatorial chemistry, and improvements in high-throughput automation technologies, allow the drug discovery process to meet the demands of the medicinal market. This review discusses and illustrates how different approaches are incorporated into the discovery and validation of novel targets and, consequently, the development of potentially therapeutic agents in the areas of osteoporosis and osteoarthritis. While current treatments exist in the form of hormone replacement therapy, antiresorptive and anabolic agents for osteoporosis, there are no disease-modifying therapies for the treatment of the most common human joint disease, osteoarthritis. A massive market potential for improved options with better safety and efficacy still remains. Therefore, the application of genomics and proteomics for both diseases should provide much needed novel therapeutic approaches to treating these major world health problems.

  1. SNP-based genotyping in lentil: linking sequence information with phenotypes

    USDA-ARS's Scientific Manuscript database

    Lentil (Lens culinaris) has been late to enter the world of high throughput molecular analysis due to a general lack of genomic resources. Using a 454 sequencing-based approach, SNPs have been identified in genes across the lentil genome. Several hundred have been turned into single SNP KASP assay...

  2. Mining conifers’ mega-genome using rapid and efficient multiplexed high-throughput genotyping-by-sequencing (GBS) SNP discovery platform

    USDA-ARS's Scientific Manuscript database

    Next-generation sequencing (NGS) technologies are revolutionizing both medical and biological research through generation of massive SNP data sets for identifying heritable genome variation underlying key traits, from rare human diseases to important agronomic phenotypes in crop species. We evaluate...

  3. Short-read, high-throughput sequencing technology for STR genotyping

    PubMed Central

    Bornman, Daniel M.; Hester, Mark E.; Schuetter, Jared M.; Kasoji, Manjula D.; Minard-Smith, Angela; Barden, Curt A.; Nelson, Scott C.; Godbold, Gene D.; Baker, Christine H.; Yang, Boyu; Walther, Jacquelyn E.; Tornes, Ivan E.; Yan, Pearlly S.; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L.; Young, Brian A.; Faith, Seth A.

    2013-01-01

    DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples. PMID:25621315

  4. VESPA: Software to Facilitate Genomic Annotation of Prokaryotic Organisms Through Integration of Proteomic and Transcriptomic Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peterson, Elena S.; McCue, Lee Ann; Rutledge, Alexandra C.

    2012-04-25

    Visual Exploration and Statistics to Promote Annotation (VESPA) is an interactive visual analysis software tool that facilitates the discovery of structural mis-annotations in prokaryotic genomes. VESPA integrates high-throughput peptide-centric proteomics data and oligo-centric or RNA-Seq transcriptomics data into a genomic context. The data may be interrogated via visual analysis across multiple levels of genomic resolution, linked searches, exports and interaction with BLAST to rapidly identify locations of interest within the genome and evaluate potential mis-annotations.

  5. fluff: exploratory analysis and visualization of high-throughput sequencing data

    PubMed Central

    Georgiou, Georgios

    2016-01-01

    Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff. PMID:27547532

  6. Measuring Sister Chromatid Cohesion Protein Genome Occupancy in Drosophila melanogaster by ChIP-seq.

    PubMed

    Dorsett, Dale; Misulovin, Ziva

    2017-01-01

    This chapter presents methods to conduct and analyze genome-wide chromatin immunoprecipitation of the cohesin complex and the Nipped-B cohesin loading factor in Drosophila cells using high-throughput DNA sequencing (ChIP-seq). Procedures for isolation of chromatin, immunoprecipitation, and construction of sequencing libraries for the Ion Torrent Proton high throughput sequencer are detailed, and computational methods to calculate occupancy as input-normalized fold-enrichment are described. The results obtained by ChIP-seq are compared to those obtained by ChIP-chip (genomic ChIP using tiling microarrays), and the effects of sequencing depth on the accuracy are analyzed. ChIP-seq provides similar sensitivity and reproducibility as ChIP-chip, and identifies the same broad regions of occupancy. The locations of enrichment peaks, however, can differ between ChIP-chip and ChIP-seq, and low sequencing depth can splinter broad regions of occupancy into distinct peaks.
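    As an illustration of the "input-normalized fold-enrichment" idea mentioned in the record above, the following is a minimal sketch and not the authors' pipeline. It assumes reads have already been counted in matching genomic bins for the ChIP and input libraries; the function name `fold_enrichment`, the library-size scaling, and the pseudocount are all assumptions made for this example.

```python
from typing import List, Sequence


def fold_enrichment(
    chip_counts: Sequence[float],
    input_counts: Sequence[float],
    pseudocount: float = 1.0,
) -> List[float]:
    """Return per-bin ChIP/input ratios after library-size scaling.

    Hypothetical helper: the scaling scheme and pseudocount are
    illustrative choices, not taken from the cited chapter.
    """
    if len(chip_counts) != len(input_counts):
        raise ValueError("chip_counts and input_counts must have the same length")
    chip_total = sum(chip_counts)
    input_total = sum(input_counts)
    if chip_total <= 0 or input_total <= 0:
        raise ValueError("both libraries must contain reads")
    # Scale the input library to the sequencing depth of the ChIP library.
    scale = chip_total / input_total
    # The pseudocount keeps bins with no input reads from dividing by zero.
    return [
        (chip + pseudocount) / (inp * scale + pseudocount)
        for chip, inp in zip(chip_counts, input_counts)
    ]
```

    Values above 1 indicate bins where the ChIP library is enriched relative to the depth-scaled input; how bins are defined and how peaks are then called is left open here.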

  7. Development of an Efficient Genome Editing Method by CRISPR/Cas9 in a Fish Cell Line.

    PubMed

    Dehler, Carola E; Boudinot, Pierre; Martin, Samuel A M; Collet, Bertrand

    2016-08-01

    CRISPR/Cas9 system has been used widely in animals and plants to direct mutagenesis. To date, no such method exists for fish somatic cell lines. We describe an efficient procedure for genome editing in the Chinook salmon Oncorhynchus tshawytscha CHSE. This cell line was genetically modified to firstly overexpress a monomeric form of EGFP (cell line CHSE-E Geneticin resistant) and additionally to overexpress nCas9n, a nuclear version of Cas9 (cell line CHSE-EC, Hygromycin and Geneticin resistant). A pre-validated sgRNA was produced in vitro and used to transfect CHSE-EC cells. The EGFP gene was disrupted in 34.6 % of cells, as estimated by FACS and microscopy. The targeted locus was characterised by PCR amplification, cloning and sequencing of PCR products; inactivation of the EGFP gene by deletions in the expected site was validated in 25 % of clones. This method opens perspectives for functional genomic studies compatible with high-throughput screening.

  8. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library.

    PubMed

    Koike-Yusa, Hiroko; Li, Yilong; Tan, E-Pien; Velasco-Herrera, Martin Del Castillo; Yusa, Kosuke

    2014-03-01

    Identification of genes influencing a phenotype of interest is frequently achieved through genetic screening by RNA interference (RNAi) or knockouts. However, RNAi may only achieve partial depletion of gene activity, and knockout-based screens are difficult in diploid mammalian cells. Here we took advantage of the efficiency and high throughput of genome editing based on type II, clustered, regularly interspaced, short palindromic repeats (CRISPR)-CRISPR-associated (Cas) systems to introduce genome-wide targeted mutations in mouse embryonic stem cells (ESCs). We designed 87,897 guide RNAs (gRNAs) targeting 19,150 mouse protein-coding genes and used a lentiviral vector to express these gRNAs in ESCs that constitutively express Cas9. Screening the resulting ESC mutant libraries for resistance to either Clostridium septicum alpha-toxin or 6-thioguanine identified 27 known and 4 previously unknown genes implicated in these phenotypes. Our results demonstrate the potential for efficient loss-of-function screening using the CRISPR-Cas9 system.

  9. The NCI Genomic Data Commons as an engine for precision medicine.

    PubMed

    Jensen, Mark A; Ferretti, Vincent; Grossman, Robert L; Staudt, Louis M

    2017-07-27

    The National Cancer Institute Genomic Data Commons (GDC) is an information system for storing, analyzing, and sharing genomic and clinical data from patients with cancer. The recent high-throughput sequencing of cancer genomes and transcriptomes has produced a big data problem that precludes many cancer biologists and oncologists from gleaning knowledge from these data regarding the nature of malignant processes and the relationship between tumor genomic profiles and treatment response. The GDC aims to democratize access to cancer genomic data and to foster the sharing of these data to promote precision medicine approaches to the diagnosis and treatment of cancer.

  10. Image Harvest: an open-source platform for high-throughput plant image processing and analysis

    PubMed Central

    Knecht, Avi C.; Campbell, Malachy T.; Caprez, Adam; Swanson, David R.; Walia, Harkamal

    2016-01-01

    High-throughput plant phenotyping is an effective approach to bridge the genotype-to-phenotype gap in crops. Phenomics experiments typically result in large-scale image datasets, which are not amenable for processing on desktop computers, thus creating a bottleneck in the image-analysis pipeline. Here, we present an open-source, flexible image-analysis framework, called Image Harvest (IH), for processing images originating from high-throughput plant phenotyping platforms. Image Harvest is developed to perform parallel processing on computing grids and provides an integrated feature for metadata extraction from large-scale file organization. Moreover, the integration of IH with the Open Science Grid provides academic researchers with the computational resources required for processing large image datasets at no cost. Image Harvest also offers functionalities to extract digital traits from images to interpret plant architecture-related characteristics. To demonstrate the applications of these digital traits, a rice (Oryza sativa) diversity panel was phenotyped and genome-wide association mapping was performed using digital traits that are used to describe different plant ideotypes. Three major quantitative trait loci were identified on rice chromosomes 4 and 6, which co-localize with quantitative trait loci known to regulate agronomically important traits in rice. Image Harvest is an open-source software for high-throughput image processing that requires a minimal learning curve for plant biologists to analyze phenomics datasets. PMID:27141917

  11. Rapid Quantification of Mutant Fitness in Diverse Bacteria by Sequencing Randomly Bar-Coded Transposons

    PubMed Central

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth

    2015-01-01

    ABSTRACT Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative d-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. PMID:25968644
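    The bar-code-based fitness measurement described in the record above can be illustrated with a simplified sketch. It is not the published RB-TnSeq computation: it assumes only that each bar code has a read count before and after growth, and it scores each bar code by the log2 change in its relative abundance. The function name `barcode_fitness` and the pseudocount are assumptions introduced for this example.

```python
import math
from typing import Dict, Mapping


def barcode_fitness(
    before: Mapping[str, int],
    after: Mapping[str, int],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Return log2(relative abundance after / relative abundance before).

    Hypothetical helper: a per-bar-code log ratio only, with no
    gene-level averaging or normalization.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    total_before = sum(before.values())
    total_after = sum(after.values())
    if total_before <= 0 or total_after <= 0:
        raise ValueError("both samples must contain reads")
    fitness: Dict[str, float] = {}
    for barcode, count_before in before.items():
        # Bar codes missing from the later sample are treated as zero reads.
        count_after = after.get(barcode, 0)
        freq_before = (count_before + pseudocount) / total_before
        freq_after = (count_after + pseudocount) / total_after
        fitness[barcode] = math.log2(freq_after / freq_before)
    return fitness
```

    Negative values suggest that strains carrying the bar code were depleted during growth; combining bar codes into per-gene scores would need a further step that is not sketched here.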

  12. The USC Epigenome Center.

    PubMed

    Laird, Peter W

    2009-10-01

    The University of Southern California (USC, CA, USA) has a long tradition of excellence in epigenetics. With the recent explosive growth and technological maturation of the field of epigenetics, it became clear that a dedicated high-throughput epigenomic data production facility would be needed to remain at the forefront of epigenetic research. To address this need, USC launched the USC Epigenome Center as the first large-scale center in academics dedicated to epigenomic research. The Center is providing high-throughput data production for large-scale genomic and epigenomic studies, and developing novel analysis tools for epigenomic research. This unique facility promises to be a valuable resource for multidisciplinary research, education and training in genomics, epigenomics, bioinformatics, and translational medicine.

  13. Recent Progress in CFTR Interactome Mapping and Its Importance for Cystic Fibrosis.

    PubMed

    Lim, Sang Hyun; Legere, Elizabeth-Ann; Snider, Jamie; Stagljar, Igor

    2017-01-01

    Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) is a chloride channel found in secretory epithelia with a plethora of known interacting proteins. Mutations in the CFTR gene cause cystic fibrosis (CF), a disease that leads to progressive respiratory illness and other complications of phenotypic variance resulting from perturbations of this protein interaction network. Studying the collection of CFTR interacting proteins and the differences between the interactomes of mutant and wild type CFTR provides insight into the molecular machinery of the disease and highlights possible therapeutic targets. This mini review focuses on functional genomics and proteomics approaches used for systematic, high-throughput identification of CFTR-interacting proteins to provide comprehensive insight into CFTR regulation and function.

  14. Diversity and Composition of Sulfate-Reducing Microbial Communities Based on Genomic DNA and RNA Transcription in Production Water of High Temperature and Corrosive Oil Reservoir

    PubMed Central

    Li, Xiao-Xiao; Liu, Jin-Feng; Zhou, Lei; Mbadinga, Serge M.; Yang, Shi-Zhong; Gu, Ji-Dong; Mu, Bo-Zhong

    2017-01-01

    Deep subsurface petroleum reservoir ecosystems harbor a high diversity of microorganisms, and microbial influenced corrosion is a major problem for the petroleum industry. Here, we used high-throughput sequencing to explore the microbial communities based on genomic 16S rDNA and metabolically active 16S rRNA analyses of production water samples with different extents of corrosion from a high-temperature oil reservoir. Results showed that Desulfotignum and Roseovarius were the most abundant genera in both genomic and active bacterial communities of all the samples. Both genomic and active archaeal communities were mainly composed of Archaeoglobus and Methanolobus. Within both bacteria and archaea, the active and genomic communities were compositionally distinct from one another across the different oil wells (bacteria p = 0.002; archaea p = 0.01). In addition, the sulfate-reducing microorganisms (SRMs) were specifically assessed by Sanger sequencing of functional genes aprA and dsrA encoding the enzymes adenosine-5′-phosphosulfate reductase and dissimilatory sulfite reductase, respectively. Functional gene analysis indicated that potentially active Archaeoglobus, Desulfotignum, Desulfovibrio, and Thermodesulforhabdus were frequently detected, with Archaeoglobus as the most abundant and active sulfate-reducing group. Canonical correspondence analysis revealed that the SRM communities in petroleum reservoir system were closely related to pH of the production water and sulfate concentration. This study highlights the importance of distinguishing the metabolically active microorganisms from the genomic community and extends our knowledge on the active SRM communities in corrosive petroleum reservoirs. PMID:28638372

  15. Automated Purification of Recombinant Proteins: Combining High-throughput with High Yield

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Chiann Tso; Moore, Priscilla A.; Auberry, Deanna L.

    2006-05-01

    Protein crystallography, mapping protein interactions and other approaches of current functional genomics require not only purifying large numbers of proteins but also obtaining sufficient yield and homogeneity for downstream high-throughput applications. There is a need for the development of robust automated high-throughput protein expression and purification processes to meet these requirements. We developed and compared two alternative workflows for automated purification of recombinant proteins based on expression of bacterial genes in Escherichia coli: first, a filtration separation protocol based on expression of 800 ml E. coli cultures followed by filtration purification using Ni2+-NTA™ Agarose (Qiagen); second, a smaller-scale magnetic separation method based on expression in 25 ml cultures of E. coli followed by 96-well purification on MagneHis™ Ni2+ Agarose (Promega). Both workflows provided comparable average yields of about 8 µg of purified protein per unit of OD at 600 nm of bacterial culture. We discuss advantages and limitations of the automated workflows, which can provide proteins more than 90% pure in the range of 100 µg to 45 mg per purification run, as well as strategies for optimization of these protocols.

  16. Detection of COPB2 as a KRAS synthetic lethal partner through integration of functional genomics screens

    PubMed Central

    Christodoulou, Eleni G.; Yang, Hai; Lademann, Franziska; Pilarsky, Christian; Beyer, Andreas; Schroeder, Michael

    2017-01-01

    Mutated KRAS plays an important role in many cancers. Although targeting KRAS directly is difficult, indirect inactivation via synthetic lethal partners (SLPs) is promising. Yet to date, no SLPs from high-throughput RNAi screening are supported by multiple screens. Here, we address this problem by aggregating and ranking data over three independent high-throughput screens. We integrate rankings by minimizing the displacement and by considering established methods such as RIGER and RSA. Our meta-analysis reveals COPB2 as a potential SLP of KRAS with good support from all three screens. COPB2 is a coatomer subunit and its knockdown has already been linked to disabled autophagy and reduced tumor growth. We confirm COPB2 as an SLP in knockdown experiments on pancreatic and colorectal cancer cell lines. Overall, consistent integration of high-throughput data can generate candidate synthetic lethal partners that individual screens do not uncover. Concretely, we reveal and confirm that COPB2 is a synthetic lethal partner of KRAS and hence a promising cancer target. Ligands inhibiting COPB2 may, therefore, be promising new cancer drugs. PMID:28415695
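
    The abstract above does not spell out how the rankings are integrated, so the following is only a minimal sketch of one common way to combine ranked hit lists: take the median rank of each gene, which minimizes the summed absolute displacement of that gene across screens. The function name `aggregate_by_median_rank` and the gene labels are hypothetical, and the toy rankings are invented for illustration; this is not the authors' pipeline, RIGER, or RSA.

```python
from statistics import median


def aggregate_by_median_rank(screens):
    """Combine per-screen rankings into one consensus ranking.

    screens: list of lists, each an ordered gene list (best hit first).
    Only genes present in every screen are ranked (a simplifying assumption).
    """
    common = set(screens[0]).intersection(*screens[1:])
    ranks = {gene: [] for gene in common}
    for ordered in screens:
        # Re-rank within the shared genes so positions are comparable
        shared = [gene for gene in ordered if gene in common]
        for position, gene in enumerate(shared, start=1):
            ranks[gene].append(position)
    # The median minimizes the summed absolute displacement for each gene;
    # ties are broken by rank sum, then by name, to keep the output stable
    return sorted(common, key=lambda g: (median(ranks[g]), sum(ranks[g]), g))


# Hypothetical toy rankings from three screens (invented data)
screen_1 = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
screen_2 = ["GENE_B", "GENE_A", "GENE_D", "GENE_C"]
screen_3 = ["GENE_A", "GENE_C", "GENE_B", "GENE_D"]
consensus = aggregate_by_median_rank([screen_1, screen_2, screen_3])
```

    In this toy input, GENE_A ranks first in two of the three screens and therefore leads the consensus, which mirrors the idea that a hit supported by several screens should outrank one that is strong in a single screen.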

  17. Evaluating High-Throughput Ab Initio Gene Finders to Discover Proteins Encoded in Eukaryotic Pathogen Genomes Missed by Laboratory Techniques

    PubMed Central

    Goodswen, Stephen J.; Kennedy, Paul J.; Ellis, John T.

    2012-01-01

    Next generation sequencing technology is advancing genome sequencing at an unprecedented level. By unravelling the code within a pathogen’s genome, every possible protein (prior to post-translational modifications) can theoretically be discovered, irrespective of life cycle stages and environmental stimuli. Now more than ever there is a great need for high-throughput ab initio gene finding. Ab initio gene finders use statistical models to predict genes and their exon-intron structures from the genome sequence alone. This paper evaluates whether existing ab initio gene finders can effectively predict genes to deduce proteins that have presently missed capture by laboratory techniques. An aim here is to identify possible patterns of prediction inaccuracies for gene finders as a whole irrespective of the target pathogen. All currently available ab initio gene finders are considered in the evaluation but only four fulfil high-throughput capability: AUGUSTUS, GeneMark_hmm, GlimmerHMM, and SNAP. These gene finders require training data specific to a target pathogen and consequently the evaluation results are inextricably linked to the availability and quality of the data. The pathogen, Toxoplasma gondii, is used to illustrate the evaluation methods. The results support current opinion that predicted exons by ab initio gene finders are inaccurate in the absence of experimental evidence. However, the results reveal some patterns of inaccuracy that are common to all gene finders and these inaccuracies may provide a focus area for future gene finder developers. PMID:23226328

  18. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

    PubMed

    Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca

    2015-01-01

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious, and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
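
    The idea of finding read pairs that span a genome/T-DNA junction can be illustrated with a small filter. This is a hedged sketch only: it assumes the reads were aligned to a combined reference in which the T-DNA is a separate sequence named "T-DNA", and the tuple layout, function name, and read identifiers are hypothetical. It is not the tool described in the abstract.

```python
def junction_pairs(pairs, tdna_name="T-DNA"):
    """Return IDs of read pairs with one mate on the T-DNA and one on the genome.

    pairs: iterable of (pair_id, ref_of_mate1, ref_of_mate2), where each ref is
    the name of the reference sequence the mate aligned to, or None if unmapped.
    """
    spanning = []
    for pair_id, ref1, ref2 in pairs:
        if ref1 is None or ref2 is None:
            continue  # both mates must be mapped to call a junction
        # Exactly one of the two mates lies on the T-DNA sequence
        if (ref1 == tdna_name) != (ref2 == tdna_name):
            spanning.append(pair_id)
    return spanning


# Hypothetical alignment summary (invented read IDs and placements)
example_pairs = [
    ("pair1", "Chr1", "T-DNA"),   # spans a junction
    ("pair2", "T-DNA", "T-DNA"),  # internal to the T-DNA
    ("pair3", "Chr3", "Chr3"),    # ordinary genomic pair
    ("pair4", "T-DNA", None),     # mate unmapped, skipped
    ("pair5", "T-DNA", "Chr5"),   # spans a junction
]
hits = junction_pairs(example_pairs)
```

    Grouping the genomic mates of such pairs by chromosome and position would then suggest candidate insertion sites, although resolving the exact junction sequence needs reads that cross the breakpoint itself.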

  19. The draft genome of the C3 panicoid grass species Dichanthelium oligosanthes.

    PubMed

    Studer, Anthony J; Schnable, James C; Weissmann, Sarit; Kolbe, Allison R; McKain, Michael R; Shao, Ying; Cousins, Asaph B; Kellogg, Elizabeth A; Brutnell, Thomas P

    2016-10-28

    Comparisons between C3 and C4 grasses often utilize C3 species from the subfamilies Ehrhartoideae or Pooideae and C4 species from the subfamily Panicoideae, two clades that diverged over 50 million years ago. The divergence of the C3 panicoid grass Dichanthelium oligosanthes from the independent C4 lineages represented by Setaria viridis and Sorghum bicolor occurred approximately 15 million years ago, which is significantly more recent than members of the Bambusoideae, Ehrhartoideae, and Pooideae subfamilies. D. oligosanthes is ideally placed within the panicoid clade for comparative studies of C3 and C4 grasses. We report the assembly of the nuclear and chloroplast genomes of D. oligosanthes, from high-throughput short read sequencing data and a comparative transcriptomics analysis of the developing leaf of D. oligosanthes, S. viridis, and S. bicolor. Physiological and anatomical characterizations verified that D. oligosanthes utilizes the C3 pathway for carbon fixation and lacks Kranz anatomy. Expression profiles of transcription factors along developing leaves of D. oligosanthes and S. viridis were compared with previously published data from S. bicolor, Zea mays, and Oryza sativa to identify a small suite of transcription factors that likely acquired functions specifically related to C4 photosynthesis. The phylogenetic location of D. oligosanthes makes it an ideal C3 plant for comparative analysis of C4 evolution in the panicoid grasses. This genome will not only provide a better C3 species for comparisons with C4 panicoid grasses, but also highlights the power of using high-throughput sequencing to address questions in evolutionary biology.

  20. The draft genome of the C3 panicoid grass species Dichanthelium oligosanthes

    DOE PAGES

    Studer, Anthony J.; Schnable, James C.; Weissmann, Sarit; ...

    2016-10-28

    Comparisons between C3 and C4 grasses often utilize C3 species from the subfamilies Ehrhartoideae or Pooideae and C4 species from the subfamily Panicoideae, two clades that diverged over 50 million years ago. The divergence of the C3 panicoid grass Dichanthelium oligosanthes from the independent C4 lineages represented by Setaria viridis and Sorghum bicolor occurred approximately 15 million years ago, which is significantly more recent than members of the Bambusoideae, Ehrhartoideae, and Pooideae subfamilies. D. oligosanthes is ideally placed within the panicoid clade for comparative studies of C3 and C4 grasses. Here, we report the assembly of the nuclear and chloroplast genomes of D. oligosanthes, from high-throughput short read sequencing data and a comparative transcriptomics analysis of the developing leaf of D. oligosanthes, S. viridis, and S. bicolor. Physiological and anatomical characterizations verified that D. oligosanthes utilizes the C3 pathway for carbon fixation and lacks Kranz anatomy. Expression profiles of transcription factors along developing leaves of D. oligosanthes and S. viridis were compared with previously published data from S. bicolor, Zea mays, and Oryza sativa to identify a small suite of transcription factors that likely acquired functions specifically related to C4 photosynthesis. In conclusion, the phylogenetic location of D. oligosanthes makes it an ideal C3 plant for comparative analysis of C4 evolution in the panicoid grasses. This genome will not only provide a better C3 species for comparisons with C4 panicoid grasses, but also highlights the power of using high-throughput sequencing to address questions in evolutionary biology.

  1. Overview Article: Identifying transcriptional cis-regulatory modules in animal genomes

    PubMed Central

    Suryamohan, Kushal; Halfon, Marc S.

    2014-01-01

    Gene expression is regulated through the activity of transcription factors and chromatin modifying proteins acting on specific DNA sequences, referred to as cis-regulatory elements. These include promoters, located at the transcription initiation sites of genes, and a variety of distal cis-regulatory modules (CRMs), the most common of which are transcriptional enhancers. Because regulated gene expression is fundamental to cell differentiation and acquisition of new cell fates, identifying, characterizing, and understanding the mechanisms of action of CRMs is critical for understanding development. CRM discovery has historically been challenging, as CRMs can be located far from the genes they regulate, have few readily-identifiable sequence characteristics, and for many years were not amenable to high-throughput discovery methods. However, the recent availability of complete genome sequences and the development of next-generation sequencing methods has led to an explosion of both computational and empirical methods for CRM discovery in model and non-model organisms alike. Experimentally, CRMs can be identified through chromatin immunoprecipitation directed against transcription factors or histone post-translational modifications, identification of nucleosome-depleted “open” chromatin regions, or sequencing-based high-throughput functional screening. Computational methods include comparative genomics, clustering of known or predicted transcription factor binding sites, and supervised machine-learning approaches trained on known CRMs. All of these methods have proven effective for CRM discovery, but each has its own considerations and limitations, and each is subject to a greater or lesser number of false-positive identifications. Experimental confirmation of predictions is essential, although shortcomings in current methods suggest that additional means of validation need to be developed. PMID:25704908

  2. The draft genome of the C3 panicoid grass species Dichanthelium oligosanthes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Studer, Anthony J.; Schnable, James C.; Weissmann, Sarit

    Comparisons between C3 and C4 grasses often utilize C3 species from the subfamilies Ehrhartoideae or Pooideae and C4 species from the subfamily Panicoideae, two clades that diverged over 50 million years ago. The divergence of the C3 panicoid grass Dichanthelium oligosanthes from the independent C4 lineages represented by Setaria viridis and Sorghum bicolor occurred approximately 15 million years ago, which is significantly more recent than members of the Bambusoideae, Ehrhartoideae, and Pooideae subfamilies. D. oligosanthes is ideally placed within the panicoid clade for comparative studies of C3 and C4 grasses. Here, we report the assembly of the nuclear and chloroplast genomes of D. oligosanthes, from high-throughput short read sequencing data and a comparative transcriptomics analysis of the developing leaf of D. oligosanthes, S. viridis, and S. bicolor. Physiological and anatomical characterizations verified that D. oligosanthes utilizes the C3 pathway for carbon fixation and lacks Kranz anatomy. Expression profiles of transcription factors along developing leaves of D. oligosanthes and S. viridis were compared with previously published data from S. bicolor, Zea mays, and Oryza sativa to identify a small suite of transcription factors that likely acquired functions specifically related to C4 photosynthesis. In conclusion, the phylogenetic location of D. oligosanthes makes it an ideal C3 plant for comparative analysis of C4 evolution in the panicoid grasses. This genome will not only provide a better C3 species for comparisons with C4 panicoid grasses, but also highlights the power of using high-throughput sequencing to address questions in evolutionary biology.

  3. Data mining for discovery of endophytic and epiphytic fungal diversity in short-read genomic data from deciduous trees

    Treesearch

    Nicholas R. LaBonte; James Jacobs; Aziz Ebrahimi; Shaneka Lawson; Keith Woeste

    2018-01-01

    High-throughput sequencing of DNA barcodes, such as the internal transcribed spacer (ITS) of the 16s rRNA sequence, has expanded the ability of researchers to investigate the endophytic fungal communities of living plants. With a large and growing database of complete fungal genomes, it may be possible to utilize portions of fungal symbiont genomes outside conventional...

  4. High-throughput analysis of the satellitome illuminates satellite DNA evolution

    NASA Astrophysics Data System (ADS)

    Ruiz-Ruano, Francisco J.; López-León, María Dolores; Cabrero, Josefa; Camacho, Juan Pedro M.

    2016-07-01

    Satellite DNA (satDNA) is a major yet largely unknown component of eukaryote genomes and is clearly underrepresented in genome sequencing projects. Here we show the high-throughput analysis of satellite DNA content in the migratory locust by means of the bioinformatic analysis of Illumina reads with the RepeatExplorer and RepeatMasker programs. This unveiled 62 satDNA families, and we propose the term “satellitome” for the whole collection of different satDNA families in a genome. The finding that satDNAs were present in many contigs of the migratory locust draft genome indicates that they occupy many genomic locations invisible to fluorescent in situ hybridization (FISH). The cytological pattern of five satellites showing common descent (belonging to the SF3 superfamily) suggests that non-clustered satDNAs can become clustered through local amplification at any of the many genomic loci resulting from previous dissemination of short satDNA arrays. The fact that all kinds of satDNA (micro-, mini- and satellites) can show the non-clustered and clustered states suggests that all these elements are mostly similar, except for repeat length. Finally, the presence of VNTRs in bacteria, showing similar properties to non-clustered satDNAs in eukaryotes, suggests that this kind of tandem repeat shows common properties in all living beings.

  5. Multiplex High-Throughput Targeted Proteomic Assay To Identify Induced Pluripotent Stem Cells.

    PubMed

    Baud, Anna; Wessely, Frank; Mazzacuva, Francesca; McCormick, James; Camuzeaux, Stephane; Heywood, Wendy E; Little, Daniel; Vowles, Jane; Tuefferd, Marianne; Mosaku, Olukunbi; Lako, Majlinda; Armstrong, Lyle; Webber, Caleb; Cader, M Zameel; Peeters, Pieter; Gissen, Paul; Cowley, Sally A; Mills, Kevin

    2017-02-21

    Induced pluripotent stem cells have great potential as a human model system in regenerative medicine, disease modeling, and drug screening. However, their use in medical research is hampered by laborious reprogramming procedures that yield low numbers of induced pluripotent stem cells. For further applications in research, only the best, competent clones should be used. The standard assays for pluripotency are based on genomic approaches, which take up to 1 week to perform and incur significant cost. Therefore, there is a need for a rapid and cost-effective assay able to distinguish between pluripotent and nonpluripotent cells. Here, we describe a novel multiplexed, high-throughput, and sensitive peptide-based multiple reaction monitoring mass spectrometry assay, allowing for the identification and absolute quantitation of multiple core transcription factors and pluripotency markers. This assay provides simpler and high-throughput classification into either pluripotent or nonpluripotent cells in 7 min analysis while being more cost-effective than conventional genomic tests.

  6. High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing.

    PubMed

    Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Carbonell, Silvia; Pérez-Lluch, Sílvia; Abad, Amaya; Davis, Carrie; Gingeras, Thomas R; Frankish, Adam; Harrow, Jennifer; Guigo, Roderic; Johnson, Rory

    2017-12-01

    Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.

  7. Genome-wide ENU mutagenesis for the discovery of novel male fertility regulators.

    PubMed

    Jamsai, Duangporn; O'Bryan, Moira K

    2010-06-01

    The completion of genome sequencing projects has provided an extensive knowledge of the contents of the genomes of human, mouse, and many other organisms. Despite this, the function of most of the estimated 25,000 human genes remains largely unknown. Attention has now turned to elucidating gene function and identifying biological pathways that contribute to human diseases, including male infertility. Our understanding of the genetic regulation of male fertility has been accelerated through the use of genetically modified mouse models including knockout, knock-in, gene-trapped, and transgenic mice. Such reverse genetic approaches, however, require some foreknowledge of a gene's function and, as such, bias against the discovery of completely novel genes and biological pathways. To facilitate high-throughput gene discovery, genome-wide mouse mutagenesis via the use of a potent chemical mutagen, N-ethyl-N-nitrosourea (ENU), has been developed over the past decade. This forward genetic, or phenotype-driven, approach relies upon observing a phenotype first, then subsequently defining the underlying genetic defect. Mutations are randomly introduced into the mouse genome via ENU exposure. Through a controlled breeding scheme, mutations causing a phenotype of interest (e.g., male infertility) are then identified by linkage analysis and candidate gene sequencing. This approach allows for the possibility of revealing comprehensive phenotype-genotype relationships for a range of genes and pathways; i.e., in addition to null alleles, mice containing partial loss-of-function or gain-of-function mutations can be recovered. Such point mutations are likely to be more reflective of those that occur within the human population. Many research groups have successfully used this approach to generate infertile mouse lines, and some novel male fertility genes have been revealed. In this review, we focus on the utility of ENU mutagenesis for the discovery of novel male fertility regulators.

  8. HTS-DB: an online resource to publish and query data from functional genomics high-throughput siRNA screening projects.

    PubMed

    Saunders, Rebecca E; Instrell, Rachael; Rispoli, Rossella; Jiang, Ming; Howell, Michael

    2013-01-01

    High-throughput screening (HTS) uses technologies such as RNA interference to generate loss-of-function phenotypes on a genomic scale. As these technologies become more popular, many research institutes have established core facilities of expertise to deal with the challenges of large-scale HTS experiments. As the efforts of core facility screening projects come to fruition, focus has shifted towards managing the results of these experiments and making them available in a useful format that can be further mined for phenotypic discovery. The HTS-DB database provides a public view of data from screening projects undertaken by the HTS core facility at the CRUK London Research Institute. All projects and screens are described with comprehensive assay protocols, and datasets are provided with complete descriptions of analysis techniques. This format allows users to browse and search data from large-scale studies in an informative and intuitive way. It also provides a repository for additional measurements obtained from screens that were not the focus of the project, such as cell viability, and groups these data so that it can provide a gene-centric summary across several different cell lines and conditions. All datasets from our screens that can be made available can be viewed interactively and mined for further hit lists. We believe that in this format, the database provides researchers with rapid access to results of large-scale experiments that might facilitate their understanding of genes/compounds identified in their own research. DATABASE URL: http://hts.cancerresearchuk.org/db/public.

  9. New generation pharmacogenomic tools: a SNP linkage disequilibrium Map, validated SNP assay resource, and high-throughput instrumentation system for large-scale genetic studies.

    PubMed

    De La Vega, Francisco M; Dailey, David; Ziegle, Janet; Williams, Julie; Madden, Dawn; Gilbert, Dennis A

    2002-06-01

    Since public and private efforts announced the first draft of the human genome last year, researchers have reported great numbers of single nucleotide polymorphisms (SNPs). We believe that the availability of well-mapped, quality SNP markers constitutes the gateway to a revolution in genetics and personalized medicine that will lead to better diagnosis and treatment of common complex disorders. A new generation of tools and public SNP resources for pharmacogenomic and genetic studies (specifically for candidate-gene, candidate-region, and whole-genome association studies) will form part of the new scientific landscape. This will only be possible through the greater accessibility of SNP resources and superior high-throughput instrumentation-assay systems that enable affordable, highly productive large-scale genetic studies. We are contributing to this effort by developing a high-quality linkage disequilibrium SNP marker map and an accompanying set of ready-to-use, validated SNP assays across every gene in the human genome. This effort incorporates both the public sequence and SNP data sources, and Celera Genomics' human genome assembly and enormous resource of physically mapped SNPs (approximately 4,000,000 unique records). This article discusses our approach and methodology for designing the map, choosing quality SNPs, designing and validating these assays, and obtaining population frequency of the polymorphisms. We also discuss an advanced, high-performance SNP assay chemistry (a new generation of the TaqMan probe-based 5' nuclease assay) and a high-throughput instrumentation-software system for large-scale genotyping. We provide the new SNP map and validation information, validated SNP assays and reagents, and instrumentation systems as a novel resource for genetic discoveries.

  10. A time-and-motion approach to micro-costing of high-throughput genomic assays

    PubMed Central

    Costa, S.; Regier, D.A.; Meissner, B.; Cromwell, I.; Ben-Neriah, S.; Chavez, E.; Hung, S.; Steidl, C.; Scott, D.W.; Marra, M.A.; Peacock, S.J.; Connors, J.M.

    2016-01-01

    Background: Genomic technologies are increasingly used to guide clinical decision-making in cancer control. Economic evidence about the cost-effectiveness of genomic technologies is limited, in part because of a lack of published comprehensive cost estimates. In the present micro-costing study, we used a time-and-motion approach to derive cost estimates for 3 genomic assays and processes (digital gene expression profiling [GEP], fluorescence in situ hybridization [FISH], and targeted capture sequencing, including bioinformatics analysis) in the context of lymphoma patient management. Methods: The setting for the study was the Department of Lymphoid Cancer Research laboratory at the BC Cancer Agency in Vancouver, British Columbia. Mean per-case hands-on time and resource measurements were determined from a series of direct observations of each assay. Per-case cost estimates were calculated using a bottom-up costing approach, with labour, capital and equipment, supplies and reagents, and overhead costs included. Results: The most labour-intensive assay was found to be FISH at 258.2 minutes per case, followed by targeted capture sequencing (124.1 minutes per case) and digital GEP (14.9 minutes per case). Based on a historical case throughput of 180 cases annually, the mean per-case cost (2014 Canadian dollars) was estimated to be $1,029.16 for targeted capture sequencing and bioinformatics analysis, $596.60 for FISH, and $898.35 for digital GEP with an 807-gene code set. Conclusions: With the growing emphasis on personalized approaches to cancer management, the need for economic evaluations of high-throughput genomic assays is increasing. Through economic modelling and budget-impact analyses, the cost estimates presented here can be used to inform priority-setting decisions about the implementation of such assays in clinical practice. PMID:27803594
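
    The bottom-up costing described above adds up labour, capital and equipment, supplies and reagents, and overhead for each case. The sketch below shows that arithmetic under stated assumptions: only the 258.2 hands-on minutes for FISH and the throughput of 180 cases per year come from the abstract, while the wage, supplies, capital, and overhead figures are hypothetical placeholders. It is not intended to reproduce the published $596.60 estimate.

```python
def per_case_cost(hands_on_minutes, hourly_wage, supplies_per_case,
                  annual_capital, annual_overhead, cases_per_year):
    """Bottom-up per-case cost: labour + supplies + shared annual costs per case."""
    labour = hands_on_minutes / 60 * hourly_wage
    capital = annual_capital / cases_per_year    # equipment cost spread over cases
    overhead = annual_overhead / cases_per_year  # facility cost spread over cases
    return round(labour + supplies_per_case + capital + overhead, 2)


# 258.2 minutes and 180 cases/year are from the abstract;
# every monetary input below is a made-up placeholder.
fish_cost = per_case_cost(
    hands_on_minutes=258.2,
    hourly_wage=30.0,
    supplies_per_case=200.0,
    annual_capital=18_000.0,
    annual_overhead=9_000.0,
    cases_per_year=180,
)
```

    Because capital and overhead are divided by annual throughput, the per-case figure from a function like this is sensitive to case volume: halving `cases_per_year` doubles those two components while labour and supplies stay fixed.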

  11. Heat*seq: an interactive web tool for high-throughput sequencing experiment comparison with public data.

    PubMed

    Devailly, Guillaume; Mantsoki, Anna; Joshi, Anagha

    2016-11-01

    Better protocols and decreasing costs have made high-throughput sequencing experiments now accessible even to small experimental laboratories. However, comparing one or a few experiments generated by an individual lab to the vast amount of relevant data freely available in the public domain might be limited due to lack of bioinformatics expertise. Though several tools, including genome browsers, allow such comparison at a single gene level, they do not provide a genome-wide view. We developed Heat*seq, a web tool that allows genome-scale comparison of high-throughput experiments (chromatin immuno-precipitation followed by sequencing, RNA-sequencing and Cap Analysis of Gene Expression) provided by a user to the data in the public domain. Heat*seq currently contains over 12 000 experiments across diverse tissues and cell types in human, mouse and drosophila. Heat*seq displays interactive correlation heatmaps, with an ability to dynamically subset datasets to contextualize user experiments. High quality figures and tables are produced and can be downloaded in multiple formats. Web application: http://www.heatstarseq.roslin.ed.ac.uk/ Source code: https://github.com/gdevailly Contact: Guillaume.Devailly@roslin.ed.ac.uk or Anagha.Joshi@roslin.ed.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
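
    A correlation heatmap of the kind mentioned above is built from pairwise correlations between a user's experiment and each public dataset. The sketch below computes one row of such a matrix. It assumes Pearson correlation over a shared vector of per-gene or per-region signal values, which the abstract does not state; the function names and the toy profiles are hypothetical, and this is not the Heat*seq source code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlate_with_public(user_profile, public_profiles):
    """Correlate one user profile against each named public profile."""
    return {name: pearson(user_profile, profile)
            for name, profile in public_profiles.items()}


# Toy signal values over four genes (invented data)
user = [1.0, 2.0, 3.0, 4.0]
public = {
    "public_exp_1": [2.0, 4.0, 6.0, 8.0],  # same trend as the user profile
    "public_exp_2": [4.0, 3.0, 2.0, 1.0],  # opposite trend
}
row = correlate_with_public(user, public)
```

    Repeating this for every pair of datasets gives the full square matrix behind a heatmap; a rank-based measure such as Spearman correlation would be a reasonable alternative when signal scales differ between experiments.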

  12. Networking Omic Data to Envisage Systems Biological Regulation.

    PubMed

    Kalapanulak, Saowalak; Saithong, Treenut; Thammarongtham, Chinae

    To understand how biological processes work, it is necessary to explore the systematic regulation governing the behaviour of the processes. Not only driving the normal behavior of organisms, the systematic regulation evidently underlies the temporal responses to surrounding environments (dynamics) and long-term phenotypic adaptation (evolution). The systematic regulation is, in effect, formulated from the regulatory components which collaboratively work together as a network. In the drive to decipher such a code of lives, a spectrum of technologies has continuously been developed in the post-genomic era. With current advances, high-throughput sequencing technologies are tremendously powerful for facilitating genomics and systems biology studies in the attempt to understand system regulation inside the cells. The ability to explore relevant regulatory components which infer transcriptional and signaling regulation, driving core cellular processes, is thus enhanced. This chapter reviews high-throughput sequencing technologies, including second and third generation sequencing technologies, which support the investigation of genomics and transcriptomics data. Utilization of this high-throughput data to form the virtual network of systems regulation is explained, particularly transcriptional regulatory networks. Analysis of the resulting regulatory networks could lead to an understanding of cellular systems regulation at the mechanistic and dynamics levels. The great contribution of the biological networking approach to envisage systems regulation is finally demonstrated by a broad range of examples.

  13. Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library.

    PubMed

    Sánchez, Cecilia Castaño; Smith, Timothy P L; Wiedmann, Ralph T; Vallejo, Roger L; Salem, Mohamed; Yao, Jianbo; Rexroad, Caird E

    2009-11-25

    To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time consuming methodology of SNP identification in rainbow trout, therefore not suitable for high throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6 fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. 
In addition, 2% of the sequences from the validated markers were associated with rainbow trout transcripts. The use of reduced representation libraries and pyrosequencing technology proved to be an effective strategy for the discovery of a high number of putative SNPs in rainbow trout; however, modifications to the technique to decrease the false discovery rate resulting from the evolutionary recent genome duplication would be desirable.
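
    The counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the coverage estimate assumes every read maps to exactly one fragment end, so it is an upper bound (about 6.7-fold, against the roughly 6-fold the authors report).

```python
# Figures quoted in the abstract above.
READS = 2_000_000          # pyrosequencing reads
FRAGMENT_ENDS = 300_000    # 150,000 unique restriction fragments x 2 ends
GENOTYPED = 384            # putative SNPs selected for genotyping
VALIDATED = 183            # putative SNPs that validated
MAPPED = 167               # validated SNPs placed on the linkage map


def fold_coverage(reads: int, targets: int) -> float:
    """Naive average coverage: assumes each read covers exactly one target."""
    return reads / targets


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


if __name__ == "__main__":
    print(f"naive coverage: {fold_coverage(READS, FRAGMENT_ENDS):.1f}x")
    print(f"validation rate: {percent(VALIDATED, GENOTYPED):.0f}%")
    print(f"validated SNPs mapped: {percent(MAPPED, VALIDATED):.0f}%")
```

    The validation rate of 183/384 rounds to the 48% stated in the abstract.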

  14. Next generation tools for genomic data generation, distribution, and visualization

    PubMed Central

    2010-01-01

    Background With the rapidly falling cost and availability of high throughput sequencing and microarray technologies, the bottleneck for effectively using genomic analysis in the laboratory and clinic is shifting to one of effectively managing, analyzing, and sharing genomic data. Results Here we present three open-source, platform independent, software tools for generating, analyzing, distributing, and visualizing genomic data. These include a next generation sequencing/microarray LIMS and analysis project center (GNomEx); an application for annotating and programmatically distributing genomic data using the community vetted DAS/2 data exchange protocol (GenoPub); and a standalone Java Swing application (GWrap) that makes cutting edge command line analysis tools available to those who prefer graphical user interfaces. Both GNomEx and GenoPub use the rich client Flex/Flash web browser interface to interact with Java classes and a relational database on a remote server. Both employ a public-private user-group security model enabling controlled distribution of patient and unpublished data alongside public resources. As such, they function as genomic data repositories that can be accessed manually or programmatically through DAS/2-enabled client applications such as the Integrated Genome Browser. Conclusions These tools have gained wide use in our core facilities, research laboratories and clinics and are freely available for non-profit use. See http://sourceforge.net/projects/gnomex/, http://sourceforge.net/projects/genoviz/, and http://sourceforge.net/projects/useq. PMID:20828407

  15. Assembly, Annotation, and Analysis of Multiple Mycorrhizal Fungal Genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Initiative Consortium, Mycorrhizal Genomics; Kuo, Alan; Grigoriev, Igor

    Mycorrhizal fungi play critical roles in host plant health, soil community structure and chemistry, and carbon and nutrient cycling, all areas of intense interest to the US Dept. of Energy (DOE) Joint Genome Institute (JGI). To this end we are building on our earlier sequencing of the Laccaria bicolor genome by partnering with INRA-Nancy and the mycorrhizal research community in the MGI to sequence and analyze dozens of mycorrhizal genomes of all Basidiomycota and Ascomycota orders and multiple ecological types (ericoid, orchid, and ectomycorrhizal). JGI has developed and deployed high-throughput sequencing techniques, and Assembly, RNASeq, and Annotation Pipelines. In 2012 alone we sequenced, assembled, and annotated 12 draft or improved genomes of mycorrhizae, and predicted ~232831 genes and ~15011 multigene families. All of this data is publicly available on JGI MycoCosm (http://jgi.doe.gov/fungi/), which provides access to both the genome data and tools with which to analyze the data. Preliminary comparisons of the current total of 14 public mycorrhizal genomes suggest that 1) short secreted proteins potentially involved in symbiosis are more enriched in some orders than in others amongst the mycorrhizal Agaricomycetes, 2) there are wide ranges of numbers of genes involved in certain functional categories, such as signal transduction and post-translational modification, and 3) novel gene families are specific to some ecological types.

  16. Genomics of coloration in natural animal populations.

    PubMed

    San-Jose, Luis M; Roulin, Alexandre

    2017-07-05

    Animal coloration has traditionally been the target of genetic and evolutionary studies. However, until very recently, the study of the genetic basis of animal coloration has been mainly restricted to model species, whereas research on non-model species has been either neglected or mainly based on candidate approaches, and thereby limited by the knowledge obtained in model species. Recent high-throughput sequencing technologies allow us to overcome previous limitations, and open new avenues to study the genetic basis of animal coloration in a broader number of species and colour traits, and to address the general relevance of different genetic structures and their implications for the evolution of colour. In this review, we highlight aspects where genome-wide studies could be of major utility to fill in the gaps in our understanding of the biology and evolution of animal coloration. The new genomic approaches have been promptly adopted to study animal coloration although substantial work is still needed to consider a larger range of species and colour traits, such as those exhibiting continuous variation or based on reflective structures. We argue that a robust advancement in the study of animal coloration will also require large efforts to validate the functional role of the genes and variants discovered using genome-wide tools. This article is part of the themed issue 'Animal coloration: production, perception, function and application'. © 2017 The Author(s).

  17. Regulation of Mammalian Gene Dosage by Long Noncoding RNAs

    PubMed Central

    Hung, Ko-Hsuan; Wang, Yang; Zhao, Jing Crystal

    2013-01-01

    Recent transcriptome studies suggest that long noncoding RNAs (lncRNAs) are key components of the mammalian genome, and their study has become a new frontier in biomedical research. In fact, lncRNAs in the mammalian genome were identified and studied at particular epigenetic loci, including imprinted loci and X-chromosome inactivation center, at least two decades ago—long before development of high throughput sequencing technology. Since then, researchers have found that lncRNAs play essential roles in various biological processes, mostly during development. Since much of our understanding of lncRNAs originates from our knowledge of these well-established lncRNAs, in this review we will focus on lncRNAs from the X-chromosome inactivation center and the Dlk1-Dio3 imprinted cluster as examples of lncRNA mechanisms functioning in the epigenetic regulation of mammalian genes. PMID:24970160

  18. Chemical genomics: characterizing target pathways for bioactive compounds using the endomembrane trafficking network.

    PubMed

    Rodriguez-Furlán, Cecilia; Hicks, Glenn R; Norambuena, Lorena

    2014-01-01

    The plant endomembrane trafficking system is a highly complex set of processes. This complexity presents a challenge for its study. Classical plant genetics often struggles with loss-of-function lethality and gene redundancy. Chemical genomics allows overcoming many of these issues by using small molecules of natural or synthetic origin to inhibit specific trafficking proteins thereby affecting the processes in a tunable and reversible manner. Bioactive chemicals identified by high-throughput phenotype screens must be characterized in detail starting with understanding of the specific trafficking pathways affected. Here, we describe approaches to characterize bioactive compounds that perturb vesicle trafficking. This should equip researchers with practical knowledge on how to identify endomembrane-specific trafficking pathways that may be perturbed by specific compounds and will help to eventually identify molecular targets for these small molecules.

  19. Systematic pharmacogenomics analysis of a Malay whole genome: proof of concept for personalized medicine.

    PubMed

    Salleh, Mohd Zaki; Teh, Lay Kek; Lee, Lian Shien; Ismet, Rose Iszati; Patowary, Ashok; Joshi, Kandarp; Pasha, Ayesha; Ahmed, Azni Zain; Janor, Roziah Mohd; Hamzah, Ahmad Sazali; Adam, Aishah; Yusoff, Khalid; Hoh, Boon Peng; Hatta, Fazleen Haslinda Mohd; Ismail, Mohamad Izwan; Scaria, Vinod; Sivasubbu, Sridhar

    2013-01-01

    With a higher throughput and lower cost in sequencing, second generation sequencing technology has immense potential for translation into clinical practice and in the realization of pharmacogenomics based patient care. The systematic analysis of whole genome sequences to assess patient to patient variability in pharmacokinetics and pharmacodynamics responses towards drugs would be the next step in future medicine in line with the vision of personalizing medicine. Genomic DNA obtained from a 55-year-old, self-declared healthy, anonymous male of Malay descent was sequenced. The subject's mother died of lung cancer and the father had a history of schizophrenia and died at the age of 65. A systematic, intuitive computational workflow/pipeline integrating a custom algorithm in tandem with large datasets of variant annotations and gene functions for genetic variations with pharmacogenomics impact was developed. A comprehensive pathway map of drug transport, metabolism and action was used as a template to map non-synonymous variations with potential functional consequences. Over 3 million known variations and 100,898 novel variations in the Malay genome were identified. Further in-depth pharmacogenetics analysis revealed a total of 607 unique variants in 563 proteins, with the eventual identification of 4 drug transport genes, 2 drug metabolizing enzyme genes and 33 target genes harboring deleterious SNVs involved in pharmacological pathways, which could have a potential role in clinical settings. The current study successfully unravels the potential of personal genome sequencing in understanding the functionally relevant variations with potential influence on drug transport, metabolism and differential therapeutic outcomes. These will be essential for realizing personalized medicine through the use of a comprehensive computational pipeline for systematic data mining and analysis.

  20. Mapping of disease-associated variants in admixed populations

    PubMed Central

    2011-01-01

    Recent developments in high-throughput genotyping and whole-genome sequencing will enhance the identification of disease loci in admixed populations. We discuss how a more refined estimation of ancestry benefits both admixture mapping and association mapping, making disease loci identification in admixed populations more powerful. High-throughput genotyping and sequencing will enable refined estimation of ancestry, thus enhancing disease loci identification in admixed populations. PMID:21635713

  1. TaqMan 5′-Nuclease Human Immunodeficiency Virus Type 1 PCR Assay with Phage-Packaged Competitive Internal Control for High-Throughput Blood Donor Screening

    PubMed Central

    Drosten, C.; Seifried, E.; Roth, W. K.

    2001-01-01

    Screening of blood donors for human immunodeficiency virus type 1 (HIV-1) infection by PCR permits the earlier diagnosis of HIV-1 infection compared with that by serologic assays. We have established a high-throughput reverse transcription (RT)-PCR assay based on 5′-nuclease PCR. By in-tube detection of HIV-1 RNA with a fluorogenic probe, the 5′-nuclease PCR technology (TaqMan PCR) eliminates the risk of carryover contamination, a major problem in PCR testing. We outline the development and evaluation of the PCR assay from a technical point of view. A one-step RT-PCR that targets the gag genes of all known HIV-1 group M isolates was developed. An internal control RNA detectable with a heterologous 5′-nuclease probe was derived from the viral target cDNA and was packaged into MS2 coliphages (Armored RNA). Because the RNA was protected against digestion with RNase, it could be spiked into patient plasma to control the complete sample preparation and amplification process. The assay detected 831 HIV-1 type B genome equivalents per ml of native plasma (95% confidence interval [CI], 759 to 936 HIV-1 B genome equivalents per ml) with a ≥95% probability of a positive result, as determined by probit regression analysis. A detection limit of 1,195 genome equivalents per ml of (individual) donor plasma (95% CI, 1,014 to 1,470 genome equivalents per ml of plasma pooled from individuals) was achieved when 96 samples were pooled and enriched by centrifugation. Up to 4,000 plasma samples per PCR run were tested in a 3-month trial period. Although data from the present pilot feasibility study will have to be complemented by a large clinical validation study, the assay is a promising approach to the high-throughput screening of blood donors and is the first noncommercial test for high-throughput screening for HIV-1. PMID:11724836

  2. Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landrace and cultivars

    USDA-ARS's Scientific Manuscript database

    Domesticated crops have experienced strong human-driven selection aimed at the development of improved varieties adapted to local conditions. To detect regions of the wheat genome subject to selection during improvement, we developed a high-throughput array to interrogate 9,000 gene-associated DNA m...

  3. Phased genotyping-by-sequencing enhances analysis of genetic diversity and reveals divergent copy number variants in maize

    USDA-ARS's Scientific Manuscript database

    High-throughput sequencing of reduced representation genomic libraries has ushered in an era of genotyping-by-sequencing (GBS), where genome-wide genotype data can be obtained for nearly any species. However, there remains a need for imputation-free GBS methods for genotyping large samples taken fr...

  4. New DArT markers for oat provide enhanced map coverage and global germplasm characterization

    USDA-ARS's Scientific Manuscript database

    Genomic discovery in oat and its application to oat improvement have been hindered by a lack of common markers on different genetic maps, and by the difficulty of conducting whole-genome analysis using high throughput markers. In this study we developed, characterized, and applied a large set oat g...

  5. New DArT markers for oat provide enhanced map coverage and global germplasm characterization

    USDA-ARS's Scientific Manuscript database

    Background Genomic discovery in oat and its application to oat improvement have been hindered by a lack of genetic markers common to different genetic maps, and by the difficulty of conducting whole-genome analysis using high-throughput markers. This study was intended to develop, characterize, and ...

  6. A high-throughput Sanger strategy for human mitochondrial genome sequencing

    PubMed Central

    2013-01-01

    Background A population reference database of complete human mitochondrial genome (mtGenome) sequences is needed to enable the use of mitochondrial DNA (mtDNA) coding region data in forensic casework applications. However, the development of entire mtGenome haplotypes to forensic data quality standards is difficult and laborious. A Sanger-based amplification and sequencing strategy that is designed for automated processing, yet routinely produces high quality sequences, is needed to facilitate high-volume production of these mtGenome data sets. Results We developed a robust 8-amplicon Sanger sequencing strategy that regularly produces complete, forensic-quality mtGenome haplotypes in the first pass of data generation. The protocol works equally well on samples representing diverse mtDNA haplogroups and DNA input quantities ranging from 50 pg to 1 ng, and can be applied to specimens of varying DNA quality. The complete workflow was specifically designed for implementation on robotic instrumentation, which increases throughput and reduces both the opportunities for error inherent to manual processing and the cost of generating full mtGenome sequences. Conclusions The described strategy will assist efforts to generate complete mtGenome haplotypes which meet the highest data quality expectations for forensic genetic and other applications. Additionally, high-quality data produced using this protocol can be used to assess mtDNA data developed using newer technologies and chemistries. Further, the amplification strategy can be used to enrich for mtDNA as a first step in sample preparation for targeted next-generation sequencing. PMID:24341507

  7. Developing High-Throughput HIV Incidence Assay with Pyrosequencing Platform

    PubMed Central

    Park, Sung Yong; Goeken, Nolan; Lee, Hyo Jin; Bolan, Robert; Dubé, Michael P.

    2014-01-01

    ABSTRACT Human immunodeficiency virus (HIV) incidence is an important measure for monitoring the epidemic and evaluating the efficacy of intervention and prevention trials. This study developed a high-throughput, single-measure incidence assay by implementing a pyrosequencing platform. We devised a signal-masking bioinformatics pipeline, which yielded a process error rate of 5.8 × 10⁻⁴ per base. The pipeline was then applied to analyze 18,434 envelope gene segments (HXB2 7212 to 7601) obtained from 12 incident and 24 chronic patients who had documented HIV-negative and/or -positive tests. The pyrosequencing data were cross-checked by using the single-genome-amplification (SGA) method to independently obtain 302 sequences from 13 patients. Using two genomic biomarkers that probe for the presence of similar sequences, the pyrosequencing platform correctly classified all 12 incident subjects (100% sensitivity) and 23 of 24 chronic subjects (96% specificity). One misclassified subject's chronic infection was correctly classified by conducting the same analysis with SGA data. The biomarkers were statistically associated across the two platforms, suggesting the assay's reproducibility and robustness. Sampling simulations showed that the biomarkers were tolerant of sequencing errors and template resampling, two factors most likely to affect the accuracy of pyrosequencing results. We observed comparable biomarker scores between AIDS and non-AIDS chronic patients (multivariate analysis of variance [MANOVA], P = 0.12), indicating that the stage of HIV disease itself does not affect the classification scheme. The high-throughput genomic HIV incidence assay marks a significant step toward determining incidence from a single measure in cross-sectional surveys. IMPORTANCE Annual HIV incidence, the number of newly infected individuals within a year, is the key measure of monitoring the epidemic's rise and decline.
Developing reliable assays differentiating recent from chronic infections has been a long-standing quest in the HIV community. Over the past 15 years, these assays have traditionally measured various HIV-specific antibodies, but recent technological advancements have expanded the diversity of proposed accurate, user-friendly, and financially viable tools. Here we designed a high-throughput genomic HIV incidence assay based on the signature imprinted in the HIV gene sequence population. By combining next-generation sequencing techniques with bioinformatics analysis, we demonstrated that genomic fingerprints are capable of distinguishing recently infected patients from chronically infected patients with high precision. Our high-throughput platform is expected to allow us to process many patients' samples from a single experiment, permitting the assay to be cost-effective for routine surveillance. PMID:24371062
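
    The sensitivity and specificity quoted in this abstract follow directly from the classification counts. The sketch below is a minimal illustration under the assumption that "incident" is treated as the positive class; the function names are hypothetical and are not taken from the authors' pipeline.

```python
# Classification counts quoted in the abstract above.
INCIDENT_CORRECT = 12   # incident subjects classified as incident (true positives)
INCIDENT_MISSED = 0     # incident subjects classified as chronic (false negatives)
CHRONIC_CORRECT = 23    # chronic subjects classified as chronic (true negatives)
CHRONIC_MISSED = 1      # chronic subject classified as incident (false positive)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly positive (incident) subjects that were detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly negative (chronic) subjects that were recognised."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    print(f"sensitivity: {sensitivity(INCIDENT_CORRECT, INCIDENT_MISSED):.0%}")
    print(f"specificity: {specificity(CHRONIC_CORRECT, CHRONIC_MISSED):.0%}")
```

    With 36 subjects in total, a single reclassification moves specificity by about four percentage points, so these figures are best read alongside the sample size.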

  8. Epigenetics of prostate cancer and the prospect of identification of novel drug targets by RNAi screening of epigenetic enzymes.

    PubMed

    Björkman, Mari; Rantala, Juha; Nees, Matthias; Kallioniemi, Olli

    2010-10-01

    Alterations in epigenetic processes probably underlie most human malignancies. Novel genome-wide techniques, such as chromatin immunoprecipitation and high-throughput sequencing, have become state-of-the-art methods to map the epigenomic landscape of development and disease, such as in cancers. Despite these advances, the functional significance of epigenetic enzymes in cancer progression, such as prostate cancer, remains incompletely understood. A comprehensive mapping and functional understanding of the cancer epigenome will hopefully help to facilitate development of novel cancer therapy targets and improve future diagnostics. The authors have developed a novel cell microarray-based high-content siRNA screening technique suitable to address the putative functional role and impact of all known putative and novel epigenetic enzymes in cancer, including prostate cancer.

  9. Complete Genome Sequence of a Naturally Occurring Simian Foamy Virus Isolate from Rhesus Macaque (SFVmmu_K3T).

    PubMed

    Nandakumar, Subhiksha; Bae, Eunhae H; Khan, Arifa S

    2017-08-17

    The full-length genome sequence of a simian foamy virus (SFVmmu_K3T), isolated from a rhesus macaque (Macaca mulatta), was obtained using high-throughput sequencing. SFVmmu_K3T consisted of 12,983 bp and had a genomic organization similar to that of other SFVs, with long terminal repeats (LTRs) and open reading frames for Gag, Pol, Env, Tas, and Bet.

  10. Development of Droplet Microfluidics Enabling High-Throughput Single-Cell Analysis.

    PubMed

    Wen, Na; Zhao, Zhan; Fan, Beiyuan; Chen, Deyong; Men, Dong; Wang, Junbo; Chen, Jian

    2016-07-05

    This article reviews recent developments in droplet microfluidics enabling high-throughput single-cell analysis. Five key aspects in this field are included in this review: (1) prototype demonstration of single-cell encapsulation in microfluidic droplets; (2) technical improvements of single-cell encapsulation in microfluidic droplets; (3) microfluidic droplets enabling single-cell proteomic analysis; (4) microfluidic droplets enabling single-cell genomic analysis; and (5) integrated microfluidic droplet systems enabling single-cell screening. We examine the advantages and limitations of each technique and discuss future research opportunities by focusing on key performances of throughput, multifunctionality, and absolute quantification.

  11. High-throughput protein analysis integrating bioinformatics and experimental assays

    PubMed Central

    del Val, Coral; Mehrle, Alexander; Falkenhahn, Mechthild; Seiler, Markus; Glatting, Karl-Heinz; Poustka, Annemarie; Suhai, Sandor; Wiemann, Stefan

    2004-01-01

    The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE, SWISSPROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and administrating relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins. PMID:14762202

  12. High-throughput cell-based screening reveals a role for ZNF131 as a repressor of ERalpha signaling

    PubMed Central

    Han, Xiao; Guo, Jinhai; Deng, Weiwei; Zhang, Chenying; Du, Peige; Shi, Taiping; Ma, Dalong

    2008-01-01

    Background Estrogen receptor α (ERα) is a transcription factor whose activity is affected by multiple regulatory cofactors. In an effort to identify the human genes involved in the regulation of ERα, we constructed a high-throughput, cell-based, functional screening platform by linking a response element (ERE) with a reporter gene. This allowed the cellular activity of ERα, in cells cotransfected with the candidate gene, to be quantified in the presence or absence of its cognate ligand E2. Results From a library of 570 human cDNA clones, we identified zinc finger protein 131 (ZNF131) as a repressor of ERα mediated transactivation. ZNF131 is a typical member of the BTB/POZ family of transcription factors, and shows both ubiquitous expression and a high degree of sequence conservation. The luciferase reporter gene assay revealed that ZNF131 inhibits ligand-dependent transactivation by ERα in a dose-dependent manner. Electrophoretic mobility shift assay clearly demonstrated that the interaction between ZNF131 and ERα interrupts or prevents ERα binding to the estrogen response element (ERE). In addition, ZNF131 was able to suppress the expression of pS2, an ERα target gene. Conclusion We suggest that the functional screening platform we constructed can be applied for high-throughput genomic screening of candidate ERα-related genes. This in turn may provide new insights into the underlying molecular mechanisms of ERα regulation in mammalian cells. PMID:18847501

  13. Differential Expression and Functional Analysis of High-Throughput -Omics Data Using Open Source Tools.

    PubMed

    Kebschull, Moritz; Fittler, Melanie Julia; Demmer, Ryan T; Papapanou, Panos N

    2017-01-01

    Today, -omics analyses, including the systematic cataloging of messenger RNA and microRNA sequences or DNA methylation patterns in a cell population, organ, or tissue sample, allow for an unbiased, comprehensive genome-level analysis of complex diseases, offering a large advantage over earlier "candidate" gene or pathway analyses. A primary goal in the analysis of these high-throughput assays is the detection of those features among several thousand that differ between different groups of samples. In the context of oral biology, our group has successfully utilized -omics technology to identify key molecules and pathways in different diagnostic entities of periodontal disease. A major issue when inferring biological information from high-throughput -omics studies is the fact that the sheer volume of high-dimensional data generated by contemporary technology is not appropriately analyzed using common statistical methods employed in the biomedical sciences. In this chapter, we outline a robust and well-accepted bioinformatics workflow for the initial analysis of -omics data generated using microarrays or next-generation sequencing technology using open-source tools. Starting with quality control measures and necessary preprocessing steps for data originating from different -omics technologies, we next outline a differential expression analysis pipeline that can be used for data from both microarray and sequencing experiments, and offers the possibility to account for random or fixed effects. Finally, we present an overview of the possibilities for a functional analysis of the obtained data.

  14. Benchmarking Procedures for High-Throughput Context Specific Reconstruction Algorithms

    PubMed Central

    Pacheco, Maria P.; Pfau, Thomas; Sauter, Thomas

    2016-01-01

    Recent progress in high-throughput data acquisition has shifted the focus from data generation to processing and understanding of how to integrate collected information. Context specific reconstruction based on generic genome scale models like ReconX or HMR has the potential to become a diagnostic and treatment tool tailored to the analysis of specific individuals. The respective computational algorithms require a high level of predictive power, robustness and sensitivity. Although multiple context specific reconstruction algorithms were published in the last 10 years, only a fraction of them is suitable for model building based on human high-throughput data. Among other reasons, this might be due to problems arising from the limitation to only one metabolic target function or arbitrary thresholding. This review describes and analyses common validation methods used for testing model building algorithms. Two major methods can be distinguished: consistency testing and comparison based testing. The first is concerned with robustness against noise, e.g., missing data due to the impossibility to distinguish between the signal and the background of non-specific binding of probes in a microarray experiment, and whether distinct sets of input expressed genes corresponding to, e.g., different tissues yield distinct models. The latter covers methods comparing sets of functionalities, comparison with existing networks or additional databases. We test those methods on several available algorithms and deduce properties of these algorithms that can be compared with future developments. The set of tests performed can therefore serve as a benchmarking procedure for future algorithms. PMID:26834640

  15. Integration and visualization of systems biology data in context of the genome

    PubMed Central

    2010-01-01

    Background High-density tiling arrays and new sequencing technologies are generating rapidly increasing volumes of transcriptome and protein-DNA interaction data. Visualization and exploration of this data is critical to understanding the regulatory logic encoded in the genome by which the cell dynamically affects its physiology and interacts with its environment. Results The Gaggle Genome Browser is a cross-platform desktop program for interactively visualizing high-throughput data in the context of the genome. Important features include dynamic panning and zooming, keyword search and open interoperability through the Gaggle framework. Users may bookmark locations on the genome with descriptive annotations and share these bookmarks with other users. The program handles large sets of user-generated data using an in-process database and leverages the facilities of SQL and the R environment for importing and manipulating data. A key aspect of the Gaggle Genome Browser is interoperability. By connecting to the Gaggle framework, the genome browser joins a suite of interconnected bioinformatics tools for analysis and visualization with connectivity to major public repositories of sequences, interactions and pathways. To this flexible environment for exploring and combining data, the Gaggle Genome Browser adds the ability to visualize diverse types of data in relation to its coordinates on the genome. Conclusions Genomic coordinates function as a common key by which disparate biological data types can be related to one another. In the Gaggle Genome Browser, heterogeneous data are joined by their location on the genome to create information-rich visualizations yielding insight into genome organization, transcription and its regulation and, ultimately, a better understanding of the mechanisms that enable the cell to dynamically respond to its environment. PMID:20642854

  16. Complete Chloroplast Genome of the Multifunctional Crop Globe Artichoke and Comparison with Other Asteraceae

    PubMed Central

    Curci, Pasquale L.; De Paola, Domenico; Danzi, Donatella; Vendramin, Giovanni G.; Sonnante, Gabriella

    2015-01-01

    With over 20,000 species, Asteraceae is the second largest plant family. High-throughput sequencing of nuclear and chloroplast genomes has allowed for a better understanding of the evolutionary relationships within large plant families. Here, the globe artichoke chloroplast (cp) genome was obtained by a combination of whole-genome and BAC clone high-throughput sequencing. The artichoke cp genome is 152,529 bp in length, consisting of two single-copy regions separated by a pair of inverted repeats (IRs) of 25,155 bp, representing the longest IRs found in the Asteraceae family so far. The large (LSC) and the small (SSC) single-copy regions span 83,578 bp and 18,641 bp, respectively. The artichoke cp sequence was compared to the other eight Asteraceae complete cp genomes available, revealing an IR expansion at the SSC/IR boundary. This expansion consists of 17 bp of the ndhF gene generating an overlap between the ndhF and ycf1 genes. A total of 127 cp simple sequence repeats (cpSSRs) were identified in the artichoke cp genome, potentially suitable for future population studies in the Cynara genus. Parsimony-informative regions were evaluated and allowed to place a Cynara species within the Asteraceae family tree. The eight most informative coding regions were also considered and tested for “specific barcode” purpose in the Asteraceae family. Our results highlight the usefulness of cp genome sequencing in exploring plant genome diversity and retrieving reliable molecular resources for phylogenetic and evolutionary studies, as well as for specific barcodes in plants. PMID:25774672
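
    The region lengths quoted in this abstract are internally consistent: one large single-copy region, one small single-copy region, and two copies of the inverted repeat add up to the reported genome length. The short check below is only an illustrative sketch; the function name is hypothetical.

```python
# Region lengths (bp) quoted in the abstract above.
LSC = 83_578      # large single-copy region
SSC = 18_641      # small single-copy region
IR = 25_155       # one inverted repeat; the genome carries a pair
REPORTED_TOTAL = 152_529


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a chloroplast genome laid out as LSC + IR + SSC + IR."""
    return lsc + ssc + 2 * ir


if __name__ == "__main__":
    total = quadripartite_length(LSC, SSC, IR)
    print(f"computed {total} bp, reported {REPORTED_TOTAL} bp")
```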

  17. Complete chloroplast genome of the multifunctional crop globe artichoke and comparison with other Asteraceae.

    PubMed

    Curci, Pasquale L; De Paola, Domenico; Danzi, Donatella; Vendramin, Giovanni G; Sonnante, Gabriella

    2015-01-01

    With over 20,000 species, Asteraceae is the second largest plant family. High-throughput sequencing of nuclear and chloroplast genomes has allowed for a better understanding of the evolutionary relationships within large plant families. Here, the globe artichoke chloroplast (cp) genome was obtained by a combination of whole-genome and BAC clone high-throughput sequencing. The artichoke cp genome is 152,529 bp in length, consisting of two single-copy regions separated by a pair of inverted repeats (IRs) of 25,155 bp, representing the longest IRs found in the Asteraceae family so far. The large (LSC) and the small (SSC) single-copy regions span 83,578 bp and 18,641 bp, respectively. The artichoke cp sequence was compared to the other eight Asteraceae complete cp genomes available, revealing an IR expansion at the SSC/IR boundary. This expansion consists of 17 bp of the ndhF gene generating an overlap between the ndhF and ycf1 genes. A total of 127 cp simple sequence repeats (cpSSRs) were identified in the artichoke cp genome, potentially suitable for future population studies in the Cynara genus. Parsimony-informative regions were evaluated and allowed a Cynara species to be placed within the Asteraceae family tree. The eight most informative coding regions were also considered and tested for use as "specific barcodes" in the Asteraceae family. Our results highlight the usefulness of cp genome sequencing in exploring plant genome diversity and retrieving reliable molecular resources for phylogenetic and evolutionary studies, as well as for specific barcodes in plants.

  18. A Transcriptome Map of Actinobacillus pleuropneumoniae at Single-Nucleotide Resolution Using Deep RNA-Seq

    PubMed Central

    Su, Zhipeng; Zhu, Jiawen; Xu, Zhuofei; Xiao, Ran; Zhou, Rui; Li, Lu; Chen, Huanchun

    2016-01-01

    Actinobacillus pleuropneumoniae is the pathogen of porcine contagious pleuropneumonia, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). The transcriptional units described in this study provide a foundation for future studies concerning the gene functions and the transcriptional regulatory architectures of this pathogen. PMID:27018591

  19. Community annotation and bioinformatics workforce development in concert--Little Skate Genome Annotation Workshops and Jamborees.

    PubMed

    Wang, Qinghua; Arighi, Cecilia N; King, Benjamin L; Polson, Shawn W; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F; Page, Shallee T; Rendino, Marc Farnum; Thomas, William Kelley; Udwary, Daniel W; Wu, Cathy H

    2012-01-01

    Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome.

  20. Community annotation and bioinformatics workforce development in concert—Little Skate Genome Annotation Workshops and Jamborees

    PubMed Central

    Wang, Qinghua; Arighi, Cecilia N.; King, Benjamin L.; Polson, Shawn W.; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F.; Page, Shallee T.; Farnum Rendino, Marc; Thomas, William Kelley; Udwary, Daniel W.; Wu, Cathy H.

    2012-01-01

    Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome. PMID:22434832
