Sample records for genome methods development

  1. Fish genome manipulation and directional breeding.

    PubMed

    Ye, Ding; Zhu, ZuoYan; Sun, YongHua

    2015-02-01

    Aquaculture is one of the fastest-developing agricultural industries worldwide. One of the most important factors for sustainable aquaculture is the development of high-performing culture strains. Genome manipulation offers a powerful method for achieving rapid and directional breeding in fish. We review the history of fish breeding methods based on classical genome manipulation, including polyploidy breeding and nuclear transfer. We then discuss the advances and applications of fish directional breeding based on transgenic technology and recently developed genome editing technologies. These methods offer greater efficiency, precision, and predictability in genetic improvement than traditional methods.

  2. GenomeFingerprinter: the genome fingerprint and the universal genome fingerprint analysis for systematic comparative genomics.

    PubMed

    Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

    2013-01-01

    No attention has been paid to comparing a set of genome sequences that span different genetic components and biological categories, are widely divergent, and cover a large size range. We define this task as systematic comparative genomics and aim to develop its methodology. First, we create a method, GenomeFingerprinter, that unambiguously produces a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). In particular, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes across genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the resulting dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between the two genomes being compared. Moreover, we demonstrate the applications through case studies on various genome sequences, which give insights into critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, and intuitively comparing a set of genomes at the genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the resulting dataset. Together, these establish the methodology of systematic comparative genomics based on genome fingerprint analysis.
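    The abstract does not spell out the sequence-to-coordinate transform, so the Python sketch below substitutes the well-known Z-curve (cumulative purine/pyrimidine, amino/keto, and weak/strong base-count differences) as an assumed stand-in. It then computes a geometric center for each trajectory and the Euclidean distance between two centers, the simplest of the comparison quantities named in the abstract. The function names are hypothetical, and nothing here reproduces GenomeFingerprinter itself.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def z_curve(seq: str) -> List[Point]:
    """Map a DNA sequence to cumulative 3D points with the Z-curve transform.

    Stand-in mapping for illustration only; not claimed to be the
    transform used by GenomeFingerprinter.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points: List[Point] = []
    for base in seq.upper():
        if base not in counts:
            continue  # skip ambiguous bases such as N
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append((
            (a + g) - (c + t),  # purine vs. pyrimidine
            (a + c) - (g + t),  # amino vs. keto
            (a + t) - (g + c),  # weak vs. strong hydrogen bonding
        ))
    return points


def geometric_center(points: List[Point]) -> Point:
    """Arithmetic mean of the points along each axis."""
    if not points:
        raise ValueError("no valid bases in sequence")
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def center_distance(seq1: str, seq2: str) -> float:
    """Euclidean distance between the geometric centers of two trajectories."""
    return math.dist(geometric_center(z_curve(seq1)),
                     geometric_center(z_curve(seq2)))
```

    For example, `geometric_center(z_curve("AC"))` gives `(0.5, 1.5, 0.5)`. Using whole-trajectory centers makes the distance insensitive to sequence length differences only in a crude way; the weighted measures in the abstract presumably address this more carefully.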

  3. [Genome editing of industrial microorganism].

    PubMed

    Zhu, Linjiang; Li, Qi

    2015-03-01

    Genome editing is defined as the highly efficient and precise modification of a cellular genome at large scale. In recent years, such genome-editing methods have been developed rapidly for industrial strain improvement. These rapidly evolving methods are replacing the old, inefficient mode of genetic modification summarized as "one modification, one selection marker, and one target site". Highly efficient modes of genome editing have been developed, including simultaneous modification of multiple genes; efficient insertion, replacement, and deletion of target genes at genome scale; and cut-and-paste of large DNA fragments. These new tools for microbial genome editing are expected to be widely applied, to increase the efficiency of industrial strain improvement, and to drive both the transformation of the traditional fermentation industry and the rapid development of novel industrial biotechnology, such as the production of biofuels and biomaterials. This review summarizes the technical principles of these genome-editing methods and their applications, which can benefit the engineering and construction of industrial microorganisms.

  4. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    PubMed

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected during monitoring with real-time polymerase chain reaction (PCR). The sequences obtained were mapped against a publicly available papaya genome sequence database. The transgenic construct- and event-specific sequences identified were those of a GM papaya developed to resist Papaya ringspot virus infection. Based on these transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled the identification of unknown transgenic construct- and event-specific sequences in GM papaya and the development of a reliable method for detecting them in papaya food commodities. Copyright © 2016 Elsevier Ltd. All rights reserved.
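    As a rough illustration of how reads can be sorted into host, transgene, and host/transgene junction classes, the Python sketch below uses shared k-mers (checked on both strands) as a simplified stand-in for the read mapping described in the abstract. The reference sequences, read names, and the value of k are toy assumptions; this is not the authors' pipeline, and a real analysis would align reads to the full papaya genome and to known construct elements.

```python
from typing import Dict, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (non-ACGT left as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_reads(reads: Dict[str, str], host_ref: str, construct_ref: str,
                   k: int = 6) -> Dict[str, str]:
    """Label each read by which reference(s) share k-mers with it (either strand)."""
    host = kmers(host_ref.upper(), k)
    construct = kmers(construct_ref.upper(), k)
    labels: Dict[str, str] = {}
    for name, read in reads.items():
        read = read.upper()
        read_kmers = kmers(read, k) | kmers(revcomp(read), k)
        in_host = bool(read_kmers & host)
        in_construct = bool(read_kmers & construct)
        if in_host and in_construct:
            labels[name] = "junction"  # candidate host/insert boundary (event-specific)
        elif in_construct:
            labels[name] = "construct"
        elif in_host:
            labels[name] = "host"
        else:
            labels[name] = "unassigned"
    return labels
```

    Reads labeled "junction" are the interesting ones for an event-specific assay, because they span the boundary between the insert and the host genome.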

  5. Genome-Wide Profiling of DNA Double-Strand Breaks by the BLESS and BLISS Methods.

    PubMed

    Mirzazadeh, Reza; Kallas, Tomasz; Bienko, Magda; Crosetto, Nicola

    2018-01-01

    DNA double-strand breaks (DSBs) are major DNA lesions that are constantly formed during physiological processes such as DNA replication, transcription, and recombination, or as a result of exogenous agents such as ionizing radiation, radiomimetic drugs, and genome editing nucleases. Unrepaired DSBs threaten genomic stability by leading to the formation of potentially oncogenic rearrangements such as translocations. In the past few years, several methods based on next-generation sequencing (NGS) have been developed to study the genome-wide distribution of DSBs or their conversion to translocation events. We developed Breaks Labeling, Enrichment on Streptavidin, and Sequencing (BLESS), which was the first method for direct labeling of DSBs in situ followed by their genome-wide mapping at nucleotide resolution (Crosetto et al., Nat Methods 10:361-365, 2013). Recently, we further expanded the quantitative nature, applicability, and scalability of BLESS by developing Breaks Labeling In Situ and Sequencing (BLISS) (Yan et al., Nat Commun 8:15058, 2017). Here, we first present an overview of existing methods for genome-wide localization of DSBs, and then focus on the BLESS and BLISS methods, discussing different assay design options depending on the sample type and application.
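    Assuming each sequenced read starts at the genomic base immediately adjacent to a labeled break (the adapter being ligated directly to the DSB end), break positions can be read off alignment 5' ends and aggregated into windows. The Python sketch below shows that counting step only; the tuple layout, window size, and read threshold are hypothetical, and a production pipeline would also need to handle PCR duplicates and mapping quality, which are omitted here.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# (chromosome, strand, start, end) with 0-based, half-open coordinates
Alignment = Tuple[str, str, int, int]


def break_sites(alignments: Iterable[Alignment]) -> Iterator[Tuple[str, int]]:
    """Yield the genomic base at each read's 5' end, assumed to flank a DSB."""
    for chrom, strand, start, end in alignments:
        # Reverse-strand reads have their 5' end at the right-hand coordinate.
        yield chrom, (start if strand == "+" else end - 1)


def window_counts(sites: Iterable[Tuple[str, int]], window: int) -> Counter:
    """Aggregate break sites into fixed-size genomic windows."""
    return Counter((chrom, pos // window) for chrom, pos in sites)


def hotspots(counts: Counter, min_reads: int) -> List[Tuple[str, int]]:
    """Windows with at least `min_reads` break-end reads, sorted by position."""
    return sorted(key for key, n in counts.items() if n >= min_reads)
```

    Counting the raw `break_sites` output directly (window size 1) gives the nucleotide-resolution view; larger windows trade resolution for statistical power.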

  6. Analysis of health trait data from on-farm computer systems in the U.S. II: Comparison of genomic analyses including two-stage and single-step methods

    USDA-ARS's Scientific Manuscript database

    The development of genomic selection methodology, with accompanying substantial gains in reliability for low-heritability traits, may dramatically improve the feasibility of genetic improvement of dairy cow health. Many methods for genomic analysis have now been developed, including the “Bayesian Al...

  7. Methods of Genomic Competency Integration in Practice

    PubMed Central

    Jenkins, Jean; Calzone, Kathleen A.; Caskey, Sarah; Culp, Stacey; Weiner, Marsha; Badzek, Laurie

    2015-01-01

    Purpose: Genomics is increasingly relevant to health care, necessitating support for nurses to incorporate genomic competencies into practice. The primary aim of this project was to develop and implement a year-long genomic education intervention that trained, supported, and supervised institutional administrator and educator champion dyads to increase nursing capacity to integrate genomics, and to evaluate it through assessments of program satisfaction and of the outcomes each institution achieved. Design: Longitudinal study of 23 Magnet Recognition Program® hospitals (21 intervention, 2 controls) participating in a 1-year new competency integration effort aimed at increasing genomic nursing competency and overcoming barriers to genomics integration in practice. Methods: Champion dyads underwent genomic training consisting of one in-person kick-off training meeting followed by monthly education webinars. Champion dyads designed institution-specific action plans detailing objectives, the methods or strategies used to engage and educate nursing staff, the timeline for implementation, and the outcomes achieved. Action plans focused on a minimum of seven genomic priority areas: champion dyad personal development; practice assessment; policy content assessment; staff knowledge needs assessment; staff development; plans for integration; and anticipated obstacles and challenges. Action plans were updated quarterly, outlining the progress made as well as any new methods or strategies. Progress was validated through virtual site visits with the champion dyads and chief nursing officers. Descriptive data were collected on all strategies or methods used and on the timeline for achievement, and were analyzed using content analysis. Findings: The complexity of the competency content and the uniqueness of social systems and infrastructure resulted in significant variation among champion dyad interventions. Conclusions: Nursing champions can facilitate change in genomic nursing capacity through varied strategies but require substantial training in order to design and implement interventions. Clinical Relevance: Genomics is critical to the practice of all nurses. There is great opportunity and interest to address genomic knowledge deficits in the practicing nurse workforce as a strategy to improve patient outcomes. Exemplars of champion dyad interventions designed to increase nursing capacity focus on improving education, policy, and healthcare services. PMID:25808828

  8. Finding the Genomic Basis of Local Adaptation: Pitfalls, Practical Solutions, and Future Directions.

    PubMed

    Hoban, Sean; Kelley, Joanna L; Lotterhos, Katie E; Antolin, Michael F; Bradburd, Gideon; Lowry, David B; Poss, Mary L; Reed, Laura K; Storfer, Andrew; Whitlock, Michael C

    2016-10-01

    Uncovering the genetic and evolutionary basis of local adaptation is a major focus of evolutionary biology. The recent development of cost-effective methods for obtaining high-quality genome-scale data makes it possible to identify some of the loci responsible for adaptive differences among populations. Two basic approaches for identifying putatively locally adaptive loci have been developed and are broadly used: one that identifies loci with unusually high genetic differentiation among populations (differentiation outlier methods) and one that searches for correlations between local population allele frequencies and local environments (genetic-environment association methods). Here, we review the promises and challenges of these genome scan methods, including correcting for the confounding influence of a species' demographic history, biases caused by missing aspects of the genome, matching scales of environmental data with population structure, and other statistical considerations. In each case, we make suggestions for best practices for maximizing the accuracy and efficiency of genome scans to detect the underlying genetic basis of local adaptation. With attention to their current limitations, genome scan methods can be an important tool in finding the genetic basis of adaptive evolutionary change.
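    A minimal sketch of a differentiation outlier scan, assuming biallelic loci with known allele frequencies in each population: it scores each locus with Nei's G_ST (one of several F_ST-type estimators, chosen here for brevity) and flags the upper tail of the empirical distribution. The `top_fraction` cutoff is a hypothetical choice. As the abstract stresses, a naive scan like this does not correct for demographic history, so its top hits are candidates rather than evidence of local adaptation.

```python
from typing import Dict, List


def gst(freqs: List[float]) -> float:
    """Nei's G_ST for one biallelic locus from per-population allele frequencies.

    Populations are weighted equally; no sample-size correction is applied.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # monomorphic locus carries no differentiation signal
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population
    return (h_t - h_s) / h_t


def differentiation_outliers(loci: Dict[str, List[float]],
                             top_fraction: float) -> List[str]:
    """Return the loci in the upper tail of the empirical G_ST distribution."""
    scores = {name: gst(f) for name, f in loci.items()}
    n_flag = max(1, int(len(scores) * top_fraction))
    return sorted(scores, key=scores.get, reverse=True)[:n_flag]
```

    For example, a locus with frequencies 0.1 and 0.9 in two populations scores about 0.64, while a locus at 0.5 in both scores 0.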

  9. Differential DNA Methylation Analysis without a Reference Genome.

    PubMed

    Klughammer, Johanna; Datlinger, Paul; Printz, Dieter; Sheffield, Nathan C; Farlik, Matthias; Hadler, Johanna; Fritsch, Gerhard; Bock, Christoph

    2015-12-22

    Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
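    Once reads have been assigned to regions (for example, fragments of a deduced ad hoc reference), the core comparison reduces to contrasting methylated-read fractions between groups. The Python sketch below is a hypothetical illustration with made-up coverage and effect-size thresholds and no significance test; it is not RefFreeDMA's implementation, and region identifiers are placeholders.

```python
from typing import Dict, List, Tuple

# region -> (methylated read count, total read count), pooled per group
Counts = Dict[str, Tuple[int, int]]


def differential_regions(group_a: Counts, group_b: Counts,
                         min_coverage: int = 10,
                         min_delta: float = 0.25) -> List[Tuple[str, float]]:
    """Regions whose methylation level differs by at least `min_delta` (B minus A)."""
    hits: List[Tuple[str, float]] = []
    for region in sorted(group_a.keys() & group_b.keys()):
        meth_a, total_a = group_a[region]
        meth_b, total_b = group_b[region]
        if total_a < min_coverage or total_b < min_coverage:
            continue  # too few reads for a stable methylation estimate
        delta = meth_b / total_b - meth_a / total_a
        if abs(delta) >= min_delta:
            hits.append((region, delta))
    return hits
```

    A real analysis with biological replicates would replace the fixed `min_delta` cutoff with a statistical test and multiple-testing correction.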

  10. Methods comparison for microsatellite marker development: Different isolation methods, different yield efficiency

    NASA Astrophysics Data System (ADS)

    Zhan, Aibin; Bao, Zhenmin; Hu, Xiaoli; Lu, Wei; Hu, Jingjie

    2009-06-01

    Microsatellite markers have become one of the most important molecular tools in many fields of research. Large numbers of microsatellite markers are required for whole-genome surveys in molecular ecology, quantitative genetics, and genomics. It is therefore necessary to identify versatile, low-cost, efficient, and time- and labor-saving methods to develop large panels of microsatellite markers. In this study, we used the Zhikong scallop (Chlamys farreri) as the target species to compare the efficiency of five methods, derived from three strategies, for microsatellite marker development. The results showed that the strategy of constructing a small-insert genomic DNA library had poor efficiency, while the microsatellite-enriched strategy greatly improved isolation efficiency. Although the strategy of mining public databases saves time and cost, it is difficult to obtain a large number of microsatellite markers this way, mainly because of the limited sequence data for non-model species deposited in public databases. Based on these results, we recommend two methods, microsatellite-enriched library construction and FIASCO-colony hybridization, for large-scale microsatellite marker development. Both methods were derived from the microsatellite-enriched strategy. The experimental results obtained from the Zhikong scallop also provide a reference for microsatellite marker development in other species with large genomes.
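    The database-mining strategy amounts to scanning existing sequences for perfect tandem repeats. The Python sketch below does this with regular expressions; the minimum copy numbers per motif length are illustrative assumptions (dedicated tools apply their own thresholds), and imperfect or compound repeats are ignored.

```python
import re
from typing import List, Tuple

# Minimum tandem copies per motif length: hypothetical thresholds for illustration.
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit: str) -> bool:
    """False if the unit is itself a repeat of a shorter unit (e.g. 'ACAC')."""
    return re.fullmatch(r"(.+?)\1+", unit) is None


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, copy number) for perfect microsatellites."""
    seq = seq.upper()
    hits = []
    for motif_len, min_copies in MIN_COPIES.items():
        # A motif of this length followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):  # avoid reporting (AC)n again as (ACAC)n
                hits.append((m.start(), unit, len(m.group(0)) // motif_len))
    return sorted(hits)
```

    For example, `find_ssrs("GGG" + "AC" * 8 + "TTT" + "AGC" * 5 + "G")` reports an (AC)8 repeat at position 3 and an (AGC)5 repeat at position 22; primer design around such hits would be the next step.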

  11. Navigating the Interface Between Landscape Genetics and Landscape Genomics.

    PubMed

    Storfer, Andrew; Patton, Austin; Fraik, Alexandra K

    2018-01-01

    As next-generation sequencing data become increasingly available for non-model organisms, a shift has occurred in the focus of studies of the geographic distribution of genetic variation. Whereas landscape genetics studies primarily focus on testing the effects of landscape variables on gene flow and genetic population structure, landscape genomics studies focus on detecting candidate genes under selection that indicate possible local adaptation. Navigating the transition between landscape genomics and landscape genetics can be challenging. The number of molecular markers analyzed has shifted from a few dozen loci to thousands of loci and even full genomes. Although genome-scale data can be separated into sets of neutral loci for analyses of gene flow and population structure and putative loci under selection for inference of local adaptation, there are inherent differences in the questions addressed in the two study frameworks. We discuss these differences and their implications for study design, marker choice, and downstream analysis methods. Similar to the rapid proliferation of analysis methods in the early development of landscape genetics, new analytical methods for detecting selection in landscape genomics studies are burgeoning. We focus on genome scan methods for detecting selection, in particular outlier differentiation methods and genetic-environment association tests, because they are the most widely used. Use of genome scan methods requires an understanding of the potential mismatches between the biology of a species and the assumptions inherent in the analytical methods used, which can lead to high false-positive rates of detected loci under selection. Key to choosing appropriate genome scan methods is an understanding of the underlying demographic structure of study populations; such data can be obtained using neutral loci from the generated genome-wide data or prior knowledge of a species' phylogeographic history. To this end, we summarize recent simulation studies that test the power and accuracy of genome scan methods under a variety of demographic scenarios and sampling designs. We conclude with a discussion of additional considerations for future method development, and a summary of methods that show promise for landscape genomics studies but are not yet widely used.
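    The genetic-environment association idea can be sketched as a per-locus correlation between population allele frequencies and an environmental variable. The Python sketch below uses a plain Pearson correlation with a hypothetical |r| cutoff; it deliberately omits any correction for population structure, which is exactly the kind of mismatch the abstract warns can inflate false-positive rates.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def gea_scan(freqs: Dict[str, List[float]], env: List[float],
             min_abs_r: float = 0.8) -> List[Tuple[str, float]]:
    """Loci whose allele frequencies track the environment, strongest first.

    No correction for population structure is applied.
    """
    scored = [(locus, pearson(f, env)) for locus, f in freqs.items()]
    hits = [(locus, r) for locus, r in scored if abs(r) >= min_abs_r]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)
```

    With only a handful of populations, strong correlations arise easily by chance, which is one reason the simulation studies summarized in the abstract matter for choosing a method.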

  12. Navigating the Interface Between Landscape Genetics and Landscape Genomics

    PubMed Central

    Storfer, Andrew; Patton, Austin; Fraik, Alexandra K.

    2018-01-01

    PMID:29593776

  13. Stakeholder engagement in policy development: challenges and opportunities for human genomics

    PubMed Central

    Lemke, Amy A.; Harris-Wai, Julie N.

    2015-01-01

    Along with rapid advances in human genomics, policies governing genomic data and clinical technologies have proliferated. Stakeholder engagement is widely lauded as an important methodology for improving clinical, scientific, and public health policy decision making. The purpose of this paper is to examine how stakeholder engagement is used to develop policies in genomics research and public health areas, as well as to identify future priorities for conducting evidence-based stakeholder engagements. We focus on exemplars in biobanking and newborn screening to illustrate a variety of current stakeholder engagement in policy-making efforts. Each setting provides an important context for examining the methods of obtaining and integrating informed stakeholder voices into the policy-making process. While many organizations have an interest in engaging stakeholders with regard to genomic policy issues, there is broad divergence with respect to the stakeholders involved, the purpose of engagements, when stakeholders are engaged during policy development, methods of engagement, and the outcomes reported. Stakeholder engagement in genomics policy development is still at a nascent stage. Several challenges of using stakeholder engagement as a tool for genomics policy development remain, and little evidence regarding how to best incorporate stakeholder feedback into policy-making processes is currently available. PMID:25764215

  14. Stakeholder engagement in policy development: challenges and opportunities for human genomics.

    PubMed

    Lemke, Amy A; Harris-Wai, Julie N

    2015-12-01

  15. Invited review: Inbreeding in the genomics era: Inbreeding, inbreeding depression, and management of genomic variability.

    PubMed

    Howard, Jeremy T; Pryce, Jennie E; Baes, Christine; Maltecca, Christian

    2017-08-01

    Traditionally, pedigree-based relationship coefficients have been used to manage inbreeding and the degree of inbreeding depression within a population. The widespread incorporation of genomic information in dairy cattle genetic evaluations provides an opportunity to develop and implement methods to manage populations at the genomic level. As a result, the realized proportion of the genome that 2 individuals share can be estimated more accurately, instead of using pedigree information to estimate the expected proportion of shared alleles. Furthermore, genomic information allows genome-wide relationship or inbreeding estimates to be complemented by estimates that characterize relationships in specific regions of the genome. Such region-specific estimates can be used to manage more effectively areas of low genetic diversity or areas that, when homozygous, reduce performance across economically important traits. The use of region-specific metrics should allow breeders to manage more precisely the trade-off between the genetic value of the progeny and the undesirable side effects associated with inbreeding. However, methods tailored toward more effectively identifying regions affected by inbreeding, and toward using them to manage the genome at the herd level, still need to be developed. We have reviewed topics related to inbreeding, measures of relatedness, genetic diversity, and methods to manage populations at the genomic level, and we discuss future challenges related to managing populations through implementing genomic methods at the herd and population levels. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
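    One common genomic inbreeding measure is F_ROH, the fraction of the genome covered by runs of homozygosity (ROH). The abstract does not prescribe a method, so the Python sketch below is an assumed, simplified version: genotypes are coded 0/1/2, a run is any stretch of consecutive homozygous SNPs at least `min_snps` long (no tolerance for heterozygous or missing calls), and run length is measured from the first to the last SNP position on a single chromosome.

```python
from typing import List, Tuple


def runs_of_homozygosity(positions: List[int], genotypes: List[int],
                         min_snps: int) -> List[Tuple[int, int]]:
    """(start_bp, end_bp) of maximal homozygous runs with >= min_snps SNPs."""
    runs: List[Tuple[int, int]] = []
    start = None
    # A trailing sentinel heterozygote (1) closes a run that reaches the last SNP.
    for i, g in enumerate(list(genotypes) + [1]):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1]))
            start = None
    return runs


def f_roh(positions: List[int], genotypes: List[int], min_snps: int,
          genome_length: int) -> float:
    """Fraction of `genome_length` (bp) covered by runs of homozygosity."""
    covered = sum(end - start for start, end in
                  runs_of_homozygosity(positions, genotypes, min_snps))
    return covered / genome_length
```

    Because the runs are returned with coordinates, the same output can be restricted to a chromosome segment, which is one simple way to obtain the region-specific view the abstract describes.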

  16. Evidence synthesis and guideline development in genomic medicine: current status and future prospects.

    PubMed

    Schully, Sheri D; Lam, Tram Kim; Dotson, W David; Chang, Christine Q; Aronson, Naomi; Birkeland, Marian L; Brewster, Stephanie Jo; Boccia, Stefania; Buchanan, Adam H; Calonge, Ned; Calzone, Kathleen; Djulbegovic, Benjamin; Goddard, Katrina A B; Klein, Roger D; Klein, Teri E; Lau, Joseph; Long, Rochelle; Lyman, Gary H; Morgan, Rebecca L; Palmer, Christina G S; Relling, Mary V; Rubinstein, Wendy S; Swen, Jesse J; Terry, Sharon F; Williams, Marc S; Khoury, Muin J

    2015-01-01

    With the accelerated implementation of genomic medicine, health-care providers will depend heavily on professional guidelines and recommendations. Because genomics affects many diseases across the life span, no single professional group covers the entirety of this rapidly developing field. To pursue a discussion of the minimal elements needed to develop evidence-based guidelines in genomics, the Centers for Disease Control and Prevention and the National Cancer Institute jointly held a workshop to engage representatives from 35 organizations with an interest in genomics (13 of which make recommendations). The workshop explored methods used in evidence synthesis and guideline development and initiated a dialogue to compare these methods and to assess whether they are consistent with the Institute of Medicine report "Clinical Practice Guidelines We Can Trust." The participating organizations that develop guidelines or recommendations all had policies to manage guideline development and group membership, and processes to address conflicts of interest. However, there was wide variation in the reliance on external reviews, regular updating of recommendations, and use of systematic reviews to assess the strength of scientific evidence. Ongoing efforts are required to establish criteria for guideline development in genomic medicine as proposed by the Institute of Medicine.

  17. A phasing and imputation method for pedigreed populations that results in a single-stage genomic evaluation

    PubMed Central

    2012-01-01

    Background: Efficient, robust, and accurate genotype imputation algorithms make large-scale application of genomic selection cost-effective. An algorithm that imputes alleles or allele probabilities for all animals in the pedigree and for all genotyped single nucleotide polymorphisms (SNPs) provides a framework to combine all pedigree, genomic, and phenotypic information into a single-stage genomic evaluation. Methods: An algorithm was developed for imputation of genotypes in pedigreed populations that allows imputation for completely ungenotyped animals and for low-density genotyped animals, accommodates a wide variety of pedigree structures for genotyped animals, imputes unmapped SNPs, and works for large datasets. The method combines simple phasing rules, long-range phasing, haplotype library imputation, and segregation analysis. Results: Imputation accuracy was high and computational cost was feasible for datasets with pedigrees of up to 25,000 animals. The resulting single-stage genomic evaluation increased the accuracy of estimated genomic breeding values compared with a scenario in which phenotypes of relatives that were not genotyped were ignored. Conclusions: The developed imputation algorithm and software and the resulting single-stage genomic evaluation method provide powerful new ways to exploit imputation and to obtain more accurate genetic evaluations. PMID:22462519
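    The simplest phasing rules exploit homozygosity: a homozygous animal's two alleles are known, and the allele a heterozygous offspring received from a homozygous parent is also known. The Python sketch below applies only these rules at a single SNP; the 0/1/2 coding and the function name are assumptions, Mendelian conflicts are not checked, and the long-range phasing, haplotype library, and segregation analysis steps from the abstract are not attempted.

```python
from typing import Optional, Tuple

# (paternal allele, maternal allele); None means "not resolved by these rules"
Phase = Tuple[Optional[int], Optional[int]]


def phase_snp(child: int, sire: Optional[int], dam: Optional[int]) -> Phase:
    """Phase one SNP of an offspring using homozygosity-based rules.

    Genotypes count copies of allele 1 (0, 1, or 2); None marks an
    ungenotyped parent. Mendelian conflicts are not checked.
    """
    if child in (0, 2):
        allele = child // 2  # homozygous: both haplotypes carry the same allele
        return allele, allele
    if child != 1:
        raise ValueError(f"unexpected genotype {child!r}")
    if sire in (0, 2):
        paternal = sire // 2  # a homozygous sire can only transmit this allele
        return paternal, 1 - paternal
    if dam in (0, 2):
        maternal = dam // 2  # same reasoning for a homozygous dam
        return 1 - maternal, maternal
    return None, None  # both parents heterozygous or ungenotyped
```

    Loci left as `(None, None)` are exactly the ones that the more elaborate steps (long-range phasing, haplotype libraries) are needed for.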

  18. Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data

    PubMed Central

    Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

    2018-01-01

    Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potentially active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its applicability to other mammals. The method uses human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequence and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, that uses the very small number of available species-specific experimental datasets to select the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to predict regulatory regions in mammalian species in an unbiased manner. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and for identifying potential genome editing targets. PMID:29618048

  19. Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data.

    PubMed

    Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

    2018-03-01

  20. Development of microbial genome-probing microarrays using digital multiple displacement amplification of uncultivated microbial single cells.

    PubMed

    Chang, Ho-Won; Sung, Youlboong; Kim, Kyoung-Ho; Nam, Young-Do; Roh, Seong Woon; Kim, Min-Soo; Jeon, Che Ok; Bae, Jin-Woo

    2008-08-15

    A crucial problem in the use of previously developed genome-probing microarrays (GPM) has been the inability to use uncultivated bacterial genomes to take advantage of the high sensitivity and specificity of GPM in microbial detection and monitoring. Here we present a method, digital multiple displacement amplification (MDA), to amplify and analyze various genomes obtained from single uncultivated bacterial cells. We used 15 genomes from key microbes involved in dichloromethane (DCM)-dechlorinating enrichment as microarray probes to uncover the bacterial population dynamics of samples without PCR amplification. Genomic DNA was amplified from single cells of uncultured bacteria whose 16S rRNA genes showed 80.3-99.4% similarity to those of cultivated bacteria. The digital MDA-GPM method successfully monitored the dynamics of DCM-dechlorinating communities from different phases of enrichment status. Without a priori knowledge of microbial diversity, the digital MDA-GPM method could be designed to monitor most microbial populations in a given environmental sample.

  1. Digestion-ligation-only Hi-C is an efficient and cost-effective method for chromosome conformation capture.

    PubMed

    Lin, Da; Hong, Ping; Zhang, Siheng; Xu, Weize; Jamal, Muhammad; Yan, Keji; Lei, Yingying; Li, Liang; Ruan, Yijun; Fu, Zhen F; Li, Guoliang; Cao, Gang

    2018-05-01

    Chromosome conformation capture (3C) technologies can be used to investigate 3D genomic structures. However, high background noise, high costs, and a lack of straightforward noise evaluation in current methods impede the advancement of 3D genomic research. Here we developed a simple digestion-ligation-only Hi-C (DLO Hi-C) technology to explore the 3D landscape of the genome. This method requires only two rounds of digestion and ligation, without the need for biotin labeling and pulldown. Non-ligated DNA was efficiently removed in a cost-effective step by purifying specific linker-ligated DNA fragments. Notably, random ligation could be quickly evaluated in an early quality-control step before sequencing. Moreover, an in situ version of DLO Hi-C using a four-cutter restriction enzyme has been developed. We applied DLO Hi-C to delineate the genomic architecture of THP-1 and K562 cells and uncovered chromosomal translocations. This technology may facilitate investigation of genomic organization, gene regulation, and (meta)genome assembly.

  2. BEACON: automated tool for Bacterial GEnome Annotation ComparisON.

    PubMed

    Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B

    2015-08-18

    Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The presence of many AMs and the development of new ones create a need to (a) compare different annotations for a single genome, and (b) generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. To illustrate BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show with these examples that the extended annotation can increase the number of genes annotated with putative functions by up to 27%, while the number of genes without any function assignment is reduced. We developed BEACON, a fast tool for automated, systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
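    The comparison and combination steps described for BEACON can be illustrated with a minimal Python sketch. It is not BEACON's implementation: it assumes each annotation method's output has already been reduced to a mapping from gene ID to a single function label, treats a hypothetical set of placeholder labels (such as "hypothetical protein") as "no function", and compares labels by exact, case-insensitive string match rather than by any curated vocabulary.

```python
from typing import Dict, List, Optional, Set

# Hypothetical labels treated as "no function assigned" (an assumption, not BEACON's list).
UNINFORMATIVE = {"", "hypothetical protein", "unknown function"}

Annotation = Dict[str, Optional[str]]  # gene_id -> function label (None if absent)


def is_informative(label: Optional[str]) -> bool:
    """True if the label names an actual putative function."""
    return label is not None and label.strip().lower() not in UNINFORMATIVE


def compare(a: Annotation, b: Annotation) -> Dict[str, List[str]]:
    """Classify every gene by how two annotation methods (AMs) agree on it."""
    result: Dict[str, List[str]] = {
        "same": [], "different": [], "only_a": [], "only_b": [], "neither": []
    }
    for gene in sorted(set(a) | set(b)):
        la, lb = a.get(gene), b.get(gene)
        if is_informative(la) and is_informative(lb):
            same = la.strip().lower() == lb.strip().lower()
            result["same" if same else "different"].append(gene)
        elif is_informative(la):
            result["only_a"].append(gene)
        elif is_informative(lb):
            result["only_b"].append(gene)
        else:
            result["neither"].append(gene)
    return result


def extend(annotations: List[Annotation]) -> Dict[str, Set[str]]:
    """Combine AMs: each gene gets the union of informative labels from all AMs."""
    extended: Dict[str, Set[str]] = {}
    for ann in annotations:
        for gene, label in ann.items():
            labels = extended.setdefault(gene, set())
            if is_informative(label):
                labels.add(label.strip())
    return extended


def count_annotated(ann: Annotation) -> int:
    """Number of genes with an informative function label."""
    return sum(1 for label in ann.values() if is_informative(label))
```

    For example, with am1 = {"g1": "DNA gyrase subunit A", "g2": "hypothetical protein"} and am2 = {"g1": "DNA gyrase subunit A", "g2": "ABC transporter"}, compare reports g1 as "same" and g2 as "only_b", and extend assigns "ABC transporter" to g2. Real annotations would also need synonym handling, which plain string matching does not provide.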

  3. Computational Prediction of the Global Functional Genomic Landscape: Applications, Methods and Challenges

    PubMed Central

    Zhou, Weiqiang; Sherwood, Ben; Ji, Hongkai

    2017-01-01

    Technological advances have led to an explosive growth of high-throughput functional genomic data. Exploiting the correlation among different data types, it is possible to predict one functional genomic data type from other data types. Prediction tools are valuable in understanding the relationship among different functional genomic signals. They also provide a cost-efficient solution to inferring the unknown functional genomic profiles when experimental data are unavailable due to resource or technological constraints. The predicted data may be used for generating hypotheses, prioritizing targets, interpreting disease variants, facilitating data integration, quality control, and many other purposes. This article reviews various applications of prediction methods in functional genomics, discusses analytical challenges, and highlights some common and effective strategies used to develop prediction methods for functional genomic data. PMID:28076869

  4. RATT: Rapid Annotation Transfer Tool

    PubMed Central

    Otto, Thomas D.; Dillon, Gary P.; Degrave, Wim S.; Berriman, Matthew

    2011-01-01

    Second-generation sequencing technologies have made large-scale sequencing projects commonplace. However, making use of these datasets often requires gene function to be ascribed genome wide. Although tool development has kept pace with the changes in sequence production, for tasks such as mapping, de novo assembly or visualization, genome annotation remains a challenge. We have developed a method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference. The method, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny. We demonstrate that a Mycobacterium tuberculosis genome or a single 2.5 Mb chromosome from a malaria parasite can be annotated in less than five minutes with only modest computational resources. RATT is available at http://ratt.sourceforge.net. PMID:21306991
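    Transferring annotation on the basis of conserved synteny amounts to mapping reference coordinates through aligned blocks. The Python sketch below assumes the synteny has already been reduced to ungapped blocks (a hypothetical SyntenyBlock type holding reference start, query start, length and orientation); RATT itself derives its mapping from whole-genome alignments and handles far more cases, so this only illustrates the idea.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SyntenyBlock:
    ref_start: int        # 0-based start in the annotated reference
    query_start: int      # 0-based start in the new genome
    length: int           # ungapped block length
    reverse: bool = False  # True if the block is inverted in the new genome

    def contains(self, ref_pos: int) -> bool:
        return self.ref_start <= ref_pos < self.ref_start + self.length

    def map(self, ref_pos: int) -> int:
        """Project one reference coordinate into the new genome."""
        offset = ref_pos - self.ref_start
        if self.reverse:
            return self.query_start + self.length - 1 - offset
        return self.query_start + offset


def transfer_feature(start: int, end: int,
                     blocks: List[SyntenyBlock]) -> Optional[Tuple[int, int]]:
    """Map a reference feature [start, end] (0-based, inclusive) to the new genome.

    Returns None unless both ends fall in the same block, i.e. the whole
    feature lies in one conserved, ungapped region.
    """
    for block in blocks:
        if block.contains(start) and block.contains(end):
            a, b = block.map(start), block.map(end)
            return (min(a, b), max(a, b))
    return None
```

    A feature is transferred only when both of its ends lie in one block; features that straddle a breakpoint in synteny come back as None, which is where a real tool would need to split or flag the annotation.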

  5. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis

    PubMed Central

    Down, Thomas A.; Rakyan, Vardhman K.; Turner, Daniel J.; Flicek, Paul; Li, Heng; Kulesha, Eugene; Gräf, Stefan; Johnson, Nathan; Herrero, Javier; Tomazou, Eleni M.; Thorne, Natalie P.; Bäckdahl, Liselotte; Herberth, Marlis; Howe, Kevin L.; Jackson, David K.; Miretti, Marcos M.; Marioni, John C.; Birney, Ewan; Hubbard, Tim J. P.; Durbin, Richard; Tavaré, Simon; Beck, Stephan

    2009-01-01

    DNA methylation is an indispensable epigenetic modification of mammalian genomes. Consequently there is great interest in strategies for genome-wide/whole-genome DNA methylation analysis, and immunoprecipitation-based methods have proven to be a powerful option. Such methods are rapidly shifting the bottleneck from data generation to data analysis, necessitating the development of better analytical tools. Until now, a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling has been the inability to estimate absolute methylation levels. Here we report the development of a novel cross-platform algorithm, Bayesian Tool for Methylation Analysis (Batman), for analyzing Methylated DNA Immunoprecipitation (MeDIP) profiles generated using arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). The latter is an approach we have developed to elucidate the first high-resolution whole-genome DNA methylation profile (DNA methylome) of any mammalian genome. MeDIP-seq/MeDIP-chip combined with Batman represent robust, quantitative, and cost-effective functional genomic strategies for elucidating the function of DNA methylation. PMID:18612301

  6. A Tool for Multiple Targeted Genome Deletions that Is Precise, Scar-Free, and Suitable for Automation.

    PubMed

    Aubrey, Wayne; Riley, Michael C; Young, Michael; King, Ross D; Oliver, Stephen G; Clare, Amanda

    2015-01-01

    Many advances in synthetic biology require the removal of a large number of genomic elements from a genome. Most existing deletion methods leave behind markers, and as there are a limited number of markers, such methods can only be applied a fixed number of times. Deletion methods that recycle markers generally are either imprecise (remove untargeted sequences), or leave scar sequences which can cause genome instability and rearrangements. No existing marker recycling method is automation-friendly. We have developed a novel openly available deletion tool that consists of: 1) a method for deleting genomic elements that can be repeatedly used without limit, is precise, scar-free, and suitable for automation; and 2) software to design the method's primers. Our tool is sequence agnostic and could be used to delete large numbers of coding sequences, promoter regions, transcription factor binding sites, terminators, etc., in a single genome. We have validated our tool on the deletion of non-essential open reading frames (ORFs) from S. cerevisiae. The tool is applicable to arbitrary genomes, and we provide primer sequences for the deletion of: 90% of the ORFs from the S. cerevisiae genome, 88% of the ORFs from S. pombe genome, and 85% of the ORFs from the L. lactis genome.

  7. Flow cytometry sorting of nuclei enables the first global characterization of Paramecium germline DNA and transposable elements.

    PubMed

    Guérin, Frédéric; Arnaiz, Olivier; Boggetto, Nicole; Denby Wilkes, Cyril; Meyer, Eric; Sperling, Linda; Duharcourt, Sandra

    2017-04-26

    DNA elimination is developmentally programmed in a wide variety of eukaryotes, including unicellular ciliates, and leads to the generation of distinct germline and somatic genomes. The ciliate Paramecium tetraurelia harbors two types of nuclei with different functions and genome structures. The transcriptionally inactive micronucleus contains the complete germline genome, while the somatic macronucleus contains a reduced genome streamlined for gene expression. During development of the somatic macronucleus, the germline genome undergoes massive and reproducible DNA elimination events. Availability of both the somatic and germline genomes is essential to examine the genome changes that occur during programmed DNA elimination and ultimately decipher the mechanisms underlying the specific removal of germline-limited sequences. We developed a novel experimental approach that uses flow cell imaging and flow cytometry to sort subpopulations of nuclei to high purity. We sorted vegetative micronuclei and macronuclei during development of P. tetraurelia. We validated the method by flow cell imaging and by high throughput DNA sequencing. Our work establishes the proof of principle that developing somatic macronuclei can be sorted from a complex biological sample to high purity based on their size, shape and DNA content. This method enabled us to sequence, for the first time, the germline DNA from pure micronuclei and to identify novel transposable elements. Sequencing the germline DNA confirms that the Pgm domesticated transposase is required for the excision of all ~45,000 Internal Eliminated Sequences. Comparison of the germline DNA and unrearranged DNA obtained from PGM-silenced cells reveals that the latter does not provide a faithful representation of the germline genome. We developed a flow cytometry-based method to purify P. tetraurelia nuclei to high purity and provided quality control with flow cell imaging and high throughput DNA sequencing. We identified 61 germline transposable elements including the first Paramecium retrotransposons. This approach paves the way to sequence the germline genomes of P. aurelia sibling species for future comparative genomic studies.

  8. Variation block-based genomics method for crop plants.

    PubMed

    Kim, Yul Ho; Park, Hyang Mi; Hwang, Tae-Young; Lee, Seuk Ki; Choi, Man Soo; Jho, Sungwoong; Hwang, Seungwoo; Kim, Hak-Min; Lee, Dongwoo; Kim, Byoung-Chul; Hong, Chang Pyo; Cho, Yun Sung; Kim, Hyunmin; Jeong, Kwang Ho; Seo, Min Jung; Yun, Hong Tai; Kim, Sun Lim; Kwon, Young-Up; Kim, Wook Han; Chun, Hye Kyung; Lim, Sang Jong; Shin, Young-Ah; Choi, Ik-Young; Kim, Young Sun; Yoon, Ho-Sung; Lee, Suk-Ha; Lee, Sunghoon

    2014-06-15

    In contrast with wild species, cultivated crop genomes consist of reshuffled recombination blocks that arose through crossing and selection. Accordingly, recombination block-based genomics analysis can be an effective approach for the screening of target loci for agricultural traits. We propose the variation block method, which is a three-step process for recombination block detection and comparison. The first step is to detect variations by comparing the short-read DNA sequences of the cultivar to the reference genome of the target crop. Next, sequence blocks with variation patterns are examined and defined. The boundaries between the variation-containing sequence blocks are regarded as recombination sites. All the assumed recombination sites in the cultivar set are used to split the genomes, and the resulting sequence regions are termed variation blocks. Finally, the genomes are compared using the variation blocks. The variation block method identified recurring recombination blocks accurately and successfully represented block-level diversities in the publicly available genomes of 31 soybean and 23 rice accessions. The practicality of this approach was demonstrated by the identification of a putative locus determining soybean hilum color. We suggest that the variation block method is an efficient genomics method for the recombination block-level comparison of crop genomes. We expect that this method will facilitate the development of crop genomics by bringing genomics technologies to the field of crop breeding.
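    The splitting step of the variation block method, in which all assumed recombination sites from every cultivar are used to cut the genomes, can be sketched in Python. The input format is an assumption: each cultivar's chromosome is given as sorted (start, state) segments produced by the earlier steps, where state is a hypothetical label for that block's variation pattern. The comparison score (fraction of shared block states) is likewise an illustrative choice, not necessarily the one used by the authors.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Each cultivar's chromosome is a sorted list of (start, state) segments; a segment
# runs until the next start. `state` is a hypothetical label for the variation
# pattern of that sequence block (e.g. "ref" when it matches the reference).
Segments = List[Tuple[int, str]]


def variation_blocks(cultivars: Dict[str, Segments], chrom_len: int) -> List[Tuple[int, int]]:
    """Split [0, chrom_len) at every assumed recombination site from any cultivar."""
    cuts = {0, chrom_len}
    for segments in cultivars.values():
        cuts.update(start for start, _ in segments)
    ordered = sorted(c for c in cuts if 0 <= c <= chrom_len)
    return list(zip(ordered[:-1], ordered[1:]))


def state_at(segments: Segments, pos: int) -> str:
    """State of the cultivar segment covering `pos` (first segment must start at 0)."""
    starts = [start for start, _ in segments]
    return segments[bisect_right(starts, pos) - 1][1]


def block_states(cultivars: Dict[str, Segments],
                 chrom_len: int) -> Tuple[List[Tuple[int, int]], Dict[str, List[str]]]:
    """Variation blocks plus each cultivar's state in every block."""
    blocks = variation_blocks(cultivars, chrom_len)
    states = {name: [state_at(segs, start) for start, _ in blocks]
              for name, segs in cultivars.items()}
    return blocks, states


def block_identity(a: List[str], b: List[str]) -> float:
    """Fraction of variation blocks in which two cultivars share a state."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

    With cultivar A segmented at 400 and cultivar B at 250 and 700 on a 1,000 bp chromosome, the union of boundaries yields four variation blocks, (0, 250), (250, 400), (400, 700) and (700, 1000), on which both cultivars can be compared directly.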

  9. Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking.

    PubMed

    Daetwyler, Hans D; Calus, Mario P L; Pong-Wong, Ricardo; de Los Campos, Gustavo; Hickey, John M

    2013-02-01

    The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals.
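    The two performance indicators recommended for reporting, accuracy and bias, together with a cross-validation split, can be computed as in this Python sketch. It assumes accuracy is the Pearson correlation between predicted and true (simulated) or observed values and bias is the slope of the regression of true values on predictions, a common convention; the function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def accuracy_and_bias(predicted: Sequence[float],
                      true_values: Sequence[float]) -> Tuple[float, float]:
    """Return (accuracy, bias slope) for one validation set.

    accuracy: Pearson correlation between predicted and true (or observed) values.
    bias slope: coefficient from regressing true values on predictions;
    1.0 means no inflation or deflation of the predictions.
    """
    n = len(predicted)
    mp = sum(predicted) / n
    mt = sum(true_values) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(predicted, true_values))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_t = sum((t - mt) ** 2 for t in true_values)
    return cov / (var_p * var_t) ** 0.5, cov / var_p


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Randomly partition individuals 0..n-1 into k disjoint validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]
```

    A slope below 1.0 indicates that predictions are inflated (too dispersed) and a slope above 1.0 that they are deflated; reporting it alongside the correlation distinguishes a method that merely ranks individuals well from one that also predicts on the right scale.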

  10. Genomic Prediction in Animals and Plants: Simulation of Data, Validation, Reporting, and Benchmarking

    PubMed Central

    Daetwyler, Hans D.; Calus, Mario P. L.; Pong-Wong, Ricardo; de los Campos, Gustavo; Hickey, John M.

    2013-01-01

    The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals. PMID:23222650

  11. A streamlined workflow for single-cells genome-wide copy-number profiling by low-pass sequencing of LM-PCR whole-genome amplification products.

    PubMed

    Ferrarini, Alberto; Forcato, Claudio; Buson, Genny; Tononi, Paola; Del Monaco, Valentina; Terracciano, Mario; Bolognesi, Chiara; Fontana, Francesca; Medoro, Gianni; Neves, Rui; Möhlendick, Birte; Rihawi, Karim; Ardizzoni, Andrea; Sumanasuriya, Semini; Flohr, Penny; Lambros, Maryou; de Bono, Johann; Stoecklein, Nikolas H; Manaresi, Nicolò

    2018-01-01

    Chromosomal instability and associated chromosomal aberrations are hallmarks of cancer and play a critical role in disease progression and development of resistance to drugs. Single-cell genome analysis has gained interest in recent years as a source of biomarkers for targeted-therapy selection and drug resistance, and several methods have been developed to amplify the genomic DNA and to produce libraries suitable for Whole Genome Sequencing (WGS). However, most protocols require several enzymatic and cleanup steps, thus increasing the complexity and length of protocols, while robustness and speed are key factors for clinical applications. To tackle this issue, we developed a single-tube, single-step, streamlined protocol, exploiting the ligation-mediated PCR (LM-PCR) Whole Genome Amplification (WGA) method, for low-pass genome sequencing with the Ion Torrent™ platform and copy number alteration (CNA) calling from single cells. The method was evaluated on single cells isolated from 6 aberrant cell lines of the NCI-H series. In addition, to demonstrate the feasibility of the workflow on clinical samples, we analyzed single circulating tumor cells (CTCs) and white blood cells (WBCs) isolated from the blood of patients affected by prostate cancer or lung adenocarcinoma. The results obtained show that the developed workflow generates data accurately representing whole genome absolute copy number profiles of single cells and allows alteration calling at resolutions down to 100 Kbp with as few as 200,000 reads. The presented data demonstrate the feasibility of the Ampli1™ WGA-based low-pass workflow for detection of CNAs in single tumor cells, which would be of particular interest for genome-driven targeted therapy selection and for monitoring of disease progression.
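    The core of read-depth CNA calling from low-pass data can be sketched as counting reads in fixed-width bins and scaling each bin by a genome-wide baseline. This Python sketch is a simplification under stated assumptions, not the Ampli1 workflow: it takes the median bin count as the diploid baseline and applies no GC, mappability or amplification-bias correction, all of which matter for single-cell WGA data. As a rough scale check, assuming a genome of about 3 Gb, 100 kb bins give about 30,000 bins, so 200,000 reads average under 7 reads per bin; individual bins are therefore noisy, and a practical caller would likely segment across neighboring bins.

```python
from math import ceil
from statistics import median
from typing import Iterable, List


def bin_reads(positions: Iterable[int], chrom_len: int, bin_size: int = 100_000) -> List[int]:
    """Count read start positions (0-based) in fixed-width bins along one chromosome."""
    counts = [0] * ceil(chrom_len / bin_size)
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def copy_number_profile(bin_counts: List[int], ploidy: int = 2) -> List[int]:
    """Integer copy-number estimate per bin, assuming most bins are at `ploidy` copies.

    The median bin count is taken as the baseline for `ploidy` copies; no bias
    correction or segmentation is applied.
    """
    baseline = median(bin_counts)
    if baseline == 0:
        raise ValueError("median bin count is zero; use larger bins or more reads")
    return [round(ploidy * c / baseline) for c in bin_counts]
```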

  12. Genome-scale engineering of Saccharomyces cerevisiae with single-nucleotide precision.

    PubMed

    Bao, Zehua; HamediRad, Mohammad; Xue, Pu; Xiao, Han; Tasan, Ipek; Chao, Ran; Liang, Jing; Zhao, Huimin

    2018-07-01

    We developed a CRISPR-Cas9- and homology-directed-repair-assisted genome-scale engineering method named CHAnGE that can rapidly output tens of thousands of specific genetic variants in yeast. More than 98% of target sequences were efficiently edited with an average frequency of 82%. We validate the single-nucleotide resolution genome-editing capability of this technology by creating a genome-wide gene disruption collection and apply our method to improve tolerance to growth inhibitors.

  13. Phylo_dCor: distance correlation as a novel metric for phylogenetic profiling.

    PubMed

    Sferra, Gabriella; Fratini, Federica; Ponzi, Marta; Pizzi, Elisabetta

    2017-09-05

    Elaboration of powerful methods to predict functional and/or physical protein-protein interactions from genome sequence is one of the main tasks in the post-genomic era. Phylogenetic profiling allows the prediction of protein-protein interactions at a whole genome level in both Prokaryotes and Eukaryotes. For this reason it is considered one of the most promising methods. Here, we propose an improvement of phylogenetic profiling that enables the handling of large genomic datasets and the inference of global protein-protein interactions. This method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation that makes it applicable to large genomic data. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo-dCor outperforms previously described phylogenetic profiling methods based on mutual information and Pearson's correlation as measures of profile similarity. In this work, we constructed and assessed robust reference sets and propose the distance correlation as a measure for comparing phylogenetic profiles. To make it applicable to large genomic data, we developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation. Two R scripts that can be run on a wide range of machines are available upon request.
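    Distance correlation itself is straightforward to compute for a pair of phylogenetic profiles. The pure-Python sketch below uses the standard sample (V-statistic) definition based on double-centered distance matrices; it treats each profile as a vector of per-genome values (for example 0/1 presence or absence, an assumption about the profile encoding) and runs in O(n²) time in the number of genomes, with none of Phylo-dCor's parallelization.

```python
from typing import List, Sequence


def _double_centered(x: Sequence[float]) -> List[List[float]]:
    """Pairwise |x_i - x_j| distances with row, column and grand means removed."""
    n = len(x)
    d = [[abs(a - b) for b in x] for a in x]
    row = [sum(r) / n for r in d]  # symmetric matrix: column means equal row means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    a, b = _double_centered(x), _double_centered(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for row in a for v in row) / n**2
    dvar_y = sum(v * v for row in b for v in row) / n**2
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant profile carries no information
    return max(0.0, dcov2 / (dvar_x * dvar_y) ** 0.5) ** 0.5
```

    Unlike Pearson's correlation, population distance correlation is zero only under independence, so it can pick up non-linear association: for x = [-2, -1, 0, 1, 2] and y = x², Pearson's r is 0 while the sample distance correlation is positive.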

  14. Anonymizing patient genomic data for public sharing association studies.

    PubMed

    Fernandez-Lozano, Carlos; Lopez-Campos, Guillermo; Seoane, Jose A; Lopez-Alonso, Victoria; Dorado, Julian; Martín-Sanchez, Fernando; Pazos, Alejandro

    2013-01-01

    The development of personalized medicine is tightly linked to the correct exploitation of molecular data, especially data associated with the genome sequence. Along with the use of genomic data, there is an increasing demand to share these data for research purposes. Transferring clinical data to research relies on anonymizing the data so that the patient cannot be identified, and genomic data pose a great challenge because they are inherently identifying. In this work we have analyzed current methods for genome anonymization and propose a one-way encryption method that may enable genomic data sharing by giving access only to certain regions of genomes for research purposes.

  15. A Tool for Multiple Targeted Genome Deletions that Is Precise, Scar-Free, and Suitable for Automation

    PubMed Central

    Aubrey, Wayne; Riley, Michael C.; Young, Michael; King, Ross D.; Oliver, Stephen G.; Clare, Amanda

    2015-01-01

    Many advances in synthetic biology require the removal of a large number of genomic elements from a genome. Most existing deletion methods leave behind markers, and as there are a limited number of markers, such methods can only be applied a fixed number of times. Deletion methods that recycle markers generally are either imprecise (remove untargeted sequences), or leave scar sequences which can cause genome instability and rearrangements. No existing marker recycling method is automation-friendly. We have developed a novel openly available deletion tool that consists of: 1) a method for deleting genomic elements that can be repeatedly used without limit, is precise, scar-free, and suitable for automation; and 2) software to design the method’s primers. Our tool is sequence agnostic and could be used to delete large numbers of coding sequences, promoter regions, transcription factor binding sites, terminators, etc., in a single genome. We have validated our tool on the deletion of non-essential open reading frames (ORFs) from S. cerevisiae. The tool is applicable to arbitrary genomes, and we provide primer sequences for the deletion of: 90% of the ORFs from the S. cerevisiae genome, 88% of the ORFs from S. pombe genome, and 85% of the ORFs from the L. lactis genome. PMID:26630677

  16. De novo assembly of a haplotype-resolved human genome.

    PubMed

    Cao, Hongzhi; Wu, Honglong; Luo, Ruibang; Huang, Shujia; Sun, Yuhui; Tong, Xin; Xie, Yinlong; Liu, Binghang; Yang, Hailong; Zheng, Hancheng; Li, Jian; Li, Bo; Wang, Yu; Yang, Fang; Sun, Peng; Liu, Siyang; Gao, Peng; Huang, Haodong; Sun, Jing; Chen, Dan; He, Guangzhu; Huang, Weihua; Huang, Zheng; Li, Yue; Tellier, Laurent C A M; Liu, Xiao; Feng, Qiang; Xu, Xun; Zhang, Xiuqing; Bolund, Lars; Krogh, Anders; Kristiansen, Karsten; Drmanac, Radoje; Drmanac, Snezana; Nielsen, Rasmus; Li, Songgang; Wang, Jian; Yang, Huanming; Li, Yingrui; Wong, Gane Ka-Shu; Wang, Jun

    2015-06-01

    The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.
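    The haplotype N50 quoted above follows the usual N50 definition: the length L such that blocks of length L or longer together contain at least half of the total assembled length. A short Python helper makes the calculation concrete; it assumes the input is simply a list of block (or contig) lengths.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length of the block at which the running total (longest first) reaches half the sum."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no lengths given")
```

    For lengths 2, 3, 4, 5, 6, 7, 8, 9 and 10 (total 54), the three longest blocks sum to 27, exactly half, so the N50 is 8.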

  17. Development of a genome editing technique using the CRISPR/Cas9 system in the industrial filamentous fungus Aspergillus oryzae.

    PubMed

    Katayama, Takuya; Tanaka, Yuki; Okabe, Tomoya; Nakamura, Hidetoshi; Fujii, Wataru; Kitamoto, Katsuhiko; Maruyama, Jun-Ichi

    2016-04-01

    This study aimed to develop a genome editing method using the CRISPR/Cas9 system in Aspergillus oryzae, the industrial filamentous fungus used in Japanese traditional fermentation and for the production of enzymes and heterologous proteins. To develop the CRISPR/Cas9 system as a genome editing technique for A. oryzae, we constructed plasmids expressing the gene encoding Cas9 nuclease and single guide RNAs for the mutagenesis of target genes. We introduced these into an A. oryzae strain and obtained transformants containing mutations within each target gene that exhibited the expected phenotypes. The mutational rates ranged from 10 to 20%, and 1 bp deletions or insertions were the most commonly induced mutations. We developed a functional and versatile genome editing method using the CRISPR/Cas9 system in A. oryzae. This technique will contribute to the use of efficient targeted mutagenesis in many A. oryzae industrial strains.
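    Designing single guide RNAs starts with locating candidate target sites. This Python sketch assumes the commonly used Streptococcus pyogenes Cas9, which requires an NGG protospacer-adjacent motif (PAM) immediately 3' of a 20-nt protospacer; the abstract does not state which Cas9 or guide length was used, and the sketch does no off-target or efficiency scoring.

```python
import re
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(seq: str, length: int = 20) -> List[Tuple[str, int, str]]:
    """Return (strand, start, protospacer) for every `length`-nt site followed by NGG.

    `start` is the leftmost plus-strand coordinate (0-based) of protospacer + PAM.
    """
    seq = seq.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % length)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            start = m.start()
            if strand == "-":
                start = len(seq) - (m.start() + length + 3)
            hits.append((strand, start, m.group(1)))
    return hits
```

    Minus-strand sites are found by scanning the reverse complement, and their coordinates are converted back so that both strands report positions on the same plus-strand axis.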

  18. Identification of copy number variants in whole-genome data using Reference Coverage Profiles

    PubMed Central

    Glusman, Gustavo; Severson, Alissa; Dhankani, Varsha; Robinson, Max; Farrah, Terry; Mauldin, Denise E.; Stittrich, Anna B.; Ament, Seth A.; Roach, Jared C.; Brunkow, Mary E.; Bodian, Dale L.; Vockley, Joseph G.; Shmulevich, Ilya; Niederhuber, John E.; Hood, Leroy

    2015-01-01

    The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation. PMID:25741365
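    The final step described above, normalizing scaled coverage to the reference profile and segmenting with a hidden Markov model, can be sketched in Python. Everything beyond that outline is an assumption made for illustration: the ratio of scaled coverage to the RCP is treated as roughly copy number divided by two, emissions are Gaussian with a hypothetical standard deviation of 0.15, copy states run from 0 to 4, and the transition probabilities are arbitrary. Decoding uses the Viterbi algorithm.

```python
import math
from typing import List, Sequence, Tuple


def normalize_to_profile(coverage: Sequence[float], reference: Sequence[float],
                         scale: float) -> List[float]:
    """Scaled sample coverage divided by the expected diploid coverage per position.

    A ratio near 1.0 suggests two copies, 0.5 one copy, 1.5 three copies.
    """
    if any(r <= 0 for r in reference):
        raise ValueError("reference profile must be positive at every position")
    return [(c / scale) / r for c, r in zip(coverage, reference)]


def viterbi_copy_number(ratios: Sequence[float],
                        states: Tuple[int, ...] = (0, 1, 2, 3, 4),
                        sd: float = 0.15, stay: float = 0.99) -> List[int]:
    """Most likely copy-number path under a Gaussian-emission HMM (illustrative parameters)."""
    if not ratios:
        return []
    k = len(states)
    log_stay, log_move = math.log(stay), math.log((1 - stay) / (k - 1))

    def emit(ratio: float, cn: int) -> float:
        # Gaussian log-density around cn / 2, dropping terms shared by all states.
        return -0.5 * ((ratio - cn / 2) / sd) ** 2

    score = [emit(ratios[0], cn) for cn in states]  # uniform initial distribution
    back: List[List[int]] = []
    for ratio in ratios[1:]:
        ptrs, new_score = [], []
        for j, cn in enumerate(states):
            best = max(range(k), key=lambda i: score[i] + (log_stay if i == j else log_move))
            ptrs.append(best)
            new_score.append(score[best] + (log_stay if best == j else log_move) + emit(ratio, cn))
        back.append(ptrs)
        score = new_score
    # Trace back from the best final state.
    path = [max(range(k), key=lambda i: score[i])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return [states[i] for i in reversed(path)]
```

    For ratios of 1.0 over five positions, 0.5 over the next five and 1.0 again, the decoded path is two copies, then one copy (a hemizygous deletion), then two copies; the high self-transition probability is what keeps isolated noisy positions from being called as separate events.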

  19. Advances in yeast genome engineering.

    PubMed

    David, Florian; Siewers, Verena

    2015-02-01

    Genome engineering based on homologous recombination has been applied to yeast for many years. However, the growing importance of yeast as a cell factory in metabolic engineering and as a chassis in synthetic biology demands methods for the fast and efficient introduction of multiple targeted changes, such as gene knockouts and multistep metabolic pathways. In this review, we summarize recent improvements to existing genome engineering methods, the development of novel techniques (for example, for advanced genome redesign and evolution), and the importance of endonucleases as genome engineering tools. © FEMS 2015. All rights reserved.

  20. An empirical Bayes method for updating inferences in analysis of quantitative trait loci using information from related genome scans.

    PubMed

    Zhang, Kui; Wiener, Howard; Beasley, Mark; George, Varghese; Amos, Christopher I; Allison, David B

    2006-08-01

    Individual genome scans for quantitative trait loci (QTL) mapping often suffer from low statistical power and imprecise estimates of QTL location and effect. This lack of precision yields large confidence intervals for QTL location, which are problematic for subsequent fine mapping and positional cloning. When prioritizing areas for follow-up after an initial genome scan and evaluating the credibility of apparent linkage signals, investigators typically examine the results of other genome scans of the same phenotype and informally update their beliefs about which linkage signals in their own scan most merit confidence and follow-up, using a subjective, intuitive integration approach. A method that retains this general paradigm but formally borrows information from other scans, making the updating more objective, would be beneficial. We developed an empirical Bayes analytic method to integrate information from multiple genome scans. The linkage statistic obtained from a single genome scan is updated by incorporating statistics from other genome scans as prior information. This technique does not require that all studies share an identical marker map or a common estimated QTL effect. The updated linkage statistic can then be used to estimate QTL location and effect. We evaluated the performance of our method with extensive simulations based on actual marker spacing and allele frequencies from available data. The results indicate that the empirical Bayes method can account for between-study heterogeneity, estimate QTL location and effect more precisely, and provide narrower confidence intervals than any single study. We also compared the empirical Bayes method with a method originally developed for meta-analysis (a closely related but distinct purpose). In the presence of marked heterogeneity among studies, the empirical Bayes method outperforms the comparator.
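    The core idea of updating one scan's linkage statistic with a prior built from other scans can be sketched as generic normal-normal empirical Bayes shrinkage at a single genomic position. This is not the authors' exact model: it assumes each scan's statistic is approximately normal with a known sampling variance, and it estimates between-study heterogeneity with a DerSimonian-Laird moment estimator, which is our choice for illustration.

```python
import numpy as np


def empirical_bayes_update(target_stat, target_var, other_stats, other_vars):
    """Normal-normal shrinkage of one scan's statistic toward a prior from other scans.

    Returns the posterior mean and variance of the target scan's statistic
    at a single genomic position.
    """
    y = np.asarray(other_stats, dtype=float)
    v = np.asarray(other_vars, dtype=float)
    w = 1.0 / v
    prior_mean = np.sum(w * y) / np.sum(w)
    # Method-of-moments (DerSimonian-Laird) estimate of between-study variance.
    q = np.sum(w * (y - prior_mean) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    # Prior variance: heterogeneity plus uncertainty in the estimated prior mean.
    prior_var = tau2 + 1.0 / np.sum(w)
    post_var = 1.0 / (1.0 / target_var + 1.0 / prior_var)
    post_mean = post_var * (target_stat / target_var + prior_mean / prior_var)
    return float(post_mean), float(post_var)


# Hypothetical values: the target scan reports 5.0, three other scans report 2.0.
mean, var = empirical_bayes_update(5.0, 1.0, [2.0, 2.0, 2.0], [1.0, 1.0, 1.0])
```

    Here the posterior mean is about 2.75 and the posterior variance about 0.25, smaller than the target scan's own variance of 1.0; this mirrors, in a toy setting, the narrower intervals the abstract reports. When the other scans disagree strongly, `tau2` grows and the target statistic is shrunk less.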

  1. Development and in-house validation of the event-specific polymerase chain reaction detection methods for genetically modified soybean MON89788 based on the cloned integration flanking sequence.

    PubMed

    Liu, Jia; Guo, Jinchao; Zhang, Haibo; Li, Ning; Yang, Litao; Zhang, Dabing

    2009-11-25

    Various polymerase chain reaction (PCR) methods have been developed to enforce genetically modified organism (GMO) labeling policies. Among them, event-specific PCR detection based on the flanking sequence of the exogenous integration has become the primary approach to GMO detection because of its high specificity. In this study, the 5' and 3' flanking sequences of the exogenous integration in MON89788 soybean were determined by thermal asymmetric interlaced PCR. Event-specific PCR primers and a TaqMan probe were designed from the 5' flanking sequence, and qualitative and quantitative PCR assays were established using these primers and probes. In the qualitative PCR assay, the limit of detection (LOD) was about 0.01 ng of genomic DNA, corresponding to 10 copies of the haploid soybean genome. In the quantitative PCR assay, the LOD was as low as two haploid genome copies, and the limit of quantification was five haploid genome copies. Furthermore, the PCR methods were validated in-house by five researchers, and the results indicated that the event-specific PCR methods can be used to identify and quantify MON89788 soybean and its derivatives.
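    The abstract equates about 0.01 ng of genomic DNA with about 10 haploid soybean genome copies. That conversion can be checked with a short calculation. It assumes a haploid soybean genome of roughly 1.1 Gb and an average mass of 650 g/mol per double-stranded base pair; both are approximations, so the result (about 8 copies) only agrees in order of magnitude with the rounded figure reported.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one double-stranded base pair


def haploid_genome_copies(dna_ng, genome_size_bp):
    """Convert a DNA mass in nanograms into haploid genome copies."""
    grams_per_copy = genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return dna_ng * 1e-9 / grams_per_copy


# Assumption: soybean haploid genome of roughly 1.1 Gb.
copies = haploid_genome_copies(0.01, 1.1e9)
print(f"{copies:.1f} haploid copies")  # prints "8.4 haploid copies"
```

    Each haploid copy weighs roughly 1.2 pg under these assumptions, so the exact copy number at a given mass depends on the genome size value used.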

  2. A new strategy for genome assembly using short sequence reads and reduced representation libraries.

    PubMed

    Young, Andrew L; Abaan, Hatice Ozel; Zerbino, Daniel; Mullikin, James C; Birney, Ewan; Margulies, Elliott H

    2010-02-01

    We have developed a novel approach that uses massively parallel short-read sequencing to generate fast and inexpensive de novo genome assemblies comparable to those produced by capillary-based methods. The ultrashort (<100 base) sequences generated by this technology pose specific biological and computational challenges for de novo assembly of large genomes. To address this, we devised a method for experimentally partitioning the genome using reduced representation (RR) libraries prior to assembly. We use two restriction enzymes independently to create a series of overlapping fragment libraries, each containing a tractable subset of the genome. Together, these libraries allow us to reassemble the entire genome without the need for a reference sequence. As proof of concept, we applied this approach to sequence and assemble the majority of the 125-Mb Drosophila melanogaster genome. We then demonstrate the accuracy of our assembly method through comparisons against the currently available D. melanogaster reference genome (dm3). The ease of assembly and accuracy for comparative genomics suggest that our approach will scale to future mammalian genome-sequencing efforts, saving both time and money without sacrificing quality.

  3. Genome-wide engineering of an infectious clone of herpes simplex virus type 1 using synthetic genomics assembly methods.

    PubMed

    Oldfield, Lauren M; Grzesik, Peter; Voorhies, Alexander A; Alperovich, Nina; MacMath, Derek; Najera, Claudia D; Chandra, Diya Sabrina; Prasad, Sanjana; Noskov, Vladimir N; Montague, Michael G; Friedman, Robert M; Desai, Prashant J; Vashee, Sanjay

    2017-10-17

    Here, we present a transformational approach to genome engineering of herpes simplex virus type 1 (HSV-1), which has a large DNA genome, using synthetic genomics tools. We believe this method will enable more rapid and complex modifications of HSV-1 and other large DNA viruses than previous technologies, facilitating many useful applications. Yeast transformation-associated recombination was used to clone 11 fragments comprising the HSV-1 strain KOS 152 kb genome. Using overlapping sequences between the adjacent pieces, we assembled the fragments into a complete virus genome in yeast, transferred it into an Escherichia coli host, and reconstituted infectious virus following transfection into mammalian cells. The virus derived from this yeast-assembled genome, KOSYA, replicated with kinetics similar to wild-type virus. We demonstrated the utility of this modular assembly technology by making numerous modifications to a single gene, making changes to two genes at the same time and, finally, generating individual and combinatorial deletions to a set of five conserved genes that encode virion structural proteins. While the ability to perform genome-wide editing through assembly methods in large DNA virus genomes raises dual-use concerns, we believe the incremental risks are outweighed by potential benefits. These include enhanced functional studies, generation of oncolytic virus vectors, development of delivery platforms of genes for vaccines or therapy, as well as more rapid development of countermeasures against potential biothreats.

  4. Genome-wide engineering of an infectious clone of herpes simplex virus type 1 using synthetic genomics assembly methods

    PubMed Central

    Grzesik, Peter; Voorhies, Alexander A.; Alperovich, Nina; MacMath, Derek; Najera, Claudia D.; Chandra, Diya Sabrina; Prasad, Sanjana; Noskov, Vladimir N.; Montague, Michael G.; Friedman, Robert M.; Desai, Prashant J.

    2017-01-01

    Here, we present a transformational approach to genome engineering of herpes simplex virus type 1 (HSV-1), which has a large DNA genome, using synthetic genomics tools. We believe this method will enable more rapid and complex modifications of HSV-1 and other large DNA viruses than previous technologies, facilitating many useful applications. Yeast transformation-associated recombination was used to clone 11 fragments comprising the HSV-1 strain KOS 152 kb genome. Using overlapping sequences between the adjacent pieces, we assembled the fragments into a complete virus genome in yeast, transferred it into an Escherichia coli host, and reconstituted infectious virus following transfection into mammalian cells. The virus derived from this yeast-assembled genome, KOSYA, replicated with kinetics similar to wild-type virus. We demonstrated the utility of this modular assembly technology by making numerous modifications to a single gene, making changes to two genes at the same time and, finally, generating individual and combinatorial deletions to a set of five conserved genes that encode virion structural proteins. While the ability to perform genome-wide editing through assembly methods in large DNA virus genomes raises dual-use concerns, we believe the incremental risks are outweighed by potential benefits. These include enhanced functional studies, generation of oncolytic virus vectors, development of delivery platforms of genes for vaccines or therapy, as well as more rapid development of countermeasures against potential biothreats. PMID:28928148

  5. Gene context analysis in the Integrated Microbial Genomes (IMG) data management system.

    PubMed

    Mavromatis, Konstantinos; Chu, Ken; Ivanova, Natalia; Hooper, Sean D; Markowitz, Victor M; Kyrpides, Nikos C

    2009-11-24

    Computational methods for determining the function of genes in newly sequenced genomes have traditionally been based on sequence similarity to genes whose function has been identified experimentally. Function prediction can be extended with gene context analysis approaches, such as examining the conservation of chromosomal gene clusters, gene fusion events, and co-occurrence profiles across genomes. Context analysis is based on the observation that functionally related genes often share a similar gene context, and it relies on identifying such events across a phylogenetically diverse collection of genomes. We used the data management system of the Integrated Microbial Genomes (IMG) as the framework to implement and explore the power of gene context analysis methods because it provides one of the largest available integrations of genomes. Visualization and search tools to facilitate gene context analysis have been developed and applied across all publicly available archaeal and bacterial genomes in IMG. These computations are now maintained as part of IMG's regular genome content update cycle. IMG is available at: http://img.jgi.doe.gov.
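    One of the context signals named above, co-occurrence of gene families across genomes, can be illustrated with binary phylogenetic profiles. This is a minimal sketch, not IMG's implementation: the gene-family identifiers and genomes are hypothetical, and the Jaccard index is our choice of similarity measure.

```python
def phylogenetic_profile(gene_family, genomes):
    """Presence (True) or absence (False) of a gene family in each genome."""
    return tuple(gene_family in genome for genome in genomes)


def cooccurrence_similarity(profile_a, profile_b):
    """Jaccard index: genomes containing both families / genomes containing either."""
    both = sum(a and b for a, b in zip(profile_a, profile_b))
    either = sum(a or b for a, b in zip(profile_a, profile_b))
    return both / either if either else 0.0


# Hypothetical genomes, each represented as a set of gene-family identifiers.
genomes = [{"famA", "famB"}, {"famA", "famB", "famC"}, {"famC"}, {"famA", "famB"}]
profile_a = phylogenetic_profile("famA", genomes)
profile_b = phylogenetic_profile("famB", genomes)
profile_c = phylogenetic_profile("famC", genomes)
```

    Here `famA` and `famB` are always gained or lost together (similarity 1.0), which under the context hypothesis would flag them as candidate functional partners, while `famA` and `famC` co-occur in only one of four genomes that carry either (similarity 0.25).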

  6. Apollo: a sequence annotation editor

    PubMed Central

    Lewis, SE; Searle, SMJ; Harris, N; Gibson, M; Iyer, V; Richter, J; Wiel, C; Bayraktaroglu, L; Birney, E; Crosby, MA; Kaminker, JS; Matthews, BB; Prochnik, SE; Smith, CD; Tupy, JL; Rubin, GM; Misra, S; Mungall, CJ; Clamp, ME

    2002-01-01

    The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects. PMID:12537571

  7. SNP discovery and genotyping using Genotyping-by-Sequencing in Pekin ducks.

    PubMed

    Zhu, Feng; Cui, Qian-Qian; Hou, Zhuo-Cheng

    2016-11-15

    Genomic selection and genome-wide association studies need thousands to millions of SNPs. However, many non-model species lack reference chips for detecting variation. Our goal was to develop and validate an inexpensive but effective method for detecting SNP variation. Genotyping by sequencing (GBS) can be a highly efficient strategy for genome-wide SNP detection, as an alternative to microarray chips. Here, we developed a GBS protocol for ducks and used it to genotype 49 Pekin ducks. A total of 169,209 SNPs were identified across all animals, with a mean of 55,920 SNPs per individual. The average SNP density reached 1156 SNPs/Mb. In this study, the first application of GBS to ducks, we demonstrate the power and simplicity of this method. GBS can be used in genetic studies as an effective method for genome-wide SNP discovery.

  8. Toward the automated generation of genome-scale metabolic networks in the SEED.

    PubMed

    DeJongh, Matthew; Formsma, Kevin; Boillot, Paul; Gould, John; Rycenga, Matthew; Best, Aaron

    2007-04-26

    Current methods for the automated generation of genome-scale metabolic networks focus on genome annotation and preliminary biochemical reaction network assembly, but do not adequately address the process of identifying and filling gaps in the reaction network, and verifying that the network is suitable for systems level analysis. Thus, current methods are only sufficient for generating draft-quality networks, and refinement of the reaction network is still largely a manual, labor-intensive process. We have developed a method for generating genome-scale metabolic networks that produces substantially complete reaction networks, suitable for systems level analysis. Our method partitions the reaction space of central and intermediary metabolism into discrete, interconnected components that can be assembled and verified in isolation from each other, and then integrated and verified at the level of their interconnectivity. We have developed a database of components that are common across organisms, and have created tools for automatically assembling appropriate components for a particular organism based on the metabolic pathways encoded in the organism's genome. This focuses manual efforts on that portion of an organism's metabolism that is not yet represented in the database. We have demonstrated the efficacy of our method by reverse-engineering and automatically regenerating the reaction network from a published genome-scale metabolic model for Staphylococcus aureus. Additionally, we have verified that our method capitalizes on the database of common reaction network components created for S. aureus, by using these components to generate substantially complete reconstructions of the reaction networks from three other published metabolic models (Escherichia coli, Helicobacter pylori, and Lactococcus lactis). We have implemented our tools and database within the SEED, an open-source software environment for comparative genome annotation and analysis. 
Our method sets the stage for the automated generation of substantially complete metabolic networks for over 400 complete genome sequences currently in the SEED. With each genome that is processed using our tools, the database of common components grows to cover more of the diversity of metabolic pathways. This increases the likelihood that components of reaction networks for subsequently processed genomes can be retrieved from the database, rather than assembled and verified manually.

  9. Whole-genome alignment.

    PubMed

    Dewey, Colin N

    2012-01-01

    Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.

  10. On the Epistemological Crisis in Genomics

    PubMed Central

    Dougherty, Edward R

    2008-01-01

    There is an epistemological crisis in genomics. At issue is what constitutes scientific knowledge in genomic science, or systems biology in general. Does this crisis require a new perspective on knowledge heretofore absent from science or is it merely a matter of interpreting new scientific developments in an existing epistemological framework? This paper discusses the manner in which the experimental method, as developed and understood over recent centuries, leads naturally to a scientific epistemology grounded in an experimental-mathematical duality. It places genomics into this epistemological framework and examines the current situation in genomics. Meaning and the constitution of scientific knowledge are key concerns for genomics, and the nature of the epistemological crisis in genomics depends on how these are understood. PMID:19440447

  11. Development of a fluorescence-activated cell sorting method coupled with whole genome amplification to analyze minority and trace Dehalococcoides genomes in microbial communities.

    PubMed

    Lee, Patrick K H; Men, Yujie; Wang, Shanquan; He, Jianzhong; Alvarez-Cohen, Lisa

    2015-02-03

    Dehalococcoides mccartyi are functionally important bacteria that catalyze the reductive dechlorination of chlorinated ethenes. However, these anaerobic bacteria are fastidious and difficult to isolate, making downstream genomic characterization challenging. To facilitate genomic analysis, a fluorescence-activated cell sorting (FACS) method was developed in this study to separate D. mccartyi cells from a microbial community; the DNA of the isolated cells was processed by whole genome amplification (WGA) and hybridized onto a D. mccartyi microarray for comparative genomics against four sequenced strains. First, FACS was successfully applied to a D. mccartyi isolate as a positive control, and microarray results then verified that WGA from 10⁶ cells or ∼1 ng of genomic DNA yielded high-quality coverage, detecting nearly all genes across the genome. As expected, some inter- and intrasample variability in WGA was observed, but these biases were minimized by performing multiple parallel amplifications. Subsequent application of the FACS and WGA protocols to two enrichment cultures containing ∼10% and ∼1% D. mccartyi cells successfully enabled genomic analysis. As a proof of concept, this study demonstrates that coupling FACS with WGA and microarrays is a promising tool to expedite genomic characterization of target strains in environmental communities where their relative concentrations are low.

  12. A simple and efficient method to visualize and quantify the efficiency of chromosomal mutations from genome editing

    PubMed Central

    Fu, Liezhen; Wen, Luan; Luu, Nga; Shi, Yun-Bo

    2016-01-01

    Genome editing with designer nucleases such as TALEN and CRISPR/Cas enzymes has broad applications. Delivery of these designer nucleases into organisms induces various genetic mutations including deletions, insertions and nucleotide substitutions. Characterizing those mutations is critical for evaluating the efficacy and specificity of targeted genome editing. While a number of methods have been developed to identify the mutations, none other than sequencing allows the identification of the most desired mutations, i.e., out-of-frame insertions/deletions that disrupt genes. Here we report a simple and efficient method to visualize and quantify the efficiency of genomic mutations induced by genome-editing. Our approach is based on the expression of a two-color fusion protein in a vector that allows the insertion of the edited region in the genome in between the two color moieties. We show that our approach not only easily identifies developing animals with desired mutations but also efficiently quantifies the mutation rate in vivo. Furthermore, by using LacZα and GFP as the color moieties, our approach can even eliminate the need for a fluorescent microscope, allowing the analysis with simple bright field visualization. Such an approach will greatly simplify the screen for effective genome-editing enzymes and identify the desired mutant cells/animals. PMID:27748423
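    The distinction the abstract draws between in-frame and out-of-frame insertions/deletions comes down to whether the net length change is a multiple of three. The sketch below classifies edited alleles by length alone and tallies the out-of-frame fraction; it does not model the authors' two-color reporter readout, and a length-only check cannot see substitutions or balanced insertion-plus-deletion events (these fall under "no net indel").

```python
def classify_indel(reference_len, edited_len):
    """Label an edited allele by whether its net indel preserves the reading frame."""
    delta = edited_len - reference_len
    if delta == 0:
        return "no net indel"
    # A net length change that is not a multiple of 3 shifts the downstream frame.
    return "out-of-frame" if delta % 3 else "in-frame"


def out_of_frame_fraction(edited_lengths, reference_len):
    """Fraction of observed alleles carrying a frameshifting (out-of-frame) indel."""
    calls = [classify_indel(reference_len, n) for n in edited_lengths]
    return calls.count("out-of-frame") / len(calls)


# Hypothetical allele lengths recovered from an edited 100-bp target region.
alleles = [96, 103, 100, 101]
fraction = out_of_frame_fraction(alleles, 100)
```

    In this toy set, the 4-bp deletion and 1-bp insertion are frameshifting, the 3-bp insertion is in-frame, and the unchanged length is reported separately, giving an out-of-frame fraction of 0.5.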

  13. [Artificial Intelligence in Drug Discovery].

    PubMed

    Fujiwara, Takeshi; Kamada, Mayumi; Okuno, Yasushi

    2018-04-01

    With the growing volume of data generated by analytical instruments, the application of artificial intelligence (AI) technology in the medical field has become indispensable. In particular, practical AI technology is strongly needed in "genomic medicine" and "genomic drug discovery", which base medical practice and novel drug development on individual genomic information. In our laboratory, we have been developing a database that integrates genome data and clinical information obtained by clinical genome analysis, together with a computational support system that uses AI for the clinical interpretation of variants. In addition, with the aim of identifying new therapeutic targets in genomic drug discovery, we have also been developing a system that predicts the binding affinity between mutated proteins and drugs by molecular dynamics simulation on the supercomputer "Kei". We have also tackled problems in virtual drug screening. Our AI technology has successfully generated a virtual compound library, and a deep learning method has enabled us to predict interactions between compounds and target proteins.

  14. Public data and open source tools for multi-assay genomic investigation of disease.

    PubMed

    Kannan, Lavanya; Ramos, Marcel; Re, Angela; El-Hachem, Nehme; Safikhani, Zhaleh; Gendoo, Deena M A; Davis, Sean; Gomez-Cabrero, David; Castelo, Robert; Hansen, Kasper D; Carey, Vincent J; Morgan, Martin; Culhane, Aedín C; Haibe-Kains, Benjamin; Waldron, Levi

    2016-07-01

    Molecular interrogation of a biological sample through DNA sequencing, RNA and microRNA profiling, proteomics, and other assays has the potential to provide a systems-level approach to predicting treatment response and disease progression, and to developing precision therapies. Large publicly funded projects have generated extensive and freely available multi-assay data resources; however, bioinformatic and statistical methods for the analysis of such experiments are still nascent. We review multi-assay genomic data resources in the areas of clinical oncology, pharmacogenomics and other perturbation experiments, population genomics, regulatory genomics, and other areas, as well as tools for data acquisition. Finally, we review bioinformatic tools that are explicitly geared toward integrative genomic data visualization and analysis. This review provides starting points for accessing publicly available data and tools to support development of needed integrative methods. © The Author 2015. Published by Oxford University Press.

  15. Inferring Selective Constraint from Population Genomic Data Suggests Recent Regulatory Turnover in the Human Brain

    PubMed Central

    Schrider, Daniel R.; Kern, Andrew D.

    2015-01-01

    The comparative genomics revolution of the past decade has enabled the discovery of functional elements in the human genome via sequence comparison. However, an important class of elements, those specific to humans, is entirely missed by searching for sequence conservation across species. Here we present an analysis of variation data among human genomes that uses a supervised machine learning approach to identify human-specific purifying selection in the genome. Using only allele frequency information from the complete low-coverage 1000 Genomes Project data set, together with a support vector machine trained on known functional and nonfunctional portions of the genome, we are able to accurately identify portions of the genome constrained by purifying selection. Our method identifies previously known human-specific gains or losses of function and uncovers many novel candidates. Candidate targets for gain and loss of function along the human lineage include numerous putative regulatory regions of genes essential for normal development of the central nervous system, including a significant enrichment of gain of function events near neurotransmitter receptor genes. These results are consistent with regulatory turnover being a key mechanism in the evolution of human-specific characteristics of brain development. Finally, we show that the majority of the genome is currently unconstrained by natural selection, in agreement with what has been estimated from phylogenetic methods but in sharp contrast to estimates based on transcriptomics or other high-throughput functional methods. PMID:26590212

  16. A genome-wide 3C-method for characterizing the three-dimensional architectures of genomes.

    PubMed

    Duan, Zhijun; Andronescu, Mirela; Schutz, Kevin; Lee, Choli; Shendure, Jay; Fields, Stanley; Noble, William S; Anthony Blau, C

    2012-11-01

    Accumulating evidence demonstrates that the three-dimensional (3D) organization of chromosomes within the eukaryotic nucleus reflects and influences genomic activities, including transcription, DNA replication, recombination and DNA repair. In order to uncover structure-function relationships, it is necessary first to understand the principles underlying the folding and the 3D arrangement of chromosomes. Chromosome conformation capture (3C) provides a powerful tool for detecting interactions within and between chromosomes. A high throughput derivative of 3C, chromosome conformation capture on chip (4C), executes a genome-wide interrogation of interaction partners for a given locus. We recently developed a new method, a derivative of 3C and 4C, which, similar to Hi-C, is capable of comprehensively identifying long-range chromosome interactions throughout a genome in an unbiased fashion. Hence, our method can be applied to decipher the 3D architectures of genomes. Here, we provide a detailed protocol for this method. Published by Elsevier Inc.

  17. Methods to approximate reliabilities in single-step genomic evaluation

    USDA-ARS's Scientific Manuscript database

    Reliability of predictions from single-step genomic BLUP (ssGBLUP) can be calculated by inversion, but that is not feasible for large data sets. Two methods of approximating reliability were developed based on decomposition of a function of reliability into contributions from records, pedigrees, and...

  18. Development of Mycoplasma synoviae (MS) core genome multilocus sequence typing (cgMLST) scheme.

    PubMed

    Ghanem, Mostafa; El-Gazzar, Mohamed

    2018-05-01

    Mycoplasma synoviae (MS) is a poultry pathogen with reportedly increasing prevalence and virulence in recent years. MS strain identification is essential for prevention and control efforts and for epidemiological outbreak investigations. Several multilocus sequence typing (MLST) schemes have been developed for MS, yet their resolution can be limited for outbreak investigation. The cost of whole genome sequencing has become close to that of sequencing the seven MLST targets; however, there is no standardized method for typing MS strains based on whole genome sequences. In this paper, we propose a core genome multilocus sequence typing (cgMLST) scheme as a standardized and reproducible method for typing MS based on whole genome sequences. A diverse set of 25 MS whole genome sequences was used to identify 302 core genome genes as cgMLST targets (35.5% of the MS genome), and 44 whole genome sequences of MS isolates from six countries on four continents were typed using this scheme. cgMLST-based phylogenetic trees showed a high degree of agreement with core genome SNP-based analysis and available epidemiological information. cgMLST also allowed evaluation of two conventional MLST schemes for MS. The high discriminatory power of cgMLST allowed differentiation between samples of the same conventional MLST type. cgMLST represents a standardized, accurate, highly discriminatory, and reproducible method for differentiating between MS isolates. Like conventional MLST, it provides a stable and expandable nomenclature, allowing typing results to be compared and shared between laboratories worldwide. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
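    In cgMLST, each isolate is reduced to an allele number at each core-genome target locus, and isolates are compared by counting loci with different alleles. The sketch below computes such pairwise allelic distances. The locus names, allele numbers and isolate names are hypothetical (the real scheme uses 302 target genes), and skipping loci that are missing in either isolate is one common convention, assumed here rather than taken from the paper.

```python
from itertools import combinations


def allelic_distance(profile_a, profile_b):
    """Count loci with different allele calls, skipping loci missing in either isolate."""
    shared = [locus for locus in profile_a
              if locus in profile_b
              and profile_a[locus] is not None
              and profile_b[locus] is not None]
    return sum(profile_a[locus] != profile_b[locus] for locus in shared)


def distance_matrix(profiles):
    """Pairwise allelic distances between all isolates, keyed by name pairs."""
    names = sorted(profiles)
    return {(x, y): allelic_distance(profiles[x], profiles[y])
            for x, y in combinations(names, 2)}


# Hypothetical allele profiles over three loci (None marks a missing call).
profiles = {
    "iso1": {"locus1": 1, "locus2": 1, "locus3": 2},
    "iso2": {"locus1": 1, "locus2": 3, "locus3": 2},
    "iso3": {"locus1": 2, "locus2": 3, "locus3": None},
}
distances = distance_matrix(profiles)
```

    A matrix like this can feed a distance-based tree or a single-linkage clustering threshold; because each locus counts once regardless of how many SNPs separate two alleles, allelic distances are robust to recombination hotspots but coarser than SNP distances.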

  19. Applications of CRISPR Genome Engineering in Cell Biology

    PubMed Central

    Wang, Fangyuan; Qi, Lei S.

    2016-01-01

    Recent advances in genome engineering are starting a revolution in biological research and translational applications. The CRISPR-associated RNA-guided endonuclease Cas9 and its variants enable diverse manipulations of genome function. In this review, we describe the development of Cas9 tools for a variety of applications in cell biology research, including the study of functional genomics, the creation of transgenic animal models, and genomic imaging. Novel genome engineering methods offer a new avenue to understand the causality between genome and phenotype, thus promising a fuller understanding of cell biology. PMID:27599850

  20. NGSPanPipe: A Pipeline for Pan-genome Identification in Microbial Strains from Experimental Reads.

    PubMed

    Kulsum, Umay; Kapil, Arti; Singh, Harpreet; Kaur, Punit

    2018-01-01

    Recent advances in sequencing technologies have reduced both the time and the cost of sequencing a whole bacterial genome. High-throughput next-generation sequencing (NGS) technology has generated enormous amounts of data on microbial populations, publicly available across various repositories. As a consequence, it has become possible to study and compare the genomes of different bacterial strains within a species or genus in terms of evolution, ecology and diversity. Studying the pan-genome provides insights into microevolution, global composition, and diversity in virulence and pathogenesis of a species. It can also assist in identifying drug targets and proposing vaccine candidates. Effective analysis of these large genome datasets requires robust tools. Current methods for constructing a pan-genome do not accept raw reads from the sequencer directly but require the reads to be preprocessed into an assembled protein/gene sequence file or a binary matrix of orthologous genes/proteins. We have designed an easy-to-use integrated pipeline, NGSPanPipe, which can identify the pan-genome directly from short reads. The output from the pipeline is compatible with other pan-genome analysis tools. We compared our pipeline with other methods for constructing a pan-genome, i.e., reference-based assembly and de novo assembly, using simulated reads of Mycobacterium tuberculosis. The single-script pipeline (pipeline.pl) is applicable to all bacterial strains. It integrates multiple in-house Perl scripts and is freely accessible from https://github.com/Biomedinformatics/NGSPanPipe .
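    Once gene presence or absence per strain is known (the binary matrix of orthologous genes mentioned above), a pan-genome is usually partitioned into core, accessory and strain-unique genes. The sketch below shows that final partitioning step only; it does not reproduce NGSPanPipe's read-processing steps, and the strain and gene-cluster names are hypothetical.

```python
def partition_pangenome(presence):
    """Split a pan-genome into core, accessory and unique gene clusters.

    presence: dict mapping strain name -> set of gene clusters found in that strain.
    """
    gene_sets = list(presence.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)  # present in every strain
    counts = {gene: sum(gene in genes for genes in gene_sets) for gene in pan}
    unique = {gene for gene, n in counts.items() if n == 1} - core  # in one strain only
    accessory = pan - core - unique  # shared by some, but not all, strains
    return core, accessory, unique


# Hypothetical presence/absence data for three strains.
presence = {
    "strain1": {"geneA", "geneB", "geneC"},
    "strain2": {"geneA", "geneB"},
    "strain3": {"geneA", "geneD"},
}
core, accessory, unique = partition_pangenome(presence)
```

    Subtracting `core` from `unique` keeps the three categories disjoint even in the degenerate case of a single strain, where every gene is both "core" and present in only one genome.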

  1. [Parental genome imprinting].

    PubMed

    Babinet, C

    1993-01-01

    In recent years, genetic and experimental embryology methods have uncovered a very important feature of mammalian embryonic development: the female and male genomic complements are differentially imprinted, such that both a maternally and a paternally derived genome are absolutely necessary for the embryo to complete normal development. Differential genomic imprinting therefore appears to add a new and essential kind of information to that already contained in the genomic sequence. The differential imprint should be imposed on the genetic material during gametogenesis and persist throughout somatic development after fertilization. It should then be erased in the germ cell line and re-established in the sperm and egg genomes. The recent discovery of several imprinted mouse genes should make it possible to address the molecular mechanisms of imprinting.

  2. Low-coverage, whole-genome sequencing of Artocarpus camansi (Moraceae) for phylogenetic marker development and gene discovery

    PubMed Central

    Gardner, Elliot M.; Johnson, Matthew G.; Ragone, Diane; Wickett, Norman J.; Zerega, Nyree J. C.

    2016-01-01

    Premise of the study: We used moderately low-coverage (17×) whole-genome sequencing of Artocarpus camansi (Moraceae) to develop genomic resources for Artocarpus and Moraceae. Methods and Results: A de novo assembly of Illumina short reads (251,378,536 pairs, 2 × 100 bp) accounted for 93% of the predicted genome size. Predicted coding regions were used in a three-way orthology search with published genomes of Morus notabilis and Cannabis sativa. Phylogenetic markers for Moraceae were developed from 333 inferred single-copy exons. Ninety-eight putative MADS-box genes were identified. Analysis of all predicted coding regions resulted in preliminary annotation of 49,089 genes. An analysis of synonymous substitutions for pairs of orthologs (Ks analysis) in M. notabilis and A. camansi strongly suggested a lineage-specific whole-genome duplication in Artocarpus. Conclusions: This study substantially increases the genomic resources available for Artocarpus and Moraceae and demonstrates the value of low-coverage de novo assemblies for nonmodel organisms with moderately large genomes. PMID:27437173

  3. High-throughput physical mapping of chromosomes using automated in situ hybridization.

    PubMed

    George, Phillip; Sharakhova, Maria V; Sharakhov, Igor V

    2012-06-28

    Projects to obtain whole-genome sequences for 10,000 vertebrate species and for 5,000 insect and related arthropod species are expected to take place over the next 5 years. For example, the genomes of 15 malaria mosquito species are currently being sequenced on an Illumina platform. This Anopheles species cluster includes both vectors and non-vectors of malaria. When the genome assemblies become available, researchers will have the unique opportunity to perform comparative analyses to infer evolutionary changes relevant to vector ability. However, it has proven difficult to use next-generation sequencing reads to generate high-quality de novo genome assemblies. Moreover, the existing genome assemblies for Anopheles gambiae, although obtained using the Sanger method, are gapped or fragmented. The success of comparative genomic analyses will be limited if researchers must work with numerous sequencing contigs rather than with chromosome-based genome assemblies. Fragmented, unmapped sequences create problems for genomic analyses because: (i) unidentified gaps cause incorrect or incomplete annotation of genomic sequences; (ii) unmapped sequences lead to confusion between paralogous genes and genes from different haplotypes; and (iii) without chromosome assignment and orientation of the sequencing contigs, it is impossible to reconstruct rearrangement phylogeny and study chromosome evolution. Developing high-resolution physical maps for species with newly sequenced genomes is a timely and cost-effective investment that will facilitate genome annotation, evolutionary analysis, and re-sequencing of individual genomes from natural populations. Here, we present innovative approaches to chromosome preparation, fluorescent in situ hybridization (FISH), and imaging that facilitate rapid development of physical maps. Using An. gambiae as an example, we demonstrate that physical chromosome maps can potentially improve genome assemblies and, thus, the quality of genomic analyses. First, we use a high-pressure method to prepare polytene chromosome spreads. This method, originally developed for Drosophila, reveals more chromosomal detail than the regular squashing technique. Second, a fully automated, front-end system for FISH is used for high-throughput physical genome mapping. The automated slide staining system runs multiple assays simultaneously and dramatically reduces hands-on time. Third, an automatic fluorescent imaging system with a motorized slide stage scans and photographs labeled chromosomes after FISH. This system is especially useful for identifying and visualizing multiple chromosomal plates on the same slide. In addition, the scanning process yields more uniform FISH results. Overall, the automated high-throughput physical mapping protocol is more efficient than a standard manual protocol.

  4. Sequencing intractable DNA to close microbial genomes.

    PubMed

    Hurt, Richard A; Brown, Steven D; Podar, Mircea; Palumbo, Anthony V; Elias, Dwayne A

    2012-01-01

    Advances in high-throughput DNA sequencing technologies have supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Difficult-to-sequence regions in microbial genomes are often deemed "intractable", resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The sequences flanking each gap formed GC-rich secondary structures that made the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation and provide a step-wise method that reduces the effort required for genome finishing.
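The gaps described above were flanked by GC-rich sequence. As a rough computational aid (not part of the published wet-lab procedure), a sliding-window GC scan can flag such regions around a gap before primer design. The window size, step and 0.7 threshold below are arbitrary assumptions.

```python
from typing import List, Tuple


def gc_rich_windows(seq: str, window: int = 50, step: int = 10,
                    threshold: float = 0.7) -> List[Tuple[int, float]]:
    """Return (start, GC fraction) for every window whose GC fraction >= threshold."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        if gc >= threshold:
            hits.append((start, gc))
    return hits


# Toy sequence: 60 bp of AT repeats followed by 60 bp of GC repeats.
toy = "AT" * 30 + "GC" * 30
print(gc_rich_windows(toy, window=20, step=20))
# [(60, 1.0), (80, 1.0), (100, 1.0)]
```

GC fraction is only a proxy; predicting actual secondary structure would need a folding tool.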

  5. The Undergraduate Training in Genomics (UTRIG) Initiative: early & active training for physicians in the genomic medicine era.

    PubMed

    Wilcox, Rebecca L; Adem, Patricia V; Afshinnekoo, Ebrahim; Atkinson, James B; Burke, Leah W; Cheung, Hoiwan; Dasgupta, Shoumita; DeLaGarza, Julia; Joseph, Loren; LeGallo, Robin; Lew, Madelyn; Lockwood, Christina M; Meiss, Alice; Norman, Jennifer; Markwood, Priscilla; Rizvi, Hasan; Shane-Carson, Kate P; Sobel, Mark E; Suarez, Eric; Tafe, Laura J; Wang, Jason; Haspel, Richard L

    2018-05-01

    Genomic medicine is transforming patient care. However, its rapid development has left a knowledge gap between discovery and effective implementation in clinical practice. Since 2010, the Training Residents in Genomics (TRIG) Working Group has successfully built a rigorous genomics curriculum, with implementation tools, for pathology residents in postgraduate training years 1-4. Based on the TRIG model, the interprofessional Undergraduate Training in Genomics (UTRIG) Working Group was formed. Under the aegis of the Undergraduate Medical Educators Section of the Association of Pathology Chairs, and with representation from nine additional professional societies, UTRIG's collaborative goal is to build medical student genomic literacy by developing a ready-to-use genomics curriculum. Key elements of the UTRIG curriculum are expert consensus-driven objectives, active learning methods, rigorous assessment and integration.

  6. TCGA4U: A Web-Based Genomic Analysis Platform To Explore And Mine TCGA Genomic Data For Translational Research.

    PubMed

    Huang, Zhenzhen; Duan, Huilong; Li, Haomin

    2015-01-01

    Large-scale human cancer genomics projects, such as TCGA, have generated large volumes of genomic data for further study. Exploring and mining these data can help researchers find genomic alterations potentially involved in tumor development and metastasis. We developed a web-based gene analysis platform, named TCGA4U, which uses statistical methods and models to help translational investigators explore, mine and visualize the genomic characteristics of human cancers from the TCGA datasets. Furthermore, through Gene Ontology (GO) annotation and clinical data integration, the genomic data were mapped to biological process, molecular function and cellular component terms and to survival curves, helping researchers identify potential driver genes. Clinical researchers without expertise in data analysis will benefit from such a user-friendly genomic analysis platform.
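Survival curves of the kind TCGA4U displays are commonly Kaplan-Meier estimates. The function below is a generic, minimal sketch of that estimator, not TCGA4U's implementation; the input layout (parallel lists of follow-up times and event flags, with 1 = event observed and 0 = censored) is an assumption.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return the (time, survival probability) steps of the Kaplan-Meier curve.

    events[i] is 1 if the event (e.g. death) was observed at times[i] and 0 if
    the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group tied observations; patients censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])
# [(1, 0.8), (2, 0.6), (3, 0.3)]
```

Comparing two such curves (e.g. mutated vs. wild-type patients) would additionally need a test such as the log-rank test, which is not shown.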

  7. Assessing the Robustness of Complete Bacterial Genome Segmentations

    NASA Astrophysics Data System (ADS)

    Devillers, Hugo; Chiapello, Hélène; Schbath, Sophie; El Karoui, Meriem

    Comparison of closely related bacterial genomes has revealed the presence of highly conserved sequences forming a "backbone" that is interrupted by numerous, less conserved DNA fragments. Segmentation of bacterial genomes into backbone and variable regions is particularly useful for investigating bacterial genome evolution. Several software tools have been designed to compare complete bacterial chromosomes, and a few online databases store pre-computed genome comparisons. However, very few statistical methods are available to evaluate the reliability of these tools and to compare the results they produce. To fill this gap, we have developed two local scores to measure the robustness of bacterial genome segmentations. Our method uses a simulation procedure based on random perturbations of the compared genomes. The scores presented in this paper are simple to implement, and our results show that they make it easy to discriminate between robust and non-robust bacterial genome segmentations when using aligners such as MAUVE and MGA.

  8. Pseudomonas Genome Database: facilitating user-friendly, comprehensive comparisons of microbial genomes.

    PubMed

    Winsor, Geoffrey L; Van Rossum, Thea; Lo, Raymond; Khaira, Bhavjinder; Whiteside, Matthew D; Hancock, Robert E W; Brinkman, Fiona S L

    2009-01-01

    Pseudomonas aeruginosa is a well-studied opportunistic pathogen that is particularly known for its intrinsic antimicrobial resistance, diverse metabolic capacity, and ability to cause life-threatening infections in cystic fibrosis patients. The Pseudomonas Genome Database (http://www.pseudomonas.com) was originally developed as a resource for peer-reviewed, continually updated annotation of the Pseudomonas aeruginosa PAO1 reference strain genome. To facilitate cross-strain and cross-species genome comparisons with other Pseudomonas species of importance, we have now expanded the database to include all Pseudomonas species, and have developed or incorporated methods to facilitate high-quality comparative genomics. The database contains robust assessment of orthologs, a novel ortholog clustering method, and five views of the data at the sequence and annotation levels (Gbrowse, Mauve and custom views) to facilitate genome comparisons. A choice of simple and more flexible user-friendly Boolean search features allows researchers to search and compare annotations or sequences within or between genomes. Other features include more accurate protein subcellular localization predictions and a user-friendly, Boolean-searchable log file of updates for the reference strain PAO1. This database aims to continue to provide a high-quality, annotated genome resource for the research community and is available under an open source license.
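Ortholog assignment between two genomes is often bootstrapped with reciprocal best hits (RBH). The database's own ortholog clustering method is not described here, so the sketch below only illustrates the generic RBH idea. It assumes hypothetical similarity-search results given as `{query: [(target, score), ...]}` in each direction.

```python
from typing import Dict, List, Tuple

Hits = Dict[str, List[Tuple[str, float]]]  # query -> [(target, score), ...]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Map each query to its highest-scoring target (the first one wins on ties)."""
    return {query: max(targets, key=lambda t: t[1])[0]
            for query, targets in hits.items() if targets}


def reciprocal_best_hits(hits_ab: Hits, hits_ba: Hits) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(hits_ab)
    best_ba = best_hits(hits_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


hits_ab = {"a1": [("b1", 100.0), ("b2", 50.0)], "a2": [("b1", 80.0)]}
hits_ba = {"b1": [("a1", 100.0), ("a2", 80.0)], "b2": [("a2", 40.0)]}
print(reciprocal_best_hits(hits_ab, hits_ba))
# [('a1', 'b1')]
```

Here `a2` is dropped because its best hit, `b1`, prefers `a1`; RBH deliberately misses such paralog-rich cases, which is one reason dedicated clustering methods exist.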

  9. Microarray Genomic Systems Development

    DTIC Science & Technology

    2008-06-01

    11 species), Escherichia coli TOP10 (7 strains), and Geobacillus stearothermophilus. Using standard molecular biology methods, we isolated genomic ... comparisons. Results: Different species of bacteria, including Escherichia coli, Bacillus bacteria, and Geobacillus stearothermophilus produce qualitatively ... oligonucleotides to labelled genomic DNA from a set of test samples, including eleven Bacillus species, Geobacillus stearothermophilus, and seven Escherichia

  10. An Efficient, Rapid, and Recyclable System for CRISPR-Mediated Genome Editing in Candida albicans.

    PubMed

    Nguyen, Namkha; Quail, Morgan M F; Hernday, Aaron D

    2017-01-01

    Candida albicans is the most common fungal pathogen of humans. Historically, molecular genetic analysis of this important pathogen has been hampered by the lack of stable plasmids or meiotic cell division, limited selectable markers, and inefficient methods for generating gene knockouts. The recent development of clustered regularly interspaced short palindromic repeat(s) (CRISPR)-based tools for use with C. albicans has opened the door to more efficient genome editing; however, previously reported systems have specific limitations. We report the development of an optimized CRISPR-based genome editing system for use with C. albicans. Our system is highly efficient, does not require molecular cloning, does not leave permanent markers in the genome, and supports rapid, precise genome editing in C. albicans. We also demonstrate the utility of our system for generating two independent homozygous gene knockouts in a single transformation and present a method for generating homozygous wild-type gene addbacks at the native locus. Furthermore, each step of our protocol is compatible with high-throughput strain engineering approaches, thus opening the door to the generation of a complete C. albicans gene knockout library. IMPORTANCE Candida albicans is the major fungal pathogen of humans and is the subject of intense biomedical and discovery research. Until recently, the pace of research in this field has been hampered by the lack of efficient methods for genome editing. We report the development of a highly efficient and flexible genome editing system for use with C. albicans. This system improves upon previously published C. albicans CRISPR systems and enables rapid, precise genome editing without the use of permanent markers. This new tool kit promises to expedite the pace of research on this important fungal pathogen.
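The abstract does not state which Cas9 variant or guide-design rules the system uses. Assuming the commonly used Streptococcus pyogenes Cas9, with a 20-nt protospacer immediately 5' of an NGG PAM, candidate target sites on both strands of a gene can be enumerated as below. Off-target screening and efficiency scoring, which any real guide design needs, are deliberately omitted.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_targets(seq: str, spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (forward-strand start, strand, protospacer) for sites followed by NGG.

    The lookahead assertion lets overlapping candidate sites all be reported.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    rc = revcomp(seq)
    for m in pattern.finditer(rc):
        # A site at [s, s + L) on the reverse complement spans
        # [len(seq) - s - L, len(seq) - s) on the forward strand.
        hits.append((len(seq) - m.start() - spacer_len, "-", m.group(1)))
    return hits


print(find_spcas9_targets("A" * 20 + "TGG"))
# [(0, '+', 'AAAAAAAAAAAAAAAAAAAA')]
```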

  11. Genome-scale engineering for systems and synthetic biology

    PubMed Central

    Esvelt, Kevin M; Wang, Harris H

    2013-01-01

    Genome-modification technologies enable the rational engineering and perturbation of biological systems. Historically, these methods have been limited to gene insertions or mutations at random or at a few pre-defined locations across the genome. The handful of methods capable of targeted gene editing suffered from low efficiencies, significant labor costs, or both. Recent advances have dramatically expanded our ability to engineer cells in a directed and combinatorial manner. Here, we review current technologies and methodologies for genome-scale engineering, discuss the prospects for extending efficient genome modification to new hosts, and explore the implications of continued advances toward the development of flexibly programmable chassis, novel biochemistries, and safer organismal and ecological engineering. PMID:23340847

  12. Global methylation screening in the Arabidopsis thaliana and Mus musculus genome: applications of virtual image restriction landmark genomic scanning (Vi-RLGS)

    PubMed Central

    Matsuyama, Tomoki; Kimura, Makoto T.; Koike, Kuniaki; Abe, Tomoko; Nakano, Takeshi; Asami, Tadao; Ebisuzaki, Toshikazu; Held, William A.; Yoshida, Shigeo; Nagase, Hiroki

    2003-01-01

    Understanding the role of ‘epigenetic’ changes such as DNA methylation and chromatin remodeling has become critical to understanding many biological processes. To delineate the global methylation pattern of a given genomic DNA, computer software has been developed to create a virtual image of restriction landmark genomic scanning (Vi-RLGS). When a methylation-sensitive enzyme such as NotI is used as the restriction landmark, comparing the real and in silico RLGS profiles of the genome yields a methylation map of genomic NotI sites. A methylation map of the Arabidopsis genome was created and could be confirmed by a methylation-sensitive PCR assay. The method has also been applied to the mouse genome. Although a complete mouse methylation map has not yet been produced, a region of methylation difference between two tissues has been tested and confirmed by bisulfite sequencing. Vi-RLGS in conjunction with real RLGS will make it possible to develop a more complete map of genomic sites that are methylated or demethylated as a consequence of normal or abnormal development. PMID:12888509
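The in silico side of Vi-RLGS starts from the positions of the landmark enzyme's sites. The following is a minimal sketch, assuming NotI (recognition site GCGGCCGC, cut GC^GGCCGC) and ignoring the secondary digests and two-dimensional electrophoresis that a real RLGS profile involves. It locates the sites and then computes fragment lengths. A hypothetical `methylated` parameter lets chosen sites be skipped, since methylation at a NotI site blocks cleavage; comparing the two outputs mirrors, in miniature, the real-versus-virtual comparison described above.

```python
from typing import Iterable, List

NOTI_SITE = "GCGGCCGC"  # NotI recognition site; palindromic, so one strand suffices
CUT_OFFSET = 2          # NotI cuts GC^GGCCGC


def noti_sites(seq: str) -> List[int]:
    """0-based start positions of every NotI site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(NOTI_SITE)
    while start != -1:
        positions.append(start)
        start = seq.find(NOTI_SITE, start + 1)
    return positions


def fragment_lengths(seq_len: int, sites: List[int],
                     methylated: Iterable[int] = ()) -> List[int]:
    """Fragment lengths after digestion; sites listed in `methylated` are not cut."""
    blocked = set(methylated)
    cuts = [s + CUT_OFFSET for s in sites if s not in blocked]
    bounds = [0] + cuts + [seq_len]
    return [end - start for start, end in zip(bounds, bounds[1:])]


toy = "AAAA" + "GCGGCCGC" + "TTTT" + "GCGGCCGC" + "AA"
sites = noti_sites(toy)
print(sites, fragment_lengths(len(toy), sites), fragment_lengths(len(toy), sites, methylated=[16]))
# [4, 16] [6, 12, 8] [6, 20]
```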

  13. MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level.

    PubMed

    Chiapello, Hélène; Gendrault, Annie; Caron, Christophe; Blum, Jérome; Petit, Marie-Agnès; El Karoui, Meriem

    2008-11-27

    The recent availability of complete sequences for numerous closely related bacterial genomes opens up new challenges in comparative genomics. Several methods have been developed to align complete genomes at the nucleotide level but their use and the biological interpretation of results are not straightforward. It is therefore necessary to develop new resources to access, analyze, and visualize genome comparisons. Here we present recent developments on MOSAIC, a generalist comparative bacterial genome database. This database provides the bacteriologist community with easy access to comparisons of complete bacterial genomes at the intra-species level. The strategy we developed for comparison allows us to define two types of regions in bacterial genomes: backbone segments (i.e., regions conserved in all compared strains) and variable segments (i.e., regions that are either specific to or variable in one of the aligned genomes). Definition of these segments at the nucleotide level allows precise comparative and evolutionary analyses of both coding and non-coding regions of bacterial genomes. Such work is easily performed using the MOSAIC Web interface, which allows browsing and graphical visualization of genome comparisons. The MOSAIC database now includes 493 pairwise comparisons and 35 multiple maximal comparisons representing 78 bacterial species. Genome conserved regions (backbones) and variable segments are presented in various formats for further analysis. A graphical interface allows visualization of aligned genomes and functional annotations. The MOSAIC database is available online at http://genome.jouy.inra.fr/mosaic.
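MOSAIC's actual segmentation relies on dedicated whole-genome alignments, so the sketch below only illustrates the backbone/variable distinction on a toy multiple alignment. It assumes equal-length aligned rows with `-` for gaps, and treats any run of at least `min_backbone` columns (a hypothetical parameter) that are identical in all strains as backbone.

```python
from typing import List, Tuple


def segment_alignment(rows: List[str], min_backbone: int = 3) -> List[Tuple[str, int, int]]:
    """Label alignment columns as backbone or variable and merge them into segments.

    A column is conserved if every row carries the same non-gap base; runs of at
    least `min_backbone` conserved columns form backbone segments and all other
    columns are variable. Coordinates are 0-based, half-open alignment columns.
    """
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("aligned rows must have equal length")
    conserved = [rows[0][j] != "-" and all(r[j] == rows[0][j] for r in rows)
                 for j in range(length)]
    labels = ["variable"] * length
    j = 0
    while j < length:
        if not conserved[j]:
            j += 1
            continue
        k = j
        while k < length and conserved[k]:
            k += 1
        if k - j >= min_backbone:  # long enough to count as backbone
            labels[j:k] = ["backbone"] * (k - j)
        j = k
    segments = []
    start = 0
    for j in range(1, length + 1):
        if j == length or labels[j] != labels[start]:
            segments.append((labels[start], start, j))
            start = j
    return segments


print(segment_alignment(["ACGTAAACGT", "ACGTTTACGT"]))
# [('backbone', 0, 4), ('variable', 4, 6), ('backbone', 6, 10)]
```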

  14. Identification of coding and non-coding mutational hotspots in cancer genomes.

    PubMed

    Piraino, Scott W; Furney, Simon J

    2017-01-05

    The identification of mutations that play a causal role in tumour development, so-called "driver" mutations, is of critical importance for understanding how cancers form and how they might be treated. Several large cancer sequencing projects have identified genes that are recurrently mutated in cancer patients, suggesting a role in tumourigenesis. While the landscape of coding drivers has been extensively studied and many of the most prominent driver genes are well characterised, comparatively less is known about the role of mutations in the non-coding regions of the genome in cancer development. The continuing fall in genome sequencing costs has resulted in a concomitant increase in the number of cancer whole genome sequences being produced, facilitating systematic interrogation of both the coding and non-coding regions of cancer genomes. To examine the mutational landscapes of tumour genomes, we have developed a novel method to identify mutational hotspots in tumour genomes using both mutational data and information on evolutionary conservation. We have applied our methodology to over 1300 whole cancer genomes and show that it identifies prominent coding and non-coding regions that are known or highly suspected to play a role in cancer. Importantly, we applied our method to the entire genome, rather than relying on predefined annotations (e.g. promoter regions), and we highlight recurrently mutated regions that may have resulted from increased exposure to mutational processes rather than selection, some of which have been identified previously as targets of selection. Finally, we implicate several pan-cancer and cancer-specific candidate non-coding regions, which could be involved in tumourigenesis. We have developed a framework to identify mutational hotspots in cancer genomes, which is applicable to the entire genome. 
This framework identifies known and novel coding and non-coding mutational hotspots and can be used to differentiate candidate driver regions from likely passenger regions susceptible to somatic mutation.
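The published method combines mutation recurrence with evolutionary conservation. The sketch below covers only the recurrence part, under a deliberately simple null model: mutations spread uniformly over the genome, Poisson counts per fixed window, and a Bonferroni correction. As the abstract itself cautions, real mutation rates vary along the genome, so windows flagged this way are candidates, not established drivers.

```python
import math
from typing import List, Tuple


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def hotspot_windows(positions: List[int], genome_len: int, window: int,
                    alpha: float = 1e-3) -> List[Tuple[int, int, float]]:
    """Return (window start, mutation count, p-value) for significant windows."""
    n_windows = math.ceil(genome_len / window)
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    rate = len(positions) / genome_len  # mutations per base under the null
    hits = []
    for w, count in enumerate(counts):
        start = w * window
        length = min(window, genome_len - start)  # last window may be shorter
        p = poisson_sf(count, rate * length)
        if p * n_windows < alpha:  # Bonferroni correction over all windows
            hits.append((start, count, p))
    return hits


background = [5, 105, 205, 305, 405, 605, 705, 805, 905]
cluster = list(range(500, 520))
print([start for start, _, _ in hotspot_windows(background + cluster, 1000, 100)])
# [500]
```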

  15. Exploration of the Drosophila buzzatii transposable element content suggests underestimation of repeats in Drosophila genomes.

    PubMed

    Rius, Nuria; Guillén, Yolanda; Delprat, Alejandra; Kapusta, Aurélie; Feschotte, Cédric; Ruiz, Alfredo

    2016-05-10

    Many new Drosophila genomes have been sequenced in recent years using next-generation sequencing platforms and assembly methods. Transposable elements (TEs), being repetitive sequences, are often misassembled, especially in genomes sequenced with short reads. Consequently, the mobile fraction of many of the new genomes has not been analyzed in detail or compared with that of other genomes sequenced with different methods, a comparison that could shed light on genome and TE evolution. Here we compare the TE content of three genomes: D. buzzatii st-1, D. buzzatii j-19, and D. mojavensis. We have sequenced a new D. buzzatii genome (j-19) that complements the already published D. buzzatii reference genome (st-1), and compared their TE contents with that of D. mojavensis. We found that TE sequences are underestimated in Drosophila NGS genomes compared with Sanger genomes. To compare genomes sequenced with different technologies, we developed a coverage-based method and applied it to the D. buzzatii st-1 and j-19 genomes. Between 10.85% and 11.16% of the D. buzzatii st-1 genome is made up of TEs, and between 7% and 7.5% of the D. buzzatii j-19 genome, while TEs represent 15.35% of the D. mojavensis genome. Helitrons are the most abundant order in the three genomes. TEs are less abundant in D. buzzatii than in D. mojavensis, as expected from the positive correlation between genome size and TE content. However, TEs alone do not explain the genome size difference. TEs accumulate in the dot chromosomes and proximal regions of D. buzzatii and D. mojavensis chromosomes. We also report a significantly higher TE density in the D. buzzatii and D. mojavensis X chromosomes, which is not expected under current models. Our easy-to-use correction method allowed us to identify recently active families in D. buzzatii st-1 belonging to the LTR-retrotransposon superfamily Gypsy.
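The abstract does not give the formula behind the coverage-based correction. One common way to express the idea, sketched here as an assumption rather than the authors' exact method, is to map reads to TE consensus sequences and convert the aligned bases into genomic base pairs by dividing by the genome-wide mean depth. Copies that a short-read assembly collapses or omits still contribute reads, so they are still counted. Family names and numbers below are made up.

```python
from typing import Dict, Tuple


def te_content_from_coverage(aligned_bases: Dict[str, float],
                             mean_depth: float,
                             genome_size: float) -> Tuple[Dict[str, float], float]:
    """Estimate genomic bp per TE family and the overall TE fraction.

    aligned_bases: TE family -> total read bases aligned to its consensus.
    mean_depth:    assumed average sequencing depth over single-copy regions.
    """
    te_bp = {family: bases / mean_depth for family, bases in aligned_bases.items()}
    return te_bp, sum(te_bp.values()) / genome_size


per_family, fraction = te_content_from_coverage(
    {"gypsy_A": 30_000, "helitron_B": 60_000}, mean_depth=30, genome_size=10_000)
print(per_family, fraction)
# {'gypsy_A': 1000.0, 'helitron_B': 2000.0} 0.3
```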

  16. Evaluation method for the potential functionome harbored in the genome and metagenome.

    PubMed

    Takami, Hideto; Taniguchi, Takeaki; Moriya, Yuki; Kuwahara, Tomomi; Kanehisa, Minoru; Goto, Susumu

    2012-12-12

    One of the main goals of genomic analysis is to elucidate the comprehensive set of functions (functionome) of individual organisms or whole communities in various environments. However, a standard method for evaluating the functional potential harbored within a genome or metagenome has not yet been established. We have developed a new evaluation method for the potential functionome, based on the completion ratio of Kyoto Encyclopedia of Genes and Genomes (KEGG) functional modules. The distribution of the completion ratio of the KEGG functional modules across 768 prokaryotic species varied greatly with the kind of module, and all modules primarily fell into 4 patterns (universal, restricted, diversified and non-prokaryotic modules), indicating the universal or unique nature of each module, and also the versatility of the KEGG Orthology (KO) identifiers mapped to each one. The module completion ratios in 8 phenotypically different bacilli revealed that some modules were shared only by phenotypically similar species. Metagenomes of human gut microbiomes from 13 healthy individuals, previously determined by the Sanger method, were analyzed based on the module completion ratio. The results led to new discoveries about the nutritional preferences of gut microbes, which are believed to be one of the mutualistic strategies by which gut microbiomes avoid nutritional competition with the host. The method developed in this study could characterize the functionome harbored in genomes and metagenomes. Because this method also provided taxonomic information from KEGG modules as well as the gene hosts constructing the modules, interpretation of completion profiles was simplified and we could identify the complementarity between biochemical functions in human hosts and the nutritional preferences in human gut microbiomes. 
Thus, our method has the potential to be a powerful tool for comparative functional analysis in genomics and metagenomics, able to target unknown environments containing various uncultivable microbes within unidentified phyla.
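The completion ratio can be pictured as the fraction of a module's steps for which the genome carries at least one qualifying KO. Real KEGG module definitions are richer (complexes, optional and alternative components), so this sketch simplifies a module to a list of steps, each a set of alternative KO identifiers. The identifiers used below are placeholders, not real KOs.

```python
from typing import List, Set


def module_completion_ratio(module_steps: List[Set[str]], genome_kos: Set[str]) -> float:
    """Fraction of module steps satisfied by at least one KO present in the genome."""
    if not module_steps:
        raise ValueError("module has no steps")
    completed = sum(1 for alternatives in module_steps if alternatives & genome_kos)
    return completed / len(module_steps)


steps = [{"KO_A"}, {"KO_B", "KO_C"}, {"KO_D"}]  # placeholder KO IDs
print(module_completion_ratio(steps, {"KO_A", "KO_C"}))
# 0.6666666666666666
```

Computing this ratio for every module and every genome yields the completion profiles that the abstract clusters into universal, restricted, diversified and non-prokaryotic patterns.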

  17. A comprehensive and quantitative exploration of thousands of viral genomes

    PubMed Central

    Mahmoudabadi, Gita

    2018-01-01

    The complete assembly of viral genomes from metagenomic datasets (short genomic sequences gathered from environmental samples) has proven to be challenging, so there are significant blind spots when we view viral genomes through the lens of metagenomics. One approach to overcoming this problem is to leverage the thousands of complete viral genomes that are publicly available. Here we describe our efforts to assemble a comprehensive resource that provides a quantitative snapshot of viral genomic trends – such as gene density, noncoding percentage, and abundances of functional gene categories – across thousands of viral genomes. We have also developed a coarse-grained method for visualizing viral genome organization for hundreds of genomes at once, and have explored the extent of the overlap between bacterial and bacteriophage gene pools. Existing viral classification systems were developed prior to the sequencing era, so we present our analysis in a way that allows us to assess the utility of the different classification systems for capturing genomic trends. PMID:29624169

  18. A comprehensive and quantitative exploration of thousands of viral genomes.

    PubMed

    Mahmoudabadi, Gita; Phillips, Rob

    2018-04-19

    The complete assembly of viral genomes from metagenomic datasets (short genomic sequences gathered from environmental samples) has proven to be challenging, so there are significant blind spots when we view viral genomes through the lens of metagenomics. One approach to overcoming this problem is to leverage the thousands of complete viral genomes that are publicly available. Here we describe our efforts to assemble a comprehensive resource that provides a quantitative snapshot of viral genomic trends - such as gene density, noncoding percentage, and abundances of functional gene categories - across thousands of viral genomes. We have also developed a coarse-grained method for visualizing viral genome organization for hundreds of genomes at once, and have explored the extent of the overlap between bacterial and bacteriophage gene pools. Existing viral classification systems were developed prior to the sequencing era, so we present our analysis in a way that allows us to assess the utility of the different classification systems for capturing genomic trends. © 2018, Mahmoudabadi et al.
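Two of the quantities mentioned above, gene density and noncoding percentage, can be computed from gene coordinates alone. A minimal sketch, assuming a linear genome and 0-based half-open (start, end) gene coordinates; overlapping genes are merged so that shared bases are counted as coding only once. Circular genomes with genes spanning the origin are not handled.

```python
from typing import List, Tuple


def genome_stats(genome_len: int, genes: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (genes per kb, percent noncoding) for a linear genome.

    genes holds 0-based, half-open (start, end) coordinates; overlapping genes are
    merged so that shared bases are counted as coding only once.
    """
    merged: List[List[int]] = []
    for start, end in sorted(genes):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    coding = sum(end - start for start, end in merged)
    density = len(genes) / (genome_len / 1000)
    noncoding_pct = 100 * (genome_len - coding) / genome_len
    return density, noncoding_pct


print(genome_stats(10_000, [(0, 1000), (500, 2000), (5000, 6000)]))
# (0.3, 70.0)
```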

  19. Development of a Method to Implement Whole-Genome Bisulfite Sequencing of cfDNA from Cancer Patients and a Mouse Tumor Model.

    PubMed

    Maggi, Elaine C; Gravina, Silvia; Cheng, Haiying; Piperdi, Bilal; Yuan, Ziqiang; Dong, Xiao; Libutti, Steven K; Vijg, Jan; Montagna, Cristina

    2018-01-01

    The goal of this study was to develop a method for whole-genome cell-free DNA (cfDNA) methylation analysis in humans and mice, with the ultimate aim of facilitating the identification of tumor-derived DNA methylation changes in the blood. Plasma or serum from patients with pancreatic neuroendocrine tumors or lung cancer, and plasma from a murine model of pancreatic adenocarcinoma, were used to develop a protocol for cfDNA isolation, library preparation and whole-genome bisulfite sequencing of ultra-low quantities of cfDNA, including tumor-specific DNA. The resulting protocol consistently produced high-quality libraries with a conversion rate >98% and should be applicable to the analysis of human and mouse plasma or serum to detect tumor-derived changes in DNA methylation.
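The abstract reports a conversion rate above 98% without stating how it was measured. One common estimate is the fraction of reference non-CpG cytosines that are read as T. This sketch rests on simplifying assumptions: reads are already aligned to the forward strand without indels, and non-CpG cytosines are treated as unmethylated, so any C remaining at such a position counts as unconverted.

```python
def non_cpg_conversion_rate(ref: str, read: str) -> float:
    """Fraction of reference non-CpG cytosines observed as T in an aligned read.

    Assumes ref and read are aligned base-for-base on the same (forward) strand
    and that non-CpG cytosines are unmethylated, so a C remaining in the read
    indicates incomplete bisulfite conversion.
    """
    ref, read = ref.upper(), read.upper()
    converted = unconverted = 0
    for i, (r, b) in enumerate(zip(ref, read)):
        if r != "C":
            continue
        if i + 1 >= len(ref) or ref[i + 1] == "G":
            continue  # skip CpG sites and cytosines with unknown context
        if b == "T":
            converted += 1
        elif b == "C":
            unconverted += 1
    total = converted + unconverted
    if total == 0:
        raise ValueError("no informative non-CpG cytosines")
    return converted / total


print(non_cpg_conversion_rate("ACTCCGTCAA", "ATTTCGTCAA"))
# 0.6666666666666666
```

In practice the counts are summed over all reads (or over an unmethylated spike-in such as lambda DNA) before dividing.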

  20. The Development of Chromosome Microdissection and Microcloning Technique and its Applications in Genomic Research

    PubMed Central

    Zhou, Ruo-Nan; Hu, Zan-Min

    2007-01-01

    The technique of chromosome microdissection and microcloning has been under development for more than 20 years. As a bridge between cytogenetics and molecular genetics, it has led to a number of applications: isolation of chromosome painting probes, construction of genetic linkage and physical maps, and generation of expressed sequence tags (ESTs). During those 20 years, this technique has not only benefited from other technological advances but has also cross-fertilized with other techniques. Today, it is a practical technique with extensive uses. The purpose of this article is to review the development of this technique and its applications in genomic research. Moreover, a new method developed in our lab for generating ESTs of specific chromosomes is introduced. With this method, chromosome microdissection and microcloning should become even more valuable for the advancement of genomic research. PMID:18645627

  1. SigHunt: horizontal gene transfer finder optimized for eukaryotic genomes.

    PubMed

    Jaron, Kamil S; Moravec, Jiří C; Martínková, Natália

    2014-04-15

    Genomic islands (GIs) are DNA fragments incorporated into a genome through horizontal gene transfer (also called lateral gene transfer), often with functions novel for a given organism. While methods for their detection are well researched in prokaryotes, the complexity of eukaryotic genomes makes direct use of these methods unreliable, so labour-intensive phylogenetic searches are used instead. We present a surrogate method that investigates the nucleotide base composition of the DNA sequence in a eukaryotic genome and identifies putative GIs. We calculate a genomic signature as a vector of tetranucleotide (4-mer) frequencies using a sliding window approach. Extending the neighbourhood of the sliding window, we establish a local kernel density estimate of the 4-mer frequency. We score the number of 4-mer frequencies in the sliding window that deviate from the credibility interval of their local genomic density using a newly developed discrete interval accumulative score (DIAS). To further improve the effectiveness of DIAS, we select informative 4-mers in a range of organisms using the tetranucleotide quality score developed herein. We show that the SigHunt method is computationally efficient and able to detect GIs in eukaryotic genomes that represent non-ameliorated integration. Thus, it is suited to scanning for change in organisms with different DNA composition. Source code and scripts, implemented in C and R and platform-independent, are freely available for download at http://www.iba.muni.cz/index-en.php?pg=research-data-analysis-tools-sighunt . Contact: 376090@mail.muni.cz or martinkova@ivb.cz. © The Author 2013. Published by Oxford University Press. All rights reserved.
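The first step of SigHunt, computing a tetranucleotide signature in sliding windows, can be sketched as follows. This is only an illustration in Python (SigHunt itself is implemented in C and R); the local kernel density estimate and the DIAS scoring are not reproduced, and the window and step sizes are arbitrary.

```python
from itertools import product
from typing import List, Tuple

# All 256 tetranucleotides in a fixed (lexicographic) order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_signature(seq: str) -> List[float]:
    """Relative frequency of each 4-mer in seq (4-mers containing N etc. are skipped)."""
    counts = dict.fromkeys(KMERS, 0)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in KMERS]


def sliding_signatures(seq: str, window: int, step: int) -> List[Tuple[int, List[float]]]:
    """Signature vector for each window start."""
    return [(start, tetranucleotide_signature(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


sigs = sliding_signatures("ACGT" * 10, window=8, step=4)
print(len(sigs), len(sigs[0][1]))
# 9 256
```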

  2. Automated typing of red blood cell and platelet antigens: a whole-genome sequencing study.

    PubMed

    Lane, William J; Westhoff, Connie M; Gleadall, Nicholas S; Aguad, Maria; Smeland-Wagman, Robin; Vege, Sunitha; Simmons, Daimon P; Mah, Helen H; Lebo, Matthew S; Walter, Klaudia; Soranzo, Nicole; Di Angelantonio, Emanuele; Danesh, John; Roberts, David J; Watkins, Nick A; Ouwehand, Willem H; Butterworth, Adam S; Kaufman, Richard M; Rehm, Heidi L; Silberstein, Leslie E; Green, Robert C

    2018-06-01

    There are more than 300 known red blood cell (RBC) antigens and 33 platelet antigens that differ between individuals. Sensitisation to antigens is a serious complication that can occur in prenatal medicine and after blood transfusion, particularly for patients who require multiple transfusions. Although pre-transfusion compatibility testing largely relies on serological methods, reagents are not available for many antigens. Methods based on single-nucleotide polymorphism (SNP) arrays have been used, but typing for ABO and Rh, the most important blood groups, cannot be done with SNP typing alone. We aimed to develop a novel method based on whole-genome sequencing to identify RBC and platelet antigens. This whole-genome sequencing study is a subanalysis of data from patients in the whole-genome sequencing arm of the MedSeq Project randomised controlled trial (NCT01736566) with no measured patient outcomes. We created a database of molecular changes in RBC and platelet antigens and developed an automated antigen-typing algorithm based on whole-genome sequencing (bloodTyper). This algorithm was iteratively improved to address cis-trans haplotype ambiguities and homologous gene alignments. Whole-genome sequencing data from 110 MedSeq participants (30× depth) were used to initially validate bloodTyper through comparison with conventional serology and SNP methods for typing of 38 RBC antigens in 12 blood-group systems and 22 human platelet antigens. bloodTyper was further validated with whole-genome sequencing data from 200 INTERVAL trial participants (15× depth) with serological comparisons. We iteratively improved bloodTyper by comparing its typing results with conventional serological and SNP typing in three rounds of testing. The initial whole-genome sequencing typing algorithm was 99.5% concordant across the first 20 MedSeq genomes. Addressing discordances led to development of an improved algorithm that was 99.8% concordant for the remaining 90 MedSeq genomes. Additional modifications led to the final algorithm, which was 99.2% concordant across 200 INTERVAL genomes (or 99.9% after adjustment for the lower depth of coverage). By enabling more precise antigen-matching of patients with blood donors, antigen typing based on whole-genome sequencing provides a novel approach to improve transfusion outcomes with the potential to transform the practice of transfusion medicine. Funding: National Human Genome Research Institute, Doris Duke Charitable Foundation, National Health Service Blood and Transplant, National Institute for Health Research, and Wellcome Trust. Copyright © 2018 Elsevier Ltd. All rights reserved.

  3. TEA: the epigenome platform for Arabidopsis methylome study.

    PubMed

    Su, Sheng-Yao; Chen, Shu-Hwa; Lu, I-Hsuan; Chiang, Yih-Shien; Wang, Yu-Bin; Chen, Pao-Yang; Lin, Chung-Yen

    2016-12-22

    Bisulfite sequencing (BS-seq) has become a standard technology for profiling genome-wide DNA methylation at single-base resolution. It allows researchers to conduct genome-wide cytosine methylation analyses of genomic imprinting, transcriptional regulation, and cellular development and differentiation. A single BS-seq dataset is resolved into many features according to sequence context, making methylome data analysis and visualization a complex task. We developed a streamlined platform, TEA, for analyzing and visualizing data from whole-genome BS-seq (WGBS) experiments conducted in the model plant Arabidopsis thaliana. To capture the essence of the genome methylation level while remaining efficient enough to run online, we introduce a straightforward method for measuring genome methylation in each sequence context by gene. The method is scripted in Java to process BS-seq mapping results. Through a simple data-upload process, the TEA server deploys a web-based platform for deep analysis by linking data to an updated Arabidopsis annotation database and toolkits. TEA is an intuitive and efficient online platform for analyzing the Arabidopsis genomic DNA methylation landscape. It provides several ways to help users exploit WGBS data. TEA is freely accessible for academic users at: http://tea.iis.sinica.edu.tw .
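
    The per-gene, per-context measure described above can be illustrated with a short Python sketch. TEA itself is written in Java and its exact formula is not given in this abstract, so the sketch assumes one common definition, the weighted methylation level (methylated reads divided by all reads covering cytosines of a given context within a gene), together with the standard plant cytosine contexts CG, CHG and CHH (H = A, C or T). The input record layout (gene ID, context, methylated reads, unmethylated reads) is hypothetical and stands in for parsed BS-seq mapping results.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def cytosine_context(seq: str, pos: int) -> Optional[str]:
    """Context of a forward-strand cytosine: CG, CHG or CHH (H = A, C or T).
    Returns None if `pos` is not a C or the context runs past the end of `seq`."""
    seq = seq.upper()
    if seq[pos] != "C" or pos + 1 >= len(seq):
        return None
    if seq[pos + 1] == "G":
        return "CG"
    if pos + 2 >= len(seq):
        return None
    return "CHG" if seq[pos + 2] == "G" else "CHH"


def weighted_methylation(
    records: Iterable[Tuple[str, str, int, int]],
) -> dict:
    """records: (gene_id, context, methylated_reads, unmethylated_reads) per cytosine.
    Returns {gene_id: {context: methylated / (methylated + unmethylated)}}."""
    meth = defaultdict(lambda: defaultdict(int))
    total = defaultdict(lambda: defaultdict(int))
    for gene, ctx, m, u in records:
        meth[gene][ctx] += m
        total[gene][ctx] += m + u
    return {
        gene: {ctx: meth[gene][ctx] / n for ctx, n in by_ctx.items() if n > 0}
        for gene, by_ctx in total.items()
    }
```

    Pooling read counts per gene and context, instead of averaging per-cytosine ratios, lets well-covered cytosines weigh more; whether TEA makes the same choice is not stated in the abstract.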

  4. GStream: Improving SNP and CNV Coverage on Genome-Wide Association Studies

    PubMed Central

    Alonso, Arnald; Marsal, Sara; Tortosa, Raül; Canela-Xandri, Oriol; Julià, Antonio

    2013-01-01

    We present GStream, a method that combines genome-wide SNP and CNV genotyping on the Illumina microarray platform with unprecedented accuracy. This new method outperforms previous well-established SNP genotyping software. More importantly, the CNV calling algorithm of GStream dramatically improves on the results obtained by previous state-of-the-art methods and yields an accuracy close to that of purely CNV-oriented technologies such as Comparative Genomic Hybridization (CGH). We demonstrate the superior performance of GStream using microarray data generated from HapMap samples. Using the reference CNV calls generated by the 1000 Genomes Project (1KGP) and well-known studies on whole-genome CNV characterization based on either CGH or genotyping microarray technologies, we show that GStream can increase the number of reliably detected variants by up to 25% compared with previously developed methods. Furthermore, the increased genome coverage provided by GStream allows the discovery of CNVs in close linkage disequilibrium with SNPs previously associated with disease risk in published Genome-Wide Association Studies (GWAS). These results could provide important insights into the biological mechanisms underlying the detected disease-risk associations. With GStream, large-scale GWAS will not only benefit from the combined genotyping of SNPs and CNVs at unprecedented accuracy, but will also take advantage of the computational efficiency of the method. PMID:23844243

  5. Precision Editing of Large Animal Genomes

    PubMed Central

    Tan, Wenfang (Spring); Carlson, Daniel F.; Walton, Mark W.; Fahrenkrug, Scott C.; Hackett, Perry B.

    2013-01-01

    Transgenic animals are an important source of protein and nutrition for most humans and will play key roles in satisfying the increasing demand for food in an ever-increasing world population. The past decade has seen a revolution in the development of methods that permit the introduction of specific alterations to complex genomes. This precision will enhance genome-based improvement of farm animals for food production. Precision genetics will also enhance the development of therapeutic biomaterials and models of human disease as resources for the development of advanced patient therapies. PMID:23084873

  6. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing

    PubMed Central

    Keinath, Melissa C.; Timoshevskiy, Vladimir A.; Timoshevskaya, Nataliya Y.; Tsonis, Panagiotis A.; Voss, S. Randal; Smith, Jeramiah J.

    2015-01-01

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes, and resolved key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes. PMID:26553646

  7. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing.

    PubMed

    Keinath, Melissa C; Timoshevskiy, Vladimir A; Timoshevskaya, Nataliya Y; Tsonis, Panagiotis A; Voss, S Randal; Smith, Jeramiah J

    2015-11-10

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes, and resolved key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes.

  8. Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants

    PubMed Central

    Conte, Matthieu G; Gaillard, Sylvain; Droc, Gaetan; Perin, Christophe

    2008-01-01

    Background Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to improve the reliability of gene function inference and to aid in the identification of conserved or divergent genetic pathways between species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. First, we established an efficient method to cluster the full A. thaliana and O. sativa proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog prediction. Conclusion Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved, and we conclude that our methodology outperforms similarly based methods. PMID:18426584

  9. MIPS bacterial genomes functional annotation benchmark dataset.

    PubMed

    Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Werner

    2005-05-15

    Developing new methods for automatic functional annotation of proteins from their sequences requires high-quality benchmark data as well as tedious preparatory work to generate the sequence parameters required as input for machine learning methods. Different program settings and incompatible protocols make it difficult to compare the methods analyzed. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. MIPS-BFAB is available at http://mips.gsf.de/proj/bfab

  10. An Efficient Method for Genomic DNA Extraction from Different Molluscs Species

    PubMed Central

    Pereira, Jorge C.; Chaves, Raquel; Bastos, Estela; Leitão, Alexandra; Guedes-Pinto, Henrique

    2011-01-01

    The selection of a DNA extraction method is a critical step when subsequent analysis depends on DNA quality and quantity. Unlike mammals, for which several efficient DNA extraction methods have been developed, molluscs lack a sufficient range of optimized genomic DNA extraction protocols. Factors such as animal physiology and the type (e.g., adductor muscle or gills) or quantity of tissue can explain the poor efficiency (quality and yield) of genomic DNA extraction from molluscs. To overcome these problems, this work describes an efficient method for mollusc genomic DNA extraction that was tested in several species from different families: Veneridae, Ostreidae, Anomiidae, Cardiidae (Bivalvia) and Muricidae (Gastropoda), using tissue samples of different weights. The isolated DNA was of high molecular weight, with high yield and purity, even from reduced quantities of tissue. Moreover, the isolated genomic DNA proved suitable for several downstream molecular techniques, including PCR and sequencing. PMID:22174651

  11. [Technology of analysis of epigenetic and structural changes of epithelial tumors genome with NotI-microarrays by the example of human chromosome].

    PubMed

    Pavlova, T V; Kashuba, V I; Muravenko, O V; Yenamandra, S P; Ivanova, T A; Zabarovskaia, V I; Rakhmanaliev, E R; Petrenko, L A; Pronina, I V; Loginov, V I; Iurkevich, O Iu; Kiselev, L L; Zelenin, A V; Zabarovskiĭ, E R

    2009-01-01

    A new comparative genomic hybridization technology based on NotI microarrays is presented (Karolinska Institute International Patent WO02/086163). The method is based on comparative hybridization of NotI probes prepared from tumor and normal genomic DNA to new NotI DNA microarrays. Using this method, 181 NotI linking loci from human chromosome 3 were analyzed in 200 malignant tumor samples from different organs: kidney, lung, breast, ovary, cervix and prostate. The most frequent aberrations (deletions and methylation, found in more than 30% of samples) were identified at NotI sites located in the MINT24, BHLHB2, RPL15, RARbeta1, ITGA9, RBSP3, VHL and ZIC4 genes, suggesting that these genes are probably involved in cancer development. Methylation of these genomic loci was confirmed by methylation-specific PCR and bisulfite sequencing. The results demonstrate the promise of this method for addressing oncogenomic problems.

  12. WordSeeker: concurrent bioinformatics software for discovering genome-wide patterns and word-based genomic signatures

    PubMed Central

    2010-01-01

    Background An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed for motif discovery, most of them lack the scalability needed to analyze large genomic data sets. Methods This manuscript presents WordSeeker, an enumerative motif discovery toolkit that utilizes multi-core and distributed computational platforms to enable scalable analysis of genomic data. A controller task coordinates activities of worker nodes, each of which (1) enumerates a subset of the DNA word space and (2) scores words with a distributed Markov chain model. Results A comprehensive suite of performance tests was conducted to demonstrate the performance, speedup and efficiency of WordSeeker. The scalability of the toolkit enabled the analysis of the entire genome of Arabidopsis thaliana; the results of the analysis were integrated into The Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker was deployed on the Glenn cluster at the Ohio Supercomputer Center. Conclusion WordSeeker effectively utilizes concurrent computing platforms to enable the identification of putative functional elements in genomic data sets. This capability facilitates the analysis of the large quantity of sequenced genomic data. PMID:21210985
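
    The word-scoring step described above can be illustrated with a single-process Python sketch. It does not reproduce WordSeeker's controller/worker architecture or its exact scoring; it assumes the standard expected-count formula for an order-m Markov chain estimated from the sequence itself, and scores each enumerated DNA word of length k by the ratio of observed to expected occurrences. The `prefixes` argument is a hypothetical way to mimic giving each worker only a subset of the word space.

```python
from collections import Counter
from itertools import product


def count_kmers(seq: str, k: int) -> Counter:
    """Occurrences of every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(word: str, m: int, counts: dict) -> float:
    """Expected occurrences of `word` under an order-m Markov chain: the product of its
    (m+1)-mer counts divided by the product of its internal (overlapping) m-mer counts."""
    k = len(word)
    num = 1.0
    for i in range(k - m):
        num *= counts[m + 1][word[i:i + m + 1]]
    den = 1.0
    for i in range(1, k - m):
        den *= counts[m][word[i:i + m]]
    return num / den if den else 0.0


def score_words(seq: str, k: int, m: int, prefixes=("",)) -> dict:
    """Observed/expected ratio for every DNA word of length k that starts with one of
    `prefixes` (hypothetical split of the word space, as a worker node might receive)."""
    if not 1 <= m < k:
        raise ValueError("need 1 <= m < k")
    seq = seq.upper()
    counts = {n: count_kmers(seq, n) for n in (m, m + 1, k)}
    scores = {}
    for prefix in prefixes:
        for tail in product("ACGT", repeat=k - len(prefix)):
            word = prefix + "".join(tail)
            exp = expected_count(word, m, counts)
            if exp > 0:  # words the background model cannot produce are skipped
                scores[word] = counts[k][word] / exp
    return scores
```

    Because each prefix defines an independent slice of the 4^k words, the outer loop could be distributed across workers with no shared state beyond the precomputed counts, which is the property a controller/worker design exploits.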

  13. The future of genomics in polar and alpine cyanobacteria

    PubMed Central

    Anesio, Alexandre M; Sánchez-Baracaldo, Patricia

    2018-01-01

    In recent years, genomic analyses have arisen as an exciting way of investigating the functional capacity and environmental adaptations of numerous micro-organisms of global relevance, including cyanobacteria. In the extreme cold of the Arctic, Antarctic and alpine environments, cyanobacteria are of fundamental ecological importance as primary producers and ecosystem engineers. While their role in biogeochemical cycles is well appreciated, little is known about the genomic makeup of polar and alpine cyanobacteria. In this article, we present ways that genomic techniques might be used to further our understanding of the evolution and ecology of cyanobacteria in cold environments. Existing examples from other environments (e.g. marine/hot springs) are used to discuss how methods developed there might be used to investigate specific questions in the cryosphere. Phylogenomics, comparative genomics and population genomics are identified as methods for understanding the evolution and biogeography of polar and alpine cyanobacteria. Transcriptomics will allow us to investigate gene expression under extreme environmental conditions, and metagenomics can be used to complement traditional amplicon-based methods of community profiling. Finally, new techniques such as single-cell genomics and metagenome-assembled genomes will also help to expand our understanding of polar and alpine cyanobacteria that cannot readily be cultured. PMID:29506259

  14. Development and in-house validation of the event-specific qualitative and quantitative PCR detection methods for genetically modified cotton MON15985.

    PubMed

    Jiang, Lingxi; Yang, Litao; Rao, Jun; Guo, Jinchao; Wang, Shu; Liu, Jia; Lee, Seonghun; Zhang, Dabing

    2010-02-01

    To implement genetically modified organism (GMO) labeling regulations, event-specific analysis based on the junction sequence between the exogenous insert and the host genomic DNA has become the preferred approach for GMO identification and quantification. In this study, specific primers and TaqMan probes based on the revealed 5'-end junction sequence of GM cotton MON15985 were designed, and qualitative and quantitative polymerase chain reaction (PCR) assays were established using these primers and probes. In the qualitative PCR assay, the limit of detection (LOD) was 0.5 g/kg in 100 ng of total cotton genomic DNA, corresponding to about 17 copies of the haploid cotton genome; for the quantitative PCR assay, the LOD and limit of quantification (LOQ) were 10 and 17 haploid genome copies, respectively. Furthermore, the quantitative PCR assays were validated in-house by five different researchers. In addition, five practical samples with known GM contents were quantified with the developed assay during in-house validation, and the bias between the true and quantified values ranged from 2.06% to 12.59%. This study shows that the developed qualitative and quantitative PCR methods are applicable to the identification and quantification of GM cotton MON15985 and its derivatives.
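
    The correspondence between the relative LOD (0.5 g/kg, i.e. 0.05% GM content, in 100 ng of total DNA) and about 17 haploid genome copies is simple arithmetic, sketched below in Python. The mass of one haploid cotton genome is not given in the abstract; the ~2.94 pg used here is only the value implied by the reported numbers (0.05% of 100 ng = 50 pg, and 50 pg / 17 copies ≈ 2.94 pg), so treat it as an assumption rather than a measured constant. The function name is hypothetical.

```python
def gm_haploid_copies(total_dna_ng: float, gm_fraction: float, haploid_mass_pg: float) -> float:
    """Haploid genome copies of the GM event in a PCR template (hypothetical helper).

    total_dna_ng    -- total genomic DNA in the reaction, in ng
    gm_fraction     -- GM content as a mass fraction (0.5 g/kg = 0.0005)
    haploid_mass_pg -- mass of one haploid genome, in pg (an assumed input)
    """
    gm_mass_pg = total_dna_ng * 1000.0 * gm_fraction  # 1 ng = 1000 pg
    return gm_mass_pg / haploid_mass_pg


# 0.5 g/kg in 100 ng, with the implied ~2.94 pg per haploid genome
copies = gm_haploid_copies(100, 0.5 / 1000, 2.94)
print(round(copies))  # prints 17
```

    Reporting the LOD in copies rather than as a mass fraction makes it independent of how much template DNA is loaded, which is why the abstract states both forms.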

  15. Detection and correction of false segmental duplications caused by genome mis-assembly

    PubMed Central

    2010-01-01

    Diploid genomes with divergent chromosomes present special problems for assembly software as two copies of especially polymorphic regions may be mistakenly constructed, creating the appearance of a recent segmental duplication. We developed a method for identifying such false duplications and applied it to four vertebrate genomes. For each genome, we corrected mis-assemblies, improved estimates of the amount of duplicated sequence, and recovered polymorphisms between the sequenced chromosomes. PMID:20219098

  16. Applications of CRISPR Genome Engineering in Cell Biology.

    PubMed

    Wang, Fangyuan; Qi, Lei S

    2016-11-01

    Recent advances in genome engineering are starting a revolution in biological research and translational applications. The clustered regularly interspaced short palindromic repeats (CRISPR)-associated RNA-guided endonuclease Cas9 (CRISPR-associated protein 9) and its variants enable diverse manipulations of genome function. In this review, we describe the development of Cas9 tools for a variety of applications in cell biology research, including the study of functional genomics, the creation of transgenic animal models, and genomic imaging. Novel genome engineering methods offer a new avenue for understanding the causal links between genome and phenotype, thus promising a fuller understanding of cell biology. Copyright © 2016 Elsevier Ltd. All rights reserved.

  17. Medulloblastoma | Office of Cancer Genomics

    Cancer.gov

    The Medulloblastoma Project was developed to apply newly emerging genomic methods towards the discovery of novel genetic alterations in medulloblastoma (MB). MB is the most common malignant brain tumor in children, accounting for approximately 20% of all pediatric brain tumors.

  18. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives.

    PubMed

    Zhao, Min; Wang, Qingguo; Wang, Quan; Jia, Peilin; Zhao, Zhongming

    2013-01-01

    Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genomic hybridization (arrayCGH) and genotyping arrays were the standard technologies for detecting large regions subject to copy number changes until recently, when high-resolution sequence data from next-generation sequencing (NGS) became available for analysis. During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled the development of numerous computational methods and tools for CNV detection. In this article, we review recent advances in computational methods for CNV detection using whole-genome and whole-exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development.

  19. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives

    PubMed Central

    2013-01-01

    Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genomic hybridization (arrayCGH) and genotyping arrays were the standard technologies for detecting large regions subject to copy number changes until recently, when high-resolution sequence data from next-generation sequencing (NGS) became available for analysis. During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled the development of numerous computational methods and tools for CNV detection. In this article, we review recent advances in computational methods for CNV detection using whole-genome and whole-exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development. PMID:24564169

  20. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

    PubMed

    Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

    2018-05-31

    In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions that precisely control the expression of genes. Identifying active cis-regulatory regions in the human genome is therefore critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Developments in high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome-wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES, based on supervised deep learning approaches, for the identification of enhancer and promoter regions in the human genome. Owing to their ability to discover patterns in large and complex data, deep learning methods enable a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to predictive performance. Applying DECRES, we delineate the locations of 300,000 candidate enhancers genome-wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data) and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation, from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and should inspire the development of other advanced neural network models for further improvement of genome annotations.

  1. Direct Capture Technologies for Genomics-Guided Discovery of Natural Products.

    PubMed

    Chan, Andrew N; Santa Maria, Kevin C; Li, Bo

    2016-01-01

    Microbes are important producers of natural products, which have played key roles in understanding biology and treating disease. However, the full potential of microbes to produce natural products has yet to be realized; the overwhelming majority of natural product gene clusters encoded in microbial genomes remain "cryptic", and have not been expressed or characterized. In contrast to the fast-growing number of genomic sequences and bioinformatic tools, methods to connect these genes to natural product molecules are still limited, creating a bottleneck in genome-mining efforts to discover novel natural products. Here we review developing technologies that leverage the power of homologous recombination to directly capture natural product gene clusters and express them in model hosts for isolation and structural characterization. Although direct capture is still in its early stages of development, it has been successfully utilized in several different classes of natural products. These early successes will be reviewed, and the methods will be compared and contrasted with existing traditional technologies. Lastly, we will discuss the opportunities for the development of direct capture in other organisms, and possibilities to integrate direct capture with emerging genome-editing techniques to accelerate future study of natural products.

  2. Live visualization of genomic loci with BiFC-TALE

    PubMed Central

    Hu, Huan; Zhang, Hongmin; Wang, Sheng; Ding, Miao; An, Hui; Hou, Yingping; Yang, Xiaojing; Wei, Wensheng; Sun, Yujie; Tang, Chao

    2017-01-01

    Tracking the dynamics of genomic loci is important for understanding the mechanisms of fundamental intracellular processes. However, fluorescent labeling and imaging of such loci in live cells have been challenging. One of the major reasons is the low signal-to-background ratio (SBR) of images mainly caused by the background fluorescence from diffuse full-length fluorescent proteins (FPs) in the living nucleus, hampering the application of live cell genomic labeling methods. Here, combining bimolecular fluorescence complementation (BiFC) and transcription activator-like effector (TALE) technologies, we developed a novel method for labeling genomic loci (BiFC-TALE), which largely reduces the background fluorescence level. Using BiFC-TALE, we demonstrated a significantly improved SBR by imaging telomeres and centromeres in living cells in comparison with the methods using full-length FP. PMID:28074901

  3. Live visualization of genomic loci with BiFC-TALE.

    PubMed

    Hu, Huan; Zhang, Hongmin; Wang, Sheng; Ding, Miao; An, Hui; Hou, Yingping; Yang, Xiaojing; Wei, Wensheng; Sun, Yujie; Tang, Chao

    2017-01-11

    Tracking the dynamics of genomic loci is important for understanding the mechanisms of fundamental intracellular processes. However, fluorescent labeling and imaging of such loci in live cells have been challenging. One of the major reasons is the low signal-to-background ratio (SBR) of images mainly caused by the background fluorescence from diffuse full-length fluorescent proteins (FPs) in the living nucleus, hampering the application of live cell genomic labeling methods. Here, combining bimolecular fluorescence complementation (BiFC) and transcription activator-like effector (TALE) technologies, we developed a novel method for labeling genomic loci (BiFC-TALE), which largely reduces the background fluorescence level. Using BiFC-TALE, we demonstrated a significantly improved SBR by imaging telomeres and centromeres in living cells in comparison with the methods using full-length FP.

  4. CRISPR-mediated Ophthalmic Genome Surgery.

    PubMed

    Cho, Galaxy Y; Abdulla, Yazeed; Sengillo, Jesse D; Justus, Sally; Schaefer, Kellie A; Bassuk, Alexander G; Tsang, Stephen H; Mahajan, Vinit B

    2017-09-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) is a genome engineering system with great potential for clinical applications due to its versatility and programmability. This review highlights the development and use of CRISPR-mediated ophthalmic genome surgery in recent years. Diverse CRISPR techniques are in development to target a wide array of inherited and acquired ophthalmic conditions. Preclinical disease modeling and recent successes in gene editing suggest potential efficacy of CRISPR as a therapeutic for inherited conditions. In particular, the treatment of Leber congenital amaurosis with CRISPR-mediated genome surgery is expected to reach clinical trials in the near future. Treatment options for inherited retinal dystrophies are currently limited. CRISPR-mediated genome surgery methods may be able to address this unmet need in the future.

  5. Accurate evaluation and analysis of functional genomics data and methods

    PubMed Central

    Greene, Casey S.; Troyanskaya, Olga G.

    2016-01-01

    The development of technology capable of inexpensively performing large-scale measurements of biological systems has generated a wealth of data. Integrative analysis of these data holds the promise of uncovering gene function, regulation, and, in the longer run, understanding complex disease. However, their analysis has proved very challenging, as it is difficult to quickly and effectively assess the relevance and accuracy of these data for individual biological questions. Here, we identify biases that present challenges for the assessment of functional genomics data and methods. We then discuss evaluation methods that, taken together, begin to address these issues. We also argue that the funding of systematic data-driven experiments and of high-quality curation efforts will further improve evaluation metrics so that they more accurately assess functional genomics data and methods. Such metrics will allow researchers in the field of functional genomics to continue to answer important biological questions in a data-driven manner. PMID:22268703

  6. Epigenetic Segregation of Microbial Genomes from Complex Samples Using Restriction Endonucleases HpaII and McrB.

    PubMed

    Liu, Guohong; Weston, Christopher Q; Pham, Long K; Waltz, Shannon; Barnes, Helen; King, Paula; Sphar, Dan; Yamamoto, Robert T; Forsyth, R Allyn

    2016-01-01

    We describe continuing work to develop restriction endonucleases as tools to enrich targeted genomes of interest from diverse populations. Two approaches were developed in parallel to segregate genomic DNA based on cytosine methylation. First, the methyl-sensitive endonuclease HpaII was used to bind non-CG methylated DNA. Second, a truncated fragment of McrB was used to bind CpG methylated DNA. Enrichment levels of microbial genomes can exceed 100-fold with HpaII, allowing improved genomic detection and coverage of otherwise trace microbial genomes from sputum. Additionally, we observe interesting enrichment results that correlate with the methylation states not only of bacteria but also of fungi, viruses, a protist and plants. The methods presented here offer promise for testing biological samples for pathogens and for global analysis of population methylomes.
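    A "100-fold" enrichment is a ratio of relative abundances. A minimal sketch, assuming fold enrichment is the fraction of sequencing reads assigned to the target genome after enrichment divided by that fraction before enrichment. The read counts are hypothetical.

```python
def fold_enrichment(target_reads_after, total_reads_after,
                    target_reads_before, total_reads_before):
    """Fold change in the fraction of reads assigned to the target genome (assumed definition)."""
    frac_after = target_reads_after / total_reads_after
    frac_before = target_reads_before / total_reads_before
    return frac_after / frac_before

# Hypothetical counts: 0.01% microbial reads before, 1.5% after enrichment.
print(fold_enrichment(15_000, 1_000_000, 100, 1_000_000))
```

    Normalizing by total reads on each side keeps the comparison fair when the two libraries are sequenced to different depths.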

  7. ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data.

    PubMed

    Oluwadare, Oluwatosin; Cheng, Jianlin

    2017-11-14

    With the development of chromosome conformation capture techniques, particularly the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as topologically associated domains (TADs), i.e., locally packed chromosome regions bound together by intrachromosomal contacts. The identification of the TADs for a genome is useful for studying gene regulation, genomic interaction, and genome function. Here, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on simulated Hi-C data. Our method is also largely complementary to and consistent with existing methods on the real Hi-C datasets of two mouse cells. Validation with chromatin immunoprecipitation sequencing (ChIP-seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications. As ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply a large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available here: https://github.com/BDM-Lab/ClusterTAD .
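    The general idea of treating TAD calling as clustering can be illustrated with a deliberately simplified sketch. This is not ClusterTAD's feature representation or algorithm. Here each genomic bin's raw row of the contact matrix serves as its feature vector, plain k-means with a fixed k groups the bins, and runs of adjacent bins sharing a cluster label are reported as candidate domains. The function names and the toy matrix are hypothetical.

```python
def kmeans_rows(rows, k, iterations=20):
    """Plain k-means on the rows of a contact matrix; returns one cluster label per bin."""
    n = len(rows)
    # Deterministic initialisation: evenly spaced bins act as starting centroids.
    centroids = [list(rows[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assign each bin to the nearest centroid (squared Euclidean distance).
        for i, row in enumerate(rows):
            dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Recompute each centroid as the mean of its member rows.
        for c in range(k):
            members = [rows[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def contiguous_domains(labels):
    """Split the label sequence into (first_bin, last_bin) runs with the same cluster."""
    domains, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            domains.append((start, i - 1))
            start = i
    return domains

# Toy 8x8 contact matrix: two blocks of strong intra-domain contacts.
contacts = [[10 if (i < 4) == (j < 4) else 1 for j in range(8)] for i in range(8)]
print(contiguous_domains(kmeans_rows(contacts, k=2)))
```

    Splitting clusters into contiguous runs matters because k-means ignores genomic order, while a TAD must be a continuous stretch of the chromosome.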

  8. Design of a Genomics Curriculum: Competencies for Practicing Pathologists.

    PubMed

    Laudadio, Jennifer; McNeal, Jeffrey L; Boyd, Scott D; Le, Long Phi; Lockwood, Christina; McCloskey, Cindy B; Sharma, Gaurav; Voelkerding, Karl V; Haspel, Richard L

    2015-07-01

    The field of genomics is rapidly impacting medical care across specialties. To help guide test utilization and interpretation, pathologists must be knowledgeable about genomic techniques and their clinical utility. The technology allowing timely generation of genomic data is relatively new to patient care and the clinical laboratory, and therefore many currently practicing pathologists were trained without any exposure to molecular pathology or genomics. Furthermore, the exposure that current and recent trainees receive in this field remains inconsistent. Our objectives were to assess pathologists' learning needs in genomics and to develop a curriculum to address these educational needs. A working group formed by the College of American Pathologists developed an initial list of genomics competencies (knowledge and skills statements) that a practicing pathologist needs to be successful. Experts in genomics were then surveyed to rate the importance of each competency. These data were used to create a final list of prioritized competencies. A subset of the working group defined subtopics and tasks for each competency. Appropriate delivery methods for the educational material were also proposed. A final list of 32 genomics competency statements was developed. A prioritized curriculum was created with designated subtopics and tasks associated with each competency. We present a genomics curriculum designed as a first step toward providing practicing pathologists with the competencies needed to practice successfully.

  9. Quantifying Genome Editing Outcomes at Endogenous Loci using SMRT Sequencing

    PubMed Central

    Clark, Joseph; Punjya, Niraj; Sebastiano, Vittorio; Bao, Gang; Porteus, Matthew H

    2014-01-01

    SUMMARY Targeted genome editing with engineered nucleases has transformed the ability to introduce precise sequence modifications at almost any site within the genome. A major obstacle to probing the efficiency and consequences of genome editing is that no existing method enables the frequency of different editing events to be simultaneously measured across a cell population at any endogenous genomic locus. We have developed a novel method for quantifying individual genome editing outcomes at any site of interest using single molecule real time (SMRT) DNA sequencing. We show that this approach can be applied at various loci, using multiple engineered nuclease platforms including TALENs, RNA guided endonucleases (CRISPR/Cas9), and ZFNs, and in different cell lines to identify conditions and strategies in which the desired engineering outcome has occurred. This approach facilitates the evaluation of new gene editing technologies and permits sensitive quantification of editing outcomes in almost every experimental system used. PMID:24685129
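    The core bookkeeping behind measuring "the frequency of different editing events" across a population can be sketched as counting classified reads. A toy illustration follows, not the authors' pipeline. It assumes error-free, full-length amplicon reads compared directly to a reference and an expected HDR product, with no alignment. Real SMRT data contain sequencing errors that a proper aligner must distinguish from true edits. All sequences are hypothetical.

```python
from collections import Counter

def classify_read(read, reference, hdr_template):
    """Assign one full-length amplicon read to a coarse editing outcome."""
    if read == reference:
        return "unmodified"
    if read == hdr_template:
        return "HDR"
    if len(read) != len(reference):
        return "indel"          # length change, typical of NHEJ repair
    return "substitution"       # same length but different sequence

def outcome_frequencies(reads, reference, hdr_template):
    """Fraction of reads falling into each outcome class."""
    counts = Counter(classify_read(r, reference, hdr_template) for r in reads)
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}

# Hypothetical reference amplicon, HDR product, and reads.
ref = "ACGTACGTAC"
hdr = "ACGTTCGTAC"
reads = [ref, ref, hdr, "ACGTCGTAC", "ACGAACGTAC"]
print(outcome_frequencies(reads, ref, hdr))
```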

  10. [Genome-editing: focus on the off-target effects].

    PubMed

    He, Xiubin; Gu, Feng

    2017-10-25

    Breakthroughs in genome editing in recent years have paved the way for new therapeutic strategies. These genome-editing tools mainly include zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeat (CRISPR)/Cas-based RNA-guided DNA endonucleases. However, off-target effects remain the major issue in genome editing and limit its application in gene therapy. Here, we summarize the causes of off-target effects and compare different methods for detecting them.
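    One simple computational step related to off-target analysis is listing genomic sites that resemble the guide sequence. This is a minimal sketch, not any specific published predictor. It assumes SpCas9 with an NGG PAM directly 3' of a 20-nt protospacer, scans only the forward strand, counts mismatches but not DNA/RNA bulges, and applies no position-dependent scoring. The guide and genome strings are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def candidate_off_targets(genome, guide, max_mismatches=3):
    """Forward-strand sites with an NGG PAM whose protospacer is near the guide."""
    hits = []
    n = len(guide)
    for start in range(len(genome) - n - 2):
        protospacer = genome[start:start + n]
        pam = genome[start + n:start + n + 3]
        if pam[1:] != "GG":              # require N-G-G immediately 3' of the site
            continue
        mismatches = hamming(protospacer, guide)
        if mismatches <= max_mismatches:
            hits.append((start, protospacer, pam, mismatches))
    return hits

guide = "ACGT" * 5                       # hypothetical 20-nt spacer
genome = guide + "TGG" + "AAAA" + "ACGTACGTACGTACGTACAA" + "CGG"
for hit in candidate_off_targets(genome, guide):
    print(hit)
```

    A list like this only nominates candidates; experimental detection methods are still needed to confirm which sites are actually cleaved.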

  11. De novo assembly of human genomes with massively parallel short read sequencing.

    PubMed

    Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue; Qian, Wubin; Fang, Xiaodong; Shi, Zhongbin; Li, Yingrui; Li, Shengting; Shan, Gao; Kristiansen, Karsten; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun

    2010-02-01

    Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the reads are very short, which makes de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving N50 contig sizes of 7.4 and 5.9 kilobases (kb) and scaffold N50 sizes of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
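    The N50 statistic quoted above is the length L such that contigs (or scaffolds) of length L or longer together contain at least half of the total assembled bases. A short sketch of that standard definition, with hypothetical contig lengths:

```python
def n50(lengths):
    """Length at which the cumulative sum of the longest contigs first reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:     # compare 2*running to avoid fractional halves
            return length
    return 0                         # empty assembly

contig_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical lengths in kb
print(n50(contig_lengths))
```

    The same function applies to scaffolds; only the input lengths change.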

  12. Synthetic Zinc Finger Proteins: The Advent of Targeted Gene Regulation and Genome Modification Technologies

    PubMed Central

    2015-01-01

    Conspectus The understanding of gene regulation and the structure and function of the human genome increased dramatically at the end of the 20th century. Yet the technologies for manipulating the genome have been slower to develop. For instance, the field of gene therapy has been focused on correcting genetic diseases and augmenting tissue repair for more than 40 years. However, with the exception of a few very low efficiency approaches, conventional genetic engineering methods have only been able to add auxiliary genes to cells. This has been a substantial obstacle to the clinical success of gene therapies and has also led to severe unintended consequences in several cases. Therefore, technologies that facilitate the precise modification of cellular genomes have diverse and significant implications in many facets of research and are essential for translating the products of the Genomic Revolution into tangible benefits for medicine and biotechnology. To address this need, in the 1990s, we embarked on a mission to develop technologies for engineering protein–DNA interactions with the aim of creating custom tools capable of targeting any DNA sequence. Our goal has been to allow researchers to reach into genomes to specifically regulate, knock out, or replace any gene. To realize these goals, we initially focused on understanding and manipulating zinc finger proteins. In particular, we sought to create a simple and straightforward method that enables unspecialized laboratories to engineer custom DNA-modifying proteins using only defined modular components, a web-based utility, and standard recombinant DNA technology. Two significant challenges we faced were (i) the development of zinc finger domains that target sequences not recognized by naturally occurring zinc finger proteins and (ii) determining how individual zinc finger domains could be tethered together as polydactyl proteins to recognize unique locations within complex genomes. 
We and others have since used this modular assembly method to engineer artificial proteins and enzymes that activate, repress, or create defined changes to user-specified genes in human cells, plants, and other organisms. We have also engineered novel methods for externally controlling protein activity and delivery, as well as developed new strategies for the directed evolution of protein and enzyme function. This Account summarizes our work in these areas and highlights independent studies that have successfully used the modular assembly approach to create proteins with novel function. We also discuss emerging alternative methods for genomic targeting, including transcription activator-like effectors (TALEs) and CRISPR/Cas systems, and how they complement the synthetic zinc finger protein technology. PMID:24877793
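    The reason for tethering fingers into polydactyl proteins can be shown with simple arithmetic. Each zinc finger contacts roughly 3 bp, so an n-finger protein addresses a site of about 3n bp. In a random sequence with equal base frequencies, a specific 3n-bp site is expected about 2G / 4^(3n) times across both strands of a genome of length G. The sketch below assumes that idealized model and an approximate haploid human genome length of 3.1 × 10^9 bp. Real genomes are not uniform random sequences, so these numbers are only orders of magnitude.

```python
def expected_random_sites(genome_length, n_fingers, both_strands=True):
    """Expected chance occurrences of a 3n-bp site in a uniform random genome (idealized)."""
    site_length = 3 * n_fingers          # each finger contacts ~3 bp
    positions = genome_length * (2 if both_strands else 1)
    return positions / 4 ** site_length

HUMAN_GENOME_BP = 3_100_000_000          # approximate haploid length (assumption)
for fingers in (3, 4, 6):
    print(fingers, "fingers,", 3 * fingers, "bp:",
          expected_random_sites(HUMAN_GENOME_BP, fingers))
```

    Under this model, a three-finger site is expected to occur tens of thousands of times by chance, while a six-finger (18-bp) site is expected less than once, which is why polydactyl designs can single out a unique genomic address.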

  13. The truth about mouse, human, worms and yeast

    PubMed Central

    2004-01-01

    Genome comparisons are behind the powerful new annotation methods being developed to find all human genes, as well as genes from other genomes. Genomes are now frequently being studied in pairs to provide cross-comparison datasets. This 'Noah's Ark' approach often reveals unsuspected genes and may support the deletion of false-positive predictions. Joining mouse and human as the cross-comparison dataset for the first two mammals are: two Drosophila species, D. melanogaster and D. pseudoobscura; two sea squirts, Ciona intestinalis and Ciona savignyi; four yeast (Saccharomyces) species; two nematodes, Caenorhabditis elegans and Caenorhabditis briggsae; and two pufferfish (Takefugu rubripes and Tetraodon nigroviridis). Even genomes like yeast and C. elegans, which have been known for more than five years, are now being significantly improved. Methods developed for yeast or nematodes will now be applied to mouse and human, and soon to additional mammals such as rat and dog, to identify all the mammalian protein-coding genes. Current large disparities between human UniGene predictions (127,835 genes) and gene-scanning methods (45,000 genes) still need to be resolved. This will be the challenge during the next few years. PMID:15601543

  14. The truth about mouse, human, worms and yeast.

    PubMed

    Nelson, David R; Nebert, Daniel W

    2004-01-01

    Genome comparisons are behind the powerful new annotation methods being developed to find all human genes, as well as genes from other genomes. Genomes are now frequently being studied in pairs to provide cross-comparison datasets. This 'Noah's Ark' approach often reveals unsuspected genes and may support the deletion of false-positive predictions. Joining mouse and human as the cross-comparison dataset for the first two mammals are: two Drosophila species, D. melanogaster and D. pseudoobscura; two sea squirts, Ciona intestinalis and Ciona savignyi; four yeast (Saccharomyces) species; two nematodes, Caenorhabditis elegans and Caenorhabditis briggsae; and two pufferfish (Takefugu rubripes and Tetraodon nigroviridis). Even genomes like yeast and C. elegans, which have been known for more than five years, are now being significantly improved. Methods developed for yeast or nematodes will now be applied to mouse and human, and soon to additional mammals such as rat and dog, to identify all the mammalian protein-coding genes. Current large disparities between human UniGene predictions (127,835 genes) and gene-scanning methods (45,000 genes) still need to be resolved. This will be the challenge during the next few years.

  15. Characterization of noncoding regulatory DNA in the human genome.

    PubMed

    Elkon, Ran; Agami, Reuven

    2017-08-08

    Genetic variants associated with common diseases are usually located in noncoding parts of the human genome. Delineation of the full repertoire of functional noncoding elements, together with efficient methods for probing their biological roles, is therefore of crucial importance. Over the past decade, DNA accessibility and various epigenetic modifications have been associated with regulatory functions. Mapping these features across the genome has enabled researchers to begin to document the full complement of putative regulatory elements. High-throughput reporter assays to probe the functions of regulatory regions have also been developed, but these methods separate putative regulatory elements from the chromosome, so any effects of chromatin context and long-range regulatory interactions are lost. Definitive assignment of function(s) to putative cis-regulatory elements requires perturbation of these elements. Genome-editing technologies are now transforming our ability to perturb regulatory elements across entire genomes. Interpretation of high-throughput genetic screens that incorporate genome editors might enable the construction of an unbiased map of functional noncoding elements in the human genome.

  16. A workflow to preserve genome-quality tissue samples from plants in botanical gardens and arboreta

    PubMed Central

    Gostel, Morgan R.; Kelloff, Carol; Wallick, Kyle; Funk, Vicki A.

    2016-01-01

    Premise of the study: Internationally, gardens hold diverse living collections that can be preserved for genomic research. Workflows have been developed for genomic tissue sampling in other taxa (e.g., vertebrates), but are inadequate for plants. We outline a workflow for tissue sampling intended for two audiences: botanists interested in genomics research and garden staff who plan to voucher living collections. Methods and Results: Standard herbarium methods are used to collect vouchers, label information and images are entered into a publicly accessible database, and leaf tissue is preserved in silica and liquid nitrogen. A five-step approach for genomic tissue sampling is presented for sampling from living collections according to current best practices. Conclusions: Collecting genome-quality samples from gardens is an economical and rapid way to make available for scientific research tissue from the diversity of plants on Earth. The Global Genome Initiative will facilitate and lead this endeavor through international partnerships. PMID:27672517

  17. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes).

    PubMed

    Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-09-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, which will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as a test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.

  18. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)

    PubMed Central

    Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-01-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, which will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as a test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341
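    The intuition behind linking unassembled segments through a reference proteome can be shown with a toy heuristic. This is not the ESPRIT algorithm. It assumes that if two different genomic segments have protein hits covering adjacent, non-overlapping parts of the same reference protein, they may be pieces of one split gene. The hit tuples, segment names, and the max_overlap parameter are hypothetical.

```python
from collections import defaultdict

def propose_links(hits, max_overlap=0):
    """Pair segments whose hits cover adjacent, non-overlapping parts of one reference protein.

    hits: iterable of (segment_id, reference_protein, ref_start, ref_end),
    with inclusive 1-based coordinates on the reference protein.
    """
    by_protein = defaultdict(list)
    for segment, protein, start, end in hits:
        by_protein[protein].append((start, end, segment))
    links = []
    for protein, intervals in by_protein.items():
        intervals.sort()                           # order hits along the reference protein
        for (s1, e1, seg1), (s2, e2, seg2) in zip(intervals, intervals[1:]):
            # Link consecutive hits from different segments that overlap by at most max_overlap residues.
            if seg1 != seg2 and s2 > e1 - max_overlap:
                links.append((seg1, seg2, protein))
    return links

hits = [
    ("scaffold_1", "protA", 1, 120),
    ("scaffold_7", "protA", 121, 300),
    ("scaffold_3", "protB", 1, 200),
    ("scaffold_9", "protB", 50, 250),    # overlaps scaffold_3: more likely a paralog than a split gene
]
print(propose_links(hits))
```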

  19. GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations

    PubMed Central

    Paila, Umadevi; Chapman, Brad A.; Kirchner, Rory; Quinlan, Aaron R.

    2013-01-01

    Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important, yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command line and graphical tools for common analyses. We demonstrate GEMINI's utility for exploring variation in personal genomes and family based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility and our goal is to provide researchers with a standard framework for medical genomics. PMID:23874191
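    The abstract describes composing queries over sample genotypes, inheritance patterns, and annotations held in one unified database. The sketch below shows that style of query using Python's built-in sqlite3 module on a small in-memory table. The table, column names, genotype encoding, and rows are invented for illustration. They are not GEMINI's actual schema or command-line interface.

```python
import sqlite3

# Hypothetical schema: one row per variant, with annotations and trio genotypes side by side.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE variants (
    chrom TEXT, pos INTEGER, gene TEXT, impact TEXT,
    pop_freq REAL, proband_gt TEXT, mother_gt TEXT, father_gt TEXT)""")
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("chr1", 1000, "GENE1", "HIGH", 0.0001, "0/1", "0/0", "0/0"),
        ("chr2", 2000, "GENE2", "HIGH", 0.2000, "0/1", "0/1", "0/0"),
        ("chr3", 3000, "GENE3", "LOW", 0.0001, "0/1", "0/0", "0/0"),
    ],
)

# Rare, high-impact variants seen in the proband but in neither parent (candidate de novo).
query = """
SELECT chrom, pos, gene FROM variants
WHERE impact = 'HIGH' AND pop_freq < 0.01
  AND proband_gt = '0/1' AND mother_gt = '0/0' AND father_gt = '0/0'
"""
rows = conn.execute(query).fetchall()
print(rows)
```

    Keeping annotations and genotypes in the same store is what lets a single query combine a frequency filter, a functional-impact filter, and an inheritance pattern.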

  20. Protein domain analysis of genomic sequence data reveals regulation of LRR related domains in plant transpiration in Ficus.

    PubMed

    Lang, Tiange; Yin, Kangquan; Liu, Jinyu; Cao, Kunfang; Cannon, Charles H; Du, Fang K

    2014-01-01

    Predicting protein domains is essential for understanding a protein's function at the molecular level. However, until now there has been no direct and straightforward method for predicting protein domains in species without a reference genome sequence. In this study, we developed a functionality, consisting of a set of programs, that can predict protein domains directly from genomic sequence data without a reference genome. Working from whole-genome sequence data, the functionality mainly comprises DNA assembly (combining next-generation sequencing (NGS) assembly methods with traditional methods), peptide prediction, and protein domain prediction. The proposed functionality avoids problems associated with de novo assembly due to micro reads and small single repeats. Furthermore, we applied it to predict leucine-rich repeat (LRR) domains in four Ficus species that lack a reference genome, based on NGS genomic data. We found that the LRRNT_2 and LRR_8 domains are related to plant transpiration efficiency, as indicated by the stomatal index, in the four Ficus species. The functionality established in this study provides new insights for protein domain prediction, which is particularly timely in the current age of NGS data expansion.
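    The abstract does not specify how peptides are predicted from the assembled DNA. A common first step when no gene models exist is six-frame translation, sketched below using the standard genetic code. This is an illustrative assumption, not the authors' programs. Codons containing characters other than A, C, G, or T are translated as 'X'.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translation(seq):
    """Translations of frames +0..+2 and of the reverse complement, frames -0..-2."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]
    return {f"{strand}{frame}": translate(s[frame:])
            for strand, s in (("+", seq), ("-", rev))
            for frame in range(3)}

print(six_frame_translation("ATGGCCATTGTAATGGGCCGC"))
```

    Open reading frames taken from these translations could then be passed to a domain-search tool.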

  1. Purification of High Molecular Weight Genomic DNA from Powdery Mildew for Long-Read Sequencing.

    PubMed

    Feehan, Joanna M; Scheibel, Katherine E; Bourras, Salim; Underwood, William; Keller, Beat; Somerville, Shauna C

    2017-03-31

    The powdery mildew fungi are a group of economically important fungal plant pathogens. Relatively little is known about the molecular biology and genetics of these pathogens, in part due to a lack of well-developed genetic and genomic resources. These organisms have large, repetitive genomes, which have made genome sequencing and assembly prohibitively difficult. Here, we describe methods for the collection, extraction, purification and quality control assessment of high molecular weight genomic DNA from one powdery mildew species, Golovinomyces cichoracearum. The protocol described includes mechanical disruption of spores followed by an optimized phenol/chloroform genomic DNA extraction. A typical yield was 7 µg DNA per 150 mg conidia. The genomic DNA that is isolated using this procedure is suitable for long-read sequencing (i.e., > 48.5 kbp). Quality control measures to ensure the size, yield, and purity of the genomic DNA are also described in this method. Sequencing of the genomic DNA of the quality described here will allow for the assembly and comparison of multiple powdery mildew genomes, which in turn will lead to a better understanding and improved control of this agricultural pathogen.

  2. Prediction of maize phenotype based on whole-genome single nucleotide polymorphisms using deep belief networks

    NASA Astrophysics Data System (ADS)

    Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.

    2017-05-01

    Selection in plant breeding could be more effective and more efficient if it were based on genomic data. Genomic selection (GS) is a new approach to plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most GP models use linear methods that ignore interaction effects among genes and higher-order nonlinear effects. A deep belief network (DBN), one of the deep learning architectures, can model data at a high level of abstraction that captures nonlinear effects. This study implemented a DBN to develop a GP model, using whole-genome single nucleotide polymorphisms (SNPs) as training and testing data. The case study was a set of traits in maize. The maize dataset was obtained from the Global Maize Program of CIMMYT (International Maize and Wheat Improvement Center). Based on Pearson correlation, the DBN outperformed the other methods, namely reproducing kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), and best linear unbiased predictor (BLUP), for traits presumed to be non-additive. The DBN achieved a correlation of 0.579 (on a scale from -1 to 1).
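    Models such as the DBN, RKHS, BL, and BLUP are compared using the Pearson correlation between observed and predicted phenotypes, usually on held-out test individuals. A self-contained sketch of that metric follows. The phenotype values are hypothetical.

```python
from math import sqrt

def pearson(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)

# Hypothetical observed trait values and one model's predictions for five test lines.
observed = [3.1, 2.8, 4.0, 3.6, 2.5]
predicted = [3.0, 3.1, 3.7, 3.3, 2.9]
print(round(pearson(observed, predicted), 3))
```

    Because the coefficient is scale-free, it rewards a model that ranks lines correctly even if its predictions are systematically offset. In breeding, that ranking is what drives selection.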

  3. Viral phylogenomics using an alignment-free method: A three-step approach to determine optimal length of k-mer

    DOE PAGES

    Zhang, Qian; Jun, Se-Ran; Leuze, Michael; ...

    2017-01-19

    The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral tree of life. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. Lastly, the resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.

  4. Viral phylogenomics using an alignment-free method: A three-step approach to determine optimal length of k-mer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Qian; Jun, Se-Ran; Leuze, Michael

    The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral tree of life. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. Lastly, the resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.

  5. Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer

    PubMed Central

    Zhang, Qian; Jun, Se-Ran; Leuze, Michael; Ussery, David; Nookaew, Intawat

    2017-01-01

    The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses. PMID:28102365
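    Two of the three criteria named above for choosing k are straightforward to compute from k-mer counts: the Shannon diversity index of a genome's k-mer frequencies, H = -Σ p_i ln p_i, and the number of distinct k-mers shared between genomes. This is a minimal sketch, not the authors' code. It omits cumulative relative entropy, counts overlapping k-mers on one strand, and reports shared features for a single pair rather than averaging over all genome pairs. The two sequences are hypothetical.

```python
from collections import Counter
from math import log

def kmer_counts(seq, k):
    """Count overlapping k-mers in one sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def shannon_diversity(counts):
    """Shannon diversity index H = -sum(p_i * ln p_i) over k-mer frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())

def common_features(seq_a, seq_b, k):
    """Number of distinct k-mers present in both sequences."""
    return len(kmer_counts(seq_a, k).keys() & kmer_counts(seq_b, k).keys())

genomes = {"virus_1": "ATGCGTACGTTAGC", "virus_2": "ATGCGTACCTTAGG"}  # hypothetical
for k in range(1, 6):
    h = [round(shannon_diversity(kmer_counts(s, k)), 3) for s in genomes.values()]
    print(k, h, common_features(*genomes.values(), k))
```

    As k grows, diversity rises toward its maximum while shared features shrink. A useful k is one long enough to be informative but short enough that genomes still share features.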

  6. Evaluation method for the potential functionome harbored in the genome and metagenome

    PubMed Central

    2012-01-01

    Background One of the main goals of genomic analysis is to elucidate the comprehensive functions (functionome) in individual organisms or a whole community in various environments. However, a standard evaluation method for discerning the functional potentials harbored within the genome or metagenome has not yet been established. We have developed a new evaluation method for the potential functionome, based on the completion ratio of Kyoto Encyclopedia of Genes and Genomes (KEGG) functional modules. Results Distribution of the completion ratio of the KEGG functional modules in 768 prokaryotic species varied greatly with the kind of module, and all modules primarily fell into 4 patterns (universal, restricted, diversified and non-prokaryotic modules), indicating the universal and unique nature of each module, and also the versatility of the KEGG Orthology (KO) identifiers mapped to each one. The module completion ratio in 8 phenotypically different bacilli revealed that some modules were shared only in phenotypically similar species. Metagenomes of human gut microbiomes from 13 healthy individuals previously determined by the Sanger method were analyzed based on the module completion ratio. Results led to new discoveries in the nutritional preferences of gut microbes, believed to be one of the mutualistic representations of gut microbiomes to avoid nutritional competition with the host. Conclusions The method developed in this study could characterize the functionome harbored in genomes and metagenomes. As this method also provided taxonomical information from KEGG modules as well as the gene hosts constructing the modules, interpretation of completion profiles was simplified and we could identify the complementarity between biochemical functions in human hosts and the nutritional preferences in human gut microbiomes. 
Thus, our method has the potential to be a powerful tool for comparative functional analysis in genomics and metagenomics, able to target unknown environments containing various uncultivable microbes within unidentified phyla. PMID:23234305
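    A module completion ratio can be sketched as the fraction of a module's steps that the genome can fill. The sketch below makes a simplifying assumption: each module is represented as a list of steps, and each step is a set of alternative KO identifiers, any one of which completes that step. Real KEGG module definitions use a richer expression syntax. The module and KO names are placeholders, not real KEGG entries.

```python
def module_completion_ratio(module_steps, genome_kos):
    """Fraction of module steps for which the genome carries at least one acceptable KO.

    module_steps: list of sets; each set holds alternative KO identifiers for one step.
    genome_kos: set of KO identifiers annotated in the genome or metagenome.
    """
    if not module_steps:
        return 0.0
    completed = sum(1 for alternatives in module_steps if alternatives & genome_kos)
    return completed / len(module_steps)

def completion_profile(modules, genome_kos):
    """Completion ratio for every module in a {module_name: steps} mapping."""
    return {name: module_completion_ratio(steps, genome_kos) for name, steps in modules.items()}

# Toy example with placeholder identifiers (not real KEGG modules or KOs).
toy_modules = {
    "module_X": [{"KO_a"}, {"KO_b", "KO_c"}, {"KO_d"}],
    "module_Y": [{"KO_e"}, {"KO_f"}],
}
print(completion_profile(toy_modules, {"KO_a", "KO_c", "KO_e", "KO_f"}))
```

    Computing such a profile for many genomes yields a module-by-genome matrix, whose per-module distributions could then be inspected for patterns like those reported above.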

  7. Characterizing genomic alterations in cancer by complementary functional associations

    Cancer.gov

    Systematic efforts to sequence the cancer genome have identified large numbers of mutations and copy number alterations in human cancers. However, elucidating the functional consequences of these variants, and their interactions to drive or maintain oncogenic states, remains a challenge in cancer research. We developed REVEALER, a computational method that identifies combinations of mutually exclusive genomic alterations correlated with functional phenotypes, such as the activation or gene dependency of oncogenic pathways or sensitivity to a drug treatment.
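    The idea of searching for mutually exclusive alterations that jointly track a phenotype can be illustrated with a simple greedy search. This is a swapped-in toy, not REVEALER: the score here is the Pearson correlation between a continuous phenotype and the OR-union of binary alteration calls, whereas REVEALER uses its own information-based metric. Feature names and data are hypothetical. An alteration that overlaps samples already covered adds nothing to the union, so the search naturally favors mutually exclusive additions.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def greedy_exclusive_set(features, phenotype, max_size=3):
    """Greedily add binary features whose OR-union best correlates with the phenotype.

    features: {name: [0/1 per sample]}; phenotype: [float per sample].
    Stops when no remaining feature strictly improves the score.
    """
    selected, union, best = [], [0] * len(phenotype), 0.0
    while len(selected) < max_size:
        candidate, cand_score = None, best
        for name, calls in features.items():
            if name in selected:
                continue
            score = pearson([u | c for u, c in zip(union, calls)], phenotype)
            if score > cand_score:
                candidate, cand_score = name, score
        if candidate is None:
            break
        selected.append(candidate)
        union = [u | c for u, c in zip(union, features[candidate])]
        best = cand_score
    return selected, best

# Hypothetical alterations across six samples and a pathway-activity score.
features = {
    "alt_A": [1, 1, 0, 0, 0, 0],
    "alt_B": [0, 0, 1, 0, 0, 0],
    "alt_C": [1, 1, 0, 0, 0, 0],  # redundant with alt_A
    "alt_D": [0, 0, 0, 1, 1, 1],
}
activity = [0.9, 0.8, 0.7, 0.1, 0.0, 0.2]
print(greedy_exclusive_set(features, activity))
```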

  8. Bridging the divide between genomic science and indigenous peoples.

    PubMed

    Jacobs, Bette; Roffenbender, Jason; Collmann, Jeff; Cherry, Kate; Bitsói, LeManuel Lee; Bassett, Kim; Evans, Charles H

    2010-01-01

    The new science of genomics endeavors to chart the genomes of individuals around the world, with the dual goals of understanding the role genetic factors play in human health and solving problems of disease and disability. From the perspective of indigenous peoples and developing countries, the promises and perils of genomic science appear against a backdrop of global health disparity and political vulnerability. These conditions pose a dilemma for many communities when deciding whether to participate in genomic research or any other biomedical research. Genomic research offers the possibility of improved technologies for managing the acute and chronic diseases that plague their members. Yet the history of biomedical research among people in indigenous and developing nations, in particular, offers salient examples of unethical practice, misuse of data, and failed promises. This dilemma creates risks for communities whether they decide to participate or not to participate in genomic science research. Some argue that the history of poor scientific practice justifies refusal to join genomic research projects. Others argue that disease poses such great threats to the well-being of people in indigenous communities and developing nations that not participating in genomic research risks irrevocable harm. Thus, some communities, particularly among indigenous peoples, have declined to participate as subjects in genomic research. At the same time, some communities have begun developing new guidelines, procedures, and practices for engaging with the scientific community that offer opportunities to bridge the gap between genomic science and indigenous and/or developing communities. Four new approaches warrant special attention and further support: consulting with local communities; negotiating the complexities of consent; training members of local communities in science and health care; and training scientists to work with indigenous communities. 
Implicit in these approaches is a new definition of "rigorous scientific research," one that includes both community development and scientific progress as legitimate objectives of genomic research. Innovative translational research is needed to develop practical, mutually acceptable methods for crossing the divide between genomic researchers and indigenous communities. This may mean the difference between success and failure in genomic science, and in improving health for all peoples. © 2010 American Society of Law, Medicine & Ethics, Inc.

  9. Deep landscape update of dispersed and tandem repeats in the genome model of the red jungle fowl, Gallus gallus, using a series of de novo investigating tools.

    PubMed

    Guizard, Sébastien; Piégu, Benoît; Arensburger, Peter; Guillou, Florian; Bigot, Yves

    2016-08-19

    The program RepeatMasker and the database Repbase-ISB are part of the most widely used strategy for annotating repeats in animal genomes. They have been used to show that avian genomes have a lower repeat content (8-12%) than the sequenced genomes of many other vertebrate species (30-55%). However, the efficiency of such library-based strategies depends on the quality and completeness of the sequences in the database used. An alternative to these library-based methods is to identify repeats de novo. Such de novo methods have existed for at least a decade and may be more powerful than library-based methods. We used an annotation strategy involving several complementary de novo tools to determine the repeat content of the model genome galGal4 (1.04 Gbp), including simple sequence repeats (SSRs), tandem repeats and transposable elements (TEs). We annotated over 1 Gbp of the galGal4 genome and showed that it is composed of approximately 19% SSRs and TE repeats. Furthermore, we estimate that the actual genome of the red jungle fowl contains about 31-35% repeats. We find that library-based methods tend to overestimate TE diversity. These results have a major impact on the current understanding of repeat distributions across the chromosomes of the red jungle fowl. Our results are a proof of concept of the reliability of using de novo tools to annotate repeats in large animal genomes. They have also revealed issues that will need to be resolved in order to develop gold-standard methodologies for annotating repeats in eukaryotic genomes.
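    De novo detection of simple sequence repeats needs no repeat library. A minimal sketch is a left-to-right scan for exact tandem runs of a short motif. The motif-length limit (1-6 nt) and minimum copy number (3) are illustrative assumptions, and this naive exact-match scanner is not one of the tools used in the study; real SSR and tandem-repeat finders tolerate mismatches and indels.

```python
def find_ssrs(seq, max_motif=6, min_copies=3):
    """Greedy scan for exact tandem runs of a 1..max_motif nt motif.

    Returns (start, motif, copies) tuples. At each position the longest run wins;
    ties go to the shorter motif. Motifs containing N are ignored.
    """
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        best = None
        for m in range(1, max_motif + 1):
            motif = seq[i:i + m]
            if len(motif) < m:
                break
            if "N" in motif:
                continue
            copies = 1
            while seq[i + copies * m:i + (copies + 1) * m] == motif:
                copies += 1
            if copies >= min_copies and (best is None or copies * m > best[1] * len(best[0])):
                best = (motif, copies)
        if best:
            hits.append((i, best[0], best[1]))
            i += best[1] * len(best[0])  # skip past the run so it is not reported twice
        else:
            i += 1
    return hits

def ssr_fraction(seq, **kwargs):
    """Fraction of bases covered by detected SSRs."""
    covered = sum(len(motif) * copies for _, motif, copies in find_ssrs(seq, **kwargs))
    return covered / len(seq) if seq else 0.0

print(find_ssrs("GCATATATATGC"), ssr_fraction("GCATATATATGC"))
```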

  10. Genome Engineering with TALE and CRISPR Systems in Neuroscience

    PubMed Central

    Lee, Han B.; Sundberg, Brynn N.; Sigafoos, Ashley N.; Clark, Karl J.

    2016-01-01

    Recent advances in genome engineering technology are changing the landscape of biological research and providing neuroscientists with an opportunity to develop new methodologies for asking critical research questions. These advances are highlighted by the increased use of programmable DNA-binding agents (PDBAs) such as transcription activator-like effector (TALE) and RNA-guided clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) systems. These PDBAs, fused or co-expressed with various effector domains, allow precise modification of genomic sequences and gene expression levels. These technologies mirror and extend beyond classic gene targeting methods, contributing to the development of novel tools for basic and clinical neuroscience. In this Review, we discuss recent developments in genome engineering and potential applications of this technology in the field of neuroscience. PMID:27092173

  11. Genome Engineering with TALE and CRISPR Systems in Neuroscience.

    PubMed

    Lee, Han B; Sundberg, Brynn N; Sigafoos, Ashley N; Clark, Karl J

    2016-01-01

    Recent advances in genome engineering technology are changing the landscape of biological research and providing neuroscientists with an opportunity to develop new methodologies for asking critical research questions. These advances are highlighted by the increased use of programmable DNA-binding agents (PDBAs) such as transcription activator-like effector (TALE) and RNA-guided clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) systems. These PDBAs, fused or co-expressed with various effector domains, allow precise modification of genomic sequences and gene expression levels. These technologies mirror and extend beyond classic gene targeting methods, contributing to the development of novel tools for basic and clinical neuroscience. In this Review, we discuss recent developments in genome engineering and potential applications of this technology in the field of neuroscience.

  12. Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq

    PubMed Central

    Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

    2015-01-01

    Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in response to multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly, followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% at each sequence position and discarded less than 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify sequences present at >1% frequency throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
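    The final reporting step, flagging non-consensus bases present above a 1% frequency, can be sketched from a simple pileup. This assumes reads have already been aligned without gaps to the consensus and are given as hypothetical (start, sequence) pairs. The minimum depth of 100 is an illustrative assumption. The study's de novo assembly, iterative mapping and error correction are not reproduced here.

```python
from collections import Counter

def pileup(reads, ref_len):
    """Per-position base counts from ungapped (start, sequence) alignments."""
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base] += 1
    return columns

def minority_variants(reference, reads, min_freq=0.01, min_depth=100):
    """Report (position, consensus base, variant base, frequency) above min_freq."""
    calls = []
    for pos, column in enumerate(pileup(reads, len(reference))):
        depth = sum(column.values())
        if depth < min_depth:
            continue  # too little coverage to trust a 1% call
        for base, n in column.items():
            freq = n / depth
            if base != reference[pos] and freq >= min_freq:
                calls.append((pos, reference[pos], base, round(freq, 4)))
    return calls

reads = [(0, "ACGT")] * 197 + [(0, "ACCT")] * 3
print(minority_variants("ACGT", reads))
```

    The depth requirement matters because a 1% variant is only 1 read in 100; at lower coverage a single sequencing error would be reported as a minority variant.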

  13. A high-throughput next-generation sequencing-based method for detecting the mutational fingerprint of carcinogens

    PubMed Central

    Besaratinia, Ahmad; Li, Haiqing; Yoon, Jae-In; Zheng, Albert; Gao, Hanlin; Tommasi, Stella

    2012-01-01

    Many carcinogens leave a unique mutational fingerprint in the human genome. These mutational fingerprints manifest as specific types of mutations often clustering at certain genomic loci in tumor genomes from carcinogen-exposed individuals. To develop a high-throughput method for detecting the mutational fingerprint of carcinogens, we have devised a cost-, time- and labor-effective strategy, in which the widely used transgenic Big Blue® mouse mutation detection assay is made compatible with the Roche/454 Genome Sequencer FLX Titanium next-generation sequencing technology. As proof of principle, we have used this novel method to establish the mutational fingerprints of three prominent carcinogens with varying mutagenic potencies, including sunlight ultraviolet radiation, 4-aminobiphenyl and secondhand smoke, which are known to be strong, moderate and weak mutagens, respectively. For verification purposes, we have compared the mutational fingerprints of these carcinogens obtained by our newly developed method with those obtained by parallel analyses using the conventional low-throughput approach, that is, standard mutation detection assay followed by direct DNA sequencing using a capillary DNA sequencer. We demonstrate that this high-throughput next-generation sequencing-based method is highly specific and sensitive for detecting the mutational fingerprints of the tested carcinogens. The method is reproducible, and its accuracy is comparable with that of the currently available low-throughput method. In conclusion, this novel method has the potential to move the field of carcinogenesis forward by allowing high-throughput analysis of mutations induced by endogenous and/or exogenous genotoxic agents. PMID:22735701

  14. A high-throughput next-generation sequencing-based method for detecting the mutational fingerprint of carcinogens.

    PubMed

    Besaratinia, Ahmad; Li, Haiqing; Yoon, Jae-In; Zheng, Albert; Gao, Hanlin; Tommasi, Stella

    2012-08-01

    Many carcinogens leave a unique mutational fingerprint in the human genome. These mutational fingerprints manifest as specific types of mutations often clustering at certain genomic loci in tumor genomes from carcinogen-exposed individuals. To develop a high-throughput method for detecting the mutational fingerprint of carcinogens, we have devised a cost-, time- and labor-effective strategy, in which the widely used transgenic Big Blue mouse mutation detection assay is made compatible with the Roche/454 Genome Sequencer FLX Titanium next-generation sequencing technology. As proof of principle, we have used this novel method to establish the mutational fingerprints of three prominent carcinogens with varying mutagenic potencies, including sunlight ultraviolet radiation, 4-aminobiphenyl and secondhand smoke, which are known to be strong, moderate and weak mutagens, respectively. For verification purposes, we have compared the mutational fingerprints of these carcinogens obtained by our newly developed method with those obtained by parallel analyses using the conventional low-throughput approach, that is, standard mutation detection assay followed by direct DNA sequencing using a capillary DNA sequencer. We demonstrate that this high-throughput next-generation sequencing-based method is highly specific and sensitive for detecting the mutational fingerprints of the tested carcinogens. The method is reproducible, and its accuracy is comparable with that of the currently available low-throughput method. In conclusion, this novel method has the potential to move the field of carcinogenesis forward by allowing high-throughput analysis of mutations induced by endogenous and/or exogenous genotoxic agents.
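    In its simplest form, a mutational fingerprint is a frequency profile of substitution types. The sketch below collapses each single-base substitution onto the pyrimidine (C or T) reference strand, a common convention. That gives six classes, so spectra from different agents can be compared as vectors. The input format is an assumption. Sequence context and the clustering at specific loci mentioned above are ignored.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")

def substitution_class(ref, alt):
    """Collapse a single-base substitution onto the pyrimidine (C or T) reference strand."""
    ref, alt = ref.upper(), alt.upper()
    if ref in ("A", "G"):
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{ref}>{alt}"

def mutation_spectrum(mutations):
    """Relative frequency of each of the six substitution classes."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in mutations)
    total = sum(counts.values())
    return {cls: (counts[cls] / total if total else 0.0) for cls in CLASSES}

# G>A on one strand is the same event as C>T on the other, so both count as C>T.
print(mutation_spectrum([("C", "T"), ("G", "A"), ("T", "G"), ("A", "C")]))
```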

  15. Using DNase Hi-C techniques to map global and local three-dimensional genome architecture at high resolution.

    PubMed

    Ma, Wenxiu; Ay, Ferhat; Lee, Choli; Gulsoy, Gunhan; Deng, Xinxian; Cook, Savannah; Hesson, Jennifer; Cavanaugh, Christopher; Ware, Carol B; Krumm, Anton; Shendure, Jay; Blau, C Anthony; Disteche, Christine M; Noble, William S; Duan, ZhiJun

    2018-06-01

    The folding and three-dimensional (3D) organization of chromatin in the nucleus critically impacts genome function. The past decade has witnessed rapid advances in genomic tools for delineating 3D genome architecture. Among them, chromosome conformation capture (3C)-based methods such as Hi-C are the most widely used techniques for mapping chromatin interactions. However, traditional Hi-C protocols rely on restriction enzymes (REs) to fragment chromatin and are therefore limited in resolution. We recently developed DNase Hi-C for mapping 3D genome organization, which uses DNase I for chromatin fragmentation. DNase Hi-C overcomes RE-related limitations associated with traditional Hi-C methods, leading to improved methodological resolution. Furthermore, combining this method with DNA capture technology provides a high-throughput approach (targeted DNase Hi-C) that allows for mapping fine-scale chromatin architecture at exceptionally high resolution. Hence, targeted DNase Hi-C will be valuable for delineating the physical landscapes of cis-regulatory networks that control gene expression and for characterizing phenotype-associated chromatin 3D signatures. Here, we provide a detailed description of method design and step-by-step working protocols for these two methods. Copyright © 2018 Elsevier Inc. All rights reserved.
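    Whatever the fragmentation chemistry, Hi-C read pairs are ultimately summarized as a contact matrix, and the bin size sets the nominal resolution. The sketch below bins intra-chromosomal read-pair positions into a symmetric matrix. It assumes positions are 0-based and lie within the chromosome. Filtering, normalization and the capture step of targeted DNase Hi-C are not shown.

```python
def contact_matrix(pairs, chrom_length, bin_size):
    """Symmetric contact-count matrix from (pos1, pos2) read pairs on one chromosome."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for a, b in pairs:
        i, j = a // bin_size, b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix

pairs = [(100, 2500), (150, 2600), (1200, 1300)]
for row in contact_matrix(pairs, chrom_length=3000, bin_size=1000):
    print(row)
```

    Halving the bin size quadruples the number of matrix cells, so the same number of read pairs is spread more thinly. Finer resolution therefore also requires deeper sequencing, or enrichment of the regions of interest.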

  16. Microbial genomics, transcriptomics and proteomics: new discoveries in decomposition research using complementary methods.

    PubMed

    Baldrian, Petr; López-Mondéjar, Rubén

    2014-02-01

    Molecular methods for the analysis of biomolecules have undergone rapid technological development in the last decade. The advent of next-generation sequencing methods and improvements in instrumental resolution have enabled the analysis of complex transcriptome, proteome and metabolome data, as well as detailed annotation of microbial genomes. The mechanisms of decomposition by model fungi have been described in unprecedented detail by combining genome sequencing, transcriptomics and proteomics. The increasing number of available fungal and bacterial genomes shows that the genetic potential for decomposing organic matter is widespread among taxonomically diverse microbial taxa, while expression studies document the importance of expression regulation for decomposition efficiency. Importantly, high-throughput nucleic acid methods used to analyze metagenomes and metatranscriptomes indicate the high diversity of decomposer communities in natural habitats and reveal their taxonomic composition. Metaproteomics of natural habitats is now also attracting interest. In combination with advanced analytical techniques to explore the products of decomposition, and with accumulating information on the genomes of environmentally relevant microorganisms, advanced methods in microbial ecophysiology should increase our understanding of the complex processes of organic matter transformation.

  17. A Novel Genome-Information Content-Based Statistic for Genome-Wide Association Analysis Designed for Next-Generation Sequencing Data

    PubMed Central

    Luo, Li; Zhu, Yun

    2012-01-01

    Genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing the association of genomic variants, including common, low-frequency, and rare variants. Current strategies for association studies are well developed for identifying associations of common variants with common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests, which analyze the collective frequency differences of multiple variants between cases and controls, have shifted the variant-by-variant analysis paradigm used in GWAS of common variants toward collective tests of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing the association of the entire allele frequency spectrum of genomic variation with disease. To evaluate the performance of the proposed statistic, we used large-scale simulations based on whole-genome low-coverage pilot data from the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T², the collapsing method, the multivariate and collapsing (CMC) method, the individual χ² test, the weighted-sum statistic, and the variable threshold statistic. Finally, we applied the seven statistics to a published resequencing dataset of the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes from the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812

  18. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    PubMed

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    Genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing the association of genomic variants, including common, low-frequency, and rare variants. Current strategies for association studies are well developed for identifying associations of common variants with common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests, which analyze the collective frequency differences of multiple variants between cases and controls, have shifted the variant-by-variant analysis paradigm used in GWAS of common variants toward collective tests of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing the association of the entire allele frequency spectrum of genomic variation with disease. To evaluate the performance of the proposed statistic, we used large-scale simulations based on whole-genome low-coverage pilot data from the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T², the collapsing method, the multivariate and collapsing (CMC) method, the individual χ² test, the weighted-sum statistic, and the variable threshold statistic. Finally, we applied the seven statistics to a published resequencing dataset of the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes from the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.
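    The collapsing idea behind the group tests compared above can be sketched in a few lines. Each individual is reduced to a single indicator (carries any rare allele in the region or not), and carrier counts in cases and controls are compared with a 2×2 Pearson χ² test with 1 degree of freedom. This illustrates only the basic collapsing step. It is not the genome-information content statistic, and not the full CMC method. The genotype coding (0/1/2 minor-allele counts) is an assumption.

```python
import math

def is_carrier(genotypes, rare_idx):
    """True if the individual carries at least one minor allele at any rare variant."""
    return any(genotypes[i] > 0 for i in rare_idx)

def collapsing_test(case_genos, control_genos, rare_idx):
    """Pearson chi-square (1 df) on carrier status; returns (chi2, p_value)."""
    a = sum(is_carrier(g, rare_idx) for g in case_genos)     # case carriers
    b = len(case_genos) - a                                   # case non-carriers
    c = sum(is_carrier(g, rare_idx) for g in control_genos)  # control carriers
    d = len(control_genos) - c                                # control non-carriers
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a margin is empty; no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

cases = [[1, 0]] * 6 + [[0, 0]] * 4
controls = [[0, 1]] * 2 + [[0, 0]] * 8
print(collapsing_test(cases, controls, rare_idx=[0, 1]))
```

    Collapsing gains power when many rare variants act in the same direction, but it treats every variant identically. That is the limitation noted in the abstract: group tests ignore differences in effect among SNPs.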

  19. Genomic signal analysis of pathogen variability

    NASA Astrophysics Data System (ADS)

    Cristea, Paul Dan

    2006-02-01

    The paper presents results from the study of pathogen variability using genomic signals. Converting symbolic nucleotide sequences into digital signals makes it possible to apply signal processing methods to the analysis of genomic data. The method is particularly well suited to characterizing small genomic sequences, such as those found in viruses and bacteria, and is a promising tool for tracking the variability of pathogens, especially in the context of developing drug resistance. The paper is based on data downloaded from GenBank [32] and comprises results on the variability of the eight segments of the influenza type A, subtype H5N1, virus genome, and of the Hemagglutinin (HA) gene for the H1, H2, H3, H4, H5 and H16 types. Data from human and avian virus isolates are used.
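    One way to turn a nucleotide sequence into a digital signal is to map each base to a complex number. The signal can then be summarized by its cumulated phase or its running-sum path. The sketch below assumes the mapping A = 1+j, C = -1-j, G = -1+j, T = 1-j, which places one nucleotide in each quadrant. Treat this as an illustrative assumption rather than a verified copy of the paper's representation.

```python
import cmath

# Assumed complex mapping: one nucleotide per quadrant of the complex plane.
NUCLEOTIDE_TO_COMPLEX = {"A": 1 + 1j, "C": -1 - 1j, "G": -1 + 1j, "T": 1 - 1j}

def genomic_signal(seq):
    """Complex-valued signal, one sample per nucleotide."""
    return [NUCLEOTIDE_TO_COMPLEX[base] for base in seq.upper()]

def cumulated_phase(seq):
    """Running sum of the phase (in radians) of each nucleotide's complex value."""
    total, phases = 0.0, []
    for z in genomic_signal(seq):
        total += cmath.phase(z)
        phases.append(total)
    return phases

def signal_path(seq):
    """Running sum of the complex values (a 2-D walk over the sequence)."""
    position, path = 0j, []
    for z in genomic_signal(seq):
        position += z
        path.append(position)
    return path

print(cumulated_phase("ACGTTGCA")[-1], signal_path("ACGTTGCA")[-1])
```

    Because each base contributes a fixed phase, point mutations between related isolates show up as local changes in the slope of the cumulated phase. That is what makes such signals convenient for comparing variants.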

  20. Rapid calculation of genomic evaluations for new animals

    USDA-ARS's Scientific Manuscript database

    A method was developed to calculate preliminary genomic evaluations daily or weekly before the release of official monthly evaluations by processing only newly genotyped animals using estimates of SNP effects from the previous official evaluation. To minimize computing time, reliabilities and genomi...
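    Scoring only newly genotyped animals with previously estimated SNP effects amounts to a dot product per animal, which is why it is cheap compared with a full evaluation. The sketch assumes genotypes are coded as 0/1/2 counts of the allele whose effect is estimated, plus an optional overall mean. The animal IDs and effect values are hypothetical. Reliabilities, which the truncated abstract also mentions, are not computed.

```python
def direct_genomic_value(genotypes, snp_effects, mean=0.0):
    """Sum of genotype (0/1/2 allele count) times estimated SNP effect, plus a mean."""
    if len(genotypes) != len(snp_effects):
        raise ValueError("genotype and effect vectors differ in length")
    return mean + sum(g * e for g, e in zip(genotypes, snp_effects))

def evaluate_new_animals(new_animals, snp_effects, mean=0.0):
    """Score only the newly genotyped animals, reusing effects from the last official run."""
    return {animal: direct_genomic_value(g, snp_effects, mean)
            for animal, g in new_animals.items()}

# Hypothetical effects and genotypes for three SNPs.
previous_effects = [0.5, -0.2, 0.1]
print(evaluate_new_animals({"calf_1": [2, 1, 0], "calf_2": [0, 2, 2]}, previous_effects))
```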

  1. Single nucleotide polymorphisms in the Mycobacterium bovis genome resolve phylogenetic relationships

    USDA-ARS's Scientific Manuscript database

    Mycobacterium bovis isolates carry restricted allelic variation yet exhibit a range of disease phenotypes and host preferences. Conventional genotyping methods target small hyper-variable regions of their genome and provide anonymous biallelic information insufficient to develop phylogeny. To resolv...

  2. Repeat-aware modeling and correction of short read errors.

    PubMed

    Yang, Xiao; Aluru, Srinivas; Dorman, Karin S

    2011-02-15

    High-throughput short read sequencing is revolutionizing genomics and systems biology research by enabling cost-effective deep-coverage sequencing of genomes and transcriptomes. Error detection and correction are crucial to many short read sequencing applications, including de novo genome sequencing, genome resequencing, and digital gene expression analysis. Short read error detection is typically carried out by counting the observed frequencies of k-mers in reads and validating those with frequencies exceeding a threshold. For genomes with high repeat content, an erroneous k-mer may be observed frequently if it differs by only a few nucleotides from valid k-mers that occur multiple times in the genome. Error detection and correction have mostly been applied to genomes with low repeat content, and this remains a challenging problem for genomes with high repeat content. We develop a statistical model and a computational method for error detection and correction in the presence of genomic repeats. We propose a method to infer the genomic frequencies of k-mers from their observed frequencies by analyzing the misread relationships among observed k-mers. We also propose a method to estimate the threshold used to validate k-mers whose estimated genomic frequency exceeds it. We demonstrate that superior error detection is achieved using these methods. Furthermore, we break away from the common assumption of uniformly distributed errors within a read, and provide a framework to model position-dependent error occurrence frequencies common to many short read platforms. Lastly, we achieve better error correction in genomes with high repeat content. The software is implemented in C++ and is freely available under the GNU GPL3 license and Boost Software V1.0 license at "http://aluru-sun.ece.iastate.edu/doku.php?id=redeem". 
We introduce a statistical framework to model sequencing errors in next-generation reads, which led to promising results in detecting and correcting errors for genomes with high repeat content.
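    The baseline that this work improves on, counting k-mers across reads and distrusting those seen fewer than a threshold number of times, can be sketched as follows. The threshold and reads are illustrative assumptions. This is the simple fixed-threshold approach described in the abstract, not the repeat-aware model: a misread of a highly repeated k-mer can exceed any fixed threshold and slip through.

```python
from collections import Counter

def count_kmers(reads, k):
    """Observed frequency of every overlapping k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def untrusted_kmers(reads, k, threshold):
    """k-mers observed fewer than `threshold` times; candidate sequencing errors."""
    counts = count_kmers(reads, k)
    return {kmer for kmer, c in counts.items() if c < threshold}

def suspect_positions(read, k, untrusted):
    """Start positions in `read` whose k-mer is untrusted."""
    return [i for i in range(len(read) - k + 1) if read[i:i + k] in untrusted]

reads = ["ACGTAC"] * 5 + ["ACGTTC"]  # the last read carries a likely A->T error
bad = untrusted_kmers(reads, k=4, threshold=3)
print(sorted(bad), suspect_positions("ACGTTC", 4, bad))
```

    Overlapping untrusted k-mers localize the likely error: position 4 of the last read is the only base shared by both suspect windows that is not covered by a trusted k-mer.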

  3. Comparative Genomics of Oral Isolates of Streptococcus mutans by in silico Genome Subtraction Does Not Reveal Accessory DNA Associated with Severe Early Childhood Caries

    PubMed Central

    Argimón, Silvia; Konganti, Kranti; Chen, Hao; Alekseyenko, Alexander V.; Brown, Stuart; Caufield, Page W.

    2014-01-01

    Comparative genomics is a popular method for the identification of microbial virulence determinants, especially since the sequencing of a large number of whole bacterial genomes from pathogenic and non-pathogenic strains has become relatively inexpensive. The bioinformatics pipelines for comparative genomics usually include gene prediction and annotation and can require significant computing power. To circumvent this, we developed a rapid method for genome-scale in silico subtractive hybridization, based on blastn and independent of feature identification and annotation. Whole-genome comparisons by in silico genome subtraction were performed to identify genetic loci specific to Streptococcus mutans strains associated with severe early childhood caries (S-ECC), compared to strains isolated from caries-free (CF) children. The genome similarity of the 20 S. mutans strains included in this study, calculated by Simrank k-mer sharing, ranged from 79.5 to 90.9%, confirming that this is a genetically heterogeneous group of strains. We identified strain-specific genetic elements in 19 strains, with sizes ranging from 200 bp to 39 kb. These elements contained protein-coding regions with functions mostly associated with mobile DNA. We did not, however, identify any genetic loci consistently associated with dental caries, i.e., shared by all the S-ECC strains and absent in the CF strains. Conversely, we did not identify any genetic loci specific to the healthy group. Comparison of previously published genomes from pathogenic and carriage strains of Neisseria meningitidis with our in silico genome subtraction yielded the same set of genes specific to the pathogenic strains, thus validating our method. Our results suggest that S. mutans strains derived from caries-active or caries-free dentitions cannot be differentiated based on the presence or absence of specific genetic elements. 
Our in silico genome subtraction method is available as the Microbial Genome Comparison (MGC) tool, with a user-friendly Java graphical interface. PMID:24291226
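    The principle of in silico genome subtraction is to keep the parts of a query genome that have no match in any reference genome. The published tool uses blastn. The sketch below swaps in exact shared k-mers on both strands as a crude stand-in for sequence similarity. The values of k and the minimum region length are illustrative assumptions, chosen to echo the 200 bp lower bound reported above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def kmer_set(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def genome_subtraction(query, references, k=11, min_len=200):
    """Return (start, end) query regions not covered by any k-mer found in the references.

    k-mers are collected from both strands of each reference. Regions shorter
    than min_len are discarded.
    """
    ref_kmers = set()
    for ref in references:
        ref_kmers |= kmer_set(ref, k)
        ref_kmers |= kmer_set(ref.translate(COMPLEMENT)[::-1], k)  # reverse complement
    covered = [False] * len(query)
    for i in range(len(query) - k + 1):
        if query[i:i + k] in ref_kmers:
            for j in range(i, i + k):
                covered[j] = True
    regions, start = [], None
    for pos, cov in enumerate(covered + [True]):  # sentinel closes a trailing run
        if not cov and start is None:
            start = pos
        elif cov and start is not None:
            if pos - start >= min_len:
                regions.append((start, pos))
            start = None
    return regions

print(genome_subtraction("AAAAATGTGTGGGGG", ["AAAAAGGGGG"], k=3, min_len=3))
```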

  4. A CRISPR view of development

    PubMed Central

    Harrison, Melissa M.; Jenkins, Brian V.; O’Connor-Giles, Kate M.

    2014-01-01

    The CRISPR (clustered regularly interspaced short palindromic repeat)–Cas9 (CRISPR-associated nuclease 9) system is poised to transform developmental biology by providing a simple, efficient method to precisely manipulate the genome of virtually any developing organism. This RNA-guided nuclease (RGN)-based approach already has been effectively used to induce targeted mutations in multiple genes simultaneously, create conditional alleles, and generate endogenously tagged proteins. Illustrating the adaptability of RGNs, the genomes of >20 different plant and animal species as well as multiple cell lines and primary cells have been successfully modified. Here we review the current and potential uses of RGNs to investigate genome function during development. PMID:25184674
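    RNA-guided targeting with the commonly used Streptococcus pyogenes Cas9 requires a 20-nt protospacer immediately followed by an NGG protospacer-adjacent motif (PAM). A first step in designing guides is therefore to scan both strands for such sites. The sketch below does only that. It does not score on-target efficiency or off-target risk, and the function name is our own.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_spcas9_sites(seq, spacer_len=20):
    """Return (strand, start, protospacer) for protospacers followed by an NGG PAM.

    start is the 0-based leftmost coordinate of the protospacer on the + strand.
    A lookahead is used so overlapping sites are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    sites = [("+", m.start(), m.group(1)) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # map the hit on the reverse complement back to + strand coordinates
        sites.append(("-", len(seq) - (m.start() + spacer_len), m.group(1)))
    return sites

print(find_spcas9_sites("AAAACGG", spacer_len=4))
```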

  5. LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms

    PubMed Central

    Money, Daniel; Gardner, Kyle; Migicovsky, Zoë; Schwaninger, Heidi; Zhong, Gan-Yuan; Myles, Sean

    2015-01-01

    Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms such as humans and cattle benefit from high-quality reference genomes and panels of reference genotypes that aid imputation accuracy. In nonmodel organisms, however, genetic and physical maps are often either of poor quality or completely absent, and no panels of reference genotypes are available. There is therefore a need for imputation methods designed specifically for nonmodel organisms in which genomic resources are poorly developed and marker order is unreliable or unknown. Here we introduce LinkImpute, a software package based on a k-nearest neighbor genotype imputation method, LD-kNNi, which is designed for unordered markers. No physical or genetic maps are required, and it is designed to work on unphased genotype data from heterozygous species. It exploits the fact that markers useful for imputation are often not physically close to the missing genotype but rather distributed throughout the genome. Using genotyping-by-sequencing data from diverse and heterozygous accessions of apples, grapes, and maize, we compare LD-kNNi with several genotype imputation methods and show that LD-kNNi is fast, comparable in accuracy to the best existing methods, and exhibits the least bias in allele frequency estimates. PMID:26377960
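    The core idea of LD-based k-nearest-neighbor imputation for unordered markers can be sketched in three steps. First, choose the l markers in highest linkage disequilibrium (here, squared correlation r²) with the target marker, wherever they are in the genome. Second, find the k samples most similar to the incomplete sample on those markers. Third, take a distance-weighted vote. The parameter names l and k, the mean absolute difference as the distance, and the inverse-distance weights are our assumptions. This is not the LinkImpute code.

```python
from collections import defaultdict

def r_squared(x, y):
    """Squared Pearson correlation over samples where both markers are called."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def impute_marker(genos, j, l=5, k=3):
    """Fill missing calls (None) at marker j; genos is a list of samples of 0/1/2/None."""
    def column(m):
        return [sample[m] for sample in genos]

    target = column(j)
    others = [m for m in range(len(genos[0])) if m != j]
    # 1. the l markers in highest LD with marker j; marker order is never used
    ld_markers = sorted(others, key=lambda m: r_squared(target, column(m)), reverse=True)[:l]
    imputed = list(target)
    for i, sample in enumerate(genos):
        if sample[j] is not None:
            continue
        # 2. distance to every sample called at j, over the LD markers only
        neighbours = []
        for o, other in enumerate(genos):
            if o == i or other[j] is None:
                continue
            diffs = [abs(sample[m] - other[m]) for m in ld_markers
                     if sample[m] is not None and other[m] is not None]
            if diffs:
                neighbours.append((sum(diffs) / len(diffs), other[j]))
        neighbours.sort(key=lambda pair: pair[0])
        # 3. inverse-distance weighted vote among the k nearest samples
        votes = defaultdict(float)
        for dist, call in neighbours[:k]:
            votes[call] += 1.0 / (dist + 1e-6)
        if votes:
            imputed[i] = max(votes, key=votes.get)
    return imputed

genos = [[0, 0, 0], [2, 2, 2], [0, 0, 0], [2, 2, None], [0, 0, None]]
print(impute_marker(genos, j=2, l=2, k=2))
```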

  6. Best practices for evaluating single nucleotide variant calling methods for microbial genomics

    PubMed Central

    Olson, Nathan D.; Lund, Steven P.; Colman, Rebecca E.; Foster, Jeffrey T.; Sahl, Jason W.; Schupp, James M.; Keim, Paul; Morrow, Jayne B.; Salit, Marc L.; Zook, Justin M.

    2015-01-01

    Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community's focus in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a focus on critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we will provide an overview of the variant calling procedures and the potential sources of error associated with the methods. We will then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to the applied setting, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards. PMID:26217378
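    A basic building block of variant caller evaluation is comparing a call set against a trusted truth set. The sketch below counts true positives, false positives and false negatives for single nucleotide variants keyed by (position, ref, alt), then derives precision, recall and F1. It assumes SNVs only; indels would need representation normalization before exact matching. The example calls are made up.

```python
def evaluate_calls(called, truth):
    """Precision, recall and F1 of SNV calls keyed by (position, ref, alt)."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)  # calls confirmed by the truth set
    fp = len(called - truth)  # calls absent from the truth set
    fn = len(truth - called)  # true variants the caller missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}

truth = {(10, "A", "G"), (20, "C", "T"), (30, "G", "A")}
called = {(10, "A", "G"), (20, "C", "T"), (40, "T", "C")}
print(evaluate_calls(called, truth))
```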

  7. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    PubMed

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the genomes of humans and a few model organisms, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools capable of easily identifying structures and extracting features from DNA sequences. Repeat-related structures are among the most important features of a DNA sequence. They often have to be masked before protein-coding regions along a DNA sequence can be identified, or before redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features in a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics; it has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient: it allows whole genomes to be analyzed on a PC.
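    A recurrence time here is the distance between successive occurrences of the same word (k-mer) along a sequence. The Python sketch below computes these gaps and a histogram of short recurrence times; a strong peak at multiples of 3, for instance, would be one sign of codon-like periodicity. This is only an illustration of the basic statistic under that reading; the paper's actual codon index is not reproduced here.

```python
from collections import Counter, defaultdict


def recurrence_times(seq, k):
    """Map each k-mer to the gaps between its successive start positions."""
    last_seen = {}
    times = defaultdict(list)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            times[word].append(i - last_seen[word])
        last_seen[word] = i
    return dict(times)


def period_histogram(seq, k, max_t=12):
    """Count recurrence times up to max_t, pooled over all k-mers."""
    counts = Counter()
    for gaps in recurrence_times(seq, k).values():
        counts.update(t for t in gaps if t <= max_t)
    return counts
```

For the toy sequence `"ATGATGATG"` with `k=3`, every recurring word reappears after exactly 3 bases, so the histogram is concentrated entirely at t = 3.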

  8. A novel bioinformatics method for efficient knowledge discovery by BLSOM from big genomic sequence data.

    PubMed

    Bai, Yu; Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi

    2014-01-01

    With the remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of the big sequence data. The Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data such as oligonucleotide composition on one map. By modifying the conventional SOM, we have previously developed Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, solely depending on the oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes and then the compositions in the human and mouse genomes in order to investigate an efficient method for detecting differences between the closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a "genome signature," as well as regions specifically enriched in transcription-factor-binding sequences. Because its classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
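    The input to an oligonucleotide BLSOM is one composition vector per sequence fragment. The Python sketch below cuts a sequence into fixed-size fragments (100 kb by default, following the text) and computes relative frequencies of all 4^k words (k = 5 for pentanucleotides). Skipping windows that contain non-ACGT characters is an assumption; the SOM training step itself is not shown.

```python
from itertools import product


def oligo_frequencies(seq, k=5):
    """Relative frequencies of all 4**k oligonucleotides in lexicographic ACGT order.

    Windows containing characters other than A, C, G or T are skipped.
    """
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = [0] * len(words)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def fragments(seq, size=100_000):
    """Non-overlapping fragments of exactly `size` bases (a short tail is dropped)."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]
```

Each fragment's vector (length 1,024 for pentanucleotides) becomes one data point on the map.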

  9. Nuclease Target Site Selection for Maximizing On-target Activity and Minimizing Off-target Effects in Genome Editing

    PubMed Central

    Lee, Ciaran M; Cradick, Thomas J; Fine, Eli J; Bao, Gang

    2016-01-01

    The rapid advancement in targeted genome editing using engineered nucleases such as ZFNs, TALENs, and CRISPR/Cas9 systems has resulted in a suite of powerful methods that allows researchers to target any genomic locus of interest. A complementary set of design tools has been developed to aid researchers with nuclease design, target site selection, and experimental validation. Here, we review the various tools available for target selection in designing engineered nucleases, and for quantifying nuclease activity and specificity, including web-based search tools and experimental methods. We also elucidate challenges in target selection, especially in predicting off-target effects, and discuss future directions in precision genome editing and its applications. PMID:26750397
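    A first step in many CRISPR/Cas9 target-selection tools is to enumerate candidate sites. For SpCas9, a candidate is a 20-nt protospacer immediately 5' of an NGG PAM, on either strand. The Python sketch below does that scan and applies two common rule-of-thumb filters: moderate GC content, and no TTTT run, which can terminate transcription of guides expressed from a Pol III (e.g. U6) promoter. The 40-80% GC window and all function names are assumptions for illustration, not the behavior of any particular published tool.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_targets(seq, spacer_len=20):
    """List (strand, plus_start, protospacer, pam) for every NGG PAM on both strands.

    plus_start is the 0-based leftmost plus-strand coordinate of the protospacer.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # i is the first base of the 3-nt PAM; the protospacer is s[i - spacer_len:i].
        for i in range(spacer_len, n - 2):
            if s[i + 1:i + 3] == "GG":
                spacer = s[i - spacer_len:i]
                plus_start = i - spacer_len if strand == "+" else n - i
                hits.append((strand, plus_start, spacer, s[i:i + 3]))
    return hits


def gc_fraction(spacer):
    return (spacer.count("G") + spacer.count("C")) / len(spacer)


def passes_basic_filters(spacer, gc_min=0.4, gc_max=0.8):
    """Assumed rule-of-thumb filters: GC within [gc_min, gc_max] and no TTTT run."""
    return gc_min <= gc_fraction(spacer) <= gc_max and "TTTT" not in spacer
```

Candidates that survive such filters would then go to off-target prediction and experimental validation, as the review describes.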

  10. Development of a rapid, robust, and universal picogreen-based method to titer adeno-associated vectors.

    PubMed

    Piedra, Jose; Ontiveros, Maria; Miravet, Susana; Penalva, Cristina; Monfar, Mercè; Chillon, Miguel

    2015-02-01

    Recombinant adeno-associated viruses (rAAVs) are promising vectors in preclinical and clinical assays for the treatment of diseases with gene therapy strategies. Recent technological advances in amplification and purification have allowed the production of highly purified rAAV vector preparations. Although quantitative polymerase chain reaction (qPCR) is the current method of choice for titrating rAAV genomes, it shows high variability. In this work, we report a rapid and robust rAAV titration method based on the quantitation of encapsidated DNA with the fluorescent dye PicoGreen®. This method allows detection from 3×10^10 viral genomes/ml up to 2.4×10^13 viral genomes/ml in a linear range. Compared with dot blot or qPCR, the PicoGreen-based assay has less intra- and interassay variability. Moreover, quantitation is rapid, does not require specific primers or probes, and is independent of the rAAV pseudotype analyzed. In summary, development of this universal rAAV-titering method may have substantive implications in rAAV technology.
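    Fluorescence-based titration generally involves two calculations: reading a DNA concentration off a linear standard curve, then converting mass into genome copies. The Python sketch below shows both under stated assumptions: a least-squares line through DNA standards, an average of 330 g/mol per nucleotide, and a hypothetical 4,700-nt vector genome. How the measured mass maps onto packaged single-stranded genomes depends on the assay conditions; that is not modeled here, and this is not the published protocol's calculation.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
G_PER_MOL_PER_NT = 330.0   # assumed average molar mass of one nucleotide


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a standard curve."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def dna_ng_per_ml(fluorescence, std_ng_per_ml, std_fluorescence, dilution=1.0):
    """Interpolate a sample's DNA concentration (ng/ml) from the standard curve."""
    slope, intercept = fit_line(std_ng_per_ml, std_fluorescence)
    return (fluorescence - intercept) / slope * dilution


def vector_genomes_per_ml(ng_per_ml, genome_nt):
    """Convert a DNA mass concentration into genome copies per ml."""
    grams_per_genome = genome_nt * G_PER_MOL_PER_NT / AVOGADRO
    return ng_per_ml * 1e-9 / grams_per_genome
```

With standards of 0, 100 and 200 ng/ml reading 5, 105 and 205 units, a sample reading 55 units interpolates to 50 ng/ml. For a 4,700-nt genome, that corresponds to roughly 1.9×10^10 genomes/ml under these assumptions.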

  11. Data Mining Approaches for Genomic Biomarker Development: Applications Using Drug Screening Data from the Cancer Genome Project and the Cancer Cell Line Encyclopedia.

    PubMed

    Covell, David G

    2015-01-01

    Developing reliable biomarkers of tumor cell drug sensitivity and resistance can guide hypothesis-driven basic science research and influence pre-therapy clinical decisions. A popular strategy for developing biomarkers characterizes human tumor samples against a range of cancer drug responses that correlate with genomic change, an approach developed largely through the efforts of the Cancer Cell Line Encyclopedia (CCLE) and the Sanger Cancer Genome Project (CGP). The purpose of this study is to provide an independent analysis of these data that aims to vet existing biomarker discoveries and applications and to add novel perspectives to them. Existing and alternative data mining and statistical methods will be used to a) evaluate drug responses of compounds with similar mechanism of action (MOA), b) examine measures of gene expression (GE), copy number (CN) and mutation status (MUT) biomarkers, combined with gene set enrichment analysis (GSEA), for hypothesizing biological processes important for drug response, c) conduct global comparisons of GE, CN and MUT as biomarkers across all drugs screened in the CGP dataset, and d) assess the positive predictive power of CGP-derived GE biomarkers as predictors of drug response in CCLE tumor cells. The perspectives derived from individual and global examinations of GEs, MUTs and CNs confirm existing roles and reveal unique and shared roles for these biomarkers in tumor cell drug sensitivity and resistance. Applying CGP-derived genomic biomarkers to predict the drug response of CCLE tumor cells yields a highly significant ROC, with a positive predictive power of 0.78. The results of this study expand the available data mining and analysis methods for genomic biomarker development and provide additional support for using biomarkers to guide hypothesis-driven basic science research and pre-therapy clinical decisions.
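    The two summary statistics quoted above can be computed from biomarker predictions and observed responses. Positive predictive value is TP / (TP + FP). The area under the ROC curve can be computed as the probability that a randomly chosen responder scores higher than a randomly chosen non-responder, with ties counted as one half (the Mann-Whitney formulation). The Python sketch below uses a hypothetical score threshold of 0.5 to turn scores into predictions; it is not the study's exact procedure.

```python
def positive_predictive_value(predicted, actual):
    """TP / (TP + FP) for boolean predictions against boolean outcomes."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fp) if tp + fp else float("nan")


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.4, 0.3]   # hypothetical biomarker scores
labels = [1, 0, 1, 0]           # hypothetical sensitive (1) / resistant (0) calls
ppv = positive_predictive_value([s >= 0.5 for s in scores], labels)
auc = roc_auc(scores, labels)
```

In this toy example, three of the four responder/non-responder pairs are ordered correctly (AUC = 0.75), and one of the two above-threshold predictions is correct (PPV = 0.5).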

  12. GeneBreak: detection of recurrent DNA copy number aberration-associated chromosomal breakpoints within genes.

    PubMed

    van den Broek, Evert; van Lieshout, Stef; Rausch, Christian; Ylstra, Bauke; van de Wiel, Mark A; Meijer, Gerrit A; Fijneman, Remond J A; Abeln, Sanne

    2016-01-01

    Development of cancer is driven by somatic alterations, including numerical and structural chromosomal aberrations. Currently, several computational methods are available and are widely applied to detect numerical copy number aberrations (CNAs) of chromosomal segments in tumor genomes. However, there is a lack of computational methods that systematically detect structural chromosomal aberrations by virtue of the genomic location of CNA-associated chromosomal breaks and that identify genes that appear to be non-randomly affected by chromosomal breakpoints across (large) series of tumor samples. 'GeneBreak' was developed to systematically identify genes recurrently affected by the genomic location of chromosomal CNA-associated breaks by a genome-wide approach, which can be applied to DNA copy number data obtained by array-Comparative Genomic Hybridization (CGH) or by (low-pass) whole genome sequencing (WGS). First, 'GeneBreak' collects the genomic locations of chromosomal CNA-associated breaks that were previously pinpointed by the segmentation algorithm that was applied to obtain CNA profiles. Next, a tailored annotation approach for breakpoint-to-gene mapping is implemented. Finally, dedicated cohort-based statistics are incorporated, with correction for covariates that influence the probability of a gene being a breakpoint gene. In addition, multiple testing correction is integrated to reveal recurrent breakpoint events. This easy-to-use algorithm, 'GeneBreak', is implemented in R ( www.cran.r-project.org ) and is available from Bioconductor ( www.bioconductor.org/packages/release/bioc/html/GeneBreak.html ).
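    The first two steps described above (collecting CNA-associated breaks from segmented profiles and mapping them to genes) can be sketched in Python. This is not the GeneBreak R/Bioconductor API. Assumptions: a break is placed between two adjacent segments on the same chromosome whose copy-number calls differ, and a gene counts as hit when that break interval overlaps the gene. GeneBreak's covariate-corrected statistics and multiple testing correction are not reproduced.

```python
from collections import Counter


def breakpoints(segments):
    """segments: position-sorted (chrom, start, end, cn_call) tuples for one sample.

    Returns (chrom, left, right) intervals where the copy-number call changes.
    """
    bps = []
    for a, b in zip(segments, segments[1:]):
        if a[0] == b[0] and a[3] != b[3]:
            bps.append((a[0], a[2], b[1]))  # the break lies between a.end and b.start
    return bps


def breakpoint_genes(bps, genes):
    """genes: dict name -> (chrom, start, end). Names overlapped by any break interval."""
    hit = set()
    for chrom, left, right in bps:
        for name, (g_chrom, g_start, g_end) in genes.items():
            if g_chrom == chrom and left <= g_end and right >= g_start:
                hit.add(name)
    return hit


def recurrence(samples, genes):
    """Number of samples in which each gene is affected by at least one break."""
    counts = Counter()
    for segs in samples:
        counts.update(breakpoint_genes(breakpoints(segs), genes))
    return counts
```

The resulting per-gene counts are the raw material that a cohort-level test would then compare against what is expected by chance.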

  13. Genomic Copy Number Variation in Disorders of Cognitive Development

    ERIC Educational Resources Information Center

    Morrow, Eric M.

    2010-01-01

    Objective: To highlight recent discoveries in the area of genomic copy number variation in neuropsychiatric disorders including intellectual disability, autism, and schizophrenia. To emphasize new principles emerging from this area, involving the genetic architecture of disease, pathophysiology, and diagnosis. Method: Review of studies published…

  14. CRISPR-enabled tools for engineering microbial genomes and phenotypes.

    PubMed

    Tarasava, Katia; Oh, Eun Joong; Eckert, Carrie A; Gill, Ryan T

    2018-06-19

    In recent years CRISPR-Cas technologies have revolutionized microbial engineering approaches. Genome editing and non-editing applications of various CRISPR-Cas systems have expanded the throughput and scale of engineering efforts, as well as opened up new avenues for manipulating genomes of non-model organisms. As we expand the range of organisms used for biotechnological applications, we need to develop better, more versatile tools for manipulation of these systems. Here we summarize the current advances in microbial gene editing using CRISPR-Cas based tools, and highlight state-of-the-art methods for high-throughput, efficient genome-scale engineering in model organisms Escherichia coli and Saccharomyces cerevisiae. We also review non-editing CRISPR-Cas applications available for gene expression manipulation, epigenetic remodeling, RNA editing, labeling and synthetic gene circuit design. Finally, we point out the areas of research that need further development in order to expand the range of applications and increase the utility of these new methods.

  15. Genome editing for crop improvement: Challenges and opportunities

    PubMed Central

    Abdallah, Naglaa A; Prakash, Channapatna S; McHughen, Alan G

    2015-01-01

    Genome or gene editing includes several new techniques that help scientists precisely modify genome sequences. The techniques also enable us to alter the regulation of gene expression patterns in a predetermined region and facilitate novel insights into the functional genomics of an organism. The emergence of genome editing has brought considerable excitement, especially among agricultural scientists, because of its simplicity, precision and power: it offers new opportunities to develop improved crop varieties with clear-cut addition of valuable traits or removal of undesirable traits. Research is underway to improve crop varieties with higher yields, stronger stress tolerance, and disease and pest resistance, as well as lower input costs and higher nutritional value. Genome editing encompasses a wide variety of tools using either a site-specific recombinase (SSR) or a site-specific nuclease (SSN) system. Both systems require recognition of a known sequence. The SSN system generates single- or double-strand DNA breaks and activates endogenous DNA repair pathways. SSR technologies, such as Cre/loxP- and Flp/FRT-mediated systems, can knock out or knock in genes in eukaryotic genomes, depending on the orientation of the specific sites (loxP, FRT, etc.) flanking the target site. There are 4 main classes of SSN developed to cleave genomic sequences: meganucleases (homing endonucleases), zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and the CRISPR/Cas nuclease system (clustered regularly interspaced short palindromic repeat/CRISPR-associated protein). Recombinase-mediated genome engineering depends on the recombinase (sub)family and the target site, and induces high frequencies of homologous recombination. 
Improving crops with gene editing provides a range of options: altering only a few nucleotides among the billions found in the genomes of living cells, altering a full allele, or inserting a new gene in a targeted region of the genome. Gene editing is more precise than either conventional crop breeding methods or standard genetic engineering methods. Thus this technology is a very powerful tool that can be used toward securing the world's food supply. In addition to improving the nutritional value of crops, it is the most effective way to produce crops that can resist pests and thrive in tough climates. Genome editing produces 3 types of modifications: Type I involves altering a few nucleotides, Type II involves replacing an allele with a pre-existing one, and Type III allows the insertion of new gene(s) in predetermined regions of the genome. Because most genome-editing techniques leave behind only traces of DNA alteration, limited to a small number of nucleotides, crops created through gene editing could avoid the stringent regulatory procedures commonly associated with GM crop development. For this reason, many scientists believe plants improved with the more precise gene-editing techniques will be more acceptable to the public than transgenic plants. Genome editing promises new crops developed more rapidly with a very low risk of off-target effects. It can be performed in any laboratory with any crop, even those that have complex genomes and are not easily bred using conventional methods. PMID:26930114
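    The dependence of SSR outcomes on site orientation can be made concrete with a symbolic model. Recombination between two directly repeated sites (same orientation) excises the intervening DNA and leaves a single site. Recombination between inverted sites flips the intervening segment. The Python sketch below models a construct as a list of (element, orientation) pairs. The element names are hypothetical, and the model deliberately ignores reversibility: inverted sites can recombine repeatedly, flipping the segment back and forth.

```python
from typing import List, Tuple

Element = Tuple[str, int]  # (element name, orientation: +1 or -1)


def recombine(construct: List[Element], site: str = "loxP") -> List[Element]:
    """Apply one recombination event between exactly two sites of the given type."""
    positions = [i for i, (name, _) in enumerate(construct) if name == site]
    if len(positions) != 2:
        raise ValueError("expected exactly two recombination sites")
    i, j = positions
    if construct[i][1] == construct[j][1]:
        # Direct repeats: the segment between the sites is excised (as a circle),
        # leaving a single site in the chromosome.
        return construct[:i + 1] + construct[j + 1:]
    # Inverted repeats: the segment between the sites is flipped in place.
    flipped = [(name, -strand) for name, strand in reversed(construct[i + 1:j])]
    return construct[:i + 1] + flipped + construct[j:]


# Hypothetical "lox-stop-lox" construct: excising the stop cassette switches the gene on.
lsl = [("promoter", 1), ("loxP", 1), ("stop", 1), ("loxP", 1), ("gene", 1)]
# Hypothetical inversion construct: the gene starts in the antisense orientation.
flex = [("promoter", 1), ("loxP", 1), ("gene", -1), ("loxP", -1)]
```

The same function models Flp/FRT by passing `site="FRT"`. Only the site orientation, not the recombinase, decides between deletion and inversion in this model.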

  16. Methods for Optimizing CRISPR-Cas9 Genome Editing Specificity

    PubMed Central

    Tycko, Josh; Myer, Vic E.; Hsu, Patrick D.

    2016-01-01

    Advances in the development of delivery, repair, and specificity strategies for the CRISPR-Cas9 genome engineering toolbox are helping researchers understand gene function with unprecedented precision and sensitivity. CRISPR-Cas9 also holds enormous therapeutic potential for the treatment of genetic disorders by directly correcting disease-causing mutations. Although the Cas9 protein has been shown to bind and cleave DNA at off-target sites, the field of Cas9 specificity is rapidly progressing with marked improvements in guide RNA selection, protein and guide engineering, novel enzymes, and off-target detection methods. We review important challenges and breakthroughs in the field as a comprehensive practical guide to interested users of genome editing technologies, highlighting key tools and strategies for optimizing specificity. The genome editing community should now strive to standardize such methods for measuring and reporting off-target activity, while keeping in mind that the goal for specificity should be continued improvement and vigilance. PMID:27494557
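    Computational off-target prediction usually starts by listing genomic sites that resemble the guide, since Cas9 can tolerate some mismatches. The brute-force Python sketch below scans both strands of a sequence for SpCas9 NGG PAMs and reports protospacers within a chosen number of mismatches, together with the mismatch positions (index 19 is PAM-proximal). The mismatch cap is an assumption, and this exhaustive scan stands in for the indexed aligners real tools use on whole genomes. It is not any specific published tool.

```python
COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.translate(COMP)[::-1]


def candidate_off_targets(guide, genome, max_mismatches=3):
    """Return (strand, plus_start, protospacer, n_mismatches, mismatch_positions).

    Only sites followed by an NGG PAM are considered; positions index the guide 5'->3'.
    """
    guide, genome = guide.upper(), genome.upper()
    glen, n = len(guide), len(genome)
    hits = []
    for strand, s in (("+", genome), ("-", revcomp(genome))):
        for i in range(glen, n - 2):  # i = first base of the PAM
            if s[i + 1:i + 3] != "GG":
                continue
            proto = s[i - glen:i]
            mm = [p for p in range(glen) if proto[p] != guide[p]]
            if len(mm) <= max_mismatches:
                start = i - glen if strand == "+" else n - i
                hits.append((strand, start, proto, len(mm), mm))
    return sorted(hits, key=lambda h: h[3])


guide = "ACGTACGTACGTACGTACGT"
genome = guide + "AGG" + "TTTT" + "ACGTACGTACGTACGTACGA" + "TGG"
sites = candidate_off_targets(guide, genome)
```

Reporting mismatch positions matters because mismatches near the PAM are generally less well tolerated than PAM-distal ones, so a site's mismatch count alone is an incomplete risk measure.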

  17. Whole-genome multiple displacement amplification from single cells.

    PubMed

    Spits, Claudia; Le Caignec, Cédric; De Rycke, Martine; Van Haute, Lindsey; Van Steirteghem, André; Liebaers, Inge; Sermon, Karen

    2006-01-01

    Multiple displacement amplification (MDA) is a recently described method of whole-genome amplification (WGA) that has proven efficient in the amplification of small amounts of DNA, including DNA from single cells. Compared with PCR-based WGA methods, MDA generates DNA with a higher molecular weight and shows better genome coverage. This protocol was developed for preimplantation genetic diagnosis, and details a method for performing single-cell MDA using the phi29 DNA polymerase. It can also be useful for the amplification of other minute quantities of DNA, such as from forensic material or microdissected tissue. The protocol includes the collection and lysis of single cells, and all materials and steps involved in the MDA reaction. The whole procedure takes 3 h and generates 1-2 µg of DNA from a single cell, which is suitable for multiple downstream applications, such as sequencing, short tandem repeat analysis or array comparative genomic hybridization.
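    To put the 1-2 µg yield in perspective, the fold amplification is the product mass divided by the template mass. The Python arithmetic below assumes a diploid human cell contains roughly 6.6 pg of DNA; that approximate value is supplied here and is not stated in the protocol.

```python
import math


def amplification_fold(yield_ug, template_pg=6.6):
    """Product mass over starting template mass (1 ug = 1e6 pg)."""
    return yield_ug * 1e6 / template_pg


def equivalent_doublings(fold):
    """Number of perfect doublings that would give the same fold increase."""
    return math.log2(fold)


low, high = amplification_fold(1.0), amplification_fold(2.0)
```

Under that assumption, 1-2 µg corresponds to roughly 1.5×10^5 to 3×10^5-fold amplification, or about 17 to 18 equivalent doublings.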

  18. A Statistical Framework for the Functional Analysis of Metagenomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sharon, Itai; Pati, Amrita; Markowitz, Victor

    2008-10-01

    Metagenomic studies consider the genetic makeup of microbial communities as a whole, rather than their individual member organisms. The functional and metabolic potential of microbial communities can be analyzed by comparing the relative abundance of gene families in their collective genomic sequences (metagenome) under different conditions. Such comparisons require accurate estimation of gene family frequencies. They present a statistical framework for assessing these frequencies based on the Lander-Waterman theory developed originally for Whole Genome Shotgun (WGS) sequencing projects. They also provide a novel method for assessing the reliability of the estimations which can be used for removing seemingly unreliable measurements. They tested their method on a wide range of datasets, including simulated genomes and real WGS data from sequencing projects of whole genomes. Results suggest that their framework corrects inherent biases in accepted methods and provides a good approximation to the true statistics of gene families in WGS projects.
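    The Lander-Waterman model treats read start positions as uniformly and independently distributed, so the number of reads touching any region is approximately Poisson. With N reads of length L from a genome of length G, the redundancy is c = NL/G, and a given base is left uncovered with probability e^(-c). For a gene of length g, a read overlaps it by at least t bases when its start lies in a window of g + L - 2t + 1 positions (assuming t <= min(L, g)). The Python sketch below shows that classic reasoning only; it is not the authors' gene-family estimator or their reliability measure.

```python
import math


def expected_coverage(n_reads, read_len, genome_len):
    """Lander-Waterman redundancy c = N * L / G."""
    return n_reads * read_len / genome_len


def fraction_uncovered(coverage):
    """Poisson expectation that a given base is covered by no read: exp(-c)."""
    return math.exp(-coverage)


def prob_gene_sampled(n_reads, read_len, genome_len, gene_len, min_overlap):
    """P(at least one read overlaps the gene by >= min_overlap bases), Poisson approx."""
    if not 0 < min_overlap <= min(read_len, gene_len):
        raise ValueError("min_overlap must be in (0, min(read_len, gene_len)]")
    start_window = gene_len + read_len - 2 * min_overlap + 1
    lam = n_reads * start_window / genome_len
    return 1.0 - math.exp(-lam)


p_deep = prob_gene_sampled(1000, 100, 100_000, 1000, 50)
p_shallow = prob_gene_sampled(10, 100, 100_000, 1000, 50)
```

With 1,000 reads of 100 bp over a 100-kb genome (1x redundancy), a 1-kb gene is almost certainly sampled. With only 10 reads, the probability drops to under 10%, which is why naive counts of gene hits can be biased for rare or short gene families.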

  19. Bridging the Resolution Gap in Structural Modeling of 3D Genome Organization

    PubMed Central

    Marti-Renom, Marc A.; Mirny, Leonid A.

    2011-01-01

    Over the last decade, and especially after the advent of fluorescent in situ hybridization imaging and chromosome conformation capture methods, the availability of experimental data on genome three-dimensional organization has dramatically increased. We now have access to unprecedented details of how genomes organize within the interphase nucleus. Development of new computational approaches to leverage this data has already resulted in the first three-dimensional structures of genomic domains and genomes. Such approaches expand our knowledge of the chromatin folding principles, which has been classically studied using polymer physics and molecular simulations. Our outlook describes computational approaches for integrating experimental data with polymer physics, thereby bridging the resolution gap for structural determination of genomes and genomic domains. PMID:21779160

  20. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    DOE PAGES

    McNair, Katelyn; Edwards, Robert A.

    2015-06-16

    As prokaryotic sequencing increases, a method to quickly and accurately analyze these data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach in which reads corresponding to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, and it offers unique data visualization options.
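    The abstract does not say how GenomePeek extracts reads belonging to the conserved genes. One simple way to do such a recruitment step is to keep reads that share enough k-mers, on either strand, with a marker-gene set. The Python sketch below illustrates that hypothetical approach; the k-mer size and `min_shared` threshold are assumptions, and assembly and alignment are not shown.

```python
COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.translate(COMP)[::-1]


def kmer_set(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(reads, marker_genes, k=21, min_shared=2):
    """Keep reads sharing >= min_shared k-mers (either strand) with any marker gene."""
    index = set()
    for gene in marker_genes:
        index |= kmer_set(gene.upper(), k)
    recruited = []
    for read in reads:
        read = read.upper()
        forward = len(kmer_set(read, k) & index)
        reverse = len(kmer_set(revcomp(read), k) & index)
        if max(forward, reverse) >= min_shared:
            recruited.append(read)
    return recruited


gene = "ATGGCGTACGTTAGCCTAGGATCC"
reads = [gene[0:12], revcomp(gene[10:22]), "TTTTTTTTTTTT"]
kept = recruit_reads(reads, [gene], k=5, min_shared=2)
```

Checking both orientations matters because sequencing reads come from either strand. In the toy example, the reverse-complemented read is still recruited, while the unrelated poly-T read is not.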

  1. Improving prokaryotic transposable elements identification using a combination of de novo and profile HMM methods.

    PubMed

    Kamoun, Choumouss; Payen, Thibaut; Hua-Van, Aurélie; Filée, Jonathan

    2013-10-11

    Insertion Sequences (ISs) and their non-autonomous derivatives (MITEs) are important components of prokaryotic genomes, inducing duplication, deletion, rearrangement or lateral gene transfers. Although ISs and MITEs are relatively simple and basic genetic elements, their detection remains a difficult task due to their remarkable sequence diversity. With the advent of high-throughput genome and metagenome sequencing technologies, the development of fast, reliable and sensitive methods of IS and MITE detection has become an important challenge. So far, almost all studies dealing with prokaryotic transposons have used classical BLAST-based detection methods against reference libraries. Here we introduce alternative detection methods that either take advantage of the structural properties of the elements (de novo methods) or add a library-based method using profile HMM searches. In this study, we have developed three different workflows dedicated to IS and MITE detection: the first two use de novo methods detecting either repeated sequences or the presence of Inverted Repeats; the third uses 28 in-house transposase alignment profiles with HMM search methods. We have compared the respective performances of each method using a reference dataset of 30 archaeal and 30 bacterial genomes in addition to simulated and real metagenomes. Compared to a BLAST-based method using ISFinder as library, de novo methods significantly improve IS and MITE detection. For example, in the 30 archaeal genomes, we discovered 30 new elements (+20%) in addition to the 141 multi-copy elements already detected by the BLAST approach. Many of the new elements correspond to ISs belonging to unknown or highly divergent families. The total number of MITEs even doubled with the discovery of elements displaying very limited sequence similarity with their respective autonomous partners (mainly in the Inverted Repeats of the elements). 
Concerning metagenomes, with the exception of short-read data (<300 bp), for which both techniques seem equally limited, profile HMM searches considerably improve the detection of transposase-encoding genes (up to +50%) while generating low levels of false positives compared to BLAST-based methods. Compared to classical BLAST-based methods, the sensitivity of the de novo and profile HMM methods developed in this study allows better and more reliable detection of transposons in prokaryotic genomes and metagenomes. We believe that future studies involving IS and MITE identification in genomic data should combine at least one de novo and one library-based method, with optimal results obtained by running the two de novo methods in addition to a library-based search. For metagenomic data, profile HMM search should be favored; a BLAST-based step is only useful for the final annotation into groups and families.
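    One structural property a de novo workflow can exploit is the pair of terminal inverted repeats (TIRs) at the ends of many ISs and MITEs: the left end reads like the reverse complement of the right end. The Python sketch below measures how far that correspondence extends inward from the two ends of a candidate element, tolerating a few mismatches. The mismatch and minimum-length thresholds are assumptions, and this single check is not the authors' full workflow.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminal_inverted_repeat(element, max_mismatches=2, min_len=10):
    """Length of the TIR shared by the two ends of `element` (0 if shorter than min_len).

    Position i from the left is paired with position i from the right; a pair matches
    when the right base is the complement of the left base.
    """
    element = element.upper()
    mismatches = 0
    tir_len = 0
    for i in range(len(element) // 2):
        if COMPLEMENT.get(element[-1 - i]) == element[i]:
            tir_len = i + 1  # the TIR extends at least to this matched position
        else:
            mismatches += 1
            if mismatches > max_mismatches:
                break
    return tir_len if tir_len >= min_len else 0


tir = "GGTCTTGACAAC"
rc_tir = "".join(COMPLEMENT[b] for b in reversed(tir))
candidate = tir + "A" * 30 + rc_tir
```

Here the 12-bp TIR is recovered; scanning stops once the unrelated interior accumulates more than two mismatches.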

  2. Identifying and mitigating batch effects in whole genome sequencing data.

    PubMed

    Tom, Jennifer A; Reeder, Jens; Forrest, William F; Graham, Robert R; Hunkapiller, Julie; Behrens, Timothy W; Bhangale, Tushar R

    2017-07-24

    Large sample sets of whole genome sequencing with deep coverage are being generated; however, assembling datasets from different sources inevitably introduces batch effects. These batch effects are not well understood and can be due to changes in the sequencing protocol or bioinformatics tools used to process the data. No systematic algorithms or heuristics exist to detect and filter batch effects or remove associations impacted by batch effects in whole genome sequencing data. We describe key quality metrics, provide a freely available software package to compute them, and demonstrate that identification of batch effects is aided by principal components analysis of these metrics. To mitigate batch effects, we developed new site-specific filters that identified and removed variants that falsely associated with the phenotype due to batch effect. These include filtering based on: a haplotype based genotype correction, a differential genotype quality test, and removing sites with missing genotype rate greater than 30% after setting genotypes with quality scores less than 20 to missing. This method removed 96.1% of unconfirmed genome-wide significant SNP associations and 97.6% of unconfirmed genome-wide significant indel associations. We performed analyses to demonstrate that: 1) these filters impacted variants known to be disease associated, as 2 out of 16 confirmed associations in an AMD candidate SNP analysis were filtered, representing a reduction in power of 12.5%; 2) in the absence of batch effects, these filters removed only a small proportion of variants across the genome (type I error rate of 3%); and 3) in an independent dataset, the method removed 90.2% of unconfirmed genome-wide SNP associations and 89.8% of unconfirmed genome-wide indel associations. Researchers currently do not have effective tools to identify and mitigate batch effects in whole genome sequencing data. We developed and validated methods and filters to address this deficiency.
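    The third filter is stated precisely enough to sketch directly. First set every genotype with quality below 20 to missing. Then drop the site if more than 30% of genotypes are missing. The Python sketch below represents one site as parallel lists of genotype calls and genotype qualities (GQ), with None for calls that are already missing. That representation is an assumption, and the haplotype-based correction and differential genotype quality test are not sketched.

```python
def site_missing_rate(genotypes, gqs, min_gq=20):
    """Missing-genotype rate at one site after masking calls with GQ < min_gq."""
    masked = [
        g if (g is not None and q is not None and q >= min_gq) else None
        for g, q in zip(genotypes, gqs)
    ]
    return sum(g is None for g in masked) / len(masked)


def keep_site(genotypes, gqs, min_gq=20, max_missing=0.30):
    """Keep the site unless its post-masking missing rate exceeds max_missing."""
    return site_missing_rate(genotypes, gqs, min_gq) <= max_missing


gts = ["0/1", "0/0", "1/1", None, "0/1", "0/0", "0/0", "0/1", "1/1", "0/0"]
good_gqs = [99, 10, 50, None, 30, 99, 60, 80, 90, 99]   # 2 of 10 missing after masking
poor_gqs = [5, 10, 50, None, 30, 15, 60, 80, 90, 99]    # 4 of 10 missing after masking
```

Masking before computing missingness is what ties this filter to batch effects: a batch with systematically lower genotype quality produces a cluster of masked calls at the affected sites, pushing them over the threshold.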

  3. Strand-specific transcriptome profiling with directly labeled RNA on genomic tiling microarrays

    PubMed Central

    2011-01-01

    Background With lower manufacturing cost, high spot density, and flexible probe design, genomic tiling microarrays are ideal for comprehensive transcriptome studies. Typically, transcriptome profiling using microarrays involves reverse transcription, which converts RNA to cDNA. The cDNA is then labeled and hybridized to the probes on the arrays, so the RNA signals are detected indirectly. Reverse transcription is known to generate artifactual cDNA, in particular during second-strand cDNA synthesis, leading to false discovery of antisense RNA. To address this issue, we have developed an effective method using directly labeled RNA, thus bypassing cDNA generation. This paper describes this method and its application to the mapping of transcriptome profiles. Results RNA extracted from laboratory cultures of Porphyromonas gingivalis was fluorescently labeled with an alkylation reagent and hybridized directly to probes on genomic tiling microarrays specifically designed for this periodontal pathogen. The generated transcriptome profile was strand-specific and produced signals close to background level in most antisense regions of the genome. In contrast, high levels of signal were detected in the antisense regions when the hybridization was done with cDNA. Five antisense areas were tested with independent strand-specific RT-PCR, and no or negligible amplification was detected, indicating that the strong antisense cDNA signals were experimental artifacts. Conclusions An efficient method was developed for mapping transcriptome profiles specific to both coding strands of a bacterial genome. This method chemically labels and uses extracted RNA directly in microarray hybridization. The generated transcriptome profile was free of cDNA artifactual signals. 
In addition, this method requires fewer processing steps and is potentially more sensitive in detecting small amounts of RNA than conventional end-labeling methods, due to the incorporation of more fluorescent molecules per RNA fragment. PMID:21235785

  4. Genomics: The Science and Technology Behind the Human Genome Project (by Charles R. Cantor and Cassandra L. Smith)

    NASA Astrophysics Data System (ADS)

    Reviewed by Martin J. Serra

    2000-01-01

    Genomics is one of the most rapidly expanding areas of science. This book is an outgrowth of a series of lectures given by one of the former heads (CRC) of the Human Genome Initiative. The book is designed to reach a wide audience, from biologists with little chemical or physical science background through engineers, computer scientists, and physicists with little current exposure to the chemical or biological principles of genetics. The text starts with a basic review of the chemical and biological properties of DNA. However, without either a biochemistry background or a supplemental biochemistry text, this chapter and much of the rest of the text would be difficult to digest. The second chapter is designed to put DNA into the context of the larger chromosomal unit. Specialized chromosomal structures and sequences (centromeres, telomeres) are introduced, leading to a section on chromosome organization and purification. The next 4 chapters cover the physical (hybridization, electrophoresis), chemical (polymerase chain reaction), and biological (genetic) techniques that provide the backbone of genomic analysis. These chapters cover in significant detail the fundamental principles underlying each technique and provide a firm background for the remainder of the text. Chapters 7-9 consider the need and methods for the development of physical maps. Chapter 7 primarily discusses chromosomal localization techniques, including in situ hybridization, FISH, and chromosome painting. The next two chapters focus on the development of libraries and clones. In particular, Chapter 9 considers the limitations of current mapping and clone production. The current state and future of DNA sequencing are covered in the next three chapters. The first considers the current methods of DNA sequencing, especially gel-based methods of analysis, although other possible approaches (mass spectrometry) are introduced. 
Much of the chapter addresses the limitations of current methods, including analysis of error in sequencing and current bottlenecks in the sequencing effort. The next chapter describes the steps necessary to scale current technologies for the sequencing of entire genomes. Chapter 12 examines alternative methods for DNA sequencing. Initially, methods of single-molecule sequencing and sequencing by microscopy are introduced; the majority of the chapter is devoted to the development of DNA sequencing methods using chip microarrays and hybridization. The remaining chapters (13-15) consider the uses and analysis of DNA sequence information. The initial focus is on the identification of genes. Several examples are given of the use of DNA sequence information for diagnosis of inherited or infectious diseases. The sequence-specific manipulation of DNA is discussed in Chapter 14. The final chapter deals with the implications of large-scale sequencing, ranging from methods for identifying genes and finding errors in DNA sequences to the development of computer algorithms for the interpretation of DNA sequence information. The text figures are black and white line drawings that, although clearly done, seem a bit primitive for 1999. While I appreciated the simplicity of the drawings, many students accustomed to more colorful presentations will find them wanting. The four color figures in the center of the text seem an afterthought and add little to the text's clarity. Each chapter has a set of additional reading sources, mostly primary sources. Often, specialized topics are offset into boxes that provide clarification and amplification without cluttering the text. An appendix includes a list of the Web-based database resources. As an undergraduate instructor who has previously taught biochemistry, molecular biology, and a course on the human genome, I found many interesting tidbits and amplifications throughout the text. 
I would recommend this book as a text for an advanced undergraduate or beginning graduate course in genomics. Although the text works through several examples of genetic and genome analysis, additional problem/homework sets would need to be developed to ensure student comprehension. The text steers clear of the ethical implications of the Human Genome Initiative and remains true to its subtitle, The Science and Technology.

  5. The Mitochondrial Genome Sequence and Molecular Phylogeny of the Turkey, Meleagris gallopavo

    PubMed Central

    Guan, Xiaojing; Silva, Pradeepa; Gyenai, Kwaku B.; Xu, Jun; Geng, Tuoyu; Tu, Zhijian; Samuels, David C.; Smith, Edward J.

    2009-01-01

    Summary The mitochondrial genome (mtGenome) has been little studied in the turkey (Meleagris gallopavo), for which no whole mitochondrial genome sequence is publicly available. Here, we used PCR-based methods with 19 primer pairs designed from the chicken and other species to develop a complete turkey mtGenome sequence. The whole turkey mtGenome was 16,717 bp long and showed 85% similarity to the chicken mtGenome. Thirteen protein-coding genes and 24 RNA genes (22 tRNA and 2 rRNA) were annotated. The mtGenome-based phylogenetic analysis suggests that the turkey is most closely related to the chicken, Gallus gallus, and the quail, Coturnix japonica. Given the importance of the mitochondrial genome, the present work adds to the growing genomic resources needed to define the genetic mechanisms that underlie some economic traits in the turkey. PMID:19067672

  6. GenomeGraphs: integrated genomic data visualization with R.

    PubMed

    Durinck, Steffen; Bullard, James; Spellman, Paul T; Dudoit, Sandrine

    2009-01-06

    Biological studies involve a growing number of distinct high-throughput experiments to characterize samples of interest. There is a lack of methods to visualize these different genomic datasets in a versatile manner. In addition, genomic data analysis requires integrated visualization of experimental data along with constantly changing genomic annotation and statistical analyses. We developed GenomeGraphs, as an add-on software package for the statistical programming environment R, to facilitate integrated visualization of genomic datasets. GenomeGraphs uses the biomaRt package to perform on-line annotation queries to Ensembl and translates these to gene/transcript structures in viewports of the grid graphics package. This allows genomic annotation to be plotted together with experimental data. GenomeGraphs can also be used to plot custom annotation tracks in combination with different experimental data types together in one plot using the same genomic coordinate system. GenomeGraphs is a flexible and extensible software package which can be used to visualize a multitude of genomic datasets within the statistical programming environment R.

  7. Detection of DNA Methylation by Whole-Genome Bisulfite Sequencing.

    PubMed

    Li, Qing; Hermanson, Peter J; Springer, Nathan M

    2018-01-01

    DNA methylation plays an important role in regulating the expression of transposons and genes. Various methods have been developed to assay DNA methylation levels. Bisulfite sequencing is considered the "gold standard" for single-base-resolution measurement of DNA methylation levels. Coupled with next-generation sequencing, whole-genome bisulfite sequencing (WGBS) allows DNA methylation to be evaluated on a genome-wide scale. Here, we describe a protocol for WGBS in plant species with large genomes. This protocol has been successfully applied to assay genome-wide DNA methylation levels in maize and barley. It has also been successfully coupled with sequence capture technology to assay DNA methylation levels in a targeted set of genomic regions.
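    WGBS rests on the standard bisulfite principle: unmethylated cytosines are converted and read as T, while methylated cytosines remain C. The sketch below derives a per-cytosine methylation level from aligned reads, assuming forward-strand reads already placed on the reference without indels. The function name `methylation_levels` and the toy data are hypothetical, and this is not the authors' protocol, which also covers library preparation, strand handling, quality filtering and, in plants, separate CG, CHG and CHH contexts.

```python
from collections import Counter


def methylation_levels(reference, alignments):
    """Return {position: methylated / (methylated + unmethylated)} for every
    reference C covered by at least one read.

    Assumption (hypothetical simplification): `alignments` is a list of
    (start_offset, read_sequence) pairs aligned to the forward strand with
    no indels, so read base i sits at reference position start + i.
    """
    reference = reference.upper()
    methylated = Counter()
    unmethylated = Counter()
    for start, read in alignments:
        for offset, base in enumerate(read.upper()):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue  # only reference cytosines are informative here
            if base == "C":
                methylated[pos] += 1  # protected from conversion -> methylated
            elif base == "T":
                unmethylated[pos] += 1  # converted by bisulfite -> unmethylated
    return {
        pos: methylated[pos] / (methylated[pos] + unmethylated[pos])
        for pos in set(methylated) | set(unmethylated)
    }


if __name__ == "__main__":
    ref = "ACGTTCGA"
    reads = [(0, "ACGTTCGA"), (0, "ATGTTCGA"), (2, "GTTTGA")]
    print(sorted(methylation_levels(ref, reads).items()))
```

    In this toy example the C at position 1 is read once as C and once as T (level 0.5), and the C at position 5 is read twice as C and once as T (level 2/3).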

  8. QUAST: quality assessment tool for genome assemblies.

    PubMed

    Gurevich, Alexey; Saveliev, Vladislav; Vyahhi, Nikolay; Tesler, Glenn

    2013-04-15

    Limitations of genome sequencing techniques have led to dozens of assembly algorithms, none of which is perfect. A number of methods for comparing assemblers have been developed, but none is yet a recognized benchmark. Further, most existing methods for comparing assemblies are only applicable to new assemblies of finished genomes; the problem of evaluating assemblies of previously unsequenced species has not been adequately considered. Here, we present QUAST, a quality assessment tool for evaluating and comparing genome assemblies. This tool improves on leading assembly comparison software with new ideas and quality metrics. QUAST can evaluate assemblies both with and without a reference genome. QUAST produces many reports, summary tables and plots to help scientists in their research and in their publications. In this study, we used QUAST to compare several genome assemblers on three datasets. QUAST tables and plots for all of them are available in the Supplementary Material, and interactive versions of these reports are on the QUAST website. Availability: http://bioinf.spbau.ru/quast. Supplementary data are available at Bioinformatics online.
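    Among the contiguity statistics QUAST reports are N50 and, when a reference genome is supplied, NG50. As a rough illustration of what such a metric measures, here is a minimal sketch of the usual Nx/NGx definition. The function `nx_metric` is hypothetical and is neither QUAST's code nor its API.

```python
def nx_metric(contig_lengths, fraction=0.5, reference_length=None):
    """Nx (or NGx when reference_length is given): sort contigs longest first
    and return the length of the contig at which the running total first
    reaches `fraction` of the assembly length (or of the reference length)."""
    lengths = sorted((length for length in contig_lengths if length > 0), reverse=True)
    total = sum(lengths) if reference_length is None else reference_length
    target = fraction * total
    running = 0
    for length in lengths:
        running += length
        if running >= target:
            return length
    return None  # assembly too short to reach the target (possible for NGx)


if __name__ == "__main__":
    contigs = [100, 80, 60, 40, 20]
    print(nx_metric(contigs))  # N50 -> 80
    print(nx_metric(contigs, reference_length=400))  # NG50 -> 60
```

    Normalizing by the reference length (NGx) rather than the assembly's own total keeps an assembly from looking more contiguous simply because it is incomplete.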

  9. Microbial genomic island discovery, visualization and analysis.

    PubMed

    Bertelli, Claire; Tilley, Keith E; Brinkman, Fiona S L

    2018-06-03

    Horizontal gene transfer (also called lateral gene transfer) is a major mechanism for microbial genome evolution, enabling rapid adaptation and survival in specific niches. Genomic islands (GIs), commonly defined as clusters of bacterial or archaeal genes of probable horizontal origin, are of particular medical, environmental and/or industrial interest, as they disproportionately encode virulence factors and some antimicrobial resistance genes and may harbor entire metabolic pathways that confer a specific adaptation (solvent resistance, symbiosis properties, etc.). As large-scale analyses of microbial genomes increase, such as genomic epidemiology investigations of infectious disease outbreaks in public health, there is growing appreciation of the need to accurately predict and track GIs. Over the past decade, numerous computational tools have been developed to tackle the challenges inherent in accurate GI prediction. We review here the main types of GI prediction methods and discuss their advantages and limitations for routine analysis of microbial genomes in this era of rapid whole-genome sequencing. We assess 20 GI prediction software methods that use sequence-composition bias to identify GIs, using a reference GI data set from 104 genomes obtained with an independent comparative genomics approach. Finally, we present guidelines to assist researchers in effectively identifying these key genomic regions.
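    Many GI predictors exploit sequence-composition bias. The simplest form of that idea is a GC-content scan: slide a window along the genome and flag windows whose GC fraction deviates strongly from the genome-wide distribution. The sketch below is a simplified illustration under that assumption, not any of the 20 evaluated tools, which typically use richer signals such as codon usage or oligonucleotide frequencies. The window size, step and z-score cutoff are hypothetical defaults.

```python
import statistics


def gc_content(seq):
    """Fraction of G and C bases in `seq` (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_gc_outlier_windows(genome, window=5000, step=2500, z_cutoff=2.0):
    """Return (start, end, gc) for windows whose GC content lies more than
    `z_cutoff` population standard deviations from the mean over all windows.

    Windows start every `step` bases; a trailing stretch shorter than one
    step may be left unscanned in this simplified version.
    """
    windows = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = genome[start:start + window]
        windows.append((start, start + len(chunk), gc_content(chunk)))
    values = [gc for _, _, gc in windows]
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # perfectly uniform composition: nothing stands out
    return [w for w in windows if abs(w[2] - mean) / sd > z_cutoff]


if __name__ == "__main__":
    # Toy genome: 50% GC background with one AT-only 100-bp insert.
    genome = "ATGC" * 250 + "A" * 100 + "ATGC" * 225
    print(flag_gc_outlier_windows(genome, window=100, step=100))
```

    A z-score against the genome's own windows, rather than a fixed GC cutoff, lets the same threshold work for genomes with very different base compositions.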

  10. Natural product discovery: past, present, and future.

    PubMed

    Katz, Leonard; Baltz, Richard H

    2016-03-01

    Microorganisms have provided abundant sources of natural products that have been developed as commercial products for human medicine, animal health, and plant crop protection. In the early years of natural product discovery from microorganisms (The Golden Age), new antibiotics were found with relative ease from low-throughput fermentation and whole cell screening methods. Later, molecular genetic and medicinal chemistry approaches were applied to modify and improve the activities of important chemical scaffolds, and more sophisticated screening methods were directed at target disease states. In the 1990s, the pharmaceutical industry moved to high-throughput screening of synthetic chemical libraries against many potential therapeutic targets, including new targets identified from the human genome sequencing project, largely to the exclusion of natural products, and discovery rates dropped dramatically. Nonetheless, natural products continued to provide key scaffolds for drug development. In the current millennium, genome sequencing revealed that microbes with large genomes have the capacity to produce about ten times as many secondary metabolites as was previously recognized. Indeed, the most gifted actinomycetes have the capacity to produce around 30-50 secondary metabolites. With the precipitous drop in the cost of genome sequencing, it is now feasible to sequence thousands of actinomycete genomes to identify the "biosynthetic dark matter" as a source for the discovery of novel secondary metabolites. Advances in bioinformatics, mass spectrometry, proteomics, transcriptomics, metabolomics and gene expression are driving the new field of microbial genome mining for applications in natural product discovery and development.

  11. Plant functional genomics

    NASA Astrophysics Data System (ADS)

    Holtorf, Hauke; Guitton, Marie-Christine; Reski, Ralf

    2002-04-01

    Functional genome analysis of plants has entered the high-throughput stage. The complete genome information from key species such as Arabidopsis thaliana and rice is now available and will further boost the application of a range of new technologies to functional plant gene analysis. To broadly assign functions to unknown genes, different fast and multiparallel approaches are currently being used and developed. These new technologies are based on known methods but are adapted and improved to accommodate comprehensive, large-scale gene analysis; that is, such techniques are novel in the sense that their design allows researchers to analyse many genes at the same time and at an unprecedented pace. Such methods allow analysis of the different constituents of the cell that help to deduce gene function, namely the transcripts, proteins and metabolites. Similarly, the phenotypic variations of entire mutant collections can now be analysed much faster and more efficiently than before. The different methodologies have developed into their own fields within the functional genomics technological platform and are termed transcriptomics, proteomics, metabolomics and phenomics. Gene function, however, cannot be inferred using only one such approach. Rather, only by bringing together all the information collected by different functional genomic tools will one be able to unequivocally assign functions to unknown plant genes. This review focuses on current technical developments and their impact on the field of plant functional genomics. The lower plant Physcomitrella is introduced as a new model system for gene function analysis, owing to its high rate of homologous recombination.

  12. A Comparative Analysis of the Lyve-SET Phylogenomics Pipeline for Genomic Epidemiology of Foodborne Pathogens

    PubMed Central

    Katz, Lee S.; Griswold, Taylor; Williams-Newkirk, Amanda J.; Wagner, Darlene; Petkau, Aaron; Sieffert, Cameron; Van Domselaar, Gary; Deng, Xiangyu; Carleton, Heather A.

    2017-01-01

    Modern epidemiology of foodborne bacterial pathogens in industrialized countries relies increasingly on whole genome sequencing (WGS) techniques. As opposed to profiling techniques such as pulsed-field gel electrophoresis, WGS requires a variety of computational methods. Since 2013, United States agencies responsible for food safety, including the CDC, FDA, and USDA, have been performing WGS on all Listeria monocytogenes found in clinical, food, and environmental samples. Each year, more genomes of other foodborne pathogens such as Escherichia coli, Campylobacter jejuni, and Salmonella enterica are being sequenced. Comparing thousands of genomes across an entire species requires a fast method with coarse resolution; however, capturing the fine details of highly related isolates requires a computationally heavy and sophisticated algorithm. Most L. monocytogenes investigations employing WGS depend on being able to identify an outbreak clade whose inter-genomic distances are less than an empirically determined threshold. When a difference of only a few single nucleotide polymorphisms (SNPs) can help distinguish genomes that are likely outbreak-associated from those that are less likely to be associated, a fine-resolution method is required. To achieve this level of resolution, we have developed Lyve-SET, a high-quality SNP pipeline. We evaluated Lyve-SET, along with four other SNP pipelines that have been used in outbreak investigations or similar scenarios, by retrospectively investigating 12 outbreak data sets. To compare these pipelines, several distance- and phylogeny-based comparison methods were applied, which collectively showed that multiple pipelines were able to identify most outbreak clusters and strains.
Currently in the US PulseNet system, whole genome multi-locus sequence typing (wgMLST) is the preferred primary method for foodborne WGS cluster detection and outbreak investigation due to its ability to name standardized genomic profiles, its central database, and its ability to be run in a graphical user interface. However, creating a functional wgMLST scheme requires extended up-front development and subject-matter expertise. When a scheme does not exist or when the highest resolution is needed, SNP analysis is used. Using three Listeria outbreak data sets, we demonstrated the concordance between Lyve-SET SNP typing and wgMLST. Availability: Lyve-SET can be found at https://github.com/lskatz/Lyve-SET. PMID:28348549
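    The threshold idea in this abstract, grouping isolates whose inter-genomic distances fall below an empirically chosen cutoff, can be sketched in a few lines. This is a hypothetical illustration, not part of Lyve-SET: it assumes an already computed SNP alignment held as equal-length strings, ignores ambiguous ('N') and gap ('-') positions when counting differences, and uses single-linkage grouping, which is only one of several possible clustering choices. The isolate names and threshold values are made up.

```python
from itertools import combinations


def snp_distance(a, b):
    """Number of aligned positions where both isolates have a called base
    (not 'N' or '-') and the bases differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x not in "N-" and y not in "N-" and x != y
    )


def threshold_clusters(alignment, threshold):
    """Single-linkage groups: two isolates are linked when their SNP distance
    is strictly less than `threshold`. `alignment` maps name -> sequence."""
    names = sorted(alignment)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if snp_distance(alignment[a], alignment[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    aln = {
        "iso1": "ACGTACGTAC",
        "iso2": "ACGTACGTAT",
        "iso3": "ACGAACGTAT",
        "iso4": "TTTTTTTTTT",
    }
    print(threshold_clusters(aln, threshold=2))
```

    With a cutoff of 2, iso1 and iso3 (two SNPs apart) still end up in one group because each is one SNP from iso2; that chaining is the characteristic behavior of single linkage and is worth keeping in mind when choosing a threshold.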

  13. Developing a Common Framework for Evaluating the Implementation of Genomic Medicine Interventions in Clinical Care: The IGNITE Network’s Common Measures Working Group

    PubMed Central

    Orlando, Lori A.; Sperber, Nina R.; Voils, Corrine; Nichols, Marshall; Myers, Rachel A.; Wu, R. Ryanne; Rakhra-Burris, Tejinder; Levy, Kenneth D.; Levy, Mia; Pollin, Toni I.; Guan, Yue; Horowitz, Carol R.; Ramos, Michelle; Kimmel, Stephen E.; McDonough, Caitrin W.; Madden, Ebony B.; Damschroder, Laura J.

    2017-01-01

    Purpose Implementation research provides a structure for evaluating the clinical integration of genomic medicine interventions. This paper describes the Implementing GeNomics In PracTicE (IGNITE) Network’s efforts to promote: 1) a broader understanding of genomic medicine implementation research; and 2) the sharing of knowledge generated in the network. Methods To facilitate this goal the IGNITE Network Common Measures Working Group (CMG) members adopted the Consolidated Framework for Implementation Research (CFIR) to guide their approach to: identifying constructs and measures relevant to evaluating genomic medicine as a whole, standardizing data collection across projects, and combining data in a centralized resource for cross network analyses. Results CMG identified ten high-priority CFIR constructs as important for genomic medicine. Of those, eight didn’t have standardized measurement instruments. Therefore, we developed four survey tools to address this gap. In addition, we identified seven high-priority constructs related to patients, families, and communities that did not map to CFIR constructs. Both sets of constructs were combined to create a draft genomic medicine implementation model. Conclusion We developed processes to identify constructs deemed valuable for genomic medicine implementation and codified them in a model. These resources are freely available to facilitate knowledge generation and sharing across the field. PMID:28914267

  14. Developing a common framework for evaluating the implementation of genomic medicine interventions in clinical care: the IGNITE Network's Common Measures Working Group.

    PubMed

    Orlando, Lori A; Sperber, Nina R; Voils, Corrine; Nichols, Marshall; Myers, Rachel A; Wu, R Ryanne; Rakhra-Burris, Tejinder; Levy, Kenneth D; Levy, Mia; Pollin, Toni I; Guan, Yue; Horowitz, Carol R; Ramos, Michelle; Kimmel, Stephen E; McDonough, Caitrin W; Madden, Ebony B; Damschroder, Laura J

    2018-06-01

    Purpose Implementation research provides a structure for evaluating the clinical integration of genomic medicine interventions. This paper describes the Implementing Genomics in Practice (IGNITE) Network's efforts to promote (i) a broader understanding of genomic medicine implementation research and (ii) the sharing of knowledge generated in the network. Methods To facilitate this goal, the IGNITE Network Common Measures Working Group (CMG) members adopted the Consolidated Framework for Implementation Research (CFIR) to guide its approach to identifying constructs and measures relevant to evaluating genomic medicine as a whole, standardizing data collection across projects, and combining data in a centralized resource for cross-network analyses. Results CMG identified 10 high-priority CFIR constructs as important for genomic medicine. Of those, eight did not have standardized measurement instruments. Therefore, we developed four survey tools to address this gap. In addition, we identified seven high-priority constructs related to patients, families, and communities that did not map to CFIR constructs. Both sets of constructs were combined to create a draft genomic medicine implementation model. Conclusion We developed processes to identify constructs deemed valuable for genomic medicine implementation and codified them in a model. These resources are freely available to facilitate knowledge generation and sharing across the field.

  15. Integration of genomic medicine into pathology residency training: the Stanford open curriculum.

    PubMed

    Schrijver, Iris; Natkunam, Yasodha; Galli, Stephen; Boyd, Scott D

    2013-03-01

    Next-generation sequencing methods provide an opportunity for molecular pathology laboratories to perform genomic testing that is far more comprehensive than single-gene analyses. Genome-based test results are expected to develop into an integral component of diagnostic clinical medicine and to provide the basis for individually tailored health care. To achieve these goals, rigorous interpretation of high-quality data must be informed by the medical history and the phenotype of the patient. The discipline of pathology is well positioned to implement genome-based testing and to interpret its results, but new knowledge and skills must be included in the training of pathologists to develop expertise in this area. Pathology residents should be trained in emerging technologies to integrate genomic test results appropriately with more traditional testing, to accelerate clinical studies using genomic data, and to help develop appropriate standards of data quality and evidence-based interpretation of these test results. We have created a genomic pathology curriculum as a first step in helping pathology residents build a foundation for the understanding of genomic medicine and its implications for clinical practice. This curriculum is freely accessible online. Copyright © 2013 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  16. CSP- 5th Champalimaud Neuroscience Symposium

    DTIC Science & Technology

    2017-03-20

    ...combination of circuit neuroscience and state-of-the-art genomic engineering approaches such as CRISPR is likely to lead to a new wave of exciting... USA presented the power of zebrafish for developing novel technologies. He showed how creative the use of genome engineering methods based on CRISPR...

  17. Simultaneous non-contiguous deletions using large synthetic DNA and site-specific recombinases

    PubMed Central

    Krishnakumar, Radha; Grose, Carissa; Haft, Daniel H.; Zaveri, Jayshree; Alperovich, Nina; Gibson, Daniel G.; Merryman, Chuck; Glass, John I.

    2014-01-01

    Toward achieving rapid and large-scale genome modification directly in a target organism, we have developed a new genome engineering strategy that combines bioinformatics-aided design, large synthetic DNA and site-specific recombinases. Using Cre recombinase, we swapped a target 126-kb segment of the Escherichia coli genome with a 72-kb synthetic DNA cassette, thereby effectively eliminating over 54 kb of genomic DNA from three non-contiguous regions in a single recombination event. We observed complete replacement of the native sequence with the modified synthetic sequence through the action of the Cre recombinase, with no competition from homologous recombination. Because of the versatility and high efficiency of the Cre-lox system, this method can be used in any organism in which this system is functional and can also be adapted for use with other highly precise genome engineering systems. Compared to present-day iterative approaches in genome engineering, we anticipate this method will greatly speed up the creation of reduced, modularized and optimized genomes through the integration of deletion analysis data, transcriptomics, synthetic biology and site-specific recombination. PMID:24914053

  18. Advances in plant gene-targeted and functional markers: a review

    PubMed Central

    2013-01-01

    Public genomic databases have provided new directions for molecular marker development and initiated a shift in the types of PCR-based techniques commonly used in plant science. Alongside commonly used arbitrarily amplified DNA markers, other methods have been developed. Targeted fingerprinting marker techniques are based on the well-established practices of arbitrarily amplified DNA methods, but employ novel methodological innovations such as the incorporation of gene or promoter elements in the primers. These markers provide good reproducibility and increased resolution by the concurrent incidence of dominant and co-dominant bands. Despite their promising features, these semi-random markers suffer from possible problems of collision and non-homology analogous to those found with randomly generated fingerprints. Transposable elements, present in abundance in plant genomes, may also be used to generate fingerprints. These markers provide increased genomic coverage by utilizing specific targeted sites and produce bands that mostly seem to be homologous. The biggest drawback with most of these techniques is that prior genomic information about retrotransposons is needed for primer design, prohibiting universal applications. Another class of recently developed methods exploits length polymorphism present in arrays of multi-copy gene families such as cytochrome P450 and β-tubulin genes to provide cross-species amplification and transferability. A specific class of marker makes use of common features of plant resistance genes to generate bands linked to a given phenotype, or to reveal genetic diversity. Conserved DNA-based strategies have limited genome coverage and may fail to reveal genetic diversity, while resistance genes may be under specific evolutionary selection. Markers may also be generated from functional and/or transcribed regions of the genome using different gene-targeting approaches coupled with the use of RNA information. 
Such techniques have the potential to generate phenotypically linked functional markers, especially when fingerprints are generated from the transcribed or expressed region of the genome. It is to be expected that these recently developed techniques will generate larger datasets, but their shortcomings should also be acknowledged and carefully investigated. PMID:23406322

  19. Current Advances in Detection and Treatment of Babesiosis

    PubMed Central

    Mosqueda, J; Olvera-Ramírez, A; Aguilar-Tipacamú, G; Cantó, GJ

    2012-01-01

    Babesiosis is a disease with a worldwide distribution that affects many mammalian species, principally cattle and humans. The major impact occurs in the cattle industry, where bovine babesiosis has had a huge economic effect due to the loss of meat and beef production from infected animals and to animal deaths. To these costs must now be added the high cost of tick control, disease detection, prevention and treatment. In almost a century and a quarter since the first report of the disease, there is still no safe and efficient vaccine available, there are limited chemotherapeutic choices, and there are few low-cost, reliable and fast detection methods. Detection and treatment are important tools to control babesiosis. Microscopy detection methods are still the cheapest and fastest methods used to identify Babesia parasites, although their sensitivity and specificity are limited. Newer immunological methods are being developed and offer faster, more sensitive and more specific options than conventional methods, although direct immunological diagnosis of parasite antigens in host tissues is still missing. Detection methods based on nucleic acid identification and amplification are the most sensitive and reliable techniques available today; importantly, most of those methodologies were developed before the genomics and bioinformatics era, which leaves ample room for optimization. For years, babesiosis treatment has been based on very few drugs, such as imidocarb or diminazene aceturate. Recently, several pharmacological compounds were developed and evaluated, offering new options to control the disease. With the complete sequence of the Babesia bovis genome and the B. bigemina genome project in progress, the post-genomic era sheds new light on the development of diagnostic methods and new chemotherapy targets.
In this review, we will present the current advances in detection and treatment of babesiosis in cattle and other animals, with additional reference to several apicomplexan parasites. PMID:22360483

  20. Direct detection of methylation in genomic DNA

    PubMed Central

    Bart, A.; van Passel, M. W. J.; van Amsterdam, K.; van der Ende, A.

    2005-01-01

    The identification of methylated sites on bacterial genomic DNA would be a useful tool to study the major roles of DNA methylation in prokaryotes: distinction of self and nonself DNA, direction of post-replicative mismatch repair, control of DNA replication and cell cycle, and regulation of gene expression. Three types of methylated nucleobases are known: N6-methyladenine, 5-methylcytosine and N4-methylcytosine. The aim of this study was to develop a method to detect all three types of DNA methylation in complete genomic DNA. It was previously shown that N6-methyladenine and 5-methylcytosine in plasmid and viral DNA can be detected by intersequence trace comparison of methylated and unmethylated DNA. We extended this method to include N4-methylcytosine detection in both in vitro and in vivo methylated DNA. Furthermore, application of intersequence trace comparison was extended to bacterial genomic DNA. Finally, we present evidence that intrasequence comparison suffices to detect methylated sites in genomic DNA. In conclusion, we present a method to detect all three natural types of DNA methylation in bacterial genomic DNA. This provides the possibility to define the complete methylome of any prokaryote. PMID:16091626

  1. Challenges in NMR-based structural genomics

    NASA Astrophysics Data System (ADS)

    Sue, Shih-Che; Chang, Chi-Fon; Huang, Yao-Te; Chou, Ching-Yu; Huang, Tai-huang

    2005-05-01

    Understanding the functions of the vast number of proteins encoded in the many genomes that have recently been completely sequenced is the main challenge for biologists in the post-genomics era. Since the function of a protein is determined by its exact three-dimensional structure, it is paramount to determine the 3D structures of all proteins. This need has driven structural biologists to undertake the structural genomics project, aimed at determining the structures of all known proteins. Several centers for structural genomics studies have been established throughout the world. Nuclear magnetic resonance (NMR) spectroscopy has played a major role in determining protein structures at atomic detail and in a physiologically relevant solution state. Since the number of new genes being discovered daily far exceeds the number of structures determined by both NMR and X-ray crystallography, a high-throughput method for speeding up protein structure determination is essential for the success of the structural genomics effort. In this article we describe NMR methods currently employed for protein structure determination. We also describe methods under development that may drastically increase throughput, and we point out areas where biophysicists have opportunities to make significant contributions to this important field.

  2. Revealing Alzheimer's disease genes spectrum in the whole-genome by machine learning.

    PubMed

    Huang, Xiaoyan; Liu, Hankui; Li, Xinming; Guan, Liping; Li, Jiankang; Tellier, Laurent Christian Asker M; Yang, Huanming; Wang, Jian; Zhang, Jianguo

    2018-01-10

    Alzheimer's disease (AD) is an important, progressive neurodegenerative disease with a complex genetic architecture. A key goal of biomedical research is to identify disease risk genes and to elucidate their function in the development of disease. For this purpose, the AD-associated gene set needs to be expanded. In past research, prediction methods for AD-related genes have been limited in their exploration of the target genome regions. Here we present a genome-wide method for predicting AD candidate genes: a machine learning approach (support vector machine, SVM) that integrates gene expression data with human brain-specific gene network data to discover the full spectrum of AD genes across the whole genome. We classified AD candidate genes with an accuracy of 84.56% and an area under the receiver operating characteristic (ROC) curve of 94%. Our approach supplements the spectrum of AD-associated genes, drawing on more than 20,000 genes on a genome-wide scale. In this study, we have elucidated the whole-genome spectrum of AD using a machine learning approach. We expect the resulting candidate gene catalogue to provide a more comprehensive annotation of AD for researchers.
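    The two figures reported for the classifier measure different things: accuracy scores hard class calls, while the ROC AUC scores the ranking produced by the classifier's continuous outputs. A minimal sketch of both metrics follows; it does not reproduce the authors' SVM, features or data, and the function names, toy labels and 0.5 cutoff are hypothetical. AUC is computed through its rank (Mann-Whitney) interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
def accuracy(labels, predictions):
    """Fraction of hard 0/1 predictions that match the true 0/1 labels."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("need equal-length, non-empty inputs")
    return sum(1 for y, p in zip(labels, predictions) if y == p) / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via its rank (Mann-Whitney) interpretation."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    y_score = [0.9, 0.4, 0.6, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # hypothetical 0.5 cutoff
    print(accuracy(y_true, y_pred), roc_auc(y_true, y_score))  # 0.5 0.75
```

    The toy example shows why the two numbers can diverge: the same scores give only 50% accuracy at a 0.5 cutoff yet rank three of the four positive/negative pairs correctly (AUC 0.75). The pairwise loop is quadratic and is meant for clarity; a sort-based rank computation scales better.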

  3. Use of Computational Functional Genomics in Drug Discovery and Repurposing for Analgesic Indications.

    PubMed

    Lötsch, Jörn; Kringel, Dario

    2018-06-01

    The novel research area of functional genomics investigates biochemical, cellular, or physiological properties of gene products with the goal of understanding the relationship between the genome and the phenotype. These developments have made analgesic drug research a data-rich discipline that can be mastered only by making use of parallel developments in computer science, including the establishment of knowledge bases, mining methods for big data, machine learning, and artificial intelligence (Table ), which are introduced by example in the following. © 2018 The Authors Clinical Pharmacology & Therapeutics published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.

  4. Large-scale chromosome folding versus genomic DNA sequences: A discrete double Fourier transform technique.

    PubMed

    Chechetkin, V R; Lobzin, V V

    2017-08-07

    The use of state-of-the-art techniques combining imaging methods and high-throughput genomic mapping tools has led to significant progress in detailing the chromosome architecture of various organisms. However, a gap still remains between the rapidly growing structural data on chromosome folding and the large-scale genome organization. Could part of the information on chromosome folding be obtained directly from the underlying genomic DNA sequences abundantly stored in databanks? To answer this question, we developed an original discrete double Fourier transform (DDFT). DDFT is designed to detect large-scale genome regularities associated with domains/units at different levels of hierarchical chromosome folding. The method is versatile and can be applied both to genomic DNA sequences and to corresponding physico-chemical parameters such as base-pairing free energy. The latter characteristic is closely related to replication and transcription and can also be used to assess temperature or supercoiling effects on chromosome folding. We tested the method on the genome of E. coli K-12 and found good correspondence with the experimentally established annotated domains/units. As a brief illustration of further abilities of DDFT, a study of large-scale genome organization in bacteriophage phiX174 and the bacterium Caulobacter crescentus is also included. The combined experimental, modeling, and bioinformatic DDFT analysis should yield more complete knowledge of chromosome architecture and genome organization. Copyright © 2017 Elsevier Ltd. All rights reserved.
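    The DDFT itself is not defined in this abstract, so the sketch below swaps in a plain, single discrete Fourier transform of a binary nucleotide indicator sequence, a standard genomic signal-processing technique for detecting periodic patterns. It only shows the kind of sequence-to-spectrum step that Fourier-based methods start from. The function names are hypothetical, the direct O(N²) sum would be replaced by an FFT for real genomes, and the 0/1 indicator could be replaced by any numeric per-base profile (for example, free energy values).

```python
import cmath


def indicator_power_spectrum(seq, base):
    """Power spectrum |X(k)|^2, k = 0..N-1, of the 0/1 indicator of `base` in `seq`."""
    base = base.upper()
    u = [1.0 if b == base else 0.0 for b in seq.upper()]
    n = len(u)
    spectrum = []
    for k in range(n):
        # Direct DFT: X(k) = sum over m of u(m) * exp(-2*pi*i*k*m/N)
        x = sum(u[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        spectrum.append(abs(x) ** 2)
    return spectrum


def dominant_period(seq, base):
    """Period (in bases) of the strongest non-zero frequency component."""
    spectrum = indicator_power_spectrum(seq, base)
    n = len(spectrum)
    if n < 2:
        raise ValueError("sequence too short")
    # Skip k = 0, which only reflects how often `base` occurs. Rounding merges
    # harmonics that are equal up to floating-point noise, so max() keeps the
    # lowest such frequency, i.e. the fundamental period.
    best_k = max(range(1, n // 2 + 1), key=lambda k: round(spectrum[k], 6))
    return n / best_k


if __name__ == "__main__":
    seq = ("A" + "C" * 9) * 10  # an A every 10 bases, length 100
    print(dominant_period(seq, "A"))  # 10.0
```

    For a strictly periodic indicator, power appears at the fundamental frequency and all of its harmonics with equal height, which is why the sketch breaks ties toward the lowest frequency rather than taking whichever peak happens to be numerically largest.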

  5. Life in the fast lane for protein crystallization and X-ray crystallography

    NASA Technical Reports Server (NTRS)

    Pusey, Marc L.; Liu, Zhi-Jie; Tempel, Wolfram; Praissman, Jeremy; Lin, Dawei; Wang, Bi-Cheng; Gavira, Jose A.; Ng, Joseph D.

    2005-01-01

    The common goal of structural genomics centers and consortia is to determine, as quickly as possible, the three-dimensional structures of a multitude of recombinant proteins derived from known genomic sequences. Since X-ray crystallography is the foremost method for obtaining atomic resolution of macromolecules, the limiting step is obtaining protein crystals that are useful for structure determination. High-throughput methods have been developed in recent years to clone, express, purify, crystallize and determine the three-dimensional structure of a protein gene product rapidly using automated devices, commercialized kits and consolidated protocols. However, the average number of protein structures obtained by most structural genomics groups has been very low compared to the total number of proteins purified. As more complete genomic sequences are obtained for different organisms from the three kingdoms of life, only the proteins that can be crystallized and whose structures can be obtained easily are studied. Consequently, an astonishing number of genomic proteins remain unexamined. In the era of high-throughput processes, traditional methods in molecular biology, protein chemistry and crystallization are eclipsed by automation and pipeline practices. The need for high-rate production of protein crystals and structures has prevented the use of more intellectual strategies and creative approaches in experimental execution. Fundamental principles and personal experience in protein chemistry and crystallization are minimally exploited, only to obtain "low-hanging fruit" protein structures. We review the practical aspects of today's high-throughput manipulations and discuss the challenges in fast-paced protein crystallization and tools for crystallography. Structural genomics pipelines can be improved with information gained from low-throughput tactics that may help us reach the higher-bearing fruits. 
Examples of recent developments in this area are reported from the efforts of the Southeast Collaboratory for Structural Genomics (SECSG).

  6. Life in the Fast Lane for Protein Crystallization and X-Ray Crystallography

    NASA Technical Reports Server (NTRS)

    Pusey, Marc L.; Liu, Zhi-Jie; Tempel, Wolfram; Praissman, Jeremy; Lin, Dawei; Wang, Bi-Cheng; Gavira, Jose A.; Ng, Joseph D.

    2004-01-01

    The common goal of structural genomics centers and consortia is to determine, as quickly as possible, the three-dimensional structures of a multitude of recombinant proteins derived from known genomic sequences. Since X-ray crystallography is the foremost method for obtaining atomic resolution of macromolecules, the limiting step is obtaining protein crystals that are useful for structure determination. High-throughput methods have been developed in recent years to clone, express, purify, crystallize and determine the three-dimensional structure of a protein gene product rapidly using automated devices, commercialized kits and consolidated protocols. However, the average number of protein structures obtained by most structural genomics groups has been very low compared to the total number of proteins purified. As more complete genomic sequences are obtained for different organisms from the three kingdoms of life, only the proteins that can be crystallized and whose structures can be obtained easily are studied. Consequently, an astonishing number of genomic proteins remain unexamined. In the era of high-throughput processes, traditional methods in molecular biology, protein chemistry and crystallization are eclipsed by automation and pipeline practices. The need for high-rate production of protein crystals and structures has prevented the use of more intellectual strategies and creative approaches in experimental execution. Fundamental principles and personal experience in protein chemistry and crystallization are minimally exploited, only to obtain "low-hanging fruit" protein structures. We review the practical aspects of today's high-throughput manipulations and discuss the challenges in fast-paced protein crystallization and tools for crystallography. Structural genomics pipelines can be improved with information gained from low-throughput tactics that may help us reach the higher-bearing fruits. 
Examples of recent developments in this area are reported from the efforts of the Southeast Collaboratory for Structural Genomics (SECSG).

  7. Development and evaluation of a genomics training program for community health workers in Texas.

    PubMed

    Chen, Lei-Shih; Zhao, Shixi; Stelzig, Donaji; Dhar, Shweta U; Eble, Tanya; Yeh, Yu-Chen; Kwok, Oi-Man

    2018-01-04

    Purpose Genomics services have the potential to reduce incidence and mortality of diseases by providing individualized, family health history (FHH)-based prevention strategies to clients. These services may benefit from the involvement of community health workers (CHWs) in the provision of FHH-based genomics education and services, as CHWs are frontline public health workers and lay health educators who share similar ethnicities, languages, socioeconomic statuses, and life experiences with the communities they serve. We developed, implemented, and evaluated the FHH-based genomics training program for CHWs. Methods This theory- and evidence-based FHH-focused genomics curriculum was developed by an interdisciplinary team. Full-day workshops in English and Spanish were delivered to 145 Texas CHWs (91.6% were Hispanic/black). Preworkshop, postworkshop, and 3-month follow-up data were collected. Results CHWs significantly improved their attitudes, intention, self-efficacy, and knowledge regarding adopting FHH-based genomics into their practice after the workshops. At 3-month follow-up, these scores remained higher, and there was a significant increase in CHWs' genomics practices. Conclusion This FHH-based genomics training successfully educated Texas CHWs, and the outcomes were promising. Dissemination of training to CHWs in and outside of Texas is needed to promote better access to and delivery of personalized genomics services for lay and underserved communities. GENETICS in MEDICINE advance online publication, 4 January 2018; doi:10.1038/gim.2017.236.

  8. Divergent Development of Hexaploid Triticale by a Wheat-Rye-Psathyrostachys huashanica Trigeneric Hybrid Method

    PubMed Central

    Huang, Juan; Wang, Yujie; Li, Daiyan; Diao, Chengdou; Zhu, Wei; Tang, Yao; Wang, Yi; Fan, Xing; Zeng, Jian; Xu, Lili; Sha, Lina; Zhang, Haiqin; Zhou, Yonghong

    2016-01-01

    Hexaploid triticale is an important forage crop and a promising energy plant. Several approaches for developing hexaploid triticale have been reported previously, such as crossing tetraploid or hexaploid wheat with rye, crossing hexaploid triticale and/or hexaploid wheat with octoploid triticale, and spontaneous appearance in the selfed progeny of octoploid triticale. In the present study, we developed an effective method for producing diverse types of hexaploid triticale via a wheat-rye-Psathyrostachys huashanica trigeneric hybrid. Genomic in situ hybridization (GISH) and fluorescence in situ hybridization (FISH) karyotyping revealed that D-genome chromosomes were completely eliminated and all A-, B-, and R-genome chromosomes were retained in three lines. More interestingly, the composite genome of line K14-489-2 consisted of the complete A and B genomes and chromosomes 1D, 2R, 3R, 4R, 5R, 6R, and 7R; that of line K14-491-2 consisted of 12 A-genome (1A-6A), 14 B-genome (1B-7B), and 12 R-genome (1R-3R, 5R-7R) chromosomes plus chromosomes 1D and 3D; and that of line K14-547-1 had 26 A/B and 14 R chromosomes, plus one pair of centric 6BL/2DS translocations. This finding implies that some D-genome chromosomes can be spontaneously and stably incorporated into hexaploid triticale. Additionally, various high-molecular-weight glutenin subunit (HMW-GS) compositions were detected in the six hexaploid triticale lines. Compared with their recurrent triticale parent Zhongsi828, these lines also showed a high level of resistance to stripe rust (Puccinia striiformis f. sp. tritici, Pst) pathogens prevalent in China, including V26/Gui 22. These new hexaploid triticales not only enhance the diversification of triticale but could also be used as valuable germplasm for wheat improvement. PMID:27182983
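    Hexaploid triticale (AABBRR) has 2n = 6x = 42 chromosomes. As a quick consistency check on the karyotypes listed above, the sketch below tallies each line's chromosomes under an assumption the abstract does not state explicitly: each individually named chromosome (such as 1D or 2R) is present as a homologous pair, while numeric counts (such as "12 A-genome") are taken literally. Under that assumption all three lines add up to 42. The function and dictionary names are hypothetical.

```python
HEXAPLOID_2N = 42  # hexaploid triticale (AABBRR), 2n = 6x = 42


def total_chromosomes(explicit_counts, paired_chromosomes):
    """Explicit counts plus each named chromosome counted twice (assumed homologous pair)."""
    return sum(explicit_counts) + 2 * len(paired_chromosomes)


lines = {
    # complete A and B genomes (14 + 14) plus 1D and 2R-7R
    "K14-489-2": total_chromosomes([14, 14], ["1D", "2R", "3R", "4R", "5R", "6R", "7R"]),
    # 12 A-genome, 14 B-genome and 12 R-genome chromosomes plus 1D and 3D
    "K14-491-2": total_chromosomes([12, 14, 12], ["1D", "3D"]),
    # 26 A/B and 14 R chromosomes plus one pair of 6BL/2DS translocations
    "K14-547-1": total_chromosomes([26, 14], ["6BL/2DS"]),
}

for name, n in lines.items():
    print(name, n, "matches 2n = 42" if n == HEXAPLOID_2N else "differs from 42")
```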

  9. Large protein as a potential target for use in rabies diagnostics.

    PubMed

    Santos Katz, I S; Dias, M H; Lima, I F; Chaves, L B; Ribeiro, O G; Scheffer, K C; Iwai, L K

    Rabies is a zoonotic viral disease that remains a serious threat to public health worldwide. The rabies lyssavirus (RABV) genome encodes five structural proteins that are multifunctional and significant for pathogenicity. The large protein (L) contains well-conserved genomic regions, which may be a good basis for generating informative datasets for the development of new methods for rabies diagnosis. This paper describes the development of a technique for identifying the L protein in several RABV strains from different hosts, demonstrating that MS-based proteomics is a potential method for antigen identification and a good alternative for rabies diagnosis.
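    The abstract does not name the protease or search pipeline used. Bottom-up MS proteomics commonly digests proteins with trypsin, which cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The sketch below applies that simplified rule to produce candidate peptides, with optional missed cleavages. The function name, the rule simplification, and the toy sequence are assumptions; the actual RABV L protein sequence is not reproduced here.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Simplified in-silico trypsin digest: cut after K or R unless followed by P."""
    protein = protein.upper()
    # Cut positions are the indices just after a K/R that is not followed by P.
    cuts = [i + 1 for i, aa in enumerate(protein[:-1])
            if aa in "KR" and protein[i + 1] != "P"]
    bounds = [0] + cuts + [len(protein)]
    fragments = [protein[a:b] for a, b in zip(bounds, bounds[1:])]
    peptides = []
    # Join up to missed_cleavages adjacent fragments to model incomplete digestion.
    for i in range(len(fragments)):
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


print(tryptic_digest("MAKPLRGDKR"))  # ['MAKPLR', 'GDK', 'R']
```

    Matching observed peptide masses against such a candidate list is what lets conserved L-protein peptides serve as identifiers across strains.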

  10. LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms.

    PubMed

    Money, Daniel; Gardner, Kyle; Migicovsky, Zoë; Schwaninger, Heidi; Zhong, Gan-Yuan; Myles, Sean

    2015-09-15

    Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms such as human and cattle benefit from high-quality reference genomes and panels of reference genotypes that aid imputation accuracy. In nonmodel organisms, however, genetic and physical maps are often either of poor quality or completely absent, and there are no panels of reference genotypes available. There is therefore a need for imputation methods designed specifically for nonmodel organisms in which genomic resources are poorly developed and marker order is unreliable or unknown. Here we introduce LinkImpute, a software package based on a k-nearest neighbor genotype imputation method, LD-kNNi, which is designed for unordered markers. No physical or genetic maps are required, and it is designed to work on unphased genotype data from heterozygous species. It exploits the fact that markers useful for imputation are often not physically close to the missing genotype but rather distributed throughout the genome. Using genotyping-by-sequencing data from diverse and heterozygous accessions of apples, grapes, and maize, we compare LD-kNNi with several genotype imputation methods and show that LD-kNNi is fast, comparable in accuracy to the best existing methods, and exhibits the least bias in allele frequency estimates. Copyright © 2015 Money et al.
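    The sketch below illustrates the idea described above in simplified form and is not LinkImpute's code. For a marker with missing calls, it ranks the other markers by linkage disequilibrium (squared Pearson correlation of 0/1/2 genotype codes), so no map order is needed; it then finds the k samples closest to the target sample over the top `n_ld` markers and takes a distance-weighted vote. The genotype coding, the r² ranking, the weighting, and the parameter names are assumptions.

```python
from collections import Counter


def _r2(a, b):
    """Squared Pearson correlation over samples where both markers are called."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def impute(geno, k=3, n_ld=5):
    """Fill None entries in geno[sample][marker] (codes 0/1/2); returns a new matrix."""
    n_s, n_m = len(geno), len(geno[0])
    cols = [[geno[s][m] for s in range(n_s)] for m in range(n_m)]
    out = [row[:] for row in geno]
    for m in range(n_m):
        # Rank other markers by LD with marker m; map position is never used.
        others = [j for j in range(n_m) if j != m]
        ld = sorted(others, key=lambda j: _r2(cols[m], cols[j]), reverse=True)[:n_ld]
        for s in range(n_s):
            if geno[s][m] is not None:
                continue
            neighbours = []
            for t in range(n_s):
                if t == s or geno[t][m] is None:
                    continue
                shared = [j for j in ld if geno[s][j] is not None and geno[t][j] is not None]
                if shared:
                    # Mean absolute genotype difference over the LD markers.
                    d = sum(abs(geno[s][j] - geno[t][j]) for j in shared) / len(shared)
                    neighbours.append((d, geno[t][m]))
            neighbours.sort(key=lambda p: p[0])
            votes = Counter()
            for d, g in neighbours[:k]:
                votes[g] += 1.0 / (d + 1e-6)  # closer samples weigh more
            if votes:
                out[s][m] = votes.most_common(1)[0][0]
    return out
```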

  11. Efficient Genome Editing in Induced Pluripotent Stem Cells with Engineered Nucleases In Vitro.

    PubMed

    Termglinchan, Vittavat; Seeger, Timon; Chen, Caressa; Wu, Joseph C; Karakikes, Ioannis

    2017-01-01

    Precision genome engineering is rapidly advancing the application of induced pluripotent stem cell (iPSC) technology for in vitro modeling of cardiovascular diseases. Targeted genome editing using engineered nucleases is a powerful tool that allows reverse genetics, genome engineering, and targeted transgene integration experiments to be performed in a precise and predictable manner. However, nuclease-mediated homologous recombination is an inefficient process. Herein, we describe the development of an optimized method combining site-specific nucleases and the piggyBac transposon system for "seamless" genome editing in pluripotent stem cells with high efficiency and fidelity in vitro.

  12. A Syst-OMICS Approach to Ensuring Food Safety and Reducing the Economic Burden of Salmonellosis.

    PubMed

    Emond-Rheault, Jean-Guillaume; Jeukens, Julie; Freschi, Luca; Kukavica-Ibrulj, Irena; Boyle, Brian; Dupont, Marie-Josée; Colavecchio, Anna; Barrere, Virginie; Cadieux, Brigitte; Arya, Gitanjali; Bekal, Sadjia; Berry, Chrystal; Burnett, Elton; Cavestri, Camille; Chapin, Travis K; Crouse, Alanna; Daigle, France; Danyluk, Michelle D; Delaquis, Pascal; Dewar, Ken; Doualla-Bell, Florence; Fliss, Ismail; Fong, Karen; Fournier, Eric; Franz, Eelco; Garduno, Rafael; Gill, Alexander; Gruenheid, Samantha; Harris, Linda; Huang, Carol B; Huang, Hongsheng; Johnson, Roger; Joly, Yann; Kerhoas, Maud; Kong, Nguyet; Lapointe, Gisèle; Larivière, Line; Loignon, Stéphanie; Malo, Danielle; Moineau, Sylvain; Mottawea, Walid; Mukhopadhyay, Kakali; Nadon, Céline; Nash, John; Ngueng Feze, Ida; Ogunremi, Dele; Perets, Ann; Pilar, Ana V; Reimer, Aleisha R; Robertson, James; Rohde, John; Sanderson, Kenneth E; Song, Lingqiao; Stephan, Roger; Tamber, Sandeep; Thomassin, Paul; Tremblay, Denise; Usongo, Valentine; Vincent, Caroline; Wang, Siyun; Weadge, Joel T; Wiedmann, Martin; Wijnands, Lucas; Wilson, Emily D; Wittum, Thomas; Yoshida, Catherine; Youfsi, Khadija; Zhu, Lei; Weimer, Bart C; Goodridge, Lawrence; Levesque, Roger C

    2017-01-01

    The Salmonella Syst-OMICS consortium is sequencing 4,500 Salmonella genomes and building an analysis pipeline for the study of Salmonella genome evolution, antibiotic resistance and virulence genes. Metadata, including phenotypic as well as genomic data, for isolates of the collection are provided through the Salmonella Foodborne Syst-OMICS database (SalFoS), at https://salfos.ibis.ulaval.ca/. Here, we present our strategy and the analysis of the first 3,377 genomes. Our data will be used to draw potential links between strains found in fresh produce, humans, animals and the environment. The ultimate goals are to understand how Salmonella evolves over time, improve the accuracy of diagnostic methods, develop control methods in the field, and identify prognostic markers for evidence-based decisions in epidemiology and surveillance.

  13. Quantitative high-resolution genomic analysis of single cancer cells.

    PubMed

    Hannemann, Juliane; Meyer-Staeckling, Sönke; Kemming, Dirk; Alpers, Iris; Joosse, Simon A; Pospisil, Heike; Kurtz, Stefan; Görndt, Jennifer; Püschel, Klaus; Riethdorf, Sabine; Pantel, Klaus; Brandt, Burkhard

    2011-01-01

    During cancer progression, specific genomic aberrations arise that can determine the scope of the disease and can be used as predictive or prognostic markers. The detection of specific gene amplifications or deletions in single blood-borne or disseminated tumour cells that may give rise to metastases is of great clinical interest but technically challenging. In this study, we present a method for quantitative high-resolution genomic analysis of single cells. Cells were isolated under permanent microscopic control, followed by high-fidelity whole genome amplification and subsequent analyses by fine tiling array-CGH and qPCR. The assay was applied to single breast cancer cells to analyze the chromosomal region centred on the therapeutically relevant EGFR gene. This method allows precise quantitative analysis of copy number variations in single-cell diagnostics.
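    The abstract does not specify how copy numbers were derived from the qPCR data. One common approach is the comparative Ct (2^-ΔΔCt) method, sketched below under stated assumptions: a reference locus with stable copy number, a calibrator sample (for example normal diploid DNA) assumed to carry two copies of the target, and perfect doubling per cycle unless another amplification factor is supplied. All names are hypothetical.

```python
def relative_copy_number(ct_target, ct_ref, ct_target_cal, ct_ref_cal,
                         cal_copies=2, efficiency=2.0):
    """Target-locus copy number by the comparative Ct (2^-ddCt) method.

    ct_target, ct_ref: Ct of the target (e.g. EGFR) and reference locus in the sample.
    ct_target_cal, ct_ref_cal: the same two Ct values in the calibrator.
    cal_copies: copies of the target assumed in the calibrator (2 for diploid DNA).
    efficiency: amplification factor per cycle (2.0 means perfect doubling).
    """
    delta_sample = ct_target - ct_ref
    delta_cal = ct_target_cal - ct_ref_cal
    ddct = delta_sample - delta_cal
    return cal_copies * efficiency ** (-ddct)


# Target crosses threshold 3 cycles earlier than in the calibrator: 2 * 2**3 copies.
print(relative_copy_number(22.0, 25.0, 25.0, 25.0))  # 16.0
```

    Whole genome amplification of a single cell can distort these ratios locus by locus, which is one reason to cross-check qPCR estimates against array-CGH as the authors did.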

  14. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes.

    PubMed

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments selectively amplified from DNA cleaved with methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach, called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows global, sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in other plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation at hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. 
Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes.
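    Conventional MSAP compares fragments from two isoschizomers that both recognize CCGG: HpaII, which is blocked when either cytosine is methylated on both strands, and MspI, which is blocked by methylation of the external cytosine. The sketch below locates CCGG sites in a sequence and maps HpaII/MspI presence-absence patterns to the methylation states usually assigned in MSAP scoring. The state labels and the read-count threshold in `presence` are assumptions; MSAP-Seq itself evaluates sequencing read counts quantitatively rather than scoring gel bands.

```python
import re

# Usual MSAP interpretation of (HpaII fragment present, MspI fragment present).
MSAP_STATES = {
    (True, True): "unmethylated",
    (False, True): "internal C methylated (CmCGG)",
    (True, False): "hemimethylated external C (hmCCGG)",
    (False, False): "uninformative: external C methylated or site absent",
}


def ccgg_sites(seq):
    """0-based start positions of every CCGG recognition site in seq."""
    return [m.start() for m in re.finditer("CCGG", seq.upper())]


def presence(read_count, min_reads=5):
    """Hypothetical rule: call a fragment present if it has at least min_reads reads."""
    return read_count >= min_reads


def classify(hpa_present, msp_present):
    """Map an HpaII/MspI presence-absence pair to an MSAP methylation state."""
    return MSAP_STATES[(bool(hpa_present), bool(msp_present))]


print(ccgg_sites("ACCGGTTCCGGA"))         # [1, 7]
print(classify(presence(0), presence(40)))  # internal C methylated (CmCGG)
```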

  15. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)—A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes

    PubMed Central

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments selectively amplified from DNA cleaved with methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach, called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows global, sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in other plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation at hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. 
Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes. PMID:29250096

  16. Genome-wide comparisons of phylogenetic similarities between partial genomic regions and the full-length genome in Hepatitis E virus genotyping.

    PubMed

    Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng

    2014-01-01

    Besides the complete genome, various partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making it difficult to compare their results. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of each partial genomic sequence across the whole genome, comparing evolutionary distances computed from genomic regions with those from the full-length genomes of 101 HEV isolates, to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one at the 3' end of the papain-like cysteine protease domain, showed relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed that these regions performed identically to the full-length genome in genotyping, dividing the HEV isolates into reasonable genotypes. This analysis may be of value in developing a partial-sequence-based consensus classification of HEV species.
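    The abstract does not give the exact statistic. As one plausible reading of comparing evolutionary distances between genomic regions and full-length genomes, the sketch computes pairwise p-distances (the proportion of differing aligned sites) for every pair of isolates, once over the full alignment and once over a candidate window, and reports the Pearson correlation between the two distance sets. A window whose correlation is close to 1 orders isolate pairs much as the full genome does. The choice of p-distance, Pearson correlation, and the function names are assumptions.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return float("nan")
    return sum(x != y for x, y in sites) / len(sites)


def pearson(xs, ys):
    """Pearson correlation coefficient; nan if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def region_vs_genome_correlation(alignment, start, end):
    """Correlate pairwise distances in columns [start, end) with full-length distances."""
    names = sorted(alignment)
    full, part = [], []
    for a, b in combinations(names, 2):
        full.append(p_distance(alignment[a], alignment[b]))
        part.append(p_distance(alignment[a][start:end], alignment[b][start:end]))
    return pearson(full, part)
```

    Sliding the window along the alignment and plotting the correlation would highlight candidate genotyping regions such as the one reported above.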

  17. Recent advances in functional perturbation and genome editing techniques in studying sea urchin development.

    PubMed

    Cui, Miao; Lin, Che-Yi; Su, Yi-Hsien

    2017-09-01

    Studies on the gene regulatory networks (GRNs) of sea urchin embryos have provided a basic understanding of the molecular mechanisms controlling animal development. The causal links in GRNs have been verified experimentally through perturbation of gene functions. Microinjection of antisense morpholino oligonucleotides (MOs) into the egg is the most widely used approach for gene knockdown in sea urchin embryos. The modification of MOs into a membrane-permeable form (vivo-MOs) has allowed gene knockdown at later developmental stages. Recent advances in genome editing tools, such as zinc-finger nucleases, transcription activator-like effector-based nucleases and the clustered regularly interspaced short palindromic repeat/clustered regularly interspaced short palindromic repeat-associated protein 9 (CRISPR/Cas9) system, have provided methods for gene knockout in sea urchins. Here, we review the use of vivo-MOs and genome editing tools in sea urchin studies since the publication of its genome in 2006. Various applications of the CRISPR/Cas9 system and their potential in studying sea urchin development are also discussed. These new tools will provide more sophisticated experimental methods for studying sea urchin development. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  18. Pattern Analysis and Decision Support for Cancer through Clinico-Genomic Profiles

    NASA Astrophysics Data System (ADS)

    Exarchos, Themis P.; Giannakeas, Nikolaos; Goletsis, Yorgos; Papaloukas, Costas; Fotiadis, Dimitrios I.

    Advances in genome technology are playing a growing role in medicine and healthcare. With the development of new technologies and opportunities for large-scale analysis of the genome, genomic data have a clear impact on medicine. Cancer prognostics and therapeutics are among the first major test cases for genomic medicine, given that all types of cancer are related to genomic instability. In this paper we present a novel system for pattern analysis and decision support in cancer. The system integrates clinical data from electronic health records with genomic data. Pattern analysis and data mining methods are applied to these integrated data, and the discovered knowledge is used for cancer decision support. Through this integration, conclusions can be drawn for early diagnosis, staging and cancer treatment.

  19. Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.

    PubMed

    Oyola, Samuel O; Otto, Thomas D; Gu, Yong; Maslen, Gareth; Manske, Magnus; Campino, Susana; Turner, Daniel J; Macinnis, Bronwyn; Kwiatkowski, Dominic P; Swerdlow, Harold P; Quail, Michael A

    2012-01-03

    Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) have rapidly improved. Despite this success, sequencing genomes or genomic regions with extremely biased base composition remains a great challenge for currently available NGS platforms. The genomes of some important pathogenic organisms, such as Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content), display extremes of base composition. Standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage, particularly across AT- and GC-rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA, such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low-quantity starting material and tolerant of sequences with extremely high AT content. We used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of the sequence data generated, we show that our optimized conditions, which involve a PCR additive (TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC-neutral templates. We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of either extreme of base composition. 
This development will greatly benefit sequencing clinical samples that often require amplification due to low mass of DNA starting material.
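    Coverage unevenness of the kind described above is commonly diagnosed by binning fixed-size reference windows by base composition and comparing mean read depth across bins; a flat profile indicates little composition bias. The sketch assumes a per-base depth array has already been computed from aligned reads. The window size, the ten AT-content bins, and the function name are illustrative choices, not the authors' QC procedure.

```python
def coverage_by_at_bin(genome, depth, window=100, n_bins=10):
    """Mean per-base read depth of non-overlapping windows grouped by AT fraction.

    genome: reference sequence; depth: per-base depth list of the same length.
    Returns {bin_lower_edge: mean_depth}, e.g. key 0.9 covers windows with 90-100% AT.
    """
    sums, counts = {}, {}
    for start in range(0, len(genome) - window + 1, window):
        chunk = genome[start:start + window].upper()
        at_count = chunk.count("A") + chunk.count("T")
        b = min(at_count * n_bins // window, n_bins - 1)  # integer bin index
        mean_depth = sum(depth[start:start + window]) / window
        sums[b] = sums.get(b, 0.0) + mean_depth
        counts[b] = counts.get(b, 0) + 1
    return {b / n_bins: sums[b] / counts[b] for b in sorted(sums)}


genome = "AT" * 50 + "GC" * 25 + "AT" * 25  # one 100%-AT window, one 50%-AT window
depth = [10] * 100 + [30] * 100             # AT-rich window is under-covered
print(coverage_by_at_bin(genome, depth))    # {0.5: 30.0, 0.9: 10.0}
```

    Running this on libraries prepared with and without an additive such as TMAC would show whether depth in the highest-AT bins moves closer to the genome-wide mean.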

  20. redGEM: Systematic reduction and analysis of genome-scale metabolic reconstructions for development of consistent core metabolic models

    PubMed Central

    Ataman, Meric

    2017-01-01

    Genome-scale metabolic reconstructions have proven to be valuable resources in enhancing our understanding of metabolic networks, as they encapsulate all known metabolic capabilities of an organism, from genes to proteins to their functions. However, the complexity of these large metabolic networks often hinders their utility in various practical applications. Although reduced models are commonly used for modeling and for integrating experimental data, they are often inconsistent across studies and laboratories because of differing criteria and levels of detail, which can compromise the transferability of findings and the integration of experimental data from different groups. In this study, we have developed a systematic semi-automatic approach to reduce genome-scale models into core models in a consistent and logical manner, focusing on the central metabolism or subsystems of interest. The method minimizes the loss of information using an approach that combines graph-based search and optimization methods. The resulting core models are shown to capture key properties of the genome-scale models and to preserve consistency in terms of biomass and by-product yields, flux and concentration variability, and gene essentiality. The development of these "consistently-reduced" models will help to clarify and facilitate the integration of different experimental data to draw new understanding that can be directly extended to genome-scale models. PMID:28727725
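    redGEM combines graph-based search with optimization; the sketch below illustrates only the graph-search ingredient. It treats metabolites as nodes linked through the reactions that interconvert them and uses breadth-first search to find the fewest reactions connecting a metabolite in one subsystem to a metabolite in another, the kind of connecting path a core model must retain. The data layout, the undirected simplification, and the toy metabolite names are assumptions; on real reconstructions, highly connected cofactors such as ATP or water would normally be excluded first so that paths are biochemically meaningful.

```python
from collections import deque


def shortest_reaction_path(reactions, source, target):
    """Breadth-first search over a metabolite graph.

    reactions: {reaction_id: (substrates, products)}; each substrate is linked to each
    product of the same reaction (direction ignored for simplicity).
    Returns the reaction ids on one shortest path from source to target, or None.
    """
    adj = {}
    for rid, (subs, prods) in reactions.items():
        for s in subs:
            for p in prods:
                adj.setdefault(s, []).append((p, rid))
                adj.setdefault(p, []).append((s, rid))
    prev = {source: None}  # metabolite -> (previous metabolite, reaction used)
    queue = deque([source])
    while queue:
        met = queue.popleft()
        if met == target:
            path = []
            while prev[met] is not None:
                met, rid = prev[met]
                path.append(rid)
            return path[::-1]
        for nxt, rid in adj.get(met, []):
            if nxt not in prev:
                prev[nxt] = (met, rid)
                queue.append(nxt)
    return None


toy = {  # hypothetical mini-network
    "R1": (["glc"], ["g6p"]),
    "R2": (["g6p"], ["f6p"]),
    "R3": (["f6p"], ["pyr"]),
    "R4": (["glc"], ["x"]),
}
print(shortest_reaction_path(toy, "glc", "pyr"))  # ['R1', 'R2', 'R3']
```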

  1. Assembly and Multiplex Genome Integration of Metabolic Pathways in Yeast Using CasEMBLR.

    PubMed

    Jakočiūnas, Tadas; Jensen, Emil D; Jensen, Michael K; Keasling, Jay D

    2018-01-01

    Genome integration is a vital step in implementing large biochemical pathways to build a stable microbial cell factory. Although traditional strain construction strategies are well established for the model organism Saccharomyces cerevisiae, recent advances in CRISPR/Cas9-mediated genome engineering allow much higher throughput and robustness in strain construction. In this chapter, we describe CasEMBLR, a highly efficient and marker-free genome engineering method for one-step integration of in vivo assembled expression cassettes at multiple genomic sites simultaneously. CasEMBLR capitalizes on CRISPR/Cas9 technology to generate double-strand breaks at genomic loci, thus prompting the native homologous recombination (HR) machinery to integrate exogenously derived homology templates. As proof-of-principle for microbial cell factory development, CasEMBLR was used for one-step assembly and marker-free integration of the carotenoid pathway from 15 exogenously supplied DNA parts into three targeted genomic loci. As a second proof-of-principle, a total of ten DNA parts were assembled and integrated at two genomic loci to construct a tyrosine production strain while simultaneously knocking out two genes. This new method complements and improves the field of genome engineering in S. cerevisiae by providing a more flexible platform for rapid and precise strain building.
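    CasEMBLR relies on Cas9-induced double-strand breaks at chosen genomic loci. Assuming the commonly used Streptococcus pyogenes Cas9, which requires an NGG protospacer-adjacent motif (PAM) immediately 3' of a 20-nt target, the sketch below lists candidate target sites on both strands of a sequence. The abstract does not describe how the authors chose their targets; off-target screening and other guide-design rules are omitted, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def find_spcas9_protospacers(seq, spacer_len=20):
    """List (start, strand, protospacer, PAM) for every NGG PAM on both strands.

    start is the 0-based leftmost + strand coordinate of the protospacer+PAM
    footprint. Only A/C/G/T/N are accepted (other letters raise KeyError).
    """
    seq = seq.upper()
    rc = "".join(COMPLEMENT[b] for b in reversed(seq))
    hits = []
    for strand, s in (("+", seq), ("-", rc)):
        for i in range(spacer_len, len(s) - 2):
            pam = s[i:i + 3]
            if pam[1:] == "GG":  # NGG
                start = i - spacer_len if strand == "+" else len(s) - i - 3
                hits.append((start, strand, s[i - spacer_len:i], pam))
    return hits
```

    Each hit gives a spacer sequence for a guide RNA; a flanking homology template would then direct HR-mediated integration at the resulting break.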

  2. Development of genomic SSR markers for fingerprinting lettuce (Lactuca sativa L.) cultivars and mapping genes.

    PubMed

    Rauscher, Gilda; Simko, Ivan

    2013-01-22

    Lettuce (Lactuca sativa L.) is the major crop from the group of leafy vegetables. Several types of molecular markers have been developed and are effectively used in lettuce breeding and genetic studies. However, only a very limited number of microsatellite-based markers are publicly available. We employed the method of enriched microsatellite libraries to develop 97 genomic SSR markers. Testing of the newly developed markers on a set of 36 Lactuca accessions (33 L. sativa and one each of L. serriola L., L. saligna L., and L. virosa L.) revealed that both the genetic heterozygosity (UHe = 0.56) and the number of loci per SSR (Na = 5.50) are significantly higher for genomic SSR markers than for previously developed EST-based SSR markers (UHe = 0.32, Na = 3.56). Fifty-four genomic SSR markers were placed on the molecular linkage map of lettuce. The distribution of markers in the genome appeared to be random, with the exception of a possible cluster on linkage group 6. Any combination of 32 genomic SSRs was able to distinguish the genotypes of all 36 accessions. Fourteen of the newly developed SSR markers originate from fragments with high sequence similarity to resistance gene candidates (RGCs) and RGC pseudogenes. Analysis of molecular variance (AMOVA) of L. sativa accessions showed that approximately 3% of genetic diversity was within accessions, 79% among accessions, and 18% among horticultural types. The newly developed genomic SSR markers were added to the pool of previously developed EST-SSR markers. These two types of SSR-based markers provide useful tools for lettuce cultivar fingerprinting, development of integrated molecular linkage maps, and mapping of genes.
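    Per-locus diversity statistics like those quoted above can be computed from genotype calls. The sketch computes Na as the number of distinct alleles (its usual meaning in population-genetics software) and Nei's unbiased expected heterozygosity, UHe = 2N/(2N - 1) x (1 - sum of p_i^2), where N is the number of genotyped diploid individuals and p_i are allele frequencies. The input format and function name are assumptions, and whether the authors computed Na in exactly this way is not stated in the abstract.

```python
from collections import Counter


def allele_stats(genotypes):
    """Na and Nei's unbiased expected heterozygosity for one diploid SSR locus.

    genotypes: list of (allele1, allele2) tuples (e.g. fragment sizes), one per
    individual; None marks a missing call and is skipped.
    """
    alleles = [a for g in genotypes if g is not None for a in g]
    n_copies = len(alleles)  # 2N sampled gene copies
    freqs = [c / n_copies for c in Counter(alleles).values()]
    he = 1.0 - sum(p * p for p in freqs)
    uhe = n_copies / (n_copies - 1) * he  # small-sample correction 2N / (2N - 1)
    return len(freqs), uhe


na, uhe = allele_stats([(150, 150), (152, 152), (150, 152), (154, 154)])
print(na, round(uhe, 3))  # 3 0.75
```

    Averaging both values over all markers of a class gives summary figures comparable to the UHe and Na reported for genomic versus EST-based SSRs.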

  3. Development of genomic SSR markers for fingerprinting lettuce (Lactuca sativa L.) cultivars and mapping genes

    PubMed Central

    2013-01-01

    Background Lettuce (Lactuca sativa L.) is the major crop in the leafy vegetable group. Several types of molecular markers have been developed and are used effectively in lettuce breeding and genetic studies. However, only a very limited number of microsatellite-based markers are publicly available. We employed enriched microsatellite libraries to develop 97 genomic SSR markers. Results Testing of the newly developed markers on a set of 36 Lactuca accessions (33 L. sativa, and one each of L. serriola L., L. saligna L., and L. virosa L.) revealed that both the genetic heterozygosity (UHe = 0.56) and the number of loci per SSR (Na = 5.50) are significantly higher for genomic SSR markers than for previously developed EST-based SSR markers (UHe = 0.32, Na = 3.56). Fifty-four genomic SSR markers were placed on the molecular linkage map of lettuce. The distribution of markers in the genome appeared to be random, with the exception of a possible cluster on linkage group 6. Any combination of 32 genomic SSRs was able to distinguish the genotypes of all 36 accessions. Fourteen of the newly developed SSR markers originate from fragments with high sequence similarity to resistance gene candidates (RGCs) and RGC pseudogenes. Analysis of molecular variance (AMOVA) of L. sativa accessions showed that approximately 3% of genetic diversity was within accessions, 79% among accessions, and 18% among horticultural types. Conclusions The newly developed genomic SSR markers were added to the pool of previously developed EST-SSR markers. These two types of SSR-based markers provide useful tools for lettuce cultivar fingerprinting, development of integrated molecular linkage maps, and gene mapping. PMID:23339733

  4. The emergence of commercial genomics: analysis of the rise of a biotechnology subsector during the Human Genome Project, 1990 to 2004

    PubMed Central

    2013-01-01

    Background Development of the commercial genomics sector within the biotechnology industry relied heavily on the scientific commons, public funding, and technology transfer between academic and industrial research. This study tracks financial and intellectual property data on genomics firms from 1990 through 2004, thus following these firms as they emerged in the era of the Human Genome Project and through the 2000 to 2001 market bubble. Methods A database was created based on an early survey of genomics firms, which was expanded using three web-based biotechnology services, scientific journals, and biotechnology trade and technical publications. Financial data for publicly traded firms were collected from four databases specializing in firm financials. Patent searches were conducted using firm names in the US Patent and Trademark Office website search engine and the DNA Patent Database. Results A biotechnology subsector of genomics firms emerged in parallel to the publicly funded Human Genome Project. Trends among top firms show that hiring, capital improvement, and research and development expenditures continued to grow after the 2000 to 2001 bubble. The majority of firms are small businesses with great diversity in the type of research and development, products, and services provided. Over half of the public firms holding patents have the majority of their intellectual property portfolio in DNA-based patents. Conclusions These data allow estimates of investment, research and development expenditures, and jobs that paralleled the rise of genomics as a sector within biotechnology between 1990 and 2004. PMID:24050173

  5. Characterizing genomic alterations in cancer by complementary functional associations.

    PubMed

    Kim, Jong Wook; Botvinnik, Olga B; Abudayyeh, Omar; Birger, Chet; Rosenbluh, Joseph; Shrestha, Yashaswi; Abazeed, Mohamed E; Hammerman, Peter S; DiCara, Daniel; Konieczkowski, David J; Johannessen, Cory M; Liberzon, Arthur; Alizad-Rahvar, Amir Reza; Alexe, Gabriela; Aguirre, Andrew; Ghandi, Mahmoud; Greulich, Heidi; Vazquez, Francisca; Weir, Barbara A; Van Allen, Eliezer M; Tsherniak, Aviad; Shao, Diane D; Zack, Travis I; Noble, Michael; Getz, Gad; Beroukhim, Rameen; Garraway, Levi A; Ardakani, Masoud; Romualdi, Chiara; Sales, Gabriele; Barbie, David A; Boehm, Jesse S; Hahn, William C; Mesirov, Jill P; Tamayo, Pablo

    2016-05-01

    Systematic efforts to sequence the cancer genome have identified large numbers of mutations and copy number alterations in human cancers. However, elucidating the functional consequences of these variants, and their interactions to drive or maintain oncogenic states, remains a challenge in cancer research. We developed REVEALER, a computational method that identifies combinations of mutually exclusive genomic alterations correlated with functional phenotypes, such as the activation or gene dependency of oncogenic pathways or sensitivity to a drug treatment. We used REVEALER to uncover complementary genomic alterations associated with the transcriptional activation of β-catenin and NRF2, MEK-inhibitor sensitivity, and KRAS dependency. REVEALER successfully identified both known and new associations, demonstrating the power of combining functional profiles with extensive characterization of genomic alterations in cancer genomes.
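    The idea of "complementary" alterations can be illustrated with a greedy search: start from the single alteration that best tracks a functional profile, then add alterations whose union (logical OR across samples) improves the match. This is a simplified sketch, not REVEALER itself: REVEALER uses an information-theoretic association score, whereas this sketch swaps in Pearson correlation for brevity. The feature names, profiles and the `max_features` cap are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def greedy_complementary(target: list[float], features: dict[str, list[int]],
                         max_features: int = 3) -> tuple[list[str], float]:
    """Greedily add binary alterations whose union (logical OR) best tracks the target profile."""
    summary = [0] * len(target)
    chosen: list[str] = []
    best_score = float("-inf")
    for _ in range(max_features):
        best_name, best_union = None, None
        for name, vec in features.items():
            if name in chosen:
                continue
            union = [max(a, b) for a, b in zip(summary, vec)]
            score = pearson(union, target)
            if score > best_score:  # must beat the best score seen so far
                best_name, best_union, best_score = name, union, score
        if best_name is None:  # no remaining feature improves the association
            break
        chosen.append(best_name)
        summary = best_union
    return chosen, best_score


target = [0.1, 0.2, 0.9, 1.0, 0.8, 0.95]  # hypothetical activation score per sample
features = {
    "A": [0, 0, 1, 1, 0, 0],  # hypothetical alteration in samples 3-4
    "B": [0, 0, 0, 0, 1, 1],  # mutually exclusive alteration in samples 5-6
    "C": [1, 0, 0, 0, 0, 0],
}
chosen, score = greedy_complementary(target, features)
print(chosen, round(score, 2))
```

    Here "A" and "B" are each only partly associated with the target, but their union covers all high-scoring samples, which is the kind of mutually exclusive pair the record describes.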

  6. Computational intelligence in bioinformatics: SNP/haplotype data in genetic association study for common diseases.

    PubMed

    Kelemen, Arpad; Vasilakos, Athanasios V; Liang, Yulan

    2009-09-01

    Comprehensive evaluation of common genetic variation, through genome-wide association of single-nucleotide polymorphism (SNP) structure with common complex diseases, is currently an active area of human genome research owing to the recent progress of the Human Genome Project and the HapMap Project. Computational science, which includes computational intelligence (CI), has recently become a third method of scientific inquiry alongside theory and experimentation. Interest in developing and applying CI to disease mapping with SNP and haplotype data has grown rapidly. Several recent studies have demonstrated the promise and importance of CI for genomic association studies of common complex diseases using SNP/haplotype data, especially for tackling challenges such as gene-gene and gene-environment interactions and the notorious "curse of dimensionality". This review covers recent developments in CI approaches for complex diseases in genetic association studies with SNP/haplotype data.

  7. Identification of the maize gravitropism gene lazy plant1 by a transposon-tagging genome resequencing strategy.

    PubMed

    Howard, Thomas P; Hayward, Andrew P; Tordillos, Anthony; Fragoso, Christopher; Moreno, Maria A; Tohme, Joe; Kausch, Albert P; Mottinger, John P; Dellaporta, Stephen L

    2014-01-01

    Since their initial discovery, transposons have been widely used as mutagens for forward and reverse genetic screens in a range of organisms. The problems of high copy number and sequence divergence among related transposons have often limited the efficiency with which tagged genes can be identified. A method was developed to identify the locations of Mutator (Mu) transposons in the Zea mays genome using a simple enrichment method combined with genome resequencing to identify transposon junction fragments. The sequencing library was prepared from genomic DNA by digesting with a restriction enzyme that cuts within a perfectly conserved motif of the Mu terminal inverted repeats (TIR). Paired-end reads containing Mu TIR sequences were computationally identified, and chromosomal sequences flanking the transposon were mapped to the maize reference genome. This method has been used to identify Mu insertions in a number of alleles and to isolate the previously unidentified lazy plant1 (la1) gene. The la1 gene is required for the negative gravitropic response of shoots, and mutant plants lack the ability to sense gravity. Using bioinformatic and fluorescence microscopy approaches, we show that the la1 gene encodes a protein localized to the cell membrane and nucleus. Our Mu-Taq method is readily adaptable to identify the genomic locations of any insertion of a known sequence in any organism using any sequencing platform.
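    The junction-finding step described above (keep reads that contain a conserved TIR sequence, then take the adjacent chromosomal sequence for mapping) can be sketched in plain Python. The motif string, the flank length and the assumption that genomic DNA lies 3' of the motif as written are all hypothetical; a real pipeline would also tolerate sequencing errors and pass the flanks to a read aligner.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking_sequences(reads: list[str], tir_motif: str, flank_len: int = 30) -> list[str]:
    """Return up to flank_len bases following the (hypothetical) TIR motif in each read.

    Both read orientations are searched; the first orientation containing the motif is used.
    """
    flanks = []
    for read in reads:
        for strand_seq in (read, revcomp(read)):
            pos = strand_seq.find(tir_motif)
            if pos != -1:
                start = pos + len(tir_motif)
                flank = strand_seq[start:start + flank_len]
                if flank:  # motif at the very end of the read leaves nothing to map
                    flanks.append(flank)
                break
    return flanks


reads = ["TTCCCAAAGATTACA", "ACCCTTTGGG", "ACGTACGT"]
print(flanking_sequences(reads, tir_motif="CCCAAA"))  # ['GATTACA', 'GGGT']
```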

  8. Identification of the Maize Gravitropism Gene lazy plant1 by a Transposon-Tagging Genome Resequencing Strategy

    PubMed Central

    Howard, Thomas P.; Hayward, Andrew P.; Tordillos, Anthony; Fragoso, Christopher; Moreno, Maria A.; Tohme, Joe; Kausch, Albert P.; Mottinger, John P.; Dellaporta, Stephen L.

    2014-01-01

    Since their initial discovery, transposons have been widely used as mutagens for forward and reverse genetic screens in a range of organisms. The problems of high copy number and sequence divergence among related transposons have often limited the efficiency with which tagged genes can be identified. A method was developed to identify the locations of Mutator (Mu) transposons in the Zea mays genome using a simple enrichment method combined with genome resequencing to identify transposon junction fragments. The sequencing library was prepared from genomic DNA by digesting with a restriction enzyme that cuts within a perfectly conserved motif of the Mu terminal inverted repeats (TIR). Paired-end reads containing Mu TIR sequences were computationally identified, and chromosomal sequences flanking the transposon were mapped to the maize reference genome. This method has been used to identify Mu insertions in a number of alleles and to isolate the previously unidentified lazy plant1 (la1) gene. The la1 gene is required for the negative gravitropic response of shoots, and mutant plants lack the ability to sense gravity. Using bioinformatic and fluorescence microscopy approaches, we show that the la1 gene encodes a protein localized to the cell membrane and nucleus. Our Mu-Taq method is readily adaptable to identify the genomic locations of any insertion of a known sequence in any organism using any sequencing platform. PMID:24498020

  9. DNA-based identification of spices: DNA isolation, whole genome amplification, and polymerase chain reaction.

    PubMed

    Focke, Felix; Haase, Ilka; Fischer, Markus

    2011-01-26

    Spices are usually identified morphologically, using simple tools such as magnifying glasses or microscopes. Molecular biological methods such as the polymerase chain reaction (PCR), on the other hand, enable accurate and specific detection even in complex matrices. Spices generally originate from plants with diverse genetic backgrounds and relationships, and the processing methods used to produce them are complex and individual. Consequently, developing a reliable DNA-based method for spice analysis is a challenging undertaking. However, once established, such a method can easily be adapted to less difficult food matrices. In the current study, several alternative methods for isolating DNA from spices were developed and evaluated in detail with regard to (i) purity (photometric), (ii) yield (fluorimetric), and (iii) amplifiability (PCR). Whole genome amplification methods were used to preamplify isolates to improve the ratio of amplifiable DNA to inhibiting substances. Specific primer sets were designed, and PCR conditions were optimized to detect 18 spices selectively. Assays of self-made spice mixtures were performed to prove the applicability of the developed methods.
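    Photometric purity is conventionally summarized by absorbance ratios: an A260/A280 near 1.8 is the usual benchmark for pure DNA, and a low A260/A230 often signals co-extracted contaminants. The following is a minimal sketch of that calculation; the acceptance thresholds are hypothetical defaults, not values from this study, and yield (measured fluorimetrically here) is not computed.

```python
def purity_ratios(a260: float, a280: float, a230: float) -> dict[str, float]:
    """Photometric purity ratios for a nucleic-acid extract."""
    if a280 <= 0 or a230 <= 0:
        raise ValueError("A280 and A230 must be positive")
    return {"A260/A280": a260 / a280, "A260/A230": a260 / a230}


def passes_purity(ratios: dict[str, float],
                  min_260_280: float = 1.7, min_260_230: float = 1.8) -> bool:
    """Apply hypothetical acceptance thresholds to both ratios."""
    return ratios["A260/A280"] >= min_260_280 and ratios["A260/A230"] >= min_260_230


r = purity_ratios(a260=0.90, a280=0.50, a230=0.45)
print(r, passes_purity(r))
```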

  10. Deciphering the Epigenetic Code: An Overview of DNA Methylation Analysis Methods

    PubMed Central

    Umer, Muhammad

    2013-01-01

    Significance: Methylation of cytosine in DNA is linked with gene regulation, and this has profound implications for development, normal biology, and disease in many eukaryotic organisms. A wide range of methods and approaches exist for its identification, quantification, and mapping within the genome. While the earliest approaches were nonspecific and were at best useful for quantifying the total methylated cytosine content of a DNA sample, this field has seen considerable progress and development over the past decades. Recent Advances: Methods for DNA methylation analysis differ in their coverage and sensitivity, and the method of choice depends on the intended application and the desired level of information. Potential results include global methylcytosine content, degree of methylation at specific loci, or genome-wide methylation maps. The introduction of more advanced approaches to DNA methylation analysis, such as microarray platforms and massively parallel sequencing, has brought us closer to unveiling the whole methylome. Critical Issues: Sensitive quantification of DNA methylation from degraded and minute quantities of DNA, and high-throughput DNA methylation mapping of single cells, still remain a challenge. Future Directions: Developments in DNA sequencing technologies, as well as methods for the identification and mapping of 5-hydroxymethylcytosine, are expected to augment our current understanding of epigenomics. Here we present an overview of methodologies available for DNA methylation analysis, with a special focus on recent developments in genome-wide and high-throughput methods. While the application focus relates to cancer research, the methods are equally relevant to broader issues of epigenetics and redox science in this special forum. Antioxid. Redox Signal. 18, 1972–1986. PMID:23121567
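    For sequencing-based locus and genome-wide maps, a common per-site summary after bisulfite conversion is the fraction of reads that still show C (unconverted) at a cytosine position. The following is a minimal sketch under that assumption; the input layout and the minimum depth of 5 are hypothetical. Note that standard bisulfite conversion does not distinguish 5-methylcytosine from 5-hydroxymethylcytosine, which is one reason dedicated 5hmC methods are needed.

```python
def methylation_levels(counts: dict[int, tuple[int, int]], min_depth: int = 5) -> dict[int, float]:
    """Per-cytosine methylation level from bisulfite read counts.

    counts maps a CpG position to (reads showing C, reads showing T).
    C = unconverted (methylated or hydroxymethylated); T = converted (unmethylated).
    """
    levels = {}
    for pos, (c_reads, t_reads) in counts.items():
        depth = c_reads + t_reads
        if depth >= min_depth:  # skip poorly covered sites
            levels[pos] = c_reads / depth
    return levels


print(methylation_levels({100: (8, 2), 250: (1, 1), 400: (0, 6)}))  # {100: 0.8, 400: 0.0}
```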

  11. Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods.

    PubMed

    Martin, Guillaume; Baurens, Franc-Christophe; Droc, Gaëtan; Rouard, Mathieu; Cenci, Alberto; Kilian, Andrzej; Hastie, Alex; Doležel, Jaroslav; Aury, Jean-Marc; Alberti, Adriana; Carreel, Françoise; D'Hont, Angélique

    2016-03-16

    Recent advances in genomics indicate the functional significance of a majority of genome sequences and of their long-range interactions. Because a detailed examination of genome organization and function requires a very high quality genome sequence, the objective of this study was to improve the reference genome assembly of banana (Musa acuminata). We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. However, unlike classical automated tools that are based on global parameters, the semi-automated tools offer an expert mode in which the user can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected onto the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80%), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). In total, 89.5% of the assembly was anchored to the 11 Musa chromosomes, compared with 70% previously. Unknown sites (N) were reduced from 17.3 to 10.0%. The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution.
Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
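    The N50 figures above follow the usual definition: sort scaffolds from longest to shortest and report the length at which the running total first reaches half the assembly. The count in parentheses appears to be the number of scaffolds needed to get there (often called L50), which falls out of the same loop. The following is a minimal sketch with made-up lengths.

```python
def n50_l50(lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a list of scaffold lengths."""
    if not lengths:
        raise ValueError("no scaffolds")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:  # running total has reached half the assembly
            return length, count
    raise AssertionError("unreachable for non-negative lengths")


print(n50_l50([100, 80, 50, 30, 20, 10]))  # (80, 2)
```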

  12. A versatile genome-scale PCR-based pipeline for high-definition DNA FISH.

    PubMed

    Bienko, Magda; Crosetto, Nicola; Teytelman, Leonid; Klemm, Sandy; Itzkovitz, Shalev; van Oudenaarden, Alexander

    2013-02-01

    We developed a cost-effective genome-scale PCR-based method for high-definition DNA FISH (HD-FISH). We visualized gene loci with diffraction-limited resolution, chromosomes as spot clusters and single genes together with transcripts by combining HD-FISH with single-molecule RNA FISH. We provide a database of over 4.3 million primer pairs targeting the human and mouse genomes that is readily usable for rapid and flexible generation of probes.

  13. Pathway and network analysis of cancer genomes.

    PubMed

    Creixell, Pau; Reimand, Jüri; Haider, Syed; Wu, Guanming; Shibata, Tatsuhiro; Vazquez, Miguel; Mustonen, Ville; Gonzalez-Perez, Abel; Pearson, John; Sander, Chris; Raphael, Benjamin J; Marks, Debora S; Ouellette, B F Francis; Valencia, Alfonso; Bader, Gary D; Boutros, Paul C; Stuart, Joshua M; Linding, Rune; Lopez-Bigas, Nuria; Stein, Lincoln D

    2015-07-01

    Genomic information on tumors from 50 cancer types cataloged by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been great interest in developing pathway and network analysis methods that group genes and illuminate the processes involved. We provide an overview of these analysis techniques and show where they guide mechanistic and translational investigations.
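    Pathway methods range widely. One of the simplest families, over-representation analysis, asks whether a pathway contains more of the mutated genes than expected by chance. The following is a minimal sketch of the one-sided hypergeometric test using only the standard library; it is not a method attributed to the review, the counts are hypothetical, and correction for testing many pathways is left out.

```python
from math import comb


def enrichment_p_value(k: int, n_selected: int, k_pathway: int, n_total: int) -> float:
    """One-sided hypergeometric P(X >= k).

    n_total:    genes in the background
    k_pathway:  background genes annotated to the pathway
    n_selected: genes in the query list (e.g. mutated genes)
    k:          query genes that fall in the pathway
    """
    upper = min(n_selected, k_pathway)
    numerator = sum(
        comb(k_pathway, i) * comb(n_total - k_pathway, n_selected - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(n_total, n_selected)


# All 3 mutated genes fall in a 3-gene pathway out of a 10-gene background: p = 1/120.
print(enrichment_p_value(3, 3, 3, 10))
```

    Exact integer arithmetic via `math.comb` avoids underflow for small examples; for genome-scale backgrounds a log-space implementation would be preferable.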

  14. A Selective Review of Group Selection in High-Dimensional Models

    PubMed Central

    Huang, Jian; Breheny, Patrick; Ma, Shuangge

    2013-01-01

    Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study. PMID:24174707
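    The building block shared by many group LASSO algorithms (e.g. proximal gradient and block coordinate descent) is group soft-thresholding: each group of coefficients is shrunk toward zero as a block and set exactly to zero when its Euclidean norm falls below the penalty level, which is what makes whole groups drop out. The following is a minimal sketch of that proximal operator, assuming unit group weights (many formulations scale the penalty by the square root of the group size). The concave group penalties the review emphasizes are not shown.

```python
import math


def group_soft_threshold(beta: list[float], groups: list[list[int]], lam: float) -> list[float]:
    """Proximal operator of lam * sum_g ||beta_g||_2 (group LASSO penalty, unit weights).

    Coefficients not listed in any group are returned unchanged.
    """
    out = list(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        # Shrink the whole block by (1 - lam / norm), or zero it if norm <= lam.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * beta[i]
    return out


print(group_soft_threshold([3.0, 4.0, 0.1, 0.1], groups=[[0, 1], [2, 3]], lam=1.0))
```

    In this example the first group (norm 5) is scaled by 0.8, while the second group (norm about 0.14) is removed entirely.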

  15. The Functional Genomics Network in the evolution of biological text mining over the past decade.

    PubMed

    Blaschke, Christian; Valencia, Alfonso

    2013-03-25

    Different programs of the European Science Foundation (ESF) have contributed significantly to connecting researchers in Europe and beyond through several initiatives. This support was particularly relevant for the development of areas related to extracting information from papers (text mining), because it supported the field in its early phases, long before it was recognized by the community. We review the historical development of text mining research and how it was introduced into bioinformatics. Specific applications in (functional) genomics are described, such as its integration into genome annotation pipelines and its support for the analysis of high-throughput genomics experimental data, and we highlight the method evaluation and benchmarking activities for which ESF programme support was instrumental. Copyright © 2013 Elsevier B.V. All rights reserved.

  16. A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    PubMed

    Thakur, Shalabh; Guttman, David S

    2016-06-30

    Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While a number of excellent tools have been developed for this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. We have developed a de novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes because they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly because homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline is developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package is accompanied by a script for automated installation of the necessary external programs on Ubuntu Linux; however, the pipeline should also be compatible with other Linux and Unix systems once the necessary external programs are installed.
DeNoGAP is freely available at https://sourceforge.net/projects/denogap/ .

  17. How and why should we implement genomics into conservation?

    PubMed Central

    McMahon, Barry J; Teeling, Emma C; Höglund, Jacob

    2014-01-01

    Conservation genetics has provided important insights into the dynamics of endangered populations. The rapid development of genomic methods has raised an important question: where do genetics and genomics sit in relation to their application in the conservation of species? Although genetics can answer a number of relevant questions related to conservation, the potential of genomics has not yet been fully exploited. Here, we explore the transition from genetic to genomic research in conservation biology, the rationale for it, and the utility of such research. We explore the idea of a ‘conservation prior’ and how it can be determined from genomic data and used in the management of populations. We depict three different conservation scenarios and describe how genomic data can drive management action in each situation. We conclude that the most effective applications of genomics will be to inform stakeholders with the aim of avoiding ‘emergency room conservation’. PMID:25553063

  18. Complex multi-enhancer contacts captured by Genome Architecture Mapping (GAM)

    PubMed Central

    Beagrie, Robert A.; Scialdone, Antonio; Schueler, Markus; Kraemer, Dorothee C.A.; Chotalia, Mita; Xie, Sheila Q.; Barbieri, Mariano; de Santiago, Inês; Lavitas, Liron-Mark; Branco, Miguel R.; Fraser, James; Dostie, Josée; Game, Laurence; Dillon, Niall; Edwards, Paul A.W.; Nicodemi, Mario; Pombo, Ana

    2017-01-01

    Summary The organization of the genome in the nucleus and the interactions of genes with their regulatory elements are key features of transcriptional control and their disruption can cause disease. We developed a novel genome-wide method, Genome Architecture Mapping (GAM), for measuring chromatin contacts, and other features of three-dimensional chromatin topology, based on sequencing DNA from a large collection of thin nuclear sections. We apply GAM to mouse embryonic stem cells and identify an enrichment for specific interactions between active genes and enhancers across very large genomic distances, using a mathematical model ‘SLICE’ (Statistical Inference of Co-segregation). GAM also reveals an abundance of three-way contacts genome-wide, especially between regions that are highly transcribed or contain super-enhancers, highlighting a previously inaccessible complexity in genome architecture and a major role for gene-expression specific contacts in organizing the genome in mammalian nuclei. PMID:28273065
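    GAM data can be pictured as a binary matrix with one row per nuclear profile (a sequenced thin section) and one column per genomic window, marking whether the window was detected. The following sketch computes, for two windows, the fraction of profiles containing both and the excess D = f_AB - f_A f_B over what independent segregation would predict. The matrix layout is an assumption, and this is a simplified linkage-style score for illustration; it is not SLICE, the statistical model the authors use to infer specific interactions.

```python
def cosegregation(nps: list[list[int]], a: int, b: int) -> tuple[float, float]:
    """Co-segregation of windows a and b across nuclear profiles.

    nps[i][j] is 1 if window j was detected in nuclear profile i, else 0.
    Returns (f_ab, d) with d = f_ab - f_a * f_b.
    """
    n = len(nps)
    if n == 0:
        raise ValueError("no nuclear profiles")
    f_a = sum(row[a] for row in nps) / n
    f_b = sum(row[b] for row in nps) / n
    f_ab = sum(1 for row in nps if row[a] and row[b]) / n
    return f_ab, f_ab - f_a * f_b


profiles = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]  # toy data
print(cosegregation(profiles, 0, 1))  # (0.5, 0.25)
print(cosegregation(profiles, 0, 2))  # (0.0, -0.25)
```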

  19. Golden Gate Assembly of CRISPR gRNA expression array for simultaneously targeting multiple genes.

    PubMed

    Vad-Nielsen, Johan; Lin, Lin; Bolund, Lars; Nielsen, Anders Lade; Luo, Yonglun

    2016-11-01

    Engineered CRISPR/Cas9 technology has become the most efficient and broadly used genome editing tool. However, simultaneously targeting multiple genes (or genomic loci) in the same individual cells using CRISPR/Cas9 remains a technical challenge. In this article, we developed a Golden Gate Assembly method for generating CRISPR gRNA expression arrays, thus enabling simultaneous gene targeting. Using this method, a CRISPR gRNA expression array containing up to 30 gRNA expression cassettes can be generated in 2 weeks. We demonstrated that simultaneous targeting of 10 genomic loci, or simultaneous inhibition of multiple endogenous genes, could be achieved in human cells using the multiplexed gRNA expression array vector. The complete set of plasmids is available through the non-profit plasmid repository Addgene.
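    Golden Gate Assembly uses Type IIS restriction enzymes (such as BsaI) that cut outside their recognition site and leave 4-nt overhangs, so the order of parts in an array is fixed by which overhangs pair. The following is a design-level sketch: it chains parts from the vector's start overhang to its end overhang and rejects duplicated or palindromic overhangs (a palindromic overhang can ligate to itself). Part names, overhangs and check rules are hypothetical and are not the authors' published overhang set.

```python
COMP = str.maketrans("ACGT", "TGCA")


def is_palindromic(overhang: str) -> bool:
    """True if the overhang equals its own reverse complement."""
    return overhang == overhang.translate(COMP)[::-1]


def order_parts(parts: dict[str, tuple[str, str]], start: str, end: str) -> list[str]:
    """Chain parts (name -> (left overhang, right overhang)) from `start` to `end`."""
    by_left: dict[str, tuple[str, str]] = {}
    for name, (left, right) in parts.items():
        for oh in (left, right):
            if is_palindromic(oh):
                raise ValueError(f"{name}: palindromic overhang {oh}")
        if left in by_left:
            raise ValueError(f"overhang {left} is used by more than one part")
        by_left[left] = (name, right)
    order, current = [], start
    while current != end:
        if current not in by_left:
            raise ValueError(f"no part starts with overhang {current}")
        name, current = by_left.pop(current)  # pop prevents revisiting a part
        order.append(name)
    if by_left:
        raise ValueError(f"unused parts: {sorted(n for n, _ in by_left.values())}")
    return order


parts = {"gRNA2": ("TGCC", "GCAA"), "gRNA1": ("AATG", "TGCC"), "gRNA3": ("GCAA", "ACTA")}
print(order_parts(parts, start="AATG", end="ACTA"))  # ['gRNA1', 'gRNA2', 'gRNA3']
```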

  20. Completion of the swine genome will simplify the production of swine as a large animal biomedical model

    PubMed Central

    2012-01-01

    Background Anatomic and physiological similarities to humans make swine an excellent large animal model for human health and disease. Methods Cloning from a modified somatic cell, in which the modification can be verified before the animal is produced, is the only method available for producing targeted modifications in swine. Results Since some strains of swine are similar in size to humans, technologies that have been developed for swine can be readily adapted to humans and vice versa. Here the importance of swine as a biomedical model, current technologies to produce genetically enhanced swine, current biomedical models, and how the completion of the swine genome will promote swine as a biomedical model are discussed. Conclusions The completion of the swine genome will enhance the continued use and development of swine as models of human health, syndromes and conditions. PMID:23151353

  1. Quantitative High-Resolution Genomic Analysis of Single Cancer Cells

    PubMed Central

    Hannemann, Juliane; Meyer-Staeckling, Sönke; Kemming, Dirk; Alpers, Iris; Joosse, Simon A.; Pospisil, Heike; Kurtz, Stefan; Görndt, Jennifer; Püschel, Klaus; Riethdorf, Sabine; Pantel, Klaus; Brandt, Burkhard

    2011-01-01

    During cancer progression, specific genomic aberrations arise that can determine the scope of the disease and can be used as predictive or prognostic markers. The detection of specific gene amplifications or deletions in single blood-borne or disseminated tumour cells that may give rise to metastases is of great clinical interest but technically challenging. In this study, we present a method for quantitative high-resolution genomic analysis of single cells. Cells were isolated under continuous microscopic control, followed by high-fidelity whole genome amplification and subsequent analyses by fine tiling array-CGH and qPCR. The assay was applied to single breast cancer cells to analyze the chromosomal region centred on the therapeutically relevant EGFR gene. This method allows precise quantitative analysis of copy number variations in single-cell diagnostics. PMID:22140428
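    One widely used way to turn qPCR Ct values into a copy number estimate is the 2^-ΔΔCt method: normalize the target amplicon (e.g. within the EGFR region) to a reference locus, then to a calibrator of known copy number. The following is a minimal sketch that assumes roughly 100% amplification efficiency for both assays and a diploid calibrator. The abstract does not say which qPCR evaluation the authors used, and the Ct values below are invented.

```python
def relative_copy_number(ct_target: float, ct_ref: float,
                         ct_target_cal: float, ct_ref_cal: float,
                         calibrator_copies: float = 2.0) -> float:
    """Copy number of a target locus by the 2^-ddCt method (assumes ~100% efficiency)."""
    d_ct_sample = ct_target - ct_ref          # target normalized to reference in the sample
    d_ct_cal = ct_target_cal - ct_ref_cal     # same normalization in the calibrator
    dd_ct = d_ct_sample - d_ct_cal
    return calibrator_copies * 2.0 ** (-dd_ct)


# Target amplifies 2 cycles earlier (relative to reference) than in a diploid calibrator.
print(relative_copy_number(24.0, 25.0, 26.0, 25.0))  # 8.0
```

    Each cycle of difference corresponds to a factor of two only under the efficiency assumption; with measured efficiencies, an efficiency-corrected model would be more appropriate.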

  2. Identifying candidate drivers of drug response in heterogeneous cancer by mining high throughput genomics data.

    PubMed

    Nabavi, Sheida

    2016-08-15

    With advances in technology, huge amounts of multiple types of high-throughput genomics data are available. These data have tremendous potential to identify new and clinically valuable biomarkers to guide the diagnosis, prognosis, and treatment of complex diseases such as cancer. Integrating, analyzing, and interpreting big and noisy genomics data to obtain biologically meaningful results, however, remains highly challenging. Mining genomics datasets with advanced computational methods can help to address these issues. To facilitate the identification of a short list of biologically meaningful genes as candidate drivers of anti-cancer drug resistance from an enormous amount of heterogeneous data, we employed statistical machine-learning techniques and integrated genomics datasets. We developed a computational method that integrates gene expression, somatic mutation, and copy number aberration data of sensitive and resistant tumors. In this method, an integrative approach based on module network analysis is applied to identify potential driver genes. This is followed by cross-validation and a comparison of the results of the sensitive and resistant groups to obtain the final list of candidate biomarkers. We applied this method to the ovarian cancer data from The Cancer Genome Atlas. The final result contains biologically relevant genes, such as COL11A1, which has been reported as a biomarker of cis-platinum resistance in epithelial ovarian carcinoma in several recent studies. The described method yields a short list of aberrant genes that also control the expression of their co-regulated genes. The results suggest that this unbiased, data-driven computational method can identify biologically relevant candidate biomarkers. It can be utilized in a wide range of applications that compare two conditions with highly heterogeneous datasets.

  3. GAPIT: genome association and prediction integrated tool.

    PubMed

    Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu

    2012-09-15

    Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods, including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. Availability: http://www.maizegenetics.net/GAPIT. Contact: zhiwu.zhang@cornell.edu. Supplementary data are available at Bioinformatics online.
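    GAPIT is an R package, and none of its functions are reproduced here. The "compressed" part of CMLM replaces the individual-level kinship matrix with a kinship among groups of individuals, which shrinks the random-effect dimension. The following is a simplified Python sketch of forming such a group kinship by averaging individual kinship values over all member pairs of each pair of groups. The group assignment is taken as given (in CMLM the groups come from clustering individuals), and simple averaging is an assumption of this sketch.

```python
def group_kinship(kinship: list[list[float]], groups: list[int]) -> list[list[float]]:
    """Average individual kinship over all member pairs of each pair of groups.

    groups[i] is the group index (0..g-1) of individual i; self-pairs are included.
    """
    g = max(groups) + 1
    sums = [[0.0] * g for _ in range(g)]
    counts = [[0] * g for _ in range(g)]
    for i, gi in enumerate(groups):
        for j, gj in enumerate(groups):
            sums[gi][gj] += kinship[i][j]
            counts[gi][gj] += 1
    return [[sums[p][q] / counts[p][q] if counts[p][q] else 0.0 for q in range(g)]
            for p in range(g)]


K = [[1.0, 0.5, 0.1, 0.1],
     [0.5, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.3],
     [0.1, 0.1, 0.3, 1.0]]  # toy kinship for four individuals
print(group_kinship(K, groups=[0, 0, 1, 1]))
```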

  4. Multiple genome alignment for identifying the core structure among moderately related microbial genomes.

    PubMed

    Uchiyama, Ikuo

    2008-10-31

    Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes, in which horizontal gene transfers are common. Although core genome identification appears to be straightforward among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved, so that they are likely to have been inherited mainly through vertical transfer, and develop a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes exhibited the expected characteristics, i.e., being indigenous and sharing the same history, more than the non-core genes. The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.
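    The notion of "segments in which gene orders are conserved" can be illustrated with a much simpler heuristic than the method described above: keep the OG adjacencies shared by every genome, then report runs of at least `min_len` OGs in a reference genome whose consecutive adjacencies are all shared. This sketch treats each genome as one linear list of OG identifiers and ignores gene orientation; it is an illustrative stand-in, not the author's order-finding algorithm, and the genomes and threshold are hypothetical.

```python
def adjacencies(order: list[str]) -> set[frozenset[str]]:
    """Unordered neighbouring OG pairs in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def conserved_segments(genomes: list[list[str]], min_len: int = 3) -> list[list[str]]:
    """Runs in genomes[0] whose consecutive OG adjacencies occur in every genome."""
    if not genomes or not genomes[0]:
        return []
    shared = set.intersection(*(adjacencies(g) for g in genomes))
    ref = genomes[0]
    segments, current = [], [ref[0]]
    for prev, nxt in zip(ref, ref[1:]):
        if frozenset((prev, nxt)) in shared:
            current.append(nxt)  # adjacency conserved: extend the segment
        else:
            if len(current) >= min_len:
                segments.append(current)
            current = [nxt]  # adjacency broken: start a new segment
    if len(current) >= min_len:
        segments.append(current)
    return segments


g1 = ["a", "b", "c", "d", "e", "f", "g"]
g2 = ["c", "b", "a", "x", "d", "e", "f", "g"]
print(conserved_segments([g1, g2]))  # [['a', 'b', 'c'], ['d', 'e', 'f', 'g']]
```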

  5. Harnessing CRISPR-Cas systems for bacterial genome editing.

    PubMed

    Selle, Kurt; Barrangou, Rodolphe

    2015-04-01

    Manipulation of genomic sequences facilitates the identification and characterization of key genetic determinants in the investigation of biological processes. Genome editing via clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) systems constitutes a next-generation method for programmable and high-throughput functional genomics. CRISPR-Cas systems are readily reprogrammed to induce sequence-specific DNA breaks at target loci, resulting in fixed mutations via host-dependent DNA repair mechanisms. Although bacterial genome editing is a relatively unexplored and underrepresented application of CRISPR-Cas systems, recent studies provide valuable insights for the widespread future implementation of this technology. This review summarizes recent progress in bacterial genome editing and identifies fundamental genetic and phenotypic outcomes of CRISPR targeting in bacteria, in the context of tool development, genome homeostasis, and DNA repair. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. A robust, sensitive assay for genomic uracil determination by LC/MS/MS reveals lower levels than previously reported.

    PubMed

    Galashevskaya, Anastasia; Sarno, Antonio; Vågbø, Cathrine B; Aas, Per A; Hagen, Lars; Slupphaug, Geir; Krokan, Hans E

    2013-09-01

    Considerable progress has been made in understanding the origins of genomic uracil and its role in genome stability and host defense; however, the main question concerning the basal level of uracil in DNA remains disputed. Results from assays designed to quantify genomic uracil vary by almost three orders of magnitude. To address the issues leading to this inconsistency, we explored possible shortcomings with existing methods and developed a sensitive LC/MS/MS-based method for the absolute quantification of genomic 2'-deoxyuridine (dUrd). To this end, DNA was enzymatically hydrolyzed to 2'-deoxyribonucleosides and dUrd was purified in a preparative HPLC step and analyzed by LC/MS/MS. The standard curve was linear over four orders of magnitude with a quantification limit of 5 fmol dUrd. Control samples demonstrated high inter-experimental accuracy (94.3%) and precision (CV 9.7%). An alternative method that employed UNG2 to excise uracil from DNA for LC/MS/MS analysis gave similar results, but the intra-assay variability was significantly greater. We quantified genomic dUrd in Ung(+/+) and Ung(-/-) mouse embryonic fibroblasts and human lymphoblastoid cell lines carrying UNG mutations. DNA-dUrd is 5-fold higher in Ung(-/-) than in Ung(+/+) fibroblasts and 11-fold higher in UNG2 dysfunctional than in UNG2 functional lymphoblastoid cells. We report approximately 400-600 dUrd per human or murine genome in repair-proficient cells, which is lower than results using other methods and suggests that genomic uracil levels may have previously been overestimated. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
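
    The accuracy and CV figures for the control samples are conventionally computed as below. The abstract does not state the formulas or the raw replicate values, so the definitions used here (accuracy as the mean measured amount divided by the nominal amount, CV from the sample standard deviation) and the example numbers are assumptions for illustration.

```python
import statistics


def accuracy_and_cv(measured, nominal):
    """Return (accuracy %, CV %) for replicate measurements of a control sample.

    accuracy = mean(measured) / nominal * 100
    CV       = sample standard deviation / mean(measured) * 100
    """
    mean = statistics.mean(measured)
    sd = statistics.stdev(measured)  # sample (n - 1) standard deviation
    return 100 * mean / nominal, 100 * sd / mean


# Hypothetical replicate dUrd amounts (fmol) for a control with 10 fmol nominal.
print(accuracy_and_cv([9.0, 10.0, 11.0], 10.0))  # (100.0, 10.0)
```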

  7. [Genetic system for maintaining the mitochondrial human genome in yeast Yarrowia lipolytica].

    PubMed

    Isakova, E P; Deryabina, Yu I; Velyakova, A V; Biryukova, J K; Teplova, V V; Shevelev, A B

    2016-01-01

    This study shows, for the first time, that an intact human mitochondrial genome can be maintained in a heterologous system, the mitochondria of the yeast Yarrowia lipolytica. A method is proposed for introducing directed changes into the structure of the human mitochondrial genome replicating in Y. lipolytica, based on an artificially induced capacity of the yeast mitochondria for homologous recombination. A method for introducing and using phenotypic selection markers for the presence or absence of defects in the tRNA-Lys and tRNA-Leu genes of the mitochondrial genome is also developed. The proposed system can be used to correct harmful mutations of the human mitochondrial genome associated with mitochondrial diseases and for the preparative amplification of intact mitochondrial DNA with an adjusted sequence in yeast cells. The applicability of the new system is shown for correcting mutations in the genes of the Lys- and Leu-specific tRNAs of the human mitochondrial genome that are associated with serious and widespread human mitochondrial diseases, such as mitochondrial encephalomyopathy, lactic acidosis and stroke-like episodes (MELAS) and myoclonic epilepsy with ragged-red fibers (MERRF).

  8. QUAST: quality assessment tool for genome assemblies

    PubMed Central

    Gurevich, Alexey; Saveliev, Vladislav; Vyahhi, Nikolay; Tesler, Glenn

    2013-01-01

    Summary: Limitations of genome sequencing techniques have led to dozens of assembly algorithms, none of which is perfect. A number of methods for comparing assemblers have been developed, but none is yet a recognized benchmark. Further, most existing methods for comparing assemblies are only applicable to new assemblies of finished genomes; the problem of evaluating assemblies of previously unsequenced species has not been adequately considered. Here, we present QUAST, a quality assessment tool for evaluating and comparing genome assemblies. This tool improves on leading assembly comparison software with new ideas and quality metrics. QUAST can evaluate assemblies both with and without a reference genome. QUAST produces many reports, summary tables and plots to help scientists in their research and in their publications. In this study, we used QUAST to compare several genome assemblers on three datasets. QUAST tables and plots for all of them are available in the Supplementary Material, and interactive versions of these reports are on the QUAST website. Availability: http://bioinf.spbau.ru/quast Contact: gurevich@bioinf.spbau.ru Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23422339
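
    N50 is one of the standard contiguity statistics found in QUAST reports. A minimal Python sketch of the usual definition follows; it is not QUAST's code, and a real report may also apply filters (for example a minimum contig length) that this sketch ignores. The contig lengths in the example are made up.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half of the
    total assembly length (0 for an empty assembly)."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


# Hypothetical contig lengths: total 290, so half is 145 and 100 + 80 reaches it.
print(n50([100, 80, 50, 30, 20, 10]))  # 80
```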

  9. Nuclease-mediated genome editing: At the front-line of functional genomics technology.

    PubMed

    Sakuma, Tetsushi; Woltjen, Knut

    2014-01-01

    Genome editing with engineered endonucleases is rapidly becoming a staple method in developmental biology studies. Engineered nucleases permit random or designed genomic modification at precise loci through the stimulation of endogenous double-strand break repair. Homology-directed repair following targeted DNA damage is mediated by co-introduction of a custom repair template, allowing the derivation of knock-out and knock-in alleles in animal models previously refractory to classic gene targeting procedures. Currently there are three main types of customizable site-specific nucleases delineated by the source mechanism of DNA binding that guides nuclease activity to a genomic target: zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats (CRISPR). Among these genome engineering tools, characteristics such as the ease of design and construction, mechanism of inducing DNA damage, and DNA sequence specificity all differ, making their application complementary. By understanding the advantages and disadvantages of each method, one may make the best choice for their particular purpose. © 2014 The Authors Development, Growth & Differentiation © 2014 Japanese Society of Developmental Biologists.

  10. NCI Workshop Report: Clinical and Computational Requirements for Correlating Imaging Phenotypes with Genomics Signatures.

    PubMed

    Colen, Rivka; Foster, Ian; Gatenby, Robert; Giger, Mary Ellen; Gillies, Robert; Gutman, David; Heller, Matthew; Jain, Rajan; Madabhushi, Anant; Madhavan, Subha; Napel, Sandy; Rao, Arvind; Saltz, Joel; Tatum, James; Verhaak, Roeland; Whitman, Gary

    2014-10-01

    The National Cancer Institute (NCI) Cancer Imaging Program organized two related workshops on June 26-27, 2013, entitled "Correlating Imaging Phenotypes with Genomics Signatures Research" and "Scalable Computational Resources as Required for Imaging-Genomics Decision Support Systems." The first workshop focused on clinical and scientific requirements: it explored whether our knowledge of the phenotypic characteristics of cancer biological properties is sufficiently advanced to correlate with imaging phenotypes that underpin genomics and clinical outcomes, and it explored new scientific methods to extract phenotypic features from medical images and relate them to genomics analyses. The second workshop focused on the informatics and computational requirements for extracting phenotypic features from medical images and relating them to genomics analyses, and for improving the accessibility and speed of dissemination of existing NIH resources. These workshops linked the clinical and scientific requirements of currently known phenotypic and genotypic cancer biology characteristics with imaging phenotypes that underpin genomics and clinical outcomes. The group generated a set of recommendations to NCI leadership and the research community that encourage and support development of the emerging radiogenomics research field to address short- and longer-term goals in cancer research.

  11. A Unified and Comprehensible View of Parametric and Kernel Methods for Genomic Prediction with Application to Rice.

    PubMed

    Jacquin, Laval; Cao, Tuong-Vi; Ahmadi, Nourollah

    2016-01-01

    One objective of this study was to provide readers with a clear and unified understanding of parametric statistical and kernel methods used for genomic prediction, and to compare some of these in the context of rice breeding for quantitative traits. Another objective was to provide a simple and user-friendly R package, named KRMM, which allows users to perform RKHS regression with several kernels. After introducing the concept of regularized empirical risk minimization, the connections between well-known parametric and kernel methods such as Ridge regression [i.e., genomic best linear unbiased predictor (GBLUP)] and reproducing kernel Hilbert space (RKHS) regression were reviewed. Ridge regression was then reformulated so as to show and emphasize the advantage of the kernel "trick" concept, exploited by kernel methods in the context of epistatic genetic architectures, over parametric frameworks used by conventional methods. Some parametric and kernel methods, namely least absolute shrinkage and selection operator (LASSO), GBLUP, support vector machine regression (SVR) and RKHS regression, were then compared for their genomic predictive ability in the context of rice breeding using three real data sets. Among the compared methods, RKHS regression and SVR were often the most accurate for prediction, followed by GBLUP and LASSO. An R function that allows users to perform RR-BLUP of marker effects, GBLUP and RKHS regression, with a Gaussian, Laplacian, polynomial or ANOVA kernel, in a reasonable computation time has been developed. Moreover, a modified version of this function, which allows users to tune kernels for RKHS regression, has also been developed and parallelized for HPC Linux clusters. The corresponding KRMM package and all scripts have been made publicly available.
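
    The link between ridge regression (GBLUP) and RKHS regression can be made concrete with kernel ridge regression: both minimize a squared-error loss plus a quadratic penalty, and only the kernel matrix changes. The NumPy sketch below illustrates this under stated assumptions; it is not the KRMM package or its interface, and the kernel scaling, the Gaussian bandwidth and the value of lam are arbitrary choices that would normally be tuned, e.g. by cross-validation.

```python
import numpy as np


def linear_kernel(X, Z):
    """GBLUP-like kernel: marker cross-products scaled by the number of markers."""
    return X @ Z.T / X.shape[1]


def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gaussian kernel exp(-||x - z||^2 / bandwidth) between rows of X and Z."""
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2 * X @ Z.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / bandwidth)


def kernel_ridge_predict(X_train, y_train, X_test, kernel, lam=1.0):
    """Squared-loss regularized risk minimization in an RKHS.

    alpha = (K + lam * I)^-1 (y - mean(y)),  y_hat = mean(y) + K_test @ alpha
    The intercept is handled by centering y (a simplification).
    """
    mu = y_train.mean()
    K = kernel(X_train, X_train)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    return mu + kernel(X_test, X_train) @ alpha


# Toy example with simulated 0/1/2 marker codes (not real rice data).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(80, 300)).astype(float)
X -= X.mean(axis=0)  # center markers
y = X @ rng.normal(scale=0.1, size=300) + rng.normal(size=80)
train, test = slice(0, 60), slice(60, 80)
for name, kern in [("GBLUP-like", linear_kernel),
                   ("Gaussian RKHS",
                    lambda A, B: gaussian_kernel(A, B, bandwidth=X.shape[1]))]:
    pred = kernel_ridge_predict(X[train], y[train], X[test], kern, lam=1.0)
    print(name, np.corrcoef(pred, y[test])[0, 1])
```

    With the linear kernel on centered markers this corresponds to ridge regression on marker effects (RR-BLUP) with the penalty rescaled by the number of markers; swapping in the Gaussian kernel is the kernel "trick" that lets the same solver capture non-additive structure.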

  12. AmphiBase: A new genomic resource for non-model amphibian species.

    PubMed

    Kwon, Taejoon

    2017-01-01

    More than five thousand genes annotated in the recently published Xenopus laevis and Xenopus tropicalis genomes do not have a candidate orthologous counterpart in other vertebrate species. To determine whether these sequences represent genuine amphibian-specific genes or annotation errors, it is necessary to analyze them alongside sequences from other amphibian species. However, due to large genome sizes and an abundance of repeat sequences, there are limited numbers of gene sequences available from amphibian species other than Xenopus. AmphiBase is a new genomic resource covering non-model amphibian species, based on public domain transcriptome data and computational methods developed during the X. laevis genome project. Here, I review the current status of AmphiBase, including amphibian species with available transcriptome data or biological samples, and describe the challenges of building a comprehensive amphibian genomic resource in the absence of genomes. This mini-review will be informative for researchers interested in functional genomic experiments using amphibian model organisms, such as Xenopus and axolotl, and will assist in interpretation of results implicating "orphan genes." Additionally, this study highlights an opportunity for researchers working on non-model amphibian species to collaborate in their future efforts and develop amphibian genomic resources as a community. © 2017 Wiley Periodicals, Inc.

  13. Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions.

    PubMed

    Heslot, Nicolas; Akdemir, Deniz; Sorrells, Mark E; Jannink, Jean-Luc

    2014-02-01

    We develop models to predict genotype by environment interactions in unobserved environments using environmental covariates, a crop model and genomic selection, and apply them to a large winter wheat dataset. Genotype by environment interaction (G*E) is one of the key issues when analyzing phenotypes. The use of environmental data to model G*E has long been a subject of interest but is limited by the same problems as those addressed by genomic selection methods: a large number of correlated predictors, each explaining a small amount of the total variance. In addition, non-linear responses of genotypes to stresses are expected to further complicate the analysis. Using a crop model to derive stress covariates from daily weather data for predicted crop development stages, we propose an extension of the factorial regression model to genomic selection. This model is further extended to the marker level, enabling the modeling of quantitative trait loci (QTL) by environment interaction (Q*E) on a genome-wide scale. A newly developed ensemble method, soft rule fit, was used to improve this model and capture non-linear responses of QTL to stresses. The method is tested using a large winter wheat dataset, representative of the type of data available in a large-scale commercial breeding program. Accuracy in predicting genotype performance in unobserved environments for which weather data were available increased by 11.1% on average, and the variability in prediction accuracy decreased by 10.8%. By leveraging agronomic knowledge and the large historical datasets generated by breeding programs, this new model provides insight into the genetic architecture of genotype by environment interactions and could predict genotype performance based on past and future weather scenarios.
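
    The factorial-regression idea of modeling G*E through environmental covariates can be illustrated in its simplest form: one covariate (for instance a stress index of the kind a crop model might derive from weather data) and a separate intercept and slope per genotype. This is a deliberately reduced sketch with made-up numbers; it omits the markers, the genomic-selection extension and the soft rule fit ensemble described in the abstract.

```python
import numpy as np


def fit_reaction_norms(Y, z):
    """Least-squares fit of y_ij = a_i + b_i * z_j for every genotype i.

    Y : (n_genotypes, n_environments) phenotypes, no missing values
    z : (n_environments,) environmental covariate, e.g. a stress index
    Returns intercepts a and slopes b, each of shape (n_genotypes,).
    """
    z = np.asarray(z, dtype=float)
    design = np.column_stack([np.ones_like(z), z])  # (n_env, 2)
    coef, *_ = np.linalg.lstsq(design, np.asarray(Y, dtype=float).T, rcond=None)
    return coef[0], coef[1]


def predict_environment(a, b, z_new):
    """Predicted performance of each genotype in an environment with covariate z_new."""
    return a + b * z_new


# Two hypothetical genotypes observed in four environments.
z = np.array([0.0, 1.0, 2.0, 3.0])
Y = np.vstack([5 + 1.0 * z,   # genotype 0 responds positively to z
               6 - 0.5 * z])  # genotype 1 responds negatively
a, b = fit_reaction_norms(Y, z)
print(predict_environment(a, b, 4.0))  # approximately [9.0, 4.0]
```

    Genotype 1 is better at z = 0 (6 versus 5), but in the unobserved environment with z = 4 the ranking reverses, which is the kind of crossover interaction that covariate-based G*E models aim to anticipate.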

  14. Gene Patents and Personalized Cancer Care: Impact of the Myriad Case on Clinical Oncology

    PubMed Central

    Offit, Kenneth; Bradbury, Angela; Storm, Courtney; Merz, Jon F.; Noonan, Kevin E.; Spence, Rebecca

    2013-01-01

    Genomic discoveries have transformed the practice of oncology and cancer prevention. Diagnostic and therapeutic advances based on cancer genomics developed during a time when it was possible to patent genes. A case before the Supreme Court, Association for Molecular Pathology v Myriad Genetics, Inc seeks to overturn patents on isolated genes. Although the outcomes are uncertain, it is suggested here that the Supreme Court decision will have few immediate effects on oncology practice or research but may have more significant long-term impact. The Federal Circuit court has already rejected Myriad's broad diagnostic methods claims, and this is not affected by the Supreme Court decision. Isolated DNA patents were already becoming obsolete on scientific grounds, in an era when human DNA sequence is public knowledge and because modern methods of next-generation sequencing need not involve isolated DNA. The Association for Molecular Pathology v Myriad Supreme Court decision will have limited impact on new drug development, as new drug patents usually involve cellular methods. A nuanced Supreme Court decision acknowledging the scientific distinction between synthetic cDNA and genomic DNA will further mitigate any adverse impact. A Supreme Court decision to include or exclude all types of DNA from patent eligibility could impact future incentives for genomic discovery as well as the future delivery of medical care. Whatever the outcome of this important case, it is important that judicial and legislative actions in this area maximize genomic discovery while also ensuring patients' access to personalized cancer care. PMID:23766521

  15. Research progress of plant population genomics based on high-throughput sequencing.

    PubMed

    Wang, Yun-sheng

    2016-08-01

    Population genomics, a new paradigm for population genetics, combines the concepts and techniques of genomics with the theoretical framework of population genetics, and improves our understanding of microevolution by identifying site-specific and genome-wide effects through genotyping of polymorphic sites across the genome. With the appearance and improvement of next-generation high-throughput sequencing technology, the number of plant species with complete genome sequences has increased rapidly, and large-scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of the genetic diversity, level of linkage disequilibrium, selection effects, demographic history and molecular mechanisms of complex traits of the relevant plant populations at a genomic level. In this review, I briefly introduce the concept and research methods of population genomics and summarize the research progress of plant population genomics based on high-throughput sequencing. I also discuss the prospects as well as the existing problems of plant population genomics in order to provide references for related studies.
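
    Linkage disequilibrium is most often summarized as r². A minimal sketch for two biallelic sites follows, assuming phased haplotypes with alleles coded 0/1; real resequencing pipelines typically start from genotype calls, may need to handle unphased data, and compute r² for many site pairs at once.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic sites from phased haplotypes.

    haplotypes : list of (allele_at_site1, allele_at_site2), alleles coded 0/1
    D   = p_AB - p_A * p_B
    r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B))
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b
    return d * d / denom


# Hypothetical haplotype samples.
print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))  # 1.0 (complete LD)
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # 0.0 (no LD)
```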

  16. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”

    PubMed Central

    Tettelin, Hervé; Masignani, Vega; Cieslewicz, Michael J.; Donati, Claudio; Medini, Duccio; Ward, Naomi L.; Angiuoli, Samuel V.; Crabtree, Jonathan; Jones, Amanda L.; Durkin, A. Scott; DeBoy, Robert T.; Davidsen, Tanja M.; Mora, Marirosa; Scarselli, Maria; Margarit y Ros, Immaculada; Peterson, Jeremy D.; Hauser, Christopher R.; Sundaram, Jaideep P.; Nelson, William C.; Madupu, Ramana; Brinkac, Lauren M.; Dodson, Robert J.; Rosovitz, Mary J.; Sullivan, Steven A.; Daugherty, Sean C.; Haft, Daniel H.; Selengut, Jeremy; Gwinn, Michelle L.; Zhou, Liwei; Zafar, Nikhat; Khouri, Hoda; Radune, Diana; Dimitrov, George; Watkins, Kisha; O'Connor, Kevin J. B.; Smith, Shannon; Utterback, Teresa R.; White, Owen; Rubens, Craig E.; Grandi, Guido; Madoff, Lawrence C.; Kasper, Dennis L.; Telford, John L.; Wessels, Michael R.; Rappuoli, Rino; Fraser, Claire M.

    2005-01-01

    The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for ≈80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes. PMID:16172379
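
    The core/dispensable decomposition described above can be expressed with set operations. The sketch assumes that genes have already been grouped into ortholog clusters (the cluster IDs and strain names below are hypothetical); the study's extrapolation of pan-genome size, which fits a model to many orderings of genomes, is not reproduced here.

```python
def core_and_pan(genomes):
    """Core genome = clusters shared by all genomes; pan-genome = their union.

    genomes : dict mapping strain name -> set of ortholog-cluster IDs
    Returns (core, pan, dispensable), where dispensable = pan - core.
    """
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan, pan - core


def new_genes_per_genome(genomes, order):
    """Number of clusters each genome adds to the pan-genome when genomes are
    added in the given order (the quantity behind pan-genome growth curves)."""
    seen, added = set(), []
    for name in order:
        new = genomes[name] - seen
        added.append(len(new))
        seen |= new
    return added


# Hypothetical strains and ortholog clusters.
strains = {"s1": {"a", "b", "c", "x"},
           "s2": {"a", "b", "c", "y"},
           "s3": {"a", "b", "c", "x", "z"}}
core, pan, dispensable = core_and_pan(strains)
print(sorted(core), sorted(dispensable))                    # ['a', 'b', 'c'] ['x', 'y', 'z']
print(new_genes_per_genome(strains, ["s1", "s2", "s3"]))    # [4, 1, 1]
```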

  17. Multiplex PCR-based DNA array for simultaneous detection of three human herpesviruses, EBV, CMV and KSHV.

    PubMed

    Fujimuro, Masahiro; Nakaso, Kazuhiro; Nakashima, Kenji; Sadanari, Hidetaka; Hisanori, Inoue; Teishikata, Yasuhiro; Hayward, S Diane; Yokosawa, Hideyoshi

    2006-04-01

    The human lymphotropic herpesviruses Epstein-Barr virus (EBV), cytomegalovirus (CMV) and Kaposi's sarcoma-associated herpesvirus (KSHV) are responsible for a wide variety of human diseases. As disease states associated with immunosuppression have increased, co-infections with these herpesviruses have become more frequent, and the resulting viral reactivations have caused numerous fatalities. The development of a rapid and accurate method to detect these viruses in immunocompromised patients is therefore vital for immediate treatment with prophylactic antiviral drugs. In this study, we developed a new multiplex PCR method coupled to DNA array hybridization that can simultaneously detect all three human herpesviruses in a single cell sample. Multiplex PCR primers were designed to amplify specific regions of the EBV (EBER1), CMV (IE) and KSHV (LANA) viral genomes. Pre-clinical application of this method revealed that it can detect as few as 1 copy of the KSHV and CMV genomes and 100 copies of the EBV genome. Furthermore, this highly sensitive test showed no cross-reactivity among the three viruses and can detect KSHV and EBV genomes simultaneously in lymphoblastoid cells doubly infected with both viruses. Thus, this array-based approach serves as a rapid and reliable diagnostic tool for clinical applications.

  18. Event-specific qualitative and quantitative PCR detection of the GMO carnation (Dianthus caryophyllus) variety Moonlite based upon the 5'-transgene integration sequence.

    PubMed

    Li, P; Jia, J W; Jiang, L X; Zhu, H; Bai, L; Wang, J B; Tang, X M; Pan, A H

    2012-04-27

    To ensure the implementation of genetically modified organism (GMO)-labeling regulations, an event-specific detection method was developed based on the junction sequence of an exogenous integrant in the transgenic carnation variety Moonlite. The 5'-transgene integration sequence was isolated by thermal asymmetric interlaced PCR. Based upon this sequence, event-specific primers and a TaqMan probe were designed to amplify fragments spanning the exogenous DNA and the carnation genomic DNA. Qualitative and quantitative PCR assays were developed employing the designed primers and probe. The detection limit of the qualitative PCR assay was 0.05% for Moonlite in 100 ng total carnation genomic DNA, corresponding to about 79 copies of the carnation haploid genome; the limits of detection and quantification of the quantitative PCR assay were estimated to be 38 and 190 copies of haploid carnation genomic DNA, respectively. Carnation samples with different contents of genetically modified components were quantified, and for all three samples the bias between the observed and true values was lower than the acceptance criterion (<25%) of the GMO detection method. These results indicate that these event-specific methods would be useful for the identification and quantification of the GMO carnation Moonlite.
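
    The copy numbers quoted above follow from a mass-to-copy conversion. The sketch uses the commonly cited factor of about 978 Mb of DNA per picogram and assumes a haploid carnation genome of about 622 Mb; the genome size is an assumption supplied here, not a value from the abstract, but with it 0.05% of 100 ng works out to roughly 79 haploid copies, consistent with the reported figure. The bias helper shows how an observed/true comparison against the 25% criterion could be computed.

```python
MB_PER_PG = 978.0  # approximate megabases of DNA per picogram (common conversion)


def haploid_copies(total_dna_ng, gm_fraction, haploid_genome_mb):
    """Haploid genome copies of the GM event in a DNA sample.

    total_dna_ng      : total genomic DNA in the reaction (ng)
    gm_fraction       : GM content as a fraction (0.05% -> 0.0005)
    haploid_genome_mb : haploid genome size in Mb (assumed, see lead-in)
    """
    gm_dna_pg = total_dna_ng * 1000 * gm_fraction   # ng -> pg of GM DNA
    genome_pg = haploid_genome_mb / MB_PER_PG        # mass of one haploid genome
    return gm_dna_pg / genome_pg


def relative_bias(observed, true):
    """Bias in percent of the true value; its absolute value would be
    compared with the 25% acceptance criterion."""
    return 100 * (observed - true) / true


# 622 Mb is an assumed carnation haploid genome size.
print(round(haploid_copies(100, 0.0005, 622)))  # 79
```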

  19. Methods, Tools and Current Perspectives in Proteogenomics *

    PubMed Central

    Ruggles, Kelly V.; Krug, Karsten; Wang, Xiaojing; Clauser, Karl R.; Wang, Jing; Payne, Samuel H.; Fenyö, David; Zhang, Bing; Mani, D. R.

    2017-01-01

    With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies has yielded novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches. In this article, we systematically classify published methods and tools into four major categories: (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications. PMID:28456751
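
    Sequence-centric proteogenomics typically builds a sample-specific search database by translating genomic or transcriptomic variants into protein sequences and digesting them in silico. The sketch below assumes the variant has already been translated into a single amino-acid substitution and uses the simplified trypsin rule (cleave after K or R, not before P) with no missed cleavages; the function names and sequences are hypothetical.

```python
import re


def tryptic_peptides(protein, min_len=6):
    """Simplified in silico trypsin digest: cleave after K or R unless the next
    residue is P. Missed cleavages are not modeled."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if len(p) >= min_len]


def variant_peptides(reference, position, alt_residue, min_len=6):
    """Peptides present only in the variant protein (0-based position), i.e. the
    extra entries a sample-specific search database would contain."""
    variant = reference[:position] + alt_residue + reference[position + 1:]
    ref_set = set(tryptic_peptides(reference, min_len))
    return [p for p in tryptic_peptides(variant, min_len) if p not in ref_set]


# Hypothetical protein and a C->Y substitution at position 15.
ref = "AAAAAAKGGGGGGRCCCCCCK"
print(tryptic_peptides(ref))          # ['AAAAAAK', 'GGGGGGR', 'CCCCCCK']
print(variant_peptides(ref, 15, "Y"))  # ['CYCCCCK']
```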

  20. Machine learning applications in genetics and genomics.

    PubMed

    Libbrecht, Maxwell W; Noble, William Stafford

    2015-06-01

    The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.

  1. Building the Evidence Base for Decision-making in Cancer Genomic Medicine Using Comparative Effectiveness Research

    PubMed Central

    Goddard, Katrina A.B.; Knaus, William A.; Whitlock, Evelyn; Lyman, Gary H.; Feigelson, Heather Spencer; Schully, Sheri D.; Ramsey, Scott; Tunis, Sean; Freedman, Andrew N.; Khoury, Muin J.; Veenstra, David L.

    2013-01-01

    Background The clinical utility is uncertain for many cancer genomic applications. Comparative effectiveness research (CER) can provide evidence to clarify this uncertainty. Objectives To identify approaches to help stakeholders make evidence-based decisions, and to describe potential challenges and opportunities using CER to produce evidence-based guidance. Methods We identified general CER approaches for genomic applications through literature review, the authors’ experiences, and lessons learned from a recent, seven-site CER initiative in cancer genomic medicine. Case studies illustrate the use of CER approaches. Results Evidence generation and synthesis approaches include comparative observational and randomized trials, patient reported outcomes, decision modeling, and economic analysis. We identified significant challenges to conducting CER in cancer genomics: the rapid pace of innovation, the lack of regulation, the limited evidence for clinical utility, and the beliefs that genomic tests could have personal utility without having clinical utility. Opportunities to capitalize on CER methods in cancer genomics include improvements in the conduct of evidence synthesis, stakeholder engagement, increasing the number of comparative studies, and developing approaches to inform clinical guidelines and research prioritization. Conclusions CER offers a variety of methodological approaches to address stakeholders’ needs. Innovative approaches are needed to ensure an effective translation of genomic discoveries. PMID:22516979

  2. Discovering causal pathways linking genomic events to transcriptional states using Tied Diffusion Through Interacting Events (TieDIE).

    PubMed

    Paull, Evan O; Carlin, Daniel E; Niepel, Mario; Sorger, Peter K; Haussler, David; Stuart, Joshua M

    2013-11-01

    Identifying the cellular wiring that connects genomic perturbations to transcriptional changes in cancer is essential to gain a mechanistic understanding of disease initiation and progression and, ultimately, to predict drug response. We have developed a method called Tied Diffusion Through Interacting Events (TieDIE) that uses a network diffusion approach to connect genomic perturbations to gene expression changes characteristic of cancer subtypes. The method computes a subnetwork of protein-protein interactions, predicted transcription factor-to-target connections and curated interactions from the literature that connects genomic and transcriptomic perturbations. Application of TieDIE to The Cancer Genome Atlas and a breast cancer cell line dataset identified key signaling pathways, with examples impinging on MYC activity. Interlinking genes are predicted to correspond to essential components of cancer signaling; they may provide a mechanistic explanation of tumor character and suggest subtype-specific drug targets. Availability: software is available from the Stuart lab's wiki: https://sysbiowiki.soe.ucsc.edu/tiedie. Contact: jstuart@ucsc.edu. Supplementary data are available at Bioinformatics online.
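
    Network diffusion can be sketched with the heat kernel exp(-tL) on an undirected interaction graph. The NumPy code below is a generic illustration under stated assumptions (an unweighted toy graph and a single diffusion time t), not TieDIE's implementation; scoring "linker" genes by the element-wise minimum of the heat diffused from the genomic and the transcriptomic side is a simplification chosen here to mimic the idea of tying two diffusion processes together.

```python
import numpy as np


def heat_diffusion(adjacency, initial_heat, t=1.0):
    """Diffuse node scores with the heat kernel exp(-t L), where L = D - A.

    adjacency    : symmetric (n, n) array of non-negative edge weights
    initial_heat : (n,) starting scores, e.g. 1.0 on perturbed genes
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)    # L is symmetric
    kernel = eigvecs @ np.diag(np.exp(-t * eigvals)) @ eigvecs.T
    return kernel @ np.asarray(initial_heat, dtype=float)


def linker_scores(heat_from_upstream, heat_from_downstream):
    """Illustrative 'tie' of two diffusions (an assumption, not TieDIE's exact
    scoring): a node scores highly only if it receives heat from both sides."""
    return np.minimum(heat_from_upstream, heat_from_downstream)


# Toy path network 0 - 1 - 2 - 3: a hypothetical mutated gene at node 0 and a
# hypothetical differentially expressed target at node 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
up = heat_diffusion(A, [1, 0, 0, 0])
down = heat_diffusion(A, [0, 0, 0, 1])
print(linker_scores(up, down))  # interior nodes 1 and 2 score highest
```

    Because the Laplacian's rows sum to zero, the heat kernel conserves total heat, so each diffused vector still sums to the initial amount.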

  3. Nanoliter reactors improve multiple displacement amplification of genomes from single cells.

    PubMed

    Marcy, Yann; Ishoey, Thomas; Lasken, Roger S; Stockwell, Timothy B; Walenz, Brian P; Halpern, Aaron L; Beeson, Karen Y; Goldberg, Susanne M D; Quake, Stephen R

    2007-09-01

    Since only a small fraction of environmental bacteria are amenable to laboratory culture, there is great interest in genomic sequencing directly from single cells. Sufficient DNA for sequencing can be obtained from one cell by the Multiple Displacement Amplification (MDA) method, thereby eliminating the need to develop culture methods. Here we used a microfluidic device to isolate individual Escherichia coli cells and amplify their genomic DNA by MDA in 60-nl reactions. Our results confirm a report that reducing the MDA reaction volume lowers nonspecific synthesis, which can result from contaminant DNA templates and unfavourable interactions between primers. The quality of the genome amplification was assessed by qPCR and compared favourably with single-cell amplifications performed in standard 50-µl volumes. Amplification bias was greatly reduced in nanoliter volumes, thereby providing a more even representation of all sequences. Single-cell amplicons from both microliter and nanoliter volumes provided high-quality sequence data by high-throughput pyrosequencing, thereby demonstrating a straightforward route to sequencing genomes from single cells.

  4. Bacterial genomes in epidemiology—present and future

    PubMed Central

    Croucher, Nicholas J.; Harris, Simon R.; Grad, Yonatan H.; Hanage, William P.

    2013-01-01

    Sequence data are well established in the reconstruction of the phylogenetic and demographic scenarios that have given rise to outbreaks of viral pathogens. The application of similar methods to bacteria has been hindered in the main by the lack of high-resolution nucleotide sequence data from quality samples. Developing and already available genomic methods have greatly increased the amount of data that can be used to characterize an isolate and its relationship to others. However, differences in sequencing platforms and data analysis mean that these enhanced data come with a cost in terms of portability: results from one laboratory may not be directly comparable with those from another. Moreover, genomic data for many bacteria bear the mark of a history including extensive recombination, which has the potential to greatly confound phylogenetic and coalescent analyses. Here, we discuss the exacting requirements of genomic epidemiology, and means by which the distorting signal of recombination can be minimized to permit the leverage of growing datasets of genomic data from bacterial pathogens. PMID:23382424
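
    Pairwise SNP distances between isolates are a common starting point in genomic epidemiology, and recombination inflates them. The sketch assumes a whole-genome alignment and that recombinant regions have already been identified by some other means; the masked intervals, isolate names and sequences below are hypothetical.

```python
from itertools import combinations


def snp_distances(alignment, masked=()):
    """Pairwise SNP distances between aligned sequences.

    alignment : dict name -> aligned sequence (equal lengths, uppercase)
    masked    : iterable of (start, end) half-open intervals to ignore,
                e.g. regions flagged as recombinant
    Sites where either sequence has a gap '-' or an 'N' are skipped.
    """
    if len({len(s) for s in alignment.values()}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    skip = set()
    for start, end in masked:
        skip.update(range(start, end))
    dist = {}
    for (a, sa), (b, sb) in combinations(alignment.items(), 2):
        dist[(a, b)] = sum(
            1
            for i, (x, y) in enumerate(zip(sa, sb))
            if i not in skip and x != y and x not in "-N" and y not in "-N"
        )
    return dist


# Hypothetical isolates; positions 0-2 of iso3 stand in for an imported
# (recombinant) segment.
aln = {"iso1": "ACGTACGTAC", "iso2": "ACGTTCGTAC", "iso3": "TTTTACGTAC"}
print(snp_distances(aln))                   # iso1-iso3 and iso2-iso3 look distant
print(snp_distances(aln, masked=[(0, 3)]))  # after masking, iso3 is close to iso1
```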

  5. CRISPR-Cas9-Mediated Genome Editing and Transcriptional Control in Yarrowia lipolytica.

    PubMed

    Schwartz, Cory; Wheeldon, Ian

    2018-01-01

    The discovery and adaptation of RNA-guided nucleases has resulted in the rapid development of efficient, scalable, and easily accessible synthetic biology tools for targeted genome editing and transcriptional control. In these systems, such as CRISPR-Cas9 from Streptococcus pyogenes, a protein with nuclease activity is guided to a specific nucleotide sequence by a short RNA molecule and, upon binding, cleaves the target DNA. To extend this genome-editing ability to the industrially important oleaginous yeast Yarrowia lipolytica, we developed a set of easy-to-use and effective CRISPR-Cas9 episomal vectors. In this protocols chapter, we first present a method by which arbitrary protein-coding genes can be disrupted via indel formation after CRISPR-Cas9 targeting. A second method demonstrates how the same CRISPR-Cas9 system can be used to achieve markerless integration of gene cassettes into the genome by inducing homologous recombination after DNA cleavage by Cas9. Finally, we describe how a catalytically inactive form of Cas9 fused to a transcriptional repressor can be used to control transcription of native genes in Y. lipolytica. The CRISPR-Cas9 tools and strategies described here greatly increase the types of genome editing and transcriptional control that can be achieved in Y. lipolytica, and promise to facilitate more advanced engineering of this important oleaginous host.
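
    For S. pyogenes Cas9, the guide RNA pairs with a 20-nt protospacer that must sit immediately 5' of an NGG protospacer-adjacent motif (PAM) on the same strand. The sketch below lists every such candidate site on both strands of a sequence. The function names are hypothetical, and real guide design (including for Y. lipolytica) also scores off-target matches, GC content and expression constraints, none of which is modelled here.

```python
import re
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_targets(seq: str, protospacer_len: int = 20) -> List[Tuple[str, int, str, str]]:
    """Return (strand, forward_start, protospacer, pam) for each site followed by NGG.

    forward_start is the 0-based forward-strand coordinate of the leftmost base of
    the protospacer+PAM window on either strand.
    """
    seq = seq.upper()
    window = protospacer_len + 3  # protospacer plus the 3-nt PAM
    # Zero-width lookahead so overlapping candidate sites are all reported
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            start = m.start()
            if strand == "-":
                # Map the reverse-strand window back to forward-strand coordinates
                start = len(s) - (start + window)
            hits.append((strand, start, m.group(1), m.group(2)))
    return hits
```

    Scanning both strands matters because a PAM on the reverse strand appears as CCN on the forward strand.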

  6. Identification of immunogenic polypeptides from a Mycoplasma hyopneumoniae genome library by phage display.

    PubMed

    Kügler, Jonas; Nieswandt, Simone; Gerlach, Gerald F; Meens, Jochen; Schirrmann, Thomas; Hust, Michael

    2008-09-01

    The identification of immunogenic polypeptides of pathogens is helpful for the development of diagnostic assays and therapeutic applications such as vaccines. Routinely, these proteins are identified by two-dimensional polyacrylamide gel electrophoresis and Western blot using convalescent serum, followed by mass spectrometry. This technology, however, is limited because proteins expressed at low levels or only under certain conditions (e.g., depending on pathogen-host interaction) cannot be identified. In this work, we developed and improved an M13 genomic phage display-based method for the selection of immunogenic polypeptides of Mycoplasma hyopneumoniae, the pathogen that causes porcine enzootic pneumonia. The fragmented genome of M. hyopneumoniae was cloned into a phage display vector, and the genomic library was packaged using the helper phage Hyperphage to enrich open reading frames (ORFs). The phage display library was then screened by panning with convalescent serum. Analysis of individual phage clones identified five genes encoding immunogenic proteins, only two of which had previously been identified and described as immunogenic. This M13 genomic phage display, which directly combines ORF enrichment with presentation of the corresponding polypeptide on the phage surface, complements proteome-based methods for the identification of immunogenic polypeptides and is particularly well suited for use in mycoplasma species.

  7. Homoeologous chromosome pairing between the A and B genomes of Musa spp. revealed by genomic in situ hybridization

    PubMed Central

    Jeridi, Mouna; Bakry, Frédéric; Escoute, Jacques; Fondi, Emmanuel; Carreel, Françoise; Ferchichi, Ali; D'Hont, Angélique; Rodier-Goud, Marguerite

    2011-01-01

    Background and Aims Most cooking bananas and several dessert bananas are interspecific triploid hybrids between Musa acuminata (A genome) and Musa balbisiana (B genome). In addition, M. balbisiana has agronomic characteristics, such as resistance to biotic and abiotic stresses, that could be useful for improving monospecific acuminata cultivars. To develop efficient breeding strategies for improving Musa cultivars, it is therefore important to understand whether chromosome exchange can occur between these two species. Methods A protocol was developed to prepare chromosomes at metaphase I of meiosis suitable for genomic in situ hybridization. Several technical challenges were encountered, the main ones being the hardness of the cell wall and the density of the microsporocyte cytoplasm, which hamper access of the probes to the chromosomes. Key parameters in solving these problems were the addition of macerozyme to the enzyme mix, the duration of digestion, and the temperature during the spreading phase. Results and Conclusions This method was applied to analyse chromosome pairing at metaphase in triploid interspecific cultivars, and it clearly demonstrated that interspecific recombination between M. acuminata and M. balbisiana chromosomes does occur and may be frequent in triploid hybrids. These results provide new insight into Musa cultivar evolution and have important implications for breeding. PMID:21835815

  8. A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers

    PubMed Central

    2012-01-01

    Background Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? Results The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. 
One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity. Conclusion Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway. PMID:23216969
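
    The two aggregation rules named above can be sketched as follows. This is a minimal illustration assuming each classifier outputs a probability of acute rejection, a per-classifier cutoff of 0.5 and a user-chosen minimum vote count; the paper's actual cutoffs and thresholds are not given in the abstract, and all function names are hypothetical.

```python
from typing import Sequence, Tuple


def average_probability(probs: Sequence[float], cutoff: float = 0.5) -> bool:
    """Call acute rejection if the mean probability across classifiers reaches `cutoff`."""
    return sum(probs) / len(probs) >= cutoff


def vote_threshold(probs: Sequence[float], min_votes: int, cutoff: float = 0.5) -> bool:
    """Call acute rejection if at least `min_votes` classifiers individually call it."""
    votes = sum(p >= cutoff for p in probs)  # each classifier votes at its own cutoff
    return votes >= min_votes


def sensitivity_specificity(calls: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / positives; specificity = TN / negatives."""
    tp = sum(c and t for c, t in zip(calls, truth))
    tn = sum((not c) and (not t) for c, t in zip(calls, truth))
    positives = sum(truth)
    negatives = len(truth) - positives
    return tp / positives, tn / negatives
```

    For three hypothetical classifier probabilities of 0.2, 0.3 and 0.9, `average_probability` returns False (mean about 0.47) while `vote_threshold(probs, min_votes=1)` returns True. A low vote threshold lets one confident classifier trigger a positive call, which is one plausible way such a rule gains sensitivity while giving up specificity.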

  9. A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers.

    PubMed

    Günther, Oliver P; Chen, Virginia; Freue, Gabriela Cohen; Balshaw, Robert F; Tebbutt, Scott J; Hollander, Zsuzsanna; Takhar, Mandeep; McMaster, W Robert; McManus, Bruce M; Keown, Paul A; Ng, Raymond T

    2012-12-08

    Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. 
The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity. Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.

  10. A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data.

    PubMed

    Bertl, Johanna; Guo, Qianyun; Juul, Malene; Besenbacher, Søren; Nielsen, Morten Muhlig; Hornshøj, Henrik; Pedersen, Jakob Skou; Hobolth, Asger

    2018-04-19

    Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. 
Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
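
    The argument against averaging explanatory variables over a region can be made concrete. The sketch below assumes a multinomial logistic model with "no mutation" as the reference class and hypothetical coefficients; it is not the fitted model from the paper. It compares expected mutation counts obtained by summing per-site probabilities with those obtained by first averaging the covariates over the element.

```python
import math
from typing import Dict, List, Sequence

Coefs = Dict[str, Sequence[float]]  # mutation type -> [intercept, beta_1, ..., beta_p]


def site_probs(x: Sequence[float], coefs: Coefs) -> Dict[str, float]:
    """Multinomial logistic model for one site; 'none' (no mutation) is the reference class."""
    scores = {"none": 0.0}
    for kind, beta in coefs.items():
        scores[kind] = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    top = max(scores.values())  # subtract the max before exponentiating, for stability
    weights = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def expected_site_specific(sites: List[Sequence[float]], coefs: Coefs) -> Dict[str, float]:
    """Expected counts in an element: sum the per-site probabilities."""
    totals = {k: 0.0 for k in ["none", *coefs]}
    for x in sites:
        for k, p in site_probs(x, coefs).items():
            totals[k] += p
    return totals


def expected_regional(sites: List[Sequence[float]], coefs: Coefs) -> Dict[str, float]:
    """Regional shortcut: average the covariates first, then apply the model once."""
    n = len(sites)
    mean_x = [sum(column) / n for column in zip(*sites)]
    return {k: n * p for k, p in site_probs(mean_x, coefs).items()}
```

    With one hypothetical covariate and `coefs = {"C>T": [-6.0, 2.0]}`, two sites at covariate value 0 and two at 3 give about 1.0 expected C>T mutations when summed site by site, but only about 0.19 when the covariate is first averaged to 1.5. Because the model is nonlinear, the two approaches agree only when the covariate is nearly constant across the element.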

  11. The current state of resident training in genomic pathology: a comprehensive analysis utilizing the Resident In-Service Exam (RISE)

    PubMed Central

    Haspel, Richard L.; Rinder, Henry M.; Frank, Karen M.; Wagner, Jay; Ali, Asma M.; Fisher, Patrick B.; Parks, Eric R.

    2014-01-01

    Objectives To determine the current state of pathology resident training in genomic and molecular pathology. Methods The Training Residents in Genomics (TRIG) Working Group developed survey and knowledge questions for the 2013 Pathology Resident In-Service Examination (RISE). Sixteen demographic questions related to amount of training, current and predicted future use, and perceived ability in molecular pathology vs. genomic medicine were included along with five genomic pathology and 19 molecular pathology knowledge questions. Results A total of 2,506 pathology residents took the 2013 RISE with approximately 600 individuals per post-graduate year (PGY). For genomic medicine, 42% of PGY-4 respondents stated they had no training compared to 7% for molecular pathology (p<0.001). PGY-4 resident perceived ability in genomic medicine, comfort in discussing results, and predicted future use as a practicing pathologist were less than reported for molecular pathology (p<0.001). There was a greater increase by PGY in knowledge question scores for molecular than for genomic pathology. Conclusions The RISE is a powerful tool in assessing the state of resident training in genomic pathology and current results suggest a significant deficit. The results also provide a baseline to assess future initiatives to improve genomics education for pathology residents such as those developed by the TRIG Working Group. PMID:25239410

  12. Genome-Wide Fine-Scale Recombination Rate Variation in Drosophila melanogaster

    PubMed Central

    Song, Yun S.

    2012-01-01

    Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference and exhibits greater robustness to the effects of natural selection and noise, compared with a widely used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At the fine scale, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots, each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than in the North American population. The correlation between various genomic features (including recombination rates, diversity, divergence, GC content, gene content, and sequence quality) is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between recombination and diversity. PMID:23284288
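
    The idea of comparing two recombination maps scale by scale can be illustrated with the simplest wavelet. This sketch assumes an orthonormal Haar transform of rates in equal-width windows (map length a power of two) and Pearson correlation of the detail coefficients at each scale; the wavelet family and statistic used in the paper may differ, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def haar_details(x: Sequence[float]) -> List[List[float]]:
    """Orthonormal Haar transform of a rate map; detail coefficients, finest scale first.

    Assumes len(x) is a power of two (e.g. rates in equal-width windows).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("map length must be a power of two")
    details, approx = [], list(x)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])  # local contrast at this scale
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]          # smoothed map for the next scale
    return details


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else float("nan")  # undefined for constant input


def scale_correlations(map_a: Sequence[float], map_b: Sequence[float]) -> List[float]:
    """Correlation of two maps' wavelet coefficients at each scale (finest first)."""
    return [pearson(da, db) for da, db in zip(haar_details(map_a), haar_details(map_b))]
```

    Swapping the two values within every adjacent pair of windows changes only the finest scale: the finest-scale correlation becomes -1 while the next scale remains perfectly correlated, a toy version of maps that agree broadly but differ locally.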

  13. Calibrating genomic and allelic coverage bias in single-cell sequencing.

    PubMed

    Zhang, Cheng-Zhong; Adalsteinsson, Viktor A; Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L; Meyerson, Matthew; Love, J Christopher

    2015-04-16

    Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias predominantly generated at the amplicon level (1-10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (∼0.1 × ) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multi-strand displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples.
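
    Why coverage bias limits depth-of-coverage yield can be shown with a generic overdispersion model. This is not the calibration method of the paper: the sketch assumes that per-position depth is Poisson with a gamma-distributed amplification multiplier (a negative binomial with shape k, variance m + m²/k), estimates k by the method of moments from low-pass per-position depths, and then predicts the covered fraction at any mean depth. Smaller k means stronger bias; k → ∞ recovers the unbiased Poisson case. At very low depth the moment estimate of k is noisy.

```python
import math
from statistics import mean, pvariance
from typing import Sequence


def fit_nb_shape(depths: Sequence[float]) -> float:
    """Method-of-moments NB shape k from per-position depths: var = m + m^2 / k."""
    m = mean(depths)
    v = pvariance(depths)
    if v <= m:
        return math.inf  # no overdispersion detected: treat as Poisson
    return m * m / (v - m)


def covered_fraction(depth: float, k: float) -> float:
    """Expected fraction of positions with >= 1 read at mean depth `depth`."""
    if math.isinf(k):
        return 1.0 - math.exp(-depth)        # Poisson (unbiased) limit
    return 1.0 - (k / (k + depth)) ** k      # gamma-Poisson (negative binomial)
```

    At a mean depth of 5×, an unbiased library is expected to cover about 99% of positions, whereas a strongly biased one with k = 0.5 covers only about 70%.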

  14. Calibrating genomic and allelic coverage bias in single-cell sequencing

    PubMed Central

    Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L.; Meyerson, Matthew; Love, J. Christopher

    2016-01-01

    Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias predominantly generated at the amplicon level (1–10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (~0.1 ×) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multi-strand displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples. PMID:25879913

  15. The IGNITE network: a model for genomic medicine implementation and research.

    PubMed

    Weitzel, Kristin Wiisanen; Alexander, Madeline; Bernhardt, Barbara A; Calman, Neil; Carey, David J; Cavallari, Larisa H; Field, Julie R; Hauser, Diane; Junkins, Heather A; Levin, Phillip A; Levy, Kenneth; Madden, Ebony B; Manolio, Teri A; Odgis, Jacqueline; Orlando, Lori A; Pyeritz, Reed; Wu, R Ryanne; Shuldiner, Alan R; Bottinger, Erwin P; Denny, Joshua C; Dexter, Paul R; Flockhart, David A; Horowitz, Carol R; Johnson, Julie A; Kimmel, Stephen E; Levy, Mia A; Pollin, Toni I; Ginsburg, Geoffrey S

    2016-01-05

    Patients, clinicians, researchers and payers are seeking to understand the value of using genomic information (as reflected by genotyping, sequencing, family history or other data) to inform clinical decision-making. However, challenges remain to widespread clinical implementation of genomic medicine, a prerequisite for developing evidence of its real-world utility. To address these challenges, the National Institutes of Health-funded IGNITE (Implementing GeNomics In pracTicE; www.ignite-genomics.org) Network, composed of six projects and a coordinating center, was established in 2013 to support the development, investigation and dissemination of genomic medicine practice models that seamlessly integrate genomic data into the electronic health record and that deploy tools for point-of-care decision-making. IGNITE site projects are aligned in their purpose of testing these models, but individual projects vary in scope and design, including exploring genetic markers for disease risk prediction and prevention, developing tools for using family history data, incorporating pharmacogenomic data into clinical care, refining disease diagnosis using sequence-based mutation discovery, and creating novel educational approaches. This paper describes the IGNITE Network and member projects, including network structure, collaborative initiatives, clinical decision support strategies, methods for return of genomic test results, and educational initiatives for patients and providers. Clinical and outcomes data from individual sites and network-wide projects are expected to begin appearing in publications over the next few years. 
The IGNITE Network is an innovative series of projects and pilot demonstrations aiming to enhance translation of validated actionable genomic information into clinical settings and develop and use measures of outcome in response to genome-based clinical interventions using a pragmatic framework to provide early data and proofs of concept on the utility of these interventions. Through these efforts and collaboration with other stakeholders, IGNITE is poised to have a significant impact on the acceleration of genomic information into medical practice.

  16. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    PubMed Central

    2013-01-01

    Background Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results Genotyping by Sequencing (GBS) was used to produce highly saturated maps for an R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM, respectively, over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust and that marker positioning was reliable. The high marker density allowed identification of genomic regions with segregation distortion in R. idaeus, which may help to identify deleterious alleles that are the basis of inbreeding depression in the species. PMID:23324311
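
    The intuition behind genome-independent imputation on an ordered map can be shown with a much simpler rule than the maximum-likelihood approach described above. This neighbour-consensus sketch assumes markers are already ordered and uses illustrative genotype codes ("lm", "ll") with None for a missing call; it is a hypothetical simplification, not the authors' method. A missing or isolated discordant call between two agreeing flanking markers is most likely equal to them, because the alternative implies a double crossover over a short interval.

```python
from typing import List, Optional

Genotype = Optional[str]  # illustrative codes such as "lm" / "ll"; None = missing call


def impute_ordered(calls: List[Genotype]) -> List[Genotype]:
    """Neighbour-consensus imputation for one individual, markers already in map order.

    1) A single call that disagrees with identical called neighbours on both sides
       is treated as a likely genotyping error (it would imply a double crossover).
    2) A run of missing calls bounded by two agreeing calls is filled with that call.
    Gaps bounded by disagreeing calls, or at either end, are left missing.
    """
    out = list(calls)
    called = [i for i, g in enumerate(calls) if g is not None]
    # Step 1 compares against the original calls so the result does not depend on scan order
    for j in range(1, len(called) - 1):
        left, mid, right = called[j - 1], called[j], called[j + 1]
        if calls[left] == calls[right] != calls[mid]:
            out[mid] = calls[left]
    # Step 2: fill gaps between consecutive called markers that now agree
    for a, b in zip(called, called[1:]):
        if out[a] == out[b]:
            for i in range(a + 1, b):
                out[i] = out[a]
    return out
```

    Gaps whose flanking calls disagree are left unresolved here, because a crossover somewhere inside the gap makes either genotype plausible; a likelihood-based method can weigh such cases using estimated recombination fractions.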

  17. Evaluation of Methods for de novo Genome assembly from High-throughput Sequencing Reads Reveals Dependencies that Affect the Quality of the Results

    USDA-ARS's Scientific Manuscript database

    Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole...

  18. A method of genotyping by pedigree-based training-set for identification of QTLs associated with cucumber fruit size

    USDA-ARS's Scientific Manuscript database

    Large sets of genomic data are becoming available for cucumber (Cucumis sativus), yet there is no tool for whole genome genotyping. Creation of saturated genetic maps depends on development of good markers. The present cucumber genetic maps are based on several hundreds of markers. However they are ...

  19. 75 FR 1770 - An Approach to Using Toxicogenomic Data in U.S. EPA Human Health Risk Assessments: A Dibutyl...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-01-13

    ... qualitative aspects of the risk assessment because of the type of genomic data available for DBP. It is... Assessment (NCEA) within EPA's Office of Research and Development (ORD). Toxicogenomics is the application of... exploratory methods for analyzing genomic data for application to risk assessment and some preliminary results...

  20. Development of an Efficient Genome Editing Tool in Bacillus licheniformis Using CRISPR-Cas9 Nickase.

    PubMed

    Li, Kaifeng; Cai, Dongbo; Wang, Zhangqian; He, Zhili; Chen, Shouwen

    2018-03-15

    Bacillus strains are important industrial bacteria that can produce various biochemical products. However, low transformation efficiencies and a lack of effective genome editing tools have hindered their widespread application. Recently, clustered regularly interspaced short palindromic repeat (CRISPR)-Cas9 techniques have been used as genome editing tools in many organisms because of their high efficiency and ease of manipulation. In this study, an efficient genome editing method was developed for Bacillus licheniformis using a CRISPR-Cas9 nickase integrated into the genome of B. licheniformis DW2, with overexpression driven by the P43 promoter. As a representative example, the yvmC gene was deleted using the CRISPR-Cas9n technique with 1.0-kb homology arms, with an efficiency of 100%. In addition, two genes were simultaneously disrupted with an efficiency of 11.6%, and the large DNA fragment bacABC (42.7 kb) was deleted with an efficiency of 79.0%. Furthermore, the heterologous reporter gene aprN, which codes for nattokinase in Bacillus subtilis, was inserted into the chromosome of B. licheniformis with an efficiency of 76.5%. The activity of nattokinase in the DWc9nΔ7/pP43SNT-S sacC strain reached 59.7 fibrinolytic units (FU)/ml, 25.7% higher than that of DWc9n/pP43SNT-S sacC. Finally, the engineered strain DWc9nΔ7 (Δepr ΔwprA Δmpr ΔaprE Δvpr ΔbprA ΔbacABC), with multiple disrupted genes, was constructed using the CRISPR-Cas9n technique. Taken together, we have developed an efficient genome editing tool based on CRISPR-Cas9n in B. licheniformis. This tool could be applied to strain improvement in future research. IMPORTANCE As important industrial bacteria, Bacillus strains have attracted significant attention due to their production of biological products. However, genetic manipulation of these bacteria is difficult. 
The CRISPR-Cas9 system has been applied to genome editing in some bacteria, and CRISPR-Cas9n was proven to be an efficient and precise tool in previous reports. The significance of our research is the development of an efficient, more precise, and systematic genome editing method for single-gene deletion, multiple-gene disruption, large DNA fragment deletion, and single-gene integration in Bacillus licheniformis via Cas9 nickase. We also applied this method to the genetic engineering of the host strain for protein expression. Copyright © 2018 American Society for Microbiology.

  21. Genome editing of Ralstonia eutropha using an electroporation-based CRISPR-Cas9 technique.

    PubMed

    Xiong, Bin; Li, Zhongkang; Liu, Li; Zhao, Dongdong; Zhang, Xueli; Bi, Changhao

    2018-01-01

    Ralstonia eutropha is an important bacterium for the study of polyhydroxyalkanoate (PHA) synthesis and CO₂ fixation, which makes it a potential strain for industrial PHA production and an attractive host for CO₂ conversion. Although the bacterium is not recalcitrant to genetic manipulation, current methods for genome editing based on group II introns or single-crossover integration of a suicide plasmid are inefficient and time-consuming, which limits the genetic engineering of this organism. Thus, developing an efficient and convenient method for R. eutropha genome editing is imperative. An efficient genome editing method for R. eutropha was developed using an electroporation-based CRISPR-Cas9 technique. In our study, the electroporation efficiency of R. eutropha was found to be limited by its restriction-modification (RM) systems. By searching for putative RM systems in R. eutropha H16 using the REBASE database and comparing them with those in E. coli MG1655, five putative restriction endonuclease genes related to the RM systems in R. eutropha were predicted and disrupted. Deletion of H16_A0006 and H16_A0008-9 increased the electroporation efficiency 1,658-fold and 4-fold, respectively. Fructose was found to reduce the leaky expression of the arabinose-inducible pBAD promoter, which was used to optimize the expression of cas9, enabling genome editing via homologous recombination based on CRISPR-Cas9 in R. eutropha. A total of five genes were edited with efficiencies ranging from 78.3 to 100%. The CRISPR-Cpf1 system and the non-homologous end joining mechanism were also investigated but failed to yield edited strains. We present the first genome editing method for R. eutropha using an electroporation-based CRISPR-Cas9 approach, which significantly increased the efficiency and decreased the time needed to manipulate this facultative chemolithoautotrophic microbe. The novel technique will facilitate more advanced research on, and applications of, R. eutropha for PHA production and CO₂ conversion.

  22. Informing the Design of Direct-to-Consumer Interactive Personal Genomics Reports

    PubMed Central

    Shaer, Orit; Okerlund, Johanna; Balestra, Martina; Stowell, Elizabeth; Ascher, Laura; Bi, Joanna; Schlenker, Claire; Ball, Madeleine

    2015-01-01

    Background: In recent years, people who sought direct-to-consumer genetic testing services have been increasingly confronted with an unprecedented amount of personal genomic information, which influences their decisions, emotional state, and well-being. However, these users of direct-to-consumer genetic services, who vary in their education and interests, frequently have little relevant experience or tools for understanding, reasoning about, and interacting with their personal genomic data. Online interactive techniques can play a central role in making personal genomic data useful for these users. Objective: We sought to (1) identify the needs of diverse users as they make sense of their personal genomic data, (2) consequently develop effective interactive visualizations of genomic trait data to address these users' needs, and (3) evaluate the effectiveness of the developed visualizations in facilitating comprehension. Methods: The first two user studies, conducted with 63 volunteers in the Personal Genome Project and with 36 personal genomic users who participated in a design workshop, respectively, employed surveys and interviews to identify the needs and expectations of diverse users. Building on the two initial studies, the third study was conducted with 730 Amazon Mechanical Turk users and employed a controlled experimental design to examine the effectiveness of different design interventions on user comprehension. Results: The first two studies identified searching, comparing, sharing, and organizing data as fundamental to users' understanding of personal genomic data. The third study demonstrated that interactive and visual design interventions could improve the understandability of personal genomic reports for consumers. In particular, results showed that a new interactive bubble chart visualization designed for the study resulted in the highest comprehension scores, as well as the highest perceived comprehension scores. These scores were significantly higher than scores received using the industry standard tabular reports currently used for communicating personal genomic information. Conclusions: Drawing on multiple research methods and populations, the findings of the studies reported in this paper offer deep understanding of users' needs and practices, and demonstrate that interactive online design interventions can improve the understandability of personal genomic reports for consumers. We discuss implications for designers and researchers. PMID:26070951

  3. Single-Cell Sequencing for Precise Cancer Research: Progress and Prospects.

    PubMed

    Zhang, Xiaoyan; Marjani, Sadie L; Hu, Zhaoyang; Weissman, Sherman M; Pan, Xinghua; Wu, Shixiu

    2016-03-15

    Advances in genomic technology have enabled the faithful detection and measurement of mutations and the gene expression profile of cancer cells at the single-cell level. Recently, several single-cell sequencing methods have been developed that permit the comprehensive and precise analysis of the cancer-cell genome, transcriptome, and epigenome. The use of these methods to analyze cancer cells has led to a series of unanticipated discoveries, such as the high heterogeneity and stochastic changes in cancer-cell populations, the new driver mutations and the complicated clonal evolution mechanisms, and the novel identification of biomarkers of variant tumors. These methods and the knowledge gained from their utilization could potentially improve the early detection and monitoring of rare cancer cells, such as circulating tumor cells and disseminated tumor cells, and promote the development of personalized and highly precise cancer therapy. Here, we discuss the current methods for single cancer-cell sequencing, with a strong focus on those practically used or potentially valuable in cancer research, including single-cell isolation, whole genome and transcriptome amplification, epigenome profiling, multi-dimensional sequencing, and next-generation sequencing and analysis. We also examine the current applications, challenges, and prospects of single cancer-cell sequencing. ©2016 American Association for Cancer Research.

  4. DNA methylation at hepatitis B viral integrants is associated with methylation at flanking human genomic sequences

    PubMed Central

    Watanabe, Yoshiyuki; Yamamoto, Hiroyuki; Oikawa, Ritsuko; Toyota, Minoru; Yamamoto, Masakazu; Kokudo, Norihiro; Tanaka, Shinji; Arii, Shigeki; Yotsuyanagi, Hiroshi; Koike, Kazuhiko; Itoh, Fumio

    2015-01-01

    Integration of DNA viruses into the human genome plays an important role in various types of tumors, including hepatitis B virus (HBV)–related hepatocellular carcinoma. However, the molecular details and clinical impact of HBV integration on either human or HBV epigenomes are unknown. Here, we show that methylation of the integrated HBV DNA is related to the methylation status of the flanking human genome. We developed a next-generation sequencing-based method for structural methylation analysis of integrated viral genomes (denoted G-NaVI). This method is a novel approach that enables enrichment of viral fragments for sequencing using unique baits based on the sequence of the HBV genome. We detected integrated HBV sequences in the genome of the PLC/PRF/5 cell line and found variable levels of methylation within the integrated HBV genomes. Allele-specific methylation analysis revealed that the HBV genome often became significantly methylated when integrated into highly methylated host sites. After integration into unmethylated human genome regions such as promoters, however, the HBV DNA remains unmethylated and may eventually play an important role in tumorigenesis. The observed dynamic changes in DNA methylation of the host and viral genomes may functionally affect the biological behavior of HBV. These findings may impact public health given that millions of people worldwide are carriers of HBV. We also believe our assay will be a powerful tool to increase our understanding of the various types of DNA virus-associated tumorigenesis. PMID:25653310
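    The abstract does not say how methylation levels were quantified from the sequencing reads. Assuming bisulfite sequencing (a common NGS methylation readout; an assumption here, not a detail from the paper), the per-CpG methylation level is the fraction of reads that keep a C (protected, methylated) versus read a T (converted, unmethylated). The sketch below is hypothetical: it handles forward-strand reads only and assumes they are already aligned without gaps, given as (start, sequence) pairs.

```python
from collections import defaultdict


def cpg_methylation(reference, reads):
    """Per-CpG methylation fraction from forward-strand bisulfite reads.

    reference: reference sequence (str).
    reads: list of (start, seq) tuples aligned without gaps (simplifying assumption).
    Returns {cpg_position: methylated_fraction}.
    """
    cpg_sites = {i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"}
    counts = defaultdict(lambda: [0, 0])  # position -> [C reads, T reads]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos not in cpg_sites:
                continue
            if base == "C":    # cytosine protected from conversion: methylated
                counts[pos][0] += 1
            elif base == "T":  # cytosine converted to thymine: unmethylated
                counts[pos][1] += 1
    return {pos: m / (m + u) for pos, (m, u) in sorted(counts.items()) if m + u > 0}


levels = cpg_methylation("ACGTTCGA", [(0, "ACGTTCGA"), (0, "ATGTTCGA"), (1, "CGTTTG")])
print(levels)
```

    Comparing such per-site levels in the viral segment with those of the flanking human sequence is one way to look at the association the abstract reports.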

  5. A workflow to preserve genome-quality tissue samples from plants in botanical gardens and arboreta.

    PubMed

    Gostel, Morgan R; Kelloff, Carol; Wallick, Kyle; Funk, Vicki A

    2016-09-01

    Internationally, gardens hold diverse living collections that can be preserved for genomic research. Workflows have been developed for genomic tissue sampling in other taxa (e.g., vertebrates), but are inadequate for plants. We outline a workflow for tissue sampling intended for two audiences: botanists interested in genomics research and garden staff who plan to voucher living collections. Standard herbarium methods are used to collect vouchers, label information and images are entered into a publicly accessible database, and leaf tissue is preserved in silica and liquid nitrogen. A five-step approach for genomic tissue sampling is presented for sampling from living collections according to current best practices. Collecting genome-quality samples from gardens is an economical and rapid way to make tissue from the diversity of plants on Earth available for scientific research. The Global Genome Initiative will facilitate and lead this endeavor through international partnerships.

  6. GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome.

    PubMed

    Lu, Bingxin; Leong, Hon Wai

    2016-02-01

    Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaptation and enable antibiotic resistance. Many methods have been proposed to predict GIs, but most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GIs based solely on the sequence of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on a one-class support vector machine (SVM) and exploits composition bias in terms of k-mer content. In our evaluations on three real genomes, GI-SVM achieved higher recall than current methods without much loss of precision. In addition, GI-SVM allows flexible parameter tuning to obtain optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GIs in newly sequenced genomes.
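    GI-SVM works on k-mer composition vectors of genome segments. The sketch below illustrates only that composition idea and swaps in a much simpler outlier score: each window's Euclidean distance from the genome-wide mean k-mer profile. It is not a one-class SVM and not the authors' implementation; the function names, window size, step, and k are hypothetical parameters.

```python
import math
from itertools import product


def kmer_profile(seq, k):
    """Relative frequencies of all 4**k DNA k-mers in seq (k-mers with N are skipped)."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def composition_outliers(genome, window, step, k=2, top=1):
    """Start positions of the `top` windows whose k-mer profile deviates most
    from the mean profile of all windows (a simple stand-in for GI-SVM's model)."""
    starts = list(range(0, len(genome) - window + 1, step))
    profiles = [kmer_profile(genome[s:s + window], k) for s in starts]
    mean = [sum(col) / len(profiles) for col in zip(*profiles)]
    scores = [math.dist(p, mean) for p in profiles]
    ranked = sorted(zip(scores, starts), reverse=True)
    return [s for _, s in ranked[:top]]


# Toy genome: an AT-rich backbone with a GC-rich "island" inserted at position 200.
toy = "AATT" * 50 + "GCGC" * 25 + "AATT" * 50
print(composition_outliers(toy, window=100, step=100))
```

    A one-class SVM replaces the single distance-to-mean score with a learned boundary around the bulk of the windows, which is what lets GI-SVM adapt to each genome's composition.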

  7. Goals and hurdles for a successful implementation of genomic selection in breeding programme for selected annual and perennial crops.

    PubMed

    Jonas, Elisabeth; de Koning, Dirk Jan

    Genomic selection is an important topic in quantitative genetics and breeding. Not only does it allow the full use of current molecular genetic technologies, it also stimulates the development of new methods and models. Genomic selection, if fully implemented in commercial farming, should have a major impact on the productivity of various agricultural systems. But suggested approaches need to be applicable in commercial breeding populations. Many of the published research studies focus on methodologies. We conclude from the reviewed publications that a stronger focus on strategies for the implementation of genomic selection in advanced breeding lines, introduction of new varieties, hybrids or multi-line crosses is needed. Efforts to find solutions for a better prediction and integration of environmental influences need to continue within applied breeding schemes. Goals of the implementation of genomic selection into crop breeding should be carefully defined, and crop breeders in the private sector will play a substantial part in the decision-making process. However, the lack of published results from studies within, or in collaboration with, private companies diminishes the knowledge on the status of genomic selection within applied breeding programmes. Studies on the implementation of genomic selection in plant breeding need to evaluate models and methods with an enhanced emphasis on population-specific requirements and production environments. Adaptation of methods to breeding schemes, or changes to breeding programmes for a better integration of genomic selection strategies, are needed across species. Greater openness and continuous exchange will contribute to success.

  8. Proteome Studies of Filamentous Fungi

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baker, Scott E.; Panisko, Ellen A.

    2011-04-20

    The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide variety of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, non-gel-based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of different variations on the general method and technologies for identifying peptides in a given sample. We present a method that can serve as a "baseline" for proteomic studies of fungi.

  9. Importing statistical measures into Artemis enhances gene identification in the Leishmania genome project.

    PubMed

    Aggarwal, Gautam; Worthey, E A; McDonagh, Paul D; Myler, Peter J

    2003-06-07

    Seattle Biomedical Research Institute (SBRI), as part of the Leishmania Genome Network (LGN), is sequencing chromosomes of the trypanosomatid protozoan species Leishmania major. At SBRI, chromosomal sequence is annotated using a combination of trained and untrained non-consensus gene-prediction algorithms with ARTEMIS, an annotation platform with rich and user-friendly interfaces. Here we describe a methodology used to import results from three different protein-coding gene-prediction algorithms (GLIMMER, TESTCODE and GENESCAN) into the ARTEMIS sequence viewer and annotation tool. Comparison of these methods, along with the CODONUSAGE algorithm built into ARTEMIS, shows the importance of combining methods to more accurately annotate the L. major genomic sequence. An improved and powerful tool for gene prediction has been developed by importing data from widely used algorithms into an existing annotation platform. This approach is especially fruitful in the Leishmania genome project, where there is a large proportion of novel genes requiring manual annotation.
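    The abstract stresses that combining several gene predictors gives more accurate annotation. One simple way to combine them, sketched here as an illustration rather than as the SBRI/ARTEMIS procedure, is a sweep over interval endpoints that reports regions called coding by at least `min_support` predictors. The function name, the half-open (start, end) coordinates, and the example intervals are all hypothetical; each predictor is assumed to report non-overlapping intervals.

```python
def consensus_regions(predictions, min_support):
    """Regions covered by at least `min_support` predictors.

    predictions: dict mapping predictor name -> list of half-open (start, end)
    intervals (assumed non-overlapping within each predictor).
    """
    events = []
    for intervals in predictions.values():
        for start, end in intervals:
            events.append((start, 1))   # interval opens
            events.append((end, -1))    # interval closes
    events.sort()  # at equal positions, closes (-1) are processed before opens (+1)
    regions, depth, open_at = [], 0, None
    for pos, delta in events:
        depth += delta
        if depth >= min_support and open_at is None:
            open_at = pos
        elif depth < min_support and open_at is not None:
            if pos > open_at:
                regions.append((open_at, pos))
            open_at = None
    return regions


# Hypothetical coding-region calls from three predictors on one contig.
calls = {
    "GLIMMER": [(100, 400)],
    "TESTCODE": [(150, 420)],
    "GENESCAN": [(380, 600)],
}
print(consensus_regions(calls, min_support=2))
```

    Regions supported by only one predictor are the natural candidates for the manual review the abstract mentions.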

  10. Draft Genome Sequences of Two Species of "Difficult-to-Identify" Human-Pathogenic Corynebacteria: Implications for Better Identification Tests.

    PubMed

    Pacheco, Luis G C; Mattos-Guaraldi, Ana L; Santos, Carolina S; Veras, Adonney A O; Guimarães, Luis C; Abreu, Vinícius; Pereira, Felipe L; Soares, Siomar C; Dorella, Fernanda A; Carvalho, Alex F; Leal, Carlos G; Figueiredo, Henrique C P; Ramos, Juliana N; Vieira, Veronica V; Farfour, Eric; Guiso, Nicole; Hirata, Raphael; Azevedo, Vasco; Silva, Artur; Ramos, Rommel T J

    2015-01-01

    Non-diphtheriae Corynebacterium species have been increasingly recognized as the causative agents of infections in humans. Differential identification of these bacteria in the clinical microbiology laboratory by the most commonly used biochemical tests is challenging, and normally requires additional molecular methods. Herein, we present the annotated draft genome sequences of two isolates of "difficult-to-identify" human-pathogenic corynebacterial species: C. xerosis and C. minutissimum. The genome sequences of ca. 2.7 Mbp, with a mean number of 2,580 protein-encoding genes, were also compared with the publicly available genome sequences of strains of C. amycolatum and C. striatum. These results will aid the exploration of novel biochemical reactions to improve existing identification tests, as well as the development of more accurate molecular identification methods through detection of species-specific target genes for isolate identification or drug susceptibility profiling.

  11. Proteome studies of filamentous fungi.

    PubMed

    Baker, Scott E; Panisko, Ellen A

    2011-01-01

    The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide variety of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, nongel-based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of variations on the general methods and technologies for identifying peptides in a given sample. We present a method that can serve as a "baseline" for proteomic studies of fungi.

  12. Fast Ordered Sampling of DNA Sequence Variants.

    PubMed

    Greenberg, Anthony J

    2018-05-04

    Explosive growth in the amount of genomic data is matched by increasing power of consumer-grade computers. Even applications that require powerful servers can be quickly tested on desktop or laptop machines if we can generate representative samples from large data sets. I describe a fast and memory-efficient implementation of an on-line sampling method developed for tape drives 30 years ago. Focusing on genotype files, I test the performance of this technique on modern solid-state and spinning hard drives, and show that it performs well compared to a simple sampling scheme. I illustrate its utility by developing a method to quickly estimate genome-wide patterns of linkage disequilibrium (LD) decay with distance. I provide open-source software that samples loci from several variant format files, a separate program that performs LD decay estimates, and a C++ library that lets developers incorporate these methods into their own projects. Copyright © 2018 Greenberg.
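    The abstract does not name the tape-drive-era algorithm. The simplest classic method for drawing an ordered random sample in a single pass is Knuth's selection sampling (Algorithm S): the record at index t is kept with probability (n - m)/(N - t), where m records have already been kept. Vitter's skip-based methods achieve the same result faster. The sketch below shows Algorithm S as an illustration and is not necessarily what the author's software implements; it assumes the total record count N (for example, the number of loci in a genotype file) is known in advance.

```python
import random


def ordered_sample(records, n, N, rng=random):
    """Select n of the N records uniformly at random in one pass, keeping input order.

    Knuth's Algorithm S: records is any iterable (e.g. lines of a genotype file),
    N must equal the number of records it yields.
    """
    selected = []
    for t, record in enumerate(records):
        if len(selected) == n:
            break  # sample complete: stop reading early
        # keep this record with probability (n - m) / (N - t)
        if rng.random() * (N - t) < n - len(selected):
            selected.append(record)
    return selected


sample = ordered_sample(range(1000), 10, 1000, random.Random(42))
print(sample)
```

    Because selected records come out in file order, the sample can be streamed straight into downstream analyses such as distance-binned LD estimates without re-sorting.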

  13. Characterization of the exogenous insert and development of event-specific PCR detection methods for genetically modified Huanong No. 1 papaya.

    PubMed

    Guo, Jinchao; Yang, Litao; Liu, Xin; Guan, Xiaoyan; Jiang, Lingxi; Zhang, Dabing

    2009-08-26

    Genetically modified (GM) papaya (Carica papaya L.), Huanong No. 1, was approved for commercialization in Guangdong province, China in 2006, and a detection method for Huanong No. 1 papaya is necessary for implementing genetically modified organism (GMO) labeling regulations. In this study, we report the characterization of the exogenous integration in GM Huanong No. 1 papaya by means of conventional polymerase chain reaction (PCR) and thermal asymmetric interlaced (TAIL)-PCR strategies. The results suggested that one intact copy of the initial construct was integrated into the papaya genome, probably resulting in a 38-bp deletion of the host genomic DNA. Also, one unintended insertion of a 92 bp truncated NptII fragment was observed at the 5' end of the exogenous insert. Furthermore, we determined the 5' and 3' flanking sequences at the junctions between the insert DNA and the papaya genomic DNA, and developed event-specific qualitative and quantitative PCR assays for GM Huanong No. 1 papaya based on the 5' integration flanking sequence. The relative limit of detection (LOD) of the qualitative PCR assay was about 0.01% in 100 ng of total papaya genomic DNA, corresponding to about 25 copies of the papaya haploid genome. In the quantitative PCR, the limits of detection and quantification (LOD and LOQ) were as low as 12.5 and 25 copies of the papaya haploid genome, respectively. When three practical samples were quantified, the biases between the test and true values ranged from 0.44% to 4.41%. Collectively, these results should be useful for the identification and quantification of Huanong No. 1 papaya and its derivatives.
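    The link between "0.01% in 100 ng" and "about 25 copies" is a mass-to-copy-number conversion. The sketch below shows that arithmetic under two stated assumptions that are not taken from the abstract: an average mass of about 650 g/mol per base pair of double-stranded DNA, and a papaya haploid genome of roughly 372 Mbp. The function name is hypothetical.

```python
AVOGADRO = 6.022e23  # molecules per mole
BP_MASS = 650.0      # approximate g/mol per base pair of double-stranded DNA (assumption)


def haploid_copies(dna_ng, genome_bp):
    """Approximate number of haploid genome copies in dna_ng nanograms of genomic DNA."""
    genome_mass_g = genome_bp * BP_MASS / AVOGADRO  # mass of one haploid genome in grams
    return dna_ng * 1e-9 / genome_mass_g


total = haploid_copies(100, 372e6)  # 100 ng; ~372 Mbp papaya genome (assumed size)
at_lod = total * 0.01 / 100         # 0.01% relative limit of detection
print(round(total), round(at_lod, 1))
```

    With these assumptions, 100 ng holds roughly 250,000 haploid copies, and 0.01% of that is about 25 copies, which matches the figure reported in the abstract.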

  14. Genome Editing with Engineered Nucleases in Economically Important Animals and Plants: State of the Art in the Research Pipeline.

    PubMed

    Sovová, Tereza; Kerins, Gerard; Demnerová, Kateřina; Ovesná, Jaroslava

    2017-01-01

    After induced mutagenesis and transgenesis, genome editing is the next step in the development of breeding techniques. Genome editing using site-directed nucleases - including meganucleases, zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and the CRISPR/Cas9 system - is based on the mechanism of double strand breaks. The nuclease is directed to cleave the DNA at a specific site in the genome, which is then repaired by natural repair mechanisms. Changes are introduced during the repair that are either accidental or can be targeted if a DNA template with the desirable sequence is provided. These techniques allow making virtually any change to the genome, including specific DNA sequence changes, gene insertion, replacements or deletions, with unprecedented precision and specificity while being less laborious and more straightforward compared to traditional breeding techniques or transgenesis. Therefore, the research in this field is developing quickly and, apart from model species, multiple studies have focused on economically important species and agronomically important traits, which were the key subjects of this review. In plants, studies have been undertaken on disease resistance, herbicide tolerance, nutrient metabolism and nutritional value. In animals, the studies have mainly focused on disease resistance, meat production and allergenicity of milk. However, none of the promising studies has led to commercialization despite several patent applications. The uncertain legal status of genome-editing methods is one of the reasons for poor commercial development, as it is not clear whether the products would fall under the GMO regulation. We believe this issue should be clarified soon in order to allow promising methods to reach their full potential.
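    For CRISPR/Cas9, "directing the nuclease to a specific site" is done by the guide RNA. For the commonly used Streptococcus pyogenes Cas9, a target is a 20-nt protospacer immediately followed by an NGG PAM, and the blunt cut falls 3 bp upstream of the PAM. The sketch below scans the forward strand only and omits off-target scoring. It is an illustrative helper (the function name is hypothetical) and does not represent any study in this review.

```python
import re


def find_spcas9_sites(seq, spacer_len=20):
    """Forward-strand SpCas9 targets as (spacer_start, spacer, pam, cut_position).

    cut_position is the index of the first base after the double-strand break.
    """
    seq = seq.upper()
    sites = []
    # zero-width lookahead so that overlapping candidate sites are all reported
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len
    for m in re.finditer(pattern, seq):
        start = m.start()
        spacer, pam = m.group(1), m.group(2)
        cut_pos = start + spacer_len - 3  # blunt cut 3 bp upstream of the PAM
        sites.append((start, spacer, pam, cut_pos))
    return sites


print(find_spcas9_sites("A" * 20 + "TGG"))
```

    A complete design tool would also scan the reverse complement and rank guides by predicted specificity. The sketch only locates candidate sites.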

  15. Natural products discovery from micro-organisms in the post-genome era.

    PubMed

    Ikeda, Haruo

    2017-01-01

    With the decision to award the Nobel Prize in Physiology or Medicine to Drs. S. Ōmura, W.C. Campbell, and Y. Tu, the importance and usefulness of natural drug discovery and development have been revalidated. Since the end of the twentieth century, many genome analyses of organisms have been conducted, and accordingly, numerous microbial genomes have been decoded. In particular, genomic studies of actinomycetes, micro-organisms that readily produce natural products, led to the discovery of biosynthetic gene clusters responsible for producing natural products. New explorations for natural products through a comprehensive approach combining genomic information with conventional methods show great promise for the discovery of new natural products and even systematic generation of unnaturally occurring compounds.

  16. Streamlined Genome Sequence Compression using Distributed Source Coding

    PubMed Central

    Wang, Shuang; Jiang, Xiaoqian; Chen, Feng; Cui, Lijuan; Cheng, Samuel

    2014-01-01

    We aim to develop a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require a computationally heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol adaptively picks either syndrome coding or hash coding to compress subsequences of varying code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS). PMID:25520552
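    Syndrome coding is what keeps the client light. The encoder sends only the syndrome of a block and never looks at the reference. The decoder, which holds a reference block differing from the source in a known small number of positions, recovers the source. The textbook toy example below uses the Hamming(7,4) code, which sends 3 bits per 7-bit block and tolerates one differing bit. It is a sketch of the principle, not the authors' protocol. It assumes the sequence has already been mapped to bits, and when more bits differ it fails, which is where the abstract's adaptive switch to hash coding would apply.

```python
def syndrome(bits):
    """3-bit Hamming(7,4) syndrome: XOR of the 1-based positions of all set bits.

    This equals H @ bits (mod 2) for the parity-check matrix H whose columns
    are the binary representations of 1..7.
    """
    s = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            s ^= pos
    return s


def encode(block):
    """Client side: transmit only the 3-bit syndrome of a 7-bit source block."""
    assert len(block) == 7
    return syndrome(block)


def decode(s, reference_block):
    """Server side: rebuild the source block from its syndrome and a reference
    block that differs from it in at most one bit."""
    error_pos = s ^ syndrome(reference_block)  # syndrome of the difference pattern
    block = list(reference_block)
    if error_pos:
        block[error_pos - 1] ^= 1  # flip the single differing bit
    return block


source = [1, 0, 1, 1, 0, 0, 1]
reference = source.copy()
reference[3] ^= 1  # reference differs from the source at one position
print(decode(encode(source), reference) == source)
```

    The asymmetry is the point of the design. Encoding is a few XORs, while the decoder, which has the reference genome and more resources, does the search.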

  17. Towards the analysis of the genomes of single cells: further characterisation of the multiple displacement amplification.

    PubMed

    Panelli, Simona; Damiani, Giuseppe; Espen, Luca; Micheli, Gioacchino; Sgaramella, Vittorio

    2006-05-10

    The development of methods for the analysis and comparison of the nucleic acids contained in single cells is an ambitious and challenging goal that may provide useful insights into many physiopathological processes. We review here some of the published protocols for the amplification of whole genomes (WGA). We focus on the reaction known as Multiple Displacement Amplification (MDA), which probably represents the most reliable and efficient WGA protocol developed to date. We discuss some recent advances and applications, as well as some modifications to the reaction, which should improve its use and possibly extend its range of applicability to degraded genomes and, via complementary DNA, to RNA.

  18. Genetic and Genomic Toolbox of Zea mays

    PubMed Central

    Nannas, Natalie J.; Dawe, R. Kelly

    2015-01-01

    Maize has a long history of genetic and genomic tool development and is considered one of the most accessible higher plant systems. With a fully sequenced genome, a suite of cytogenetic tools, methods for both forward and reverse genetics, and characterized phenotype markers, maize is amenable to studying questions beyond plant biology. Major discoveries in the areas of transposons, imprinting, and chromosome biology came from work in maize. Moving forward in the post-genomic era, this classic model system will continue to be at the forefront of basic biological study. In this review, we outline the basics of working with maize and describe its rich genetic toolbox. PMID:25740912

  19. GENOME-WIDE COMPARATIVE ANALYSIS OF PHYLOGENETIC TREES: THE PROKARYOTIC FOREST OF LIFE

    PubMed Central

    Puigbò, Pere; Wolf, Yuri I.; Koonin, Eugene V.

    2013-01-01

    Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the Boot-Split Distance (BSD) method is introduced as an extension of the previously developed Split Distance (SD) method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting tree-like and net-like evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe the application of these methods to analyze the FOL and the results obtained with these methods. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a 'species tree'. PMID:22399455
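    Split-based tree comparison treats each tree as a set of bipartitions of the taxa and measures how many splits the trees do not share. The sketch below computes an RF-style normalized split distance. It then adds an illustrative bootstrap-weighted variant in which each split counts in proportion to its support, in the spirit of BSD. The weighting formula is a hypothetical example, not the published BSD definition. Trees are given directly as split sets (one side of each bipartition), which sidesteps Newick parsing.

```python
def canonical(split_side, taxa):
    """Represent a bipartition by the side that excludes a fixed reference taxon."""
    side = frozenset(split_side)
    ref = min(taxa)
    return frozenset(taxa) - side if ref in side else side


def split_distance(splits_a, splits_b, taxa):
    """Fraction of splits not shared by the two trees (RF-style normalization)."""
    a = {canonical(s, taxa) for s in splits_a}
    b = {canonical(s, taxa) for s in splits_b}
    return len(a ^ b) / (len(a) + len(b))


def weighted_split_distance(support_a, support_b, taxa):
    """Hypothetical bootstrap-weighted variant: each split counts by its support.

    support_a / support_b: dict mapping a split side (frozenset) -> support in [0, 1].
    Not the published BSD formula; shown only to illustrate support weighting.
    """
    a = {canonical(s, taxa): w for s, w in support_a.items()}
    b = {canonical(s, taxa): w for s, w in support_b.items()}
    unshared = sum(w for s, w in a.items() if s not in b)
    unshared += sum(w for s, w in b.items() if s not in a)
    total = sum(a.values()) + sum(b.values())
    return unshared / total if total else 0.0


taxa = {"A", "B", "C", "D", "E"}
t1 = {frozenset({"A", "B"}): 0.4, frozenset({"D", "E"}): 1.0}  # (AB)C(DE)
t2 = {frozenset({"A", "C"}): 0.3, frozenset({"D", "E"}): 0.9}  # (AC)B(DE)
print(split_distance(t1, t2, taxa), weighted_split_distance(t1, t2, taxa))
```

    Down-weighting poorly supported splits means that two trees disagreeing only on weak branches come out closer than their raw split distance suggests.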

  20. Genome-wide comparative analysis of phylogenetic trees: the prokaryotic forest of life.

    PubMed

    Puigbò, Pere; Wolf, Yuri I; Koonin, Eugene V

    2012-01-01

    Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article, we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the Boot-Split Distance (BSD) method is introduced as an extension of the previously developed Split Distance method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting tree-like and net-like evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe the application of these methods to analyze the FOL and the results obtained with these methods. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a "species tree."

  1. Diversity arrays technology: a generic genome profiling technology on open platforms.

    PubMed

    Kilian, Andrzej; Wenzl, Peter; Huttner, Eric; Carling, Jason; Xia, Ling; Blois, Hélène; Caig, Vanessa; Heller-Uszynska, Katarzyna; Jaccoud, Damian; Hopper, Colleen; Aschenbrenner-Kilian, Malgorzata; Evers, Margaret; Peng, Kaiman; Cayla, Cyril; Hok, Puthick; Uszynski, Grzegorz

    2012-01-01

    In the last 20 years, we have observed an exponential growth of DNA sequence data and a similar increase in the volume of DNA polymorphism data generated by numerous molecular marker technologies. Most of the investment, and therefore progress, has concentrated on the human genome and the genomes of selected model species. Diversity Arrays Technology (DArT), developed over a decade ago, was among the first "democratizing" genotyping technologies, as its performance was primarily driven by the level of DNA sequence variation in the species rather than by the level of financial investment. Across the approximately 60 organisms for which it has been developed to date, DArT also proved more robust to genome size and ploidy-level differences than other high-throughput genotyping technologies. The success of DArT in a number of organisms, including a wide range of "orphan crops," can be attributed to the simplicity of its underlying concepts: DArT combines genome complexity reduction methods enriching for genic regions with a highly parallel assay readout on a number of "open-access" microarray platforms. The quantitative nature of the assay has enabled a number of applications in which allelic frequencies are estimated from DArT arrays. A typical DArT assay tests tens of thousands of genomic loci for polymorphism, with the final number of markers reported (hundreds to thousands) reflecting the level of DNA sequence variation in the tested loci. Detailed DArT methods, protocols, and a range of application examples, as well as DArT's evolution path, are presented.
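    Genome complexity reduction can be pictured as an in silico restriction digest that keeps only fragments bounded by cut sites and falling within a size window. The sketch below uses PstI (CTGCA^G) as an example rare-cutting enzyme. That choice, the size window, and both function names are illustrative assumptions. Real DArT protocols differ in enzyme combinations and in how fragments are selected, and this is not the DArT pipeline.

```python
def digest(seq, site="CTGCAG", cut_offset=5):
    """Cut seq at every occurrence of site; cut_offset is the cut position within
    the site. Defaults model PstI (CTGCA^G)."""
    seq = seq.upper()
    cuts, i = [], seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def reduced_representation(seq, min_len, max_len, **enzyme):
    """Internal fragments (cut at both ends) whose length lies in [min_len, max_len]."""
    fragments = digest(seq, **enzyme)[1:-1]  # drop the two uncut sequence ends
    return [f for f in fragments if min_len <= len(f) <= max_len]


toy = "AAAA" + "CTGCAG" + "T" * 10 + "CTGCAG" + "GGG"
print(reduced_representation(toy, 10, 20))
```

    Only the small fraction of the genome that survives such a filter is arrayed and assayed, which is why marker density in this kind of approach depends on sequence variation at those loci rather than on genome size.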

  2. Toward Universal Forward Genetics: Using a Draft Genome Sequence of the Nematode Oscheius tipulae To Identify Mutations Affecting Vulva Development

    PubMed Central

    Besnard, Fabrice; Koutsovoulos, Georgios; Dieudonné, Sana; Blaxter, Mark; Félix, Marie-Anne

    2017-01-01

    Mapping-by-sequencing has become a standard method to map and identify phenotype-causing mutations in model species. Here, we show that a fragmented draft assembly is sufficient to perform mapping-by-sequencing in nonmodel species. We generated a draft assembly and annotation of the genome of the free-living nematode Oscheius tipulae, a distant relative of the model Caenorhabditis elegans. We used this draft to identify the likely causative mutations at the O. tipulae cov-3 locus, which affect vulval development. The cov-3 locus encodes the O. tipulae ortholog of C. elegans mig-13, and we further show that Cel-mig-13 mutants also have an unsuspected vulval-development phenotype. In a virtuous circle, we were able to use the linkage information collected during mutant mapping to improve the genome assembly. These results showcase the promise of genome-enabled forward genetics in nonmodel species. PMID:28630114
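    In mapping-by-sequencing, mutants from a cross to a polymorphic strain are pooled and sequenced. Near the causative mutation, reads should carry the mutant-background allele at a frequency close to 1, and this holds even when markers sit on draft-assembly contigs. The sketch below assumes a simplified cross design and hypothetical per-marker read counts. It finds the run of consecutive markers with the highest mean mutant-allele fraction and is not the authors' pipeline.

```python
def allele_frequency_peak(markers, window):
    """markers: list of (position, ref_reads, alt_reads) along one contig, where
    alt is the allele from the mutant's genetic background (assumed design).

    Returns (start_position, mean_alt_fraction) for the window of `window`
    consecutive markers with the highest mean mutant-allele fraction.
    """
    covered = [(pos, alt / (ref + alt)) for pos, ref, alt in sorted(markers) if ref + alt > 0]
    best = None
    for i in range(len(covered) - window + 1):
        mean = sum(f for _, f in covered[i:i + window]) / window
        if best is None or mean > best[1]:
            best = (covered[i][0], mean)
    return best


# Hypothetical counts from a pool of mutant F2 animals.
markers = [(100, 10, 10), (200, 9, 11), (300, 1, 19), (400, 0, 20), (500, 0, 20), (600, 8, 12)]
print(allele_frequency_peak(markers, window=3))
```

    Ordering contigs by such linkage signals is also how mapping data can feed back into scaffolding a fragmented assembly.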

  3. Toward Universal Forward Genetics: Using a Draft Genome Sequence of the Nematode Oscheius tipulae To Identify Mutations Affecting Vulva Development.

    PubMed

    Besnard, Fabrice; Koutsovoulos, Georgios; Dieudonné, Sana; Blaxter, Mark; Félix, Marie-Anne

    2017-08-01

    Mapping-by-sequencing has become a standard method to map and identify phenotype-causing mutations in model species. Here, we show that a fragmented draft assembly is sufficient to perform mapping-by-sequencing in nonmodel species. We generated a draft assembly and annotation of the genome of the free-living nematode Oscheius tipulae, a distant relative of the model Caenorhabditis elegans. We used this draft to identify the likely causative mutations at the O. tipulae cov-3 locus, which affect vulval development. The cov-3 locus encodes the O. tipulae ortholog of C. elegans mig-13, and we further show that Cel-mig-13 mutants also have an unsuspected vulval-development phenotype. In a virtuous circle, we were able to use the linkage information collected during mutant mapping to improve the genome assembly. These results showcase the promise of genome-enabled forward genetics in nonmodel species. Copyright © 2017 by the Genetics Society of America.

  4. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

    PubMed

    Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

    2011-05-26

    Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
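    The spatial scan idea is to slide candidate clusters over the structure, score each by a likelihood ratio for case enrichment, and keep the maximum. The sketch below makes a strong simplification: it flattens the pathway into a linear order of genes and scans contiguous runs, whereas the paper scans clusters in the pathway topology. It uses a Kulldorff-style Bernoulli log-likelihood ratio on hypothetical per-gene counts of case and control CNV carriers. Significance in scan statistics is normally assessed by Monte Carlo permutation of case/control labels, which is omitted here.

```python
import math


def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return x * math.log(y) if x > 0 else 0.0


def bernoulli_llr(c, n, C, N):
    """Log-likelihood ratio for a window with c cases among n carriers, given
    C cases among N carriers overall; 0 unless the window is case-enriched."""
    out_c, out_n = C - c, N - n
    if n == 0 or out_n == 0 or c / n <= out_c / out_n:
        return 0.0
    ll_alt = (xlogy(c, c / n) + xlogy(n - c, (n - c) / n)
              + xlogy(out_c, out_c / out_n) + xlogy(out_n - out_c, (out_n - out_c) / out_n))
    ll_null = xlogy(C, C / N) + xlogy(N - C, (N - C) / N)
    return ll_alt - ll_null


def scan_pathway(case_counts, control_counts, max_len):
    """Most case-enriched run of up to max_len consecutive genes in a linearized
    pathway; returns (llr, start, end) with end exclusive."""
    C = sum(case_counts)
    N = C + sum(control_counts)
    best = (0.0, None, None)
    for i in range(len(case_counts)):
        for j in range(i + 1, min(i + max_len, len(case_counts)) + 1):
            c = sum(case_counts[i:j])
            n = c + sum(control_counts[i:j])
            llr = bernoulli_llr(c, n, C, N)
            if llr > best[0]:
                best = (llr, i, j)
    return best


cases = [1, 1, 8, 9, 1, 1]     # hypothetical case carriers per gene
controls = [1, 1, 1, 1, 1, 1]  # hypothetical control carriers per gene
print(scan_pathway(cases, controls, max_len=3))
```

    Reporting only the single best cluster, rather than testing every overlapping gene set, is what limits the overall false-positive probability in the approach described above.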

  5. DEVELOPMENT OF EPA'S TOXCAST PROGRAM FOR PRIORITIZING THE TOXICITY TESTING OF ENVIRONMENTAL CHEMICALS.

    EPA Science Inventory

    EPA is developing methods for utilizing computational chemistry, high-throughput screening (HTS) and genomic technologies to predict potential toxicity and prioritize the use of limited testing resources.

  6. Persistency of accuracy of genomic breeding values for different simulated pig breeding programs in developing countries.

    PubMed

    Akanno, E C; Schenkel, F S; Sargolzaei, M; Friendship, R M; Robinson, J A B

    2014-10-01

    Genetic improvement of pigs in tropical developing countries has focused on imported exotic populations, which have been subjected to intensive selection with attendant high population-wide linkage disequilibrium (LD). Presently, indigenous pig populations with limited selection and low LD are being considered for improvement. Given that the infrastructure for genetic improvement using conventional BLUP selection methods is lacking, a genome-wide selection (GS) program was proposed for developing countries. A simulation study was conducted to evaluate the option of using a 60K SNP panel and the observed amount of LD in the exotic and indigenous pig populations. Several scenarios were evaluated, including different sizes and structures of training and validation populations, different selection methods, and the long-term accuracy of GS in different population/breeding structures and traits. The training set included the previously selected exotic population, the unselected indigenous population and their crossbreds. Traits studied included number born alive (NBA), average daily gain (ADG) and back fat thickness (BFT). The ridge regression method was used to train the prediction model. The results showed that accuracies of genomic breeding values (GBVs) in the range of 0.30 (NBA) to 0.86 (BFT) in the validation population are expected if high-density marker panels are utilized. GS improved the accuracy of breeding values more than the pedigree-based approach did for traits with low heritability and for young animals with no performance data. A crossbred training population performed better than purebred ones when the validation population had a structure similar to or different from that of the training set. Genome-wide selection holds promise for genetic improvement of pigs in the tropics. © 2014 Blackwell Verlag GmbH.
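
    Ridge regression for genomic prediction can be sketched in a few lines, assuming genotypes coded 0/1/2, centred marker columns and a shrinkage parameter lam chosen by hand (in RR-BLUP it is conventionally the ratio of residual variance to marker-effect variance). Marker effects solve (X'X + lam I) b = X'y; a predicted breeding value is the mean plus the genotype-weighted sum of effects, and in a simulation the accuracy is the correlation between predicted and true breeding values. The code is pure Python for self-containment, all function names are hypothetical, and it is not the software used in the study.

```python
def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Ridge estimates of SNP effects from centred genotypes and phenotypes."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    A = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return y_mean, col_means, _solve(A, rhs)


def predict(model, genotypes):
    """Mean plus the sum of centred genotype x estimated effect."""
    y_mean, col_means, effects = model
    return y_mean + sum((g - m) * b for g, m, b in zip(genotypes, col_means, effects))


def pearson(a, b):
    """Correlation used as the accuracy of predicted breeding values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


# Toy data: one marker with an additive effect of 2 per allele copy.
X, y = [[0], [1], [2]], [1.0, 3.0, 5.0]
print(ridge_fit(X, y, lam=0.0)[2], ridge_fit(X, y, lam=2.0)[2])  # [2.0] [1.0]
```

    Increasing lam shrinks every effect toward zero, which is what keeps the model stable when markers far outnumber animals.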

  7. Multiplex Polymerase Chain Reaction for Identification of Shigellae and Four Shigella Species Using Novel Genetic Markers Screened by Comparative Genomics.

    PubMed

    Kim, Hyun-Joong; Ryu, Ji-Oh; Song, Ji-Yeon; Kim, Hae-Yeong

    2017-07-01

    In the detection of Shigella species using molecular biological methods, previously known genetic markers for Shigella species were not sufficient to discriminate between Shigella species and diarrheagenic Escherichia coli. The purposes of this study were to screen for genetic markers of the Shigella genus and four Shigella species through comparative genomics and to develop a multiplex polymerase chain reaction (PCR) for the detection of shigellae and Shigella species. A total of seven genomic DNA sequences from Shigella species were subjected to comparative genomics to screen for genetic markers of shigellae and of each Shigella species. Primer sets were designed from the screened genetic markers and evaluated by PCR with genomic DNA from Shigella and other bacterial strains in the Enterobacteriaceae. A novel Shigella quintuplex PCR, designed for the detection of the Shigella genus, S. dysenteriae, S. boydii, S. flexneri, and S. sonnei, was developed from the evaluated primer sets, and its performance was demonstrated by specific amplification from each Shigella species. This Shigella multiplex PCR is the first to be reported with novel genetic markers developed through comparative genomics and may be a useful tool for accurately distinguishing the Shigella genus and species from closely related bacteria in clinical microbiology and food safety.
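
    The marker-screening step can be pictured as a presence/absence filter. The sketch below assumes exact k-mer matching: it keeps sequences shared by every target genome and absent from every non-target genome. It ignores reverse complements, mismatches and primer-design constraints, which a real marker pipeline would have to handle, and the genome names and sequences are invented.

```python
def kmer_set(seq, k):
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_markers(genomes, targets, k=20):
    """k-mers shared by every target genome and absent from all non-targets."""
    shared = set.intersection(*(kmer_set(genomes[name], k) for name in targets))
    for name, seq in genomes.items():
        if name not in targets:
            shared -= kmer_set(seq, k)  # drop anything seen in a non-target
    return shared


genomes = {
    "target_1": "ACGTTTGACC",  # hypothetical toy sequences
    "target_2": "GGACGTTTCC",
    "outgroup": "ACGTAAAA",
}
print(sorted(screen_markers(genomes, ["target_1", "target_2"], k=4)))  # ['CGTT', 'GTTT']
```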

  8. Prediction of lipoprotein signal peptides in Gram-negative bacteria.

    PubMed

    Juncker, Agnieszka S; Willenbrock, Hanni; Von Heijne, Gunnar; Brunak, Søren; Nielsen, Henrik; Krogh, Anders

    2003-08-01

    A method to predict lipoprotein signal peptides in Gram-negative Eubacteria, LipoP, has been developed. The hidden Markov model (HMM) was able to distinguish between lipoproteins (SPaseII-cleaved proteins), SPaseI-cleaved proteins, cytoplasmic proteins, and transmembrane proteins. This predictor was able to predict 96.8% of the lipoproteins correctly with only 0.3% false positives in a set of SPaseI-cleaved, cytoplasmic, and transmembrane proteins. The results obtained were significantly better than those of previously developed methods. Even though Gram-positive lipoprotein signal peptides differ from Gram-negatives, the HMM was able to identify 92.9% of the lipoproteins included in a Gram-positive test set. A genome search was carried out for 12 Gram-negative genomes and one Gram-positive genome. The results for Escherichia coli K12 were compared with new experimental data, and the predictions by the HMM agree well with the experimentally verified lipoproteins. A neural network-based predictor was developed for comparison, and it gave very similar results. LipoP is available as a Web server at www.cbs.dtu.dk/services/LipoP/.
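
    LipoP's classification rests on hidden Markov models; the forward algorithm below shows how sequence likelihoods under competing class models can be compared. The two models, the three-letter alphabet (h = hydrophobic, p = polar, C = cysteine) and every probability are invented for illustration and do not reproduce LipoP's architecture or parameters.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def _logsumexp(values):
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(seq, start, trans, emit):
    """Log P(seq | HMM) via the forward algorithm, computed in log space."""
    states = list(start)
    alpha = {s: _log(start[s]) + _log(emit[s].get(seq[0], 0.0)) for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: _logsumexp([alpha[s] + _log(trans[s].get(t, 0.0)) for s in states])
            + _log(emit[t].get(symbol, 0.0))
            for t in states
        }
    return _logsumexp(list(alpha.values()))


def classify(seq, models):
    """Return the model name with the highest likelihood for seq."""
    return max(models, key=lambda name: forward_loglik(seq, *models[name]))


# Each model is (start probabilities, transitions, emissions); all values invented.
models = {
    "lipoprotein": (
        {"signal": 1.0, "cys": 0.0, "mature": 0.0},
        {"signal": {"signal": 0.8, "cys": 0.2}, "cys": {"mature": 1.0},
         "mature": {"mature": 1.0}},
        {"signal": {"h": 0.7, "p": 0.3}, "cys": {"C": 0.9, "h": 0.05, "p": 0.05},
         "mature": {"h": 0.3, "p": 0.6, "C": 0.1}},
    ),
    "other": (
        {"any": 1.0},
        {"any": {"any": 1.0}},
        {"any": {"h": 0.45, "p": 0.45, "C": 0.1}},
    ),
}
print(classify("hhhhhCpppp", models))  # lipoprotein
print(classify("pppppppppp", models))  # other
```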

  9. Prediction of lipoprotein signal peptides in Gram-negative bacteria

    PubMed Central

    Juncker, Agnieszka S.; Willenbrock, Hanni; von Heijne, Gunnar; Brunak, Søren; Nielsen, Henrik; Krogh, Anders

    2003-01-01

    A method to predict lipoprotein signal peptides in Gram-negative Eubacteria, LipoP, has been developed. The hidden Markov model (HMM) was able to distinguish between lipoproteins (SPaseII-cleaved proteins), SPaseI-cleaved proteins, cytoplasmic proteins, and transmembrane proteins. This predictor was able to predict 96.8% of the lipoproteins correctly with only 0.3% false positives in a set of SPaseI-cleaved, cytoplasmic, and transmembrane proteins. The results obtained were significantly better than those of previously developed methods. Even though Gram-positive lipoprotein signal peptides differ from Gram-negatives, the HMM was able to identify 92.9% of the lipoproteins included in a Gram-positive test set. A genome search was carried out for 12 Gram-negative genomes and one Gram-positive genome. The results for Escherichia coli K12 were compared with new experimental data, and the predictions by the HMM agree well with the experimentally verified lipoproteins. A neural network-based predictor was developed for comparison, and it gave very similar results. LipoP is available as a Web server at www.cbs.dtu.dk/services/LipoP/. PMID:12876315

  10. Genome editing reveals a role for OCT4 in human embryogenesis.

    PubMed

    Fogarty, Norah M E; McCarthy, Afshan; Snijders, Kirsten E; Powell, Benjamin E; Kubikova, Nada; Blakeley, Paul; Lea, Rebecca; Elder, Kay; Wamaitha, Sissy E; Kim, Daesik; Maciulyte, Valdone; Kleinjung, Jens; Kim, Jin-Soo; Wells, Dagan; Vallier, Ludovic; Bertero, Alessandro; Turner, James M A; Niakan, Kathy K

    2017-10-05

    Despite their fundamental biological and clinical importance, the molecular mechanisms that regulate the first cell fate decisions in the human embryo are not well understood. Here we use CRISPR-Cas9-mediated genome editing to investigate the function of the pluripotency transcription factor OCT4 during human embryogenesis. We identified an efficient OCT4-targeting guide RNA using an inducible human embryonic stem cell-based system and microinjection of mouse zygotes. Using these refined methods, we efficiently and specifically targeted the gene encoding OCT4 (POU5F1) in diploid human zygotes and found that blastocyst development was compromised. Transcriptomics analysis revealed that, in POU5F1-null cells, gene expression was downregulated not only for extra-embryonic trophectoderm genes, such as CDX2, but also for regulators of the pluripotent epiblast, including NANOG. By contrast, Pou5f1-null mouse embryos maintained the expression of orthologous genes, and blastocyst development was established, but maintenance was compromised. We conclude that CRISPR-Cas9-mediated genome editing is a powerful method for investigating gene function in the context of human development.

  11. Future of human mitochondrial DNA editing technologies.

    PubMed

    Verechshagina, N; Nikitchina, N; Yamada, Y; Harashima, H; Tanaka, M; Orishchenko, K; Mazunin, I

    2018-05-15

    ATP and other metabolites necessary for the development, maintenance, and functioning of bodily cells are synthesized in the mitochondria. The presence of multiple copies of the genome within the mitochondria, together with its maternal inheritance, determines the clinical manifestation and spread of mutations in mitochondrial DNA (mtDNA). The main obstacle to a thorough understanding of mitochondrial biology and to the development of gene therapy methods for mitochondrial diseases is the absence of systems that can directly change the mtDNA sequence. Here, we discuss existing methods of manipulating the level of mtDNA heteroplasmy, as well as the latest systems that could be used in the future as tools for human mitochondrial genome editing.

  12. Identifying genetic relatives without compromising privacy

    PubMed Central

    He, Dan; Furlotte, Nicholas A.; Hormozdiari, Farhad; Joo, Jong Wha J.; Wadia, Akshay; Ostrovsky, Rafail; Sahai, Amit; Eskin, Eleazar

    2014-01-01

    The development of high-throughput genomic technologies has impacted many areas of genetic research. While many applications of these technologies focus on the discovery of genes involved in disease from population samples, applications of genomic technologies to an individual’s genome or personal genomics have recently gained much interest. One such application is the identification of relatives from genetic data. In this application, genetic information from a set of individuals is collected in a database, and each pair of individuals is compared in order to identify genetic relatives. An inherent issue that arises in the identification of relatives is privacy. In this article, we propose a method for identifying genetic relatives without compromising privacy by taking advantage of novel cryptographic techniques customized for secure and private comparison of genetic information. We demonstrate the utility of these techniques by allowing a pair of individuals to discover whether or not they are related without compromising their genetic information or revealing it to a third party. The idea is that individuals only share enough special-purpose cryptographically protected information with each other to identify whether or not they are relatives, but not enough to expose any information about their genomes. We show in HapMap and 1000 Genomes data that our method can recover first- and second-order genetic relationships and, through simulations, show that our method can identify relationships as distant as third cousins while preserving privacy. PMID:24614977

  13. Identifying genetic relatives without compromising privacy.

    PubMed

    He, Dan; Furlotte, Nicholas A; Hormozdiari, Farhad; Joo, Jong Wha J; Wadia, Akshay; Ostrovsky, Rafail; Sahai, Amit; Eskin, Eleazar

    2014-04-01

    The development of high-throughput genomic technologies has impacted many areas of genetic research. While many applications of these technologies focus on the discovery of genes involved in disease from population samples, applications of genomic technologies to an individual's genome or personal genomics have recently gained much interest. One such application is the identification of relatives from genetic data. In this application, genetic information from a set of individuals is collected in a database, and each pair of individuals is compared in order to identify genetic relatives. An inherent issue that arises in the identification of relatives is privacy. In this article, we propose a method for identifying genetic relatives without compromising privacy by taking advantage of novel cryptographic techniques customized for secure and private comparison of genetic information. We demonstrate the utility of these techniques by allowing a pair of individuals to discover whether or not they are related without compromising their genetic information or revealing it to a third party. The idea is that individuals only share enough special-purpose cryptographically protected information with each other to identify whether or not they are relatives, but not enough to expose any information about their genomes. We show in HapMap and 1000 Genomes data that our method can recover first- and second-order genetic relationships and, through simulations, show that our method can identify relationships as distant as third cousins while preserving privacy.

  14. Dynamix: dynamic visualization by automatic selection of informative tracks from hundreds of genomic datasets.

    PubMed

    Monfort, Matthias; Furlong, Eileen E M; Girardot, Charles

    2017-07-15

    Visualization of genomic data is fundamental for gaining insights into genome function. Yet, co-visualization of a large number of datasets remains a challenge in all popular genome browsers, and new visualization methods are needed to improve the usability and user experience of genome browsers. We present Dynamix, a JBrowse plugin that enables the parallel inspection of hundreds of genomic datasets. Dynamix takes advantage of a priori knowledge to automatically display data tracks with signal within a genomic region of interest. As the user navigates through the genome, Dynamix automatically updates data tracks and limits the manual operations otherwise needed to adjust the data visible on screen. Dynamix also introduces a new carousel view that optimizes screen utilization by enabling users to independently scroll through groups of tracks. Dynamix is hosted at http://furlonglab.embl.de/Dynamix. Contact: charles.girardot@embl.de. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
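
    A hypothetical sketch of region-driven track selection: keep only tracks whose signal inside the viewed region passes a threshold and order them by overlap-weighted signal. The scoring rule, threshold and data layout are assumptions; Dynamix itself is a JBrowse plugin, and its actual selection logic is not reproduced here.

```python
def select_tracks(tracks, start, end, min_signal=1.0, max_tracks=10):
    """Return names of tracks with signal in [start, end), strongest first.

    tracks: dict of track name -> list of (interval_start, interval_end, value).
    """
    scores = {}
    for name, intervals in tracks.items():
        total = 0.0
        for s, e, value in intervals:
            overlap = min(e, end) - max(s, start)
            if overlap > 0:
                total += value * overlap  # signal weighted by bases covered
        if total >= min_signal:
            scores[name] = total
    return sorted(scores, key=scores.get, reverse=True)[:max_tracks]


tracks = {
    "ChIP_A": [(100, 200, 2.0)],
    "ChIP_B": [(150, 160, 1.0)],
    "ChIP_C": [(500, 600, 5.0)],  # no signal in the viewed region
}
print(select_tracks(tracks, 120, 180))  # ['ChIP_A', 'ChIP_B']
```

    Re-running the selection whenever the viewed region changes is what spares the user from toggling tracks by hand.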

  15. Deciphering the distance to antibiotic resistance for the pneumococcus using genome sequencing data

    PubMed Central

    Mobegi, Fredrick M.; Cremers, Amelieke J. H.; de Jonge, Marien I.; Bentley, Stephen D.; van Hijum, Sacha A. F. T.; Zomer, Aldert

    2017-01-01

    Advances in genome sequencing technologies and genome-wide association studies (GWAS) have provided unprecedented insights into the molecular basis of microbial phenotypes and enabled the identification of the underlying genetic variants in real populations. However, using genome sequencing for clinical phenotyping of bacteria is challenging due to the lack of reliable and accurate approaches. Here, we report a method for predicting microbial resistance patterns using genome sequencing data. We analyzed whole genome sequences of 1,680 Streptococcus pneumoniae isolates from four independent populations using GWAS and identified probable hotspots of genetic variation that correlate with phenotypes of resistance to essential classes of antibiotics. With the premise that accumulation of putative resistance-conferring SNPs, potentially in combination with specific resistance genes, precedes full resistance, we retrogressively surveyed the hotspot loci and quantified the number of SNPs and/or genes that, if accumulated, would confer full resistance to an otherwise susceptible strain. We name this approach the ‘distance to resistance’. It can be used to identify the creep towards complete antibiotic resistance in bacteria using genome sequencing. This approach serves as a basis for the development of future sequencing-based methods for predicting resistance profiles of bacterial strains in hospital microbiology and public health settings. PMID:28205635
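
    Under the premise stated above, the 'distance to resistance' of a strain can be sketched as a set difference: for each combination of variants and genes assumed to confer full resistance, count what the strain still lacks, and take the minimum. The identifiers and resistance sets are hypothetical; in the study they would derive from the GWAS hotspot loci.

```python
def distance_to_resistance(strain_variants, resistance_sets):
    """Fewest additional variants/genes needed to complete any resistance set.

    Returns 0 if the strain already carries a complete set.
    """
    return min(len(required - strain_variants) for required in resistance_sets)


# Hypothetical identifiers; each set stands for one fully resistant combination.
resistance_sets = [{"snp1", "snp2", "snp3"}, {"geneA", "snp4"}]
print(distance_to_resistance({"snp1", "snp2"}, resistance_sets))  # 1
```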

  16. Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates.

    PubMed

    Nakatani, Yoichiro; Takeda, Hiroyuki; Kohara, Yuji; Morishita, Shinichi

    2007-09-01

    Although several vertebrate genomes have been sequenced, little is known about the genome evolution of early vertebrates and how large-scale genomic changes such as the two rounds of whole-genome duplications (2R WGD) affected evolutionary complexity and novelty in vertebrates. Reconstructing the ancestral vertebrate genome is highly nontrivial because of the difficulty in identifying traces originating from the 2R WGD. To resolve this problem, we developed a novel method capable of pinning down remains of the 2R WGD in the human and medaka fish genomes using invertebrate tunicate and sea urchin genes to define ohnologs, i.e., paralogs produced by the 2R WGD. We validated the reconstruction using the chicken genome, which was not considered in the reconstruction step, and observed that many ancestral proto-chromosomes were retained in the chicken genome and had one-to-one correspondence to chicken microchromosomes, thereby confirming the reconstructed ancestral genomes. Our reconstruction revealed a contrast between the slow karyotype evolution after the second WGD and the rapid, lineage-specific genome reorganizations that occurred in the ancestral lineages of major taxonomic groups such as teleost fishes, amphibians, reptiles, and marsupials.
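
    The ohnolog idea can be illustrated with a simplified grouping step, assuming each vertebrate gene has already been assigned its best invertebrate (e.g. tunicate or sea urchin) ortholog: vertebrate paralogs that share one outgroup gene are candidate ohnologs. Gene identifiers are hypothetical, and this captures only the grouping idea, not the full reconstruction.

```python
from collections import defaultdict


def candidate_ohnolog_groups(best_outgroup_hit, max_copies=4):
    """Group vertebrate paralogs by their best invertebrate outgroup ortholog.

    best_outgroup_hit: dict vertebrate gene -> outgroup gene (hypothetical input).
    Groups with 2..max_copies members are kept, since two rounds of WGD leave
    at most four copies when no other duplications occurred.
    """
    groups = defaultdict(set)
    for gene, outgroup_gene in best_outgroup_hit.items():
        groups[outgroup_gene].add(gene)
    return {og: genes for og, genes in groups.items() if 2 <= len(genes) <= max_copies}


hits = {"hs_a1": "tun_1", "hs_a2": "tun_1", "hs_a3": "tun_1", "hs_b1": "tun_2"}
print({og: sorted(g) for og, g in candidate_ohnolog_groups(hits).items()})
# {'tun_1': ['hs_a1', 'hs_a2', 'hs_a3']}
```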

  17. Enhanced guide-RNA design and targeting analysis for precise CRISPR genome editing of single and consortia of industrially relevant and non-model organisms.

    PubMed

    Mendoza, Brian J; Trinh, Cong T

    2018-01-01

    Genetic diversity of non-model organisms offers a repertoire of unique phenotypic features for exploration and cultivation for synthetic biology and metabolic engineering applications. To realize this enormous potential, it is critical to have an efficient genome editing tool for rapid strain engineering of these organisms to perform novel programmed functions. To accommodate the use of CRISPR/Cas systems for genome editing across organisms, we have developed a novel method, named CRISPR Associated Software for Pathway Engineering and Research (CASPER), for identifying on- and off-targets with enhanced predictability coupled with an analysis of non-unique (repeated) targets to assist in editing any organism with various endonucleases. Utilizing CASPER, we demonstrated a modest 2.4% and a significant 30.2% improvement (F-test, P < 0.05) over conventional methods for predicting on- and off-target activities, respectively. Further, we used CASPER to develop novel applications in genome editing: multitargeting analysis (i.e. simultaneous multiple-site modification on a target genome with a sole guide-RNA requirement) and multispecies population analysis (i.e. guide-RNA design for genome editing across a consortium of organisms). Our analysis of a selection of industrially relevant organisms revealed a number of non-unique target sites associated with genes and transposable elements that can be used as potential sites for multitargeting. The analysis also identified shared and unshared targets that enable genome editing of single or multiple genomes in a consortium of interest. We envision CASPER as a useful platform to enhance precise CRISPR genome editing for metabolic engineering and synthetic biology applications. CASPER is available at https://github.com/TrinhLab/CASPER. Contact: ctrinh@utk.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
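
    A minimal sketch of target enumeration and of the non-unique-target analysis that underlies multitargeting, assuming SpCas9 conventions (a 20-nt protospacer followed by an NGG PAM), uppercase input and exact matches only. It does not score on- or off-target activity as CASPER does, and the function names are hypothetical rather than CASPER's API.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_targets(seq, spacer_len=20):
    """List (protospacer, strand, position) for NGG-PAM sites on both strands.

    Minus-strand positions are given in reverse-complement coordinates.
    """
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # zero-width lookahead so overlapping sites are all reported
        for m in re.finditer(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len, s):
            hits.append((m.group(1), strand, m.start()))
    return hits


def non_unique_targets(seq, spacer_len=20):
    """Protospacers occurring more than once: candidates for multitargeting."""
    counts = Counter(p for p, _, _ in find_targets(seq, spacer_len))
    return {p: c for p, c in counts.items() if c > 1}


# Toy sequence with a repeated 5-nt protospacer (real SpCas9 spacers are 20 nt).
print(non_unique_targets("AAAAATGGCAAAAATGG", spacer_len=5))  # {'AAAAA': 2}
```

    A single guide matching a repeated site edits every copy at once, which is the property the multitargeting analysis looks for.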

  18. An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge.

    PubMed

    Brownstein, Catherine A; Beggs, Alan H; Homer, Nils; Merriman, Barry; Yu, Timothy W; Flannery, Katherine C; DeChene, Elizabeth T; Towne, Meghan C; Savage, Sarah K; Price, Emily N; Holm, Ingrid A; Luquette, Lovelace J; Lyon, Elaine; Majzoub, Joseph; Neupert, Peter; McCallie, David; Szolovits, Peter; Willard, Huntington F; Mendelsohn, Nancy J; Temme, Renee; Finkel, Richard S; Yum, Sabrina W; Medne, Livija; Sunyaev, Shamil R; Adzhubey, Ivan; Cassa, Christopher A; de Bakker, Paul I W; Duzkale, Hatice; Dworzyński, Piotr; Fairbrother, William; Francioli, Laurent; Funke, Birgit H; Giovanni, Monica A; Handsaker, Robert E; Lage, Kasper; Lebo, Matthew S; Lek, Monkol; Leshchiner, Ignaty; MacArthur, Daniel G; McLaughlin, Heather M; Murray, Michael F; Pers, Tune H; Polak, Paz P; Raychaudhuri, Soumya; Rehm, Heidi L; Soemedi, Rachel; Stitziel, Nathan O; Vestecka, Sara; Supper, Jochen; Gugenmus, Claudia; Klocke, Bernward; Hahn, Alexander; Schubach, Max; Menzel, Mortiz; Biskup, Saskia; Freisinger, Peter; Deng, Mario; Braun, Martin; Perner, Sven; Smith, Richard J H; Andorf, Janeen L; Huang, Jian; Ryckman, Kelli; Sheffield, Val C; Stone, Edwin M; Bair, Thomas; Black-Ziegelbein, E Ann; Braun, Terry A; Darbro, Benjamin; DeLuca, Adam P; Kolbe, Diana L; Scheetz, Todd E; Shearer, Aiden E; Sompallae, Rama; Wang, Kai; Bassuk, Alexander G; Edens, Erik; Mathews, Katherine; Moore, Steven A; Shchelochkov, Oleg A; Trapane, Pamela; Bossler, Aaron; Campbell, Colleen A; Heusel, Jonathan W; Kwitek, Anne; Maga, Tara; Panzer, Karin; Wassink, Thomas; Van Daele, Douglas; Azaiez, Hela; Booth, Kevin; Meyer, Nic; Segal, Michael M; Williams, Marc S; Tromp, Gerard; White, Peter; Corsmeier, Donald; Fitzgerald-Butt, Sara; Herman, Gail; Lamb-Thrush, Devon; McBride, Kim L; Newsom, David; Pierson, Christopher R; Rakowsky, Alexander T; Maver, Aleš; Lovrečić, Luca; Palandačić, Anja; Peterlin, Borut; Torkamani, Ali; Wedell, Anna; Huss, Mikael; Alexeyenko, Andrey; Lindvall, Jessica M; Magnusson, Måns; Nilsson, Daniel; Stranneheim, Henrik; Taylan, Fulya; Gilissen, Christian; Hoischen, Alexander; van Bon, Bregje; Yntema, Helger; Nelen, Marcel; Zhang, Weidong; Sager, Jason; Zhang, Lu; Blair, Kathryn; Kural, Deniz; Cariaso, Michael; Lennon, Greg G; Javed, Asif; Agrawal, Saloni; Ng, Pauline C; Sandhu, Komal S; Krishna, Shuba; Veeramachaneni, Vamsi; Isakov, Ofer; Halperin, Eran; Friedman, Eitan; Shomron, Noam; Glusman, Gustavo; Roach, Jared C; Caballero, Juan; Cox, Hannah C; Mauldin, Denise; Ament, Seth A; Rowen, Lee; Richards, Daniel R; San Lucas, F Anthony; Gonzalez-Garay, Manuel L; Caskey, C Thomas; Bai, Yu; Huang, Ying; Fang, Fang; Zhang, Yan; Wang, Zhengyuan; Barrera, Jorge; Garcia-Lobo, Juan M; González-Lamuño, Domingo; Llorca, Javier; Rodriguez, Maria C; Varela, Ignacio; Reese, Martin G; De La Vega, Francisco M; Kiruluta, Edward; Cargill, Michele; Hart, Reece K; Sorenson, Jon M; Lyon, Gholson J; Stevenson, David A; Bray, Bruce E; Moore, Barry M; Eilbeck, Karen; Yandell, Mark; Zhao, Hongyu; Hou, Lin; Chen, Xiaowei; Yan, Xiting; Chen, Mengjie; Li, Cong; Yang, Can; Gunel, Murat; Li, Peining; Kong, Yong; Alexander, Austin C; Albertyn, Zayed I; Boycott, Kym M; Bulman, Dennis E; Gordon, Paul M K; Innes, A Micheil; Knoppers, Bartha M; Majewski, Jacek; Marshall, Christian R; Parboosingh, Jillian S; Sawyer, Sarah L; Samuels, Mark E; Schwartzentruber, Jeremy; Kohane, Isaac S; Margulies, David M

    2014-03-25

    There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance. A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization. The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.

  19. An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge

    PubMed Central

    2014-01-01

    Background: There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance. Results: A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization. Conclusions: The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups. PMID:24667040

  20. Haplotype-Based Genotyping in Polyploids.

    PubMed

    Clevenger, Josh P; Korani, Walid; Ozias-Akins, Peggy; Jackson, Scott

    2018-01-01

    Accurate identification of polymorphisms from sequence data is crucial to unlocking the potential of high-throughput sequencing for genomics. Single nucleotide polymorphisms (SNPs) are difficult to identify accurately in polyploid crops because the duplicative nature of polyploid genomes leads to low confidence in the true alignment of short reads. Implementing a haplotype-based method that contrasts subgenome-specific sequences leads to higher accuracy of SNP identification in polyploids. To test this method, a large-scale 48K SNP array (Axiom Arachis2) was developed for Arachis hypogaea (peanut), an allotetraploid, in which 1,674 haplotype-based SNPs were included. Results of the array show that 74% of the haplotype-based SNP markers could be validated, which is considerably higher than the rates achieved by previous methods used for peanut. The haplotype method has been implemented in a standalone program, HAPLOSWEEP, which takes BAM files and a VCF file as input and identifies haplotype-based markers. Haplotypes can be discovered within single reads or across paired reads, and the method can leverage long-read technology by targeting haplotypes of any length. Haplotype-based genotyping is applicable to all allopolyploid genomes and provides confidence in marker identification and in silico-based genotyping for polyploid genomics.
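
    Why haplotypes help in an allotetraploid can be seen in a small sketch: linking a candidate SNP to a nearby site that distinguishes the two subgenomes within the same read tells which homeolog carries the variant. The code counts multi-site haplotypes from reads spanning all chosen positions and reports haplotypes found in only one genotype. Reads are modelled as hypothetical position-to-base dictionaries rather than BAM records, and this is not HAPLOSWEEP's algorithm.

```python
from collections import Counter


def read_haplotypes(reads, positions, min_reads=2):
    """Count haplotypes over `positions` from reads that span all of them."""
    counts = Counter(
        "".join(read[p] for p in positions)
        for read in reads
        if all(p in read for p in positions)  # partial reads are skipped
    )
    return {h: n for h, n in counts.items() if n >= min_reads}


def haplotype_markers(reads_a, reads_b, positions, min_reads=2):
    """Haplotypes supported in one genotype but absent from the other."""
    ha = set(read_haplotypes(reads_a, positions, min_reads))
    hb = set(read_haplotypes(reads_b, positions, min_reads))
    return ha - hb, hb - ha


# Position 10 separates the subgenomes (A vs G); position 25 carries a SNP
# present only in genotype B's second subgenome.
reads_a = [{10: "A", 25: "C"}] * 3 + [{10: "G", 25: "C"}] * 3 + [{10: "A"}]
reads_b = [{10: "A", 25: "C"}] * 3 + [{10: "G", 25: "T"}] * 3
print(haplotype_markers(reads_a, reads_b, [10, 25]))  # ({'GC'}, {'GT'})
```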

  1. Genome-Wide Convergence during Evolution of Mangroves from Woody Plants.

    PubMed

    Xu, Shaohua; He, Ziwen; Guo, Zixiao; Zhang, Zhang; Wyckoff, Gerald J; Greenberg, Anthony; Wu, Chung-I; Shi, Suhua

    2017-04-01

    When living organisms independently invade a new environment, the evolution of similar phenotypic traits is often observed. An interesting but contentious issue is whether the underlying molecular biology also converges in the new habitat. Independent invasions of tropical intertidal zones by woody plants, collectively referred to as mangrove trees, represent some dramatic examples. The high salinity, hypoxia, and other stressors in the new habitat might have affected both genomic features and protein structures. Here, we developed a new method for detecting convergence at conservative sites (CCS) and applied it to the genomic sequences of mangroves. In simulations, the CCS method drastically reduces random convergence at rapidly evolving sites as well as falsely inferred convergence caused by misinference of the ancestral character. In mangrove genomes, we estimated that ∼400 genes have experienced convergence above the background level of convergence in the nonmangrove relatives. The convergent genes are enriched in pathways related to stress response and embryo development, which could be important for mangroves' adaptation to the new habitat. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
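
    The core filter behind convergence at conservative sites can be sketched as follows, assuming an alignment in which the nonmangrove relatives define the background: a site counts only if all background taxa share one residue and every focal (mangrove) lineage carries the same, different residue. The published CCS method additionally relies on phylogeny-based ancestral inference and a background expectation of convergence, neither of which is modelled here; taxon names and sequences are invented.

```python
def convergent_conserved_sites(alignment, focal, background):
    """Column indices where background taxa are invariant (a conservative site)
    and all focal taxa share one different residue (a convergent change).

    alignment: dict taxon -> aligned sequence; all sequences have equal length.
    """
    length = len(next(iter(alignment.values())))
    hits = []
    for i in range(length):
        bg = {alignment[t][i] for t in background}
        fg = {alignment[t][i] for t in focal}
        if len(bg) == 1 and len(fg) == 1 and bg != fg and "-" not in bg | fg:
            hits.append(i)
    return hits


alignment = {
    "mangrove1": "MKVLA", "mangrove2": "MRVLA",
    "relative1": "MKILS", "relative2": "MKILS", "relative3": "MKVLS",
}
# Column 2 varies among relatives (a fast-evolving site) and is ignored.
print(convergent_conserved_sites(alignment, ["mangrove1", "mangrove2"],
                                 ["relative1", "relative2", "relative3"]))  # [4]
```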

  2. Ion Torrent sequencing as a tool for mutation discovery in the flax (Linum usitatissimum L.) genome.

    PubMed

    Galindo-González, Leonardo; Pinzón-Latorre, David; Bergen, Erik A; Jensen, Dustin C; Deyholos, Michael K

    2015-01-01

    Detection of induced mutations is valuable for inferring gene function and for developing novel germplasm for crop improvement. Many reverse genetics approaches have been developed to identify mutations in genes of interest within a mutagenized population, including some approaches that rely on next-generation sequencing (e.g. exome capture, whole genome resequencing). As an alternative to these genome- or exome-scale methods, we sought to develop a scalable and efficient method for detection of induced mutations that could be applied to a small number of target genes, using Ion Torrent technology. We developed this method in flax (Linum usitatissimum) to demonstrate its utility in a crop species. We used an amplicon-based approach in which DNA samples from an ethyl methanesulfonate (EMS)-mutagenized population were pooled and used as template in PCR to amplify a region of each gene of interest. Barcodes were incorporated during PCR, and the pooled amplicons were sequenced using an Ion Torrent PGM. A pilot experiment with known SNPs showed that they could be detected at frequencies above 0.3% within the pools. We then selected eight genes for which we wanted to discover novel mutations, and applied our approach to screen 768 individuals from the EMS population, using either the Ion 314 or Ion 316 chips. Out of 29 potential mutations identified after processing the NGS reads, 16 were confirmed using Sanger sequencing. The methodology presented here demonstrates the utility of Ion Torrent technology in detecting mutation variants in specific genome regions for large populations of a species such as flax. The methodology could be scaled up to test >100 genes using the higher-capacity chips now available from Ion Torrent.
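
    Pool size and detection limit are linked by simple arithmetic. Assuming diploid individuals that contribute equal amounts of DNA, and treating 0.3% as the detection limit (an interpretation of the pilot result, not a stated design parameter), one heterozygous carrier in a pool of n plants contributes a mutant-allele frequency of 1/(2n), so at most 166 plants can be pooled if a single carrier must remain detectable. The abstract does not state the pool sizes actually used, so this is an illustration only; the function name is hypothetical.

```python
import math


def max_pool_size(detection_limit, ploidy=2):
    """Largest pool in which one heterozygous carrier still reaches the limit.

    One heterozygote contributes 1 mutant allele out of ploidy * n alleles
    in a pool of n individuals, assuming equal DNA input per individual.
    """
    return math.floor(1 / (ploidy * detection_limit))


print(max_pool_size(0.003))  # 166
print(f"{16 / 29:.0%} of candidate mutations were confirmed by Sanger sequencing")
```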

  3. Development of genomic tools in a widespread tropical tree, Symphonia globulifera L.f.: a new low-coverage draft genome, SNP and SSR markers.

    PubMed

    Olsson, Sanna; Seoane-Zonjic, Pedro; Bautista, Rocío; Claros, M Gonzalo; González-Martínez, Santiago C; Scotti, Ivan; Scotti-Saintagne, Caroline; Hardy, Olivier J; Heuertz, Myriam

    2017-07-01

    Population genetic studies in tropical plants are often challenging because of limited information on taxonomy, phylogenetic relationships and distribution ranges, scarce genomic information and logistical challenges in sampling. We describe a strategy to develop robust and widely applicable genetic markers based on a modest development of genomic resources in the ancient tropical tree species Symphonia globulifera L.f. (Clusiaceae), a keystone species in African and Neotropical rainforests. We provide the first low-coverage (11X) fragmented draft genome, sequenced from an individual from Cameroon, covering 1.027 Gbp or 67.5% of the estimated genome size. Annotation of 565 scaffolds (7.57 Mbp) resulted in the prediction of 1046 putative genes (231 of them containing a complete open reading frame) and 1523 exact simple sequence repeats (SSRs, microsatellites). Aligning a published transcriptome of a French Guiana population against this draft genome produced 923 high-quality single nucleotide polymorphisms. We also preselected genic SSRs in silico that were conserved and polymorphic across a wide geographical range, thus reducing marker development tests on rare DNA samples. Of 23 SSRs tested, 19 amplified and 18 were successfully genotyped in four S. globulifera populations from South America (Brazil and French Guiana) and Africa (Cameroon and São Tomé island, FST = 0.34). Most loci showed only population-specific deviations from Hardy-Weinberg proportions, pointing to local population effects (e.g. null alleles). The described genomic resources are valuable for evolutionary studies in Symphonia and for comparative studies in plants. The methods are especially interesting for widespread tropical or endangered taxa with limited DNA availability. © 2016 John Wiley & Sons Ltd.

  4. Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction.

    PubMed

    Muley, Vijaykumar Yogesh; Ranjan, Akash

    2012-01-01

    Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes, known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood, and co-occurrence of the orthologous protein-coding genes in the same cluster or operon, and are collectively known as genomic context methods. On the other hand, a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in the literature. However, very few studies provide insight into the effect of reference genome selection on the detection of meaningful protein interactions. We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled from 565 bacteria in accordance with phylogenetic diversity and the relationships between organisms. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in the DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that, to obtain good performance, it is better to sample a few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such sampling allows 50-100 genomes to be selected for comparable prediction accuracy when computational resources are limited.
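    Phylogenetic profiling, one of the genomic context methods named above, can be sketched in a few lines: each protein becomes a presence/absence vector over the reference genomes, and similar vectors suggest a functional link. The sketch below assumes binary ortholog calls and uses Pearson correlation as the similarity score; both are illustrative choices, not necessarily those evaluated in the study, and the ortholog table and genome names are hypothetical. It shows why the reference set matters: the same protein pair scores differently depending on which genomes the profile is built over.

```python
from __future__ import annotations

import math


def profile(protein: str, references: list[str], orthologs: dict[str, set[str]]) -> list[int]:
    """Binary phylogenetic profile: 1 where the protein has an ortholog in that reference genome."""
    present = orthologs.get(protein, set())
    return [1 if genome in present else 0 for genome in references]


def pearson(x: list[int], y: list[int]) -> float:
    """Pearson correlation of two profiles; 0.0 if either profile is constant (uninformative)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


# Hypothetical ortholog calls for three query proteins across four reference genomes.
orthologs = {"A": {"g1", "g2"}, "B": {"g1", "g2"}, "C": {"g3", "g4"}}
full = ["g1", "g2", "g3", "g4"]
subset = ["g1", "g2"]  # a reference subset in which A and B are always present

print(pearson(profile("A", full, orthologs), profile("B", full, orthologs)))      # 1.0
print(pearson(profile("A", subset, orthologs), profile("B", subset, orthologs)))  # 0.0
```

    With the full set, A and B co-occur perfectly; with the subset, both profiles are constant and carry no signal, which is the kind of reference-selection effect the study quantifies at scale.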

  5. A dual selection based, targeted gene replacement tool for Magnaporthe grisea and Fusarium oxysporum.

    PubMed

    Khang, Chang Hyun; Park, Sook-Young; Lee, Yong-Hwan; Kang, Seogchan

    2005-06-01

    Rapid progress in fungal genome sequencing presents many new opportunities for functional genomic analysis of fungal biology through the systematic mutagenesis of the genes identified through sequencing. However, the lack of efficient tools for targeted gene replacement is a limiting factor for fungal functional genomics, as it often necessitates the screening of a large number of transformants to identify the desired mutant. We developed an efficient method of gene replacement and evaluated factors affecting the efficiency of this method using two plant pathogenic fungi, Magnaporthe grisea and Fusarium oxysporum. This method is based on Agrobacterium tumefaciens-mediated transformation with a mutant allele of the target gene flanked by the herpes simplex virus thymidine kinase (HSVtk) gene as a conditional negative selection marker against ectopic transformants. The HSVtk gene product converts 5-fluoro-2'-deoxyuridine to a compound toxic to diverse fungi. Because ectopic transformants express HSVtk, while gene replacement mutants lack HSVtk, growing transformants on a medium amended with 5-fluoro-2'-deoxyuridine facilitates the identification of targeted mutants by counter-selecting against ectopic transformants. In addition to M. grisea and F. oxysporum, the method and associated vectors are likely to be applicable to manipulating genes in a broad spectrum of fungi, thus potentially serving as an efficient, universal functional genomic tool for harnessing the growing body of fungal genome sequence data to study fungal biology.

  6. Single-cell paired-end genome sequencing reveals structural variation per cell cycle

    PubMed Central

    Voet, Thierry; Kumar, Parveen; Van Loo, Peter; Cooke, Susanna L.; Marshall, John; Lin, Meng-Lay; Zamani Esteki, Masoud; Van der Aa, Niels; Mateiu, Ligia; McBride, David J.; Bignell, Graham R.; McLaren, Stuart; Teague, Jon; Butler, Adam; Raine, Keiran; Stebbings, Lucy A.; Quail, Michael A.; D’Hooghe, Thomas; Moreau, Yves; Futreal, P. Andrew; Stratton, Michael R.; Vermeesch, Joris R.; Campbell, Peter J.

    2013-01-01

    The nature and pace of genome mutation is largely unknown. Because standard methods sequence DNA from populations of cells, the genetic composition of individual cells is lost, de novo mutations in cells are concealed within the bulk signal and per cell cycle mutation rates and mechanisms remain elusive. Although single-cell genome analyses could resolve these problems, such analyses are error-prone because of whole-genome amplification (WGA) artefacts and are limited in the types of DNA mutation that can be discerned. We developed methods for paired-end sequence analysis of single-cell WGA products that enable (i) detecting multiple classes of DNA mutation, (ii) distinguishing DNA copy number changes from allelic WGA-amplification artefacts by the discovery of matching aberrantly mapping read pairs among the surfeit of paired-end WGA and mapping artefacts and (iii) delineating the break points and architecture of structural variants. By applying the methods, we capture DNA copy number changes acquired over one cell cycle in breast cancer cells and in blastomeres derived from a human zygote after in vitro fertilization. Furthermore, we were able to discover and fine-map a heritable inter-chromosomal rearrangement t(1;16)(p36;p12) by sequencing a single blastomere. The methods will expedite applications in basic genome research and provide a stepping stone to novel approaches for clinical genetic diagnosis. PMID:23630320
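    The key idea for separating true rearrangements from WGA chimeras is that a real structural variant should be supported by several aberrantly mapping read pairs joining the same two loci, whereas artefacts tend to be isolated. The sketch below illustrates that idea under simplifying assumptions: it ignores read orientation and mapping quality, and the `ReadPair` record, bin size, insert-size limit and support threshold are all hypothetical values, not the authors' parameters or pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReadPair:
    """Mapped positions of the two ends of one read pair (hypothetical minimal record)."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def classify_pair(pair: ReadPair, max_insert: int = 1000) -> str:
    """Label a pair as concordant or as one of two simple aberrant classes (orientation ignored)."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    if abs(pair.pos2 - pair.pos1) > max_insert:
        return "long_insert"
    return "concordant"


def supported_junctions(pairs, max_insert=1000, bin_size=10_000, min_support=2):
    """Cluster aberrant pairs whose ends fall in the same genomic bins; keep well-supported clusters."""
    clusters: dict[tuple, int] = {}
    for p in pairs:
        kind = classify_pair(p, max_insert)
        if kind == "concordant":
            continue
        end_a = (p.chrom1, p.pos1 // bin_size)
        end_b = (p.chrom2, p.pos2 // bin_size)
        key = (kind,) + min(end_a, end_b) + max(end_a, end_b)  # order-independent key
        clusters[key] = clusters.get(key, 0) + 1
    return {k: n for k, n in clusters.items() if n >= min_support}


pairs = [
    ReadPair("chr1", 5_000, "chr16", 120_000),
    ReadPair("chr16", 121_500, "chr1", 5_200),  # same junction, ends reported in reverse order
    ReadPair("chr3", 100, "chr7", 900_000),     # isolated pair: treated as a likely artefact
    ReadPair("chr2", 1_000, "chr2", 1_300),     # concordant
]
print(supported_junctions(pairs))  # {('interchromosomal', 'chr1', 0, 'chr16', 12): 2}
```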

  7. Improved regulatory element prediction based on tissue-specific local epigenomic signatures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    He, Yupeng; Gorkin, David U.; Dickel, Diane E.

    Accurate enhancer identification is critical for understanding the spatiotemporal transcriptional regulation during development as well as the functional impact of disease-related noncoding genetic variants. Computational methods have been developed to predict the genomic locations of active enhancers based on histone modifications, but the accuracy and resolution of these methods remain limited. Here, we present an algorithm, regulatory element prediction based on tissue-specific local epigenetic marks (REPTILE), which integrates histone modification and whole-genome cytosine DNA methylation profiles to identify the precise location of enhancers. We tested the ability of REPTILE to identify enhancers previously validated in reporter assays. Compared with existing methods, REPTILE shows consistently superior performance across diverse cell and tissue types, and the enhancer locations are significantly more refined. We show that, by incorporating base-resolution methylation data, REPTILE greatly improves upon current methods for annotation of enhancers across a variety of cell and tissue types.

  8. Improved regulatory element prediction based on tissue-specific local epigenomic signatures

    DOE PAGES

    He, Yupeng; Gorkin, David U.; Dickel, Diane E.; ...

    2017-02-13

    Accurate enhancer identification is critical for understanding the spatiotemporal transcriptional regulation during development as well as the functional impact of disease-related noncoding genetic variants. Computational methods have been developed to predict the genomic locations of active enhancers based on histone modifications, but the accuracy and resolution of these methods remain limited. Here, we present an algorithm, regulatory element prediction based on tissue-specific local epigenetic marks (REPTILE), which integrates histone modification and whole-genome cytosine DNA methylation profiles to identify the precise location of enhancers. We tested the ability of REPTILE to identify enhancers previously validated in reporter assays. Compared with existing methods, REPTILE shows consistently superior performance across diverse cell and tissue types, and the enhancer locations are significantly more refined. We show that, by incorporating base-resolution methylation data, REPTILE greatly improves upon current methods for annotation of enhancers across a variety of cell and tissue types.

  9. Genetic and Epigenetic Changes in Oilseed Rape (Brassica napus L.) Extracted from Intergeneric Allopolyploid and Additions with Orychophragmus.

    PubMed

    Gautam, Mayank; Dang, Yanwei; Ge, Xianhong; Shao, Yujiao; Li, Zaiyun

    2016-01-01

    Allopolyploidization, the merger of genomes from different species, has been shown to be associated with genetic and epigenetic changes. However, whether such alterations in one parental genome are maintained after that genome is extracted from the allopolyploid has yet to be determined. In this study, the genome of Brassica napus L. (2n = 38, genomes AACC) was extracted from its intergeneric allohexaploid (2n = 62, genomes AACCOO) with another crucifer, Orychophragmus violaceus (2n = 24, genome OO), by backcrossing and development of alien addition lines. B. napus-type plants identified in the self-pollinated progenies of nine monosomic additions were analyzed by amplified fragment length polymorphism, sequence-specific amplified polymorphism, and methylation-sensitive amplified polymorphism. They showed modifications to varying extents in genomic components (loss and gain of DNA segments and transposons, introgression of alien DNA segments) and DNA methylation, compared with the B. napus donor. The significant differences in the changes among the B. napus types extracted from these additions likely resulted from the different effects of individual alien chromosomes. In particular, the additions that harbored the O. violaceus chromosome carrying rRNA genes dominant over those of B. napus tended to give rise to plants with fewer changes, suggesting a role of the expression levels of alien rRNA genes in genomic stability. These results provide new clues about genetic alterations in one parental genome that are maintained even after the genome becomes independent.

  10. Genetic and Epigenetic Changes in Oilseed Rape (Brassica napus L.) Extracted from Intergeneric Allopolyploid and Additions with Orychophragmus

    PubMed Central

    Gautam, Mayank; Dang, Yanwei; Ge, Xianhong; Shao, Yujiao; Li, Zaiyun

    2016-01-01

    Allopolyploidization, the merger of genomes from different species, has been shown to be associated with genetic and epigenetic changes. However, whether such alterations in one parental genome are maintained after that genome is extracted from the allopolyploid has yet to be determined. In this study, the genome of Brassica napus L. (2n = 38, genomes AACC) was extracted from its intergeneric allohexaploid (2n = 62, genomes AACCOO) with another crucifer, Orychophragmus violaceus (2n = 24, genome OO), by backcrossing and development of alien addition lines. B. napus-type plants identified in the self-pollinated progenies of nine monosomic additions were analyzed by amplified fragment length polymorphism, sequence-specific amplified polymorphism, and methylation-sensitive amplified polymorphism. They showed modifications to varying extents in genomic components (loss and gain of DNA segments and transposons, introgression of alien DNA segments) and DNA methylation, compared with the B. napus donor. The significant differences in the changes among the B. napus types extracted from these additions likely resulted from the different effects of individual alien chromosomes. In particular, the additions that harbored the O. violaceus chromosome carrying rRNA genes dominant over those of B. napus tended to give rise to plants with fewer changes, suggesting a role of the expression levels of alien rRNA genes in genomic stability. These results provide new clues about genetic alterations in one parental genome that are maintained even after the genome becomes independent. PMID:27148282

  11. Solving gap metabolites and blocked reactions in genome-scale models: application to the metabolic network of Blattabacterium cuenoti.

    PubMed

    Ponce-de-León, Miguel; Montero, Francisco; Peretó, Juli

    2013-10-31

    Metabolic reconstruction is the computational process that aims to elucidate the network of metabolites interconnected through reactions catalyzed by activities assigned to one or more genes. Reconstructed models may contain inconsistencies that appear as gap metabolites and blocked reactions. Although automatic methods for solving this problem have been developed previously, there are many situations where manual curation is still needed. We introduce a general definition of gap metabolite that allows its detection in a straightforward manner. Moreover, we propose a method for the detection of Unconnected Modules, defined as isolated sets of blocked reactions connected through gap metabolites. The method has been successfully applied to the curation of iCG238, the genome-scale metabolic model of the bacterium Blattabacterium cuenoti, an obligate endosymbiont of cockroaches. We found the proposed approach to be a valuable tool for the curation of genome-scale metabolic models. Its application to the genome-scale model B. cuenoti iCG238 yielded a more accurate model version named B. cuenoti iMP240.
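    A purely topological version of gap detection can be sketched as follows, assuming the common working definition that a metabolite is a gap if, within the current reaction set, it can only be produced or only be consumed (reversible reactions count in both directions). This definition may differ in detail from the paper's general one. Reactions touching a gap are blocked; removing them can expose new gaps, so the check is repeated until nothing changes. The `Reaction` record and the toy model are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Reaction:
    substrates: set[str]
    products: set[str]
    reversible: bool = False


def gap_metabolites(reactions: dict[str, Reaction]) -> set[str]:
    """Metabolites that, in the given reaction set, can only be produced or only be consumed."""
    produced: set[str] = set()
    consumed: set[str] = set()
    for r in reactions.values():
        produced |= r.products
        consumed |= r.substrates
        if r.reversible:  # a reversible reaction can run in either direction
            produced |= r.substrates
            consumed |= r.products
    return (produced | consumed) - (produced & consumed)


def blocked_reactions(reactions: dict[str, Reaction]) -> tuple[set[str], set[str]]:
    """Iteratively remove reactions touching gap metabolites until none remain.

    Returns (blocked reaction names, all gap metabolites found along the way).
    """
    active = dict(reactions)
    blocked: set[str] = set()
    gaps_seen: set[str] = set()
    while True:
        gaps = gap_metabolites(active)
        newly = {name for name, r in active.items() if (r.substrates | r.products) & gaps}
        if not newly:
            return blocked, gaps_seen
        gaps_seen |= gaps
        blocked |= newly
        for name in newly:
            del active[name]


model = {
    "EX_A": Reaction(set(), {"A"}, reversible=True),  # exchange of A with the medium
    "R1": Reaction({"A"}, {"B"}),
    "R2": Reaction({"B"}, {"C"}),
    "EX_C": Reaction({"C"}, set()),
    "R3": Reaction({"B"}, {"D"}),
    "R4": Reaction({"E"}, {"B"}),  # E is never produced
    "R5": Reaction({"D"}, {"F"}),  # F is never consumed
}
blocked, gaps = blocked_reactions(model)
print(sorted(blocked))  # ['R3', 'R4', 'R5']
print(sorted(gaps))     # ['D', 'E', 'F']
```

    Note the cascade: D only becomes a gap once R5 is removed, which is why a single pass would miss R3.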

  12. The opportunities and challenges of large-scale molecular approaches to songbird neurobiology

    PubMed Central

    Mello, C.V.; Clayton, D.F.

    2014-01-01

    High-throughput methods for analyzing genome structure and function are having a large impact on songbird neurobiology. Methods include genome sequencing and annotation, comparative genomics, DNA microarrays and transcriptomics, and the development of a brain atlas of gene expression. Key emerging findings include the identification of complex transcriptional programs active during singing, the robust brain expression of non-coding RNAs, evidence of profound variations in gene expression across brain regions, and the identification of molecular specializations within song production and learning circuits. Current challenges include the statistical analysis of large datasets, effective genome curation, the efficient localization of gene expression changes to specific neuronal circuits and cells, and the dissection of behavioral and environmental factors that influence brain gene expression. The field requires efficient methods for comparisons with organisms like chicken, which offer important anatomical, functional and behavioral contrasts. As sequencing costs plummet, opportunities emerge for comparative approaches that may help reveal evolutionary transitions contributing to vocal learning, social behavior and other properties that make songbirds such compelling research subjects. PMID:25280907

  13. Multi-Instance Metric Transfer Learning for Genome-Wide Protein Function Prediction.

    PubMed

    Xu, Yonghui; Min, Huaqing; Wu, Qingyao; Song, Hengjie; Ye, Bicui

    2017-02-06

    Multi-Instance (MI) learning has been shown to be effective for genome-wide protein function prediction problems, where each training example is associated with multiple instances. Many studies in this area have attempted to find an appropriate Multi-Instance Learning (MIL) method for genome-wide protein function prediction under the usual assumption that the underlying distribution of the testing data (target domain, TD) is the same as that of the training data (source domain, SD). However, this assumption may be violated in practice. To tackle this problem, we propose a Multi-Instance Metric Transfer Learning (MIMTL) approach for genome-wide protein function prediction. In MIMTL, we first transfer the source domain distribution to the target domain distribution by using bag weights. Then, we construct a distance metric learning method with the reweighted bags. Finally, we develop an alternating optimization scheme for MIMTL. Comprehensive experiments on seven real-world organisms verify the effectiveness and efficiency of the proposed MIMTL approach over several state-of-the-art methods.

  14. Capturing Three-Dimensional Genome Organization in Individual Cells by Single-Cell Hi-C.

    PubMed

    Nagano, Takashi; Wingett, Steven W; Fraser, Peter

    2017-01-01

    Hi-C is a powerful method to investigate genome-wide, higher-order chromatin and chromosome conformations averaged from a population of cells. To expand the potential of Hi-C for single-cell analysis, we developed single-cell Hi-C. Similar to the existing "ensemble" Hi-C method, single-cell Hi-C detects proximity-dependent ligation events between cross-linked and restriction-digested chromatin fragments in cells. A major difference between the single-cell Hi-C and ensemble Hi-C protocols is that the proximity-dependent ligation is carried out in the nucleus. This allows the isolation of individual cells in which nearly the entire Hi-C procedure has been carried out, enabling the production of a Hi-C library and data from individual cells. With this new method, we studied genome conformations and found evidence for conserved topological domain organization from cell to cell, but highly variable interdomain contacts and chromosome folding genome-wide. In addition, we found that the single-cell Hi-C protocol provided cleaner results with less technical noise, suggesting that it could be used to improve the ensemble Hi-C technique.

  15. Mapping Challenging Mutations by Whole-Genome Sequencing

    PubMed Central

    Smith, Harold E.; Fabritius, Amy S.; Jaramillo-Lambert, Aimee; Golden, Andy

    2016-01-01

    Whole-genome sequencing provides a rapid and powerful method for identifying mutations on a global scale, and has spurred a renewed enthusiasm for classical genetic screens in model organisms. The most commonly characterized category of mutation consists of monogenic, recessive traits, due to their genetic tractability. Therefore, most of the mapping methods for mutation identification by whole-genome sequencing are directed toward alleles that fulfill those criteria (i.e., single-gene, homozygous variants). However, such approaches are not entirely suitable for the characterization of a variety of more challenging mutations, such as dominant and semidominant alleles or multigenic traits. Therefore, we have developed strategies for the identification of those classes of mutations, using polymorphism mapping in Caenorhabditis elegans as our model for validation. We also report an alternative approach for mutation identification from traditional recombinant crosses, and a solution to the technical challenge of sequencing sterile or terminally arrested strains where population size is limiting. The methods described herein extend the applicability of whole-genome sequencing to a broader spectrum of mutations, including classes that are difficult to map by traditional means. PMID:26945029
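    Polymorphism mapping by whole-genome sequencing can be sketched as follows: the mutant is crossed to a polymorphic mapping strain, pooled progeny are sequenced, and the fraction of reads carrying mapping-strain alleles is computed per genomic bin. In the classic recessive case, selected homozygous mutants carry almost no mapping-strain alleles near the causal mutation. Under simple Mendelian assumptions, for a fully penetrant dominant allele, phenotypically mutant F2 animals are one-third homozygous and two-thirds heterozygous, so the expected fraction near the mutation is about one third rather than zero; this is one reason such alleles need different handling. The input format, bin size and `expected` parameter are hypothetical, and this is not the authors' specific strategy.

```python
from collections import defaultdict


def bin_fractions(snps, bin_size=1_000_000):
    """snps: iterable of (chromosome, position, mapping_strain_reads, total_reads)."""
    mapping = defaultdict(int)
    total = defaultdict(int)
    for chrom, pos, m_reads, t_reads in snps:
        key = (chrom, pos // bin_size)
        mapping[key] += m_reads
        total[key] += t_reads
    return {k: mapping[k] / total[k] for k in total if total[k] > 0}


def candidate_bin(snps, expected=0.0, bin_size=1_000_000):
    """Bin whose mapping-strain allele fraction is closest to the fraction expected near the mutation."""
    fractions = bin_fractions(snps, bin_size)
    return min(fractions, key=lambda k: abs(fractions[k] - expected))


# Hypothetical per-SNP read counts from a pooled recessive-mutant sample.
snps = [
    ("I", 500_000, 48, 100),
    ("I", 1_500_000, 52, 100),
    ("III", 2_200_000, 3, 100),
    ("III", 2_700_000, 1, 100),
    ("V", 4_100_000, 50, 100),
]
print(candidate_bin(snps))  # ('III', 2)
```

    Unlinked bins stay near 50% mapping-strain reads, while the bin on chromosome III drops toward zero.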

  16. Transitioning from Forensic Genetics to Forensic Genomics

    PubMed Central

    Kayser, Manfred

    2017-01-01

    Due to its support of law enforcement, forensics is a conservative field; nevertheless, driven by scientific and technological progress, forensic genetics is slowly transitioning into forensic genomics. With this Special Issue of Genes, we acknowledge and appreciate this rather recent development, not only by introducing the field of forensics to the wider community of geneticists, but also by emphasizing different topics of forensic relevance where genomic, transcriptomic, and epigenomic principles, methods, and datasets of humans and beyond are beginning to be used to answer forensic questions. PMID:29271907

  17. CRISPR/Cas9 Based Genome Editing of Penicillium chrysogenum.

    PubMed

    Pohl, C; Kiel, J A K W; Driessen, A J M; Bovenberg, R A L; Nygård, Y

    2016-07-15

    CRISPR/Cas9 based systems have emerged as versatile platforms for precision genome editing in a wide range of organisms. Here we have developed powerful CRISPR/Cas9 tools for marker-based and marker-free genome modifications in Penicillium chrysogenum, a model filamentous fungus and industrially relevant cell factory. The developed CRISPR/Cas9 toolbox is highly flexible and allows editing of new targets with minimal cloning efforts. The Cas9 protein and the sgRNA can be either delivered during transformation, as preassembled CRISPR-Cas9 ribonucleoproteins (RNPs) or expressed from an AMA1 based plasmid within the cell. The direct delivery of the Cas9 protein with in vitro synthesized sgRNA to the cells allows for a transient method for genome engineering that may rapidly be applicable for other filamentous fungi. The expression of Cas9 from an AMA1 based vector was shown to be highly efficient for marker-free gene deletions.
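    Adding a new target to a CRISPR/Cas9 toolbox starts with choosing a protospacer. For the commonly used Streptococcus pyogenes Cas9, this is a 20-nt sequence immediately 5' of an NGG PAM on either strand. The sketch below scans both strands for such sites. It assumes SpCas9 and a 20-nt spacer; it does not score on-target efficiency or off-target risk and says nothing about the specific sgRNAs or vectors used for P. chrysogenum.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20):
    """Return (strand, start, spacer) for every spacer immediately 5' of an NGG PAM.

    `start` is the 0-based leftmost + strand coordinate of the spacer+PAM site on either strand.
    """
    seq = seq.upper()
    site_len = spacer_len + 3
    # Lookahead lets overlapping sites be reported; [ACGT] skips windows containing N.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            start = m.start() if strand == "+" else len(s) - m.start() - site_len
            hits.append((strand, start, m.group(1)))
    return hits


print(find_spcas9_sites("A" * 20 + "TGG"))  # one + strand site starting at position 0
```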

  18. Development and Molecular Characterization of Novel Polymorphic Genomic DNA SSR Markers in Lentinula edodes.

    PubMed

    Moon, Suyun; Lee, Hwa-Yong; Shim, Donghwan; Kim, Myungkil; Ka, Kang-Hyeon; Ryoo, Rhim; Ko, Han-Gyu; Koo, Chang-Duck; Chung, Jong-Wook; Ryu, Hojin

    2017-06-01

    Sixteen genomic DNA simple sequence repeat (SSR) markers of Lentinula edodes were developed from 205 SSR motifs present in the 46.1-Mb L. edodes genome sequence. The number of alleles ranged from 3 to 14, and the major allele frequency ranged from 0.17 to 0.96. The observed and expected heterozygosity values ranged from 0.00 to 0.76 and from 0.07 to 0.90, respectively. The polymorphic information content value ranged from 0.07 to 0.89. A dendrogram based on the 16 SSR markers and clustered by the paired hierarchical clustering method showed that 33 shiitake cultivars could be divided into three major groups and successfully identified. These SSR markers will contribute to the efficient breeding of this species by providing diversity in shiitake varieties. Furthermore, the genomic information covered by the markers can provide a valuable resource for genetic linkage map construction, molecular mapping, and marker-assisted selection in the shiitake mushroom.
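    The per-marker statistics reported above follow standard formulas: major allele frequency is the largest allele frequency; observed heterozygosity is the fraction of heterozygous genotypes; expected heterozygosity is $1 - \sum_i p_i^2$; and polymorphic information content is $1 - \sum_i p_i^2 - \sum_{i<j} 2p_i^2p_j^2$. The sketch below computes them for one locus. It assumes that each cultivar is scored as a pair of allele sizes, and the genotype data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """genotypes: list of (allele1, allele2) size calls, one pair per cultivar."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    n = sum(counts.values())
    return {allele: c / n for allele, c in counts.items()}


def observed_heterozygosity(genotypes):
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(freqs):
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    ps = list(freqs.values())
    return expected_heterozygosity(freqs) - sum(2 * pi**2 * pj**2 for pi, pj in combinations(ps, 2))


genotypes = [(150, 150), (150, 154), (154, 158), (158, 158)]  # hypothetical allele sizes (bp)
freqs = allele_frequencies(genotypes)
print(max(freqs.values()))                 # major allele frequency: 0.375
print(observed_heterozygosity(genotypes))  # 0.5
print(expected_heterozygosity(freqs))      # 0.65625
print(pic(freqs))
```

    PIC is always at most the expected heterozygosity, because it subtracts the probability that both parents share the same heterozygous genotype.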

  19. PATENTS IN GENOMICS AND HUMAN GENETICS

    PubMed Central

    Cook-Deegan, Robert; Heaney, Christopher

    2010-01-01

    Genomics and human genetics are scientifically fundamental and commercially valuable. These fields grew to prominence in an era of growth in government and nonprofit research funding, and of even greater growth in privately funded research and development in biotechnology and pharmaceuticals. Patents on DNA technologies are a central feature of this story, illustrating how patent law adapts (and sometimes fails to adapt) to emerging genomic technologies. In instrumentation and for therapeutic proteins, patents have largely played their traditional role of inducing investment in engineering and product development, including expensive postdiscovery clinical research to prove safety and efficacy. Patents on methods and DNA sequences relevant to clinical genetic testing show less evidence of benefits and more evidence of problems and impediments, largely attributable to university exclusive licensing practices. Whole-genome sequencing will confront uncertainty about infringing granted patents, but jurisprudence is trending away from upholding the broadest and potentially most troublesome patent claims. PMID:20590431

  20. Controlling the signal: Practical privacy protection of genomic data sharing through Beacon services.

    PubMed

    Wan, Zhiyu; Vorobeychik, Yevgeniy; Kantarcioglu, Murat; Malin, Bradley

    2017-07-26

    Genomic data are increasingly collected by a wide array of organizations. As such, there is a growing demand to make summary information about such collections more widely available. However, over the past decade, a series of investigations has shown that attacks rooted in statistical inference methods can be applied to discern the presence of a known individual's DNA sequence in the pool of subjects. Recently, it was shown that the Beacon Project of the Global Alliance for Genomics and Health, a web service for querying the presence (or absence) of a specific allele, was vulnerable. The Integrating Data for Analysis, Anonymization, and Sharing (iDASH) Center modeled a track in their third Privacy Protection Challenge on how to mitigate the Beacon vulnerability. We developed the winning solution for this track. This paper describes our computational method to optimize the tradeoff between the utility and the privacy of the Beacon service. We generalize the genomic data sharing problem beyond that introduced in the iDASH Challenge to be more representative of real-world scenarios and to allow a more comprehensive evaluation. We then conduct a sensitivity analysis of our method with respect to several state-of-the-art methods using a dataset of 400,000 positions in Chromosome 10 for 500 individuals from Phase 3 of the 1000 Genomes Project. All methods are evaluated for utility, privacy and efficiency. Our method achieves better performance than all state-of-the-art methods, irrespective of how key factors (e.g., the allele frequency in the population, the size of the pool and utility weights) change from the original parameters of the problem. We further illustrate that our method can exhibit subpar performance under special cases of allele query sequences. However, we show that our method can be extended to address this issue when the query sequence is fixed and known a priori to the data custodian, so that the custodian can plan and stage responses accordingly. This research shows that computational methods can thwart the attack on Beacon services without substantially altering the utility of the system. The method we initially developed is limited by the design of the scenario and the evaluation protocol of the iDASH Challenge; however, it can be improved by allowing the data custodian to act in a staged manner.
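    The utility-privacy tradeoff can be made concrete with a toy Beacon. Rare alleles leak the most membership information, so one simple policy answers "no" whenever an allele is carried by fewer than a fixed number of individuals, at the cost of some untruthful answers. The sketch below is only this fixed-threshold baseline, not the optimized method described in the abstract. The `Beacon` class, its `carriers` data model and the `min_carriers` parameter are hypothetical.

```python
from __future__ import annotations


class Beacon:
    """Toy allele-presence service over a pool of individuals (hypothetical data model)."""

    def __init__(self, carriers: dict[tuple[str, int, str], set[str]], min_carriers: int = 1):
        # carriers maps (chromosome, position, alt_allele) -> IDs of individuals carrying it
        self.carriers = carriers
        self.min_carriers = min_carriers

    def true_answer(self, chrom: str, pos: int, allele: str) -> bool:
        return len(self.carriers.get((chrom, pos, allele), ())) > 0

    def query(self, chrom: str, pos: int, allele: str) -> bool:
        # Suppress "yes" for alleles carried by fewer than min_carriers individuals.
        return len(self.carriers.get((chrom, pos, allele), ())) >= self.min_carriers

    def utility(self, queries) -> float:
        """Fraction of queries answered truthfully."""
        truthful = sum(self.query(*q) == self.true_answer(*q) for q in queries)
        return truthful / len(queries)


carriers = {("10", 100, "A"): {"p1"}, ("10", 200, "G"): {"p1", "p2", "p3"}}
queries = [("10", 100, "A"), ("10", 200, "G"), ("10", 300, "T")]
strict = Beacon(carriers, min_carriers=2)
print(strict.query("10", 100, "A"))  # False: singleton allele is hidden
print(strict.utility(queries))       # 2 of 3 answers truthful
```

    Raising `min_carriers` hides more rare alleles and lowers utility. An optimized method instead chooses which answers to alter by weighing privacy risk against utility loss.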

  1. Linear and exponential TAIL-PCR: a method for efficient and quick amplification of flanking sequences adjacent to Tn5 transposon insertion sites.

    PubMed

    Jia, Xianbo; Lin, Xinjian; Chen, Jichen

    2017-11-02

    Current genome walking methods are very time-consuming, and many produce non-specific amplification products. To amplify the flanking sequences that are adjacent to Tn5 transposon insertion sites in Serratia marcescens FZSF02, we developed a genome walking method based on TAIL-PCR. This PCR method added a 20-cycle linear amplification step before the exponential amplification step to increase the concentration of the target sequences. Products of the linear amplification and the exponential amplification were diluted 100-fold to decrease the concentration of the templates that cause non-specific amplification. A fast DNA polymerase with a high extension speed was used in this method, together with an amplification program designed to rapidly amplify long specific sequences. With this linear and exponential TAIL-PCR (LETAIL-PCR), we successfully obtained products larger than 2 kb from Tn5 transposon insertion mutant strains within 3 h. This method can be widely used in genome walking studies to amplify unknown sequences that are adjacent to known sequences.
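    The difference between the linear and exponential steps is easy to quantify under idealized assumptions. With a single primer, each cycle copies each original template at most once, so copies grow as $N_0(1 + E\,n)$. With two primers, products also serve as templates, so copies grow as $N_0(1+E)^n$, where $E$ is the per-cycle efficiency. The sketch below only illustrates this arithmetic and the effect of a 100-fold dilution. It does not model the arbitrary-degenerate primers or thermal asymmetric cycling of TAIL-PCR, and the efficiency and starting copy numbers are hypothetical.

```python
def linear_copies(initial: float, cycles: int, efficiency: float = 1.0) -> float:
    """Single-primer (linear) amplification: about one new strand per template per cycle."""
    return initial * (1 + efficiency * cycles)


def exponential_copies(initial: float, cycles: int, efficiency: float = 1.0) -> float:
    """Two-primer (exponential) amplification: products are re-used as templates each cycle."""
    return initial * (1 + efficiency) ** cycles


def dilute(copies: float, fold: float = 100.0) -> float:
    return copies / fold


print(linear_copies(1000, 20))           # 21000.0 after the 20-cycle linear step
print(dilute(linear_copies(1000, 20)))   # 210.0 carried into the next reaction
print(exponential_copies(1, 20))         # 1048576.0 at 100% efficiency
```

    Because the linear step grows only additively, its value lies in enriching specific target strands before any exponential step can amplify non-specific products.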

  2. A Rapid Method for Engineering Recombinant Polioviruses or Other Enteroviruses.

    PubMed

    Bessaud, Maël; Pelletier, Isabelle; Blondel, Bruno; Delpeyroux, Francis

    2016-01-01

    The cloning of large enterovirus RNA sequences is labor-intensive because plasmid vectors containing the corresponding cDNAs are frequently unstable in bacteria. To circumvent this issue, we have developed a PCR-based method that allows the generation of highly modified or chimeric full-length enterovirus genomes. This method relies on fusion PCR, which enables the concatenation of several overlapping cDNA amplicons produced separately. A T7 promoter sequence added upstream of the fusion PCR products allows their transcription into infectious genomic RNAs directly in transfected cells constitutively expressing the phage T7 RNA polymerase. This method permits the rapid recovery of modified viruses that can subsequently be amplified in suitable cell lines.
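    Planning a fusion PCR assembly amounts to checking that consecutive amplicons share terminal overlaps and predicting the joined product. The in-silico sketch below joins ordered amplicons through their longest exact end overlap and prepends a promoter. It assumes exact overlaps and uses the commonly cited minimal T7 promoter sequence `TAATACGACTCACTATAG` as an assumption; the source does not state which promoter variant or overlap lengths were used, and the example sequences are hypothetical and much shorter than real overlaps.

```python
T7_PROMOTER = "TAATACGACTCACTATAG"  # assumed minimal T7 promoter; not taken from the source


def merge_overlapping(amplicons, min_overlap=20):
    """Join amplicons in order, each to the next through their longest exact end overlap."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    merged = amplicons[0]
    for nxt in amplicons[1:]:
        best = 0
        for k in range(min(len(merged), len(nxt)), min_overlap - 1, -1):
            if merged[-k:] == nxt[:k]:
                best = k
                break
        if best == 0:
            raise ValueError("no overlap of at least %d nt" % min_overlap)
        merged += nxt[best:]
    return merged


parts = ["AAAACCCCGGGG", "CCGGGGTTTT", "GTTTTACGT"]  # hypothetical short amplicons
genome = merge_overlapping(parts, min_overlap=5)
print(genome)                # AAAACCCCGGGGTTTTACGT
template = T7_PROMOTER + genome
```

    Raising `min_overlap` above the shortest junction makes the function fail loudly, which mirrors a fusion reaction that cannot bridge two amplicons.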

  3. Comparative analysis of gene regulatory networks: from network reconstruction to evolution.

    PubMed

    Thompson, Dawn; Regev, Aviv; Roy, Sushmita

    2015-01-01

    Regulation of gene expression is central to many biological processes. Although reconstruction of regulatory circuits from genomic data alone is therefore desirable, this remains a major computational challenge. Comparative approaches that examine the conservation and divergence of circuits and their components across strains and species can help reconstruct circuits as well as provide insights into the evolution of gene regulatory processes and their adaptive contribution. In recent years, advances in genomic and computational tools have led to a wealth of methods for such analysis at the sequence, expression, pathway, module, and entire network level. Here, we review computational methods developed to study transcriptional regulatory networks using comparative genomics, from sequence to functional data. We highlight how these methods use evolutionary conservation and divergence to reliably detect regulatory components as well as estimate the extent and rate of divergence. Finally, we discuss the promise and open challenges in linking regulatory divergence to phenotypic divergence and adaptation.
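    One of the simplest network-level comparisons such reviews cover is edge conservation: map one species' regulator-target edges onto the other species through orthologs and measure their overlap. The sketch below assumes one-to-one orthology and uses the Jaccard index of edge sets. This is an illustrative baseline; it is not one of the specific methods reviewed. The gene names and networks are hypothetical.

```python
def map_edges(edges, orthologs):
    """Translate (regulator, target) edges into the other species' gene names.

    Edges whose regulator or target lacks an ortholog are dropped.
    """
    return {(orthologs[r], orthologs[t]) for r, t in edges if r in orthologs and t in orthologs}


def edge_conservation(edges_a, edges_b, orthologs_b_to_a):
    """Jaccard index between species A edges and ortholog-mapped species B edges."""
    mapped = map_edges(edges_b, orthologs_b_to_a)
    union = edges_a | mapped
    return len(edges_a & mapped) / len(union) if union else 0.0


edges_a = {("TF1", "g1"), ("TF1", "g2"), ("TF2", "g3")}
edges_b = {("tf1", "h1"), ("tf2", "h3"), ("tf2", "h4")}
orthologs_b_to_a = {"tf1": "TF1", "tf2": "TF2", "h1": "g1", "h3": "g3"}  # h4 has no ortholog
print(edge_conservation(edges_a, edges_b, orthologs_b_to_a))  # 2 of 3 edges shared
```

    Dropping edges without orthologs is a deliberate simplification. Many comparative methods treat such edges as evidence of divergence rather than ignoring them.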

  4. Using in Vitro Evolution and Whole Genome Analysis To Discover Next Generation Targets for Antimalarial Drug Discovery

    PubMed Central

    2018-01-01

    Although many new anti-infectives have been discovered and developed solely using phenotypic cellular screening and assay optimization, most researchers recognize that structure-guided drug design is more practical and less costly. In addition, a greater chemical space can be interrogated with structure-guided drug design. The practicality of structure-guided drug design has launched a search for the targets of compounds discovered in phenotypic screens. One method that has been used extensively in malaria parasites for target discovery and chemical validation is in vitro evolution and whole genome analysis (IVIEWGA). Here, small molecules from phenotypic screens with demonstrated antiparasitic activity are used in genome-based target discovery methods. In this Review, we discuss the newest, most promising druggable targets discovered or further validated by evolution-based methods, as well as some exceptions. PMID:29451780

  5. Efficient engineering of marker-free synthetic allotetraploids of Saccharomyces.

    PubMed

    Alexander, William G; Peris, David; Pfannenstiel, Brandon T; Opulente, Dana A; Kuang, Meihua; Hittinger, Chris Todd

    2016-04-01

    Saccharomyces interspecies hybrids are critical biocatalysts in the fermented beverage industry, including in the production of lager beers, Belgian ales, ciders, and cold-fermented wines. Current methods for making synthetic interspecies hybrids are cumbersome and/or require genome modifications. We have developed a simple, robust, and efficient method for generating allotetraploid strains of prototrophic Saccharomyces without sporulation or nuclear genome manipulation. S. cerevisiae×S. eubayanus, S. cerevisiae×S. kudriavzevii, and S. cerevisiae×S. uvarum designer hybrid strains were created as synthetic lager, Belgian, and cider strains, respectively. The ploidy and hybrid nature of the strains were confirmed using flow cytometry and PCR-RFLP analysis, respectively. This method provides an efficient means for producing novel synthetic hybrids for beverage and biofuel production, as well as for constructing tetraploids to be used for basic research in evolutionary genetics and genome stability. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Improved annotation of the insect vector of citrus greening disease: biocuration by a diverse genomics community

    PubMed Central

    Hosmani, Prashant S.; Villalobos-Ayala, Krystal; Miller, Sherry; Shippy, Teresa; Flores, Mirella; Rosendale, Andrew; Cordola, Chris; Bell, Tracey; Mann, Hannah; DeAvila, Gabe; DeAvila, Daniel; Moore, Zachary; Buller, Kyle; Ciolkevich, Kathryn; Nandyal, Samantha; Mahoney, Robert; Van Voorhis, Joshua; Dunlevy, Megan; Farrow, David; Hunter, David; Morgan, Taylar; Shore, Kayla; Guzman, Victoria; Izsak, Allison; Dixon, Danielle E.; Cridge, Andrew; Cano, Liliana; Cao, Xiaolong; Jiang, Haobo; Leng, Nan; Johnson, Shannon; Cantarel, Brandi L.; Richards, Stephen; English, Adam; Shatters, Robert G.; Childers, Chris; Chen, Mei-Ju; Hunter, Wayne; Cilia, Michelle; Mueller, Lukas A.; Munoz-Torres, Monica; Nelson, David; Poelchau, Monica F.; Benoit, Joshua B.; Wiersma-Koch, Helen; D’Elia, Tom; Brown, Susan J.

    2017-01-01

    The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the pathogen associated with citrus Huanglongbing (HLB, citrus greening). HLB threatens citrus production worldwide. Suppression or reduction of the insect vector using chemical insecticides has been the primary method to inhibit the spread of citrus greening disease. Accurate structural and functional annotation of the Asian citrus psyllid genome, as well as a clear understanding of the interactions between the insect and CLas, are required for development of new molecular-based HLB control methods. A draft assembly of the D. citri genome has been generated and annotated with automated pipelines. However, knowledge transfer from well-curated reference genomes such as that of Drosophila melanogaster to newly sequenced ones is challenging due to the complexity and diversity of insect genomes. To identify and improve gene models as potential targets for pest control, we manually curated several gene families with a focus on genes that have key functional roles in D. citri biology and CLas interactions. This community effort produced 530 manually curated gene models across developmental, physiological, RNAi regulatory and immunity-related pathways. As previously shown in the pea aphid, RNAi machinery genes putatively involved in the microRNA pathway have been specifically duplicated. A comprehensive transcriptome enabled us to identify a number of gene families that are either missing or misassembled in the draft genome. In order to develop biocuration as a training experience, we included undergraduate and graduate students from multiple institutions, as well as experienced annotators from the insect genomics research community. The resulting gene set (OGS v1.0) combines both automatically predicted and manually curated gene models. Database URL: https://citrusgreening.org/ PMID:29220441

  7. Clinical evaluation incorporating a personal genome

    PubMed Central

    Ashley, Euan A.; Butte, Atul J.; Wheeler, Matthew T.; Chen, Rong; Klein, Teri E.; Dewey, Frederick E.; Dudley, Joel T.; Ormond, Kelly E.; Pavlovic, Aleksandra; Hudgins, Louanne; Gong, Li; Hodges, Laura M.; Berlin, Dorit S.; Thorn, Caroline F.; Sangkuhl, Katrin; Hebert, Joan M.; Woon, Mark; Sagreiya, Hersh; Whaley, Ryan; Morgan, Alexander A.; Pushkarev, Dmitry; Neff, Norma F; Knowles, Joshua W.; Chou, Mike; Thakuria, Joseph; Rosenbaum, Abraham; Zaranek, Alexander Wait; Church, George; Greely, Henry T.; Quake, Stephen R.; Altman, Russ B.

    2010-01-01

    Background: The cost of genomic information has fallen steeply, but the path to clinical translation of risk estimates for common variants found in genome-wide association studies remains unclear. Because the time and cost of sequencing complete genomes are rapidly declining, more comprehensive means of analyzing these data in concert with rare variants for genetic risk assessment and individualisation of therapy are required. Here, we present the first integrated analysis of a complete human genome in a clinical context.

    Methods: An individual with a family history of vascular disease and early sudden death was evaluated. Clinical assessment included risk prediction for coronary artery disease, screening for causes of sudden cardiac death, and genetic counselling. Genetic analysis included the development of novel methods for the integration of whole genome sequence data including 2.6 million single nucleotide polymorphisms and 752 copy number variations. The algorithm focused on predicting genetic risk of genes associated with known Mendelian disease, recognised drug responses, and pathogenicity for novel variants. In addition, since integration of risk ratios derived from case-control studies is challenging, we estimated posterior probabilities from age- and sex-appropriate prior probabilities and likelihood ratios derived for each genotype. We also developed a visualisation approach to account for gene-environment interactions and conditionally dependent risks.

    Findings: We found increased genetic risk for myocardial infarction, type II diabetes and certain cancers. Rare variants in LPA are consistent with the family history of coronary artery disease. Pharmacogenomic analysis suggested a positive response to lipid lowering therapy, likely clopidogrel resistance, and a low initial dosing requirement for warfarin. Many variants of uncertain significance were reported.

    Interpretation: Although challenges remain, our results suggest that whole genome sequencing can yield useful and clinically relevant information for individual patients, especially for those with a strong family history of significant disease. PMID:20435227
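    The posterior-probability step in the Methods can be illustrated with Bayes' rule in odds form: convert the prior probability to odds, multiply by each genotype's likelihood ratio, and convert back. This is a minimal sketch, not the authors' algorithm; it assumes the variants act independently so that their likelihood ratios simply multiply, and the function name and example numbers are hypothetical.

```python
def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Update a prior disease probability with per-genotype likelihood ratios.

    Assumption: variants contribute independently, so their likelihood
    ratios multiply; correlated loci (e.g. in linkage disequilibrium)
    would violate this.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    odds = prior / (1.0 - prior)  # probability -> odds
    for lr in likelihood_ratios:
        odds *= lr                # posterior odds = prior odds * LR
    return odds / (1.0 + odds)    # odds -> probability
```

    With a hypothetical age- and sex-appropriate prior of 0.1 and two genotypes with likelihood ratios 1.5 and 2.0, the odds go from 1/9 to 1/3, giving a posterior probability of 0.25.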

  8. Two Quantitative Trait Loci Influence Whipworm (Trichuris trichiura) Infection in a Nepalese Population

    PubMed Central

    Williams-Blangero, Sarah; VandeBerg, John L.; Subedi, Janardan; Jha, Bharat; Dyer, T.D.; Blangero, John

    2014-01-01

    Background: Whipworm (Trichuris trichiura) is a soil-transmitted helminth that infects over a billion people. It is a serious public health problem in many developing countries and can result in deficits in growth and cognitive development. Following up on a finding of significant heritability for whipworm infection, we conducted the first genome scan for susceptibility to this important parasitic disease.

    Methods: We assessed whipworm eggs per gram of feces in 1253 members of the Jirel population of eastern Nepal. All sampled individuals belonged to a single pedigree containing over 26,000 relative pairs that are informative for genetic analysis.

    Results: Linkage analysis of genome scan data generated for the pedigree provided unambiguous evidence for two quantitative trait loci influencing susceptibility to whipworm infection, one located on chromosome 9 (LOD = 3.35, genome-wide p = 0.0138) and the other on chromosome 18 (LOD = 3.29, genome-wide p = 0.0159). There was also suggestive evidence for two loci on chromosomes 12 and 13 influencing whipworm infection.

    Conclusion: The results of this first genome scan for susceptibility to whipworm infection may ultimately lead to the identification of novel targets for vaccine and drug development efforts. PMID:18462166
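    A LOD score is a base-10 log likelihood ratio, so it converts directly to a likelihood-ratio chi-square statistic. The sketch below shows the standard conversions. The p-value function is an assumption about the test design, which the abstract does not state: it treats the statistic as a one-sided test of a single variance component, whose null distribution is commonly taken as a 50:50 mixture of a point mass at zero and a chi-square with 1 degree of freedom. It yields pointwise p-values only; the genome-wide p-values reported above (0.0138 and 0.0159) account for scanning the whole genome and are therefore much larger.

```python
import math


def lod_to_chi_square(lod: float) -> float:
    """LRT statistic 2*ln(LR) = 2*ln(10)*LOD, since LOD = log10(LR)."""
    return 2.0 * math.log(10.0) * lod


def chi_square_to_lod(chi2: float) -> float:
    """Inverse of lod_to_chi_square."""
    return chi2 / (2.0 * math.log(10.0))


def pointwise_p_value(lod: float) -> float:
    """One-sided pointwise p-value under a 50:50 mixture of 0 and chi-square(1).

    For chi-square with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    Not a genome-wide p-value.
    """
    if lod < 0:
        raise ValueError("LOD must be non-negative for this one-sided test")
    chi2 = lod_to_chi_square(lod)
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))
```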

  9. Secure searching of biomarkers through hybrid homomorphic encryption scheme.

    PubMed

    Kim, Miran; Song, Yongsoo; Cheon, Jung Hee

    2017-07-26

    As genome sequencing technology develops rapidly, there is a growing need to keep genomic data secure even when they are stored in the cloud and used for research. We are interested in designing a protocol for securely outsourcing the matching problem on encrypted data. We propose an efficient method to securely search for a position that matches the query data and to extract some information at that position. After decryption, only a small number of comparisons with the query information need to be performed in plaintext. We apply this method to find a set of biomarkers in encrypted genomes. The key feature of our method is that it encodes a genomic database as a single element of a polynomial ring. Since our method requires only a single homomorphic multiplication of the hybrid scheme for query computation, it has advantages over previous methods in parameter size, computational complexity, and communication cost. In particular, the extraction procedure not only prevents leakage of database information that has not been queried by the user but also halves the communication cost. We evaluate the performance of our method and verify that computation on large-scale personal data can be securely and practically outsourced to a cloud environment during data analysis. It takes about 3.9 s to search for and extract the reference and alternate sequences at the queried position in a database of size 4M. Our solution for finding a set of biomarkers in DNA sequences shows that cryptographic techniques have progressed to the point where they can support real-world genome data analysis in a cloud environment.
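    The idea of packing a whole database into one polynomial-ring element, so that a single multiplication exposes the queried entry, can be illustrated in plaintext. The sketch below is a hypothetical toy and not the authors' hybrid homomorphic scheme: it performs no encryption, and the function names are illustrative. It assumes the negacyclic ring Z[X]/(X^n + 1), which is common in lattice-based homomorphic encryption; multiplying the database polynomial by X^(-j) rotates coefficient j into the constant term.

```python
def negacyclic_multiply(a: list[int], b: list[int]) -> list[int]:
    """Multiply two polynomials in Z[X]/(X^n + 1), where X^n wraps around to -1."""
    n = len(a)
    if len(b) != n:
        raise ValueError("polynomials must have the same length n")
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] += ai * bj
            else:
                out[k - n] -= ai * bj  # X^(n + r) = -X^r
    return out


def inverse_monomial(position: int, n: int) -> list[int]:
    """Coefficients of X^(-position) in Z[X]/(X^n + 1), i.e. -X^(n - position)."""
    query = [0] * n
    if position == 0:
        query[0] = 1
    else:
        query[n - position] = -1
    return query


def extract(database: list[int], position: int) -> int:
    """A single ring multiplication moves entry `position` into the constant term."""
    n = len(database)
    if not 0 <= position < n:
        raise ValueError("position out of range")
    product = negacyclic_multiply(database, inverse_monomial(position, n))
    return product[0]
```

    For example, `extract([5, 7, 11, 13], 2)` returns 11. In a homomorphic setting the analogous multiplication would be evaluated on encrypted operands; how the hybrid scheme does that is described in the paper, not in this toy.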

  10. Editing the Neuronal Genome: a CRISPR View of Chromatin Regulation in Neuronal Development, Function, and Plasticity.

    PubMed

    Yang, Marty G; West, Anne E

    2016-12-01

    The dynamic orchestration of gene expression is crucial for the proper differentiation, function, and adaptation of cells. In the brain, transcriptional regulation underlies the incredible diversity of neuronal cell types and contributes to the ability of neurons to adapt their function to the environment. Recently, novel methods for genome and epigenome editing have begun to revolutionize our understanding of gene regulatory mechanisms. In particular, the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system has proven to be a particularly accessible and adaptable technique for genome engineering. Here, we review the use of CRISPR/Cas9 in neurobiology and discuss how these studies have advanced understanding of nervous system development and plasticity. We cover four especially salient applications of CRISPR/Cas9: testing the consequences of enhancer mutations, tagging genes and gene products for visualization in live cells, directly activating or repressing enhancers in vivo, and manipulating the epigenome. In each case, we summarize findings from recent studies and discuss evolving adaptations of the method.

  11. Approximate matching of structured motifs in DNA sequences.

    PubMed

    El-Mabrouk, Nadia; Raffinot, Mathieu; Duchesne, Jean-Eudes; Lajoie, Mathieu; Luc, Nicolas

    2005-04-01

    Several methods have been developed for identifying RNA structures of varying complexity in a genome. All of these methods are based on searching for conserved primary and secondary substructures. In this paper, we present a simple formal representation of a helix, which combines sequence and folding constraints, as a constrained regular expression. This representation allows us to develop a well-founded algorithm that searches for all approximate matches of a helix in a genome. The algorithm is based on an alignment graph constructed from several copies of a pushdown automaton, stacked one on top of another. This is a first attempt to exploit pushdown automata in the context of approximate matching. The worst-case time complexity is O(krpn), where k is the error threshold, n the size of the genome, p the size of the secondary expression, and r its number of union symbols. We then extend the algorithm to search for pseudoknots and secondary structures containing an arbitrary number of helices.
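    To make "approximate match of a helix" concrete, the sketch below is a naive brute-force baseline, not the authors' pushdown-automaton algorithm and without its O(krpn) bound. It scans a sequence for hairpins whose two stems of fixed length enclose a loop of bounded length and tolerates at most k mispaired positions. It handles substitutions only (no insertions or deletions), uses no constrained regular expression, and treats G-U wobble pairs as valid, which is an assumption.

```python
# Watson-Crick pairs plus G-U wobble; allowing wobble pairs is an assumption.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_helices(seq: str, stem_len: int, loop_min: int, loop_max: int, k: int):
    """Brute-force search for stem-loop helices with at most k mispaired bases.

    Returns (start, loop_length, mismatches) for every hairpin whose two
    stems of length `stem_len` enclose a loop of loop_min..loop_max bases.
    Only substitutions are tolerated; insertions and deletions are not.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    hits = []
    for start in range(n):
        for loop in range(loop_min, loop_max + 1):
            end = start + 2 * stem_len + loop  # exclusive end of the hairpin
            if end > n:
                break  # longer loops would overrun the sequence as well
            mismatches = 0
            for i in range(stem_len):
                # pair the i-th base of the 5' stem with its partner on the 3' stem
                if (seq[start + i], seq[end - 1 - i]) not in PAIRS:
                    mismatches += 1
                    if mismatches > k:
                        break
            if mismatches <= k:
                hits.append((start, loop, mismatches))
    return hits
```

    This baseline costs roughly O(n x loop range x stem length) per helix and gives no way to express richer sequence constraints, which is the gap a formal representation such as a constrained regular expression is meant to fill.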

  12. Reconstructing spatial organizations of chromosomes through manifold learning

    PubMed Central

    Deng, Wenxuan; Hu, Hailin; Ma, Rui; Zhang, Sai; Yang, Jinglin; Peng, Jian; Kaplan, Tommy; Zeng, Jianyang

    2018-01-01

    Decoding the spatial organizations of chromosomes has crucial implications for studying eukaryotic gene regulation. Recently, chromosome conformation capture-based technologies, such as Hi-C, have been widely used to uncover the interaction frequencies of genomic loci in a high-throughput, genome-wide manner, and they provide new insights into the folding of three-dimensional (3D) genome structure. In this paper, we develop a novel manifold-learning-based framework, called GEM (Genomic organization reconstructor based on conformational Energy and Manifold learning), to reconstruct the three-dimensional organizations of chromosomes by integrating Hi-C data with biophysical feasibility. Unlike previous methods, which explicitly assume specific relationships between Hi-C interaction frequencies and spatial distances, our model directly embeds the neighboring affinities from Hi-C space into 3D Euclidean space. Extensive validations demonstrated that GEM not only greatly outperformed other state-of-the-art modeling methods but also provided physically and physiologically valid 3D representations of the organizations of chromosomes. Furthermore, for the first time, we apply the modeled chromatin structures to recover long-range genomic interactions missing from the original Hi-C data. PMID:29408992
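    GEM itself is not sketched here, because the abstract does not give its conformational-energy and manifold-learning formulation. For contrast, the sketch below shows the conventional pipeline that the abstract attributes to previous methods: assume an explicit frequency-to-distance relationship (here a power law d = f^(-alpha), with a hypothetical exponent alpha) and then embed the resulting distances in 3D with classical multidimensional scaling. The function names are illustrative.

```python
import numpy as np


def contacts_to_distances(contacts: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Assumed power-law conversion d = f^(-alpha); this is the kind of explicit
    frequency-to-distance assumption that GEM is designed to avoid."""
    f = np.asarray(contacts, dtype=float)
    with np.errstate(divide="ignore"):
        d = np.power(f, -alpha)
    np.fill_diagonal(d, 0.0)
    if not np.all(np.isfinite(d)):
        raise ValueError("zero contact counts off the diagonal give infinite distances")
    return d


def classical_mds(distances: np.ndarray, dim: int = 3) -> np.ndarray:
    """Embed points in `dim` dimensions so Euclidean distances match `distances`."""
    n = distances.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (distances ** 2) @ centering  # double centering
    eigvals, eigvecs = np.linalg.eigh(gram)                 # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:dim]                   # keep the largest `dim`
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))       # drop negative noise
    return eigvecs[:, top] * scale


def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix for an (n, d) array of coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)
```

    When the input distances are exactly Euclidean, classical MDS recovers the configuration up to rotation and reflection; with real Hi-C data, both the chosen alpha and missing contacts distort the result, which is the motivation the abstract gives for avoiding such explicit assumptions.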

  13. Reconstructing spatial organizations of chromosomes through manifold learning.

    PubMed

    Zhu, Guangxiang; Deng, Wenxuan; Hu, Hailin; Ma, Rui; Zhang, Sai; Yang, Jinglin; Peng, Jian; Kaplan, Tommy; Zeng, Jianyang

    2018-05-04

    Decoding the spatial organizations of chromosomes has crucial implications for studying eukaryotic gene regulation. Recently, chromosome conformation capture-based technologies, such as Hi-C, have been widely used to uncover the interaction frequencies of genomic loci in a high-throughput, genome-wide manner, and they provide new insights into the folding of three-dimensional (3D) genome structure. In this paper, we develop a novel manifold-learning-based framework, called GEM (Genomic organization reconstructor based on conformational Energy and Manifold learning), to reconstruct the three-dimensional organizations of chromosomes by integrating Hi-C data with biophysical feasibility. Unlike previous methods, which explicitly assume specific relationships between Hi-C interaction frequencies and spatial distances, our model directly embeds the neighboring affinities from Hi-C space into 3D Euclidean space. Extensive validations demonstrated that GEM not only greatly outperformed other state-of-the-art modeling methods but also provided physically and physiologically valid 3D representations of the organizations of chromosomes. Furthermore, for the first time, we apply the modeled chromatin structures to recover long-range genomic interactions missing from the original Hi-C data.

  14. Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas

    PubMed Central

    Hou, Yu; Guo, Huahu; Cao, Chen; Li, Xianlong; Hu, Boqiang; Zhu, Ping; Wu, Xinglong; Wen, Lu; Tang, Fuchou; Huang, Yanyi; Peng, Jirun

    2016-01-01

    Single-cell genome, DNA methylome, and transcriptome sequencing methods have been separately developed. However, to accurately analyze the mechanism by which transcriptome, genome and DNA methylome regulate each other, these omic methods need to be performed in the same single cell. Here we demonstrate a single-cell triple omics sequencing technique, scTrio-seq, that can be used to simultaneously analyze the genomic copy-number variations (CNVs), DNA methylome, and transcriptome of an individual mammalian cell. We show that large-scale CNVs cause proportional changes in RNA expression of genes within the gained or lost genomic regions, whereas these CNVs generally do not affect DNA methylation in these regions. Furthermore, we applied scTrio-seq to 25 single cancer cells derived from a human hepatocellular carcinoma tissue sample. We identified two subpopulations within these cells based on CNVs, DNA methylome, or transcriptome of individual cells. Our work offers a new avenue of dissecting the complex contribution of genomic and epigenomic heterogeneities to the transcriptomic heterogeneity within a population of cells. PMID:26902283

  15. Unexpected effects of different genetic backgrounds on identification of genomic rearrangements via whole-genome next generation sequencing.

    PubMed

    Chen, Zhangguo; Gowan, Katherine; Leach, Sonia M; Viboolsittiseri, Sawanee S; Mishra, Ameet K; Kadoishi, Tanya; Diener, Katrina; Gao, Bifeng; Jones, Kenneth; Wang, Jing H

    2016-10-21

    Whole-genome next-generation sequencing (NGS) is increasingly employed to detect genomic rearrangements in cancer genomes, especially in lymphoid malignancies. We recently established a unique mouse model by specifically deleting a key non-homologous end-joining DNA repair gene, Xrcc4, and a cell cycle checkpoint gene, Trp53, in germinal center B cells. This mouse model spontaneously develops mature B cell lymphomas (termed G1XP lymphomas). Here, we attempt to employ whole-genome NGS to identify novel structural rearrangements, in particular inter-chromosomal translocations (CTXs), in these G1XP lymphomas. We sequenced six lymphoma samples, aligned our NGS data to the mouse reference genome (C57BL/6J (B6) background), and identified CTXs using the CREST algorithm. Surprisingly, we detected widespread CTXs in both lymphomas and wild-type control samples, the majority of which were false positives attributable to different genetic backgrounds. In addition, we validated our NGS pipeline by sequencing multiple control samples from distinct tissues of mice with different genetic backgrounds (B6 vs non-B6). Lastly, our studies showed that widespread false-positive CTXs can be generated simply by aligning sequences from mice of a different genetic background to the reference genome. We conclude that mapping and alignment to a reference genome might not be the preferred method for analyzing whole-genome NGS data obtained from a genetic background different from that of the reference genome. Given the complex genetic backgrounds of different mouse strains and the heterogeneity of cancer genomes in human patients, a preferred method for minimizing such systematic artifacts and uncovering novel CTXs might be de novo assembly of personalized normal control and cancer cell genomes, instead of mapping and aligning NGS data to the mouse or human reference genome. Thus, our studies have critical implications for how data are analyzed in cancer genomics.

  16. Overview Article: Identifying transcriptional cis-regulatory modules in animal genomes

    PubMed Central

    Suryamohan, Kushal; Halfon, Marc S.

    2014-01-01

    Gene expression is regulated through the activity of transcription factors and chromatin modifying proteins acting on specific DNA sequences, referred to as cis-regulatory elements. These include promoters, located at the transcription initiation sites of genes, and a variety of distal cis-regulatory modules (CRMs), the most common of which are transcriptional enhancers. Because regulated gene expression is fundamental to cell differentiation and acquisition of new cell fates, identifying, characterizing, and understanding the mechanisms of action of CRMs is critical for understanding development. CRM discovery has historically been challenging, as CRMs can be located far from the genes they regulate, have few readily identifiable sequence characteristics, and for many years were not amenable to high-throughput discovery methods. However, the recent availability of complete genome sequences and the development of next-generation sequencing methods have led to an explosion of both computational and empirical methods for CRM discovery in model and non-model organisms alike. Experimentally, CRMs can be identified through chromatin immunoprecipitation directed against transcription factors or histone post-translational modifications, identification of nucleosome-depleted “open” chromatin regions, or sequencing-based high-throughput functional screening. Computational methods include comparative genomics, clustering of known or predicted transcription factor binding sites, and supervised machine-learning approaches trained on known CRMs. All of these methods have proven effective for CRM discovery, but each has its own considerations and limitations, and each is subject to a greater or lesser number of false-positive identifications. Experimental confirmation of predictions is essential, although shortcomings in current methods suggest that additional means of validation need to be developed. PMID:25704908
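    One of the computational strategies listed above, clustering of known or predicted transcription factor binding sites, can be sketched as a sliding-window density scan. This is a minimal illustration under stated assumptions: motifs are exact strings rather than the position weight matrices real tools use, there is no statistical scoring, a site counts toward a window when its start position falls inside it, and the function names, window size, and threshold are hypothetical.

```python
import re
from bisect import bisect_left


def find_sites(seq: str, motifs: list[str]) -> list[int]:
    """Sorted start positions of every (possibly overlapping) exact motif match."""
    seq = seq.upper()
    sites = []
    for motif in motifs:
        # zero-width lookahead so overlapping occurrences are all reported
        pattern = f"(?={re.escape(motif.upper())})"
        sites.extend(m.start() for m in re.finditer(pattern, seq))
    return sorted(sites)


def dense_windows(seq: str, motifs: list[str], window: int, min_sites: int) -> list[int]:
    """Start positions of windows containing at least `min_sites` site starts."""
    sites = find_sites(seq, motifs)
    hits = []
    for start in range(len(seq) - window + 1):
        # number of site starts in [start, start + window)
        count = bisect_left(sites, start + window) - bisect_left(sites, start)
        if count >= min_sites:
            hits.append(start)
    return hits
```

    Windows flagged this way are only candidate CRMs; as the abstract stresses, such predictions carry false positives and need experimental confirmation.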

  17. A public resource facilitating clinical use of genomes

    PubMed Central

    Ball, Madeleine P.; Thakuria, Joseph V.; Zaranek, Alexander Wait; Clegg, Tom; Rosenbaum, Abraham M.; Wu, Xiaodi; Angrist, Misha; Bhak, Jong; Bobe, Jason; Callow, Matthew J.; Cano, Carlos; Chou, Michael F.; Chung, Wendy K.; Douglas, Shawn M.; Estep, Preston W.; Gore, Athurva; Hulick, Peter; Labarga, Alberto; Lee, Je-Hyuk; Lunshof, Jeantine E.; Kim, Byung Chul; Kim, Jong-Il; Li, Zhe; Murray, Michael F.; Nilsen, Geoffrey B.; Peters, Brock A.; Raman, Anugraha M.; Rienhoff, Hugh Y.; Robasky, Kimberly; Wheeler, Matthew T.; Vandewege, Ward; Vorhaus, Daniel B.; Yang, Joyce L.; Yang, Luhan; Aach, John; Ashley, Euan A.; Drmanac, Radoje; Kim, Seong-Jin; Li, Jin Billy; Peshkin, Leonid; Seidman, Christine E.; Seo, Jeong-Sun; Zhang, Kun; Rehm, Heidi L.; Church, George M.

    2012-01-01

    Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board-approved “open consent” process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain—we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research. PMID:22797899

  18. Genome Engineering for Personalized Arthritis Therapeutics.

    PubMed

    Adkar, Shaunak S; Brunger, Jonathan M; Willard, Vincent P; Wu, Chia-Lung; Gersbach, Charles A; Guilak, Farshid

    2017-10-01

    Arthritis represents a family of complex joint pathologies responsible for the majority of musculoskeletal conditions. Nearly all diseases within this family, including osteoarthritis, rheumatoid arthritis, and juvenile idiopathic arthritis, are chronic conditions with few or no disease-modifying therapeutics available. Advances in genome engineering technology, most recently with CRISPR-Cas9, have revolutionized our ability to interrogate and validate genetic and epigenetic elements associated with chronic diseases such as arthritis. These technologies, together with cell reprogramming methods, including the use of induced pluripotent stem cells, provide a platform for human disease modeling. We summarize new evidence from genome-wide association studies and genomics that substantiates a genetic basis for arthritis pathogenesis. We also review the potential contributions of genome engineering in the development of new arthritis therapeutics. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Application of single-step genomic evaluation for crossbred performance in pig.

    PubMed

    Xiang, T; Nielsen, B; Su, G; Legarra, A; Christensen, O F

    2016-03-01

    Crossbreeding is predominant and intensively used in commercial meat production systems, especially in poultry and swine. Genomic evaluation has been successfully applied for breeding within purebreds, but it also offers opportunities for selecting purebreds for crossbred performance by combining information from purebreds with information from crossbreds. However, this generally requires that all relevant animals are genotyped, which is costly and does not presently seem feasible in practice. Recently, a novel single-step BLUP method for genomic evaluation of both purebred and crossbred performance has been developed that can incorporate marker genotypes into a traditional animal model. This new method had not yet been validated on real data sets. In this study, we applied this single-step method to analyze data for the maternal trait of total number of piglets born in Danish Landrace, Yorkshire, and two-way crossbred pigs in different scenarios. The genetic correlation between purebred and crossbred performances was investigated first, and then the impact of (crossbred) genomic information on prediction reliability for crossbred performance was explored. The results confirm the existence of a moderate genetic correlation and show that the standard errors of the estimates were reduced when genomic information was included. Models with marker information, especially crossbred genomic information, improved model-based reliabilities for crossbred performance of purebred boars, also improved the predictive ability for crossbred animals, and, to some extent, reduced the bias of prediction. We conclude that the new single-step BLUP method is a good tool for the genetic evaluation of crossbred performance in purebred animals.
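    In single-step BLUP, marker genotypes usually enter a traditional animal model by replacing the inverse pedigree relationship matrix with the inverse of a combined matrix H. The sketch below shows that widely used purebred construction; it is not the crossbred extension evaluated in this study, whose details the abstract does not give. Dense matrix inversion is used only for clarity; production software builds A^-1 directly from the pedigree and typically blends G with A22 so that G is invertible. The function name is hypothetical.

```python
import numpy as np


def single_step_h_inverse(A: np.ndarray, G: np.ndarray, genotyped: list[int]) -> np.ndarray:
    """Inverse of the single-step relationship matrix H.

    H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]], where A is the pedigree
    relationship matrix for all animals, A22 its block for the genotyped
    animals, and G the genomic relationship matrix of those animals.
    """
    idx = np.asarray(genotyped)
    a22 = A[np.ix_(idx, idx)]
    h_inv = np.linalg.inv(A)
    # add the genomic correction only in the genotyped-by-genotyped block
    h_inv[np.ix_(idx, idx)] += np.linalg.inv(G) - np.linalg.inv(a22)
    return h_inv
```

    A quick sanity check: if G equals A22 (the markers add no information beyond the pedigree), the correction vanishes and H^-1 reduces to A^-1.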

  20. Mixed Model Methods for Genomic Prediction and Variance Component Estimation of Additive and Dominance Effects Using SNP Markers

    PubMed Central

    Da, Yang; Wang, Chunkao; Wang, Shengwen; Hu, Guo

    2014-01-01

    We established a genomic model of quantitative trait with genomic additive and dominance relationships that parallels the traditional quantitative genetics model, which partitions a genotypic value as breeding value plus dominance deviation and calculates additive and dominance relationships using pedigree information. Based on this genomic model, two sets of computationally complementary but mathematically identical mixed model methods were developed for genomic best linear unbiased prediction (GBLUP) and genomic restricted maximum likelihood estimation (GREML) of additive and dominance effects using SNP markers. These two sets are referred to as the CE and QM sets, where the CE set was designed for large numbers of markers and the QM set was designed for large numbers of individuals. GBLUP and associated accuracy formulations for individuals in training and validation data sets were derived for breeding values, dominance deviations and genotypic values. A simulation study showed that GREML and GBLUP generally were able to capture small additive and dominance effects that each accounted for 0.00005–0.0003 of the phenotypic variance and that GREML was able to differentiate true additive and dominance heritability levels. GBLUP of the total genetic value as the summation of additive and dominance effects had higher prediction accuracy than either additive or dominance GBLUP, causal variants had the highest accuracy of GREML and GBLUP, and predicted accuracies were in agreement with observed accuracies. Genomic additive and dominance relationship matrices using SNP markers were consistent with theoretical expectations. The GREML and GBLUP methods can be an effective tool for assessing the type and magnitude of genetic effects affecting a phenotype and for predicting the total genetic value at the whole genome level. PMID:24498162
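    The genomic additive and dominance relationship matrices mentioned above can be illustrated with commonly used definitions. This is a sketch under stated assumptions, not the paper's CE or QM formulation, whose exact formulas the abstract does not give: the additive matrix follows the VanRaden-style form ZZ'/(2 sum p q), and the dominance matrix uses dominance-deviation codes (-2q^2, 2pq, -2p^2 for 2, 1, 0 copies of the counted allele) scaled by sum (2pq)^2. Genotypes are coded as 0/1/2 allele counts, and at least one marker must be polymorphic.

```python
import numpy as np


def allele_frequencies(M: np.ndarray) -> np.ndarray:
    """Frequency p of the counted allele per marker; M holds 0/1/2 allele counts."""
    return M.mean(axis=0) / 2.0


def additive_grm(M: np.ndarray) -> np.ndarray:
    """VanRaden-style additive genomic relationship matrix ZZ' / (2 * sum p q)."""
    p = allele_frequencies(M)
    Z = M - 2.0 * p  # center each marker on its expected allele count
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def dominance_grm(M: np.ndarray) -> np.ndarray:
    """Dominance relationship matrix WW' / sum (2pq)^2 from dominance-deviation codes."""
    p = allele_frequencies(M)
    q = 1.0 - p
    # codes: 2 copies -> -2q^2, heterozygote -> 2pq, 0 copies -> -2p^2
    W = np.where(M == 2, -2.0 * q**2, np.where(M == 1, 2.0 * p * q, -2.0 * p**2))
    return W @ W.T / np.sum((2.0 * p * q) ** 2)
```

    These matrices would then serve as the covariance structures for the additive and dominance random effects in GBLUP and GREML; fitting those models is outside the scope of this sketch.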

  1. Mixed model methods for genomic prediction and variance component estimation of additive and dominance effects using SNP markers.

    PubMed

    Da, Yang; Wang, Chunkao; Wang, Shengwen; Hu, Guo

    2014-01-01

    We established a genomic model of quantitative trait with genomic additive and dominance relationships that parallels the traditional quantitative genetics model, which partitions a genotypic value as breeding value plus dominance deviation and calculates additive and dominance relationships using pedigree information. Based on this genomic model, two sets of computationally complementary but mathematically identical mixed model methods were developed for genomic best linear unbiased prediction (GBLUP) and genomic restricted maximum likelihood estimation (GREML) of additive and dominance effects using SNP markers. These two sets are referred to as the CE and QM sets, where the CE set was designed for large numbers of markers and the QM set was designed for large numbers of individuals. GBLUP and associated accuracy formulations for individuals in training and validation data sets were derived for breeding values, dominance deviations and genotypic values. A simulation study showed that GREML and GBLUP generally were able to capture small additive and dominance effects that each accounted for 0.00005-0.0003 of the phenotypic variance and that GREML was able to differentiate true additive and dominance heritability levels. GBLUP of the total genetic value as the summation of additive and dominance effects had higher prediction accuracy than either additive or dominance GBLUP, causal variants had the highest accuracy of GREML and GBLUP, and predicted accuracies were in agreement with observed accuracies. Genomic additive and dominance relationship matrices using SNP markers were consistent with theoretical expectations. The GREML and GBLUP methods can be an effective tool for assessing the type and magnitude of genetic effects affecting a phenotype and for predicting the total genetic value at the whole genome level.

  2. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource

    PubMed Central

    Seaver, Samuel M. D.; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M. T.; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D.; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D.; Henry, Christopher S.

    2014-01-01

    The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today’s annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed. PMID:24927599

  3. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource.

    PubMed

    Seaver, Samuel M D; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M T; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D; Henry, Christopher S

    2014-07-01

    The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.

  4. A RecET-assisted CRISPR-Cas9 genome editing in Corynebacterium glutamicum.

    PubMed

    Wang, Bo; Hu, Qitiao; Zhang, Yu; Shi, Ruilin; Chai, Xin; Liu, Zhe; Shang, Xiuling; Zhang, Yun; Wen, Tingyi

    2018-04-23

    Extensive genome modification is an efficient way to regulate the metabolic network for producing target metabolites or non-native products using Corynebacterium glutamicum as a cell factory. Genome editing approaches based on homologous recombination and counter-selection markers are laborious and time-consuming because they require multiple rounds of manipulation and have low editing efficiencies. Current two-plasmid CRISPR-Cas9 editing methods generate false positives due to the potential instability of Cas9 on the plasmid, and require high transformation efficiency for co-transformation of the two plasmids. Here, we developed a RecET-assisted CRISPR-Cas9 genome editing method using a chromosome-borne Cas9-RecET and a single plasmid harboring the sgRNA and repair templates. Inducible expression of chromosomal RecET increased the frequency of homologous recombination and the efficiency of gene deletion. Owing to the high transformation efficiency of a single plasmid, this method enabled 10- and 20-kb region deletions, insertion of 2.5-, 5.7- and 7.5-kb expression cassettes, and precise site-specific mutation, demonstrating its versatility. Deletion of the argR and farR regulators together with site-directed mutation of the argB and pgi genes generated a mutant capable of accumulating L-arginine, indicating that chromosome-borne Cas9 is stable enough for iterative genome editing. Using this method, model-predicted target genes were modified to redirect metabolic flux towards the 1,2-propanediol biosynthetic pathway. The final engineered strain produced 6.75 ± 0.46 g/L of 1,2-propanediol, the highest titer reported in C. glutamicum. Furthermore, the method also works in Corynebacterium pekinense 1.563, suggesting broad applicability to other Corynebacterium species. The RecET-assisted CRISPR-Cas9 genome editing method will facilitate engineering of metabolic networks for the synthesis of bio-based products of interest from renewable biomass using Corynebacterium species as cell factories.

  5. Guidelines for whole genome bisulphite sequencing of intact and FFPET DNA on the Illumina HiSeq X Ten.

    PubMed

    Nair, Shalima S; Luu, Phuc-Loi; Qu, Wenjia; Maddugoda, Madhavi; Huschtscha, Lily; Reddel, Roger; Chenevix-Trench, Georgia; Toso, Martina; Kench, James G; Horvath, Lisa G; Hayes, Vanessa M; Stricker, Phillip D; Hughes, Timothy P; White, Deborah L; Rasko, John E J; Wong, Justin J-L; Clark, Susan J

    2018-05-28

    Comprehensive genome-wide DNA methylation profiling is critical to gain insights into epigenetic reprogramming during development and disease processes. Among the different genome-wide DNA methylation technologies, whole genome bisulphite sequencing (WGBS) is considered the gold standard for assaying genome-wide DNA methylation at single-base resolution. However, the high sequencing cost of achieving optimal depth of coverage limits its application in both basic and clinical research. Achieving 15× coverage of the human methylome with WGBS requires approximately three lanes of 100-bp paired-end Illumina HiSeq 2500 sequencing. Advances in sequencing technology that enable cost-effective high-coverage sequencing are therefore important. In this study, we provide an optimised WGBS methodology, from library preparation to sequencing and data processing, that enables 16-20× genome-wide coverage per single lane of HiSeq X Ten (HCS 3.3.76). To process and analyse the data, we developed a fast WGBS pipeline (METH10X) that can also call SNPs. We performed WGBS on both high-quality intact DNA and degraded DNA from formalin-fixed paraffin-embedded tissue. First, we compared different library preparation methods on the HiSeq 2500 platform to identify the best method for sequencing on the HiSeq X Ten. Second, we optimised the PhiX and genome spike-ins to achieve higher quality and coverage of WGBS data on the HiSeq X Ten. Third, we performed integrated whole genome sequencing (WGS) and WGBS of the same DNA sample in a single lane of HiSeq X Ten to improve data output. Finally, we compared methylation data from the HiSeq 2500 and HiSeq X Ten and found high concordance (Pearson r > 0.9). Together, we provide a systematic, efficient and complete approach to perform and analyse WGBS on the HiSeq X Ten. Our protocol allows large-scale WGBS studies at reasonable processing time and cost on the HiSeq X Ten platform.
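    Cross-platform concordance of the kind reported above is commonly measured as the Pearson correlation of per-CpG methylation levels (beta values: methylated reads divided by total reads) at sites covered on both runs. The sketch below shows that calculation under stated assumptions: the dictionary layout of the calls and the minimum depth of 10 reads are hypothetical choices, and this is not the METH10X pipeline.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def methylation_levels(calls, min_depth=10):
    """calls: hypothetical dict CpG id -> (methylated_reads, total_reads).
    Returns beta values for CpGs covered by at least min_depth reads."""
    return {cpg: m / t for cpg, (m, t) in calls.items() if t >= min_depth}


def platform_concordance(calls_a, calls_b, min_depth=10):
    """Correlate beta values at CpGs passing the depth filter on both runs."""
    a = methylation_levels(calls_a, min_depth)
    b = methylation_levels(calls_b, min_depth)
    shared = sorted(a.keys() & b.keys())
    return pearson_r([a[c] for c in shared], [b[c] for c in shared])
```

    Restricting the comparison to CpGs that pass the depth filter in both data sets keeps poorly covered sites, whose beta values are noisy, from dominating the correlation.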

  6. Identification of Genomic Insertion and Flanking Sequence of G2-EPSPS and GAT Transgenes in Soybean Using Whole Genome Sequencing Method.

    PubMed

    Guo, Bingfu; Guo, Yong; Hong, Huilong; Qiu, Li-Juan

    2016-01-01

    Molecular characterization of the sequences flanking an exogenous fragment insertion is essential for safety assessment and labeling of genetically modified organisms (GMOs). In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans, GE-J16 and ZH10-6, using whole genome sequencing (WGS). More than 22.4 Gb of sequence data (∼21× coverage) was generated for each line on the Illumina HiSeq 2500 platform. Junction reads mapping to the boundaries of the T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with the soybean reference genome and the sequence of the transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported integration of the exogenous T-DNA fragments at positions Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in the two transgenic lines. Identification of the genomic insertion sites of the G2-EPSPS and GAT transgenes will facilitate use of their glyphosate-tolerant traits in soybean breeding programs. These results also demonstrate that WGS is a cost-effective and rapid method for identifying T-DNA insertion sites and flanking sequences in soybean.
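    The junction-read idea can be illustrated with exact k-mer matching: a read that carries some k-mers found only in the vector and others found only in the host genome probably spans an insertion boundary. This is a simplified stand-in for the read comparison described in the abstract (which compared reads against the reference genome and vector sequence); exact matching, the value of k, and the function names are assumptions for illustration, and a real analysis would use a read aligner.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_junction_read(read, vector_kmers, genome_kmers, k):
    """Hypothetical junction test: on either strand, the read must contain at
    least one vector-specific k-mer and at least one genome-specific k-mer."""
    for strand in (read, revcomp(read)):
        read_kmers = kmers(strand, k)
        vector_only = (read_kmers & vector_kmers) - genome_kmers
        genome_only = (read_kmers & genome_kmers) - vector_kmers
        if vector_only and genome_only:
            return True
    return False
```

    Subtracting the shared k-mers keeps sequence common to both the vector and the host (for example, endogenous promoter elements) from producing false junction calls.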

  7. Advances in high throughput DNA sequence data compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz

    2016-06-01

    Advances in high-throughput sequencing technologies and reductions in sequencing cost have led to exponential growth in high-throughput DNA sequence data. This growth poses challenges for the storage, retrieval, and transmission of sequencing data, and data compression is used to cope with them. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genomes and sequencing reads. Algorithms are categorized as referential or reference-free. Experimental results and a comparative analysis of various data compression methods are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
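    The two categories in the review can be contrasted with the simplest member of each. Below, a reference-free encoder packs A/C/G/T into 2 bits per base (four bases per byte), and a referential encoder stores only the positions where a target differs from a reference. Both are minimal sketches assuming uppercase ACGT input with no N and, for the referential case, equal-length sequences with substitutions only; the function names are hypothetical, and practical tools add entropy coding and handle insertions and deletions.

```python
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def pack_2bit(seq):
    """Reference-free: pack an ACGT string into bytes, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= _CODE[base] << (2 * j)  # base j occupies bits 2j..2j+1
        out.append(byte)
    return bytes(out), len(seq)


def unpack_2bit(data, length):
    """Inverse of pack_2bit; length is needed because the last byte may be partial."""
    return "".join(_BASE[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length))


def ref_encode(target, reference):
    """Referential: store (position, base) wherever target differs from reference."""
    if len(target) != len(reference):
        raise ValueError("this sketch handles substitutions only")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, target)) if a != b]


def ref_decode(diffs, reference):
    """Rebuild the target by applying the stored substitutions to the reference."""
    seq = list(reference)
    for i, b in diffs:
        seq[i] = b
    return "".join(seq)
```

    The referential encoder's output grows with the number of differences rather than with sequence length, which is why referential methods excel when a close reference genome is available.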

  8. K-mer Content, Correlation, and Position Analysis of Genome DNA Sequences for the Identification of Function and Evolutionary Features

    PubMed Central

    Sievers, Aaron; Bosiek, Katharina; Bisch, Marc; Dreessen, Chris; Riedel, Jascha; Froß, Patrick; Hausmann, Michael; Hildenbrand, Georg

    2017-01-01

    In genome analysis, k-mer-based comparison methods have become standard tools. However, even though they deliver reliable results, other algorithms seem to work better in some cases. To improve k-mer-based DNA sequence analysis and comparison, we tested whether adding positional resolution is beneficial for finding and/or comparing interesting organizational structures. A simple but efficient algorithm for extracting and saving local k-mer spectra (frequency distributions of k-mers) was developed and used. The results were analyzed by including positional information, visualized as genomic maps, and by applying basic vector correlation methods. The analysis concentrated on small word lengths (1 ≤ k ≤ 4) in relatively small viral genomes of Papillomaviridae and Herpesviridae, while also checking its usability for larger sequences, namely human chromosome 2 and the homologous chromosomes (2A, 2B) of the chimpanzee. Using this alignment-free analysis, several regions with specific characteristics in Papillomaviridae and Herpesviridae that had previously been identified by independent, mostly alignment-based methods were confirmed. Correlations between the k-mer content and several genes in these genomes were found, showing similarities between classified and unclassified viruses that may be useful for further taxonomic research. Furthermore, previously unknown k-mer correlations in the genomes of Human Herpesviruses (HHVs), which probably have a major biological function, are found and described. Using the currently known chimpanzee and human chromosomes, identities between the species were reproduced on every analyzed chromosome. This demonstrates the feasibility of our approach for large data sets of complex genomes. Based on these results, we suggest k-mer analysis with positional resolution as a method for closing the gap between the effectiveness of alignment-based methods (like NCBI BLAST) and the speed of standard k-mer analysis. PMID:28422050
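    A local k-mer spectrum is simply the k-mer frequency vector of one window, tagged with the window's start position; comparing windows then reduces to correlating vectors. The sketch below follows that description with fixed-size sliding windows and Pearson correlation. The window and step sizes are free parameters, skipping k-mers containing non-ACGT symbols is an assumption, and this is not the authors' published implementation.

```python
import math
from itertools import product


def kmer_spectrum(seq, k):
    """Relative frequencies over all 4**k k-mers, in lexicographic ACGT order."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other symbols
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def local_spectra(seq, k, window, step):
    """(start, spectrum) for each window; the start keeps positional information."""
    return [(start, kmer_spectrum(seq[start:start + window], k))
            for start in range(0, len(seq) - window + 1, step)]


def correlation(u, v):
    """Pearson correlation between two spectra."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # undefined for a flat spectrum; reported as 0 in this sketch
    return cov / math.sqrt(var_u * var_v)
```

    Plotting each window's correlation against a reference spectrum, ordered by start position, gives a simple genomic map of the kind the abstract describes.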

  9. Supervised Machine Learning for Population Genetics: A New Paradigm

    PubMed Central

    Schrider, Daniel R.; Kern, Andrew D.

    2018-01-01

    As population genomic datasets grow in size, researchers are faced with the daunting task of making sense of a flood of information. To keep pace with this explosion of data, computational methodologies for population genetic inference are rapidly being developed to best utilize genomic sequence data. In this review we discuss a new paradigm that has emerged in computational population genomics: that of supervised machine learning (ML). We review the fundamentals of ML, discuss recent applications of supervised ML to population genetics that outperform competing methods, and describe promising future directions in this area. Ultimately, we argue that supervised ML is an important and underutilized tool that has considerable potential for the world of evolutionary genomics. PMID:29331490
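    The supervised workflow the review describes has two parts: summarize each genomic region as a feature vector, then learn a mapping from features to labels using training examples (typically simulated under known scenarios). The sketch below uses three classic summary statistics (segregating sites S, nucleotide diversity pi, and Watterson's theta) and a nearest-centroid classifier, which is about the simplest supervised learner available; the choice of features, the classifier, and the labels are illustrative assumptions, not methods taken from the review.

```python
from itertools import combinations


def summary_stats(haplotypes):
    """haplotypes: equal-length 0/1 strings, one per sampled chromosome.
    Returns the feature vector [S, pi, theta_W]."""
    n = len(haplotypes)
    sites = list(zip(*haplotypes))
    s = sum(1 for col in sites if len(set(col)) > 1)  # segregating sites
    pairs = list(combinations(haplotypes, 2))
    # Mean number of pairwise differences between haplotypes.
    pi = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)
    a_n = sum(1 / i for i in range(1, n))  # Watterson's harmonic-sum correction
    return [s, pi, s / a_n]


def train_centroids(examples):
    """examples: list of (feature_vector, label); returns the mean vector per label."""
    sums, counts = {}, {}
    for x, label in examples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the label whose centroid is nearest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], x)))
```

    In practice, features would be standardized and a more flexible classifier (for example, random forests or neural networks) would replace the centroid rule, but the train-on-labeled-examples structure stays the same.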

  10. Comparative Genomics and Host Resistance against Infectious Diseases

    PubMed Central

    Qureshi, Salman T.; Skamene, Emil

    1999-01-01

    The large size and complexity of the human genome have limited the identification and functional characterization of components of the innate immune system that play a critical role in front-line defense against invading microorganisms. However, advances in genome analysis (including the development of comprehensive sets of informative genetic markers, improved physical mapping methods, and novel techniques for transcript identification) have reduced the obstacles to discovery of novel host resistance genes. Study of the genomic organization and content of widely divergent vertebrate species has shown a remarkable degree of evolutionary conservation and enables meaningful cross-species comparison and analysis of newly discovered genes. Application of comparative genomics to host resistance will rapidly expand our understanding of human immune defense by facilitating the translation of knowledge acquired through the study of model organisms. We review the rationale and resources for comparative genomic analysis and describe three examples of host resistance genes successfully identified by this approach. PMID:10081670

  11. Chemical biology on the genome.

    PubMed

    Balasubramanian, Shankar

    2014-08-15

    In this article I discuss studies towards understanding the structure and function of DNA in the context of genomes from the perspective of a chemist. The first area I describe concerns the studies that led to the invention and subsequent development of a method for sequencing DNA on a genome scale at high speed and low cost, now known as Solexa/Illumina sequencing. The second theme features the four-stranded DNA structure known as a G-quadruplex, with a focus on its fundamental properties, its presence in cellular genomic DNA and the prospects for targeting such structures in cells with small molecules. The final topic for discussion is naturally occurring chemically modified DNA bases, with an emphasis on chemistry for decoding (or sequencing) such modifications in genomic DNA. The genome remains a fruitful topic for further elucidation through the creation and application of chemical approaches. Copyright © 2014 Elsevier Ltd. All rights reserved.

  12. Genome Sequencing and Assembly by Long Reads in Plants

    PubMed Central

    Li, Changsheng; Lin, Feng; An, Dong; Huang, Ruidong

    2017-01-01

    Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications by high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing, introducing the applications of TGS to plant genomes as well as considerations for experimental design. We also introduce recent revolutionary scaffolding technologies including BioNano, Hi-C, and 10× Genomics. We expect that this guidance on genome sequencing and assembly with long reads will help scientists initiate their own projects. PMID:29283420

  13. Genomics-assisted breeding in four major pulse crops of developing countries: present status and prospects.

    PubMed

    Bohra, Abhishek; Pandey, Manish K; Jha, Uday C; Singh, Balwant; Singh, Indra P; Datta, Dibendu; Chaturvedi, Sushil K; Nadarajan, N; Varshney, Rajeev K

    2014-06-01

    Given recent advances in pulse molecular biology, genomics-driven breeding has emerged as a promising approach to address the issues of limited genetic gain and low productivity in various pulse crops. The global population is continuously increasing and is expected to reach nine billion by 2050. This huge population pressure will lead to severe shortages of food, natural resources and arable land. Such an alarming situation is most likely to arise in developing countries due to an increase in the proportion of people suffering from protein and micronutrient malnutrition. Pulses, being a primary and affordable source of proteins and minerals, play a key role in alleviating protein-calorie malnutrition, micronutrient deficiencies and other undernourishment-related issues. Additionally, pulses are a vital source of livelihood for millions of resource-poor farmers practising agriculture in the semi-arid and sub-tropical regions. The limited success achieved so far through conventional breeding in most pulse crops will not be enough to feed the ever-increasing population. In this context, genomics-assisted breeding (GAB) holds promise for enhancing genetic gains. Though pulses have long been considered orphan crops, recent advances in pulse genomics are noteworthy, e.g. the discovery of genome-wide genetic markers, high-throughput genotyping and sequencing platforms, high-density genetic linkage/QTL maps and, more importantly, the availability of whole-genome sequences. With genome sequences in hand, there is great scope to apply genome-wide methods for trait mapping using association studies and to choose desirable genotypes via genomic selection. It is anticipated that GAB will speed up the genetic improvement of pulses, leading to the rapid development of cultivars with higher yield, enhanced stress tolerance and wider adaptability.

  14. Joint annotation of chromatin state and chromatin conformation reveals relationships among domain types and identifies domains of cell-type-specific expression

    PubMed Central

    Libbrecht, Maxwell W.; Ay, Ferhat; Hoffman, Michael M.; Gilbert, David M.; Bilmes, Jeffrey A.; Noble, William Stafford

    2015-01-01

    The genomic neighborhood of a gene influences its activity, a behavior that is attributable in part to domain-scale regulation. Previous genomic studies have identified many types of regulatory domains. However, due to the difficulty of integrating genomics data sets, the relationships among these domain types are poorly understood. Semi-automated genome annotation (SAGA) algorithms facilitate human interpretation of heterogeneous collections of genomics data by simultaneously partitioning the human genome and assigning labels to the resulting genomic segments. However, existing SAGA methods cannot integrate inherently pairwise chromatin conformation data. We developed a new computational method, called graph-based regularization (GBR), for expressing a pairwise prior that encourages certain pairs of genomic loci to receive the same label in a genome annotation. We used GBR to exploit chromatin conformation information during genome annotation by encouraging positions that are close in 3D to occupy the same type of domain. Using this approach, we produced a model of chromatin domains in eight human cell types, thereby revealing the relationships among known domain types. Through this model, we identified clusters of tightly regulated genes expressed in only a small number of cell types, which we term “specific expression domains.” We found that domain boundaries marked by promoters and CTCF motifs are consistent between cell types even when domain activity changes. Finally, we showed that GBR can be used to transfer information from well-studied cell types to less well-characterized cell types during genome annotation, making it possible to produce high-quality annotations of the hundreds of cell types with limited available data. PMID:25677182

  15. Joint annotation of chromatin state and chromatin conformation reveals relationships among domain types and identifies domains of cell-type-specific expression.

    PubMed

    Libbrecht, Maxwell W; Ay, Ferhat; Hoffman, Michael M; Gilbert, David M; Bilmes, Jeffrey A; Noble, William Stafford

    2015-04-01

    The genomic neighborhood of a gene influences its activity, a behavior that is attributable in part to domain-scale regulation. Previous genomic studies have identified many types of regulatory domains. However, due to the difficulty of integrating genomics data sets, the relationships among these domain types are poorly understood. Semi-automated genome annotation (SAGA) algorithms facilitate human interpretation of heterogeneous collections of genomics data by simultaneously partitioning the human genome and assigning labels to the resulting genomic segments. However, existing SAGA methods cannot integrate inherently pairwise chromatin conformation data. We developed a new computational method, called graph-based regularization (GBR), for expressing a pairwise prior that encourages certain pairs of genomic loci to receive the same label in a genome annotation. We used GBR to exploit chromatin conformation information during genome annotation by encouraging positions that are close in 3D to occupy the same type of domain. Using this approach, we produced a model of chromatin domains in eight human cell types, thereby revealing the relationships among known domain types. Through this model, we identified clusters of tightly regulated genes expressed in only a small number of cell types, which we term "specific expression domains." We found that domain boundaries marked by promoters and CTCF motifs are consistent between cell types even when domain activity changes. Finally, we showed that GBR can be used to transfer information from well-studied cell types to less well-characterized cell types during genome annotation, making it possible to produce high-quality annotations of the hundreds of cell types with limited available data. © 2015 Libbrecht et al.; Published by Cold Spring Harbor Laboratory Press.
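    The pairwise prior in GBR rewards giving the same label to loci that are close in 3D. A simplified way to see its effect is a graph penalty on per-locus label distributions q_i, namely the sum over contacting pairs of w_ij * ||q_i - q_j||^2, together with a smoothing step that pulls each locus toward the contact-weighted average of its partners. This Laplacian-smoothing sketch is a stand-in chosen for illustration, not the authors' GBR inference procedure; the contact weights (for example, Hi-C derived), the step size eta, and the function names are assumptions.

```python
def pairwise_penalty(q, contacts):
    """Sum over contacting pairs (i, j) of w_ij * ||q_i - q_j||^2.

    q: per-locus label distributions (each row sums to 1).
    contacts: dict (i, j) -> contact weight w_ij, each pair listed once.
    """
    return sum(w * sum((a - b) ** 2 for a, b in zip(q[i], q[j]))
               for (i, j), w in contacts.items())


def smooth_step(q, contacts, eta=0.5):
    """Move each locus toward the contact-weighted mean of its partners.

    Each new row is a convex combination of distributions, so it still sums
    to 1; loci without contacts are left unchanged. For 0 < eta <= 1 this step
    does not increase pairwise_penalty.
    """
    n, k = len(q), len(q[0])
    acc = [[0.0] * k for _ in range(n)]
    deg = [0.0] * n
    for (i, j), w in contacts.items():
        for a, b in ((i, j), (j, i)):  # contacts are symmetric
            deg[a] += w
            for c in range(k):
                acc[a][c] += w * q[b][c]
    return [[(1 - eta) * q[i][c] + eta * acc[i][c] / deg[i] for c in range(k)]
            if deg[i] > 0 else list(q[i]) for i in range(n)]
```

    Used this way, strongly contacting loci converge toward a shared label, which is the behavior the abstract describes for positions that are close in 3D occupying the same type of domain.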

  16. Exome-wide DNA capture and next generation sequencing in domestic and wild species.

    PubMed

    Cosart, Ted; Beja-Pereira, Albano; Chen, Shanyuan; Ng, Sarah B; Shendure, Jay; Luikart, Gordon

    2011-07-05

    Gene-targeted and genome-wide markers are crucial to advance evolutionary biology, agriculture, and biodiversity conservation by improving our understanding of genetic processes underlying adaptation and speciation. Unfortunately, for eukaryotic species with large genomes it remains costly to obtain genome sequences and to develop genome resources such as genome-wide SNPs. A method is needed to allow gene-targeted, next-generation sequencing that is flexible enough to include any gene or number of genes, unlike transcriptome sequencing. Such a method would allow sequencing of many individuals, avoiding ascertainment bias in subsequent population genetic analyses. We demonstrate the usefulness of a recent technology, exon capture, for genome-wide, gene-targeted marker discovery in species with no genome resources. We use coding gene sequences from the domestic cow genome sequence (Bos taurus) to capture (enrich for), and subsequently sequence, thousands of exons of B. taurus, B. indicus, and Bison bison (wild bison). Our capture array has probes for 16,131 exons in 2,570 genes, including 203 candidate genes with known function and of interest for their association with disease and other fitness traits. We successfully sequenced and mapped exon sequences from across the 29 autosomes and the X chromosome in the B. taurus genome sequence. Exon capture and high-throughput sequencing identified thousands of putative SNPs spread evenly across all reference chromosomes in all three individuals, including hundreds of SNPs in our targeted candidate genes. This study shows that exon capture can be customized for SNP discovery in many individuals and for non-model species without genomic resources. Our captured exome subset was small enough for affordable next-generation sequencing, and successfully captured exons from a divergent wild species using the domestic cow genome as reference.

  17. Construction of an ultra-high density consensus genetic map, and enhancement of the physical map from genome sequencing in Lupinus angustifolius.

    PubMed

    Zhou, Gaofeng; Jian, Jianbo; Wang, Penghao; Li, Chengdao; Tao, Ye; Li, Xuan; Renshaw, Daniel; Clements, Jonathan; Sweetingham, Mark; Yang, Huaan

    2018-01-01

    An ultra-high density genetic map containing 34,574 sequence-defined markers was developed in Lupinus angustifolius. Markers closely linked to nine genes of agronomic traits were identified. A physical map was improved to cover 560.5 Mb of genome sequence. Lupin (Lupinus angustifolius L.) is a recently domesticated legume grain crop. In this study, we applied the restriction-site associated DNA sequencing (RADseq) method to genotype an F9 recombinant inbred line population derived from a wild type × domesticated cultivar (W × D) cross. A high density linkage map was developed based on the W × D population. By integrating sequence-defined DNA markers reported in previous mapping studies, we established an ultra-high density consensus genetic map, which contains 34,574 markers consisting of 3508 loci covering 2399 cM on 20 linkage groups. The largest gap in the entire consensus map was 4.73 cM. The high density W × D map and the consensus map were used to develop an improved physical map, which covered 560.5 Mb of genome sequence data. The ultra-high density consensus linkage map, the improved physical map and the markers linked to genes of breeding interest reported in this study provide a common tool for genome sequence assembly, structural genomics, comparative genomics, functional genomics, QTL mapping, and molecular plant breeding in lupin.
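    Summary figures like those above (markers versus loci, total map length in cM, and the largest gap) follow directly from a table of marker positions. The sketch below computes them, assuming a hypothetical input of (linkage group, position in cM) pairs and treating markers at identical positions as one locus; summing per-group lengths as the total map length is likewise an assumption about how such totals are reported.

```python
def map_summary(markers):
    """markers: iterable of (linkage_group, position_cM).

    Returns (per_group, total_length_cM, largest_gap_cM, n_loci), where
    per_group maps each linkage group to (length_cM, largest_gap_cM).
    """
    groups = {}
    for lg, pos in markers:
        groups.setdefault(lg, []).append(pos)
    per_group = {}
    for lg, positions in groups.items():
        loci = sorted(set(positions))  # co-located markers collapse to one locus
        gaps = [b - a for a, b in zip(loci, loci[1:])]
        per_group[lg] = (loci[-1] - loci[0], max(gaps, default=0.0))
    total_length = sum(length for length, _ in per_group.values())
    largest_gap = max(gap for _, gap in per_group.values())
    n_loci = sum(len(set(p)) for p in groups.values())
    return per_group, total_length, largest_gap, n_loci
```

    Collapsing co-located markers is what makes a map with tens of thousands of markers resolve to only a few thousand loci: markers with no recombination between them in the mapping population share a position.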

  18. GAP Final Technical Report 12-14-04

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andrew J. Bordner, PhD, Senior Research Scientist

    2004-12-14

    The Genomics Annotation Platform (GAP) was designed to develop new tools for high-throughput functional annotation and characterization of protein sequences and structures resulting from genomics and structural proteomics, and to benchmark and apply those tools. Furthermore, this platform integrated the genomic-scale sequence and structural analysis and prediction tools with the advanced structure prediction and bioinformatics environment of ICM. The development of GAP was primarily oriented towards the annotation of new biomolecular structures using both structural and sequence data. Even though the amount of protein X-ray crystal data is growing exponentially, the volume of sequence data is growing even more rapidly. This trend was exploited by leveraging the wealth of sequence data to provide functional annotation for protein structures.

  19. Development of molecular markers for determining continental origin of wood from White Oaks (Quercus L. sect. Quercus)

    Treesearch

    Hilke Schroeder; Richard Cronn; Yulai Yanbaev; Tara Jennings; Malte Mader; Bernd Degen; Birgit Kersten; Dusan Gomory

    2016-01-01

    To detect and avoid illegal logging of valuable tree species, identification methods for the origin of timber are necessary. We used next-generation sequencing to identify chloroplast genome regions that differentiate the origin of white oaks from three continents: Asia, Europe, and North America. By using the chloroplast genome of Asian Q. mongolica...

  20. Comparison of genomic predictions using three GBLUP methods and two single-step blending methods in the Nordic Holstein population

    PubMed Central

    2012-01-01

    Background A single-step blending approach allows genomic prediction using information of genotyped and non-genotyped animals simultaneously. However, the combined relationship matrix in a single-step method may need to be adjusted because marker-based and pedigree-based relationship matrices may not be on the same scale. The same may apply when a GBLUP model includes both genomic breeding values and residual polygenic effects. The objective of this study was to compare single-step blending methods and GBLUP methods with and without adjustment of the genomic relationship matrix for genomic prediction of 16 traits in the Nordic Holstein population. Methods The data consisted of de-regressed proofs (DRP) for 5 214 genotyped and 9 374 non-genotyped bulls. The bulls were divided into a training and a validation population by birth date, October 1, 2001. Five approaches for genomic prediction were used: 1) a simple GBLUP method, 2) a GBLUP method with a polygenic effect, 3) an adjusted GBLUP method with a polygenic effect, 4) a single-step blending method, and 5) an adjusted single-step blending method. In the adjusted GBLUP and single-step methods, the genomic relationship matrix was adjusted for the difference of scale between the genomic and the pedigree relationship matrices. A set of weights on the pedigree relationship matrix (ranging from 0.05 to 0.40) was used to build the combined relationship matrix in the single-step blending method and the GBLUP method with a polygenetic effect. Results Averaged over the 16 traits, reliabilities of genomic breeding values predicted using the GBLUP method with a polygenic effect (relative weight of 0.20) were 0.3% higher than reliabilities from the simple GBLUP method (without a polygenic effect). The adjusted single-step blending and original single-step blending methods (relative weight of 0.20) had average reliabilities that were 2.1% and 1.8% higher than the simple GBLUP method, respectively. 
In addition, the GBLUP method with a polygenic effect led to less bias of genomic predictions than the simple GBLUP method, and both single-step blending methods yielded less bias of predictions than all GBLUP methods. Conclusions The single-step blending method is an appealing approach for practical genomic prediction in dairy cattle. Genomic prediction from the single-step blending method can be improved by adjusting the scale of the genomic relationship matrix. PMID:22455934
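    The scale adjustment and blending described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's code: it assumes the adjustment has the form G* = alpha + beta * G, with alpha and beta chosen so that the mean diagonal and mean off-diagonal elements of G* equal those of the pedigree matrix A, and it forms the combined matrix as (1 - w) * G* + w * A, where w is the weight on A (0.20 in the comparison above). The function names and the 3x3 matrices are hypothetical; a real single-step analysis applies this to the genotyped-animal block of A inside a larger combined matrix.

```python
from statistics import mean

Matrix = list[list[float]]


def diag_offdiag_means(m: Matrix) -> tuple[float, float]:
    """Mean of the diagonal and mean of the off-diagonal elements of a square matrix."""
    n = len(m)
    diag = [m[i][i] for i in range(n)]
    off = [m[i][j] for i in range(n) for j in range(n) if i != j]
    return mean(diag), mean(off)


def adjust_g_to_a(g: Matrix, a: Matrix) -> Matrix:
    """Return G* = alpha + beta * G whose mean diagonal and mean off-diagonal equal A's.

    Assumed form of the adjustment; not necessarily the exact one used in the study.
    """
    g_diag, g_off = diag_offdiag_means(g)
    a_diag, a_off = diag_offdiag_means(a)
    # Two equations, two unknowns:
    #   beta * g_diag + alpha = a_diag
    #   beta * g_off  + alpha = a_off
    beta = (a_diag - a_off) / (g_diag - g_off)
    alpha = a_diag - beta * g_diag
    return [[alpha + beta * x for x in row] for row in g]


def blend(g: Matrix, a: Matrix, w: float) -> Matrix:
    """Combined matrix (1 - w) * G + w * A, where w is the weight on the pedigree matrix."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [[(1.0 - w) * gij + w * aij for gij, aij in zip(g_row, a_row)]
            for g_row, a_row in zip(g, a)]


# Hypothetical genomic (G) and pedigree (A) relationships for three bulls.
G = [[1.20, 0.10, 0.00], [0.10, 0.90, 0.20], [0.00, 0.20, 1.05]]
A = [[1.00, 0.25, 0.00], [0.25, 1.00, 0.125], [0.00, 0.125, 1.00]]
G_w = blend(adjust_g_to_a(G, A), A, w=0.20)
```

Matching only two summary statistics keeps the adjustment to a single linear rescaling of G, so the relative ranking of genomic relationships is preserved while their average level is put on the pedigree scale.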

  1. Spermatogonial stem cell autotransplantation and germline genomic editing: a future cure for spermatogenic failure and prevention of transmission of genomic diseases

    PubMed Central

    Mulder, Callista L.; Zheng, Yi; Jan, Sabrina Z.; Struijk, Robert B.; Repping, Sjoerd; Hamer, Geert; van Pelt, Ans M.M.

    2016-01-01

    BACKGROUND Subfertility affects approximately 15% of all couples, and a severe male factor is identified in 17% of these couples. While the etiology of a severe male factor remains largely unknown, prior gonadotoxic treatment and genomic aberrations have been associated with this type of subfertility. Couples with a severe male factor can resort to ICSI, with either ejaculated spermatozoa (in case of oligozoospermia) or surgically retrieved testicular spermatozoa (in case of azoospermia) to generate their own biological children. Currently there is no direct treatment for azoospermia or oligozoospermia. Spermatogonial stem cell (SSC) autotransplantation (SSCT) is a promising novel clinical application currently under development to restore fertility in sterile childhood cancer survivors. Meanwhile, recent advances in genomic editing, especially the clustered regularly interspaced short palindromic repeats-associated protein 9 (CRISPR-Cas9) system, are likely to enable genomic rectification of human SSCs in the near future. OBJECTIVE AND RATIONALE The objective of this review is to provide insights into the prospects of the potential clinical application of SSCT with or without genomic editing to cure spermatogenic failure and to prevent transmission of genetic diseases. SEARCH METHODS We performed a narrative review using the literature available on PubMed not restricted to any publishing year on topics of subfertility, fertility treatments, (molecular regulation of) spermatogenesis and SSCT, inherited (genetic) disorders, prenatal screening methods, genomic editing and germline editing. For germline editing, we focussed on the novel CRISPR-Cas9 system. We included papers written in English only. OUTCOMES Current techniques allow propagation of human SSCs in vitro, which is indispensable to successful transplantation. 
This technique is currently being developed in a preclinical setting for childhood cancer survivors who have stored a testis biopsy prior to cancer treatment. Similarly, SSCT could be used to restore fertility in sterile adult cancer survivors. In vitro propagation of SSCs might also be employed to enhance spermatogenesis in oligozoospermic men and in azoospermic men who still have functional SSCs albeit in insufficient numbers. The combination of SSCT with genomic editing techniques could potentially rectify defects in spermatogenesis caused by genomic mutations or, more broadly, prevent transmission of genomic diseases to the offspring. In spite of the promising prospects, SSCT and germline genomic editing are not yet clinically applicable and both techniques require optimization at various levels. WIDER IMPLICATIONS SSCT with or without genomic editing could potentially be used to restore fertility in cancer survivors, to treat couples with a severe male factor, and to prevent the paternal transmission of diseases. This will potentially allow these couples to have their own biological children. Technical development is progressing rapidly, and ethical reflection and societal debate on the use of SSCT with or without genomic editing is pressing. PMID:27240817

  2. Practical Considerations for Implementing Genomic Information Resources

    PubMed Central

    Overby, Casey L.; Connolly, John; Chute, Christopher G.; Denny, Joshua C.; Freimuth, Robert R.; Hartzler, Andrea L.; Holm, Ingrid A.; Manzi, Shannon; Pathak, Jyotishman; Peissig, Peggy L.; Smith, Maureen; Williams, Marc S.; Shirts, Brian H.; Stoffel, Elena M.; Tarczy-Hornoch, Peter; Vitek, Carolyn R. Rohrer; Wolf, Wendy A.; Starren, Justin

    2016-01-01

    Summary Objectives To understand opinions and perceptions on the state of information resources specifically targeted to genomics, and approaches to delivery in clinical practice. Methods We conducted a survey of genomic content use and its clinical delivery from representatives across eight institutions in the electronic Medical Records and Genomics (eMERGE) network and two institutions in the Clinical Sequencing Exploratory Research (CSER) consortium in 2014. Results Eleven responses representing distinct projects across ten sites showed heterogeneity in how content is being delivered, with provider-facing content primarily delivered via the electronic health record (EHR) (n=10), and paper/pamphlets as the leading mode for patient-facing content (n=9). There was general agreement (91%) that new content is needed for patients and providers specific to genomics, and that while aspects of this content could be shared across institutions there remain site-specific needs (73% in agreement). Conclusion This work identifies a need for improved access to and expansion of information resources to support genomic medicine, and opportunities for content developers and EHR vendors to partner with institutions to develop needed resources and streamline their use – such as a central content site in multiple modalities while implementing approaches to allow for site-specific customization. PMID:27652374

  3. Statistical Methods in Integrative Genomics

    PubMed Central

    Richardson, Sylvia; Tseng, George C.; Sun, Wei

    2016-01-01

    Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions. PMID:27482531
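    As one concrete illustration of horizontal integration (the abstract does not name specific methods, so this choice is an assumption), Fisher's classic method combines independent p-values for the same gene from k studies. Under the null hypothesis the statistic X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom, so no statistics library is needed. The function name and the example p-values are hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues: list[float]) -> float:
    """Fisher's method: combine independent p-values for one gene from k studies.

    Under the null, X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom, whose upper tail has a closed form for even df.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2
    # P(chi2_{2k} > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for one gene in three independent studies.
print(fisher_combined_pvalue([0.04, 0.10, 0.03]))
```

Fisher's method assumes the studies are independent and rewards consistent moderate evidence; vertical integration (different data types on the same samples) generally needs joint models rather than simple p-value pooling.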

  4. A genome resource to address mechanisms of developmental programming: determination of the fetal sheep heart transcriptome.

    PubMed

    Cox, Laura A; Glenn, Jeremy P; Spradling, Kimberly D; Nijland, Mark J; Garcia, Roy; Nathanielsz, Peter W; Ford, Stephen P

    2012-06-15

    The pregnant sheep has provided seminal insights into reproduction related to animal and human development (ovarian function, fertility, implantation, fetal growth, parturition and lactation). Fetal sheep physiology has been extensively studied since 1950, contributing significantly to the basis for our understanding of many aspects of fetal development and behaviour that remain in use in clinical practice today. Understanding mechanisms requires the combination of systems approaches uniquely available in fetal sheep with the power of genomic studies. Absence of the full range of sheep genomic resources has limited the full realization of the power of this model, impeding progress in emerging areas of pregnancy biology such as developmental programming. We have examined the expressed fetal sheep heart transcriptome using high-throughput sequencing technologies. In so doing we identified 36,737 novel transcripts and describe genes, gene variants and pathways relevant to fundamental developmental mechanisms. Genes with the highest expression levels and with novel exons in the fetal heart transcriptome are known to play central roles in muscle development. We show that high-throughput sequencing methods can generate extensive transcriptome information in the absence of an assembled and annotated genome for that species. The gene sequence data obtained provide a unique genomic resource for sheep specific genetic technology development and, combined with the polymorphism data, augment annotation and assembly of the sheep genome. In addition, identification and pathway analysis of novel fetal sheep heart transcriptome splice variants is a first step towards revealing mechanisms of genetic variation and gene environment interactions during fetal heart development.

  5. A genome resource to address mechanisms of developmental programming: determination of the fetal sheep heart transcriptome

    PubMed Central

    Cox, Laura A; Glenn, Jeremy P; Spradling, Kimberly D; Nijland, Mark J; Garcia, Roy; Nathanielsz, Peter W; Ford, Stephen P

    2012-01-01

    The pregnant sheep has provided seminal insights into reproduction related to animal and human development (ovarian function, fertility, implantation, fetal growth, parturition and lactation). Fetal sheep physiology has been extensively studied since 1950, contributing significantly to the basis for our understanding of many aspects of fetal development and behaviour that remain in use in clinical practice today. Understanding mechanisms requires the combination of systems approaches uniquely available in fetal sheep with the power of genomic studies. Absence of the full range of sheep genomic resources has limited the full realization of the power of this model, impeding progress in emerging areas of pregnancy biology such as developmental programming. We have examined the expressed fetal sheep heart transcriptome using high-throughput sequencing technologies. In so doing we identified 36,737 novel transcripts and describe genes, gene variants and pathways relevant to fundamental developmental mechanisms. Genes with the highest expression levels and with novel exons in the fetal heart transcriptome are known to play central roles in muscle development. We show that high-throughput sequencing methods can generate extensive transcriptome information in the absence of an assembled and annotated genome for that species. The gene sequence data obtained provide a unique genomic resource for sheep specific genetic technology development and, combined with the polymorphism data, augment annotation and assembly of the sheep genome. In addition, identification and pathway analysis of novel fetal sheep heart transcriptome splice variants is a first step towards revealing mechanisms of genetic variation and gene environment interactions during fetal heart development. PMID:22508961

  6. A bioinformatics approach for identifying transgene insertion sites using whole genome sequencing data.

    PubMed

    Park, Doori; Park, Su-Hyun; Ban, Yong Wook; Kim, Youn Shic; Park, Kyoung-Cheul; Kim, Nam-Soo; Kim, Ju-Kon; Choi, Ik-Young

    2017-08-15

    Genetically modified crops (GM crops) have been developed to improve the agricultural traits of modern crop cultivars. Safety assessments of GM crops are of paramount importance in research at developmental stages and before releasing transgenic plants into the marketplace. Sequencing technology is developing rapidly, with higher output and labor efficiencies, and will eventually replace existing methods for the molecular characterization of genetically modified organisms. To detect the transgene insertion locations in the three GM rice genomes, Illumina sequencing reads are mapped to the rice genome and to the plasmid sequence and classified accordingly. Reads that map to both are then used to characterize the junction site between plant and transgene sequences by sequence alignment. Herein, we present a next generation sequencing (NGS)-based molecular characterization method, using transgenic rice plants SNU-Bt9-5, SNU-Bt9-30, and SNU-Bt9-109. Specifically, using bioinformatics tools, we detected the precise insertion locations and copy numbers of transfer DNA, genetic rearrangements, and the absence of backbone sequences, which were equivalent to results obtained from Southern blot analyses. NGS methods have been suggested as an effective means of characterizing and detecting transgenic insertion locations in genomes. Our results demonstrate the use of a combination of NGS technology and bioinformatics approaches that offers cost- and time-effective methods for assessing the safety of transgenic plants.
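    The read-classification step above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes alignments have already been reduced to (read, reference, position) records, treats any read with hits to both the plasmid and a host chromosome as a junction candidate, and clusters the host-side positions into candidate insertion sites. The `Hit` record, the `PLASMID` label, the 500 bp clustering window, and the example reads are all hypothetical; host positions are alignment starts, so they only approximate the exact junction coordinate.

```python
from collections import defaultdict
from typing import NamedTuple

PLASMID = "plasmid"  # hypothetical name of the vector/T-DNA reference sequence


class Hit(NamedTuple):
    read_id: str
    reference: str  # a host chromosome name or PLASMID
    position: int   # 1-based alignment start on that reference


def host_anchors_of_junction_reads(hits: list[Hit]) -> list[tuple[str, int]]:
    """Host-side (chromosome, position) anchors of reads that align to both the
    plasmid and the host genome, i.e. candidate plant-transgene junction reads."""
    by_read: dict[str, list[Hit]] = defaultdict(list)
    for hit in hits:
        by_read[hit.read_id].append(hit)
    anchors = []
    for read_hits in by_read.values():
        refs = {h.reference for h in read_hits}
        if PLASMID in refs and len(refs) > 1:
            anchors.extend((h.reference, h.position) for h in read_hits if h.reference != PLASMID)
    return anchors


def cluster_sites(anchors: list[tuple[str, int]], window: int = 500) -> list[tuple[str, int, int, int]]:
    """Merge anchors on the same chromosome lying within `window` bp of each other.

    Returns (chromosome, first_pos, last_pos, supporting_anchor_count) per cluster.
    """
    sites: list[tuple[str, int, int, int]] = []
    for chrom, pos in sorted(anchors):
        if sites and sites[-1][0] == chrom and pos - sites[-1][2] <= window:
            _, first, _, count = sites[-1]
            sites[-1] = (chrom, first, pos, count + 1)
        else:
            sites.append((chrom, pos, pos, 1))
    return sites


# Hypothetical alignment records: r1, r2, r4 span a junction; r3 and r5 do not.
hits = [
    Hit("r1", "Chr1", 1000), Hit("r1", PLASMID, 10),
    Hit("r2", "Chr1", 1200), Hit("r2", PLASMID, 15),
    Hit("r3", "Chr1", 5000),
    Hit("r4", "Chr5", 300), Hit("r4", PLASMID, 4000),
    Hit("r5", PLASMID, 50),
]
print(cluster_sites(host_anchors_of_junction_reads(hits)))
# [('Chr1', 1000, 1200, 2), ('Chr5', 300, 300, 1)]
```

Each cluster is one candidate insertion site; counting well-supported clusters gives a first hint at the number of insertion loci, which in practice would still be checked against read depth and the plasmid-side coordinates.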

  7. Targeted activation of diverse CRISPR-Cas systems for mammalian genome editing via proximal CRISPR targeting.

    PubMed

    Chen, Fuqiang; Ding, Xiao; Feng, Yongmei; Seebeck, Timothy; Jiang, Yanfang; Davis, Gregory D

    2017-04-07

    Bacterial CRISPR-Cas systems comprise diverse effector endonucleases with different targeting ranges, specificities and enzymatic properties, but many of them are inactive in mammalian cells and are thus precluded from genome-editing applications. Here we show that the type II-B FnCas9 from Francisella novicida possesses novel properties, but its nuclease function is frequently inhibited at many genomic loci in living human cells. Moreover, we develop a proximal CRISPR (termed proxy-CRISPR) targeting method that restores FnCas9 nuclease activity in a target-specific manner. We further demonstrate that this proxy-CRISPR strategy is applicable to diverse CRISPR-Cas systems, including type II-C Cas9 and type V Cpf1 systems, and can facilitate precise gene editing even between identical genomic sites within the same genome. Our findings provide a novel strategy to enable use of diverse otherwise inactive CRISPR-Cas systems for genome-editing applications and a potential path to modulate the impact of chromatin microenvironments on genome modification.

  8. Targeted activation of diverse CRISPR-Cas systems for mammalian genome editing via proximal CRISPR targeting

    PubMed Central

    Chen, Fuqiang; Ding, Xiao; Feng, Yongmei; Seebeck, Timothy; Jiang, Yanfang; Davis, Gregory D.

    2017-01-01

    Bacterial CRISPR–Cas systems comprise diverse effector endonucleases with different targeting ranges, specificities and enzymatic properties, but many of them are inactive in mammalian cells and are thus precluded from genome-editing applications. Here we show that the type II-B FnCas9 from Francisella novicida possesses novel properties, but its nuclease function is frequently inhibited at many genomic loci in living human cells. Moreover, we develop a proximal CRISPR (termed proxy-CRISPR) targeting method that restores FnCas9 nuclease activity in a target-specific manner. We further demonstrate that this proxy-CRISPR strategy is applicable to diverse CRISPR–Cas systems, including type II-C Cas9 and type V Cpf1 systems, and can facilitate precise gene editing even between identical genomic sites within the same genome. Our findings provide a novel strategy to enable use of diverse otherwise inactive CRISPR–Cas systems for genome-editing applications and a potential path to modulate the impact of chromatin microenvironments on genome modification. PMID:28387220
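    The proxy-CRISPR idea depends on finding protospacers close to, but not overlapping, the intended target so that an auxiliary Cas9 can bind nearby. A minimal sketch of that search step follows. It assumes an SpCas9-style 5'-NGG-3' PAM and a 20 nt spacer purely for illustration; the abstract does not specify the PAMs, spacer lengths, or distances used, so `max_gap` and all function names here are hypothetical parameters, not values from the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def ngg_protospacers(seq: str, spacer_len: int = 20):
    """Yield (strand, start, end, spacer) for every protospacer adjacent to an NGG PAM.

    Coordinates are 0-based, end-exclusive, on the forward strand of `seq`.
    """
    seq = seq.upper()
    n = len(seq)
    for i in range(n - 2):
        # '+' strand: spacer seq[i-spacer_len:i] followed by PAM seq[i:i+3] == "NGG"
        if seq[i + 1:i + 3] == "GG" and i >= spacer_len:
            yield "+", i - spacer_len, i, seq[i - spacer_len:i]
        # '-' strand: the PAM reads "CCN" on the forward strand and the spacer lies 3' of it
        if seq[i:i + 2] == "CC" and i + 3 + spacer_len <= n:
            start = i + 3
            yield "-", start, start + spacer_len, revcomp(seq[start:start + spacer_len])


def proximal_sites(seq, target_start, target_end, max_gap, spacer_len=20):
    """Candidate protospacers that do not overlap the target and lie within max_gap bp of it."""
    hits = []
    for strand, start, end, spacer in ngg_protospacers(seq, spacer_len):
        if end <= target_start:
            gap = target_start - end
        elif start >= target_end:
            gap = start - target_end
        else:
            continue  # overlaps the target protospacer itself
        if gap <= max_gap:
            hits.append((strand, start, end, spacer, gap))
    return hits
```

Only protospacer coordinates are compared against the target; whether PAM bases should count toward the gap, and which gap sizes actually restore nuclease activity, would have to be taken from the study itself.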

  9. The genome sequence of the colonial chordate, Botryllus schlosseri

    PubMed Central

    Voskoboynik, Ayelet; Neff, Norma F; Sahoo, Debashis; Newman, Aaron M; Pushkarev, Dmitry; Koh, Winston; Passarelli, Benedetto; Fan, H Christina; Mantalas, Gary L; Palmeri, Karla J; Ishizuka, Katherine J; Gissi, Carmela; Griggio, Francesca; Ben-Shlomo, Rachel; Corey, Daniel M; Penland, Lolita; White, Richard A; Weissman, Irving L; Quake, Stephen R

    2013-01-01

    Botryllus schlosseri is a colonial urochordate that follows the chordate plan of development following sexual reproduction, but invokes a stem cell-mediated budding program during subsequent rounds of asexual reproduction. As urochordates are considered to be the closest living invertebrate relatives of vertebrates, they are ideal subjects for whole genome sequence analyses. Using a novel method for high-throughput sequencing of eukaryotic genomes, we sequenced and assembled 580 Mbp of the B. schlosseri genome. The genome assembly comprises nearly 14,000 intron-containing predicted genes and 13,500 intron-less predicted genes, 40% of which could be confidently parceled into 13 (of 16 haploid) chromosomes. A comparison of homologous genes between B. schlosseri and other diverse taxonomic groups revealed genomic events underlying the evolution of vertebrates and lymphoid-mediated immunity. The B. schlosseri genome is a community resource for studying alternative modes of reproduction, natural transplantation reactions, and stem cell-mediated regeneration. DOI: http://dx.doi.org/10.7554/eLife.00569.001 PMID:23840927

  10. DCODE.ORG Anthology of Comparative Genomic Tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Loots, G G; Ovcharenko, I

    2005-01-11

    Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the noncoding encryption of gene regulation across genomes. To facilitate the use of comparative genomics in practical applications in genetics and genomics, we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools: zPicture and Mulan; a phylogenetic shadowing tool: eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools: rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, CREME; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ web site.
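    Tools such as the ECR Browser report evolutionarily conserved regions (ECRs) as stretches of a pairwise alignment whose percent identity stays above a threshold over a minimum length. A minimal sliding-window sketch of that idea is shown below; it is not the DCODE.ORG implementation, and the 100-column window and 70% identity defaults are illustrative assumptions. The function name is hypothetical, and gap columns are counted as mismatches.

```python
def conserved_regions(aln_a: str, aln_b: str, window: int = 100, min_identity: float = 0.70):
    """Merged (start, end) alignment-column intervals (0-based, end-exclusive) covered by
    sliding windows whose identity between the two aligned sequences is >= min_identity."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aln_a)
    if window <= 0 or n < window:
        return []
    # 1 where both sequences carry the same base, 0 for mismatches and gap columns
    match = [1 if a == b and a != "-" else 0 for a, b in zip(aln_a.upper(), aln_b.upper())]
    threshold = min_identity * window
    regions: list[tuple[int, int]] = []
    count = sum(match[:window])
    for start in range(n - window + 1):
        if start > 0:  # slide the window one column to the right
            count += match[start + window - 1] - match[start - 1]
        if count >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # overlaps the previous region: extend it
            else:
                regions.append((start, end))
    return regions
```

Because whole windows are kept, a reported region can extend a few columns past the truly conserved block (up to the fraction of the window allowed to mismatch); real tools typically trim or refine the boundaries afterwards.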

  11. Selective Gene Delivery for Integrating Exogenous DNA into Plastid and Mitochondrial Genomes Using Peptide-DNA Complexes.

    PubMed

    Yoshizumi, Takeshi; Oikawa, Kazusato; Chuah, Jo-Ann; Kodama, Yutaka; Numata, Keiji

    2018-05-14

    Selective gene delivery into organellar genomes (mitochondrial and plastid genomes) has been limited because of a lack of appropriate platform technology, even though these organelles are essential for metabolite and energy production. Techniques for selective organellar modification are needed to functionally improve organelles and produce transplastomic/transmitochondrial plants. However, no method for mitochondrial genome modification has yet been established for multicellular organisms including plants. Likewise, modification of plastid genomes has been limited to a few plant species and algae. In the present study, we developed ionic complexes of fusion peptides containing an organellar targeting signal and plasmid DNA for selective delivery of exogenous DNA into the plastid and mitochondrial genomes of intact plants. This is the first report of exogenous DNA being integrated into the mitochondrial genomes of not only plants, but also multicellular organisms in general. This fusion peptide-mediated gene delivery system is a breakthrough platform for both plant organellar biotechnology and gene therapy for mitochondrial diseases in animals.

  12. Evaluation of methods and marker Systems in Genomic Selection of oil palm (Elaeis guineensis Jacq.).

    PubMed

    Kwong, Qi Bin; Teh, Chee Keng; Ong, Ai Ling; Chew, Fook Tim; Mayes, Sean; Kulaveerasingam, Harikrishna; Tammi, Martti; Yeoh, Suat Hui; Appleton, David Ross; Harikrishna, Jennifer Ann

    2017-12-11

    Genomic selection (GS) uses genome-wide markers as an attempt to accelerate genetic gain in breeding programs of both animals and plants. This approach is particularly useful for perennial crops such as oil palm, which have long breeding cycles, and for which the optimal method for GS is still under debate. In this study, we evaluated the effect of different marker systems and modeling methods for implementing GS in an introgressed dura family derived from a Deli dura x Nigerian dura (Deli x Nigerian) with 112 individuals. This family is an important breeding source for developing new mother palms for superior oil yield and bunch characters. The traits of interest selected for this study were fruit-to-bunch (F/B), shell-to-fruit (S/F), kernel-to-fruit (K/F), mesocarp-to-fruit (M/F), oil per palm (O/P) and oil-to-dry mesocarp (O/DM). The marker systems evaluated were simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). RR-BLUP, Bayesian A, B, Cπ, LASSO, Ridge Regression and two machine learning methods (SVM and Random Forest) were used to evaluate GS accuracy of the traits. The kinship coefficient between individuals in this family ranged from 0.35 to 0.62. S/F and O/DM had the highest genomic heritability, whereas F/B and O/P had the lowest. The accuracies using 135 SSRs were low, with accuracies of the traits around 0.20. The average accuracy of machine learning methods was 0.24, as compared to 0.20 achieved by other methods. The trait with the highest mean accuracy was F/B (0.28), while the lowest were both M/F and O/P (0.18). By using whole genomic SNPs, the accuracies for all traits, especially for O/DM (0.43), S/F (0.39) and M/F (0.30) were improved. The average accuracy of machine learning methods was 0.32, compared to 0.31 achieved by other methods. Due to high genomic resolution, the use of whole-genome SNPs improved the efficiency of GS dramatically for oil palm and is recommended for dura breeding programs. 
Machine learning slightly outperformed other methods, but required parameter optimization for GS implementation.
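    RR-BLUP, one of the methods compared above, is equivalent to ridge regression of phenotypes on marker genotypes, and GS accuracy is commonly reported as the correlation between predicted genomic values and observed phenotypes in a validation set. The sketch below shows that workflow in plain Python under stated assumptions: markers coded 0/1/2, a user-chosen shrinkage parameter `lam` (in RR-BLUP this corresponds to the ratio of residual to marker variance), and accuracy taken as a Pearson correlation. It is not the study's pipeline, and all names and the toy data are hypothetical.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(x: list[list[float]], y: list[float], lam: float):
    """Ridge (RR-BLUP-style) marker effects: solve (Xc'Xc + lam*I) beta = Xc'yc on centered data."""
    n, p = len(x), len(x[0])
    x_mean = [sum(row[j] for row in x) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in x]
    yc = [yi - y_mean for yi in y]
    xtx = [[sum(r[i] * r[j] for r in xc) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(xc, yc)) for i in range(p)]
    beta = solve(xtx, xty)
    intercept = y_mean - sum(mj * bj for mj, bj in zip(x_mean, beta))
    return intercept, beta


def predict(model, x: list[list[float]]) -> list[float]:
    intercept, beta = model
    return [intercept + sum(xi * bi for xi, bi in zip(row, beta)) for row in x]


def pearson(u: list[float], v: list[float]) -> float:
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


# Hypothetical toy data: two markers coded 0/1/2, phenotype = 1 + 2*m0 - m1.
x_train = [[0, 1], [1, 0], [2, 1], [1, 2], [0, 0], [2, 2]]
y_train = [0, 3, 4, 1, 1, 3]
x_valid = [[1, 1], [2, 0], [0, 2]]
y_valid = [2, 5, -1]
model = ridge_fit(x_train, y_train, lam=1e-6)
accuracy = pearson(predict(model, x_valid), y_valid)
```

With real marker panels p is far larger than n, so practical implementations solve the equivalent n-by-n mixed-model equations instead of the p-by-p system used here for clarity.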

  13. Application of machine learning methods in bioinformatics

    NASA Astrophysics Data System (ADS)

    Yang, Haoyu; An, Zheng; Zhou, Haotian; Hou, Yawen

    2018-05-01

    With the development of bioinformatics, high-throughput genomic technologies have brought biology into the era of big data.[1] Bioinformatics is an interdisciplinary field encompassing the acquisition, management, analysis, interpretation and application of biological information. It derives from the Human Genome Project. The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets.[2] This paper analyzes and compares various machine learning algorithms and their applications in bioinformatics.

  14. "Harnessing genomics to improve health in India" – an executive course to support genomics policy

    PubMed Central

    Acharya, Tara; Kumar, Nandini K; Muthuswamy, Vasantha; Daar, Abdallah S; Singer, Peter A

    2004-01-01

    Background The benefits of scientific medicine have eluded millions in developing countries and the genomics revolution threatens to increase health inequities between North and South. India, as a developing yet also industrialized country, is uniquely positioned to pioneer science policy innovations to narrow the genomics divide. Recognizing this, the Indian Council of Medical Research and the University of Toronto Joint Centre for Bioethics conducted a Genomics Policy Executive Course in January 2003 in Kerala, India. The course provided a forum for stakeholders to discuss the relevance of genomics for health in India. This article presents the course findings and recommendations formulated by the participants for genomics policy in India. Methods The course goals were to familiarize participants with the implications of genomics for health in India; analyze and debate policy and ethical issues; and develop a multi-sectoral opinion leaders' network to share perspectives. To achieve these goals, the course brought together representatives of academic research centres, biotechnology companies, regulatory bodies, media, voluntary, and legal organizations to engage in discussion. Topics included scientific advances in genomics, followed by innovations in business models, public sector perspectives, ethics, legal issues and national innovation systems. 
Results Seven main recommendations emerged: increase funding for healthcare research with appropriate emphasis on genomics; leverage India's assets such as traditional knowledge and genomic diversity in consultation with knowledge-holders; prioritize strategic entry points for India; improve industry-academic interface with appropriate incentives to improve public health and the nation's wealth; develop independent, accountable, transparent regulatory systems to ensure that ethical, legal and social issues are addressed for a single entry, smart and effective system; engage the public and ensure broad-based input into policy setting; ensure equitable access of poor to genomics products and services; deliver knowledge, products and services for public health. A key outcome of the course was the internet-based opinion leaders' network – the Indian Genome Policy Forum – a multi-stakeholder forum to foster further discussion on policy. Conclusion We expect that the process that has led to this network will serve as a model to establish similar Science and Technology policy networks on regional levels and eventually on a global level. PMID:15151698

  15. Systematic review of knowledge, confidence and education in nutritional genomics for students and professionals in nutrition and dietetics.

    PubMed

    Wright, O R L

    2014-06-01

    This review examines knowledge and confidence of nutrition and dietetics professionals in nutritional genomics and evaluates the teaching strategies in this field within nutrition and dietetics university programmes and professional development courses internationally. A systematic search of 10 literature databases was conducted from January 2000 to December 2012 to identify original research. Any studies of either nutrition and/or dietetics students or dietitians/nutritionists investigating current levels of knowledge or confidence in nutritional genomics, or strategies to improve learning and/or confidence in this area, were eligible. Eighteen articles (15 separate studies) met the inclusion criteria. Three articles were assessed as negative, eight as neutral and seven as positive according to the American Dietetics Association Quality Criteria Checklist. The overall ranking of evidence was low. Dietitians have low involvement, knowledge and confidence in nutritional genomics, and evidence for educational strategies is limited and methodologically weak. There is a need to develop training pathways and material to up-skill nutrition and/or dietetics students and nutrition and/or dietetics professionals in nutritional genomics through multidisciplinary collaboration with content area experts. There is a paucity of high quality evidence on optimum teaching strategies; however, methods promoting repetitive exposure to nutritional genomics material, problem-solving, collaborative and case-based learning are most promising for university and professional development programmes. © 2013 The British Dietetic Association Ltd.

  16. Recent advances in understanding the role of nutrition in human genome evolution.

    PubMed

    Ye, Kaixiong; Gu, Zhenglong

    2011-11-01

    Dietary transitions in human history have been suggested to play important roles in the evolution of mankind. Genetic variations caused by adaptation to diet during human evolution could have important health consequences in current society. The advance of sequencing technologies and the rapid accumulation of genome information provide an unprecedented opportunity to comprehensively characterize genetic variations in human populations and unravel the genetic basis of human evolution. A series of selection-detection methods, based on various theoretical models and exploiting different aspects of selection signatures, have been developed. Their applications at the species and population levels have respectively led to the identification of human specific selection events that distinguish human from nonhuman primates and local adaptation events that contribute to human diversity. Scrutiny of candidate genes has revealed paradigms of adaptations to specific nutritional components and genome-wide selection scans have verified the prevalence of diet-related selection events and provided many more candidates awaiting further investigation. Understanding the role of diet in human evolution is fundamental for the development of evidence-based, genome-informed nutritional practices in the era of personal genomics.
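    One long-established selection signature for local adaptation is unusually high allele-frequency differentiation between populations. As an illustration only (the abstract does not single out this statistic), the sketch below computes a Nei-style F_ST = (H_T - H_S) / H_T for a biallelic SNP, assuming equal population weights and known allele frequencies, and then ranks SNPs to pick the upper tail of the empirical distribution, as a genome-wide scan would. Function names, the weighting, and the `top_fraction` cutoff are assumptions.

```python
def fst_biallelic(freqs: list[float]) -> float:
    """Nei-style F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: frequency of the same allele in each population (equal weights assumed).
    """
    if len(freqs) < 2:
        raise ValueError("need at least two populations")
    if any(not 0.0 <= p <= 1.0 for p in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population heterozygosity
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def top_outliers(fst_by_snp: dict[str, float], top_fraction: float = 0.01) -> list[str]:
    """SNP ids in the upper tail of the empirical F_ST distribution (candidate selection targets)."""
    k = max(1, int(len(fst_by_snp) * top_fraction))
    return sorted(fst_by_snp, key=fst_by_snp.get, reverse=True)[:k]
```

An empirical-outlier approach like this flags candidates but does not by itself separate selection from demographic history, which is why scans are usually followed by the kind of candidate-gene scrutiny described above.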

  17. Social and behavioral research in genomic sequencing: approaches from the Clinical Sequencing Exploratory Research Consortium Outcomes and Measures Working Group.

    PubMed

    Gray, Stacy W; Martins, Yolanda; Feuerman, Lindsay Z; Bernhardt, Barbara A; Biesecker, Barbara B; Christensen, Kurt D; Joffe, Steven; Rini, Christine; Veenstra, David; McGuire, Amy L

    2014-10-01

    The routine use of genomic sequencing in clinical medicine has the potential to dramatically alter patient care and medical outcomes. To fully understand the psychosocial and behavioral impact of sequencing integration into clinical practice, it is imperative that we identify the factors that influence sequencing-related decision making and patient outcomes. In an effort to develop a collaborative and conceptually grounded approach to studying sequencing adoption, members of the National Human Genome Research Institute's Clinical Sequencing Exploratory Research Consortium formed the Outcomes and Measures Working Group. Here we highlight the priority areas of investigation and psychosocial and behavioral outcomes identified by the Working Group. We also review some of the anticipated challenges to measurement in social and behavioral research related to genomic sequencing; opportunities for instrument development; and the importance of qualitative, quantitative, and mixed-method approaches. This work represents the early, shared efforts of multiple research teams as we strive to understand individuals' experiences with genomic sequencing. The resulting body of knowledge will guide recommendations for the optimal use of sequencing in clinical practice.

  18. Promoter-enhancer interactions identified from Hi-C data using probabilistic models and hierarchical topological domains.

    PubMed

    Ron, Gil; Globerson, Yuval; Moran, Dror; Kaplan, Tommy

    2017-12-21

    Proximity-ligation methods such as Hi-C allow us to map physical DNA-DNA interactions along the genome, and reveal its organization into topologically associating domains (TADs). As Hi-C data have accumulated, computational methods have been developed for identifying domain borders in multiple cell types and organisms. Here, we present PSYCHIC, a computational approach for analyzing Hi-C data and identifying promoter-enhancer interactions. We use a unified probabilistic model to segment the genome into domains, which we then merge hierarchically and fit using a local background model, allowing us to identify over-represented DNA-DNA interactions across the genome. By analyzing the published Hi-C data sets in human and mouse, we identify hundreds of thousands of putative enhancers and their target genes, and compile an extensive genome-wide catalog of gene regulation in human and mouse. As we show, our predictions are highly enriched for ChIP-seq and DNA accessibility data, evolutionary conservation, eQTLs and other DNA-DNA interaction data.
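    The core idea of calling over-represented interactions against a local background can be illustrated with a much simpler model than PSYCHIC's. In the sketch below, which is an assumption-laden stand-in and not the published method, the background for one domain is the mean contact count at each bin separation (capturing distance decay), and a bin pair is flagged when its count is improbably high under a Poisson distribution with that mean. The `alpha` cutoff, the Poisson assumption, and all names are hypothetical, and no multiple-testing correction is applied.

```python
import math


def expected_by_distance(contacts: list[list[int]]) -> dict[int, float]:
    """Mean contact count at each bin separation d (a simple distance-decay background)."""
    n = len(contacts)
    return {d: sum(contacts[i][i + d] for i in range(n - d)) / (n - d) for d in range(1, n)}


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_pairs(contacts: list[list[int]], alpha: float = 0.01, min_distance: int = 1):
    """Bin pairs (i, j, observed, expected) with more contacts than the distance background predicts."""
    expected = expected_by_distance(contacts)
    n = len(contacts)
    hits = []
    for i in range(n):
        for j in range(i + max(1, min_distance), n):
            lam = expected[j - i]
            obs = contacts[i][j]
            if lam > 0 and poisson_sf(obs, lam) < alpha:
                hits.append((i, j, obs, lam))
    return hits
```

Here `contacts` is assumed to be a symmetric matrix of integer read counts for the bins of a single domain; PSYCHIC instead learns domains and fits background parameters per (merged) domain, which this sketch does not attempt.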

  19. Systemic errors in quantitative polymerase chain reaction titration of self-complementary adeno-associated viral vectors and improved alternative methods.

    PubMed

    Fagone, Paolo; Wright, J Fraser; Nathwani, Amit C; Nienhuis, Arthur W; Davidoff, Andrew M; Gray, John T

    2012-02-01

    Self-complementary AAV (scAAV) vector genomes contain a covalently closed hairpin derived from a mutated inverted terminal repeat that connects the two monomer single-stranded genomes into a head-to-head or tail-to-tail dimer. We found that during quantitative PCR (qPCR) this structure inhibits the amplification of proximal amplicons and causes systematic underreporting of copy number by as much as 10-fold. We show that cleaving scAAV vector genomes with a restriction endonuclease to liberate amplicons from the covalently closed terminal hairpin restores quantitative amplification, and we implement this procedure in a simple, modified qPCR titration method for scAAV vectors. In addition, we developed an AAV genome titration procedure based on gel electrophoresis that requires minimal sample processing and has low interassay variability, and is therefore well suited to the rigorous quality control demands of clinical vector production facilities.
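    Why delayed amplification translates into underreported titers can be illustrated with a generic qPCR standard curve, Ct = slope × log10(copies) + intercept. The sketch below assumes ideal 100% amplification efficiency (a slope of about -3.32 cycles per 10-fold dilution) and a hypothetical intercept; it is not the authors' modified titration protocol. Under these assumptions, a 10-fold underestimate corresponds to a Ct delay of roughly 3.3 cycles.

```python
import math

# Standard-curve slope at an assumed 100% efficiency: one doubling per cycle.
IDEAL_SLOPE = -1 / math.log10(2)  # about -3.32 cycles per 10-fold change


def copies_from_ct(ct, intercept, slope=IDEAL_SLOPE):
    """Invert the standard curve Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def fold_underestimate(ct_delay, slope=IDEAL_SLOPE):
    """Factor by which a Ct delay of ct_delay cycles lowers the estimated copy number."""
    return 10 ** (-ct_delay / slope)


# With a hypothetical intercept of 40 cycles, Ct = 30 means 2**10 starting copies.
print(round(copies_from_ct(30.0, 40.0)))
# A delay of log2(10), about 3.32 cycles, reads as a 10-fold lower titer.
print(round(fold_underestimate(math.log2(10)), 6))
```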

  20. Single-cell sequencing and tumorigenesis: improved understanding of tumor evolution and metastasis.

    PubMed

    Ellsworth, Darrell L; Blackburn, Heather L; Shriver, Craig D; Rabizadeh, Shahrooz; Soon-Shiong, Patrick; Ellsworth, Rachel E

    2017-12-01

    Extensive genomic and transcriptomic heterogeneity in human cancer often negatively impacts treatment efficacy and survival, thus posing a significant ongoing challenge for modern treatment regimens. State-of-the-art DNA- and RNA-sequencing methods now provide high-resolution genomic and gene expression portraits of individual cells, facilitating the study of complex molecular heterogeneity in cancer. Important developments in single-cell sequencing (SCS) technologies over the past 5 years provide numerous advantages over traditional sequencing methods for understanding the complexity of carcinogenesis, but significant hurdles must be overcome before SCS can be clinically useful. In this review, we: (1) highlight current methodologies and recent technological advances for isolating single cells, single-cell whole-genome and whole-transcriptome amplification using minute amounts of nucleic acids, and SCS, (2) summarize research investigating molecular heterogeneity at the genomic and transcriptomic levels and how this heterogeneity affects clonal evolution and metastasis, and (3) discuss the promise of integrating SCS into clinical care for improved patient outcomes.

  1. CRISPR Approaches to Small Molecule Target Identification. | Office of Cancer Genomics

    Cancer.gov

    A long-standing challenge in drug development is the identification of the mechanisms of action of small molecules with therapeutic potential. A number of methods have been developed to address this challenge, each with inherent strengths and limitations. We here provide a brief review of these methods with a focus on chemical-genetic methods that are based on systematically profiling the effects of genetic perturbations on drug sensitivity.

  2. Materials Genome Initiative Element

    NASA Technical Reports Server (NTRS)

    Vickers, John

    2015-01-01

    NASA is committed to developing new materials and manufacturing methods that can enable new missions with ever-increasing mission demands. Typically, the development and certification of new materials and manufacturing methods in the aerospace industry has required more than 20 years of development time and a costly testing and certification program. To reduce the cost and time needed to mature these emerging technologies, NASA is developing computational materials tools to improve understanding of the material and guide the certification process.

  3. Bringing the fathead minnow into the genomic era

    EPA Pesticide Factsheets

    The fathead minnow is a well-established ecotoxicological model organism that has been widely used for regulatory ecotoxicity testing and research for over a half century. While a large amount of molecular information has been gathered on the fathead minnow over the years, the lack of genomic sequence data has limited the utility of the fathead minnow for certain applications. To address this limitation, high-throughput Illumina sequencing technology was employed to sequence the fathead minnow genome. Approximately 100X coverage was achieved by sequencing several libraries of paired-end reads with differing genome insert sizes. Two draft genome assemblies were generated using the SOAPdenovo and String Graph Assembler (SGA) methods, respectively. When these were compared, the SOAPdenovo assembly had a higher scaffold N50 value of 60.4 kbp versus 15.4 kbp, and it also performed better in a Core Eukaryotic Genes Mapping Analysis (CEGMA), mapping 91% versus 67% of genes. As such, this assembly was selected for further development and annotation. The foundation for genome annotation was generated using AUGUSTUS, an ab initio method for gene prediction. A total of 43,345 potential coding sequences were predicted on the genome assembly. These predicted sequences were translated to peptides and queried in a BLAST search against all vertebrates, with 28,290 of these sequences corresponding to zebrafish peptides and 5,242 producing no significant alignments. Additional ty
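    The scaffold N50 used above to compare the two assemblies, and the sequencing depth behind a figure such as 100X coverage, can both be computed in a few lines. This is a minimal sketch assuming a plain list of scaffold lengths and idealized read counts; dedicated assembly-statistics tools may differ in how they treat ties, gaps or minimum scaffold lengths.

```python
def n50(lengths):
    """Smallest length L such that sequences of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty length list")


def coverage(n_reads, read_length, genome_size):
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


# Lengths 10, 9 and 8 sum to 27, half of the 54-unit total, so N50 is 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```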

  4. A greedy, graph-based algorithm for the alignment of multiple homologous gene lists.

    PubMed

    Fostier, Jan; Proost, Sebastian; Dhoedt, Bart; Saeys, Yvan; Demeester, Piet; Van de Peer, Yves; Vandepoele, Klaas

    2011-03-15

    Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In such cases, the alphabet of the input sequences consists of complete genes rather than nucleotides or amino acids. As optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors in early pairwise alignment steps, especially when dealing with strongly diverged genomic regions. In this article, we present a novel, accurate and efficient greedy, graph-based algorithm for the alignment of multiple homologous genomic segments, represented as ordered gene lists. Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different genomic segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves substantially higher alignment accuracy, and proves to be sufficiently fast for large datasets including a few dozen eukaryotic genomes. The algorithm is implemented as part of the i-ADHoRe 3.0 package, available at http://bioinformatics.psb.ugent.be/software.
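    Progressive strategies build a multiple alignment from pairwise steps. The sketch below shows one such pairwise step for gene lists, where the alphabet is genes and two genes match if they belong to the same homology family. It is a plain Needleman-Wunsch global alignment with hypothetical scores, not the greedy, graph-based algorithm of i-ADHoRe 3.0; the `family` mapping is an assumed input format.

```python
def align_gene_lists(a, b, family, match=2, mismatch=-3, gap=-1):
    """Global alignment of two ordered gene lists; genes match if they share a family.

    family maps gene IDs to homology-family labels (hypothetical input format).
    mismatch < 2 * gap, so pairing non-homologous genes is worse than two gaps.
    Returns the optimal score and the aligned pairs, with None marking a gap.
    """
    def sub(x, y):
        return match if family[x] == family[y] else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Segment 2 has lost its B-family gene; the alignment places a gap opposite B1.
fam = {"A1": "A", "B1": "B", "C1": "C", "D1": "D", "A2": "A", "C2": "C", "D2": "D"}
print(align_gene_lists(["A1", "B1", "C1", "D1"], ["A2", "C2", "D2"], fam))
```

    Because each pairwise step commits to one alignment, an early mistake is inherited by every later step, which is the error propagation the greedy, graph-based method is designed to avoid.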

  5. Deconvoluting simulated metagenomes: the performance of hard- and soft- clustering algorithms applied to metagenomic chromosome conformation capture (3C)

    PubMed Central

    DeMaere, Matthew Z.

    2016-01-01

    Background Chromosome conformation capture, coupled with high-throughput DNA sequencing in protocols like Hi-C and 3C-seq, has been proposed as a viable means of generating data to resolve the genomes of microorganisms living in naturally occurring environments. Metagenomic Hi-C and 3C-seq datasets have begun to emerge, but the feasibility of resolving genomes when closely related organisms (strain-level diversity) are present in the sample has not yet been systematically characterised. Methods We developed a computational simulation pipeline for metagenomic 3C and Hi-C sequencing to evaluate the accuracy of genomic reconstructions at, above, and below an operationally defined species boundary. We simulated datasets and measured accuracy over a wide range of parameters. Five clustering algorithms were evaluated (2 hard, 3 soft) using an adaptation of the extended B-cubed validation measure. Results When all genomes in a sample are below 95% sequence identity, all of the tested clustering algorithms performed well. When sequence data contains genomes above 95% identity (our operational definition of strain-level diversity), a naive soft-clustering extension of the Louvain method achieves the highest performance. Discussion Previously, only hard-clustering algorithms have been applied to metagenomic 3C and Hi-C data, yet none of these perform well when strain-level diversity exists in a metagenomic sample. Our simple extension of the Louvain method performed the best in these scenarios; however, accuracy remained well below the levels observed for samples without strain-level diversity. Strain resolution is also highly dependent on the amount of available 3C sequence data, suggesting that depth of sequencing must be carefully considered during experimental design. Finally, there appears to be great scope to improve the accuracy of strain resolution through further algorithm development. PMID:27843713
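    Clustering accuracy in this setting can be scored with B-cubed-style metrics. The sketch below implements the standard extended B-cubed precision and recall for overlapping (soft) clusterings, in which each item, such as a contig, may belong to several predicted clusters and several true genomes. The exact adaptation used in the study is not described here, so treat this as an illustrative baseline; the input dictionaries are a hypothetical format.

```python
def extended_bcubed(clusters, labels):
    """Extended B-cubed precision and recall for overlapping clusterings.

    clusters[item] -> set of predicted cluster IDs (soft clustering allows several)
    labels[item]   -> set of true genome/strain IDs
    """
    items = list(labels)
    precisions, recalls = [], []
    for e in items:
        p_terms, r_terms = [], []
        for f in items:
            shared_c = len(clusters[e] & clusters[f])
            shared_l = len(labels[e] & labels[f])
            overlap = min(shared_c, shared_l)
            if shared_c:  # precision averages over items sharing a predicted cluster
                p_terms.append(overlap / shared_c)
            if shared_l:  # recall averages over items sharing a true genome
                r_terms.append(overlap / shared_l)
        if p_terms:
            precisions.append(sum(p_terms) / len(p_terms))
        if r_terms:
            recalls.append(sum(r_terms) / len(r_terms))
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)


truth = {"c1": {"g1"}, "c2": {"g1"}, "c3": {"g2"}, "c4": {"g2"}}
lumped = {c: {"bin"} for c in truth}  # every contig placed in one bin
print(extended_bcubed(lumped, truth))
```

    Lumping two genomes into a single bin keeps recall at 1.0 but halves precision, which is the kind of failure expected when strains above 95% identity cannot be separated.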

  6. Screening of a Brassica napus bacterial artificial chromosome library using highly parallel single nucleotide polymorphism assays

    PubMed Central

    2013-01-01

    Background Efficient screening of bacterial artificial chromosome (BAC) libraries with polymerase chain reaction (PCR)-based markers is feasible provided that a multidimensional pooling strategy is implemented. Single nucleotide polymorphisms (SNPs) can be screened in multiplexed format; therefore, this marker type lends itself particularly well to medium- to high-throughput applications. Combining the power of multiplex-PCR assays with a multidimensional pooling system may prove to be especially challenging in a polyploid genome. In polyploid genomes, two classes of SNPs need to be distinguished: polymorphisms between accessions (intragenomic SNPs) and those differentiating between homoeologous genomes (intergenomic SNPs). We have assessed whether the highly parallel Illumina GoldenGate® Genotyping Assay is suitable for the screening of a BAC library of the polyploid Brassica napus genome. Results A multidimensional screening platform was developed for a Brassica napus BAC library composed of almost 83,000 clones. Intragenomic and intergenomic SNPs were included in Illumina’s GoldenGate® Genotyping Assay, and both SNP classes were used successfully for screening of the multidimensional BAC pools of the Brassica napus library. An optimized scoring method is proposed, which is especially valuable for SNP calling of intergenomic SNPs. Validation of the genotyping results by independent methods revealed a success rate of approximately 80% for the multiplex PCR-based screening, regardless of whether intra- or intergenomic SNPs were evaluated. Conclusions Illumina’s GoldenGate® Genotyping Assay can be efficiently used for screening of multidimensional Brassica napus BAC pools. SNP calling was specifically tailored for the evaluation of BAC pool screening data. The developed scoring method can be implemented independently of plant reference samples. It is demonstrated that intergenomic SNPs represent a powerful tool for BAC library screening of a polyploid genome.
PMID:24010766
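    The logic of multidimensional pool screening can be sketched as follows. The example assumes a hypothetical three-dimensional layout (plate, row, column) in which each clone contributes to exactly one pool per dimension; the actual pooling dimensions of the Brassica napus library are not given here. It also shows why several positives in the same screen yield extra candidate addresses that need independent validation.

```python
from itertools import product


def pool_signals(positive_clones):
    """Plate, row and column pools that test positive for a set of (plate, row, col) clones."""
    plates = {p for p, _, _ in positive_clones}
    rows = {r for _, r, _ in positive_clones}
    cols = {c for _, _, c in positive_clones}
    return plates, rows, cols


def decode_pools(positive_plates, positive_rows, positive_cols):
    """All clone addresses consistent with the positive pools in every dimension."""
    return sorted(product(sorted(positive_plates), sorted(positive_rows), sorted(positive_cols)))


# One positive clone decodes unambiguously.
print(decode_pools(*pool_signals({(3, "B", 7)})))
# Two positives give 2 x 2 x 2 = 8 candidates, 6 of them "ghost" addresses.
print(len(decode_pools(*pool_signals({(3, "B", 7), (5, "D", 2)}))))
```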

  7. Systems biology of embryonic development: Prospects for a complete understanding of the Caenorhabditis elegans embryo.

    PubMed

    Murray, John Isaac

    2018-05-01

    The convergence of developmental biology and modern genomics tools brings the potential for a comprehensive understanding of developmental systems. This is especially true for the Caenorhabditis elegans embryo because its small size, invariant developmental lineage, and powerful genetic and genomic tools provide the prospect of a cellular-resolution understanding of messenger RNA (mRNA) expression and regulation across the organism. We describe here how a systems biology framework might allow large-scale determination of the embryonic regulatory relationships encoded in the C. elegans genome. This framework consists of two broad steps: (a) defining the "parts list," that is, all genes expressed in all cells at each time during development, and (b) iterative steps of computational modeling and refinement of these models by experimental perturbation. Substantial progress has been made towards defining the parts list through imaging methods such as large-scale green fluorescent protein (GFP) reporter analysis. Imaging results are now being augmented by high-resolution transcriptome methods such as single-cell RNA sequencing, and it is likely that the complete expression patterns of all genes across the embryo will be known within the next few years. In contrast, the modeling and perturbation experiments performed so far have focused largely on individual cell types or genes, and improved methods will be needed to expand them to the full genome and organism. This emerging comprehensive map of embryonic expression and regulatory function will provide a powerful resource for developmental biologists, and will also allow scientists to ask questions not accessible without a comprehensive picture. This article is categorized under: Invertebrate Organogenesis > Worms; Technologies > Analysis of the Transcriptome; Gene Expression and Transcriptional Hierarchies > Gene Networks and Genomics. © 2018 Wiley Periodicals, Inc.

  8. [Development of a hepatitis B virus carrier transgenic mice model].

    PubMed

    Caner, Müge; Arat, Sezen; Bircan, Rifat

    2008-01-01

    Transgenic mouse models, which provide important benefits for studies of the immunopathogenesis of hepatitis B virus (HBV) infections, have been under development for 20 years. For this purpose, different lines bearing the whole HBV genome or selected viral genes have been developed, and their use in clarifying HBV replication and pathogenesis mechanisms has been emphasized. The aim of this study was to develop and breed an HBV carrier mouse model. In the study, the full HBV genome was transferred to mouse embryos by microinjection. Following transgenic manipulation, HBV carriers among the offspring were identified by molecular methods demonstrating HBV-DNA replication and expression. The transgene transfers were performed in the TUBITAK Marmara Research Center Transgene Laboratory, Gebze, Istanbul. HBV-DNA carrier mice were identified by polymerase chain reaction (PCR) on DNA samples obtained from tail tissue and by dot-blot hybridization of mouse sera. Integrated HBV-DNA was detected by in situ hybridization of liver tissue sections. HBV-DNA expression was shown by reverse transcriptase PCR on total RNA isolated from the liver tissue of HBV-DNA carrier mice. HBsAg was detected in the liver by immunohistochemistry, and HBsAg and HBeAg were additionally demonstrated by ELISA. The HBV genome, its expression and the expression products were detected in approximately 10% of the mice into which HBV-DNA had been transferred. By inbreeding heterozygous carrier mice, a homozygous HBV transgenic mouse line was obtained. These HBV transgenic mice are the first such lines developed in our country.
We hope that this HBV carrier transgenic mouse model will contribute to studies on the pathogenesis of HBV infections, which are important health problems worldwide as well as in Turkey.

  9. Approaches to Fungal Genome Annotation

    PubMed Central

    Haas, Brian J.; Zeng, Qiandong; Pearson, Matthew D.; Cuomo, Christina A.; Wortman, Jennifer R.

    2011-01-01

    Fungal genome annotation is the starting point for analysis of genome content. This generally involves the application of diverse methods to identify features on a genome assembly such as protein-coding and non-coding genes, repeats and transposable elements, and pseudogenes. Here we describe tools and methods leveraged for eukaryotic genome annotation with a focus on the annotation of fungal nuclear and mitochondrial genomes. We highlight the application of the latest technologies and tools to improve the quality of predicted gene sets. The Broad Institute eukaryotic genome annotation pipeline is described as one example of how such methods and tools are integrated into a sequencing center’s production genome annotation environment. PMID:22059117

  10. Multilocus sequence typing of total-genome-sequenced bacteria.

    PubMed

    Larsen, Mette V; Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Pontén, Thomas; Ussery, David W; Aarestrup, Frank M; Lund, Ole

    2012-04-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it is becoming increasingly accessible to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenge will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison with historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST.
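    The two steps described above, finding the best-matching allele at each locus and then looking up the allele combination, can be sketched as below. This is a simplified illustration: it scores ungapped identity on locus sequences that are assumed to have already been extracted from the genome, whereas the published method uses BLAST-based ranking. The scheme, allele sequences and sequence-type numbers are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions; 0 if lengths differ (no gapped alignment here)."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def call_allele(alleles, query):
    """Return (allele_number, identity) of the best-matching allele for one locus."""
    number = max(alleles, key=lambda k: identity(alleles[k], query))
    return number, identity(alleles[number], query)


def sequence_type(scheme, profiles, loci_seqs):
    """Call each locus (in sorted locus order), then look up the allele profile.

    Returns the allele calls and the ST, or None if the combination is not in the table.
    """
    calls = tuple(call_allele(scheme[locus], loci_seqs[locus])[0] for locus in sorted(scheme))
    return calls, profiles.get(calls)


# Hypothetical two-locus scheme; profile keys follow sorted locus order (adk, gyrB).
scheme = {"adk": {1: "ACGT", 2: "ACGA"}, "gyrB": {1: "TTGC", 2: "TTGG"}}
profiles = {(1, 2): 10, (2, 2): 11}
print(sequence_type(scheme, profiles, {"adk": "ACGA", "gyrB": "TTGG"}))
```

    An allele combination that is missing from the profile table would indicate a potentially novel sequence type.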

  11. Advances in genetic modification of pluripotent stem cells.

    PubMed

    Fontes, Andrew; Lakshmipathy, Uma

    2013-11-15

    Genetically engineered stem cells aid in dissecting basic cell function and are valuable tools for drug discovery, in vivo cell tracking, and gene therapy. Gene transfer into pluripotent stem cells has been a challenge because these cells intrinsically grow in clusters and are hence not amenable to common gene delivery methods. Several advances have been made in the rapid assembly of DNA elements, optimization of culture conditions, and DNA delivery methods. This has led to the development of viral and non-viral methods for transient or stable modification of cells, albeit with varying efficiencies. Most methods require selection and clonal expansion, which demand prolonged culture and are not suited to cells with limited proliferative potential. Choosing the right platform based on the preferred length, strength, and context of transgene expression is a critical step. Random integration of the transgene into the genome can be complicated by silencing or by altered regulation of expression caused by the surrounding genomic context. An alternative is site-specific methods that target transgenes, followed by screening to identify the genomic loci that support long-term expression with stem cell proliferation and differentiation. Highly precise and accurate homology-driven editing of the genome can be achieved using traditional methods as well as newer technologies such as zinc finger nucleases, TAL effector nucleases and CRISPR. In this review, we summarize the different genetic engineering methods that have been successfully used to create modified embryonic and induced pluripotent stem cells. © 2013. Published by Elsevier Inc. All rights reserved.

  12. Novel genetic tools for studying food-borne Salmonella.

    PubMed

    Andrews-Polymenis, Helene L; Santiviago, Carlos A; McClelland, Michael

    2009-04-01

    Nontyphoidal Salmonellae are highly prevalent food-borne pathogens. High-throughput sequencing of Salmonella genomes is expanding our knowledge of the evolution of serovars and epidemic isolates. Genome sequences have also allowed the creation of complete microarrays. Microarrays have improved the throughput of in vivo expression technology (IVET) used to uncover promoters active during infection. In another method, signature tagged mutagenesis (STM), pools of mutants are subjected to selection. Changes in the population are monitored on a microarray, revealing genes under selection. Complete genome sequences permit the construction of pools of targeted in-frame deletions that have improved STM by minimizing the number of clones and the polarity of each mutant. Together, genome sequences and the continuing development of new tools for functional genomics will drive a revolution in the understanding of Salmonellae in many different niches that are critical for food safety.
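    Reading out selection in a signature-tagged mutant pool amounts to comparing each mutant's abundance before and after selection. A minimal sketch, assuming tag counts (standing in for microarray signals) for an input pool and an output pool recovered after selection; the pseudocount and the depletion threshold are hypothetical choices, not values from the source.

```python
import math


def selection_scores(input_counts, output_counts, pseudocount=1.0):
    """log2 ratio of each mutant's share of the output pool to its share of the input pool."""
    k = len(input_counts)
    in_total = sum(input_counts.values())
    out_total = sum(output_counts.values())
    scores = {}
    for mutant, n_in in input_counts.items():
        n_out = output_counts.get(mutant, 0)
        frac_in = (n_in + pseudocount) / (in_total + pseudocount * k)
        frac_out = (n_out + pseudocount) / (out_total + pseudocount * k)
        scores[mutant] = math.log2(frac_out / frac_in)
    return scores


def attenuated(scores, threshold=-2.0):
    """Mutants depleted after selection: candidate genes needed in that niche."""
    return sorted(m for m, s in scores.items() if s <= threshold)


scores = selection_scores(
    {"geneA": 100, "geneB": 100, "geneC": 100},
    {"geneA": 150, "geneB": 148, "geneC": 2},
)
print(attenuated(scores))
```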

  13. Deciphering the genomic targets of alkylating polyamide conjugates using high-throughput sequencing

    PubMed Central

    Chandran, Anandhakumar; Syed, Junetha; Taylor, Rhys D.; Kashiwazaki, Gengo; Sato, Shinsuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi

    2016-01-01

    Chemically engineered small molecules targeting specific genomic sequences play an important role in drug development research. Pyrrole-imidazole polyamides (PIPs) are a group of molecules that can bind to the DNA minor groove and can be engineered to target specific sequences. Their biological effects rely primarily on their selective DNA binding. However, the binding mechanism of PIPs at the chromatinized genome level is poorly understood. Herein, we report a method using high-throughput sequencing to identify the DNA-alkylating sites of PIP-indole-seco-CBI conjugates. High-throughput sequencing analysis of conjugate 2 showed highly similar DNA-alkylating sites on synthetic oligos (histone-free DNA) and on human genomes (chromatinized DNA context). To our knowledge, this is the first report identifying alkylation sites across genomic DNA by alkylating PIP conjugates using high-throughput sequencing. PMID:27098039

  14. Detecting and Characterizing Genomic Signatures of Positive Selection in Global Populations

    PubMed Central

    Liu, Xuanyao; Ong, Rick Twee-Hee; Pillai, Esakimuthu Nisha; Elzein, Abier M.; Small, Kerrin S.; Clark, Taane G.; Kwiatkowski, Dominic P.; Teo, Yik-Ying

    2013-01-01

    Natural selection is a significant force that shapes the architecture of the human genome and introduces diversity across global populations. The question of whether advantageous mutations have arisen in the human genome as a result of single or multiple mutation events remains largely unanswered, except for a handful of genes, such as those that confer lactase persistence, affect skin pigmentation, or cause sickle cell anemia. We have developed a long-range-haplotype method for identifying genomic signatures of positive selection to complement existing methods, such as the integrated haplotype score (iHS) or cross-population extended haplotype homozygosity (XP-EHH), for locating signals across the entire allele frequency spectrum. Our method also locates the founder haplotypes that carry the advantageous variants and infers their corresponding population frequencies. This makes it possible to systematically interrogate, across the whole human genome, whether a selection signal shared across different populations is the consequence of a single mutation process followed by gene flow between populations or of convergent evolution due to multiple independent mutation events, either at the same variant or within the same gene. The application of our method to data from 14 populations across the world revealed that positive-selection events tend to cluster in populations of the same ancestry. Comparing the founder haplotypes for events that are present across different populations revealed that convergent evolution is a rare occurrence and that the majority of shared signals stem from the same evolutionary event. PMID:23731540
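    The iHS and XP-EHH statistics mentioned above are built on extended haplotype homozygosity (EHH): the probability that two randomly chosen chromosomes carrying a core allele are identical over the whole stretch from the core to a given marker. A minimal sketch, assuming phased haplotypes encoded as strings with one character per SNP; it does not reproduce the authors' own long-range-haplotype method.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, end):
    """EHH from SNP index `core` to SNP index `end` among carriers of `allele` at the core.

    haplotypes: equal-length strings of alleles, one character per SNP (phased).
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")
    lo, hi = sorted((core, end))
    # Group carriers by their full haplotype over the core-to-end interval.
    groups = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)


haps = ["0000", "0001", "0011", "1111"]
# Homozygosity among "0" carriers decays as the interval extends away from the core.
print([ehh(haps, 0, "0", end) for end in range(4)])
```

    Under positive selection, a young advantageous allele sits on an unusually long shared haplotype, so its EHH decays more slowly with distance than that of alleles at similar frequency.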

  15. Selective recruitment of nuclear factors to productively replicating herpes simplex virus genomes.

    PubMed

    Dembowski, Jill A; DeLuca, Neal A

    2015-05-01

    Much of the HSV-1 life cycle is carried out in the cell nucleus, including the expression, replication, repair, and packaging of viral genomes. Viral proteins, as well as cellular factors, play essential roles in these processes. Isolation of proteins on nascent DNA (iPOND) was developed to label and purify cellular replication forks. We adapted aspects of this method to label viral genomes in order to both image and purify replicating HSV-1 genomes for the identification of associated proteins. Many viral and cellular factors were enriched on viral genomes, including factors that mediate DNA replication, repair, chromatin remodeling, transcription, and RNA processing. As infection proceeded, packaging and structural components were enriched to a greater extent. Among the more abundant proteins that copurified with genomes were the viral transcription factor ICP4 and the replication protein ICP8. Furthermore, all seven viral replication proteins were enriched on viral genomes, along with cellular PCNA and topoisomerases, while other cellular replication proteins were not detected. The chromatin-remodeling complexes present on viral genomes included the INO80, SWI/SNF, NURD, and FACT complexes, which may prevent chromatinization of the genome. Consistent with this conclusion, histones were not readily recovered with purified viral genomes, and imaging studies revealed an underrepresentation of histones on viral genomes. RNA polymerase II, the mediator complex, TFIID, TFIIH, and several other transcriptional activators and repressors were also affinity purified with viral DNA. The presence of INO80, NURD, SWI/SNF, mediator, TFIID, and TFIIH components is consistent with previous studies in which these complexes copurified with ICP4. Therefore, ICP4 is likely involved in the recruitment of these key cellular chromatin remodeling and transcription factors to viral genomes.
Taken together, iPOND is a valuable method for the study of viral genome dynamics during infection and provides a comprehensive view of how HSV-1 selectively utilizes cellular resources.

  16. From Agrobacterium to viral vectors: genome modification of plant cells by rare cutting restriction enzymes.

    PubMed

    Marton, Ira; Honig, Arik; Omid, Ayelet; De Costa, Noam; Marhevka, Elena; Cohen, Barry; Zuker, Amir; Vainstein, Alexander

    2013-01-01

    Researchers and biotechnologists require methods to accurately modify the genome of higher eukaryotic cells. Such modifications include, but are not limited to, site-specific mutagenesis, site-specific insertion of foreign DNA, and replacement and deletion of native sequences. Accurate genome modifications in plant species have been rather limited, with only a handful of plant species and genes being modified through the use of early genome-editing techniques. The development of rare-cutting restriction enzymes as a tool for the induction of site-specific genomic double-strand breaks and their introduction as a reliable tool for genome modification in animals, animal cells and human cell lines have paved the way for the adaptation of rare-cutting restriction enzymes to genome editing in plant cells. Indeed, the number of plant species and genes which have been successfully edited using zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and engineered homing endonucleases is on the rise. In our review, we discuss the basics of rare-cutting restriction enzyme-mediated genome-editing technology with an emphasis on its application in plant species.
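    Why long recognition sites make a nuclease "rare-cutting" follows from simple arithmetic: in an idealized sequence with equiprobable, independent bases, a specific n-bp site is expected about (L - n + 1) / 4^n times per strand of a sequence of length L. The sketch below assumes that random-sequence model, which real genomes do not follow exactly, and uses a hypothetical 1 Gb genome.

```python
def expected_sites(genome_length, site_length):
    """Expected occurrences of one specific site on one strand of a uniformly random sequence."""
    positions = genome_length - site_length + 1  # possible start positions for the site
    return positions / 4 ** site_length


GENOME = 10 ** 9  # hypothetical 1 Gb genome
for n in (6, 18):
    print(n, expected_sites(GENOME, n))
```

    For a 1 Gb sequence this gives roughly 244,000 expected hits for a 6-bp site but well under one for an 18-bp site, which is why long-site nucleases can target a single genomic locus.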

  17. Secure distributed genome analysis for GWAS and sequence comparison computation

    PubMed Central

    2015-01-01

    Background The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. Methods In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. Results We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. Conclusions This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice. PMID:26733307
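    The secret-sharing approach can be illustrated with additive shares. In the sketch below, each data holder splits its minor-allele count into random shares modulo a prime, the computing parties add the shares they receive, and only the aggregate is reconstructed. The number of parties, the modulus, the assumption that the minor allele is agreed on in advance, and the semi-honest, single-process setting are all simplifying assumptions; this is not the competition protocol itself.

```python
import secrets

PRIME = 2 ** 61 - 1  # field modulus: an assumption, not a competition parameter


def share(value, n_parties=3):
    """Split a non-negative integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    return sum(shares) % PRIME


def secure_total(site_values, n_parties=3):
    """Each site secret-shares its value; party i adds the i-th shares locally.

    No single party sees any site's raw value, only uniformly random shares.
    """
    per_party = [0] * n_parties
    for value in site_values:
        for i, s in enumerate(share(value, n_parties)):
            per_party[i] = (per_party[i] + s) % PRIME
    return reconstruct(per_party)


def secure_maf(minor_counts, allele_totals):
    """Minor allele frequency from aggregated counts; only the totals are revealed."""
    return secure_total(minor_counts) / secure_total(allele_totals)


# Three hypothetical sites with 12/200, 30/400 and 8/100 minor alleles.
print(secure_maf([12, 30, 8], [200, 400, 100]))
```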

  18. Production of genome-edited pluripotent stem cells and mice by CRISPR/Cas.

    PubMed

    Horii, Takuro; Hatada, Izuho

    2016-01-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) nucleases, together called CRISPR/Cas, were recently developed into an epoch-making genome engineering technology. This system requires only a Cas9 nuclease and a single-guide RNA complementary to a target locus. CRISPR/Cas enables the generation of knockout cells and animals in a single step. This system can also be used to generate multiple mutations and knockins in a single step, which is not possible using other methods. In this review, we provide an overview of genome editing by CRISPR/Cas in pluripotent stem cells and mice.
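    Because targeting requires only a single-guide RNA complementary to the locus, candidate target sites can be located by scanning for a protospacer-adjacent motif (PAM). The sketch below assumes Streptococcus pyogenes Cas9, with its commonly used 20-nt spacer and NGG PAM, and scans the forward strand only; scanning the reverse complement and any off-target scoring are omitted.

```python
import re


def find_protospacers(seq, spacer_len=20):
    """Forward-strand SpCas9 targets: a spacer lying immediately 5' of an NGG PAM.

    Returns (spacer_start, spacer_sequence, pam) tuples.
    """
    seq = seq.upper()
    hits = []
    # Zero-width lookahead so that overlapping PAMs (e.g. in "AGGG") are all found.
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = m.start()
        if pam_start >= spacer_len:
            spacer = seq[pam_start - spacer_len:pam_start]
            hits.append((pam_start - spacer_len, spacer, seq[pam_start:pam_start + 3]))
    return hits


print(find_protospacers("A" * 20 + "TGG"))
```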

  19. A Method to Evaluate Genome-Wide Methylation in Archival Formalin-Fixed, Paraffin-Embedded Ovarian Epithelial Cells

    PubMed Central

    Li, Qiling; Li, Min; Ma, Li; Li, Wenzhi; Wu, Xuehong; Richards, Jendai; Fu, Guoxing; Xu, Wei; Bythwood, Tameka; Li, Xu; Wang, Jianxin; Song, Qing

    2014-01-01

    Background The use of DNA from archival formalin-fixed, paraffin-embedded (FFPE) tissue for genetic and epigenetic analyses may be problematic, since the DNA is often degraded and only limited amounts may be available. Thus, it is currently not known whether genome-wide methylation can be reliably assessed in DNA from archival FFPE tissue. Methodology/Principal Findings Ovarian tissues, which were obtained and formalin-fixed and paraffin-embedded in either 1999 or 2011, were sectioned and stained with hematoxylin-eosin (H&E). Epithelial cells were captured by laser microdissection, and their DNA was subjected to whole-genome bisulfite conversion, whole-genome polymerase chain reaction (PCR) amplification, and purification. Sequencing and software analyses were performed to identify the extent of genomic methylation. We observed that 31.7% of sequence reads from the DNA in the 1999 archival FFPE tissue, and 70.6% of the reads from the 2011 sample, could be matched with the genome. Methylation rates of CpG on the Watson and Crick strands were 32.2% and 45.5%, respectively, in the 1999 sample, and 65.1% and 42.7% in the 2011 sample. Conclusions/Significance We have developed an efficient method that allows DNA methylation to be assessed in archival FFPE tissue samples. PMID:25133528
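    The per-strand CpG methylation rates reported above rest on a simple count: after bisulfite conversion, unmethylated cytosines read as T while methylated cytosines remain C. A minimal sketch for the Watson strand, assuming reads already aligned without indels and given as (start, sequence) pairs; the Crick strand (read via G-to-A changes) and all alignment steps are omitted, and this is not the authors' pipeline.

```python
def cpg_methylation(reference, reads):
    """Fraction of reference CpG cytosines read as C (methylated) rather than T (unmethylated).

    reads: (start, sequence) pairs aligned to the Watson strand without indels.
    Bases other than C or T at a CpG position are ignored.
    """
    ref = reference.upper()
    cpg_positions = {i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"}
    methylated = unmethylated = 0
    for start, read in reads:
        for offset, base in enumerate(read.upper()):
            if start + offset in cpg_positions:
                if base == "C":
                    methylated += 1
                elif base == "T":
                    unmethylated += 1
    total = methylated + unmethylated
    return methylated / total if total else float("nan")


ref = "ACGTTCGA"  # CpG cytosines at positions 1 and 5
reads = [(0, "ACGTTTGA"), (0, "ATGTTCGA"), (4, "TCGA")]
print(cpg_methylation(ref, reads))  # 3 methylated of 5 observed CpG calls
```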

  20. Inference of Ancestral Recombination Graphs through Topological Data Analysis

    PubMed Central

    Cámara, Pablo G.; Levine, Arnold J.; Rabadán, Raúl

    2016-01-01

    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer are examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in reconstructing potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, and reconstruction is usually infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable approaches that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package called TARGet and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands. PMID:27532298

  1. A silica sands-based method for faithful analysis of microbial communities and DNA isolation from a wide range of species.

    PubMed

    Liu, Xia; Xu, Yongdong; Li, Zhi; Jiang, Shengwei; Yao, Shuo; Wu, Rina; An, Yingfeng

    2018-04-21

    A silica sands-based method has been developed to isolate high-quality genomic DNA from cells of animals, plants and microorganisms, such as Hemisalanx prognathus, Spinacia oleracea, Pichia pastoris, Bacillus licheniformis and Escherichia coli. To the best of our knowledge, no previously reported DNA isolation method has had such a wide range of application. In addition, this method and a commercially available kit were compared in the analysis of microbial communities using high-throughput 16S rDNA sequencing. The silica sands-based method was found to be more efficient than the kit at isolating genomic DNA from gram-positive bacteria, indicating that it could be a valuable choice for faithfully reflecting the composition of microbial communities.

  2. Electronic Health Record Design and Implementation for Pharmacogenomics: a Local Perspective

    PubMed Central

    Peterson, Josh F.; Bowton, Erica; Field, Julie R.; Beller, Marc; Mitchell, Jennifer; Schildcrout, Jonathan; Gregg, William; Johnson, Kevin; Jirjis, Jim N; Roden, Dan M.; Pulley, Jill M.; Denny, Josh C.

    2014-01-01

    Purpose The design of electronic health records (EHR) to translate genomic medicine into clinical care is crucial to the successful introduction of new genomic services, yet there are few published guides to implementation. Methods The design, implemented features, and evolution of a locally developed EHR that supports a large pharmacogenomics program at a tertiary care academic medical center were tracked over a 4-year development period. Results Developers and program staff created EHR mechanisms for ordering a pharmacogenomics panel in advance of clinical need (preemptive genotyping) and in response to a specific drug indication. Genetic data from panel-based genotyping were sequestered from the EHR until drug-gene interactions (DGIs) met evidentiary standards and were deemed clinically actionable. A service to translate genotype to predicted drug response phenotype populated a summary of DGIs, triggered inpatient and outpatient clinical decision support, updated laboratory records, and created gene results within online personal health records. Conclusion The design of a locally developed EHR supporting pharmacogenomics has generalizable utility. The challenge of representing genomic data in a comprehensible and clinically actionable format is discussed, along with reflections on the scalability of the model to larger sets of genomic data. PMID:24009000
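    The genotype-to-phenotype service, together with the rule that results stay sequestered until a drug-gene interaction is actionable, can be pictured as a lookup gated by an actionability list. In the sketch below the gene set, the table and the function name are hypothetical; the three CYP2C19 diplotypes use commonly cited metabolizer categories, but a real service would load current guideline tables rather than hard-code them.

```python
from typing import Optional

# Hypothetical: genes whose drug-gene interactions have met evidentiary standards.
ACTIONABLE_GENES = {"CYP2C19"}

# Illustrative subset of a diplotype-to-phenotype table (not the program's actual rules).
PHENOTYPES = {
    ("CYP2C19", ("*1", "*1")): "normal metabolizer",
    ("CYP2C19", ("*1", "*2")): "intermediate metabolizer",
    ("CYP2C19", ("*2", "*2")): "poor metabolizer",
}


def predicted_phenotype(gene: str, allele_a: str, allele_b: str) -> Optional[str]:
    """Translate a diplotype into a predicted drug-response phenotype.

    Returns None while results for the gene remain sequestered (not yet
    actionable) or when the diplotype is not in the table.
    """
    if gene not in ACTIONABLE_GENES:
        return None  # genotype stays stored outside the clinical record
    diplotype = tuple(sorted((allele_a, allele_b)))  # order-independent key
    return PHENOTYPES.get((gene, diplotype))
```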

  3. INDIGO – INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles

    PubMed Central

    Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B.

    2013-01-01

    Background Next-generation sequencing technologies have substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation, but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. Results We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO makes it possible to construct complex queries and combine annotations from multiple sources, from the genomic sequence up to the protein domain, gene ontology and pathway levels. This data warehouse is intended to be populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea, extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of using the system to gain new insights into specific aspects of the unique lifestyle and adaptations of these organisms to extreme environments. Conclusions We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is intended to serve as a resource dedicated to Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo. 
PMID:24324765

  4. INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles.

    PubMed

    Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba Alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B

    2013-01-01

    Next-generation sequencing technologies have substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation, but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO makes it possible to construct complex queries and combine annotations from multiple sources, from the genomic sequence up to the protein domain, gene ontology and pathway levels. This data warehouse is intended to be populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea, extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of using the system to gain new insights into specific aspects of the unique lifestyle and adaptations of these organisms to extreme environments. We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is intended to serve as a resource dedicated to Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.

  5. GENOMIC DIVERSITY AND THE MICROENVIRONMENT AS DRIVERS OF PROGRESSION IN DCIS

    DTIC Science & Technology

    2017-10-01

    stains, including quantitative analysis, 7) Identification of upstaged DCIS cases for the radiology aim, 8) Development of image analysis methods for...goals of the project? Aim 1. Determine whether genetic diversity of DCIS is greater in DCIS with adjacent invasive disease compared to DCIS without... compared to DCIS without IDC. Since genomics is not the sole driver of tumor behavior, we will phenotypically characterize DCIS and its

  6. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

    PubMed

    Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca

    2015-01-01

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious, and high-throughput methods using next-generation sequencing are being developed to address this problem. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events must be fully characterized before use in commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
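    The abstract does not describe the internals of the junction-finding tool. As a hedged sketch of the general idea, assuming read pairs have been aligned to a combined reference made of the host genome plus the T-DNA (or other foreign) sequences, a pair is a junction candidate when exactly one mate aligns to a foreign sequence and the other aligns to the genome. The tuple format, reference names and function name are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

# (pair_id, reference hit by mate 1, reference hit by mate 2); None = unaligned
ReadPair = Tuple[str, Optional[str], Optional[str]]


def junction_pairs(pairs: Iterable[ReadPair], foreign_refs: Set[str]) -> List[str]:
    """IDs of pairs with one mate on a foreign (T-DNA) sequence and one on the genome."""
    hits = []
    for pair_id, ref1, ref2 in pairs:
        if ref1 is None or ref2 is None:
            continue  # both mates must align to be informative
        if (ref1 in foreign_refs) != (ref2 in foreign_refs):
            hits.append(pair_id)
    return hits
```

A real tool would also need to handle split reads that cross the junction within a single mate.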

  7. The proteome: structure, function and evolution

    PubMed Central

    Fleming, Keiran; Kelley, Lawrence A; Islam, Suhail A; MacCallum, Robert M; Muller, Arne; Pazos, Florencio; Sternberg, Michael J.E

    2006-01-01

    This paper reports two studies modeling the inter-relationships between protein sequence, structure and function. First, an automated pipeline to provide a structural annotation of proteomes in the major genomes is described. The results are stored in a database at Imperial College, London (3D-GENOMICS) that can be accessed at www.sbg.bio.ic.ac.uk. Analysis of the assignments to structural superfamilies provides evolutionary insights. 3D-GENOMICS is being integrated with related proteome annotation data at University College London and the European Bioinformatics Institute in a project known as e-protein (http://www.e-protein.org/). The second study is motivated by developments in structural genomics projects, in which the structure of a protein is determined before its function is known. We have developed a new approach, PHUNCTIONER, that uses the gene ontology (GO) classification to supervise the extraction of the sequence signal responsible for protein function from a structure-based sequence alignment. Using GO, we can obtain profiles for a range of specificities described in the ontology. In the region of low sequence similarity (around 15%), our method is more accurate than assignment from the closest structural homologue. The method is also able to identify the specific residues associated with the function of the protein family. PMID:16524832

  8. Whole Genome Sequencing and Multiplex qPCR Methods to Identify Campylobacter jejuni Encoding cst-II or cst-III Sialyltransferase

    PubMed Central

    Neal-McKinney, Jason M.; Liu, Kun C.; Jinneman, Karen C.; Wu, Wen-Hsin; Rice, Daniel H.

    2018-01-01

    Campylobacter jejuni causes more than 2 million cases of gastroenteritis annually in the United States, and is also linked to the autoimmune sequela Guillain–Barré syndrome (GBS). GBS often results in flaccid paralysis, as the myelin sheaths of nerve cells are degraded by the adaptive immune response. Certain strains of C. jejuni modify their lipooligosaccharide (LOS) with the addition of neuraminic acid, resulting in LOS moieties that are structurally similar to gangliosides present on nerve cells. This can trigger GBS in a susceptible host, as antibodies generated against C. jejuni can cross-react with gangliosides, leading to demyelination of nerves and a loss of signal transduction. The goal of this study was to develop a quantitative PCR (qPCR) method and use whole genome sequencing data to detect the Campylobacter sialyltransferase (cst) genes responsible for the addition of neuraminic acid to LOS. The qPCR method was used to screen a library of 89 C. jejuni field samples collected by the Food and Drug Administration Pacific Northwest Lab (PNL) as well as clinical isolates transferred to PNL. In silico analysis was used to screen 827 C. jejuni genomes in the FDA GenomeTrakr SRA database. The results indicate that a majority of C. jejuni strains could produce LOS with ganglioside mimicry, as only 43.8% of PNL isolates and 46.9% of GenomeTrakr isolates lacked the cst genes. The methods described in this study can be used by public health laboratories to rapidly determine whether a C. jejuni isolate has the potential to induce GBS. Based on these results, a majority of C. jejuni in the PNL collection and submitted to GenomeTrakr have the potential to produce LOS that mimics human gangliosides. PMID:29615986

  9. Whole Genome Sequencing and Multiplex qPCR Methods to Identify Campylobacter jejuni Encoding cst-II or cst-III Sialyltransferase.

    PubMed

    Neal-McKinney, Jason M; Liu, Kun C; Jinneman, Karen C; Wu, Wen-Hsin; Rice, Daniel H

    2018-01-01

    Campylobacter jejuni causes more than 2 million cases of gastroenteritis annually in the United States, and is also linked to the autoimmune sequela Guillain-Barré syndrome (GBS). GBS often results in flaccid paralysis, as the myelin sheaths of nerve cells are degraded by the adaptive immune response. Certain strains of C. jejuni modify their lipooligosaccharide (LOS) with the addition of neuraminic acid, resulting in LOS moieties that are structurally similar to gangliosides present on nerve cells. This can trigger GBS in a susceptible host, as antibodies generated against C. jejuni can cross-react with gangliosides, leading to demyelination of nerves and a loss of signal transduction. The goal of this study was to develop a quantitative PCR (qPCR) method and use whole genome sequencing data to detect the Campylobacter sialyltransferase (cst) genes responsible for the addition of neuraminic acid to LOS. The qPCR method was used to screen a library of 89 C. jejuni field samples collected by the Food and Drug Administration Pacific Northwest Lab (PNL) as well as clinical isolates transferred to PNL. In silico analysis was used to screen 827 C. jejuni genomes in the FDA GenomeTrakr SRA database. The results indicate that a majority of C. jejuni strains could produce LOS with ganglioside mimicry, as only 43.8% of PNL isolates and 46.9% of GenomeTrakr isolates lacked the cst genes. The methods described in this study can be used by public health laboratories to rapidly determine whether a C. jejuni isolate has the potential to induce GBS. Based on these results, a majority of C. jejuni in the PNL collection and submitted to GenomeTrakr have the potential to produce LOS that mimics human gangliosides.

  10. Alignment-free genome tree inference by learning group-specific distance metrics.

    PubMed

    Patil, Kaustubh R; McHardy, Alice C

    2013-01-01

    Understanding the evolutionary relationships between organisms is vital for their in-depth study. Gene-based methods, which are not without drawbacks, are often used to infer such relationships. Because of the ever-increasing number of available genomes, one can now attempt to use genome-scale information instead; this opportunity also presents a challenge in terms of computational efficiency. Two fundamentally different approaches are often employed for sequence comparison, namely alignment-based and alignment-free methods. Alignment-free methods rely on the genome signature concept and provide a computationally efficient approach that is also applicable to nonhomologous sequences. The genome signature contains evolutionary signal, as it is more similar for closely related organisms than for distantly related ones. We used genome-scale sequence information to infer taxonomic distances between organisms without additional information such as gene annotations. We propose a method to improve genome tree inference by learning specific distance metrics over the genome signature for groups of organisms with similar phylogenetic, genomic, or ecological properties. Specifically, our method learns a Mahalanobis metric for a set of genomes, using a reference taxonomy to guide the learning process. By applying this method to more than a thousand prokaryotic genomes, we showed that better distance metrics could indeed be learned for most of the 18 groups of organisms tested here. Once a group-specific metric is available, it can be used to estimate the taxonomic distances for other sequenced organisms from the group. This study also presents a large-scale comparison of 10 methods (9 alignment-free and 1 alignment-based).
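    The two ingredients named here, a genome signature and a Mahalanobis metric, can be sketched as follows. The signature is assumed to be the relative k-mer frequency vector (k is a free parameter; tetranucleotides are a common choice), and the distance is sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M. The learning step that fits M to a reference taxonomy is not shown; with M equal to the identity, the distance reduces to the Euclidean distance. Function names are illustrative.

```python
from itertools import product
from math import sqrt
from typing import List, Sequence


def kmer_signature(seq: str, k: int = 2) -> List[float]:
    """Relative frequencies of all 4**k k-mers (a simple genome signature)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other symbols
            counts[j] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def mahalanobis(x: Sequence[float], y: Sequence[float],
                M: Sequence[Sequence[float]]) -> float:
    """sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sqrt(max(0.0, sum(di * mdi for di, mdi in zip(d, Md))))
```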

  11. Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis

    PubMed Central

    Gan, Mingyu; Liu, Qingyun; Yang, Chongguang; Gao, Qian; Luo, Tao

    2016-01-01

    Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcomes in tuberculosis (TB). Traditional genotyping methods have been used to detect mixed MTB infections; however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has proven highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogeny-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from public databases. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. A mixed infection was inferred if multiple evolutionary paths were identified by mapping the SNVs of an individual sample to the phylogenomic database. In simulations, our method could specifically detect mixed infections when the sequencing depth of the minor strain was as low as 1× coverage, and when the genomic distance between the two mixed strains was as small as 16 SNVs. By applying our method to all 782 samples, we detected 47 mixed infections, 45 of which were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates. PMID:27391214
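    A simplified sketch of the two steps, not the authors' implementation: SNV sites are first split into homogeneous and heterogeneous calls by alternative-allele frequency, and the combined SNV set is then checked against branch-specific marker SNVs of sibling lineages. Support for more than one sibling lineage hints at multiple evolutionary paths, that is, a possible mixed infection. The frequency thresholds, marker table and lineage names are hypothetical.

```python
from typing import Dict, Set, Tuple


def classify_snvs(
    counts: Dict[int, Tuple[int, int]],
    hom_min: float = 0.95,  # hypothetical threshold for a homogeneous call
    het_min: float = 0.05,  # hypothetical lower bound for a heterogeneous call
) -> Tuple[Set[int], Set[int]]:
    """counts maps position -> (alt_reads, total_reads)."""
    hom: Set[int] = set()
    het: Set[int] = set()
    for pos, (alt, total) in counts.items():
        if total == 0:
            continue  # no coverage, no call
        freq = alt / total
        if freq >= hom_min:
            hom.add(pos)
        elif freq >= het_min:
            het.add(pos)
    return hom, het


def supported_lineages(snvs: Set[int], branch_markers: Dict[str, Set[int]]) -> Set[str]:
    """Sibling lineages whose branch-specific marker SNVs are all observed."""
    return {name for name, markers in branch_markers.items() if markers <= snvs}
```

Usage: `hom, het = classify_snvs(counts)`, then flag the sample as possibly mixed when `len(supported_lineages(hom | het, markers)) > 1`.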

  12. Refined annotation and assembly of the Tetrahymena thermophila genome sequence through EST analysis, comparative genomic hybridization, and targeted gap closure

    PubMed Central

    Coyne, Robert S; Thiagarajan, Mathangi; Jones, Kristie M; Wortman, Jennifer R; Tallon, Luke J; Haas, Brian J; Cassidy-Hanley, Donna M; Wiley, Emily A; Smith, Joshua J; Collins, Kathleen; Lee, Suzanne R; Couvillion, Mary T; Liu, Yifan; Garg, Jyoti; Pearlman, Ronald E; Hamilton, Eileen P; Orias, Eduardo; Eisen, Jonathan A; Methé, Barbara A

    2008-01-01

    Background Tetrahymena thermophila, a widely studied model for cellular and molecular biology, is a binucleated single-celled organism with a germline micronucleus (MIC) and a somatic macronucleus (MAC). The recent draft MAC genome assembly revealed low sequence repetitiveness, a result of the epigenetic removal of invasive DNA elements found only in the MIC genome. Such low repetitiveness makes complete closure of the MAC genome a feasible goal; achieving it would require standard closure methods as well as removal of minor MIC contamination from the MAC genome assembly. Highly accurate preliminary annotation of Tetrahymena's coding potential was hindered by the lack of both comparative genomic sequence information from close relatives and significant amounts of cDNA evidence, limiting the value of the genomic information and leaving certain questions unanswered, such as the frequency of alternative splicing. Results We addressed the problem of MIC contamination using comparative genomic hybridization with purified MIC and MAC DNA probes against a whole genome oligonucleotide microarray, allowing the identification of 763 genome scaffolds likely to contain MIC-limited DNA sequences. We also employed standard genome closure methods to essentially finish over 60% of the MAC genome. To improve the annotation, we sequenced and analyzed over 60,000 verified EST reads from a variety of cellular growth and development conditions. Using this EST evidence, a combination of automated and manual reannotation efforts led to updates that affect 16% of the current protein-coding gene models. By comparing EST abundance, many genes showing apparent differential expression between these conditions were identified. Rare instances of alternative splicing and uses of the non-standard amino acid selenocysteine were also identified. Conclusion We report here significant progress in genome closure and reannotation of Tetrahymena thermophila. 
Our experience to date suggests that complete closure of the MAC genome is attainable. Using the new EST evidence, automated and manual curation has resulted in substantial improvements to more than 24,000 gene models, which will be valuable to researchers studying this model organism as well as for comparative genomics purposes. PMID:19036158

  13. Plant-RRBS, a bisulfite and next-generation sequencing-based methylome profiling method enriching for coverage of cytosine positions.

    PubMed

    Schmidt, Martin; Van Bel, Michiel; Woloszynska, Magdalena; Slabbinck, Bram; Martens, Cindy; De Block, Marc; Coppens, Frederik; Van Lijsebettens, Mieke

    2017-07-06

    Cytosine methylation in plant genomes is important for the regulation of gene transcription and transposon activity. Genome-wide methylomes are studied in DNA methyltransferase mutants, during adaptation to environmental stresses, or during development. However, from basic biology to breeding programs, there is a need to monitor multiple samples to determine transgenerational methylation inheritance or differential cytosine methylation. Methylome data obtained by sodium hydrogen sulfite (bisulfite) conversion and next-generation sequencing (NGS) provide genome-wide information on cytosine methylation. However, a profiling method that detects cytosine methylation states dispersed over the genome would allow high-throughput analysis of multiple plant samples with distinct epigenetic signatures. We used specific restriction endonucleases to enrich for cytosine coverage in a bisulfite and NGS-based profiling method, which was compared to whole-genome bisulfite sequencing of the same plant material. We established an effective methylome profiling method in plants, termed plant-reduced representation bisulfite sequencing (plant-RRBS), using optimized double restriction endonuclease digestion, fragment end repair and adapter ligation, followed by bisulfite conversion, PCR amplification and NGS. We report a well-performing laboratory protocol and a straightforward bioinformatics data analysis pipeline for plant-RRBS, applicable to any reference-sequenced plant species. As a proof of concept, methylome profiling was performed using an Oryza sativa ssp. indica pure breeding line and a derived epigenetically altered line (epiline). Plant-RRBS detects methylation levels at tens of millions of cytosine positions deduced from bisulfite conversion in multiple samples. To evaluate the method, the coverage of cytosine positions, the intra-line similarity and the differential cytosine methylation levels between the pure breeding line and the epiline were determined. 
Within a group of five biological replicates of a line, plant-RRBS reproducibly covers up to one-fourth of the cytosine positions in the rice genome when MspI-DpnII is used. The method predominantly detects cytosine methylation in putative promoter regions and unannotated regions in rice. Plant-RRBS offers high-throughput, broad, genome-dispersed methylation detection by generating effective read numbers from reproducibly covered genome fractions using optimized endonuclease combinations, facilitating comparative multi-sample studies of cytosine methylation and transgenerational stability in experimental material and plant breeding populations.
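    The reduced representation comes from double restriction digestion followed by size selection. As an illustrative sketch, the code below digests a sequence in silico with MspI (cuts C^CGG) and DpnII (cuts ^GATC), using top-strand cut positions only, and then keeps fragments within a size window. The function names and the default size limits are hypothetical placeholders, not the values used in plant-RRBS.

```python
import re
from typing import List, Tuple

# Recognition site and top-strand cut offset: MspI C^CGG, DpnII ^GATC.
ENZYMES = {"MspI": ("CCGG", 1), "DpnII": ("GATC", 0)}


def digest(seq: str, enzymes: Tuple[str, ...] = ("MspI", "DpnII")) -> List[Tuple[int, int]]:
    """Half-open (start, end) fragments produced by cutting at every site."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # Lookahead so that overlapping recognition sites are all found.
        for m in re.finditer("(?=%s)" % site, seq):
            cuts.add(m.start() + offset)
    bounds = [0] + sorted(c for c in cuts if 0 < c < len(seq)) + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:])]


def size_select(fragments: List[Tuple[int, int]], min_len: int = 40, max_len: int = 300):
    """Keep fragments within a (hypothetical) library size window."""
    return [(a, b) for a, b in fragments if min_len <= b - a <= max_len]
```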

  14. CRISPR mediated somatic cell genome engineering in the chicken.

    PubMed

    Véron, Nadège; Qu, Zhengdong; Kipen, Phoebe A S; Hirst, Claire E; Marcelle, Christophe

    2015-11-01

    Gene-targeted knockout technologies are invaluable tools for understanding the functions of genes in vivo. The CRISPR/Cas9 system of RNA-guided genome editing is revolutionizing genetics research in a wide spectrum of organisms. Here, we combined CRISPR with in vivo electroporation in the chicken embryo to efficiently target the transcription factor PAX7 in tissues of the developing embryo. This approach generated mosaic genetic mutations within a wild-type cellular background. This series of proof-of-principle experiments indicates that in vivo CRISPR-mediated somatic cell genome engineering is an effective method to achieve gene loss-of-function in the tissues of the chicken embryo, and it adds to the growing genetic toolbox for studying the molecular mechanisms regulating development in this important animal model. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Does genomic selection have a future in plant breeding?

    PubMed

    Jonas, Elisabeth; de Koning, Dirk-Jan

    2013-09-01

    Plant breeding relies largely on phenotypic selection in plots and uses genetic markers only for some traits, often those related to disease resistance. The more recently developed concept of genomic selection, a black-box approach that requires no prior knowledge of the effect or function of individual markers, has also been proposed as a great opportunity for plant breeding. Several empirical and theoretical studies have focused on the possibility of implementing it as a novel molecular method across various species. Although we do not question the potential of genomic selection in general, in this Opinion we emphasize that genomic selection approaches from dairy cattle breeding cannot easily be applied to complex plant breeding. Copyright © 2013 Elsevier Ltd. All rights reserved.
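    To make the black-box idea concrete, a common form of genomic selection is ridge-regression estimation of marker effects (as in rrBLUP), in which every marker effect is shrunk by the same penalty without prior knowledge of its function. The sketch below solves the normal equations (X'X + λI)b = X'y by Gaussian elimination and scores an individual as the sum of its marker codes times the estimated effects. It assumes numerically coded markers (e.g. -1/0/1), centered phenotypes with no intercept, and a user-chosen λ; it is a generic illustration, not a method taken from this Opinion.

```python
from typing import List, Sequence


def ridge_marker_effects(X: Sequence[Sequence[float]], y: Sequence[float],
                         lam: float) -> List[float]:
    """Solve (X'X + lam*I) b = X'y for marker effects b.

    X has one row per individual and one column per marker; y is assumed
    to be centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    # Normal equations with a ridge penalty on the diagonal.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    rhs = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (rhs[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


def genomic_breeding_value(markers: Sequence[float], effects: Sequence[float]) -> float:
    """Genomic estimated breeding value: marker codes times estimated effects."""
    return sum(m * e for m, e in zip(markers, effects))
```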

  16. cisprimertool: software to implement a comparative genomics strategy for the development of conserved intron scanning (CIS) markers.

    PubMed

    Jayashree, B; Jagadeesh, V T; Hoisington, D

    2008-05-01

    The availability of complete, annotated genomic sequence information in model organisms is a rich resource that can be extended to understudied orphan crops through comparative genomic approaches. We report here a software tool (cisprimertool) for the identification of conserved intron scanning regions using expressed sequence tag alignments to a completely sequenced model crop genome. The method is based on earlier studies that assessed conserved intron scanning primers (CISP) designed within relatively conserved exons located near exon-intron boundaries, identified from alignments of onion, banana, sorghum and pearl millet sequences with rice. The tool is freely available to academic users at http://www.icrisat.org/gt-bt/CISPTool.htm. © 2007 ICRISAT.
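    The core step, locating exon-intron boundaries from EST alignments, can be sketched as follows. Assuming an EST has been aligned to the model genome as a list of half-open aligned blocks, gaps between consecutive blocks longer than a minimum intron length are treated as introns, and the exon segments flanking each intron mark where CISP-style primers would be placed. The block format, function names, minimum intron length and window size are hypothetical.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # aligned segment on the genome, half-open [start, end)


def introns_from_blocks(blocks: List[Block], min_intron: int = 60) -> List[Block]:
    """Gaps between consecutive aligned blocks that are long enough to be introns."""
    ordered = sorted(blocks)
    return [
        (left_end, right_start)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
        if right_start - left_end >= min_intron  # shorter gaps treated as alignment gaps
    ]


def primer_windows(blocks: List[Block], introns: List[Block], window: int = 30):
    """Exon segments within `window` bp on either side of each intron."""
    out = []
    for intron_start, intron_end in introns:
        left = next(b for b in blocks if b[1] == intron_start)
        right = next(b for b in blocks if b[0] == intron_end)
        out.append(((max(left[0], intron_start - window), intron_start),
                    (intron_end, min(right[1], intron_end + window))))
    return out
```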

  17. CscoreTool: fast Hi-C compartment analysis at high resolution.

    PubMed

    Zheng, Xiaobin; Zheng, Yixian

    2018-05-01

    Genome-wide chromosome conformation capture (Hi-C) has revealed that the eukaryotic genome can be partitioned into A and B compartments that have distinctive chromatin and transcription features. The current principal component analysis (PCA)-based method for predicting A/B compartments from Hi-C data requires substantial CPU time and memory. We report the development of a method, CscoreTool, which enables fast and memory-efficient determination of A/B compartments at high resolution, even in datasets with low sequencing depth. Availability: https://github.com/scoutzxb/CscoreTool. Contact: xzheng@carnegiescience.edu. Supplementary data are available at Bioinformatics online.
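    CscoreTool's own model is not described in the abstract. For contrast, the sketch below implements the PCA-based approach the abstract refers to, in its commonly described form: observed/expected normalization by genomic distance, a Pearson correlation matrix between bins, and the first principal component found by power iteration, with its arbitrary sign oriented by a per-bin feature such as gene density. The orientation track and function names are assumptions. The dense n-by-n correlation step illustrates why this approach becomes costly in CPU time and memory at high resolution.

```python
from math import sqrt
from typing import List, Sequence

Matrix = List[List[float]]


def observed_over_expected(mat: Sequence[Sequence[float]]) -> Matrix:
    """Divide each contact by the mean contact at the same genomic distance."""
    n = len(mat)
    oe = [[0.0] * n for _ in range(n)]
    for d in range(n):
        diag = [mat[i][i + d] for i in range(n - d)]
        expected = sum(diag) / len(diag)
        for i in range(n - d):
            value = mat[i][i + d] / expected if expected > 0 else 0.0
            oe[i][i + d] = oe[i + d][i] = value
    return oe


def row_correlations(mat: Sequence[Sequence[float]]) -> Matrix:
    """Pearson correlation between every pair of rows."""
    unit_rows = []
    for row in mat:
        mean = sum(row) / len(row)
        centered = [x - mean for x in row]
        norm = sqrt(sum(x * x for x in centered)) or 1.0
        unit_rows.append([x / norm for x in centered])
    return [[sum(a * b for a, b in zip(r, s)) for s in unit_rows] for r in unit_rows]


def leading_eigenvector(mat: Sequence[Sequence[float]], iters: int = 500) -> List[float]:
    """Power iteration; a correlation matrix is PSD, so this converges to PC1."""
    n = len(mat)
    v = [float(i + 1) for i in range(n)]  # non-uniform start vector
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v


def call_compartments(contacts: Sequence[Sequence[float]],
                      gene_density: Sequence[float]) -> List[str]:
    pc1 = leading_eigenvector(row_correlations(observed_over_expected(contacts)))
    mean_density = sum(gene_density) / len(gene_density)
    # The eigenvector sign is arbitrary; orient PC1 to correlate positively with density.
    if sum(p * (g - mean_density) for p, g in zip(pc1, gene_density)) < 0:
        pc1 = [-p for p in pc1]
    return ["A" if p > 0 else "B" for p in pc1]
```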

  18. "Harnessing genomics to improve health in Africa" - an executive course to support genomics policy.

    PubMed

    Smith, Alyna C; Mugabe, John; Singer, Peter A; Daar, Abdallah S

    2005-01-24

    BACKGROUND: Africa in the twenty-first century faces a heavy burden of disease, combined with ill-equipped medical systems and underdeveloped technological capacity. A major challenge for the international community is to bring scientific and technological advances like genomics to bear on the health priorities of poorer countries. The New Partnership for Africa's Development has identified science and technology as a key platform for Africa's renewal. Recognizing the timeliness of this issue, the African Centre for Technology Studies and the University of Toronto Joint Centre for Bioethics co-organized a course on Genomics and Public Health Policy in Nairobi, Kenya, the first of a series of similar courses to take place in the developing world. This article presents the findings and recommendations that emerged from this process, recommendations which suggest that a regional approach to developing sound science and technology policies is the key to harnessing genome-related biotechnology to improve health and contribute to human development in Africa. METHODS: The objectives of the course were to familiarize participants with the current status and implications of genomics for health in Africa; to provide frameworks for analyzing and debating the policy and ethical questions; and to begin developing a network across different sectors by sharing perspectives and building relationships. To achieve these goals, the course brought together a diverse group of stakeholders from academic research centres, the media, and non-governmental, voluntary and legal organizations to stimulate multi-sectoral debate around issues of policy. Topics included scientific advances in genomics, innovation systems and business models, international regulatory frameworks, and ethical and legal issues. 
RESULTS: Seven main recommendations emerged: establish a network for sustained dialogue among participants; identify champions among politicians; use the New Partnership for Africa's Development (NEPAD) as an entry point onto the political agenda; commission an African capacity survey in genomics-related R&D to determine areas of strength; undertake a detailed study of R&D models with demonstrated success in the developing world, such as those of China, India, Cuba and Brazil; establish seven regional research centres of excellence; and create sustainable financing mechanisms. A concrete outcome of this intensive five-day course was the establishment of the African Genome Policy Forum, a multi-stakeholder forum to foster further discussion on policy. CONCLUSION: With African leaders engaged in the New Partnership for Africa's Development, science and technology are well poised to play a valuable role in Africa's renewal by contributing to economic development and improved health. Africa's first course on Genomics and Public Health Policy aimed to bring this issue to the forefront of the policy debate, focusing on genomics through the lens of public health. The process that led to this course has served as a model for three subsequent courses (in India, Venezuela and Oman) and for the establishment of similar regional networks on genomics and policy, which could form the basis for inter-regional dialogue in the future.

  19. Prediction of Chemical Respiratory Sensitizers Using GARD, a Novel In Vitro Assay Based on a Genomic Biomarker Signature

    PubMed Central

    Albrekt, Ann-Sofie; Borrebaeck, Carl A. K.; Lindstedt, Malin

    2015-01-01

    Background Repeated exposure to certain low molecular weight (LMW) chemical compounds may result in the development of allergic reactions in the skin or in the respiratory tract. In most cases, a given LMW compound selectively sensitizes either the skin, giving rise to allergic contact dermatitis (ACD), or the respiratory tract, giving rise to occupational asthma (OA). To limit the occurrence of allergic diseases, efforts are currently being made to develop predictive assays that accurately identify chemicals capable of inducing such reactions. However, while a few promising methods for the prediction of skin sensitization have been described, to date no validated method, in vitro or in vivo, exists that is able to accurately classify chemicals as respiratory sensitizers. Results Recently, we presented the in vitro based Genomic Allergen Rapid Detection (GARD) assay as a novel testing strategy for the classification of skin-sensitizing chemicals based on measurement of a genomic biomarker signature. We have expanded the applicability domain of the GARD assay to also classify respiratory sensitizers by identifying a separate biomarker signature containing 389 genes differentially regulated between respiratory sensitizers and non-respiratory sensitizers. Using an independent data set in combination with supervised machine learning, we validated the assay, showing that the identified genomic biomarker signature accurately classifies respiratory sensitizers. Conclusions We have identified a genomic biomarker signature for classification of respiratory sensitizers. By combining this newly identified signature with our previously identified signature for classification of skin sensitizers, we have developed a novel in vitro testing strategy with a strong ability to predict both skin and respiratory sensitization in the same sample. PMID:25760038

  20. A genome-wide scan for signatures of selection in Chinese indigenous and commercial pig breeds.

    PubMed

    Yang, Songbai; Li, Xiuling; Li, Kui; Fan, Bin; Tang, Zhonglin

    2014-01-15

    Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic variation of different breeds. China has many indigenous pig breeds whose morphology and production performance differ from those of foreign commercial pig breeds. However, the signatures of selection on genes implicated in economic traits between Chinese indigenous and commercial pigs remain poorly understood. We identified footprints of positive selection at the whole-genome level using 44,652 SNPs genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical genome-wide distribution of Fst (F-statistics) was constructed from Fst estimates for each SNP across these nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 candidate genes show strong evidence of positive selection. Furthermore, network analyses showed that the genes displaying evidence of positive selection were mainly involved in the development of tissues and organs and in the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern VS Southern groups, respectively. In this study, we identified for the first time the genomic regions that show evidence of selection between Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions are involved in the development of tissues and organs, the immune response, growth and litter size. The results of this study provide new insights into the genetic variation and domestication of pigs.
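The High-Fst outlier approach described above can be illustrated with a short Python sketch. It assumes a Nei-style estimator (Fst = (H_T - H_S) / H_T, with breeds weighted equally) and flags SNPs at or above an empirical quantile of the genome-wide distribution. The abstract does not state which Fst estimator or cutoff the authors used, so the function names, the estimator and the 0.99 default quantile are illustrative assumptions.

```python
from statistics import mean

def snp_fst(freqs):
    """Nei-style Fst (G_ST) for one biallelic SNP from per-breed allele frequencies.

    freqs: frequency of the same allele in each breed; breeds are weighted equally
    (an assumption of this sketch).
    """
    p_bar = mean(freqs)
    h_t = 2 * p_bar * (1 - p_bar)               # expected heterozygosity of the pooled sample
    if h_t == 0:
        return 0.0                              # monomorphic SNP: no differentiation to measure
    h_s = mean(2 * p * (1 - p) for p in freqs)  # mean within-breed expected heterozygosity
    return (h_t - h_s) / h_t

def high_fst_outliers(freq_table, quantile=0.99):
    """IDs of SNPs whose Fst is at or above the given quantile of the genome-wide distribution."""
    fst = {snp: snp_fst(freqs) for snp, freqs in freq_table.items()}
    ranked = sorted(fst.values())
    cutoff = ranked[min(int(quantile * len(ranked)), len(ranked) - 1)]
    return sorted(snp for snp, value in fst.items() if value >= cutoff)
```

With this estimator, Fst ranges from 0 (identical allele frequencies in every breed) to 1 (alternative alleles fixed in different breeds), and the empirical quantile plays the role of the genome-wide outlier threshold.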

  1. A genome-wide scan for signatures of selection in Chinese indigenous and commercial pig breeds

    PubMed Central

    2014-01-01

    Background Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic variation of different breeds. China has many indigenous pig breeds whose morphology and production performance differ from those of foreign commercial pig breeds. However, the signatures of selection on genes implicated in economic traits between Chinese indigenous and commercial pigs remain poorly understood. Results We identified footprints of positive selection at the whole-genome level using 44,652 SNPs genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical genome-wide distribution of Fst (F-statistics) was constructed from Fst estimates for each SNP across these nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 candidate genes show strong evidence of positive selection. Furthermore, network analyses showed that the genes displaying evidence of positive selection were mainly involved in the development of tissues and organs and in the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern VS Southern groups, respectively. Conclusions In this study, we identified for the first time the genomic regions that show evidence of selection between Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions are involved in the development of tissues and organs, the immune response, growth and litter size. The results of this study provide new insights into the genetic variation and domestication of pigs. PMID:24422716

  2. Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data

    PubMed Central

    2017-01-01

    Mapping gene expression as a quantitative trait using whole-genome sequencing and transcriptome analysis makes it possible to discover the functional consequences of genetic variation. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors; it improves on current methods by taking hidden confounders and weak regulations into consideration. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr. PMID:28821014

  3. Whole-genome single-nucleotide polymorphism (SNP) marker discovery and association analysis with the eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content in Larimichthys crocea

    PubMed Central

    Xiao, Shijun; Wang, Panpan; Dong, Linsong; Zhang, Yaguang; Han, Zhaofang; Wang, Qiurong

    2016-01-01

    Whole-genome single-nucleotide polymorphism (SNP) markers are valuable genetic resources for association and conservation studies. Genome-wide SNP development in many teleost species is still challenging because of genome complexity and the cost of re-sequencing. Genotyping-By-Sequencing (GBS) provides an efficient reduced-representation method that lowers the cost of SNP detection; however, most recent GBS applications have been reported in plants. In this work, we applied an EcoRI-NlaIII-based GBS protocol to the large yellow croaker, a teleost that is an important commercial fish in China and East Asia, and report the first whole-genome SNP development for the species. A total of 69,845 high-quality SNP markers, evenly distributed along the genome, were detected in at least 80% of 500 individuals. Nearly 95% of randomly selected genotypes were successfully validated by Sequenom MassARRAY assay. Association studies of muscle eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content identified 39 significant SNP markers, accounting for up to ∼63% of the genetic variance explained by all markers. Functional genes involved in the fat digestion and absorption pathway were identified, such as APOB, CRAT and OSBPL10. Notably, the PPT2 gene, previously identified in an association study of plasma n-3 and n-6 polyunsaturated fatty acid levels in humans, was rediscovered in large yellow croaker. Our study verified that EcoRI-NlaIII-based GBS can produce quality SNP markers cost-efficiently in a teleost genome. The developed SNP markers and the EPA- and DHA-associated SNP loci provide valuable resources for studies of population structure, conservation genetics and genomic selection in large yellow croaker and other fish. PMID:28028455
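The marker filter mentioned in the abstract (SNPs retained only if detected in at least 80% of individuals) can be expressed as a simple call-rate filter. This Python sketch assumes genotypes coded as alternate-allele dosages (0, 1, 2) with None for missing calls; the function names and the extra polymorphism check are illustrative assumptions, not the authors' pipeline.

```python
def call_rate(calls):
    """Fraction of individuals with a non-missing genotype (None marks a missing call)."""
    return sum(g is not None for g in calls) / len(calls)

def filter_snps(genotypes, min_call_rate=0.8):
    """Keep SNPs called in >= min_call_rate of individuals and polymorphic among the calls.

    genotypes: dict of SNP ID -> list of dosages (copies of the alternate allele) or None.
    """
    kept = {}
    for snp, calls in genotypes.items():
        observed = [g for g in calls if g is not None]
        alt_alleles = sum(observed)
        # Polymorphic if neither allele is absent among the called genotypes.
        polymorphic = 0 < alt_alleles < 2 * len(observed)
        if call_rate(calls) >= min_call_rate and polymorphic:
            kept[snp] = calls
    return kept
```

For example, a SNP called in 4 of 5 individuals passes the default 0.8 threshold, while one called in 3 of 5 is dropped.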

  4. Salivary biomarker development using genomic, proteomic and metabolomic approaches

    PubMed Central

    2012-01-01

    The use of saliva as a diagnostic sample provides a non-invasive, cost-efficient method of sample collection for disease screening without the need for highly trained professionals. Saliva collection is far more practical and safer than invasive methods of sample collection such as blood sampling, which carry an infection risk from contaminated needles. Furthermore, the use of saliva could increase the availability of accurate diagnostics for remote and impoverished regions. However, the development of salivary diagnostics has required technical innovation to allow stabilization and detection of analytes in the complex molecular mixture that is saliva. The recent development of cost-effective room-temperature analyte stabilization methods, nucleic acid pre-amplification techniques and direct saliva transcriptomic analysis has allowed accurate detection and quantification of transcripts found in saliva. Novel protein stabilization methods have also facilitated improved proteomic analyses. Although candidate biomarkers have been discovered using epigenetic, transcriptomic, proteomic and metabolomic approaches, transcriptomic analyses have so far made the most progress in terms of sensitivity, specificity and movement towards clinical implementation. Here, we review recent developments in salivary diagnostics that have been accomplished using genomic, transcriptomic, proteomic and metabolomic approaches. PMID:23114182

  5. Genetic markers, genotyping methods & next generation sequencing in Mycobacterium tuberculosis

    PubMed Central

    Desikan, Srinidhi; Narayanan, Sujatha

    2015-01-01

    Molecular epidemiology (ME) is one of the main areas of tuberculosis research and is widely used to study the transmission, epidemics and outbreaks of tubercle bacilli. It exploits the various polymorphisms in the bacterial genome, which can serve as genetic markers. Many DNA typing methods apply these markers to differentiate strains and to study the evolutionary relationships between them. The three most widely used genotyping tools for differentiating Mycobacterium tuberculosis strains are IS6110 restriction fragment length polymorphism (RFLP), spacer oligotyping (Spoligotyping), and mycobacterial interspersed repetitive units - variable number of tandem repeats (MIRU-VNTR). A new prospect for ME emerged with the development of whole genome sequencing (WGS) and next generation sequencing (NGS) methods, in which the entire genome is sequenced; this not only helps pinpoint minute differences between sequences but also saves time and cost. NGS has also proved useful for identifying single nucleotide polymorphisms (SNPs), for comparative genomics and for studying various aspects of transmission dynamics. These techniques enable the identification of mycobacterial strains and also facilitate the study of their phylogenetic and evolutionary traits. PMID:26205019
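Because WGS distinguishes isolates by individual SNPs, a common downstream step is to count pairwise SNP differences and group closely related isolates. The Python sketch below assumes pre-aligned, equal-length genome sequences, ignores N and gap positions, and links isolates by single linkage; the threshold is a user-chosen, hypothetical parameter, not a value from the abstract.

```python
from itertools import combinations

def snp_distance(a, b):
    """Count positions where two aligned sequences differ, skipping N and gap ('-') sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b) if x not in "N-" and y not in "N-")

def transmission_clusters(seqs, threshold):
    """Single-linkage clusters: two isolates are joined if within `threshold` SNPs."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        if snp_distance(s1, s2) <= threshold:
            parent[find(n1)] = find(n2)
    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Single linkage is a deliberately simple choice here: an isolate joins a cluster if it is close to any member, which mirrors how putative transmission chains are often screened before formal phylogenetic analysis.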

  6. Single-Cell-Based Platform for Copy Number Variation Profiling through Digital Counting of Amplified Genomic DNA Fragments.

    PubMed

    Li, Chunmei; Yu, Zhilong; Fu, Yusi; Pang, Yuhong; Huang, Yanyi

    2017-04-26

    We develop a novel single-cell-based platform that digitally counts amplified genomic DNA fragments, named multifraction amplification (mfA), to detect copy number variations (CNVs) in a single cell. Amplification is required to acquire genomic information from a single cell, but it introduces unavoidable bias. Unlike prevalent methods that infer CNV profiles directly from the pattern of sequencing depth, our mfA platform denatures the DNA molecules from a single cell and separates them into multiple fractions of a reaction mix before amplification. By examining the sequencing result of each fraction for a specific fragment and applying a segment-merge maximum likelihood algorithm to calculate copy number, we digitize sequencing-depth-based CNV identification and thus provide a method that is less sensitive to amplification bias. In this paper, we demonstrate an mfA platform using multiple displacement amplification (MDA) chemistry. With the mfA platform, the noise of MDA is reduced, so the resolution of single-cell CNV identification can be improved to 100 kb. The mfA platform can also determine the genomic regions free of allelic drop-out, which is impossible with conventional single-cell amplification methods.
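The digital-counting idea can be illustrated with a simple occupancy model. If c molecules of a fragment fall independently and uniformly into n fractions, the expected number of empty fractions is n(1 - 1/n)^c, so observing k positive fractions gives the moment estimate c = ln(1 - k/n) / ln(1 - 1/n). This Python sketch assumes that model; it is not the authors' segment-merge maximum likelihood algorithm, and the function name is hypothetical.

```python
import math

def estimate_copies(positive_fractions, n_fractions):
    """Moment estimate of molecule count from how many fractions contain a fragment.

    Assumes each of c molecules lands independently and uniformly in one of
    n_fractions, so E[#empty fractions] = n * (1 - 1/n) ** c.
    """
    if n_fractions < 2:
        raise ValueError("need at least two fractions")
    if not 0 <= positive_fractions <= n_fractions:
        raise ValueError("positive_fractions must be between 0 and n_fractions")
    if positive_fractions == n_fractions:
        return math.inf  # every fraction is positive: the count is saturated
    empty_share = 1 - positive_fractions / n_fractions
    return math.log(empty_share) / math.log(1 - 1 / n_fractions)
```

The estimate is undefined once every fraction is positive, which is one reason the number of fractions has to be large relative to the expected copy number for counting to stay informative.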

  7. Scalable and Versatile Genome Editing Using Linear DNAs with Microhomology to Cas9 Sites in Caenorhabditis elegans

    PubMed Central

    Paix, Alexandre; Wang, Yuemeng; Smith, Harold E.; Lee, Chih-Yung S.; Calidas, Deepika; Lu, Tu; Smith, Jarrett; Schmidt, Helen; Krause, Michael W.; Seydoux, Geraldine

    2014-01-01

    Homology-directed repair (HDR) of double-strand DNA breaks is a promising method for genome editing, but is thought to be less efficient than error-prone nonhomologous end joining in most cell types. We have investigated HDR of double-strand breaks induced by CRISPR-associated protein 9 (Cas9) in Caenorhabditis elegans. We find that HDR is very robust in the C. elegans germline. Linear repair templates with short (∼30–60 bases) homology arms support the integration of base and gene-sized edits with high efficiency, bypassing the need for selection. Based on these findings, we developed a systematic method to mutate, tag, or delete any gene in the C. elegans genome without the use of co-integrated markers or long homology arms. We generated 23 unique edits at 11 genes, including premature stops, whole-gene deletions, and protein fusions to antigenic peptides and GFP. Whole-genome sequencing of five edited strains revealed the presence of passenger variants, but no mutations at predicted off-target sites. The method is scalable for multi-gene editing projects and could be applied to other animals with an accessible germline. PMID:25249454
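The repair-template design described above (short homology arms flanking a Cas9 cut) can be sketched in Python. It assumes SpCas9 with an NGG PAM and a protospacer on the given strand, uses the standard SpCas9 cut position 3 bp upstream of the PAM, and defaults to 35-base arms chosen from the ∼30-60 base range in the abstract. The function names are hypothetical, and the sketch neither mutates the PAM nor searches the reverse strand, which real designs often need.

```python
def cas9_cut_site(target, protospacer):
    """Index of the SpCas9 blunt cut for a top-strand protospacer followed by an NGG PAM."""
    start = target.find(protospacer)
    if start < 0:
        raise ValueError("protospacer not found on the given strand")
    end = start + len(protospacer)
    pam = target[end:end + 3]
    if len(pam) != 3 or pam[1:] != "GG":
        raise ValueError("protospacer is not followed by an NGG PAM")
    return end - 3  # SpCas9 cuts 3 bp upstream of the PAM

def repair_template(target, protospacer, insert, arm=35):
    """Linear donor: `arm` bases of homology on each side of the cut, `insert` in between."""
    cut = cas9_cut_site(target, protospacer)
    if cut < arm or cut + arm > len(target):
        raise ValueError("not enough flanking sequence for the requested arm length")
    return target[cut - arm:cut] + insert + target[cut:cut + arm]
```

Placing the insert exactly at the cut site keeps both arms immediately adjacent to the break, which is the configuration short-arm linear templates rely on.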

  8. Development of the first consensus genetic map of intermediate wheatgrass (Thinopyrum intermedium) using genotyping-by-sequencing.

    PubMed

    Kantarski, Traci; Larson, Steve; Zhang, Xiaofei; DeHaan, Lee; Borevitz, Justin; Anderson, James; Poland, Jesse

    2017-01-01

    Development of the first consensus genetic map of intermediate wheatgrass gives insight into the genome and tools for molecular breeding. Intermediate wheatgrass (Thinopyrum intermedium) has been identified as a candidate for domestication and improvement as a perennial grain, forage, and biofuel crop and is actively being improved by several breeding programs. To accelerate this process using genomics-assisted breeding, efficient genotyping methods and genetic marker reference maps are needed. We present here the first consensus genetic map for intermediate wheatgrass (IWG), which confirms the species' allohexaploid nature (2n = 6x = 42) and homology to Triticeae genomes. Genotyping-by-sequencing was used to identify markers that fit expected segregation ratios and construct genetic maps for 13 heterogeneous parents of seven full-sib families. These maps were then integrated using a linear programming method to produce a consensus map with 21 linkage groups containing 10,029 markers, 3601 of which were present in at least two populations. Each of the 21 linkage groups contained between 237 and 683 markers, cumulatively covering 5061 cM (2891 cM--Kosambi) with an average distance of 0.5 cM between each pair of markers. Through mapping the sequence tags to the diploid (2n = 2x = 14) barley reference genome, we observed high colinearity and synteny between these genomes, with three homoeologous IWG chromosomes corresponding to each of the seven barley chromosomes, and mapped translocations that are known in the Triticeae. The consensus map is a valuable tool for wheat breeders to map important disease-resistance genes within intermediate wheatgrass. These genomic tools can help lead to rapid improvement of IWG and development of high-yielding cultivars of this perennial grain that would facilitate the sustainable intensification of agricultural systems.
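The consensus map is reported with two totals (5061 cM, and 2891 cM labeled Kosambi), which illustrates how the same recombination fractions translate into different map distances under different mapping functions. The abstract names only the Kosambi function, so this Python sketch implements the standard Haldane and Kosambi functions side by side; it does not claim which function produced the larger figure, and the helper names are illustrative.

```python
import math

def haldane_cm(r):
    """Haldane map distance in cM for recombination fraction r (assumes no interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50 * math.log(1 - 2 * r)

def kosambi_cm(r):
    """Kosambi map distance in cM (allows for positive crossover interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25 * math.log((1 + 2 * r) / (1 - 2 * r))

def map_length(interval_rs, mapping_function):
    """Total map length from recombination fractions between adjacent markers."""
    return sum(mapping_function(r) for r in interval_rs)
```

For r = 0.1, Haldane gives about 11.16 cM and Kosambi about 10.14 cM; for small r both approach 100r, so differences accumulate mainly across larger intervals.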

  9. Reconstruction of 24 Penicillium genome-scale metabolic models shows diversity based on their secondary metabolism.

    PubMed

    Prigent, Sylvain; Nielsen, Jens Christian; Frisvad, Jens Christian; Nielsen, Jens

    2018-06-05

    Modelling of metabolism at the genome scale has proved to be an efficient method for explaining observed phenotypic traits in living organisms. Further, it can be used to predict the effect of genetic modifications, e.g. for the development of microbial cell factories. With the increasing amount of genome sequencing data available, there is a need to accurately and efficiently generate such genome-scale metabolic models (GEMs) for non-model organisms, for which data are sparse. In this study, we present an automatic reconstruction approach applied to 24 Penicillium species, which have potential for the production of pharmaceutical secondary metabolites or are used in the manufacture of food products such as cheeses. The models were based on the MetaCyc database and a previously published Penicillium GEM, and gave rise to comprehensive genome-scale metabolic descriptions. The models showed that while central carbon metabolism is highly conserved, secondary metabolic pathways represent the main diversity among the species. The automatic reconstruction approach presented in this study can be applied to generate GEMs of other understudied organisms, and the developed GEMs are a useful resource for the study of Penicillium metabolism, for example with the aim of developing novel cell factories. This article is protected by copyright. All rights reserved.

  10. 04-ERD-052-Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Loots, G G; Ovcharenko, I; Collette, N

    2007-02-26

    Generating the sequence of the human genome represents a colossal achievement for science and mankind. The technical use of human genome project information holds great promise to cure disease, prevent bioterror threats, and learn about human origins. Yet converting the sequence data into biologically meaningful information has not been immediately obvious, and we are still in the preliminary stages of understanding how the genome is organized, what its functional building blocks are, and how these sequences mediate complex biological processes. The overarching goal of this program was to develop novel methods and high-throughput strategies for determining the functions of "anonymous" human genes that are evolutionarily deeply conserved in other vertebrates. We coupled analytical tool development and computational predictions of gene function with novel high-throughput experimental strategies and tested biological predictions in the laboratory. The tools required for comparative genomic data-mining are fundamentally the same whether they are applied to scientific studies of related microbes or to the search for functions of novel human genes. For this reason, the tools, conceptual framework and coupled informatics-experimental biology paradigm we developed in this LDRD have many potential scientific applications relevant to LLNL multidisciplinary research in bio-defense, bioengineering, bionanosciences and microbial and environmental genomics.

  11. CRISPR/Cas9 mediated sequential editing of genes critical for ookinete motility in Plasmodium yoelii.

    PubMed

    Zhang, Cui; Gao, Han; Yang, Zhenke; Jiang, Yuanyuan; Li, Zhenkui; Wang, Xu; Xiao, Bo; Su, Xin-Zhuan; Cui, Huiting; Yuan, Jing

    2017-03-01

    CRISPR/Cas9 has been successfully adapted for gene editing in malaria parasites, including Plasmodium falciparum and Plasmodium yoelii. However, the reported methods were limited to editing one gene at a time. In practice, it is often desirable to modify multiple genetic loci in a parasite genome. Here we describe a CRISPR/Cas9-mediated genome editing method that allows successive modification of more than one gene in the P. yoelii genome using an improved single-vector system (pYCm) we developed previously. Selectable marker genes in the plasmid, encoding human dihydrofolate reductase (hDHFR) and a yeast bifunctional protein (yFCU) with cytosine deaminase (CD) and uridyl phosphoribosyl transferase (UPRT) activities, allowed sequential positive (pyrimethamine, Pyr) and negative (5-fluorocytosine, 5FC) selection and the generation of transgenic parasites free of the episomal plasmid after genetic modification. Using this system, we were able to efficiently tag a gene of interest (Pyp28) and subsequently disrupt two genes (Pyctrp and Pycdpk3) that are individually critical for ookinete motility. Disruption of these genes either eliminated (Pyctrp) or greatly reduced (Pycdpk3) ookinete forward motility in matrigel in vitro and completely blocked oocyst development in the mosquito midgut. The method will greatly facilitate studies of parasite gene function, development, and disease pathogenesis. Copyright © 2016 Elsevier B.V. All rights reserved.

  12. A new method for detecting signal regions in ordered sequences of real numbers, and application to viral genomic data.

    PubMed

    Gog, Julia R; Lever, Andrew M L; Skittrall, Jordan P

    2018-01-01

    We present a fast, robust and parsimonious approach to detecting signals in an ordered sequence of numbers. Our motivation was to find a suitable method that takes a sequence of scores corresponding to properties of positions in virus genomes and finds outlying regions of low scores. Suitable statistical methods that avoid complex models and extensive assumptions are surprisingly lacking. We resolve this by developing a method that detects regions of low score within sequences of real numbers. The method makes no a priori assumptions about the length of such a region; it gives the explicit location of the region and scores it statistically. It does not use detailed mechanistic models, so the method is fast and will be useful in a wide range of applications. We present our approach in detail and test it on simulated sequences. We show that it is robust to a wide range of signal morphologies and that it is able to capture multiple signals in the same sequence. Finally, we apply it to viral genomic data to identify regions of evolutionary conservation within influenza and rotavirus.
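The authors' statistical method is not detailed in the abstract. As a point of comparison, a classic way to locate the single most pronounced low-scoring stretch without fixing its length in advance is a maximum-subarray (Kadane) scan over threshold minus score. This Python sketch shows that swapped-in technique; the threshold is an assumed user parameter and, unlike the published method, the sketch attaches no statistical significance to the region it reports.

```python
def lowest_scoring_region(scores, threshold):
    """Contiguous region where scores fall furthest below `threshold` in total.

    Kadane's maximum-subarray scan on (threshold - score). Returns
    (start, end_exclusive, excess) or None if no score is below the threshold.
    """
    best, best_span = 0.0, None
    running, start = 0.0, 0
    for i, s in enumerate(scores):
        running += threshold - s
        if running <= 0:
            running, start = 0.0, i + 1  # region so far is not below threshold; restart after i
        elif running > best:
            best, best_span = running, (start, i + 1)
    return None if best_span is None else (*best_span, best)
```

To look for several regions, one could mask the reported region and rescan, although that gives no control over false positives.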

  13. Privacy-preserving techniques of genomic data: a survey.

    PubMed

    Aziz, Md Momin Al; Sadat, Md Nazmus; Alhadidi, Dima; Wang, Shuang; Jiang, Xiaoqian; Brown, Cheryl L; Mohammed, Noman

    2017-11-07

    Genomic data hold salient information about the characteristics of a living organism. Over the past decade, major developments have given us more accurate and less expensive methods to retrieve human genome sequences. However, with the advancement of genomic research, there is growing privacy concern regarding the collection, storage and analysis of such sensitive human data. Recent results show that, given some background information, an adversary can reidentify an individual from a specific genomic data set. This can reveal the individual's current association with, or future susceptibility to, some diseases (and sometimes the kinship between individuals), resulting in a privacy violation. Despite these risks, genomic data remain important for analyzing the well-being of present and future generations. Thus, in this article, we discuss the different privacy and security problems surrounding human genomic data. In addition, we explore some of the key cryptographic concepts that can enable secure and private genomic data computation. This article aims to bridge the gap between these two research areas: cryptography and genomics. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
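As an illustration of the kind of cryptographic building block such surveys cover, additive secret sharing lets several data holders compute a total (for example, a combined allele count) without revealing their individual counts. This Python sketch is a minimal honest-but-curious model with an assumed prime modulus; it is not taken from the survey and omits networking, authentication and protection against malicious parties.

```python
import secrets

MODULUS = 2**61 - 1  # assumed prime modulus, comfortably larger than any summed count

def share(value, n_parties):
    """Split `value` into n additive shares mod MODULUS; any n-1 of them reveal nothing."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values, n_parties=3):
    """Each holder shares its value; party p sums the p-th shares; partial sums are combined."""
    all_shares = [share(v, n_parties) for v in private_values]
    partial_sums = [sum(s[p] for s in all_shares) % MODULUS for p in range(n_parties)]
    return sum(partial_sums) % MODULUS
```

Each computing party only ever sees uniformly random-looking shares; only the final combination of all partial sums reveals the total.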

  14. Genomic identification of regulatory elements by evolutionary sequence comparison and functional analysis.

    PubMed

    Loots, Gabriela G

    2008-01-01

    Despite remarkable recent advances in genomics that have enabled us to identify most of the genes in the human genome, comparable efforts to define transcriptional cis-regulatory elements that control gene expression are lagging behind. The difficulty of this task stems from two equally important problems: our knowledge of how regulatory elements are encoded in genomes remains elementary, and there is a vast genomic search space for regulatory elements, since most of the mammalian genome is noncoding. Comparative genomic approaches are having a remarkable impact on the study of transcriptional regulation in eukaryotes and currently represent the most efficient and reliable methods of predicting noncoding sequences likely to control the patterns of gene expression. By subjecting eukaryotic genomic sequences to computational comparisons and subsequent experimentation, we are inching our way toward a more comprehensive catalog of common regulatory motifs that lie behind fundamental biological processes. We are still far from comprehending how the transcriptional regulatory code is encrypted in the human genome and from providing an initial global view of regulatory gene networks, but collectively, the continued development of comparative and experimental approaches will rapidly expand our knowledge of the transcriptional regulome.
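Comparative approaches of this kind typically start by scanning a cross-species alignment for windows of high sequence identity. This Python sketch assumes a precomputed pairwise alignment of equal length and user-chosen window size and identity cutoff (hypothetical parameters, not values from the abstract); it reports candidate conserved windows only and does not distinguish coding from noncoding sequence.

```python
def conserved_windows(seq_a, seq_b, window=100, min_identity=0.7, step=1):
    """Windows of a pairwise alignment whose percent identity meets `min_identity`.

    Gap columns ('-') count as mismatches. Returns (start, end_exclusive, identity) tuples.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("inputs must be aligned to equal length")
    matches = [a == b and a != "-" for a, b in zip(seq_a.upper(), seq_b.upper())]
    hits = []
    for start in range(0, len(matches) - window + 1, step):
        identity = sum(matches[start:start + window]) / window
        if identity >= min_identity:
            hits.append((start, start + window, identity))
    return hits
```

Windows flagged this way are only candidates; as the abstract stresses, they still need subsequent experimentation to confirm regulatory function.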

  15. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data

    PubMed Central

    2016-01-01

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted ‘glmnet’). We selected glmnet to serve as a benchmark imputation method for this reason. 
We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the sample Pearson correlation between observed and imputed genotype dosages at the site and individual level; computation time served as a second metric for comparison. We then set out to examine factors affecting imputation accuracy, such as levels of missing data, read depth, minor allele frequency (MAF), and reference panel composition. PMID:27537694
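The masking-based accuracy estimate described above can be sketched in Python. The site-mean imputer here is a naive stand-in, not Beagle v.4 or glmnet, and the sketch pools all masked genotypes into a single Pearson correlation rather than computing the site-level and individual-level correlations the authors report; the function names are hypothetical.

```python
import random
from statistics import mean

def pearson(x, y):
    """Sample Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either input is constant")
    return sxy / (sxx * syy) ** 0.5

def site_mean_imputer(visible, targets):
    """Naive stand-in imputer: fill each masked genotype with its site's mean observed dosage."""
    by_site = {}
    for (_, site), dosage in visible.items():
        by_site.setdefault(site, []).append(dosage)
    overall = mean(visible.values())
    return {(ind, site): mean(by_site[site]) if site in by_site else overall
            for ind, site in targets}

def masked_accuracy(dosages, impute, mask_fraction=0.1, seed=0):
    """Hide a random subset of observed dosages, impute them, and correlate with the truth.

    dosages: dict mapping (individual, site) -> observed dosage.
    impute: callable(visible_dict, masked_keys) -> {key: imputed dosage}.
    """
    rng = random.Random(seed)
    keys = sorted(dosages)
    n_mask = min(len(keys), max(2, int(mask_fraction * len(keys))))
    masked = rng.sample(keys, n_mask)
    masked_set = set(masked)
    visible = {k: v for k, v in dosages.items() if k not in masked_set}
    guesses = impute(visible, masked)
    return pearson([dosages[k] for k in masked], [guesses[k] for k in masked])
```

Any imputer with the same call signature can be swapped in, which keeps the masking and scoring logic identical across the methods being compared.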

  16. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

    PubMed

    Chan, Ariel W; Hamblin, Martha T; Jannink, Jean-Luc

    2016-01-01

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. 
We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the sample Pearson correlation between observed and imputed genotype dosages at the site and individual level; computation time served as a second metric for comparison. We then set out to examine factors affecting imputation accuracy, such as levels of missing data, read depth, minor allele frequency (MAF), and reference panel composition.
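The masking-based accuracy metric described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: dosages are stored as a list of individuals by sites (0, 1, 2, or None for missing), and `site_mean_impute` is a hypothetical stand-in for Beagle or glmnet.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation; None when undefined (n < 2 or zero variance)."""
    n = len(x)
    if n < 2:
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def mask(dosages, fraction, seed=0):
    """Hide a random fraction of observed dosages; return the masked copy and hidden cells."""
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(dosages)
                for j, g in enumerate(row) if g is not None]
    hidden = rng.sample(observed, int(len(observed) * fraction))
    masked = [row[:] for row in dosages]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def site_mean_impute(dosages):
    """Hypothetical stand-in imputer: fill each missing dosage with its site mean."""
    means = []
    for j in range(len(dosages[0])):
        vals = [row[j] for row in dosages if row[j] is not None]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return [[means[j] if g is None else g for j, g in enumerate(row)]
            for row in dosages]


def accuracy(truth, imputed, hidden, level="site"):
    """Pearson r between true and imputed dosages at the hidden cells,
    computed separately for each site (level="site") or each individual."""
    groups = {}
    for i, j in hidden:
        key = j if level == "site" else i
        obs, imp = groups.setdefault(key, ([], []))
        obs.append(truth[i][j])
        imp.append(imputed[i][j])
    return {k: pearson(o, p) for k, (o, p) in groups.items()}
```

A typical call is `masked, hidden = mask(full, 0.1)` followed by `accuracy(full, site_mean_impute(masked), hidden, level="individual")`. Note that the naive site-mean imputer gives every hidden cell at a site the same value, so its site-level r is undefined (None); a real imputer produces varying dosages.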

  17. The International Conference on Intelligent Biology and Medicine (ICIBM) 2016: summary and innovation in genomics.

    PubMed

    Zhao, Zhongming; Liu, Zhandong; Chen, Ken; Guo, Yan; Allen, Genevera I; Zhang, Jiajie; Jim Zheng, W; Ruan, Jianhua

    2017-10-03

    In this editorial, we first summarize the 2016 International Conference on Intelligent Biology and Medicine (ICIBM 2016) that was held on December 8-10, 2016 in Houston, Texas, USA, and then briefly introduce the ten research articles included in this supplement issue. ICIBM 2016 included four workshops or tutorials, four keynote lectures, four invited conference talks, eight concurrent scientific sessions and a poster session for 53 accepted abstracts, covering current topics in bioinformatics, systems biology, intelligent computing, and biomedical informatics. Through our call for papers, a total of 77 original manuscripts were submitted to ICIBM 2016. After peer review, 11 articles were selected for this special issue, covering topics such as single-cell RNA-seq analysis methods, genome sequence and variation analysis, bioinformatics methods for vaccine development, and cancer genomics.

  18. Editing Transgenic DNA Components by Inducible Gene Replacement in Drosophila melanogaster

    PubMed Central

    Lin, Chun-Chieh; Potter, Christopher J.

    2016-01-01

    Gene conversions occur when genomic double-strand DNA breaks (DSBs) trigger unidirectional transfer of genetic material from a homologous template sequence. Exogenous or mutated sequence can be introduced through this homology-directed repair (HDR). We leveraged gene conversion to develop a method for genomic editing of existing transgenic insertions in Drosophila melanogaster. The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system is used in the homology-assisted CRISPR knock-in (HACK) method to induce DSBs in a GAL4 transgene, which is repaired by a single-genomic transgenic construct containing GAL4 homologous sequences flanking a T2A-QF2 cassette. With two crosses, this technique converts existing GAL4 lines, including enhancer traps, into functional QF2-expressing lines. We used HACK to convert the most commonly used GAL4 lines (labeling tissues such as neurons, fat, glia, muscle, and hemocytes) to QF2 lines. We also identified regions of the genome that exhibited differential efficiencies of HDR. The HACK technique is robust and readily adaptable for targeting and replacement of other genomic sequences, and could be a useful approach to repurpose existing transgenes as new genetic reagents become available. PMID:27334272

  19. Challenges and opportunities for genomic developmental neuropsychology: examples from the Penn-Drexel collaborative battery.

    PubMed

    Gur, Ruben C; Irani, Farzin; Seligman, Sarah; Calkins, Monica E; Richard, Jan; Gur, Raquel E

    2011-08-01

    Genomics has been revolutionizing medicine over the past decade by offering mechanistic insights into disease processes and engendering the age of "individualized medicine." Because of the sheer number of measures generated by gene sequencing methods, genomics requires "Big Science" where large datasets on genes are analyzed in reference to electronic medical record data. This revolution has largely bypassed the behavioral neurosciences, mainly because of the paucity of behavioral data in medical records and the labor-intensity of available neuropsychological assessment methods. We describe the development and implementation of an efficient neuroscience-based computerized battery, coupled with a computerized clinical assessment procedure. This assessment package has been applied to a genomic study of 10,000 children aged 8-21, of whom 1000 also undergo neuroimaging. Results from the first 3000 participants indicate sensitivity to neurodevelopmental trajectories. Sex differences were evident, with females outperforming males in memory and social cognition domains, while for spatial processing males were more accurate and faster, and they were faster on simple motor tasks. The study illustrates what will hopefully become a major component of the work of clinical and research neuropsychologists as invaluable participants in the dawning age of Big Science neuropsychological genomics.

  20. Quadruplex MAPH: improvement of throughput in high-resolution copy number screening.

    PubMed

    Tyson, Jess; Majerus, Tamsin MO; Walker, Susan; Armour, John AL

    2009-09-28

    Copy number variation (CNV) in the human genome is recognised as a widespread and important source of human genetic variation. Now the challenge is to screen for these CNVs at high resolution in a reliable, accurate and cost-effective way. Multiplex Amplifiable Probe Hybridisation (MAPH) is a sensitive, high-resolution technology appropriate for screening for CNVs in a defined region, for a targeted population. We have developed MAPH to a highly multiplexed format ("QuadMAPH") that allows the user a four-fold increase in the number of loci tested simultaneously. We have used this method to analyse a genomic region of 210 kb, including the MSH2 gene and 120 kb of flanking DNA. We show that the QuadMAPH probes report copy number with equivalent accuracy to simplex MAPH, reliably demonstrating diploid copy number in control samples and accurately detecting deletions in Hereditary Non-Polyposis Colorectal Cancer (HNPCC) samples. QuadMAPH is an accurate, high-resolution method that allows targeted screening of large numbers of subjects without the expense of genome-wide approaches. Whilst we have applied this technique to a region of the human genome, it is equally applicable to the genomes of other organisms.
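The abstract does not spell out how probe signals become copy-number calls. A common ratio-based scheme is sketched below as an assumption, not as the QuadMAPH analysis itself: each probe is normalised to control probes in the same sample, divided by the same ratio in diploid reference samples, and doubled. Probe names and the calling thresholds are hypothetical.

```python
from statistics import mean


def normalise(sample, control_ids):
    """Scale every probe signal by the mean signal of the control probes in that sample."""
    scale = mean(sample[p] for p in control_ids)
    return {p: v / scale for p, v in sample.items()}


def copy_number(test, references, control_ids):
    """Copy number per probe: 2 x (normalised test signal / mean normalised reference signal).

    References are assumed to be diploid (two copies) at every probe."""
    t = normalise(test, control_ids)
    refs = [normalise(r, control_ids) for r in references]
    return {p: 2 * t[p] / mean(r[p] for r in refs) for p in t}


def call(cn, low=1.5, high=2.5):
    """Hypothetical thresholds for calling deletions and duplications."""
    return {p: "deletion" if v < low else "duplication" if v > high else "normal"
            for p, v in cn.items()}
```

With this scheme, a heterozygous deletion shows up as a probe whose normalised signal is about half that of the references, giving an estimate near one copy.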

  1. An efficient genotyping method for genome-modified animals and human cells generated with CRISPR/Cas9 system.

    PubMed

    Zhu, Xiaoxiao; Xu, Yajie; Yu, Shanshan; Lu, Lu; Ding, Mingqin; Cheng, Jing; Song, Guoxu; Gao, Xing; Yao, Liangming; Fan, Dongdong; Meng, Shu; Zhang, Xuewen; Hu, Shengdi; Tian, Yong

    2014-09-19

    The rapid generation of various species and strains of laboratory animals using CRISPR/Cas9 technology has dramatically accelerated the interrogation of gene function in vivo. So far, the dominant approach for genotyping genome-modified animals has been the T7E1 endonuclease cleavage assay. Here, we present a polyacrylamide gel electrophoresis-based (PAGE) method to genotype mice harboring different types of indel mutations. We developed 6 strains of genome-modified mice using the CRISPR/Cas9 system, and used this approach to genotype mice from the F0 to the F2 generation, including single and multiplexed genome-modified mice. We also determined that the maximal detection sensitivity of the PAGE-based assay for mosaic DNA is 0.5%. We further applied the PAGE-based genotyping approach to detect CRISPR/Cas9-mediated on- and off-target effects in human 293T cells and induced pluripotent stem cells (iPSCs). Thus, the PAGE-based genotyping approach meets the rapidly increasing demand for genotyping the fast-growing number of genome-modified animals and human cell lines created using the CRISPR/Cas9 system or other nuclease systems such as TALEN or ZFN.

  2. An Efficient Genotyping Method for Genome-modified Animals and Human Cells Generated with CRISPR/Cas9 System

    PubMed Central

    Zhu, Xiaoxiao; Xu, Yajie; Yu, Shanshan; Lu, Lu; Ding, Mingqin; Cheng, Jing; Song, Guoxu; Gao, Xing; Yao, Liangming; Fan, Dongdong; Meng, Shu; Zhang, Xuewen; Hu, Shengdi; Tian, Yong

    2014-01-01

    The rapid generation of various species and strains of laboratory animals using CRISPR/Cas9 technology has dramatically accelerated the interrogation of gene function in vivo. So far, the dominant approach for genotyping genome-modified animals has been the T7E1 endonuclease cleavage assay. Here, we present a polyacrylamide gel electrophoresis-based (PAGE) method to genotype mice harboring different types of indel mutations. We developed 6 strains of genome-modified mice using the CRISPR/Cas9 system, and used this approach to genotype mice from the F0 to the F2 generation, including single and multiplexed genome-modified mice. We also determined that the maximal detection sensitivity of the PAGE-based assay for mosaic DNA is 0.5%. We further applied the PAGE-based genotyping approach to detect CRISPR/Cas9-mediated on- and off-target effects in human 293T cells and induced pluripotent stem cells (iPSCs). Thus, the PAGE-based genotyping approach meets the rapidly increasing demand for genotyping the fast-growing number of genome-modified animals and human cell lines created using the CRISPR/Cas9 system or other nuclease systems such as TALEN or ZFN. PMID:25236476

  3. Quadruplex MAPH: improvement of throughput in high-resolution copy number screening

    PubMed Central

    Tyson, Jess; Majerus, Tamsin MO; Walker, Susan; Armour, John AL

    2009-01-01

    Background: Copy number variation (CNV) in the human genome is recognised as a widespread and important source of human genetic variation. Now the challenge is to screen for these CNVs at high resolution in a reliable, accurate and cost-effective way. Results: Multiplex Amplifiable Probe Hybridisation (MAPH) is a sensitive, high-resolution technology appropriate for screening for CNVs in a defined region, for a targeted population. We have developed MAPH to a highly multiplexed format ("QuadMAPH") that allows the user a four-fold increase in the number of loci tested simultaneously. We have used this method to analyse a genomic region of 210 kb, including the MSH2 gene and 120 kb of flanking DNA. We show that the QuadMAPH probes report copy number with equivalent accuracy to simplex MAPH, reliably demonstrating diploid copy number in control samples and accurately detecting deletions in Hereditary Non-Polyposis Colorectal Cancer (HNPCC) samples. Conclusion: QuadMAPH is an accurate, high-resolution method that allows targeted screening of large numbers of subjects without the expense of genome-wide approaches. Whilst we have applied this technique to a region of the human genome, it is equally applicable to the genomes of other organisms. PMID:19785739

  4. Therapeutic applications of CRISPR RNA-guided genome editing.

    PubMed

    Koo, Taeyoung; Kim, Jin-Soo

    2017-01-01

    The rapid development of programmable nuclease-based genome editing technologies has enabled targeted gene disruption and correction both in vitro and in vivo. This revolution opens up the possibility of precise genome editing at target genomic sites to modulate gene function in animals and plants. Among several programmable nucleases, the type II clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated nuclease 9 (Cas9) system has progressed remarkably in recent years, leading to its widespread use in research, medicine and biotechnology. In particular, CRISPR-Cas9 shows highly efficient gene editing activity for therapeutic purposes in systems ranging from patient stem cells to animal models. However, the development of therapeutic approaches and delivery methods remains a great challenge for biomedical applications. Herein, we review therapeutic applications that use the CRISPR-Cas9 system and discuss the possibilities and challenges ahead. © The Author 2016. Published by Oxford University Press.

  5. Genome-wide gene order distances support clustering the gram-positive bacteria

    PubMed Central

    House, Christopher H.; Pellegrini, Matteo; Fitz-Gibbon, Sorel T.

    2015-01-01

    Initially using 143 genomes, we developed a method for calculating the pairwise distance between prokaryotic genomes using a Monte Carlo method to estimate the conservation of gene order. The method was based on repeatedly selecting five or six non-adjacent random orthologs from each of two genomes and determining whether the chosen orthologs were in the same order. The raw distances were then corrected for gene order convergence using an adaptation of the Jukes-Cantor model, as well as using the common distance correction D′ = −ln(1 − D). First, we compared the distances found via the order of six orthologs to distances found based on ortholog gene content and small subunit rRNA sequences. The Jukes-Cantor gene order distances are reasonably well correlated with the divergence of rRNA (R² = 0.24), especially at rRNA Jukes-Cantor distances of less than 0.2 (R² = 0.52). Gene content is only weakly correlated with rRNA divergence (R² = 0.04) over all distances; however, it is especially strongly correlated at rRNA Jukes-Cantor distances of less than 0.1 (R² = 0.67). This initial work suggests that gene order may be useful in conjunction with other methods to help understand the relatedness of genomes. Using the gene order distances in 143 genomes, the relations of prokaryotes were studied using neighbor joining and agreement subtrees. We then repeated our study of the relations of prokaryotes using gene order in 172 complete genomes better representing a wider diversity of prokaryotes. Consistently, our trees show the Actinobacteria as a sister group to the bulk of the Firmicutes. In fact, the robustness of gene order support was found to be considerably greater for uniting these two phyla than for uniting any of the proteobacterial classes together. The results are supportive of the idea that Actinobacteria and Firmicutes are closely related, which in turn implies a single origin for the gram-positive cell. PMID:25653643
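The Monte Carlo gene-order distance can be sketched roughly as below. This is a simplified illustration with several assumptions not stated in the abstract: genomes are linear lists of shared ortholog IDs (circularity is ignored), adjacency is judged in the shared-ortholog order, a fully reversed sample counts as conserved, and only the simple correction D′ = −ln(1 − D) is applied, not the Jukes-Cantor adaptation.

```python
import math
import random


def gene_order_distance(genome_a, genome_b, k=6, trials=10_000, seed=0):
    """Monte Carlo estimate of (raw, corrected) gene-order distance.

    genome_a, genome_b: lists of ortholog IDs in chromosomal order; only IDs
    present in both genomes are used."""
    rng = random.Random(seed)
    shared = set(genome_a) & set(genome_b)
    pos_a = {g: i for i, g in enumerate(x for x in genome_a if x in shared)}
    pos_b = {g: i for i, g in enumerate(x for x in genome_b if x in shared)}
    genes = list(pos_a)
    if len(genes) < 2 * k - 1:
        raise ValueError("too few shared orthologs for k non-adjacent picks")
    conserved = done = 0
    while done < trials:
        # Draw k orthologs and list them in genome-A order.
        pick = sorted(rng.sample(genes, k), key=pos_a.get)
        idx = [pos_a[g] for g in pick]
        if any(b - a < 2 for a, b in zip(idx, idx[1:])):
            continue  # reject samples that contain adjacent orthologs
        order_b = [pos_b[g] for g in pick]
        # Same order (or the same order on the opposite strand) in genome B.
        if order_b == sorted(order_b) or order_b == sorted(order_b, reverse=True):
            conserved += 1
        done += 1
    d = 1 - conserved / trials
    corrected = -math.log(1 - d) if d < 1 else math.inf
    return d, corrected
```

Rejection sampling keeps the non-adjacency rule simple; the up-front length check guarantees that a valid sample exists, so the loop terminates.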

  6. [Progress in stable isotope labeled quantitative proteomics methods].

    PubMed

    Zhou, Yuan; Shan, Yichu; Zhang, Lihua; Zhang, Yukui

    2013-06-01

    Quantitative proteomics is an important research field in the post-genomic era. There are two strategies for proteome quantification: label-free methods and stable isotope labeling methods, the latter of which have become the most important strategy for quantitative proteomics at present. In the past few years, a number of quantitative methods have been developed, supporting rapid progress in biological research. In this work, we discuss progress in stable isotope labeling methods for quantitative proteomics, including relative and absolute quantitative proteomics, and then give our views on the outlook for proteome quantification methods.
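As a concrete example of relative quantification with stable isotope labeling, the sketch below summarises one protein from light/heavy peptide intensity pairs. Taking the median of peptide ratios and reporting log2 is an assumed, commonly used convention, not a method described in this review.

```python
from math import log2
from statistics import median


def protein_log2_ratio(peptides):
    """log2 heavy/light ratio for one protein from (light, heavy) peptide intensities.

    The median of peptide ratios is used so a single outlier peptide cannot dominate;
    pairs with a non-positive intensity are skipped."""
    ratios = [heavy / light for light, heavy in peptides if light > 0 and heavy > 0]
    if not ratios:
        return None
    return log2(median(ratios))
```

A log2 ratio of 1 means the protein is about twice as abundant in the heavy-labeled sample.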

  7. Improved regulatory element prediction based on tissue-specific local epigenomic signatures

    PubMed Central

    He, Yupeng; Gorkin, David U.; Dickel, Diane E.; Nery, Joseph R.; Castanon, Rosa G.; Lee, Ah Young; Shen, Yin; Visel, Axel; Pennacchio, Len A.; Ren, Bing; Ecker, Joseph R.

    2017-01-01

    Accurate enhancer identification is critical for understanding the spatiotemporal transcriptional regulation during development as well as the functional impact of disease-related noncoding genetic variants. Computational methods have been developed to predict the genomic locations of active enhancers based on histone modifications, but the accuracy and resolution of these methods remain limited. Here, we present an algorithm, regulatory element prediction based on tissue-specific local epigenetic marks (REPTILE), which integrates histone modification and whole-genome cytosine DNA methylation profiles to identify the precise location of enhancers. We tested the ability of REPTILE to identify enhancers previously validated in reporter assays. Compared with existing methods, REPTILE shows consistently superior performance across diverse cell and tissue types, and the enhancer locations are significantly more refined. We show that, by incorporating base-resolution methylation data, REPTILE greatly improves upon current methods for annotation of enhancers across a variety of cell and tissue types. REPTILE is available at https://github.com/yupenghe/REPTILE/. PMID:28193886

  8. Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer.

    PubMed

    Bernard, Guillaume; Chan, Cheong Xin; Ragan, Mark A

    2016-07-01

    Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.
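One simple member of the alignment-free family compares k-mer sets directly. The sketch below uses exact k-mer sets (tools such as Mash use MinHash sketches instead) and the Mash-style transform D = −(1/k) ln(2J / (1 + J)) of the Jaccard index J. The jackknife here resamples the k-mer universe; it is an assumed illustration of resampling for support, not necessarily the jackknife the authors introduced.

```python
import math
import random


def kmers(seq, k):
    """Set of all overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(a, b, k):
    """Mash-style distance from the Jaccard index of two k-mer sets."""
    j = len(a & b) / len(a | b)
    if j == 0:
        return math.inf
    return -math.log(2 * j / (1 + j)) / k


def jackknife(a, b, k, keep=0.6, reps=100, seed=0):
    """Distances recomputed on random subsets keeping a fraction `keep` of all k-mers."""
    rng = random.Random(seed)
    universe = sorted(a | b)
    out = []
    for _ in range(reps):
        sub = set(rng.sample(universe, int(len(universe) * keep)))
        out.append(mash_distance(a & sub, b & sub, k))
    return out
```

In a phylogenomic setting, each jackknife replicate would yield a distance matrix and a tree, and node support would be the fraction of replicate trees containing that node.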

  9. Annotation and sequence diversity of transposable elements in common bean (Phaseolus vulgaris).

    PubMed

    Gao, Dongying; Abernathy, Brian; Rohksar, Daniel; Schmutz, Jeremy; Jackson, Scott A

    2014-01-01

    Common bean (Phaseolus vulgaris) is an important legume crop grown and consumed worldwide. With the availability of the common bean genome sequence, the next challenge is to annotate the genome and characterize functional DNA elements. Transposable elements (TEs) are the most abundant component of plant genomes and can dramatically affect genome evolution and genetic variation. Thus, it is pivotal to identify TEs in the common bean genome. In this study, we performed a genome-wide transposon annotation in common bean using a combination of homology and sequence structure-based methods. We developed a 2.12-Mb transposon database that includes 791 representative transposon sequences and is available upon request or from www.phytozome.org. Of note, nearly all transposons in the database are previously unrecognized TEs. More than 5,000 transposon-related expressed sequence tags (ESTs) were detected, indicating that some transposons may be transcriptionally active. Two Ty1-copia retrotransposon families were found to encode an envelope-like protein, which has rarely been identified in plant genomes. Also, we identified an extra open reading frame (ORF), termed ORF2, in 15 Ty3-gypsy families, located between the ORF encoding the retrotransposase and the 3' LTR. ORF2 was in the opposite transcriptional orientation to the retrotransposase. Sequence homology searches and phylogenetic analysis suggested that ORF2 may have an ancient origin, but its function is not clear. These transposon data provide a useful resource for understanding genome organization and evolution and may be used to identify active TEs for developing a transposon-tagging system in common bean and other related genomes.

  10. Development of an Efficient Genome Editing Method by CRISPR/Cas9 in a Fish Cell Line.

    PubMed

    Dehler, Carola E; Boudinot, Pierre; Martin, Samuel A M; Collet, Bertrand

    2016-08-01

    The CRISPR/Cas9 system has been widely used in animals and plants for directed mutagenesis. To date, no such method exists for fish somatic cell lines. We describe an efficient procedure for genome editing in the Chinook salmon (Oncorhynchus tshawytscha) cell line CHSE. This cell line was genetically modified first to overexpress a monomeric form of EGFP (cell line CHSE-E, Geneticin resistant) and additionally to overexpress nCas9n, a nuclear version of Cas9 (cell line CHSE-EC, Hygromycin and Geneticin resistant). A pre-validated sgRNA was produced in vitro and used to transfect CHSE-EC cells. The EGFP gene was disrupted in 34.6 % of cells, as estimated by FACS and microscopy. The targeted locus was characterised by PCR amplification, cloning and sequencing of PCR products; inactivation of the EGFP gene by deletions at the expected site was validated in 25 % of clones. This method opens perspectives for functional genomic studies compatible with high-throughput screening.

  11. Mapping biomedical concepts onto the human genome by mining literature on chromosomal aberrations

    PubMed Central

    Van Vooren, Steven; Thienpont, Bernard; Menten, Björn; Speleman, Frank; Moor, Bart De; Vermeesch, Joris; Moreau, Yves

    2007-01-01

    Biomedical literature provides a rich but unstructured source of associations between chromosomal regions and biomedical concepts. By mining MEDLINE abstracts, we annotate the human genome at the level of cytogenetic bands. Our method creates a set of chromosomal aberration maps that associate cytogenetic bands with biomedical concepts from a variety of controlled vocabularies, including disease, dysmorphology, anatomy, development and Gene Ontology branches. The association between a band (e.g. 4p16.3) and a concept (e.g. microcephaly) is assessed by the statistical overrepresentation of this concept in the abstracts relating to this band. Our method is validated using existing genome annotation resources and known chromosomal aberration maps and is further illustrated through a case study on heart disease. Our chromosomal aberration maps provide diagnostic support to clinical geneticists, help cytogeneticists interpret and report cytogenetic findings, and support researchers interested in human gene function. The method is available as a web application, aBandApart, at http://www.esat.kuleuven.be/abandapart/. PMID:17403693
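The abstract says only that band-concept associations are scored by "statistical overrepresentation". A hypergeometric tail test is one standard way to do this and is assumed below, not confirmed as the aBandApart statistic. Here N abstracts exist in total, K mention the concept, n relate to the band, and x mention both.

```python
from math import comb


def hypergeom_sf(x, N, K, n):
    """P(X >= x) for X ~ Hypergeometric: n abstracts drawn from N, K of which mention the concept."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, min(K, n) + 1))
    return tail / comb(N, n)


def band_concept_enrichment(x, N, K, n):
    """(fold enrichment, p-value) for x co-mentions among n band abstracts; assumes K > 0."""
    expected = n * K / N
    return x / expected, hypergeom_sf(x, N, K, n)
```

For example, with hypothetical counts of 20 co-mentions among 200 band abstracts, and a concept found in 500 of 100,000 abstracts, `band_concept_enrichment(20, 100_000, 500, 200)` reports a 20-fold enrichment together with its tail probability.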

  12. Identification of endometrial cancer methylation features using combined methylation analysis methods

    PubMed Central

    Trimarchi, Michael P.; Yan, Pearlly; Groden, Joanna; Bundschuh, Ralf; Goodfellow, Paul J.

    2017-01-01

    Background: DNA methylation is a stable epigenetic mark that is frequently altered in tumors. DNA methylation features are attractive biomarkers for disease states given the stability of DNA methylation in living cells and in biologic specimens typically available for analysis. Widespread accumulation of methylation in regulatory elements in some cancers (specifically the CpG island methylator phenotype, CIMP) can play an important role in tumorigenesis. High-resolution assessment of CIMP for the entire genome, however, remains cost-prohibitive and requires quantities of DNA not available for many tissue samples of interest. Genome-wide scans of methylation have been undertaken for large numbers of tumors, and higher-resolution analyses for a limited number of cancer specimens. Methods for analyzing such large datasets and integrating findings from different studies continue to evolve. An approach for comparison of findings from a genome-wide assessment of the methylated component of tumor DNA and more widely applied methylation scans was developed. Methods: Methylomes for 76 primary endometrial cancer and 12 normal endometrial samples were generated using methylated fragment capture and second-generation sequencing, MethylCap-seq. Publicly available Infinium HumanMethylation 450 data from The Cancer Genome Atlas (TCGA) were compared to MethylCap-seq data. Results: Analysis of methylation in promoter CpG islands (CGIs) identified a subset of tumors with a methylator phenotype. We used a two-stage approach to develop a 13-region methylation signature associated with a “hypermethylator state.” High-level methylation of the 13-region methylation signature was associated with mismatch repair deficiency, high mutation rate, and low somatic copy number alteration in the TCGA test set. In addition, the signature showed good agreement with previously described methylation clusters devised by TCGA. 
Conclusion: We identified a methylation signature for a “hypermethylator phenotype” in endometrial cancer and developed methods that may prove useful for identifying extreme methylation phenotypes in other cancers. PMID:28278225
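How a multi-region signature can be turned into a per-sample call is sketched below. The rule (count signature regions whose methylation score passes a cutoff, then require a minimum count) and both cutoffs are hypothetical placeholders, not the two-stage procedure or thresholds the authors used.

```python
def hypermethylator(region_scores, signature, level_cutoff=0.3, min_regions=7):
    """Call a sample hypermethylated if enough signature regions exceed a methylation cutoff.

    region_scores: mapping region ID -> methylation score (missing regions count as 0).
    level_cutoff and min_regions are hypothetical, illustrative values.
    Returns (call, number of signature regions above the cutoff)."""
    n_high = sum(1 for r in signature if region_scores.get(r, 0.0) >= level_cutoff)
    return n_high >= min_regions, n_high
```

Returning the count alongside the call makes it easy to inspect borderline samples rather than relying on the binary label alone.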

  13. An evaluation of the current state of genomic data privacy protection technology and a roadmap for the future.

    PubMed

    Malin, Bradley A

    2005-01-01

    The incorporation of genomic data into personal medical records poses many challenges to patient privacy. In response, various systems for preserving patient privacy in shared genomic data have been developed and deployed. Although these systems de-identify the data by removing explicit identifiers (e.g., name, address, or Social Security number) and incorporate sound security design principles, they suffer from a lack of formal modeling of inferences learnable from shared data. This report evaluates the extent to which current protection systems are capable of withstanding a range of re-identification methods, including genotype-phenotype inferences, location-visit patterns, family structures, and dictionary attacks. For a comparative re-identification analysis, the systems are mapped to a common formalism. Although there is variation in susceptibility, each system is deficient in its protection capacity. The author discovers patterns of protection failure and discusses several of the reasons why these systems are susceptible. The analyses and discussion within provide guideposts for the development of next-generation protection methods amenable to formal proofs.

  14. An Evaluation of the Current State of Genomic Data Privacy Protection Technology and a Roadmap for the Future

    PubMed Central

    Malin, Bradley A.

    2005-01-01

    The incorporation of genomic data into personal medical records poses many challenges to patient privacy. In response, various systems for preserving patient privacy in shared genomic data have been developed and deployed. Although these systems de-identify the data by removing explicit identifiers (e.g., name, address, or Social Security number) and incorporate sound security design principles, they suffer from a lack of formal modeling of inferences learnable from shared data. This report evaluates the extent to which current protection systems are capable of withstanding a range of re-identification methods, including genotype–phenotype inferences, location–visit patterns, family structures, and dictionary attacks. For a comparative re-identification analysis, the systems are mapped to a common formalism. Although there is variation in susceptibility, each system is deficient in its protection capacity. The author discovers patterns of protection failure and discusses several of the reasons why these systems are susceptible. The analyses and discussion within provide guideposts for the development of next-generation protection methods amenable to formal proofs. PMID:15492030
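To make the "dictionary attacks" category concrete, the sketch below shows why replacing an identifier with an unsalted hash is not de-identification when the identifier space is small enough to enumerate. The hashing scheme and identifier format are assumptions for illustration, not any of the systems evaluated in the report.

```python
import hashlib
import hmac


def pseudonymise(identifier):
    """Naive de-identification: replace an identifier with its unsalted SHA-256 digest."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def keyed_pseudonymise(identifier, key):
    """Keyed alternative (HMAC-SHA256); an attacker without the key cannot rebuild the lookup."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def dictionary_attack(pseudonyms, candidates):
    """Re-identify records by hashing every candidate value and matching digests."""
    lookup = {pseudonymise(c): c for c in candidates}
    return {p: lookup[p] for p in pseudonyms if p in lookup}
```

The keyed variant defeats this particular attack only while the key stays secret; it does nothing against the genotype-phenotype, location-visit, or family-structure inferences the report also considers.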

  15. The future of transposable element annotation and their classification in the light of functional genomics - what we can learn from the fables of Jean de la Fontaine?

    PubMed

    Arensburger, Peter; Piégu, Benoît; Bigot, Yves

    2016-01-01

    Transposable element (TE) science has been significantly influenced by the pioneering ideas of David Finnegan near the end of the last century, as well as by the classification systems that were subsequently developed. Today, whole-genome TE annotation is mostly done using tools that were developed to aid gene annotation rather than to specifically study TEs. We argue that further progress in the TE field is impeded both by current TE classification schemes and by a failure to recognize that TE biology is fundamentally different from that of multicellular organisms. Novel genome-wide TE annotation methods are helping to redefine our understanding of TE sequence origins and evolution. We briefly discuss some of these new methods as well as ideas for possible alternative classification schemes. Our hope is to encourage the formation of a society to organize a larger debate on these questions and to promote the adoption of standards for annotation and an improved TE classification.

  16. Editing the Neuronal Genome: a CRISPR View of Chromatin Regulation in Neuronal Development, Function, and Plasticity

    PubMed Central

    Yang, Marty G.; West, Anne E.

    2016-01-01

    The dynamic orchestration of gene expression is crucial for the proper differentiation, function, and adaptation of cells. In the brain, transcriptional regulation underlies the incredible diversity of neuronal cell types and contributes to the ability of neurons to adapt their function to the environment. Recently, novel methods for genome and epigenome editing have begun to revolutionize our understanding of gene regulatory mechanisms. In particular, the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system has proven to be a particularly accessible and adaptable technique for genome engineering. Here, we review the use of CRISPR/Cas9 in neurobiology and discuss how these studies have advanced understanding of nervous system development and plasticity. We cover four especially salient applications of CRISPR/Cas9: testing the consequences of enhancer mutations, tagging genes and gene products for visualization in live cells, directly activating or repressing enhancers in vivo, and manipulating the epigenome. In each case, we summarize findings from recent studies and discuss evolving adaptations of the method. PMID:28018138

  17. Using flow cytometry to estimate pollen DNA content: improved methodology and applications

    PubMed Central

    Kron, Paul; Husband, Brian C.

    2012-01-01

    Background and Aims: Flow cytometry has been used to measure nuclear DNA content in pollen, mostly to understand pollen development and detect unreduced gametes. Published data have not always met the high-quality standards required for some applications, in part due to difficulties inherent in the extraction of nuclei. Here we describe a simple and relatively novel method for extracting pollen nuclei, involving the bursting of pollen through a nylon mesh, compare it with other methods and demonstrate its broad applicability and utility. Methods: The method was tested across 80 species, 64 genera and 33 families, and the data were evaluated using established criteria for estimating genome size and analysing cell cycle. Filter bursting was directly compared with chopping in five species, yields were compared with published values for sonicated samples, and the method was applied by comparing genome size estimates for leaf and pollen nuclei in six species. Key Results: Data quality met generally applied standards for estimating genome size in 81 % of species and the higher best practice standards for cell cycle analysis in 51 %. In 41 % of species we met the most stringent criterion of screening 10 000 pollen grains per sample. In direct comparison with two chopping techniques, our method produced better quality histograms with consistently higher nuclei yields, and yields were higher than previously published results for sonication. In three binucleate and three trinucleate species we found that pollen-based genome size estimates differed from leaf tissue estimates by 1.5 % or less when 1C pollen nuclei were used, while estimates from 2C generative nuclei differed from leaf estimates by up to 2.5 %. Conclusions: The high success rate, ease of use and wide applicability of the filter bursting method show that this method can facilitate the use of pollen for estimating genome size and dramatically improve unreduced pollen production estimation with flow cytometry. 
PMID:22875815
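The abstract compares pollen-based genome size estimates with leaf estimates but does not give the calculation. Below is a minimal sketch of the usual ratio method. It assumes an internal reference standard of known DNA content is run alongside the sample, which the abstract does not state. It also uses the widely cited conversion of about 978 Mbp per pg of DNA. The function names and example numbers are hypothetical.

```python
# Hypothetical sketch: ratio-based genome size estimation from flow cytometry
# peak positions, assuming an internal standard of known DNA content.
MBP_PER_PG = 978.0  # commonly used conversion: 1 pg DNA ~ 978 Mbp


def genome_size_pg(sample_peak, standard_peak, standard_content_pg):
    """DNA content (pg) of the sample nuclei, scaled from the standard's peak.

    The result has the same C-level as the sample nuclei themselves:
    a peak from 1C pollen vegetative nuclei yields a 1C estimate.
    """
    return sample_peak / standard_peak * standard_content_pg


def percent_difference(pollen_1c_pg, leaf_2c_pg):
    """Relative deviation (%) of a pollen 1C estimate from the leaf-derived 1C value."""
    leaf_1c = leaf_2c_pg / 2.0
    return (pollen_1c_pg - leaf_1c) / leaf_1c * 100.0
```

For example, a 1C pollen peak at channel 100 and a standard peak at channel 200, with a 2.0 pg standard, give 1.0 pg (about 978 Mbp). Against a leaf 2C estimate of 2.02 pg, that is a deviation of about -1 %.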

  18. Identification of cis-suppression of human disease mutations by comparative genomics.

    PubMed

    Jordan, Daniel M; Frangakis, Stephan G; Golzio, Christelle; Cassa, Christopher A; Kurtzberg, Joanne; Davis, Erica E; Sunyaev, Shamil R; Katsanis, Nicholas

    2015-08-13

    Patterns of amino acid conservation have served as a tool for understanding protein evolution. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity.
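The starting observation is that some human disease alleles are fixed in other species. It can be illustrated with a simple alignment scan. This is a hypothetical sketch, not the authors' computational tool. It lists the species whose aligned residue matches a human pathogenic allele. For each such species, it also reports the other positions in the same protein where that species differs from human. These positions are crude candidates for cis-compensation.

The sketch assumes a precomputed multiple alignment stored as a dictionary keyed by species name, with a `'human'` entry and 0-based positions.

```python
def species_carrying_allele(alignment, position, alt_residue):
    """Non-human species whose aligned residue at `position` equals the pathogenic allele.

    alignment: dict species name -> aligned protein sequence (equal lengths, '-' = gap).
    """
    return sorted(sp for sp, seq in alignment.items()
                  if sp != "human" and seq[position] == alt_residue)


def candidate_compensating_sites(alignment, position, alt_residue):
    """For each carrier species, positions (same protein, i.e. in cis) that differ from human."""
    human = alignment["human"]
    sites = {}
    for sp in species_carrying_allele(alignment, position, alt_residue):
        seq = alignment[sp]
        sites[sp] = [i for i in range(len(human))
                     if i != position and seq[i] != "-" and seq[i] != human[i]]
    return sites
```

In the toy alignment used in the test, the pathogenic A→V change at position 3 is the native residue in two species, and each carries one other substitution that could be tested as a compensating site.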

  19. Towards fully automated structure-based function prediction in structural genomics: a case study.

    PubMed

    Watson, James D; Sanderson, Steve; Ezersky, Alexandra; Savchenko, Alexei; Edwards, Aled; Orengo, Christine; Joachimiak, Andrzej; Laskowski, Roman A; Thornton, Janet M

    2007-04-13

    As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment.
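Below is a minimal sketch of the idea behind automated assessment with GO-slims. Each specific GO term is collapsed to broad slim categories, and a hit counts as correct when its categories overlap those of the target. The sketch assumes the term-to-slim mapping has already been computed. A real mapping would walk the Gene Ontology graph. The dictionary and term names below are hypothetical placeholders, not ProFunc's implementation.

```python
def map_to_slim(go_terms, slim_map):
    """Collapse specific GO terms to their (precomputed) GO-slim categories."""
    return {slim for term in go_terms for slim in slim_map.get(term, ())}


def hit_is_correct(hit_terms, target_terms, slim_map):
    """Score a hit as correct if it shares at least one GO-slim category with the target."""
    return bool(map_to_slim(hit_terms, slim_map) & map_to_slim(target_terms, slim_map))
```

Comparing at the slim level trades specificity for robustness. Two annotations that differ only in fine detail still agree, which is what allows the automated scoring to approach manual judgement.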

  20. Coloc-stats: a unified web interface to perform colocalization analysis of genomic features.

    PubMed

    Simovski, Boris; Kanduri, Chakravarthi; Gundersen, Sveinung; Titov, Dmytro; Domanska, Diana; Bock, Christoph; Bossini-Castillo, Lara; Chikina, Maria; Favorov, Alexander; Layer, Ryan M; Mironov, Andrey A; Quinlan, Aaron R; Sheffield, Nathan C; Trynka, Gosia; Sandve, Geir K

    2018-06-05

    Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.
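To make "statistical significance of colocalization" concrete, here is a hypothetical sketch of one of the simplest possible tests. It measures base-pair overlap between two region sets and uses a permutation null in which each query region keeps its length but is placed uniformly at random on a single chromosome.

The sketch makes several simplifying assumptions. Regions within each set are assumed not to overlap one another, although shuffled regions are allowed to overlap each other. Coordinates are half-open. Real tools offer richer null models, such as preserving chromosomes or gaps, and this is not Coloc-stats' code.

```python
import random


def overlap_bp(a, b):
    """Total base pairs shared between two lists of half-open (start, end) intervals."""
    total = 0
    for s1, e1 in a:
        for s2, e2 in b:
            total += max(0, min(e1, e2) - max(s1, s2))
    return total


def permutation_pvalue(query, reference, chrom_len, n_perm=1000, seed=0):
    """Observed overlap and an empirical one-sided P-value under uniform re-placement."""
    rng = random.Random(seed)
    observed = overlap_bp(query, reference)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for s, e in query:
            length = e - s
            start = rng.randrange(0, chrom_len - length + 1)
            shuffled.append((start, start + length))
        if overlap_bp(shuffled, reference) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the P-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

Different choices of overlap measure and null model can change the P-value for the same data. That is why comparing several methods side by side, as the service enables, is useful.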

  1. CORALINA: a universal method for the generation of gRNA libraries for CRISPR-based screening.

    PubMed

    Köferle, Anna; Worf, Karolina; Breunig, Christopher; Baumann, Valentin; Herrero, Javier; Wiesbeck, Maximilian; Hutter, Lukas H; Götz, Magdalena; Fuchs, Christiane; Beck, Stephan; Stricker, Stefan H

    2016-11-14

    The bacterial CRISPR system is fast becoming the most popular genetic and epigenetic engineering tool due to its universal applicability and adaptability. The desire to deploy CRISPR-based methods in a large variety of species and contexts has created an urgent need for easy, time- and cost-effective methods that enable large-scale screening approaches. Here we describe CORALINA (comprehensive gRNA library generation through controlled nuclease activity), a method for generating comprehensive gRNA libraries for CRISPR-based screens. CORALINA gRNA libraries can be derived from any source of DNA without the need for complex oligonucleotide synthesis. We show the utility of CORALINA for human and mouse genomic DNA and its reproducibility in covering the most relevant genomic features, including regulatory, coding and non-coding sequences, and we confirm the functionality of CORALINA-generated gRNAs. Its simplicity and cost-effectiveness make CORALINA suitable for any experimental system. The unprecedented sequence complexity obtainable with CORALINA libraries is a necessary prerequisite for less biased, large-scale genomic and epigenomic screens.
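For context on what a gRNA library targets, here is a generic sketch that enumerates SpCas9-style protospacers. These are 20-nt sequences immediately followed by an NGG PAM, found on the forward strand only. This is the standard in silico design rule, not the CORALINA wet-lab procedure, which the abstract does not detail. The function name is hypothetical, and the reverse strand is ignored for brevity.

```python
import re


def find_spcas9_protospacers(seq, spacer_len=20):
    """(start, spacer) pairs on the + strand where the spacer is followed by an NGG PAM."""
    seq = seq.upper()
    # Zero-width lookahead so overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]
```

Scanning both strands would require running the same search on the reverse complement and mapping coordinates back.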

  2. Genome-wide regression and prediction with the BGLR statistical package.

    PubMed

    Pérez, Paulino; de los Campos, Gustavo

    2014-10-01

    Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis. Copyright © 2014 by the Genetics Society of America.
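The abstract notes that BGLR is built on a Gibbs sampler with scalar updates. As a hedged, pure-Python illustration (not BGLR's compiled implementation or its R interface), the sketch below fits a Bayesian ridge regression, one member of this family of shrinkage models. It updates one marker effect at a time and keeps the residual vector in sync after each update.

The sketch assumes a flat prior on the intercept, a fixed prior variance `var_b` for marker effects, and p(var_e) ∝ 1/var_e. The function name `gibbs_bayes_ridge` is hypothetical.

```python
import random


def gibbs_bayes_ridge(y, X, var_b=1.0, n_iter=1000, burn_in=200, seed=1):
    """Bayesian ridge regression y = mu + X b + e via single-site (scalar) Gibbs updates.

    Returns posterior means of the intercept mu and of the marker effects b.
    """
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    xtx = [sum(v * v for v in col) for col in cols]
    mu, b = sum(y) / n, [0.0] * p
    resid = [yi - mu for yi in y]  # current residuals e = y - mu - X b
    var_e = sum(r * r for r in resid) / n
    mu_sum, b_sum, kept = 0.0, [0.0] * p, 0
    for it in range(n_iter):
        # Intercept: full conditional is N(mu + mean(resid), var_e / n).
        new_mu = rng.gauss(mu + sum(resid) / n, (var_e / n) ** 0.5)
        resid = [r - (new_mu - mu) for r in resid]
        mu = new_mu
        # Marker effects, one scalar at a time.
        for j in range(p):
            col = cols[j]
            rhs = sum(x * r for x, r in zip(col, resid)) + xtx[j] * b[j]
            c = xtx[j] + var_e / var_b  # shrinkage enters through var_e / var_b
            new_bj = rng.gauss(rhs / c, (var_e / c) ** 0.5)
            delta = new_bj - b[j]
            resid = [r - x * delta for x, r in zip(col, resid)]
            b[j] = new_bj
        # Residual variance: SSE divided by a chi-square(n) draw (scaled inverse chi-square).
        sse = sum(r * r for r in resid)
        var_e = sse / rng.gammavariate(n / 2.0, 2.0)
        if it >= burn_in:
            mu_sum += mu
            b_sum = [s + bj for s, bj in zip(b_sum, b)]
            kept += 1
    return mu_sum / kept, [s / kept for s in b_sum]
```

Updating the residuals in place after each scalar draw makes every update cost O(n) rather than requiring a full matrix solve. That is what keeps this kind of sampler practical when p exceeds n.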

  3. The detection of large deletions or duplications in genomic DNA.

    PubMed

    Armour, J A L; Barton, D E; Cockburn, D J; Taylor, G R

    2002-11-01

    While methods for detecting point mutations and small insertions or deletions in genomic DNA are well established, detecting larger (>100 bp) genomic duplications or deletions can be more difficult. Most mutation scanning methods use PCR as a first step, but the subsequent analyses are usually qualitative rather than quantitative. PCR-based gene dosage methods need to be quantitative (i.e., they should report molar quantities of starting material) or semi-quantitative (i.e., they should report gene dosage relative to an internal standard). Without some form of quantitation, heterozygous deletions and duplications may be overlooked and therefore under-ascertained. Gene dosage methods have the additional benefit of revealing allele drop-out in the PCR. This could affect SNP surveys, where large-scale genotyping may miss null alleles. Here we review recent developments in techniques for detecting this type of mutation and compare their relative strengths and weaknesses. We emphasize that comprehensive mutation analysis should include scanning for large insertions, deletions and duplications. Copyright 2002 Wiley-Liss, Inc.
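Semi-quantitative dosage relative to an internal standard can be illustrated with a simple dosage quotient. Take the ratio of the test amplicon to a reference amplicon in the patient, and divide it by the same ratio in a normal control.

For a diploid locus, a heterozygous deletion is expected to give about 0.5 and a duplication about 1.5. The calling cutoffs below are hypothetical values chosen for illustration, not thresholds from the review.

```python
def dosage_quotient(test_sample, ref_sample, test_control, ref_control):
    """(test / reference signal in the patient) divided by the same ratio in a normal control."""
    return (test_sample / ref_sample) / (test_control / ref_control)


def call_copy_state(dq, del_cutoff=0.75, dup_cutoff=1.25):
    """Classify a dosage quotient; cutoffs are illustrative assumptions."""
    if dq < del_cutoff:
        return "deletion"
    if dq > dup_cutoff:
        return "duplication"
    return "normal"
```

Normalizing to an internal reference amplicon cancels sample-to-sample differences in DNA input and amplification efficiency. A purely qualitative PCR readout cannot do this.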

  4. Genome shuffling of Lactobacillus plantarum C88 improves adhesion.

    PubMed

    Zhao, Yujuan; Duan, Cuicui; Gao, Lei; Yu, Xue; Niu, Chunhua; Li, Shengyu

    2017-01-01

    Genome shuffling is an important method for rapidly improving microbial strains for desired phenotypes. In this study, ultraviolet irradiation and nitrosoguanidine were used as mutagens to enhance the adhesion of wild-type Lactobacillus plantarum C88. After mutagenesis, four strains with improved adhesion were selected to form a library of parent strains for three rounds of genome shuffling. Fusants F3-1, F3-2, F3-3 and F3-4 were selected as the improved strains. In vivo and in vitro test results indicated that the population obtained after three rounds of genome shuffling had improved adhesive properties. Random Amplified Polymorphic DNA (RAPD) analysis showed significant DNA-level differences between the parent strain and the recombinant strains. These results suggest that the adhesive property of L. plantarum C88 can be significantly improved by genome shuffling. Improving the adhesion of bacterial cells by genome shuffling enhances the colonization of probiotic strains, which in turn supports their probiotic function.

  5. Dcode.org anthology of comparative genomic tools.

    PubMed

    Loots, Gabriela G; Ovcharenko, Ivan

    2005-07-01

    Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this approach to identifying novel genes is now shifting toward deciphering the non-coding encryption of gene regulation across genomes. To facilitate the practical application of comparative sequence analysis to genetics and genomics, we have developed several analytical and visualization tools for analyzing arbitrary sequences and whole genomes. These tools include two alignment tools, zPicture and Mulan; a phylogenetic shadowing tool, eShadow, for identifying lineage- and species-specific functional elements; two tools for analyzing evolutionarily conserved transcription factor binding sites, rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, Creme 2.0; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here, we briefly describe each tool and provide specific examples of its practical application. All the tools are publicly available at the http://www.dcode.org/ website.
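The evolutionarily conserved regions (ECRs) displayed by tools of this kind are typically defined by a sliding window over a pairwise alignment. Here is a minimal sketch. The defaults of a 100-column window and 70 % identity are assumed commonly used values, not settings confirmed by this abstract, and the function is hypothetical rather than ECR Browser code.

```python
def conserved_windows(aln_a, aln_b, window=100, min_identity_pct=70):
    """Merged (start, end) alignment-column spans whose windows meet an identity cutoff.

    aln_a, aln_b: the two rows of a pairwise alignment (equal length, '-' = gap).
    Identity = identical non-gap columns in the window / window length.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    match = [1 if a == b and a != "-" else 0 for a, b in zip(aln_a, aln_b)]
    regions = []
    count = sum(match[:window])  # matches in the first window
    for start in range(0, len(match) - window + 1):
        if start > 0:
            # Slide the window one column: add the new column, drop the old one.
            count += match[start + window - 1] - match[start - 1]
        # Integer comparison avoids floating-point edge cases at the cutoff.
        if count * 100 >= min_identity_pct * window:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend an overlapping region
            else:
                regions.append((start, end))
    return regions
```

Because overlapping qualifying windows are merged, a region can extend a few columns into a less conserved flank. Window-based definitions share this behavior.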

  6. Investigative pathology: leading the post-genomic revolution.

    PubMed

    Berman, David M; Bosenberg, Marcus W; Orwant, Robin L; Thurberg, Beth L; Draetta, Gulio F; Fletcher, Christopher D M; Loda, Massimo

    2012-01-01

    The completion of the Human Genome Project and the development of genome-based technologies over the past decade have set the stage for a new era of personalized medicine. By all rights, molecularly trained investigative pathologists should be leading this revolution. Singularly well suited for this work, molecular pathologists have the rare ability to wed genomic tools with unique diagnostic skills and tissue-based pathology techniques for integrated diagnosis of human disease. However, the number of pathologists with expertise in genome-based research has remained relatively low due to outdated training methods and a reluctance among some traditional pathologists to embrace new technologies. Moreover, because budding pathologists may not appreciate the vast selection of jobs available to them, they often end up choosing jobs that focus almost entirely on routine diagnosis rather than new frontiers in molecular pathology. This review calls for changes aimed at rectifying these troubling trends to ensure that pathology continues to guide patient care in a post-genomic era.

  7. Genomic Sequencing of Bordetella pertussis for Epidemiology and Global Surveillance of Whooping Cough.

    PubMed

    Bouchez, Valérie; Guglielmini, Julien; Dazas, Mélody; Landier, Annie; Toubiana, Julie; Guillot, Sophie; Criscuolo, Alexis; Brisse, Sylvain

    2018-06-01

    Bordetella pertussis causes whooping cough, a highly contagious respiratory disease that is reemerging in many world regions. The spread of antigen-deficient strains may threaten acellular vaccine efficacy. The dynamics of strain transmission are poorly defined because of shortcomings in current strain genotyping methods. Our objective was to develop a whole-genome genotyping strategy with sufficient resolution for local epidemiologic questions and sufficient reproducibility to enable international comparisons of clinical isolates. We defined a core genome multilocus sequence typing scheme comprising 2,038 loci and demonstrated its congruence with whole-genome single-nucleotide polymorphism variation. Most cases of intrafamilial groups of isolates or of multiple isolates recovered from the same patient were distinguished from temporally and geographically cocirculating isolates. However, epidemiologically unrelated isolates were sometimes nearly indistinguishable. We set up a publicly accessible core genome multilocus sequence typing database to enable global comparisons of B. pertussis isolates, opening the way for internationally coordinated surveillance.
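Core genome MLST compares isolates by their allele calls at a fixed set of loci, which is 2,038 loci in this scheme. Below is a minimal, hypothetical sketch of two common steps. The first counts loci with differing allele numbers while skipping loci that are missing in either isolate. The second groups isolates by single linkage under a distance threshold.

The locus names, profiles and threshold are illustrative assumptions, not values from the study.

```python
def allelic_distance(profile_a, profile_b):
    """Loci with different allele calls; loci missing (None or absent) in either are skipped."""
    shared = [locus for locus in profile_a
              if locus in profile_b
              and profile_a[locus] is not None
              and profile_b[locus] is not None]
    return sum(profile_a[locus] != profile_b[locus] for locus in shared)


def single_linkage_groups(profiles, threshold):
    """Group isolates linked by chains of pairwise distances <= threshold (union-find)."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if allelic_distance(profiles[x], profiles[y]) <= threshold:
                parent[find(x)] = find(y)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())
```

Skipping missing calls keeps incomplete assemblies from inflating distances. The trade-off is that isolates with many missing loci can look closer than they are, which is one reason nearly identical profiles need epidemiological context.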

  8. Enhanced production of fructosyltransferase in Aspergillus oryzae by genome shuffling.

    PubMed

    Wang, Shenghai; Duan, Mengjie; Liu, Yalan; Fan, Sen; Lin, Xiaoshan; Zhang, Yi

    2017-03-01

    This study aimed to breed Aspergillus oryzae strains with high fructosyltransferase (FTase) activity by intraspecific protoplast fusion via genome shuffling. A candidate library was generated by UV/LiCl mutagenesis of conidia of A. oryzae SBB201. After screening for enzyme activity and cell biomass, two mutants (UV-11 and UV-76) were chosen for protoplast fusion and subsequent genome shuffling. After three rounds of genome recombination, the fusion mutant RIII-7 was obtained. Its FTase activity was 180 U g⁻¹, approximately double that of the original strain, and RIII-7 was genetically stable. In fermentation culture using a substrate-feeding method, the FTase activity of the genome-shuffled strain reached a maximum of 353 U g⁻¹, approximately 3.4 times higher than that of the original strain A. oryzae SBB201. Intraspecific protoplast fusion of A. oryzae significantly enhanced FTase activity and generated a potentially useful strain for industrial production.

  9. History of genome editing in yeast.

    PubMed

    Fraczek, Marcin G; Naseeb, Samina; Delneri, Daniela

    2018-05-01

    For thousands of years humans have used the budding yeast Saccharomyces cerevisiae for the production of bread and alcohol; however, in the last 30-40 years our understanding of yeast biology has increased dramatically, enabling us to modify its genome. Although S. cerevisiae has been the main focus of many research groups, other non-conventional yeasts have also been studied and exploited for biotechnological purposes. Our experiments and knowledge have evolved from recombination to high-throughput PCR-based transformations to highly accurate CRISPR methods for altering yeast traits for research or industrial purposes. Since the release of the S. cerevisiae genome sequence in 1996, precise and targeted genome editing has advanced significantly. In this 'Budding topic' we discuss significant developments in genome editing in yeast, focusing mainly on Cre-loxP-mediated recombination, delitto perfetto and CRISPR/Cas. © 2018 The Authors. Yeast published by John Wiley & Sons, Ltd.

  10. Transcriptome analysis and related databases of Lactococcus lactis.

    PubMed

    Kuipers, Oscar P; de Jong, Anne; Baerends, Richard J S; van Hijum, Sacha A F T; Zomer, Aldert L; Karsens, Harma A; den Hengst, Chris D; Kramer, Naomi E; Buist, Girbe; Kok, Jan

    2002-08-01

    Several complete genome sequences of Lactococcus lactis and their annotations will become available in the near future, in addition to the already published genome sequence of L. lactis ssp. lactis IL 1403. This will allow intraspecies comparative genomics studies as well as functional genomics studies aimed at better understanding the physiological processes and regulatory networks operating in lactococci. This paper describes the initial set-up of a DNA-microarray facility in our group to enable transcriptome analysis of various Gram-positive bacteria, including a ssp. lactis and a ssp. cremoris strain of Lactococcus lactis. Moreover, a general description is given of the hardware and software requirements for such a set-up, highlighting the crucial integration of relevant bioinformatics tools and methods. This includes the development of MolGenIS, an information system for storing and retrieving transcriptome data, and LactococCye, a metabolic pathway/genome database of Lactococcus lactis.

  11. Tapping the promise of genomics in species with complex, nonmodel genomes.

    PubMed

    Hirsch, Candice N; Buell, C Robin

    2013-01-01

    Genomics is enabling a renaissance in all disciplines of plant biology. However, many plant genomes are complex and remain recalcitrant to current genomic technologies. The complexities of these nonmodel plant genomes are attributable to gene and genome duplication, heterozygosity, ploidy, and/or repetitive sequences. Methods are available to simplify the genome and reduce these barriers, including inbreeding and genome reduction, making these species amenable to current sequencing and assembly methods. Some, but not all, of the complexities in nonmodel genomes can be bypassed by sequencing the transcriptome rather than the genome. Additionally, comparative genomics approaches, which leverage phylogenetic relatedness, can aid in the interpretation of complex genomes. Although there are limitations in accessing complex nonmodel plant genomes using current sequencing technologies, genome manipulation and resourceful analyses can allow access to even the most recalcitrant plant genomes.

  12. Genome engineering in cattle: recent technological advancements.

    PubMed

    Wang, Zhongde

    2015-02-01

    Great technological strides have been made in the past decade in cattle genome engineering. First, the success of cloning cattle by somatic cell nuclear transfer (SCNT) or chromatin transfer (CT) is a significant advance that has made embryonic stem (ES) cells unnecessary for cell-mediated genome engineering: site-specific genetic modifications can be made in bovine somatic cells via DNA homologous recombination (HR), and genetically engineered cattle can then be produced by cloning from the modified cells. With this approach, a chosen bovine genomic locus can be precisely modified in somatic cells, for example to knock out (KO) or knock in (KI) a gene via HR, a gene-targeting strategy that had previously been used almost exclusively in mouse ES cells. Furthermore, by the creative application of embryonic cloning to rejuvenate somatic cells, the cattle genome can be modified sequentially in the same line of somatic cells, and complex genetic modifications have been achieved in cattle. Very recently, the development of designer nucleases, such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9), has enabled highly efficient and more facile genome engineering in cattle. Most notably, with such designer nucleases, genomes can be engineered at single-nucleotide precision; this process is now often referred to as genome or gene editing. These achievements are a drastic departure from the traditional methods of creating genetically modified cattle, in which foreign DNA is randomly integrated into the animal genome, most often along with bacterial or viral DNA.
Here, I review the most recent technological developments in cattle genome engineering by highlighting some of the major achievements in creating genetically engineered cattle for agricultural and biomedical applications.

  13. IMGD: an integrated platform supporting comparative genomics and phylogenetics of insect mitochondrial genomes

    PubMed Central

    Lee, Wonhoon; Park, Jongsun; Choi, Jaeyoung; Jung, Kyongyong; Park, Bongsoo; Kim, Donghan; Lee, Jaeyoung; Ahn, Kyohun; Song, Wonho; Kang, Seogchan; Lee, Yong-Hwan; Lee, Seunghwan

    2009-01-01

    Background Sequences and organization of the mitochondrial genome have been used as markers to investigate evolutionary history and relationships in many taxonomic groups. The rapidly increasing number of mitochondrial genome sequences from diverse insects provides ample opportunities to explore various global evolutionary questions in the superclass Hexapoda. To adequately support such questions, it is imperative to establish an informatics platform that facilitates the retrieval and utilization of available mitochondrial genome sequence data. Results The Insect Mitochondrial Genome Database (IMGD) is a new integrated platform that archives the mitochondrial genome sequences from 25,747 hexapod species, including 112 completely sequenced and 20 nearly completed genomes and 113,985 partially sequenced mitochondrial genomes. The Species-driven User Interface (SUI) of IMGD supports data retrieval and diverse analyses at multiple taxonomic levels. The Phyloviewer implemented in IMGD provides three methods for drawing phylogenetic trees and displays the resulting trees on the web. The SNP database incorporated into IMGD presents the distribution of SNPs and INDELs in the mitochondrial genomes of multiple isolates within eight species. A newly developed comparative SNU Genome Browser provides graphical presentation and an interactive interface for the identified SNPs/INDELs. Conclusion The IMGD provides a solid foundation for comparative mitochondrial genomics and phylogenetics of insects. All data and functions described here are available at the web site . PMID:19351385

  14. Early experience with formalin-fixed paraffin-embedded (FFPE) based commercial clinical genomic profiling of gliomas-robust and informative with caveats.

    PubMed

    Movassaghi, Masoud; Shabihkhani, Maryam; Hojat, Seyed A; Williams, Ryan R; Chung, Lawrance K; Im, Kyuseok; Lucey, Gregory M; Wei, Bowen; Mareninov, Sergey; Wang, Michael W; Ng, Denise W; Tashjian, Randy S; Magaki, Shino; Perez-Rosendahl, Mari; Yang, Isaac; Khanlou, Negar; Vinters, Harry V; Liau, Linda M; Nghiemphu, Phioanh L; Lai, Albert; Cloughesy, Timothy F; Yong, William H

    2017-08-01

    Commercial targeted genomic profiling with next-generation sequencing using formalin-fixed paraffin-embedded (FFPE) tissue has recently entered clinical use for diagnosis and for guiding therapy. However, there are limited independent data on the accuracy or robustness of commercial genomic profiling in gliomas. As part of patient care, FFPE samples of gliomas from 71 patients were submitted for targeted genomic profiling to one commonly used commercial vendor, Foundation Medicine. Genomic alterations were determined for the following grades or groups of gliomas: Grade I/II, Grade III, primary glioblastomas (GBMs), recurrent primary GBMs and secondary GBMs. In addition, FFPE samples from the same patients were independently assessed with conventional methods such as immunohistochemistry (IHC), quantitative real-time PCR (qRT-PCR) or fluorescence in situ hybridization (FISH) for three genetic alterations: IDH1 mutations, EGFR amplification and EGFRvIII expression. A total of 100 altered genes were detected by the targeted genomic profiling assay. The number of different genomic alterations differed significantly between the five groups of gliomas, consistent with the literature. CDKN2A/B, TP53 and TERT were the most common genomic alterations in primary GBMs, whereas IDH1, TP53 and PIK3CA were the most common in secondary GBMs. Targeted genomic profiling showed 92.3%-100% concordance with conventional methods. The targeted genomic profiling report listed an average of 5.5 drugs and an average of 8.4 clinical trials for the 71 glioma patients studied, but only a third of the trials were appropriate for glioma patients. In this limited comparison study, this commercial next-generation sequencing-based targeted genomic profiling showed a high concordance rate with conventional methods for the 3 genetic alterations and identified the mutations expected for each type of glioma.
While it may not be feasible to validate a commercial genomic profiling assay exhaustively and independently, examination of a few markers provides some reassurance of its robustness. Although potential targeted drugs are recommended based on genetic alterations, most targeted therapies have so far failed in glioblastomas, so the usefulness of such recommendations will increase with the development of novel and efficacious drugs. Copyright © 2017. Published by Elsevier Inc.

  15. Integrated genomics and molecular breeding approaches for dissecting the complex quantitative traits in crop plants.

    PubMed

    Kujur, Alice; Saxena, Maneesha S; Bajaj, Deepak; Laxmi; Parida, Swarup K

    2013-12-01

    Enormous population growth, climate change and global warming are now considered major threats to agriculture and the world's food security. To improve the productivity and sustainability of agriculture, the development of high-yielding, durable abiotic and biotic stress-tolerant cultivars and/or climate-resilient crops is essential. Hence, understanding the molecular mechanisms and dissecting complex quantitative yield and stress tolerance traits is the prime objective of current agricultural biotechnology research. In recent years, tremendous progress has been made in plant genomics and molecular breeding research, including conventional and next-generation whole-genome, transcriptome and epigenome sequencing efforts, the generation of huge genomic, transcriptomic and epigenomic resources, and the development of modern genomics-assisted breeding approaches in diverse crop genotypes with contrasting yield and abiotic stress tolerance traits. Unfortunately, the detailed molecular mechanisms and gene regulatory networks controlling such complex quantitative traits are not yet well understood in crop plants. Therefore, we propose an integrated strategy involving the enormous and diverse traditional and modern -omics (structural, functional, comparative and epigenomics) approaches/resources and genomics-assisted breeding methods that agricultural biotechnologists can adopt to dissect and decode the molecular and gene regulatory networks involved in complex quantitative yield and stress tolerance traits in crop plants. This would provide clues and much-needed inputs for the rapid selection of novel, functionally relevant molecular tags regulating such complex traits, to expedite traditional and modern marker-assisted genetic enhancement studies in target crop species for developing high-yielding stress-tolerant varieties.

  16. [The ENCODE project and functional genomics studies].

    PubMed

    Ding, Nan; Qu, Hongzhu; Fang, Xiangdong

    2014-03-01

    Since the completion of the Human Genome Project, scientists have been trying to interpret the underlying genomic code of human biology. Since 2003, the National Human Genome Research Institute (NHGRI) has invested nearly $0.3 billion and gathered over 440 scientists from more than 32 institutions in the United States, China, the United Kingdom, Japan, Spain and Singapore in the Encyclopedia of DNA Elements (ENCODE) project, which aims to identify and analyze all regulatory elements in the human genome. Taking advantage of next-generation sequencing technologies and continuous improvements in experimental methods, ENCODE has made remarkable achievements: it has identified DNA methylation and histone modifications and their regulatory effects on gene expression through changes in chromatin structure, categorized the binding sites of various transcription factors and constructed their regulatory networks, revised and updated databases of pseudogenes and non-coding RNA, and identified disease-associated SNPs in regulatory sequences. These findings help us comprehensively understand the information embedded in gene and genome sequences, the function of regulatory elements and the molecular mechanisms underlying transcriptional regulation by non-coding regions, and they provide an extensive data resource for the life sciences, particularly for translational medicine. We reviewed the contributions of high-throughput sequencing platform development and improvements in bioinformatics technology to the ENCODE project, the association between epigenetics studies and the ENCODE project, and the major achievements of the ENCODE project. We also provide our perspective on the role of the ENCODE project in promoting the development of basic and clinical medicine.

  17. Horizontal Gene Transfer and the History of Life

    PubMed Central

    Daubin, Vincent; Szöllősi, Gergely J.

    2016-01-01

    Microbes acquire DNA from a variety of sources. The last decades, which have seen the development of genome sequencing, have revealed that horizontal gene transfer has been a major evolutionary force that has constantly reshaped genomes throughout evolution. However, because the history of life must ultimately be deduced from gene phylogenies, the lack of methods to account for horizontal gene transfer has thrown into confusion the very concept of the tree of life. As a result, many questions remain open, but emerging methodological developments promise to use information conveyed by horizontal gene transfer that remains unexploited today. PMID:26801681

  18. Multi-allelic haplotype model based on genetic partition for genomic prediction and variance component estimation using SNP markers.

    PubMed

    Da, Yang

    2015-12-18

    The amount of functional genomic information has been growing rapidly but remains largely unused in genomic selection. Genomic prediction and estimation using haplotypes in genome regions with functional elements, such as all genes of the genome, can be an approach to integrating functional and structural genomic information for genomic selection. Towards this goal, this article develops a new haplotype approach for genomic prediction and estimation. A multi-allelic haplotype model treating each haplotype as an 'allele' was developed for genomic prediction and estimation based on partitioning a multi-allelic genotypic value into additive and dominance values. Each additive value is expressed as a function of h - 1 additive effects, where h is the number of alleles or haplotypes, and each dominance value is expressed as a function of h(h - 1)/2 dominance effects. For a sample of q individuals, the maximum number of effects is 2q - 1 for additive effects and the number of heterozygous genotypes for dominance effects. Additive values are factorized as the product of the additive model matrix and the h - 1 additive effects, and dominance values as the product of the dominance model matrix and the h(h - 1)/2 dominance effects. The genomic additive relationship matrix is defined as a function of the haplotype model matrix for additive effects, and the genomic dominance relationship matrix as a function of the haplotype model matrix for dominance effects. Based on these results, a mixed-model implementation for genomic prediction and variance component estimation that jointly uses haplotypes and single markers is established, including two computing strategies for genomic prediction and variance component estimation that give identical results.
The multi-allelic genetic partition fills a theoretical gap in genetic partition by providing general formulations for partitioning multi-allelic genotypic values and provides a haplotype method based on the quantitative genetics model towards the utilization of functional and structural genomic information for genomic prediction and estimation.
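
    The effect counts in this multi-allelic model can be checked with a short sketch. It assumes each diploid genotype is recorded as a pair of haplotype labels (a hypothetical input format, not the paper's) and counts the additive effects (one fewer than the distinct haplotypes observed) and the dominance effects (one per distinct heterozygous genotype observed, at most h(h - 1)/2). It does not build the model matrices themselves.

```python
def theoretical_effect_counts(h: int) -> tuple[int, int]:
    """Additive (h - 1) and dominance (h(h - 1)/2) effects for h haplotype 'alleles'."""
    return h - 1, h * (h - 1) // 2


def estimable_effect_counts(genotypes: list[tuple[int, int]]) -> tuple[int, int]:
    """Effects supported by a sample of diploid genotypes (pairs of haplotype labels).

    With q individuals there are at most 2q distinct haplotypes, hence at most
    2q - 1 additive effects; dominance effects are limited to the distinct
    heterozygous genotypes actually present in the sample.
    """
    haplotypes = {hap for pair in genotypes for hap in pair}
    # Sort each pair so that (1, 0) and (0, 1) count as the same heterozygote.
    heterozygotes = {tuple(sorted(pair)) for pair in genotypes if pair[0] != pair[1]}
    return len(haplotypes) - 1, len(heterozygotes)
```

    For example, with four haplotypes the model has 3 additive and 6 dominance effects, while a sample containing only three haplotypes and two distinct heterozygotes supports just 2 of each.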

  19. Comparing Mycobacterium tuberculosis genomes using genome topology networks.

    PubMed

    Jiang, Jianping; Gu, Jianlei; Zhang, Liang; Zhang, Chenyi; Deng, Xiao; Dou, Tonghai; Zhao, Guoping; Zhou, Yan

    2015-02-14

    Over the last decade, emerging research methods, such as comparative genomic analysis and phylogenetic study, have yielded new insights into genotypes and phenotypes of closely related bacterial strains. Several findings have revealed that genomic structural variations (SVs), including gene gain/loss, gene duplication and genome rearrangement, can lead to different phenotypes among strains, and an investigation of genes affected by SVs may extend our knowledge of the relationships between SVs and phenotypes in microbes, especially in pathogenic bacteria. In this work, we introduce a 'Genome Topology Network' (GTN) method based on gene homology and gene locations to analyze genomic SVs and perform phylogenetic analysis. Furthermore, we propose the concept of the 'unfixed ortholog', a group whose members are affected by SVs in genome topology among closely related species. To improve the precision of 'unfixed ortholog' recognition, a strategy to detect annotation differences and complete gene annotation was applied. To assess the GTN method, a set of thirteen complete M. tuberculosis genomes was analyzed as a case study. GTNs were built with two different gene homology-assigning methods, the Clusters of Orthologous Groups (COG) method and the orthoMCL clustering method, and two phylogenetic trees were constructed accordingly, which may provide additional insights into whole genome-based phylogenetic analysis. We obtained 24 unfixable COG groups, most of whose members were related to immunogenicity and drug resistance, such as PPE-repeat proteins (COG5651) and transcriptional regulator TetR gene family members (COG1309). The GTN method has been implemented in Perl and released on our website. The tool can be downloaded from http://homepage.fudan.edu.cn/zhouyan/gtn/ , and allows re-annotating the 'lost' genes among closely related genomes, analyzing genes affected by SVs, and performing phylogenetic analysis. 
With this tool, many immunogenic-related and drug resistance-related genes were found to be affected by SVs in M. tuberculosis genomes. We believe that the GTN method will be suitable for the exploration of genomic SVs in connection with biological features of bacterial strains, and that GTN-based phylogenetic analysis will provide additional insights into whole genome-based phylogenetic analysis.
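
    The idea of combining gene homology with gene location can be illustrated with a simplified sketch. This is not the published GTN implementation: it assumes each genome is already reduced to a linear, ordered list of ortholog-group IDs (each group appearing once), links each group to its immediate neighbours, and flags a group as a candidate 'unfixed' group when it is missing from some genome or its neighbours differ between genomes.

```python
from collections import defaultdict


def neighbour_sets(order: list[str]) -> dict[str, frozenset[str]]:
    """Map each ortholog group to the groups adjacent to it on a linear genome."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for left, right in zip(order, order[1:]):
        neighbours[left].add(right)
        neighbours[right].add(left)
    for group in order:  # a one-gene genome still gets an (empty) entry
        neighbours.setdefault(group, set())
    return {group: frozenset(nbrs) for group, nbrs in neighbours.items()}


def candidate_unfixed_groups(genomes: list[list[str]]) -> set[str]:
    """Groups absent from some genome, or whose neighbours differ across genomes."""
    tables = [neighbour_sets(order) for order in genomes]
    all_groups = set().union(*(t.keys() for t in tables))
    unfixed = set()
    for group in all_groups:
        contexts = {t.get(group) for t in tables}  # None marks absence
        if len(contexts) > 1:
            unfixed.add(group)
    return unfixed
```

    With two genomes ordered A-B-C-D and a third ordered A-B-C-E, groups A and B keep the same topology, whereas C (new neighbour), D (lost) and E (gained) are flagged.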

  20. Assessing the evolutionary rate of positional orthologous genes in prokaryotes using synteny data

    PubMed Central

    Lemoine, Frédéric; Lespinet, Olivier; Labedan, Bernard

    2007-01-01

    Background Comparison of completely sequenced microbial genomes has revealed how fluid these genomes are. Detecting synteny blocks requires reliable methods for determining the orthologs among the whole set of homologs detected by exhaustive comparisons between each pair of completely sequenced genomes. This is a complex and difficult problem in the field of comparative genomics, but solving it will help us better understand how prokaryotic genomes evolve. Results We have developed a suite of programs that automate three essential steps to study conservation of gene order, and validated them with a set of 107 bacteria and archaea that cover the majority of the prokaryotic taxonomic space. We identified the whole set of shared homologs between two or more species and computed the evolutionary distance separating each pair of homologs. We applied two strategies to extract from the set of homologs a collection of valid orthologs shared by at least two genomes. The first computes the Reciprocal Smallest Distance (RSD) using the PAM distances separating pairs of homologs. The second groups homologs in families and reconstructs each family's evolutionary tree, identifying bona fide orthologs as well as paralogs created after the last speciation event. Although the phylogenetic tree method often succeeds where RSD fails, the reverse can occasionally be true. Accordingly, we used the data obtained with either method, or their intersection, to count the orthologs that are adjacent in each pair of genomes, the Positional Orthologous Genes (POGs), and to further study their properties. Once all these synteny blocks had been detected, we showed that POGs are subject to more evolutionary constraints than orthologs outside synteny groups, whatever the taxonomic distance separating the compared organisms. 
Conclusion The suite of programs described in this paper allows reliable detection of orthologs and is useful for evaluating gene order conservation in prokaryotes, whatever their taxonomic distance. Thus, our approach will facilitate the rapid identification of POGs in the next few years, as we expect to be inundated with thousands of completely sequenced microbial genomes. PMID:18047665
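
    The two core steps (reciprocal smallest distance, then adjacency in both genomes) can be sketched as follows. This is a simplified illustration, not the authors' suite: it assumes a precomputed matrix of PAM-style distances between the genes of genome A (rows) and genome B (columns), applies no distance cut-off, and treats gene indices as positions along a linear chromosome. Ties are broken by taking the first minimum.

```python
import numpy as np


def reciprocal_smallest_distance(dist: np.ndarray) -> list[tuple[int, int]]:
    """Pairs (i, j) where j is i's closest gene in B and i is j's closest gene in A."""
    best_in_b = dist.argmin(axis=1)  # for each gene of A, nearest gene of B
    best_in_a = dist.argmin(axis=0)  # for each gene of B, nearest gene of A
    return [(i, int(j)) for i, j in enumerate(best_in_b) if best_in_a[j] == i]


def positional_orthologs(pairs: list[tuple[int, int]]) -> set[tuple[int, int]]:
    """Ortholog pairs with another ortholog pair as a neighbour in both genomes."""
    pair_set = set(pairs)
    pogs = set()
    for i, j in pairs:
        for di in (-1, 1):
            for dj in (-1, 1):  # same or inverted orientation
                if (i + di, j + dj) in pair_set:
                    pogs.add((i, j))
    return pogs
```

    In the 3 x 3 example used in the checks below, all three genes have reciprocal partners, but only the inverted pair (1, 2)/(2, 1) stays adjacent in both genomes, so gene 0 is an ortholog outside any synteny block.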

  1. Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomics Data.

    PubMed

    Bolser, Dan; Staines, Daniel M; Pritchard, Emily; Kersey, Paul

    2016-01-01

    Ensembl Plants ( http://plants.ensembl.org ) is an integrative resource presenting genome-scale information for a growing number of sequenced plant species (currently 33). Data provided include genome sequence, gene models, functional annotation, and polymorphic loci. Additional information is provided for variation data, including population structure, individual genotypes, linkage, and phenotype data. In each release, comparative analyses are performed on whole genome and protein sequences, and genome alignments and gene trees are made available that show the implied evolutionary history of each gene family. Access to the data is provided through a genome browser incorporating many specialist interfaces for different data types, and through a variety of additional methods for programmatic access and data mining. These access routes are consistent with those offered through the Ensembl interface for the genomes of non-plant species, including those of plant pathogens, pests, and pollinators. Ensembl Plants is updated 4-5 times a year and is developed in collaboration with our international partners in the Gramene ( http://www.gramene.org ) and transPLANT projects ( http://www.transplantdb.org ).

  2. The Biofuel Feedstock Genomics Resource: a web-based portal and database to enable functional genomics of plant biofuel feedstock species.

    PubMed

    Childs, Kevin L; Konganti, Kranti; Buell, C Robin

    2012-01-01

    Major feedstock sources for future biofuel production are likely to be high biomass producing plant species such as poplar, pine, switchgrass, sorghum and maize. One active area of research in these species is genome-enabled improvement of lignocellulosic biofuel feedstock quality and yield. To facilitate genomic-based investigations in these species, we developed the Biofuel Feedstock Genomic Resource (BFGR), a database and web-portal that provides high-quality, uniform and integrated functional annotation of gene and transcript assembly sequences from species of interest to lignocellulosic biofuel feedstock researchers. The BFGR includes sequence data from 54 species and permits researchers to view, analyze and obtain annotation at the gene, transcript, protein and genome level. Annotation of biochemical pathways permits the identification of key genes and transcripts central to the improvement of lignocellulosic properties in these species. The integrated nature of the BFGR in terms of annotation methods, orthologous/paralogous relationships and linkage to seven species with complete genome sequences allows comparative analyses for biofuel feedstock species with limited sequence resources. Database URL: http://bfgr.plantbiology.msu.edu.

  3. Reconstructing Past Admixture Processes from Local Genomic Ancestry Using Wavelet Transformation

    PubMed Central

    Sanderson, Jean; Sudoyo, Herawati; Karafet, Tatiana M.; Hammer, Michael F.; Cox, Murray P.

    2015-01-01

    Admixture between long-separated populations is a defining feature of the genomes of many species. The mosaic block structure of admixed genomes can provide information about past contact events, including the time and extent of admixture. Here, we describe an improved wavelet-based technique that better characterizes ancestry block structure from observed genomic patterns. Principal components analysis is first applied to genomic data to identify the primary population structure, followed by wavelet decomposition to develop a new characterization of local ancestry information along the chromosomes. For testing purposes, this method is applied to human genome-wide genotype data from Indonesia, as well as to virtual genetic data generated using genome-scale sequential coalescent simulations under a wide range of admixture scenarios. Time of admixture is inferred using an approximate Bayesian computation framework, providing robust estimates of both admixture times and their associated levels of uncertainty. Crucially, we demonstrate that this revised wavelet approach, which we have released as the R package adwave, provides improved statistical power over existing wavelet-based techniques and can be used to address a broad range of admixture questions. PMID:25852078
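
    The PCA-then-wavelet idea can be sketched in a few lines. This is a simplified illustration, not the adwave package: it assumes a genotype matrix of 0/1/2 counts (individuals x SNPs), builds a per-SNP signal for one individual by weighting its centred genotypes with the first principal-axis loadings (one possible choice, labelled here as an assumption), and uses a plain Haar transform to measure how much signal energy sits at each scale. Long ancestry blocks (older or more recent admixture would shift this) put energy at coarse scales; short blocks put it at fine scales.

```python
import numpy as np


def pc1_signal(genotypes: np.ndarray, individual: int) -> np.ndarray:
    """Per-SNP signal for one individual: centred genotype times its PC1 loading (assumed)."""
    centred = genotypes - genotypes.mean(axis=0)
    # Rows of vt are principal axes in SNP space; vt[0] is the first one.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred[individual] * vt[0]


def haar_detail_energy(signal: np.ndarray) -> list[float]:
    """Energy of Haar detail coefficients at each scale (index 0 = finest scale)."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    energies = []
    while len(x) > 1:
        even, odd = x[0::2], x[1::2]
        detail = (even - odd) / np.sqrt(2)  # differences between neighbouring values
        x = (even + odd) / np.sqrt(2)       # smoothed signal passed to the next scale
        energies.append(float(np.sum(detail ** 2)))
    return energies
```

    Because the Haar transform is orthogonal, the per-scale energies (plus the final smooth coefficient) sum to the energy of the input, so the profile can be read as a decomposition of the signal's variance by block length.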

  4. Next-generation genome-scale models for metabolic engineering.

    PubMed

    King, Zachary A; Lloyd, Colton J; Feist, Adam M; Palsson, Bernhard O

    2015-12-01

    Constraint-based reconstruction and analysis (COBRA) methods have become widely used tools for metabolic engineering in both academic and industrial laboratories. By employing a genome-scale in silico representation of the metabolic network of a host organism, COBRA methods can be used to predict optimal genetic modifications that improve the rate and yield of chemical production. A new generation of COBRA models and methods, encompassing many biological processes and simulation strategies, is now being developed, and these next-generation models enable new types of predictions. Here, three key examples of applying COBRA methods to strain optimization are presented and discussed. Then, an outlook is provided on the next generation of COBRA models and the new types of predictions they will enable for systems metabolic engineering. Copyright © 2014 Elsevier Ltd. All rights reserved.
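
    The central COBRA calculation, flux balance analysis, is a linear program: maximise an objective flux subject to steady state (S v = 0) and flux bounds. The sketch below uses a hypothetical four-reaction toy network and SciPy's `linprog`, not a genome-scale model or the COBRA toolbox, and shows how a reaction knockout is simulated by fixing its bounds to zero.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: metabolites A, B; reactions
# R0: -> A (uptake), R1: A -> B, R2: B -> (biomass drain), R3: A -> (by-product)
S = np.array([
    [1, -1, 0, -1],  # mass balance for A
    [0, 1, -1, 0],   # mass balance for B
])
BOUNDS = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]  # uptake capped at 10


def fba(stoich: np.ndarray, bounds: list[tuple[float, float]], objective: int) -> np.ndarray:
    """Return a steady-state flux vector maximising flux through `objective`."""
    c = np.zeros(stoich.shape[1])
    c[objective] = -1.0  # linprog minimises, so negate to maximise
    res = linprog(c, A_eq=stoich, b_eq=np.zeros(stoich.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


def knockout(bounds: list[tuple[float, float]], reaction: int) -> list[tuple[float, float]]:
    """Simulate deleting the gene for `reaction` by forcing its flux to zero."""
    new_bounds = list(bounds)
    new_bounds[reaction] = (0, 0)
    return new_bounds
```

    In this toy model the wild type reaches a biomass flux of 10 (the uptake limit), while knocking out R1 drops it to zero; strain-design methods search over such modifications for ones that couple production to growth.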

  5. Gene order in rosid phylogeny, inferred from pairwise syntenies among extant genomes

    PubMed Central

    2012-01-01

    Background Ancestral gene order reconstruction for flowering plants has lagged behind developments in yeasts, insects and higher animals, because of the recency of widespread plant genome sequencing, sequencers' embargoes on public data use, paralogies due to whole genome duplication (WGD) and fractionation of undeleted duplicates, extensive paralogy from other sources, and the computational cost of existing methods. Results We address these problems, using the gene order of four core eudicot genomes (cacao, castor bean, papaya and grapevine) that have escaped any recent WGD events, and two others (poplar and cucumber) that descend from independent WGDs, in inferring the ancestral gene order of the rosid clade and those of its main subgroups, the fabids and malvids. We improve and adapt techniques including the OMG method for extracting large, paralogy-free, multiple orthologies from conflated pairwise synteny data among the six genomes and the PATHGROUPS approach for ancestral gene order reconstruction in a given phylogeny, where some genomes may be descendants of WGD events. We use the gene order evidence to evaluate the hypothesis that the order Malpighiales belongs to the malvids rather than, as traditionally assigned, to the fabids. Conclusions Gene orders of ancestral eudicot species, involving 10,000 or more genes, can be reconstructed in an efficient, parsimonious and consistent way, despite paralogies due to WGD and other processes. Pairwise genomic syntenies provide appropriate input to a parameter-free procedure of multiple ortholog identification followed by gene-order reconstruction in solving instances of the "small phylogeny" problem. PMID:22759433
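
    Parsimonious gene-order reconstruction scores candidate ancestors by how many gene adjacencies are broken along the tree. The sketch below computes an unsigned breakpoint distance between two linear gene orders over the same gene set; it is offered as an example of the kind of adjacency-based measure such methods work with, and the exact objective used by PATHGROUPS is not stated in the abstract. Gene orientation and circular chromosomes are ignored here by assumption.

```python
def adjacencies(order: list[int]) -> set[frozenset[int]]:
    """Unordered pairs of genes that are next to each other in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def breakpoint_distance(order_a: list[int], order_b: list[int]) -> int:
    """Number of adjacencies of order_a that are absent from order_b (same gene set assumed)."""
    return len(adjacencies(order_a) - adjacencies(order_b))
```

    For instance, reversing the middle segment of 1 2 3 4 5 to give 1 4 3 2 5 destroys the adjacencies {1,2} and {4,5}, so the breakpoint distance is 2.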

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lapidus, Alla L.

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (whole genome sequencing) has become robust and inexpensive. Meanwhile, the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different lengths (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in the analysis of microbial communities (metagenomes) of different complexities raises new problems and brings new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Although automatically performed assembly (draft assembly) can cover up to 98% of the genome, in most cases it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1 error per 2,000 bp. A finished genome is a genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error per 10,000 bp), validated through a number of computational and laboratory experiments.
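
    To make these consensus error rates concrete, assume a 5 Mb microbial genome (a hypothetical size chosen only for the arithmetic):

```latex
E_{\text{draft}} \approx \frac{5\times10^{6}\ \text{bp}}{2000\ \text{bp/error}} = 2500\ \text{errors},
\qquad
E_{\text{finished}} \approx \frac{5\times10^{6}\ \text{bp}}{10^{4}\ \text{bp/error}} = 500\ \text{errors}.
```

    Under that assumption, finishing lowers the expected number of consensus errors about five-fold, in addition to closing gaps and correcting misassemblies.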

  7. [Prospects for application of breakthrough technologies in breeding: The CRISPR/Cas9 system for plant genome editing].

    PubMed

    Khlestkina, E K; Shumny, V K

    2016-07-01

    Integration of the methods of contemporary genetics and biotechnology into the breeding process is assessed, and the potential role and efficacy of genome editing as a novel approach are discussed. The use of molecular (DNA) markers for breeding was proposed more than 30 years ago. Nowadays, they are widely used as an accessory tool to select plants for mono- and oligogenic traits. Genomic approaches are now actively being introduced into the breeding process owing to the automation of DNA polymorphism analyses and the development of comparatively cheap methods of DNA sequencing. These approaches provide effective selection for complex quantitative traits and are based on full-genome genotyping of the breeding material. Moreover, biotechnological tools, such as doubled haploid production, which allows homozygotes to be obtained rapidly, are widely used in plant breeding. The use of genomic and biotechnological approaches makes the development of varieties less time-consuming. It also decreases the cultivated areas and financial expenditures required to accomplish the breeding process. However, the capacities of modern breeding are not limited to these advantages. Experiments carried out on plants about 10 years ago provided the first data on genome editing. In the last two years, we have observed a sharp increase in the number of publications reporting successful plant genome editing experiments owing to the use of the relatively simple and convenient CRISPR/Cas9 system. The goal of some of these experiments was to modify agriculturally valuable genes of cultivated plants, such as potato, cabbage, tomato, maize, rice, wheat, barley, soybean and sorghum. These studies show that it is possible to obtain nontransgenic plants carrying stably inherited, specifically determined mutations using the CRISPR/Cas9 system. This opens the possibility of obtaining varieties with predetermined mono- and oligogenic traits.

  8. Genome-wide survey and analysis of microsatellites in giant panda (Ailuropoda melanoleuca), with a focus on the applications of a novel microsatellite marker system.

    PubMed

    Huang, Jie; Li, Yu-Zhi; Du, Lian-Ming; Yang, Bo; Shen, Fu-Jun; Zhang, He-Min; Zhang, Zhi-He; Zhang, Xiu-Yue; Yue, Bi-Song

    2015-02-07

    The giant panda (Ailuropoda melanoleuca) is a critically endangered species endemic to China. Microsatellites are among the most popular molecular markers and have proven effective for estimating population size, paternity testing, and assessing genetic diversity in critically endangered species. The availability of the complete giant panda genome sequence provided the opportunity to carry out genome-wide scans for all types of microsatellite markers, which opens the way for the analysis and development of microsatellites in the giant panda. By in silico mining of the whole giant panda genome sequence, we identified microsatellites and analyzed their frequency and distribution in different genomic regions. Based on our search criteria, a repertoire of 855,058 SSRs was detected, with mononucleotide repeats being the most abundant. SSRs were found in all genomic regions and were more abundant in non-coding than in coding regions. A total of 160 primer pairs were designed to screen for polymorphic microsatellites using the selected tetranucleotide microsatellite sequences. Fifty-one novel polymorphic tetranucleotide microsatellite loci were discovered by genotyping blood DNA from 22 captive giant pandas in this study. Finally, a total of 15 markers, which showed good polymorphism, stability, and repeatability in faecal samples, were used to establish the novel microsatellite marker system for giant panda. A genotyping database for captive giant pandas in Chengdu (n = 57) was also set up using this standardized system. In addition, as applications of this marker system, a universal individual identification method was established and genetic diversity was analysed. The microsatellite abundance and diversity were characterized in giant panda genomes. A total of 154,677 tetranucleotide microsatellites were identified, and 15 of them were found to be polymorphic and stable loci. 
The individual identification method and the genetic diversity analysis method in this study provided adequate material for the future study of giant panda.
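
    Genome-wide in silico SSR mining boils down to scanning the sequence for perfect tandem repeats of short motifs. The sketch below is a minimal version using Python's `re` module; the motif length, minimum copy number and the "perfect repeat" definition are illustrative assumptions, since the abstract does not give the search criteria actually used. Motifs that are themselves repeats of a shorter unit (such as ATAT) are skipped so that a dinucleotide repeat is not also reported as a tetranucleotide one.

```python
import re


def is_primitive(motif: str) -> bool:
    """True unless the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str, motif_len: int, min_repeats: int) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats found in seq."""
    # e.g. motif_len=4, min_repeats=5 gives the pattern ([ACGT]{4})\1{4,}
    pattern = re.compile(rf"([ACGT]{{{motif_len}}})\1{{{min_repeats - 1},}}")
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if is_primitive(motif):
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits
```

    Running the function once per motif length (1 to 6 bp, say) and tallying hits by genomic region would give the kind of frequency and distribution summary described above.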

  9. Detecting and characterizing genomic signatures of positive selection in global populations.

    PubMed

    Liu, Xuanyao; Ong, Rick Twee-Hee; Pillai, Esakimuthu Nisha; Elzein, Abier M; Small, Kerrin S; Clark, Taane G; Kwiatkowski, Dominic P; Teo, Yik-Ying

    2013-06-06

    Natural selection is a significant force that shapes the architecture of the human genome and introduces diversity across global populations. Whether advantageous mutations have arisen in the human genome as a result of single or multiple mutation events remains largely unanswered, except for a handful of genes such as those that confer lactase persistence, affect skin pigmentation, or cause sickle cell anemia. We have developed a long-range-haplotype method for identifying genomic signatures of positive selection to complement existing methods, such as the integrated haplotype score (iHS) or cross-population extended haplotype homozygosity (XP-EHH), for locating signals across the entire allele frequency spectrum. Our method also locates the founder haplotypes that carry the advantageous variants and infers their corresponding population frequencies. This presents an opportunity to systematically interrogate the whole human genome to determine whether a selection signal shared across different populations is the consequence of a single mutation process followed by gene flow between populations, or of convergent evolution due to multiple independent mutation events either at the same variant or within the same gene. The application of our method to data from 14 populations across the world revealed that positive-selection events tend to cluster in populations of the same ancestry. Comparing the founder haplotypes for events that are present across different populations revealed that convergent evolution is rare and that the majority of shared signals stem from the same evolutionary event. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
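
    Both iHS and XP-EHH are built on extended haplotype homozygosity (EHH): among haplotypes carrying a given allele at a core SNP, the probability that two drawn at random are identical over the whole stretch out to some other SNP. The sketch below computes EHH from phased haplotypes encoded as strings of 0/1; it illustrates that underlying statistic only and is not the authors' new long-range-haplotype method.

```python
from collections import Counter
from math import comb


def ehh(haplotypes: list[str], core: int, pos: int, allele: str) -> float:
    """EHH between SNP `core` and SNP `pos` for carriers of `allele` at the core.

    Equals the fraction of carrier pairs whose haplotypes are identical at every
    SNP from core to pos inclusive.
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    lo, hi = min(core, pos), max(core, pos)
    counts = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)
```

    EHH starts at 1 at the core and decays with distance as recombination breaks haplotypes apart; an allele that rose quickly under positive selection shows unusually slow decay, which is the signal iHS integrates over distance.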

  10. Gamete selection for forage quality improvement in tall fescue

    USDA-ARS's Scientific Manuscript database

    Within the Festuca-Lolium genome complex there is a need for modern breeding approaches that facilitate the rapid development of improved germplasm or cultivars. Traditional recurrent or mass-selection methods for population or synthetic development are labor intensive and time consuming. The use ...

  11. Application of RAD sequencing to linkage mapping in citrus

    USDA-ARS's Scientific Manuscript database

    High density linkage maps can be developed for modest cost using high-throughput DNA sequencing to genotype a defined fraction (representation) of the genome. We developed linkage maps in two citrus populations using the RAD (Restriction site Associated DNA) genotyping method which involves restrict...
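
    RAD genotyping samples the genome at the sites of a chosen restriction enzyme, so the number of markers can be anticipated by an in silico digest. The sketch below assumes the 8-bp SbfI recognition site CCTGCAGG purely as an example (the abstract does not name the enzyme used) and returns, for each site, the flanking sequence on either side, standing in for the tags that would be sequenced. Cut-site offsets and reverse-strand handling are deliberately ignored.

```python
import re


def rad_tags(seq: str, site: str, tag_len: int) -> list[tuple[int, str, str]]:
    """For each occurrence of `site`, return (position, left flank, right flank)."""
    seq = seq.upper()
    tags = []
    # Zero-width lookahead so that overlapping occurrences are also reported.
    for m in re.finditer(f"(?={re.escape(site)})", seq):
        start = m.start()
        left = seq[max(0, start - tag_len):start]
        right = seq[start + len(site):start + len(site) + tag_len]
        tags.append((start, left, right))
    return tags
```

    Counting sites per chromosome in a reference sequence this way gives a rough expectation of marker density before choosing an enzyme.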

  12. Virophages to viromes: a report from the frontier of viral oceanography.

    PubMed

    Culley, Alexander I

    2011-07-01

    The investigation of marine viruses has advanced our understanding of ecology, evolution, microbiology, oceanography and virology. Significant findings discussed in this review include the discovery of giant viruses whose genome sizes and metabolic capabilities blur the line between virus and cell, viruses that participate in photosynthesis and apoptosis, the detection of communities of viruses of all genomic compositions, and the preeminence of viruses in the evolution of marine microbes. Although we have made great progress, we have yet to synthesize the rich archive of viral genomic data with oceanographic processes. The development of cutting-edge methods such as single-virus genomics now provides a toolset to better integrate viruses into the ecology of the ocean. Copyright © 2011 Elsevier B.V. All rights reserved.

  13. Combining genomic and proteomic approaches for epigenetics research

    PubMed Central

    Han, Yumiao; Garcia, Benjamin A

    2014-01-01

    Epigenetics is the study of changes in gene expression or cellular phenotype that do not change the DNA sequence. In this review, current methods, both genomic and proteomic, associated with epigenetics research are discussed. Among them, chromatin immunoprecipitation (ChIP) followed by sequencing and other ChIP-based techniques are powerful techniques for genome-wide profiling of DNA-binding proteins, histone post-translational modifications or nucleosome positions. However, mass spectrometry-based proteomics is increasingly being used in functional biological studies and has proved to be an indispensable tool to characterize histone modifications, as well as DNA–protein and protein–protein interactions. With the development of genomic and proteomic approaches, combination of ChIP and mass spectrometry has the potential to expand our knowledge of epigenetics research to a higher level. PMID:23895656

  14. Outreach and online training services at the Saccharomyces Genome Database.

    PubMed

    MacPherson, Kevin A; Starr, Barry; Wong, Edith D; Dalusag, Kyla S; Hellerstedt, Sage T; Lang, Olivia W; Nash, Robert S; Skrzypek, Marek S; Engel, Stacia R; Cherry, J Michael

    2017-01-01

    The Saccharomyces Genome Database (SGD; www.yeastgenome.org ), the primary genetics and genomics resource for the budding yeast S. cerevisiae, provides free public access to expertly curated information about the yeast genome and its gene products. As the central hub for the yeast research community, SGD engages in a variety of social outreach efforts to inform our users about new developments, promote collaboration, increase public awareness of the importance of yeast to biomedical research, and facilitate scientific discovery. Here we describe these various outreach methods, from networking at scientific conferences to the use of online media such as blog posts and webinars, and include our perspectives on the benefits provided by outreach activities for model organism databases. http://www.yeastgenome.org. © The Author(s) 2017. Published by Oxford University Press.

  15. Genomics and metagenomics in medical microbiology.

    PubMed

    Padmanabhan, Roshan; Mishra, Ajay Kumar; Raoult, Didier; Fournier, Pierre-Edouard

    2013-12-01

    Over the last two decades, sequencing tools have evolved from laborious, time-consuming methodologies to real-time detection and deciphering of genomic DNA. Genome sequencing, especially next-generation sequencing (NGS), has revolutionized the landscape of microbiology and infectious disease. This deluge of sequencing data has not only enabled advances in fundamental biology but also helped improve diagnosis, pathogen typing, detection of virulence and antibiotic resistance, and the development of new vaccines and culture media. In addition, NGS has enabled efficient analysis of complex human microbiota, both commensal and pathological, through metagenomic methods, thus helping the comprehension and management of human diseases such as obesity. This review summarizes technological advances in genomics and metagenomics relevant to the field of medical microbiology. Copyright © 2013 Elsevier B.V. All rights reserved.

  16. A PCR method for the detection and differentiation of Lentinus edodes and Trametes versicolor in defined-mixed cultures used for wastewater treatment.

    PubMed

    García-Mena, Jaime; Cano-Ramirez, Claudia; Garibay-Orijel, Claudio; Ramirez-Canseco, Sergio; Poggi-Varaldo, Héctor M

    2005-06-01

    A PCR-based method for the quantitative detection of Lentinus edodes and Trametes versicolor, two ligninolytic fungi used in wastewater treatment and bioremediation, was developed. Genomic DNA was used to optimize a PCR method targeting the conserved copper-binding sequence of laccase genes. The method allowed the quantitative detection and differentiation of these fungi in single and defined-mixed cultures after fractionation of the PCR products by electrophoresis in agarose gels. Amplified products of about 150 bp for L. edodes and about 200 bp for T. versicolor were purified and cloned. The PCR method showed a linear detection response over the range of 1.0 µg to 1 ng of DNA. The same method was tested with genomic DNA from a third fungus (Phanerochaete chrysosporium), yielding a fragment of about 400 bp. Southern-blot and DNA sequence analysis indicated that a specific PCR product was amplified from each genome, and that these corresponded to sequences of laccase genes. This PCR protocol permits the detection and differentiation of three ligninolytic fungi by amplifying DNA fragments of different sizes using a single pair of primers, without further enzymatic restriction of the PCR products. This method has potential use in the monitoring, evaluation, and improvement of fungal cultures used in wastewater treatment processes.
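
    Because a single primer pair yields a differently sized laccase amplicon for each fungus, reading a gel lane amounts to matching band sizes against the expected products. The sketch below uses the approximate sizes reported in the abstract; the ±20 bp tolerance and the function names are hypothetical choices for illustration.

```python
from typing import Optional

# Approximate amplicon sizes (bp) reported in the abstract for the laccase primer pair.
EXPECTED_AMPLICONS = {
    "Lentinus edodes": 150,
    "Trametes versicolor": 200,
    "Phanerochaete chrysosporium": 400,
}


def assign_band(size_bp: float, tolerance_bp: float = 20.0) -> Optional[str]:
    """Species whose expected amplicon is closest to size_bp, if within the tolerance."""
    species, expected = min(EXPECTED_AMPLICONS.items(),
                            key=lambda item: abs(item[1] - size_bp))
    return species if abs(expected - size_bp) <= tolerance_bp else None


def species_in_lane(band_sizes: list[float], tolerance_bp: float = 20.0) -> set[str]:
    """Species detected in one gel lane of a single or defined-mixed culture."""
    assigned = (assign_band(size, tolerance_bp) for size in band_sizes)
    return {species for species in assigned if species is not None}
```

    A lane from a mixed L. edodes and T. versicolor culture with bands near 150 and 200 bp would be reported as containing both fungi, while an unexpected band (say 300 bp) is left unassigned.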

  17. Thinking too positive? Revisiting current methods of population genetic selection inference.

    PubMed

    Bank, Claudia; Ewing, Gregory B; Ferrer-Admettla, Anna; Foll, Matthieu; Jensen, Jeffrey D

    2014-12-01

    In the age of next-generation sequencing, the availability of increasing amounts and improved quality of data at decreasing cost ought to allow for a better understanding of how natural selection is shaping the genome than ever before. However, alternative forces, such as demography and background selection (BGS), obscure the footprints of positive selection that we would like to identify. In this review, we illustrate recent developments in this area, and outline a roadmap for improved selection inference. We argue (i) that the development and obligatory use of advanced simulation tools is necessary for improved identification of selected loci, (ii) that genomic information from multiple time points will enhance the power of inference, and (iii) that results from experimental evolution should be utilized to better inform population genomic studies. Copyright © 2014 Elsevier Ltd. All rights reserved.

  18. [The development of molecular human genetics and its significance for perspectives of modern medicine].

    PubMed

    Coutelle, C; Speer, A; Grade, K; Rosenthal, A; Hunger, H D

    1989-01-01

    The introduction of molecular human genetics has become a paradigm for the application of genetic engineering in medicine. The main principles of this technology are the isolation of molecular probes, their application in hybridization reactions, specific gene amplification by the polymerase chain reaction, and DNA sequencing reactions. These methods are used for the analysis of monogenic diseases by linkage studies and for the elucidation of the molecular defects causing these conditions. They are also the basis for genomic diagnosis of monogenic diseases, introduced into the health care system of the GDR by a national project on Duchenne/Becker muscular dystrophy, cystic fibrosis and phenylketonuria. The rapid development of basic research on the molecular analysis of the human genome and genomic diagnosis indicates that human molecular genetics is becoming a decisive basic discipline of modern medicine.

  19. Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances.

    PubMed

    Xu, Jianping

    2006-06-01

    Microbial ecology examines the diversity and activity of micro-organisms in Earth's biosphere. In the last 20 years, the application of genomics tools has revolutionized microbial ecological studies and drastically expanded our view of the previously underappreciated microbial world. This review first introduces the basic concepts in microbial ecology and the main genomics methods that have been used to examine natural microbial populations and communities. In the ensuing three sections, the applications of genomics in microbial ecological research are highlighted. The first describes the widespread application of multilocus sequence typing and representational difference analysis in studying genetic variation within microbial species. Such investigations have shown that migration, horizontal gene transfer and recombination are common in natural microbial populations and that microbial strains can be highly variable in genome size and gene content. The second section highlights and summarizes the use of four specific genomics methods (phylogenetic analysis of ribosomal RNA, DNA-DNA re-association kinetics, metagenomics, and microarrays) in analysing the diversity and potential activity of microbial populations and communities from a variety of terrestrial and aquatic environments. Such analyses have identified many unexpected phylogenetic lineages in viruses, bacteria, archaea, and microbial eukaryotes. Functional analyses of environmental DNA also revealed highly prevalent, but previously unknown, metabolic processes in natural microbial communities. In the third section, the ecological implications of sequenced microbial genomes are briefly discussed. Comparative analyses of prokaryotic genomic sequences suggest the importance of ecology in determining microbial genome size and gene content. 
The significant variability in genome size and gene content among strains and species of prokaryotes indicates the highly fluid nature of prokaryotic genomes, a result consistent with those from multilocus sequence typing and representational difference analyses. The integration of various levels of ecological analyses, coupled with the application and further development of high-throughput technologies, is accelerating the pace of discovery in microbial ecology.

  20. Development of a chemiluminescence competitive PCR for the detection and quantification of parvovirus B19 DNA using a microplate luminometer.

    PubMed

    Fini, F; Gallinella, G; Girotti, S; Zerbini, M; Musiani, M

    1999-09-01

    Quantitative PCR of viral nucleic acids can be clinically useful in diagnosis, risk assessment, and monitoring of antiviral therapy. We aimed to develop a chemiluminescence competitive PCR (cPCR) for parvovirus B19. Parvovirus DNA target sequences and competitor sequences were coamplified and directly labeled. Amplified products were then separately hybridized with specific biotin-labeled probes, captured onto streptavidin-coated ELISA microplates, and detected immunoenzymatically using chemiluminescent peroxidase substrates. Chemiluminescent signals were quantitatively analyzed with a microplate luminometer and correlated with the amounts of amplified products. Luminol-based systems displayed constant emission but had a higher detection limit (100-1000 genome copies) than the acridan-based system (20 genome copies). The detection limit with chemiluminescent substrates (20 genome copies) was lower than with colorimetric substrates (50 genome copies). In chemiluminescence cPCR, the titration curves were linear above 100 target genome copies. Chemiluminescence cPCR was positive in six serum samples from patients with parvovirus infection and negative in six control sera. Chemiluminescence cPCR appears to be a sensitive and specific method for the quantitative detection of viral DNA.
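    The abstract does not state how target copy number is derived from the coamplified signals. One common way to read a competitive PCR titration, sketched here as an illustration and not necessarily the authors' procedure, assumes that target and competitor amplify with equal efficiency: a dilution series of competitor is coamplified with a fixed sample, log10 of the target/competitor signal ratio is regressed on log10 of competitor input, and the competitor amount at which the ratio equals 1 is taken as the starting target copy number. The function name `equivalence_point` is hypothetical, and its signal inputs are assumed to be background-corrected luminometer readings (one target well and one competitor well per titration point).

```python
import math
from typing import Sequence


def equivalence_point(
    competitor_copies: Sequence[float],
    target_signal: Sequence[float],
    competitor_signal: Sequence[float],
) -> float:
    """Estimate starting target copies from a competitive PCR titration (sketch).

    Assumption: target and competitor amplify with equal efficiency, so
    log10(target/competitor signal) falls linearly with log10(competitor input).
    The competitor input at which the signal ratio is 1 is returned as the
    estimated starting number of target copies.
    """
    if not (len(competitor_copies) == len(target_signal) == len(competitor_signal)):
        raise ValueError("all inputs must have the same length")
    if len(competitor_copies) < 2:
        raise ValueError("need at least two titration points")
    if min(*competitor_copies, *target_signal, *competitor_signal) <= 0:
        raise ValueError("copies and background-corrected signals must be positive")

    # Work in log space: x = log10(competitor input), y = log10(signal ratio).
    xs = [math.log10(n) for n in competitor_copies]
    ys = [math.log10(t / c) for t, c in zip(target_signal, competitor_signal)]

    # Ordinary least-squares fit of y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("competitor inputs must not all be equal")
    slope = sxy / sxx
    if slope == 0:
        raise ValueError("signal ratio does not change with competitor input")
    intercept = mean_y - slope * mean_x

    # Ratio == 1 means log10(ratio) == 0, i.e. x = -intercept / slope.
    return 10 ** (-intercept / slope)
```

    Because the abstract reports linear titration curves only above 100 target genome copies, an estimate from this kind of fit that falls below that level would lie outside the range the authors found to be linear.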
