Sample records for large-scale DNA sequencing

  1. Random access in large-scale DNA data storage.

    PubMed

    Organick, Lee; Ang, Siena Dumas; Chen, Yuan-Jyue; Lopez, Randolph; Yekhanin, Sergey; Makarychev, Konstantin; Racz, Miklos Z; Kamath, Govinda; Gopalan, Parikshit; Nguyen, Bichlien; Takahashi, Christopher N; Newman, Sharon; Parker, Hsing-Yeh; Rashtchian, Cyrus; Stewart, Kendall; Gupta, Gagan; Carlson, Robert; Mulligan, John; Carmean, Douglas; Seelig, Georg; Ceze, Luis; Strauss, Karin

    2018-03-01

    Synthetic DNA is durable and can encode digital data with high density, making it an attractive medium for data storage. However, recovering stored data on a large scale currently requires all the DNA in a pool to be sequenced, even if only a subset of the information needs to be extracted. Here, we encode and store 35 distinct files (over 200 MB of data), in more than 13 million DNA oligonucleotides, and show that we can recover each file individually and with no errors, using a random access approach. We design and validate a large library of primers that enable individual recovery of all files stored within the DNA. We also develop an algorithm that greatly reduces the sequencing read coverage required for error-free decoding by maximizing information from all sequence reads. These advances demonstrate a viable, large-scale system for DNA data storage and retrieval.
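
    The random access idea in this abstract can be illustrated with a toy model. The sketch below is not the authors' encoding: it assumes a plain 2-bits-per-base code, a one-byte chunk index, and a hypothetical file-specific primer placed at the start of every oligonucleotide. Selecting strands by primer stands in for the PCR step, and no error correction is modelled.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def bytes_to_dna(data):
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 3] for byte in data for shift in (6, 4, 2, 0))


def dna_to_bytes(dna):
    """Inverse of bytes_to_dna; expects a length that is a multiple of 4."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def make_oligos(primer, data, payload_bytes=4):
    """Split a file into oligos laid out as: primer + chunk index + payload.

    The one-byte index limits this toy format to 256 chunks per file.
    """
    oligos = []
    for index, start in enumerate(range(0, len(data), payload_bytes)):
        chunk = data[start:start + payload_bytes]
        oligos.append(primer + bytes_to_dna(bytes([index])) + bytes_to_dna(chunk))
    return oligos


def retrieve(pool, primer):
    """Keep only oligos carrying the file's primer, then reassemble by index."""
    chunks = {}
    for oligo in pool:
        if oligo.startswith(primer):
            body = oligo[len(primer):]
            chunks[dna_to_bytes(body[:4])[0]] = dna_to_bytes(body[4:])
    return b"".join(chunks[i] for i in sorted(chunks))
```

    In this model, reading one file touches only the strands that carry its primer, which is the property the abstract describes; the primer sequences used in the test are made up.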

  2. Advances in DNA sequencing technologies for high resolution HLA typing.

    PubMed

    Cereb, Nezih; Kim, Hwa Ran; Ryu, Jaejun; Yang, Soo Young

    2015-12-01

    This communication describes our experience in large-scale G group-level high resolution HLA typing using three different DNA sequencing platforms - ABI 3730 xl, Illumina MiSeq and PacBio RS II. Recent advances in DNA sequencing technologies, so-called next generation sequencing (NGS), have brought breakthroughs in deciphering the genetic information in all living species at a large scale and at an affordable level. The NGS DNA indexing system allows sequencing multiple genes for a large number of individuals in a single run. Our laboratory has adopted and used these technologies for HLA molecular testing services. We found that each sequencing technology has its own strengths and weaknesses, and their sequencing performances complement each other. HLA genes are highly complex and genotyping them is quite challenging. Using these three sequencing platforms, we were able to meet all requirements for G group-level high resolution and high volume HLA typing. Copyright © 2015 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.

  3. Large-scale DNA Barcode Library Generation for Biomolecule Identification in High-throughput Screens.

    PubMed

    Lyons, Eli; Sheridan, Paul; Tremmel, Georg; Miyano, Satoru; Sugano, Sumio

    2017-10-24

    High-throughput screens allow for the identification of specific biomolecules with characteristics of interest. In barcoded screens, DNA barcodes are linked to target biomolecules in a manner allowing for the target molecules making up a library to be identified by sequencing the DNA barcodes using Next Generation Sequencing. To be useful in experimental settings, the DNA barcodes in a library must satisfy certain constraints related to GC content, homopolymer length, Hamming distance, and blacklisted subsequences. Here we report a novel framework to quickly generate large-scale libraries of DNA barcodes for use in high-throughput screens. We show that our framework dramatically reduces the computation time required to generate large-scale DNA barcode libraries, compared with a naïve approach to DNA barcode library generation. As a proof of concept, we demonstrate that our framework is able to generate a library consisting of one million DNA barcodes for use in a fragment antibody phage display screening experiment. We also report generating a general purpose one billion DNA barcode library, the largest such library yet reported in literature. Our results demonstrate the value of our novel large-scale DNA barcode library generation framework for use in high-throughput screening applications.
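
    The four constraints named in this abstract are easy to state in code. The sketch below is a naïve rejection-sampling generator of the kind the authors compare against, not their framework. The thresholds (GC fraction 0.4 to 0.6, homopolymer runs of at most 2, minimum Hamming distance 3) and the blacklisted EcoRI site GAATTC are illustrative assumptions. Each candidate is compared with every accepted barcode, so the cost grows quadratically with library size.

```python
import random


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def is_valid(seq, gc_range=(0.4, 0.6), max_homopolymer=2, blacklist=("GAATTC",)):
    """Check the per-barcode constraints (all thresholds are assumed values)."""
    return (gc_range[0] <= gc_fraction(seq) <= gc_range[1]
            and longest_homopolymer(seq) <= max_homopolymer
            and not any(bad in seq for bad in blacklist))


def generate_library(n, length=12, min_distance=3, seed=0, max_tries=100000):
    """Draw random candidates and keep those that satisfy every constraint."""
    rng = random.Random(seed)
    library = []
    for _ in range(max_tries):
        if len(library) == n:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if is_valid(cand) and all(hamming(cand, b) >= min_distance for b in library):
            library.append(cand)
    return library
```

    The all-pairs distance check is the part that becomes impractical at the million-barcode scale the abstract reports, which is presumably where a faster framework pays off.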

  4. Optical mapping and its potential for large-scale sequencing projects.

    PubMed

    Aston, C; Mishra, B; Schwartz, D C

    1999-07-01

    Physical mapping has been rediscovered as an important component of large-scale sequencing projects. Restriction maps provide landmark sequences at defined intervals, and high-resolution restriction maps can be assembled from ensembles of single molecules by optical means. Such optical maps can be constructed from both large-insert clones and genomic DNA, and are used as a scaffold for accurately aligning sequence contigs generated by shotgun sequencing.

  5. A rapid and cost-effective method for sequencing pooled cDNA clones by using a combination of transposon insertion and Gateway technology.

    PubMed

    Morozumi, Takeya; Toki, Daisuke; Eguchi-Ogawa, Tomoko; Uenishi, Hirohide

    2011-09-01

    Large-scale cDNA-sequencing projects require an efficient strategy for mass sequencing. Here we describe a method for sequencing pooled cDNA clones using a combination of transposon insertion and Gateway technology. Our method reduces the number of shotgun clones that are unsuitable for reconstruction of cDNA sequences, and has the advantage of reducing the total costs of the sequencing project.

  6. A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA

    USDA-ARS's Scientific Manuscript database

    A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...
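
    Identifying simple sequence repeats in reads is commonly done by pattern matching. The sketch below is one minimal way to do it with a regular expression and is not the method used in this record. The motif length range (2 to 6 bp) and the minimum of 4 tandem copies are assumed thresholds.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, copies) for each tandem repeat found in seq.

    The lazy quantifier makes the regex prefer the shortest motif, so a run
    such as ACACACAC is reported with motif AC rather than ACAC.
    """
    pattern = re.compile(
        r"(([ACGT]{%d,%d}?)\2{%d,})" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(2)
        if len(set(motif)) == 1:
            continue  # skip single-base runs such as AAAAAAAA
        hits.append((m.start(), motif, len(m.group(1)) // len(motif)))
    return hits
```

    For example, `find_ssrs("TTGACACACACACAGGT")` reports one dinucleotide repeat, `(3, "AC", 5)`.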

  7. Large-Scale Biomonitoring of Remote and Threatened Ecosystems via High-Throughput Sequencing

    PubMed Central

    Gibson, Joel F.; Shokralla, Shadi; Curry, Colin; Baird, Donald J.; Monk, Wendy A.; King, Ian; Hajibabaei, Mehrdad

    2015-01-01

    Biodiversity metrics are critical for assessment and monitoring of ecosystems threatened by anthropogenic stressors. Existing sorting and identification methods are too expensive and labour-intensive to be scaled up to meet management needs. Alternately, a high-throughput DNA sequencing approach could be used to determine biodiversity metrics from bulk environmental samples collected as part of a large-scale biomonitoring program. Here we show that both morphological and DNA sequence-based analyses are suitable for recovery of individual taxonomic richness, estimation of proportional abundance, and calculation of biodiversity metrics using a set of 24 benthic samples collected in the Peace-Athabasca Delta region of Canada. The high-throughput sequencing approach was able to recover all metrics with a higher degree of taxonomic resolution than morphological analysis. The reduced cost and increased capacity of DNA sequence-based approaches will finally allow environmental monitoring programs to operate at the geographical and temporal scale required by industrial and regulatory end-users. PMID:26488407

  8. Detection of DNA Methylation by Whole-Genome Bisulfite Sequencing.

    PubMed

    Li, Qing; Hermanson, Peter J; Springer, Nathan M

    2018-01-01

    DNA methylation plays an important role in the regulation of the expression of transposons and genes. Various methods have been developed to assay DNA methylation levels. Bisulfite sequencing is considered to be the "gold standard" for single-base resolution measurement of DNA methylation levels. Coupled with next-generation sequencing, whole-genome bisulfite sequencing (WGBS) allows DNA methylation to be evaluated at a genome-wide scale. Here, we describe a protocol for WGBS in plant species with large genomes. This protocol has been successfully applied to assay genome-wide DNA methylation levels in maize and barley. This protocol has also been successfully coupled with sequence capture technology to assay DNA methylation levels in a targeted set of genomic regions.
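
    Bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C. The methylation level at a site can therefore be estimated as C reads divided by C plus T reads. The sketch below shows that calculation on a toy example. It assumes ungapped reads already aligned to the forward strand, and the function names are hypothetical rather than taken from the protocol.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate a converted read: unmethylated C becomes T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def methylation_levels(reference, reads):
    """Estimate per-cytosine methylation as C / (C + T).

    reads is a list of (start, sequence) pairs aligned to the forward strand
    of reference with no gaps (a simplifying assumption).
    """
    counts = {i: [0, 0] for i, base in enumerate(reference) if base == "C"}
    for start, read in reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos in counts:
                if base == "C":
                    counts[pos][0] += 1
                elif base == "T":
                    counts[pos][1] += 1
    return {pos: c / (c + t) for pos, (c, t) in counts.items() if c + t > 0}
```

    A real pipeline would also handle the reverse strand, incomplete conversion, and sequencing errors, all of which this sketch leaves out.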

  9. An improved model for whole genome phylogenetic analysis by Fourier transform.

    PubMed

    Yin, Changchuan; Yau, Stephen S-T

    2015-10-07

    DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor and the 2D mapping reduces the nucleotide composition bias in distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra are in the same dimensionality of the Fourier frequency space, then the Euclidean distances of full Fourier power spectra of the DNA sequences are used as the dissimilarity metrics. The improved DFT method, with increased computational performance by 2D numerical representation, can be applicable to any DNA sequences of different length ranges. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets. The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied on phylogenetic analysis for individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
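
    The pipeline in this abstract (2D mapping, DFT power spectrum, scaling to a common length, Euclidean distance) can be sketched in a few lines. Two parts below are assumptions, because the abstract does not give the details: the 2D coordinates assigned to each nucleotide, encoded here as complex numbers, and the use of plain linear interpolation as a stand-in for the authors' even scaling algorithm. The DFT is the direct O(n^2) form, for clarity rather than speed.

```python
import cmath
import math

# Hypothetical 2D mapping: each base is a point in the plane (real, imag).
MAPPING = {"A": complex(1, 0), "T": complex(-1, 0),
           "C": complex(0, 1), "G": complex(0, -1)}


def power_spectrum(seq):
    """Power spectrum |X(k)|^2 of the mapped sequence via a direct DFT."""
    x = [MAPPING[base] for base in seq.upper()]
    n = len(x)
    spectrum = []
    for k in range(n):
        total = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(total) ** 2)
    return spectrum


def stretch(spectrum, m):
    """Resample a spectrum to length m by linear interpolation (a stand-in)."""
    n = len(spectrum)
    if n == m:
        return list(spectrum)
    out = []
    for k in range(m):
        q = k * (n - 1) / (m - 1)
        lo = int(math.floor(q))
        hi = min(lo + 1, n - 1)
        frac = q - lo
        out.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return out


def dft_distance(seq_a, seq_b):
    """Euclidean distance between power spectra scaled to the longer length."""
    m = max(len(seq_a), len(seq_b))
    spec_a = stretch(power_spectrum(seq_a), m)
    spec_b = stretch(power_spectrum(seq_b), m)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec_a, spec_b)))
```

    The pairwise distances from `dft_distance` could then feed a hierarchical clustering routine, as the abstract describes, although the results would depend on the assumed mapping and scaling.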

  10. Identification of differentially methylated sites with weak methylation effect

    USDA-ARS's Scientific Manuscript database

    DNA methylation is an epigenetic alteration crucial for regulating stress responses. Identifying large-scale DNA methylation at single nucleotide resolution is made possible by whole genome bisulfite sequencing. An essential task following the generation of bisulfite sequencing data is to detect dif...

  11. Intrinsic flexibility of B-DNA: the experimental TRX scale.

    PubMed

    Heddi, Brahim; Oguey, Christophe; Lavelle, Christophe; Foloppe, Nicolas; Hartmann, Brigitte

    2010-01-01

    B-DNA flexibility, crucial for DNA-protein recognition, is sequence dependent. Free DNA in solution would in principle be the best reference state to uncover the relation between base sequences and their intrinsic flexibility; however, this has long been hampered by a lack of suitable experimental data. We investigated this relationship by compiling and analyzing a large dataset of NMR (31)P chemical shifts in solution. These measurements reflect the BI <--> BII equilibrium in DNA, intimately correlated to helicoidal descriptors of the curvature, winding and groove dimensions. Comparing the ten complementary DNA dinucleotide steps indicates that some steps are much more flexible than others. This malleability is primarily controlled at the dinucleotide level, modulated by the tetranucleotide environment. Our analyses provide an experimental scale called TRX that quantifies the intrinsic flexibility of the ten dinucleotide steps in terms of Twist, Roll, and X-disp (base pair displacement). Applying the TRX scale to DNA sequences optimized for nucleosome formation reveals a 10 base-pair periodic alternation of stiff and flexible regions. Thus, DNA flexibility captured by the TRX scale is relevant to nucleosome formation, suggesting that this scale may be of general interest to better understand protein-DNA recognition.

  12. DNA fingerprinting, DNA barcoding, and next generation sequencing technology in plants.

    PubMed

    Sucher, Nikolaus J; Hennell, James R; Carles, Maria C

    2012-01-01

    DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.

  13. Genome Partitioner: A web tool for multi-level partitioning of large-scale DNA constructs for synthetic biology applications.

    PubMed

    Christen, Matthias; Del Medico, Luca; Christen, Heinz; Christen, Beat

    2017-01-01

    Recent advances in lower-cost DNA synthesis techniques have enabled new innovations in the field of synthetic biology. Still, efficient design and higher-order assembly of genome-scale DNA constructs remains a labor-intensive process. Given the complexity, computer assisted design tools that fragment large DNA sequences into fabricable DNA blocks are needed to pave the way towards streamlined assembly of biological systems. Here, we present the Genome Partitioner software implemented as a web-based interface that permits multi-level partitioning of genome-scale DNA designs. Without the need for specialized computing skills, biologists can submit their DNA designs to a fully automated pipeline that generates the optimal retrosynthetic route for higher-order DNA assembly. To test the algorithm, we partitioned a 783 kb Caulobacter crescentus genome design. We validated the partitioning strategy by assembling a 20 kb test segment encompassing a difficult to synthesize DNA sequence. Successful assembly from 1 kb subblocks into the 20 kb segment highlights the effectiveness of the Genome Partitioner for reducing synthesis costs and timelines for higher-order DNA assembly. The Genome Partitioner is broadly applicable to translate DNA designs into ready to order sequences that can be assembled with standardized protocols, thus offering new opportunities to harness the diversity of microbial genomes for synthetic biology applications. The Genome Partitioner web tool can be accessed at https://christenlab.ethz.ch/GenomePartitioner.
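
    Multi-level partitioning can be pictured as a tree in which each level splits its parent span into smaller blocks. The sketch below uses the block sizes mentioned in this abstract (20 kb segments built from 1 kb subblocks) on a 783 kb design, but it is only an illustration. It assumes fixed-size, non-overlapping blocks and ignores the assembly overlaps and synthesis constraints that the actual tool would have to handle.

```python
def split(start, end, size):
    """Cut the half-open interval [start, end) into consecutive blocks of at most size."""
    return [(s, min(s + size, end)) for s in range(start, end, size)]


def partition(start, end, level_sizes):
    """Recursively partition [start, end) using one block size per level.

    Returns a list of nodes, each with its span and the nodes of the next level.
    """
    if not level_sizes:
        return []
    size, rest = level_sizes[0], level_sizes[1:]
    return [
        {"span": (s, e), "children": partition(s, e, rest)}
        for s, e in split(start, end, size)
    ]


def count_leaves(nodes):
    """Count the blocks at the lowest level of the tree."""
    return sum(count_leaves(n["children"]) if n["children"] else 1 for n in nodes)
```

    Under these assumptions, `partition(0, 783_000, [20_000, 1_000])` yields 40 segments (the last one only 3 kb long) and 783 subblocks in total.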

  14. (New hosts and vectors for genome cloning)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    The main goal of our project remains the development of new bacterial hosts and vectors for the stable propagation of human DNA clones in E. coli. During the past six months of our current budget period, we have (1) continued to develop new hosts that permit the stable maintenance of unstable features of human DNA, and (2) developed a series of vectors for (a) cloning large DNA inserts, (b) assessing the frequency of human sequences that are lethal to the growth of E. coli, and (c) assessing the stability of human sequences cloned in M13 for large-scale sequencing projects.

  15. [New hosts and vectors for genome cloning]. Progress report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    The main goal of our project remains the development of new bacterial hosts and vectors for the stable propagation of human DNA clones in E. coli. During the past six months of our current budget period, we have (1) continued to develop new hosts that permit the stable maintenance of unstable features of human DNA, and (2) developed a series of vectors for (a) cloning large DNA inserts, (b) assessing the frequency of human sequences that are lethal to the growth of E. coli, and (c) assessing the stability of human sequences cloned in M13 for large-scale sequencing projects.

  16. Nanowire-nanopore transistor sensor for DNA detection during translocation

    NASA Astrophysics Data System (ADS)

    Xie, Ping; Xiong, Qihua; Fang, Ying; Qing, Quan; Lieber, Charles

    2011-03-01

    Nanopore sequencing, a promising low-cost, high-throughput sequencing technique, was proposed more than a decade ago. Due to the incompatibility between small ionic current signal and fast translocation speed and the technical difficulties of large-scale integration of nanopores for direct ionic current sequencing, alternative methods relying on integrated DNA sensors have been proposed, such as using capacitive coupling or tunnelling current. But none of them has been experimentally demonstrated yet. Here we show that, for the first time, an amplified sensor signal has been experimentally recorded from a nanowire-nanopore field effect transistor sensor during DNA translocation. Independent multi-channel recording was also demonstrated for the first time. Our results suggest that the signal is from highly localized potential change caused by DNA translocation in non-balanced buffer condition. Given that this method may produce larger signal for smaller nanopores, we hope our experiment can be a starting point for a new generation of nanopore sequencing devices with larger signal, higher bandwidth and large-scale multiplexing capability, and finally realize the ultimate goal of low-cost, high-throughput sequencing.

  17. Transposon facilitated DNA sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berg, D.E.; Berg, C.M.; Huang, H.V.

    1990-01-01

    The purpose of this research is to investigate and develop methods that exploit the power of bacterial transposable elements for large scale DNA sequencing. Our premise is that the use of transposons to put primer binding sites randomly in target DNAs should provide access to all portions of large DNA fragments, without the inefficiencies of methods involving random subcloning and attendant repetitive sequencing, or of sequential synthesis of many oligonucleotide primers that are used to match systematically along a DNA molecule. Two unrelated bacterial transposons, Tn5 and γδ, are being used because they have both proven useful for molecular analyses, and because they differ sufficiently in mechanism and specificity of transposition to merit parallel development.

  18. Environmental DNA sequencing primers for eutardigrades and bdelloid rotifers

    PubMed Central

    2009-01-01

    Background: The time it takes to isolate individuals from environmental samples and then extract DNA from each individual is one of the problems with generating molecular data from meiofauna such as eutardigrades and bdelloid rotifers. The lack of consistent morphological information and the extreme abundance of these classes makes morphological identification of rare, or even common cryptic taxa a large and unwieldy task. This limits the ability to perform large-scale surveys of the diversity of these organisms. Here we demonstrate a culture-independent molecular survey approach that enables the generation of large amounts of eutardigrade and bdelloid rotifer sequence data directly from soil. Our PCR primers, specific to the 18S small-subunit rRNA gene, were developed for both eutardigrades and bdelloid rotifers. Results: The developed primers successfully amplified DNA of their target organism from various soil DNA extracts. This was confirmed by both the BLAST similarity searches and phylogenetic analyses. Tardigrades showed much better phylogenetic resolution than bdelloids. Both groups of organisms exhibited varying levels of endemism. Conclusion: The development of clade-specific primers for characterizing eutardigrades and bdelloid rotifers from environmental samples should greatly increase our ability to characterize the composition of these taxa in environmental samples. Environmental sequencing as shown here differs from other molecular survey methods in that there is no need to pre-isolate the organisms of interest from soil in order to amplify their DNA. The DNA sequences obtained from methods that do not require culturing can be identified post-hoc and placed phylogenetically as additional closely related sequences are obtained from morphologically identified conspecifics. Our non-cultured environmental sequence based approach will be able to provide a rapid and large-scale screening of the presence, absence and diversity of Bdelloidea and Eutardigrada in a variety of soils. PMID:20003362

  19. Organization and evolution of highly repeated satellite DNA sequences in plant chromosomes.

    PubMed

    Sharma, S; Raina, S N

    2005-01-01

    A major component of the plant nuclear genome is constituted by different classes of repetitive DNA sequences. The structural, functional and evolutionary aspects of the satellite repetitive DNA families, and their organization in the chromosomes, are reviewed. The tandem satellite DNA sequences exhibit characteristic chromosomal locations, usually at subtelomeric and centromeric regions. The repetitive DNA family(ies) may be widely distributed in a taxonomic family or a genus, or may be specific for a species, genome or even a chromosome. They may acquire large-scale variations in their sequence and copy number over an evolutionary time-scale. These features have formed the basis of extensive utilization of repetitive sequences for taxonomic and phylogenetic studies. Hybrid polyploids have especially proven to be excellent models for studying the evolution of repetitive DNA sequences. Recent studies explicitly show that some repetitive DNA families localized at the telomeres and centromeres have acquired important structural and functional significance. The repetitive elements are under different evolutionary constraints as compared to the genes. Satellite DNA families are thought to arise de novo as a consequence of molecular mechanisms such as unequal crossing over, rolling circle amplification, replication slippage and mutation that constitute "molecular drive". Copyright 2005 S. Karger AG, Basel.

  20. Genome Partitioner: A web tool for multi-level partitioning of large-scale DNA constructs for synthetic biology applications

    PubMed Central

    Del Medico, Luca; Christen, Heinz; Christen, Beat

    2017-01-01

    Recent advances in lower-cost DNA synthesis techniques have enabled new innovations in the field of synthetic biology. Still, efficient design and higher-order assembly of genome-scale DNA constructs remains a labor-intensive process. Given the complexity, computer assisted design tools that fragment large DNA sequences into fabricable DNA blocks are needed to pave the way towards streamlined assembly of biological systems. Here, we present the Genome Partitioner software implemented as a web-based interface that permits multi-level partitioning of genome-scale DNA designs. Without the need for specialized computing skills, biologists can submit their DNA designs to a fully automated pipeline that generates the optimal retrosynthetic route for higher-order DNA assembly. To test the algorithm, we partitioned a 783 kb Caulobacter crescentus genome design. We validated the partitioning strategy by assembling a 20 kb test segment encompassing a difficult to synthesize DNA sequence. Successful assembly from 1 kb subblocks into the 20 kb segment highlights the effectiveness of the Genome Partitioner for reducing synthesis costs and timelines for higher-order DNA assembly. The Genome Partitioner is broadly applicable to translate DNA designs into ready to order sequences that can be assembled with standardized protocols, thus offering new opportunities to harness the diversity of microbial genomes for synthetic biology applications. The Genome Partitioner web tool can be accessed at https://christenlab.ethz.ch/GenomePartitioner. PMID:28531174

  1. [New hosts and vectors for genome cloning]. Progress report, 1990--1991

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    The main goal of our project remains the development of new bacterial hosts and vectors for the stable propagation of human DNA clones in E. coli. During the past six months of our current budget period, we have (1) continued to develop new hosts that permit the stable maintenance of unstable features of human DNA, and (2) developed a series of vectors for (a) cloning large DNA inserts, (b) assessing the frequency of human sequences that are lethal to the growth of E. coli, and (c) assessing the stability of human sequences cloned in M13 for large-scale sequencing projects.

  2. Chromosome evolution in the Thermotogales: large-scale inversions and strain diversification of CRISPR sequences.

    PubMed

    DeBoy, Robert T; Mongodin, Emmanuel F; Emerson, Joanne B; Nelson, Karen E

    2006-04-01

    In the present study, the chromosomes of two members of the Thermotogales were compared. A whole-genome alignment of Thermotoga maritima MSB8 and Thermotoga neapolitana NS-E has revealed numerous large-scale DNA rearrangements, most of which are associated with CRISPR DNA repeats and/or tRNA genes. These DNA rearrangements do not include the putative origin of DNA replication but move within the same replichore, i.e., the same replicating half of the chromosome (delimited by the replication origin and terminus). Based on cumulative GC skew analysis, both the T. maritima and T. neapolitana lineages contain one or two major inverted DNA segments. Also, based on PCR amplification and sequence analysis of the DNA joints that are associated with the major rearrangements, the overall chromosome architecture was found to be conserved at most DNA joints for other strains of T. neapolitana. Taken together, the results from this analysis suggest that the observed chromosomal rearrangements in the Thermotogales likely occurred by successive inversions after their divergence from a common ancestor and before strain diversification. Finally, sequence analysis shows that size polymorphisms in the DNA joints associated with CRISPRs can be explained by expansion and possibly contraction of the DNA repeat and spacer unit, providing a tool for discerning the relatedness of strains from different geographic locations.
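
    Cumulative GC skew, mentioned in this abstract, is the running sum of (G - C) / (G + C) over consecutive windows along the chromosome. In many bacterial genomes the curve reaches its minimum near the replication origin and its maximum near the terminus, so a large inverted segment tends to appear as a stretch where the slope has the opposite sign. The sketch below computes the curve. The window size is an assumed parameter, and this is not the authors' code.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return the running sum of per-window GC skew, (G - C) / (G + C).

    Windows with no G or C contribute zero to the sum.
    """
    seq = seq.upper()
    cumulative = []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:
            total += (g - c) / (g + c)
        cumulative.append(total)
    return cumulative


def extreme_windows(skew):
    """Indices of the minimum and maximum of the cumulative curve."""
    return skew.index(min(skew)), skew.index(max(skew))
```

    On a toy sequence such as `"G" * 8 + "C" * 8` with `window=4`, the curve rises to 2.0 and falls back to 0.0, so the maximum sits at the boundary where the skew changes sign.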

  3. An integrated semiconductor device enabling non-optical genome sequencing.

    PubMed

    Rothberg, Jonathan M; Hinz, Wolfgang; Rearick, Todd M; Schultz, Jonathan; Mileski, William; Davey, Mel; Leamon, John H; Johnson, Kim; Milgrew, Mark J; Edwards, Matthew; Hoon, Jeremy; Simons, Jan F; Marran, David; Myers, Jason W; Davidson, John F; Branting, Annika; Nobile, John R; Puc, Bernard P; Light, David; Clark, Travis A; Huber, Martin; Branciforte, Jeffrey T; Stoner, Isaac B; Cawley, Simon E; Lyons, Michael; Fu, Yutao; Homer, Nils; Sedova, Marina; Miao, Xin; Reed, Brian; Sabina, Jeffrey; Feierstein, Erika; Schorn, Michelle; Alanjary, Mohammad; Dimalanta, Eileen; Dressman, Devin; Kasinskas, Rachel; Sokolsky, Tanya; Fidanza, Jacqueline A; Namsaraev, Eugeni; McKernan, Kevin J; Williams, Alan; Roth, G Thomas; Bustillo, James

    2011-07-20

    The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.

  4. DNA-encoded chemistry: enabling the deeper sampling of chemical space.

    PubMed

    Goodnow, Robert A; Dumelin, Christoph E; Keefe, Anthony D

    2017-02-01

    DNA-encoded chemical library technologies are increasingly being adopted in drug discovery for hit and lead generation. DNA-encoded chemistry enables the exploration of chemical spaces four to five orders of magnitude more deeply than is achievable by traditional high-throughput screening methods. Operation of this technology requires developing a range of capabilities including aqueous synthetic chemistry, building block acquisition, oligonucleotide conjugation, large-scale molecular biological transformations, selection methodologies, PCR, sequencing, sequence data analysis and the analysis of large chemistry spaces. This Review provides an overview of the development and applications of DNA-encoded chemistry, highlighting the challenges and future directions for the use of this technology.

  5. Using herbarium-derived DNAs to assemble a large-scale DNA barcode library for the vascular plants of Canada.

    PubMed

    Kuzmina, Maria L; Braukmann, Thomas W A; Fazekas, Aron J; Graham, Sean W; Dewaard, Stephanie L; Rodrigues, Anuar; Bennett, Bruce A; Dickinson, Timothy A; Saarela, Jeffery M; Catling, Paul M; Newmaster, Steven G; Percy, Diana M; Fenneman, Erin; Lauron-Moreau, Aurélien; Ford, Bruce; Gillespie, Lynn; Subramanyam, Ragupathy; Whitton, Jeannette; Jennings, Linda; Metsger, Deborah; Warne, Connor P; Brown, Allison; Sears, Elizabeth; Dewaard, Jeremy R; Zakharov, Evgeny V; Hebert, Paul D N

    2017-12-01

    Constructing complete, accurate plant DNA barcode reference libraries can be logistically challenging for large-scale floras. Here we demonstrate the promise and challenges of using herbarium collections for building a DNA barcode reference library for the vascular plant flora of Canada. Our study examined 20,816 specimens representing 5076 of 5190 vascular plant species in Canada (98%). For 98% of the specimens, at least one of the DNA barcode regions was recovered from the plastid loci rbcL and matK and from the nuclear ITS2 region. We used beta regression to quantify the effects of age, type of preservation, and taxonomic affiliation (family) on DNA sequence recovery. Specimen age and method of preservation had significant effects on sequence recovery for all markers, but influenced some families more (e.g., Boraginaceae) than others (e.g., Asteraceae). Our DNA barcode library represents an unparalleled resource for metagenomic and ecological genetic research working on temperate and arctic biomes. An observed decline in sequence recovery with specimen age may be associated with poor primer matches, intragenomic variation (for ITS2), or inhibitory secondary compounds in some taxa.

  6. Using herbarium-derived DNAs to assemble a large-scale DNA barcode library for the vascular plants of Canada

    PubMed Central

    Kuzmina, Maria L.; Braukmann, Thomas W. A.; Fazekas, Aron J.; Graham, Sean W.; Dewaard, Stephanie L.; Rodrigues, Anuar; Bennett, Bruce A.; Dickinson, Timothy A.; Saarela, Jeffery M.; Catling, Paul M.; Newmaster, Steven G.; Percy, Diana M.; Fenneman, Erin; Lauron-Moreau, Aurélien; Ford, Bruce; Gillespie, Lynn; Subramanyam, Ragupathy; Whitton, Jeannette; Jennings, Linda; Metsger, Deborah; Warne, Connor P.; Brown, Allison; Sears, Elizabeth; Dewaard, Jeremy R.; Zakharov, Evgeny V.; Hebert, Paul D. N.

    2017-01-01

    Premise of the study: Constructing complete, accurate plant DNA barcode reference libraries can be logistically challenging for large-scale floras. Here we demonstrate the promise and challenges of using herbarium collections for building a DNA barcode reference library for the vascular plant flora of Canada. Methods: Our study examined 20,816 specimens representing 5076 of 5190 vascular plant species in Canada (98%). For 98% of the specimens, at least one of the DNA barcode regions was recovered from the plastid loci rbcL and matK and from the nuclear ITS2 region. We used beta regression to quantify the effects of age, type of preservation, and taxonomic affiliation (family) on DNA sequence recovery. Results: Specimen age and method of preservation had significant effects on sequence recovery for all markers, but influenced some families more (e.g., Boraginaceae) than others (e.g., Asteraceae). Discussion: Our DNA barcode library represents an unparalleled resource for metagenomic and ecological genetic research working on temperate and arctic biomes. An observed decline in sequence recovery with specimen age may be associated with poor primer matches, intragenomic variation (for ITS2), or inhibitory secondary compounds in some taxa. PMID:29299394

  7. A new and fast method for preparing high quality lambda DNA suitable for sequencing.

    PubMed Central

    Manfioletti, G; Schneider, C

    1988-01-01

    A method is described for the rapid purification of high quality lambda DNA. The method can be used from either liquid or plate lysates and on a small scale or a large scale. It relies on the preadsorption of all polyanions present in the lysate to an "insoluble" anion-exchange matrix (DEAE or TEAE). Phage particles are then disrupted by combined treatment with EDTA/proteinase K and the resulting DNA is precipitated by the addition of the cationic detergent cetyl (or hexadecyl)-trimethyl ammonium bromide-CTAB ("soluble" anion-exchange matrix). The precipitated CTAB-DNA complex is then exchanged to Na-DNA and ethanol precipitated. The resultant purified DNA is suitable for enzymatic reactions and provides a high quality template for dideoxy-sequence analysis. PMID:2966928

  8. Precision medicine in the age of big data: The present and future role of large-scale unbiased sequencing in drug discovery and development.

    PubMed

    Vicini, P; Fields, O; Lai, E; Litwack, E D; Martin, A-M; Morgan, T M; Pacanowski, M A; Papaluca, M; Perez, O D; Ringel, M S; Robson, M; Sakul, H; Vockley, J; Zaks, T; Dolsten, M; Søgaard, M

    2016-02-01

    High throughput molecular and functional profiling of patients is a key driver of precision medicine. DNA and RNA characterization has been enabled at unprecedented cost and scale through rapid, disruptive progress in sequencing technology, but challenges persist in data management and interpretation. We analyze the state-of-the-art of large-scale unbiased sequencing (LUS) in drug discovery and development, including technology, application, ethical, regulatory, policy and commercial considerations, and discuss issues of LUS implementation in clinical and regulatory practice. © 2015 American Society for Clinical Pharmacology and Therapeutics.

  9. Large-Scale Concatenation cDNA Sequencing

    PubMed Central

    Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

    1997-01-01

    A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174

  10. Preparation of fosmid libraries and functional metagenomic analysis of microbial community DNA.

    PubMed

    Martínez, Asunción; Osburne, Marcia S

    2013-01-01

    One of the most important challenges in contemporary microbial ecology is to assign a functional role to the large number of novel genes discovered through large-scale sequencing of natural microbial communities that lack similarity to genes of known function. Functional screening of metagenomic libraries, that is, screening environmental DNA clones for the ability to confer an activity of interest to a heterologous bacterial host, is a promising approach for bridging the gap between metagenomic DNA sequencing and functional characterization. Here, we describe methods for isolating environmental DNA and constructing metagenomic fosmid libraries, as well as methods for designing and implementing successful functional screens of such libraries. © 2013 Elsevier Inc. All rights reserved.

  11. Flow cytometry for enrichment and titration in massively parallel DNA sequencing

    PubMed Central

    Sandberg, Julia; Ståhl, Patrik L.; Ahmadian, Afshin; Bjursell, Magnus K.; Lundeberg, Joakim

    2009-01-01

    Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences. However, the reagent costs and labor requirements in current sequencing protocols are still substantial, although improvements are continuously being made. Here, we demonstrate an effective alternative to existing sample titration protocols for the Roche/454 system using Fluorescence Activated Cell Sorting (FACS) technology to determine the optimal DNA-to-bead ratio prior to large-scale sequencing. Our method, which eliminates the need for the costly pilot sequencing of samples during titration is capable of rapidly providing accurate DNA-to-bead ratios that are not biased by the quantification and sedimentation steps included in current protocols. Moreover, we demonstrate that FACS sorting can be readily used to highly enrich fractions of beads carrying template DNA, with near total elimination of empty beads and no downstream sacrifice of DNA sequencing quality. Automated enrichment by FACS is a simple approach to obtain pure samples for bead-based sequencing systems, and offers an efficient, low-cost alternative to current enrichment protocols. PMID:19304748

  12. Targeted enrichment strategies for next-generation plant biology

    Treesearch

    Richard Cronn; Brian J. Knaus; Aaron Liston; Peter J. Maughan; Matthew Parks; John V. Syring; Joshua Udall

    2012-01-01

    The dramatic advances offered by modern DNA sequencers continue to redefine the limits of what can be accomplished in comparative plant biology. Even with recent achievements, however, plant genomes present obstacles that can make it difficult to execute large-scale population and phylogenetic studies on next-generation sequencing platforms. Factors like large genome...

  13. Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs).

    PubMed

    Cantsilieris, Stuart; Stessman, Holly A; Shendure, Jay; Eichler, Evan E

    2017-01-01

    Molecular inversion probes (MIPs) in combination with massively parallel DNA sequencing represent a versatile, yet economical tool for targeted sequencing of genomic DNA. Several thousand genomic targets can be selectively captured using long oligonucleotides containing unique targeting arms and universal linkers. The ability to append sequencing adaptors and sample-specific barcodes allows large-scale pooling and subsequent high-throughput sequencing at relatively low cost per sample. Here, we describe a "wet bench" protocol detailing the capture and subsequent sequencing of >2000 genomic targets from 192 samples, representative of a single lane on the Illumina HiSeq 2000 platform.

  14. The Neandertal genome and ancient DNA authenticity

    PubMed Central

    Green, Richard E; Briggs, Adrian W; Krause, Johannes; Prüfer, Kay; Burbano, Hernán A; Siebauer, Michael; Lachmann, Michael; Pääbo, Svante

    2009-01-01

    Recent advances in high-throughput DNA sequencing have made genome-scale analyses of genomes of extinct organisms possible. With these new opportunities come new difficulties in assessing the authenticity of the DNA sequences retrieved. We discuss how these difficulties can be addressed, particularly with regard to analyses of the Neandertal genome. We argue that only direct assays of DNA sequence positions in which Neandertals differ from all contemporary humans can serve as a reliable means to estimate human contamination. Indirect measures, such as the extent of DNA fragmentation, nucleotide misincorporations, or comparison of derived allele frequencies in different fragment size classes, are unreliable. Fortunately, interim approaches based on mtDNA differences between Neandertals and current humans, detection of male contamination through Y chromosomal sequences, and repeated sequencing from the same fossil to detect autosomal contamination allow initial large-scale sequencing of Neandertal genomes. This will result in the discovery of fixed differences in the nuclear genome between Neandertals and current humans that can serve as future direct assays for contamination. For analyses of other fossil hominins, which may become possible in the future, we suggest a similar ‘boot-strap’ approach in which interim approaches are applied until sufficient data for more definitive direct assays are acquired. PMID:19661919

  15. Large-scale chromosome folding versus genomic DNA sequences: A discrete double Fourier transform technique.

    PubMed

    Chechetkin, V R; Lobzin, V V

    2017-08-07

    The use of state-of-the-art techniques combining imaging methods and high-throughput genomic mapping tools has led to significant progress in detailing the chromosome architecture of various organisms. However, a gap still remains between the rapidly growing structural data on the chromosome folding and the large-scale genome organization. Could a part of information on the chromosome folding be obtained directly from underlying genomic DNA sequences abundantly stored in the databanks? To answer this question, we developed an original discrete double Fourier transform (DDFT). DDFT serves for the detection of large-scale genome regularities associated with domains/units at the different levels of hierarchical chromosome folding. The method is versatile and can be applied to both genomic DNA sequences and corresponding physico-chemical parameters such as base-pairing free energy. The latter characteristic is closely related to the replication and transcription and can also be used for the assessment of temperature or supercoiling effects on the chromosome folding. We tested the method on the genome of E. coli K-12 and found good correspondence with the annotated domains/units established experimentally. As a brief illustration of further abilities of DDFT, the study of large-scale genome organization for bacteriophage PHIX174 and bacterium Caulobacter crescentus was also added. The combined experimental, modeling, and bioinformatic DDFT analysis should yield more complete knowledge on the chromosome architecture and genome organization. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Comprehensive Analysis of DNA Methylation Data with RnBeads

    PubMed Central

    Walter, Jörn; Lengauer, Thomas; Bock, Christoph

    2014-01-01

    RnBeads is a software tool for large-scale analysis and interpretation of DNA methylation data, providing a user-friendly analysis workflow that yields detailed hypertext reports (http://rnbeads.mpi-inf.mpg.de). Supported assays include whole genome bisulfite sequencing, reduced representation bisulfite sequencing, Infinium microarrays, and any other protocol that produces high-resolution DNA methylation data. Important applications of RnBeads include the analysis of epigenome-wide association studies and epigenetic biomarker discovery in cancer cohorts. PMID:25262207

  17. The sequence of sequencers: The history of sequencing DNA

    PubMed Central

    Heather, James M.; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401

  18. Pooled-DNA Sequencing for Elucidating New Genomic Risk Factors, Rare Variants Underlying Alzheimer's Disease.

    PubMed

    Jin, Sheng Chih; Benitez, Bruno A; Deming, Yuetiva; Cruchaga, Carlos

    2016-01-01

    Analyses of genome-wide association studies (GWAS) for complex disorders usually identify common variants with a relatively small effect size that only explain a small proportion of phenotypic heritability. Several studies have suggested that a significant fraction of heritability may be explained by low-frequency (minor allele frequency (MAF) of 1-5 %) and rare-variants that are not contained in the commercial GWAS genotyping arrays (Schork et al., Curr Opin Genet Dev 19:212, 2009). Rare variants can also have relatively large effects on risk for developing human diseases or disease phenotype (Cruchaga et al., PLoS One 7:e31039, 2012). However, it is necessary to perform next-generation sequencing (NGS) studies in a large population (>4,000 samples) to detect a significant rare-variant association. Several NGS methods, such as custom capture sequencing and amplicon-based sequencing, are designed to screen a small proportion of the genome, but most of these methods are limited in the number of samples that can be multiplexed (i.e. most sequencing kits only provide 96 distinct indexes). Additionally, the sequencing library preparation for 4,000 samples remains expensive and thus conducting NGS studies with the aforementioned methods is not feasible for most research laboratories. The need for low-cost large scale rare-variant detection makes pooled-DNA sequencing an ideally efficient and cost-effective technique to identify rare variants in target regions by sequencing hundreds to thousands of samples. Our recent work has demonstrated that pooled-DNA sequencing can accurately detect rare variants in targeted regions in multiple DNA samples with high sensitivity and specificity (Jin et al., Alzheimers Res Ther 4:34, 2012). In these studies we used a well-established pooled-DNA sequencing approach and a computational package, SPLINTER (short indel prediction by large deviation inference and nonlinear true frequency estimation by recursion) (Vallania et al., Genome Res 20:1711, 2010), for accurate identification of rare variants in large DNA pools. Given an average sequencing coverage of 30× per haploid genome, SPLINTER can detect rare variants and short indels up to 4 base pairs (bp) with high sensitivity and specificity (up to 1 haploid allele in a pool as large as 500 individuals). Step-by-step instructions on how to conduct pooled-DNA sequencing experiments and data analyses are described in this chapter.
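
    The coverage figures quoted in this abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not part of SPLINTER. It assumes diploid samples and that the 30× figure applies to every haploid genome in the pool; the function name and its parameters are hypothetical.

```python
def pooled_depth(coverage_per_haploid, individuals, ploidy=2):
    """Return (haploid genomes in pool, total depth, singleton allele frequency).

    Assumption: every individual contributes `ploidy` haploid genomes and the
    per-haploid coverage is the same for all of them.
    """
    haploid_genomes = individuals * ploidy
    total_depth = coverage_per_haploid * haploid_genomes
    # A variant carried by exactly one haploid genome in the pool.
    singleton_frequency = 1 / haploid_genomes
    return haploid_genomes, total_depth, singleton_frequency


genomes, depth, freq = pooled_depth(30, 500)
print(genomes, depth, freq)
```

    Under these assumptions a pool of 500 individuals holds 1,000 haploid genomes, so 30× per haploid genome corresponds to a total depth of 30,000× at each targeted position, and a single variant allele is expected in roughly 1 read in 1,000.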

  19. Biophysics of protein-DNA interactions and chromosome organization

    PubMed Central

    Marko, John F.

    2014-01-01

    The function of DNA in cells depends on its interactions with protein molecules, which recognize and act on base sequence patterns along the double helix. These notes aim to introduce basic polymer physics of DNA molecules, biophysics of protein-DNA interactions and their study in single-DNA experiments, and some aspects of large-scale chromosome structure. Mechanisms for control of chromosome topology will also be discussed. PMID:25419039

  20. Genome sequencing in microfabricated high-density picolitre reactors.

    PubMed

    Margulies, Marcel; Egholm, Michael; Altman, William E; Attiya, Said; Bader, Joel S; Bemben, Lisa A; Berka, Jan; Braverman, Michael S; Chen, Yi-Ju; Chen, Zhoutao; Dewell, Scott B; Du, Lei; Fierro, Joseph M; Gomes, Xavier V; Godwin, Brian C; He, Wen; Helgesen, Scott; Ho, Chun Heen; Ho, Chun He; Irzyk, Gerard P; Jando, Szilveszter C; Alenquer, Maria L I; Jarvie, Thomas P; Jirage, Kshama B; Kim, Jong-Bum; Knight, James R; Lanza, Janna R; Leamon, John H; Lefkowitz, Steven M; Lei, Ming; Li, Jing; Lohman, Kenton L; Lu, Hong; Makhijani, Vinod B; McDade, Keith E; McKenna, Michael P; Myers, Eugene W; Nickerson, Elizabeth; Nobile, John R; Plant, Ramona; Puc, Bernard P; Ronan, Michael T; Roth, George T; Sarkis, Gary J; Simons, Jan Fredrik; Simpson, John W; Srinivasan, Maithreyan; Tartaro, Karrie R; Tomasz, Alexander; Vogt, Kari A; Volkmer, Greg A; Wang, Shally H; Wang, Yong; Weiner, Michael P; Yu, Pengguang; Begley, Richard F; Rothberg, Jonathan M

    2005-09-15

    The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.

  1. Application and comparison of large-scale solution-based DNA capture-enrichment methods on ancient DNA

    PubMed Central

    Ávila-Arcos, María C.; Cappellini, Enrico; Romero-Navarro, J. Alberto; Wales, Nathan; Moreno-Mayar, J. Víctor; Rasmussen, Morten; Fordyce, Sarah L.; Montiel, Rafael; Vielle-Calzada, Jean-Philippe; Willerslev, Eske; Gilbert, M. Thomas P.

    2011-01-01

    The development of second-generation sequencing technologies has greatly benefitted the field of ancient DNA (aDNA). Its application can be further exploited by the use of targeted capture-enrichment methods to overcome restrictions posed by low endogenous and contaminating DNA in ancient samples. We tested the performance of Agilent's SureSelect and Mycroarray's MySelect in-solution capture systems on Illumina sequencing libraries built from ancient maize to identify key factors influencing aDNA capture experiments. High levels of clonality as well as the presence of multiple-copy sequences in the capture targets led to biases in the data regardless of the capture method. Neither method consistently outperformed the other in terms of average target enrichment, and no obvious difference was observed either when two tiling designs were compared. In addition to demonstrating the plausibility of capturing aDNA from ancient plant material, our results also enable us to provide useful recommendations for those planning targeted-sequencing on aDNA. PMID:22355593

  2. Extracting DNA words based on the sequence features: non-uniform distribution and integrity.

    PubMed

    Li, Zhi; Cao, Hongyan; Cui, Yuehua; Zhang, Yanbo

    2016-01-25

    DNA sequence can be viewed as an unknown language with words as its functional units. Given that most sequence alignment algorithms such as the motif discovery algorithms depend on the quality of background information about sequences, it is necessary to develop an ab initio algorithm for extracting the "words" based only on the DNA sequences. We considered that non-uniform distribution and integrity were two important features of a word, based on which we developed an ab initio algorithm to extract "DNA words" that have potential functional meaning. A Kolmogorov-Smirnov test was used for consistency test of uniform distribution of DNA sequences, and the integrity was judged by the sequence and position alignment. Two random base sequences were adopted as negative control, and an English book was used as positive control to verify our algorithm. We applied our algorithm to the genomes of Saccharomyces cerevisiae and 10 strains of Escherichia coli to show the utility of the methods. The results provide strong evidence that the algorithm is a promising tool for building a DNA dictionary ab initio. Our method provides a fast way for large scale screening of important DNA elements and offers potential insights into the understanding of a genome.
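
    The idea of testing whether a candidate word is uniformly distributed along a sequence can be sketched as follows. This is not the authors' algorithm, whose details the abstract does not give. It is a minimal illustration, under the assumption that word start positions are rescaled to [0, 1] and compared with a uniform distribution using the one-sample Kolmogorov-Smirnov statistic. All function names are hypothetical.

```python
def word_positions(sequence, word):
    """Start indices of every (possibly overlapping) occurrence of word."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def ks_uniform_statistic(values):
    """One-sample Kolmogorov-Smirnov statistic D against Uniform(0, 1)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Largest gap between the empirical CDF (just before and at x)
        # and the uniform CDF, which equals x on [0, 1].
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


def word_uniformity(sequence, word):
    """KS statistic for the rescaled start positions of word in sequence."""
    last_start = len(sequence) - len(word)
    positions = word_positions(sequence, word)
    if last_start <= 0 or not positions:
        raise ValueError("word must occur in a sequence longer than itself")
    return ks_uniform_statistic([p / last_start for p in positions])


print(word_uniformity("ACGTTTACGTTTACGTTTACG", "ACG"))
```

    A larger statistic indicates that occurrences are more clustered than a uniform scatter would be; turning it into a significance decision requires the sample size and a reference distribution, which this sketch leaves out.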

  3. Methods and materials relating to IMPDH and GMP production

    DOEpatents

    Collart, Frank R.; Huberman, Eliezer

    1997-01-01

    Disclosed are purified and isolated DNA sequences encoding eukaryotic proteins possessing biological properties of inosine 5'-monophosphate dehydrogenase ("IMPDH"). Illustratively, mammalian (e.g., human) IMPDH-encoding DNA sequences are useful in transformation or transfection of host cells for the large scale recombinant production of the enzymatically active expression products and/or products (e.g., GMP) resulting from IMPDH catalyzed synthesis in cells. Vectors including IMPDH-encoding DNA sequences are useful in gene amplification procedures. Recombinant proteins and synthetic peptides provided by the invention are useful as immunological reagents and in the preparation of antibodies (including polyclonal and monoclonal antibodies) for quantitative detection of IMPDH.

  4. Genetic diversity of armored scales (Hemiptera: Diaspididae) and soft scales (Hemiptera: Coccidae) in Chile.

    PubMed

    Amouroux, P; Crochard, D; Germain, J-F; Correa, M; Ampuero, J; Groussier, G; Kreiter, P; Malausa, T; Zaviezo, T

    2017-05-17

    Scale insects (Sternorrhyncha: Coccoidea) are one of the most invasive and agriculturally damaging insect groups. Their management and the development of new control methods are currently jeopardized by the scarcity of identification data, in particular in regions where no large survey coupling morphological and DNA analyses have been performed. In this study, we sampled 116 populations of armored scales (Hemiptera: Diaspididae) and 112 populations of soft scales (Hemiptera: Coccidae) in Chile, over a latitudinal gradient ranging from 18°S to 41°S, on fruit crops, ornamental plants and trees. We sequenced the COI and 28S genes in each population. In total, 19 Diaspididae species and 11 Coccidae species were identified morphologically. From the 63 COI haplotypes and the 54 28S haplotypes uncovered, and using several DNA data analysis methods (Automatic Barcode Gap Discovery, K2P distance, NJ trees), up to 36 genetic clusters were detected. Morphological and DNA data were congruent, except for three species (Aspidiotus nerii, Hemiberlesia rapax and Coccus hesperidum) in which DNA data revealed highly differentiated lineages. More than 50% of the haplotypes obtained had no high-scoring matches with any of the sequences in the GenBank database. This study provides 63 COI and 54 28S barcode sequences for the identification of Coccoidea from Chile.
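
    The K2P (Kimura two-parameter) distance mentioned among the analysis methods follows a standard formula, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The sketch below is a minimal illustration rather than the pipeline used in the study; the function name is hypothetical, and skipping sites with gaps or ambiguity codes is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: ignore sites with gaps or ambiguity codes.
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

    For the two ten-base sequences in the usage line, which differ by a single transition, P = 0.1 and Q = 0, so the distance is -(1/2) ln(0.8), about 0.11.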

  5. Ancient DNA studies: new perspectives on old samples

    PubMed Central

    2012-01-01

    In spite of past controversies, the field of ancient DNA is now a reliable research area due to recent methodological improvements. A series of recent large-scale studies have revealed the true potential of ancient DNA samples to study the processes of evolution and to test models and assumptions commonly used to reconstruct patterns of evolution and to analyze population genetics and palaeoecological changes. Recent advances in DNA technologies, such as next-generation sequencing make it possible to recover DNA information from archaeological and paleontological remains allowing us to go back in time and study the genetic relationships between extinct organisms and their contemporary relatives. With the next-generation sequencing methodologies, DNA sequences can be retrieved even from samples (for example human remains) for which the technical pitfalls of classical methodologies required stringent criteria to guarantee the reliability of the results. In this paper, we review the methodologies applied to ancient DNA analysis and the perspectives that next-generation sequencing applications provide in this field. PMID:22697611

  6. Cloud-based MOTIFSIM: Detecting Similarity in Large DNA Motif Data Sets.

    PubMed

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2017-05-01

    We developed the cloud-based MOTIFSIM on Amazon Web Services (AWS) cloud. The tool is an extended version from our web-based tool version 2.0, which was developed based on a novel algorithm for detecting similarity in multiple DNA motif data sets. This cloud-based version further allows researchers to exploit the computing resources available from AWS to detect similarity in multiple large-scale DNA motif data sets resulting from the next-generation sequencing technology. The tool is highly scalable with expandable AWS.

  7. The sequence of sequencers: The history of sequencing DNA.

    PubMed

    Heather, James M; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  8. Analysis of genetic diversity using SNP markers in oat

    USDA-ARS's Scientific Manuscript database

    A large-scale single nucleotide polymorphism (SNP) discovery was carried out in cultivated oat using Roche 454 sequencing methods. DNA sequences were generated from cDNAs originating from a panel of 20 diverse oat cultivars, and from Diversity Array Technology (DArT) genomic complexity reductions fr...

  9. Successful application of FTA Classic Card technology and use of bacteriophage phi29 DNA polymerase for large-scale field sampling and cloning of complete maize streak virus genomes.

    PubMed

    Owor, Betty E; Shepherd, Dionne N; Taylor, Nigel J; Edema, Richard; Monjane, Adérito L; Thomson, Jennifer A; Martin, Darren P; Varsani, Arvind

    2007-03-01

    Leaf samples from 155 maize streak virus (MSV)-infected maize plants were collected from 155 farmers' fields in 23 districts in Uganda in May/June 2005 by leaf-pressing infected samples onto FTA Classic Cards. Viral DNA was successfully extracted from cards stored at room temperature for 9 months. The diversity of 127 MSV isolates was analysed by PCR-generated RFLPs. Six representative isolates having different RFLP patterns and causing either severe, moderate or mild disease symptoms, were chosen for amplification from FTA cards by bacteriophage phi29 DNA polymerase using the TempliPhi system. Full-length genomes were inserted into a cloning vector using a unique restriction enzyme site, and sequenced. The 1.3-kb PCR product amplified directly from FTA-eluted DNA and used for RFLP analysis was also cloned and sequenced. Comparison of cloned whole genome sequences with those of the original PCR products indicated that the correct virus genome had been cloned and that no errors were introduced by the phi29 polymerase. This is the first successful large-scale application of FTA card technology to the field, and illustrates the ease with which large numbers of infected samples can be collected and stored for downstream molecular applications such as diversity analysis and cloning of potentially new virus genomes.

  10. Selectivity by host plants affects the distribution of arbuscular mycorrhizal fungi: evidence from ITS rDNA sequence metadata.

    PubMed

    Yang, Haishui; Zang, Yanyan; Yuan, Yongge; Tang, Jianjun; Chen, Xin

    2012-04-12

    Arbuscular mycorrhizal fungi (AMF) can form obligate symbioses with the vast majority of land plants, and AMF distribution patterns have received increasing attention from researchers. At the local scale, the distribution of AMF is well documented. Studies at large scales, however, are limited because intensive sampling is difficult. Here, we used ITS rDNA sequence metadata obtained from public databases to study the distribution of AMF at continental and global scales. We also used these sequence metadata to investigate whether host plant is the main factor that affects the distribution of AMF at large scales. We defined 305 ITS virtual taxa (ITS-VTs) among all sequences of the Glomeromycota by using a comprehensive maximum likelihood phylogenetic analysis. Each host taxonomic order averaged about 53% specific ITS-VTs, and approximately 60% of the ITS-VTs were host specific. Those ITS-VTs with wide host range showed wide geographic distribution. Most ITS-VTs occurred in only one type of host functional group. The distributions of most ITS-VTs were limited across ecosystem, across continent, across biogeographical realm, and across climatic zone. Non-metric multidimensional scaling analysis (NMDS) showed that AMF community composition differed among functional groups of hosts, and among ecosystem, continent, biogeographical realm, and climatic zone. The Mantel test showed that AMF community composition was significantly correlated with plant community composition among ecosystem, among continent, among biogeographical realm, and among climatic zone. The structural equation modeling (SEM) showed that the effects of ecosystem, continent, biogeographical realm, and climatic zone were mainly indirect on AMF distribution, but plant had strongly direct effects on AMF. The distribution of AMF as indicated by ITS rDNA sequences showed a pattern of high endemism at large scales. This pattern indicates high specificity of AMF for host at different scales (plant taxonomic order and functional group) and high selectivity from host plants for AMF. The effects of ecosystemic, biogeographical, continental and climatic factors on AMF distribution might be mediated by host plants.

  11. Ultrafast DNA sequencing on a microchip by a hybrid separation mechanism that gives 600 bases in 6.5 minutes.

    PubMed

    Fredlake, Christopher P; Hert, Daniel G; Kan, Cheuk-Wai; Chiesl, Thomas N; Root, Brian E; Forster, Ryan E; Barron, Annelise E

    2008-01-15

    To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require approximately 70 min to deliver approximately 650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered "hybrid" mechanism of DNA electromigration, in which DNA molecules alternate rapidly between reptating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs.
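    The run times quoted in this abstract can be turned into a rough per-lane read rate. The short sketch below only redoes that arithmetic from the two figures given (about 650 bases in about 70 min for capillary systems, 600 bases in 6.5 min on the microchip); the function name is illustrative, and the result says nothing about whole-instrument throughput, which also depends on lane count and turnaround time.

```python
def bases_per_minute(bases: float, minutes: float) -> float:
    """Average read rate for one separation run."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return bases / minutes


# Figures quoted in the abstract (both approximate for the capillary case).
capillary_rate = bases_per_minute(650, 70)    # about 9.3 bases/min
microchip_rate = bases_per_minute(600, 6.5)   # about 92.3 bases/min

# Ratio of the two per-run rates.
speedup = microchip_rate / capillary_rate

if __name__ == "__main__":
    print(f"capillary: {capillary_rate:.1f} bases/min")
    print(f"microchip: {microchip_rate:.1f} bases/min")
    print(f"ratio:     {speedup:.1f}x")
```

    On these numbers the microchip run delivers bases roughly ten times faster per lane than the capillary run.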

  12. Ultrafast DNA sequencing on a microchip by a hybrid separation mechanism that gives 600 bases in 6.5 minutes

    PubMed Central

    Fredlake, Christopher P.; Hert, Daniel G.; Kan, Cheuk-Wai; Chiesl, Thomas N.; Root, Brian E.; Forster, Ryan E.; Barron, Annelise E.

    2008-01-01

    To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require ≈70 min to deliver ≈650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered “hybrid” mechanism of DNA electromigration, in which DNA molecules alternate rapidly between reptating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs. PMID:18184818

  13. A private DNA motif finding algorithm.

    PubMed

    Chen, Rui; Peng, Yun; Choi, Byron; Xu, Jianliang; Hu, Haibo

    2014-08-01

    With the increasing availability of genomic sequence data, numerous methods have been proposed for finding DNA motifs. The discovery of DNA motifs serves a critical step in many biological applications. However, the privacy implication of DNA analysis is normally neglected in the existing methods. In this work, we propose a private DNA motif finding algorithm in which a DNA owner's privacy is protected by a rigorous privacy model, known as ∊-differential privacy. It provides provable privacy guarantees that are independent of adversaries' background knowledge. Our algorithm makes use of the n-gram model and is optimized for processing large-scale DNA sequences. We evaluate the performance of our algorithm over real-life genomic data and demonstrate the promise of integrating privacy into DNA motif finding. Copyright © 2014 Elsevier Inc. All rights reserved.
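    As a loose illustration of how differential privacy and an n-gram model can be combined, the sketch below counts n-grams in a DNA string and perturbs every possible n-gram count with Laplace noise. It is a generic Laplace-mechanism example, not the algorithm of this paper: the function names, the default sensitivity of 1.0, and the choice to noise the full 4^n domain are all assumptions, and the correct sensitivity depends on how neighbouring datasets are defined.

```python
import itertools
import random
from collections import Counter


def ngram_counts(seq: str, n: int) -> Counter:
    """Count overlapping n-grams in a DNA string."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_ngram_counts(seq, n, epsilon, sensitivity=1.0, seed=None):
    """Add Laplace noise with scale sensitivity/epsilon to every n-gram count.

    Noise is added over the whole ACGT^n domain, including n-grams with a
    true count of zero, so that absence of an n-gram is not revealed.
    The sensitivity value is an assumption supplied by the caller.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    counts = ngram_counts(seq.upper(), n)
    domain = ("".join(p) for p in itertools.product("ACGT", repeat=n))
    return {gram: counts.get(gram, 0) + laplace_noise(scale, rng) for gram in domain}


if __name__ == "__main__":
    for gram, value in sorted(noisy_ngram_counts("ACGTACGTTAGC", 2, epsilon=1.0, seed=0).items()):
        print(gram, round(value, 2))
```

    Smaller values of epsilon give stronger privacy and noisier counts; the seed is only there to make the example reproducible.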

  14. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    PubMed

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments in the DNA and protein large scale data sets, which are more than 1 GB files, showed that HAlign-II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resource. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.

  15. Sequence-dependent DNA deformability studied using molecular dynamics simulations.

    PubMed

    Fujii, Satoshi; Kono, Hidetoshi; Takenaka, Shigeori; Go, Nobuhiro; Sarai, Akinori

    2007-01-01

    Proteins recognize specific DNA sequences not only through direct contact between amino acids and bases, but also indirectly based on the sequence-dependent conformation and deformability of the DNA (indirect readout). We used molecular dynamics simulations to analyze the sequence-dependent DNA conformations of all 136 possible tetrameric sequences sandwiched between CGCG sequences. The deformability of dimeric steps obtained by the simulations is consistent with that by the crystal structures. The simulation results further showed that the conformation and deformability of the tetramers can highly depend on the flanking base pairs. The conformations of xATx tetramers show the most rigidity and are not affected by the flanking base pairs and the xYRx show by contrast the greatest flexibility and change their conformations depending on the base pairs at both ends, suggesting tetramers with the same central dimer can show different deformabilities. These results suggest that analysis of dimeric steps alone may overlook some conformational features of DNA and provide insight into the mechanism of indirect readout during protein-DNA recognition. Moreover, the sequence dependence of DNA conformation and deformability may be used to estimate the contribution of indirect readout to the specificity of protein-DNA recognition as well as nucleosome positioning and large-scale behavior of nucleic acids.

  16. Pipeline for large-scale microdroplet bisulfite PCR-based sequencing allows the tracking of hepitype evolution in tumors.

    PubMed

    Herrmann, Alexander; Haake, Andrea; Ammerpohl, Ole; Martin-Guerrero, Idoia; Szafranski, Karol; Stemshorn, Kathryn; Nothnagel, Michael; Kotsopoulos, Steve K; Richter, Julia; Warner, Jason; Olson, Jeff; Link, Darren R; Schreiber, Stefan; Krawczak, Michael; Platzer, Matthias; Nürnberg, Peter; Siebert, Reiner; Hampe, Jochen

    2011-01-01

    Cytosine methylation provides an epigenetic level of cellular plasticity that is important for development, differentiation and cancerogenesis. We adopted microdroplet PCR to bisulfite treated target DNA in combination with second generation sequencing to simultaneously assess DNA sequence and methylation. We show measurement of methylation status in a wide range of target sequences (total 34 kb) with an average coverage of 95% (median 100%) and good correlation to the opposite strand (rho = 0.96) and to pyrosequencing (rho = 0.87). Data from lymphoma and colorectal cancer samples for SNRPN (imprinted gene), FGF6 (demethylated in the cancer samples) and HS3ST2 (methylated in the cancer samples) serve as a proof of principle showing the integration of SNP data and phased DNA-methylation information into "hepitypes" and thus the analysis of DNA methylation phylogeny in the somatic evolution of cancer.

  17. Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing.

    PubMed

    Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas

    2009-06-01

    The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.

  18. A streamlined collecting and preparation protocol for DNA barcoding of Lepidoptera as part of large-scale rapid biodiversity assessment projects, exemplified by the Indonesian Biodiversity Discovery and Information System (IndoBioSys).

    PubMed

    Schmidt, Olga; Hausmann, Axel; Cancian de Araujo, Bruno; Sutrisno, Hari; Peggie, Djunijanti; Schmidt, Stefan

    2017-01-01

    Here we present a general collecting and preparation protocol for DNA barcoding of Lepidoptera as part of large-scale rapid biodiversity assessment projects, and a comparison with alternative preserving and vouchering methods. About 98% of the sequenced specimens processed using the present collecting and preparation protocol yielded sequences with more than 500 base pairs. The study is based on the first outcomes of the Indonesian Biodiversity Discovery and Information System (IndoBioSys). IndoBioSys is a German-Indonesian research project that is conducted by the Museum für Naturkunde in Berlin and the Zoologische Staatssammlung München, in close cooperation with the Research Center for Biology - Indonesian Institute of Sciences (RCB-LIPI, Bogor).

  19. Googling DNA sequences on the World Wide Web.

    PubMed

    Hajibabaei, Mehrdad; Singer, Gregory A C

    2009-11-10

    New web-based technologies provide an excellent opportunity for sharing and accessing information and using web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bioinformatics applications. We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google. We developed an alignment independent character based algorithm based on dividing a sequence library (DNA barcodes) and query sequence to words. The actual search is conducted by conventional search tools such as freely available Google Desktop Search. We implemented our algorithm in two exemplar packages. We developed pre and post-processing software to provide customized input and output services, respectively. Our analysis of all publicly available DNA barcode sequences shows a high accuracy as well as rapid results. Our method makes use of conventional web-based technologies for specialized genetic data. It provides a robust and efficient solution for sequence search on the web. The integration of our search method for large-scale sequence libraries such as DNA barcodes provides an excellent web-based tool for accessing this information and linking it to other available categories of information on the web.
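    The idea of dividing a sequence library and a query into words so that an ordinary text index can search them might be sketched as follows. This is a minimal illustration, not the authors' implementation: the overlapping fixed-length words, the word length, the in-memory inverted index standing in for a desktop search engine, and all function names are assumptions.

```python
from collections import defaultdict


def to_words(seq: str, k: int = 8) -> list:
    """Split a sequence into overlapping words of length k."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(library: dict, k: int = 8) -> dict:
    """Map each word to the set of library entries that contain it."""
    index = defaultdict(set)
    for name, seq in library.items():
        for word in to_words(seq, k):
            index[word].add(name)
    return index


def search(query: str, index: dict, k: int = 8) -> list:
    """Rank library entries by how many distinct query words they share."""
    hits = defaultdict(int)
    for word in set(to_words(query, k)):
        for name in index.get(word, ()):
            hits[name] += 1
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy library with made-up sequences and labels.
    library = {"sp_a": "ACGTACGTTAGC", "sp_b": "TTTTGGGGCCCC"}
    index = build_index(library, k=4)
    print(search("GTACGTTA", index, k=4))
```

    No alignment is computed at any point; matching relies purely on exact shared words, which is what lets a general-purpose text search tool do the lookup.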

  20. A MBD-seq protocol for large-scale methylome-wide studies with (very) low amounts of DNA.

    PubMed

    Aberg, Karolina A; Chan, Robin F; Shabalin, Andrey A; Zhao, Min; Turecki, Gustavo; Staunstrup, Nicklas Heine; Starnawska, Anna; Mors, Ole; Xie, Lin Y; van den Oord, Edwin Jcg

    2017-09-01

    We recently showed that, after optimization, our methyl-CpG binding domain sequencing (MBD-seq) application approximates the methylome-wide coverage obtained with whole-genome bisulfite sequencing (WGB-seq), but at a cost that enables adequately powered large-scale association studies. A prior drawback of MBD-seq is the relatively large amount of genomic DNA (ideally >1 µg) required to obtain high-quality data. Biomaterials are typically expensive to collect, provide a finite amount of DNA, and may simply not yield sufficient starting material. The ability to use low amounts of DNA will increase the breadth and number of studies that can be conducted. Therefore, we further optimized the enrichment step. With this low starting material protocol, MBD-seq performed equally well, or better, than the protocol requiring ample starting material (>1 µg). Using only 15 ng of DNA as input, there is minimal loss in data quality, achieving 93% of the coverage of WGB-seq (with standard amounts of input DNA) at similar false/positive rates. Furthermore, across a large number of genomic features, the MBD-seq methylation profiles closely tracked those observed for WGB-seq with even slightly larger effect sizes. This suggests that MBD-seq provides similar information about the methylome and classifies methylation status somewhat more accurately. Performance decreases with <15 ng DNA as starting material but, even with as little as 5 ng, MBD-seq still achieves 90% of the coverage of WGB-seq with comparable genome-wide methylation profiles. Thus, the proposed protocol is an attractive option for adequately powered and cost-effective methylome-wide investigations using (very) low amounts of DNA.

  1. Genome Calligrapher: A Web Tool for Refactoring Bacterial Genome Sequences for de Novo DNA Synthesis.

    PubMed

    Christen, Matthias; Deutsch, Samuel; Christen, Beat

    2015-08-21

    Recent advances in synthetic biology have resulted in an increasing demand for the de novo synthesis of large-scale DNA constructs. Any process improvement that enables fast and cost-effective streamlining of digitized genetic information into fabricable DNA sequences holds great promise to study, mine, and engineer genomes. Here, we present Genome Calligrapher, a computer-aided design web tool intended for whole genome refactoring of bacterial chromosomes for de novo DNA synthesis. By applying a neutral recoding algorithm, Genome Calligrapher optimizes GC content and removes obstructive DNA features known to interfere with the synthesis of double-stranded DNA and the higher order assembly into large DNA constructs. Subsequent bioinformatics analysis revealed that synthesis constraints are prevalent among bacterial genomes. However, a low level of codon replacement is sufficient for refactoring bacterial genomes into easy-to-synthesize DNA sequences. To test the algorithm, 168 kb of synthetic DNA comprising approximately 20 percent of the synthetic essential genome of the cell-cycle bacterium Caulobacter crescentus was streamlined and then ordered from a commercial supplier of low-cost de novo DNA synthesis. The successful assembly into eight 20 kb segments indicates that the Genome Calligrapher algorithm can be efficiently used to refactor difficult-to-synthesize DNA. Genome Calligrapher is broadly applicable to recode biosynthetic pathways, DNA sequences, and whole bacterial genomes, thus offering new opportunities to use synthetic biology tools to explore the functionality of microbial diversity. The Genome Calligrapher web tool can be accessed at https://christenlab.ethz.ch/GenomeCalligrapher.
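    One kind of synthesis constraint mentioned here, GC content, can be screened with a simple sliding window. The sketch below is only an illustration of that check and is not taken from Genome Calligrapher: the window size and the 30% and 70% thresholds are hypothetical placeholders, and the tool's actual recoding step is not reproduced.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_windows(seq: str, window: int = 100, low: float = 0.30, high: float = 0.70) -> list:
    """Return (start, gc) for each window whose GC fraction is outside [low, high].

    The default window and thresholds are illustrative assumptions only.
    """
    flagged = []
    for start in range(len(seq) - window + 1):
        gc = gc_fraction(seq[start:start + window])
        if gc < low or gc > high:
            flagged.append((start, gc))
    return flagged


if __name__ == "__main__":
    print(gc_fraction("GGCCAATT"))
    print(flag_windows("GGGGAAAA", window=4, low=0.25, high=0.75))
```

    Windows flagged this way would be candidates for synonymous codon replacement in a recoding workflow.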

  2. Identification of tissue-specific, abiotic stress-responsive gene expression patterns in wine grape (Vitis vinifera L.) based on curation and mining of large-scale EST data sets

    PubMed Central

    2011-01-01

    Background Abiotic stresses, such as water deficit and soil salinity, result in changes in physiology, nutrient use, and vegetative growth in vines, and ultimately, yield and flavor in berries of wine grape, Vitis vinifera L. Large-scale expressed sequence tags (ESTs) were generated, curated, and analyzed to identify major genetic determinants responsible for stress-adaptive responses. Although roots serve as the first site of perception and/or injury for many types of abiotic stress, EST sequencing in root tissues of wine grape exposed to abiotic stresses has been extremely limited to date. To overcome this limitation, large-scale EST sequencing was conducted from root tissues exposed to multiple abiotic stresses. Results A total of 62,236 expressed sequence tags (ESTs) were generated from leaf, berry, and root tissues from vines subjected to abiotic stresses and compared with 32,286 ESTs sequenced from 20 public cDNA libraries. Curation to correct annotation errors, clustering and assembly of the berry and leaf ESTs with currently available V. vinifera full-length transcripts and ESTs yielded a total of 13,278 unique sequences, with 2302 singletons and 10,976 mapped to V. vinifera gene models. Of these, 739 transcripts were found to have significant differential expression in stressed leaves and berries including 250 genes not described previously as being abiotic stress responsive. In a second analysis of 16,452 ESTs from a normalized root cDNA library derived from roots exposed to multiple, short-term, abiotic stresses, 135 genes with root-enriched expression patterns were identified on the basis of their relative EST abundance in roots relative to other tissues. 
Conclusions The large-scale analysis of relative EST frequency counts among a diverse collection of 23 different cDNA libraries from leaf, berry, and root tissues of wine grape exposed to a variety of abiotic stress conditions revealed distinct, tissue-specific expression patterns, previously unrecognized stress-induced genes, and many novel genes with root-enriched mRNA expression for improving our understanding of root biology and manipulation of rootstock traits in wine grape. mRNA abundance estimates based on EST library-enriched expression patterns showed only modest correlations between microarray and quantitative, real-time reverse transcription-polymerase chain reaction (qRT-PCR) methods highlighting the need for deep-sequencing expression profiling methods. PMID:21592389

  3. Gold nanoparticles for high-throughput genotyping of long-range haplotypes

    NASA Astrophysics Data System (ADS)

    Chen, Peng; Pan, Dun; Fan, Chunhai; Chen, Jianhua; Huang, Ke; Wang, Dongfang; Zhang, Honglu; Li, You; Feng, Guoyin; Liang, Peiji; He, Lin; Shi, Yongyong

    2011-10-01

    Completion of the Human Genome Project and the HapMap Project has led to increasing demands for mapping complex traits in humans to understand the aetiology of diseases. Identifying variations in the DNA sequence, which affect how we develop disease and respond to pathogens and drugs, is important for this purpose, but it is difficult to identify these variations in large sample sets. Here we show that through a combination of capillary sequencing and polymerase chain reaction assisted by gold nanoparticles, it is possible to identify several DNA variations that are associated with age-related macular degeneration and psoriasis on significant regions of human genomic DNA. Our method is accurate and promising for large-scale and high-throughput genetic analysis of susceptibility towards disease and drug resistance.

  4. Combined Targeted DNA Sequencing in Non-Small Cell Lung Cancer (NSCLC) Using UNCseq and NGScopy, and RNA Sequencing Using UNCqeR for the Detection of Genetic Aberrations in NSCLC

    PubMed Central

    Walter, Vonn; Patel, Nirali M.; Eberhard, David A.; Hayward, Michele C.; Salazar, Ashley H.; Jo, Heejoon; Soloway, Matthew G.; Wilkerson, Matthew D.; Parker, Joel S.; Yin, Xiaoying; Zhang, Guosheng; Siegel, Marni B.; Rosson, Gary B.; Earp, H. Shelton; Sharpless, Norman E.; Gulley, Margaret L.; Weck, Karen E.

    2015-01-01

    The recent FDA approval of the MiSeqDx platform provides a unique opportunity to develop targeted next generation sequencing (NGS) panels for human disease, including cancer. We have developed a scalable, targeted panel-based assay termed UNCseq, which involves a NGS panel of over 200 cancer-associated genes and a standardized downstream bioinformatics pipeline for detection of single nucleotide variations (SNV) as well as small insertions and deletions (indel). In addition, we developed a novel algorithm, NGScopy, designed for samples with sparse sequencing coverage to detect large-scale copy number variations (CNV), similar to human SNP Array 6.0 as well as small-scale intragenic CNV. Overall, we applied this assay to 100 snap-frozen lung cancer specimens lacking same-patient germline DNA (07–0120 tissue cohort) and validated our results against Sanger sequencing, SNP Array, and our recently published integrated DNA-seq/RNA-seq assay, UNCqeR, where RNA-seq of same-patient tumor specimens confirmed SNV detected by DNA-seq, if RNA-seq coverage depth was adequate. In addition, we applied the UNCseq assay on an independent lung cancer tumor tissue collection with available same-patient germline DNA (11–1115 tissue cohort) and confirmed mutations using assays performed in a CLIA-certified laboratory. We conclude that UNCseq can identify SNV, indel, and CNV in tumor specimens lacking germline DNA in a cost-efficient fashion. PMID:26076459

  5. End-to-end distance and contour length distribution functions of DNA helices

    NASA Astrophysics Data System (ADS)

    Zoli, Marco

    2018-06-01

    I present a computational method to evaluate the end-to-end and the contour length distribution functions of short DNA molecules described by a mesoscopic Hamiltonian. The method generates a large statistical ensemble of possible configurations for each dimer in the sequence, selects the global equilibrium twist conformation for the molecule, and determines the average base pair distances along the molecule backbone. Integrating over the base pair radial and angular fluctuations, I derive the room temperature distribution functions as a function of the sequence length. The obtained values for the most probable end-to-end distance and contour length distance, providing a measure of the global molecule size, are used to examine the DNA flexibility at short length scales. It is found that, also in molecules with less than ~60 base pairs, coiled configurations maintain a large statistical weight and, consistently, the persistence lengths may be much smaller than in kilo-base DNA.

  6. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data.

    PubMed

    Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min

    2015-06-01

    The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires less computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. © 2015 Jun et al.; Published by Cold Spring Harbor Laboratory Press.

  7. BAC sequencing using pooled methods.

    PubMed

    Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina

    2015-01-01

    Shotgun sequencing and assembly of a large, complex genome can be both expensive and challenging to accurately reconstruct the true genome sequence. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are main factors that plague de novo genome sequencing projects that typically result in highly fragmented assemblies and are difficult to extract biological meaning. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences the genome fraction that encodes the relevant biological information, then it is possible to reduce overall sequencing costs and efforts that target a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly.

  8. [Genome-scale sequence data processing and epigenetic analysis of DNA methylation].

    PubMed

    Wang, Ting-Zhang; Shan, Gao; Xu, Jian-Hong; Xue, Qing-Zhong

    2013-06-01

    A new approach recently developed for detecting cytosine DNA methylation (mC) and analyzing the genome-scale DNA methylation profiling, is called BS-Seq which is based on bisulfite conversion of genomic DNA combined with next-generation sequencing. The method can not only provide an insight into the difference of genome-scale DNA methylation among different organisms, but also reveal the conservation of DNA methylation in all contexts and nucleotide preference for different genomic regions, including genes, exons, and repetitive DNA sequences. It will be helpful to understand the epigenetic impacts of cytosine DNA methylation on the regulation of gene expression and maintaining silence of repetitive sequences, such as transposable elements. In this paper, we introduce the preprocessing steps of DNA methylation data, by which cytosine (C) and guanine (G) in the reference sequence are transferred to thymine (T) and adenine (A), and cytosine in reads is transferred to thymine, respectively. We also comprehensively review the main content of the DNA methylation analysis on the genomic scale: (1) the cytosine methylation under the context of different sequences; (2) the distribution of genomic methylcytosine; (3) DNA methylation context and the preference for the nucleotides; (4) DNA-protein interaction sites of DNA methylation; (5) degree of methylation of cytosine in the different structural elements of genes. DNA methylation analysis technique provides a powerful tool for the epigenome study in human and other species, and genes and environment interaction, and founds the theoretical basis for further development of disease diagnostics and therapeutics in human.
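    The preprocessing step described in this abstract (C to T and G to A conversion of the reference, C to T conversion of reads) can be written in a few lines. The sketch below is a minimal illustration under the assumption that sequences are plain uppercase or lowercase ACGT strings; the function names and the "C2T"/"G2A" labels are hypothetical, and the alignment and methylation-calling steps that would follow are not shown.

```python
def c_to_t(seq: str) -> str:
    """Replace every cytosine with thymine."""
    return seq.upper().replace("C", "T")


def g_to_a(seq: str) -> str:
    """Replace every guanine with adenine."""
    return seq.upper().replace("G", "A")


def convert_reference(reference: str) -> dict:
    """Build the two converted references described in the abstract.

    "C2T" has all C replaced by T; "G2A" has all G replaced by A.
    The key names are illustrative labels only.
    """
    return {"C2T": c_to_t(reference), "G2A": g_to_a(reference)}


def convert_read(read: str) -> str:
    """Fully convert a read (C to T) before mapping it to the converted reference."""
    return c_to_t(read)


if __name__ == "__main__":
    print(convert_reference("ACGTCG"))
    print(convert_read("ACGT"))
```

    After converted reads are mapped to the converted reference, the original read and reference bases are compared at each cytosine position to decide whether that cytosine was methylated.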

  9. A streamlined collecting and preparation protocol for DNA barcoding of Lepidoptera as part of large-scale rapid biodiversity assessment projects, exemplified by the Indonesian Biodiversity Discovery and Information System (IndoBioSys)

    PubMed Central

    Hausmann, Axel; Cancian de Araujo, Bruno; Sutrisno, Hari; Peggie, Djunijanti; Schmidt, Stefan

    2017-01-01

    Here we present a general collecting and preparation protocol for DNA barcoding of Lepidoptera as part of large-scale rapid biodiversity assessment projects, and a comparison with alternative preserving and vouchering methods. About 98% of the sequenced specimens processed using the present collecting and preparation protocol yielded sequences with more than 500 base pairs. The study is based on the first outcomes of the Indonesian Biodiversity Discovery and Information System (IndoBioSys). IndoBioSys is a German-Indonesian research project that is conducted by the Museum für Naturkunde in Berlin and the Zoologische Staatssammlung München, in close cooperation with the Research Center for Biology – Indonesian Institute of Sciences (RCB-LIPI, Bogor). PMID:29134041

  10. Working the kinks out of nucleosomal DNA

    PubMed Central

    Olson, Wilma K.; Zhurkin, Victor B.

    2011-01-01

    Condensation of DNA in the nucleosome takes advantage of its double-helical architecture. The DNA deforms at sites where the base pairs face the histone octamer. The largest so-called kink-and-slide deformations occur in the vicinity of arginines that penetrate the minor groove. Nucleosome structures formed from the 601 positioning sequence differ subtly from those incorporating an AT-rich human α-satellite DNA. Restraints imposed by the histone arginines on the displacement of base pairs can modulate the sequence-dependent deformability of DNA and potentially contribute to the unique features of the different nucleosomes. Steric barriers mimicking constraints found in the nucleosome induce the simulated large-scale rearrangement of canonical B-DNA to kink-and-slide states. The pathway to these states shows non-harmonic behavior consistent with bending profiles inferred from AFM measurements. PMID:21482100

  11. Satellite DNA-based artificial chromosomes for use in gene therapy.

    PubMed

    Hadlaczky, G

    2001-04-01

    Satellite DNA-based artificial chromosomes (SATACs) can be made by induced de novo chromosome formation in cells of different mammalian species. These artificially generated accessory chromosomes are composed of predictable DNA sequences and they contain defined genetic information. Prototype human SATACs have been successfully constructed in different cell types from 'neutral' endogenous DNA sequences from the short arm of the human chromosome 15. SATACs have already passed a number of hurdles crucial to their further development as gene therapy vectors, including: large-scale purification; transfer of purified artificial chromosomes into different cells and embryos; generation of transgenic animals and germline transmission with purified SATACs; and the tissue-specific expression of a therapeutic gene from an artificial chromosome in the milk of transgenic animals.

  12. Blueprints for green biotech: development and application of standards for plant synthetic biology.

    PubMed

    Patron, Nicola J

    2016-06-15

    Synthetic biology aims to apply engineering principles to the design and modification of biological systems and to the construction of biological parts and devices. The ability to programme cells by providing new instructions written in DNA is a foundational technology of the field. Large-scale de novo DNA synthesis has accelerated synthetic biology by offering custom-made molecules at ever decreasing costs. However, for large fragments and for experiments in which libraries of DNA sequences are assembled in different combinations, assembly in the laboratory is still desirable. Biological assembly standards allow DNA parts, even those from multiple laboratories and experiments, to be assembled together using the same reagents and protocols. The adoption of such standards for plant synthetic biology has been cohesive for the plant science community, facilitating the application of genome editing technologies to plant systems and streamlining progress in large-scale, multi-laboratory bioengineering projects. © 2016 The Author(s). published by Portland Press Limited on behalf of the Biochemical Society.

  13. Rapid DNA extraction protocol for detection of alpha-1 antitrypsin deficiency from dried blood spots by real-time PCR.

    PubMed

    Struniawski, R; Szpechcinski, A; Poplawska, B; Skronski, M; Chorostowska-Wynimko, J

    2013-01-01

    Dried blood spot (DBS) specimens have been successfully employed for the large-scale diagnostics of α1-antitrypsin (AAT) deficiency as an alternative to plasma/serum that is easy to collect and transport. In the present study we propose a fast, efficient, and cost-effective protocol for DNA extraction from DBS samples that provides sufficient quantity and quality of DNA and effectively eliminates any natural PCR inhibitors, allowing for successful AAT genotyping by real-time PCR and direct sequencing. DNA extracted from 84 DBS samples from chronic obstructive pulmonary disease patients was genotyped for AAT deficiency variants by real-time PCR. The results of DBS AAT genotyping were validated by serum IEF phenotyping and AAT concentration measurement. The proposed protocol allowed successful DNA extraction from all analyzed DBS samples. Both quantity and quality of DNA were sufficient for further real-time PCR and, if necessary, for genetic sequence analysis. A 100% concordance between AAT DBS genotypes and serum phenotypes in positive detection of the two major deficiency alleles, S and Z, was achieved. Both assays, DBS AAT genotyping by real-time PCR and serum AAT phenotyping by IEF, positively identified the PI*S and PI*Z alleles in 8 out of 84 (9.5%) and 16 out of 84 (19.0%) patients, respectively. In conclusion, the proposed protocol noticeably reduces the costs and the hands-on time of DBS sample preparation, providing genomic DNA of sufficient quantity and quality for further real-time PCR or genetic sequence analysis. Consequently, it is ideally suited for large-scale AAT deficiency screening programs and should be the method of choice.

  14. Investigation of rare and low-frequency variants using high-throughput sequencing with pooled DNA samples

    PubMed Central

    Wang, Jingwen; Skoog, Tiina; Einarsdottir, Elisabet; Kaartokallio, Tea; Laivuori, Hannele; Grauers, Anna; Gerdhem, Paul; Hytönen, Marjo; Lohi, Hannes; Kere, Juha; Jiao, Hong

    2016-01-01

    High-throughput sequencing using pooled DNA samples can facilitate genome-wide studies on rare and low-frequency variants in a large population. Some major questions concerning the pooling sequencing strategy are whether rare and low-frequency variants can be detected reliably, and whether estimated minor allele frequencies (MAFs) can represent the actual values obtained from individually genotyped samples. In this study, we evaluated MAF estimates using three variant detection tools with two sets of pooled whole exome sequencing (WES) and one set of pooled whole genome sequencing (WGS) data. Both GATK and Freebayes displayed high sensitivity, specificity and accuracy when detecting rare or low-frequency variants. For the WGS study, 56% of the low-frequency variants in Illumina array have identical MAFs and 26% have one allele difference between sequencing and individual genotyping data. The MAF estimates from WGS correlated well (r = 0.94) with those from Illumina arrays. The MAFs from the pooled WES data also showed high concordance (r = 0.88) with those from the individual genotyping data. In conclusion, the MAFs estimated from pooled DNA sequencing data reflect the MAFs in individually genotyped samples well. The pooling strategy can thus be a rapid and cost-effective approach for the initial screening in large-scale association studies. PMID:27633116
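
    The comparison described in this abstract (pooled-sequencing MAF estimates against individually genotyped MAFs) can be illustrated with a minimal sketch. It assumes the pooled MAF at a site is simply the alternate-allele read fraction; the function names, read counts and array values below are hypothetical and are not taken from the study, which used GATK, Freebayes and a third tool rather than this calculation.

```python
import math

def pooled_maf(alt_reads, total_reads):
    """Estimate the minor allele frequency at one site from pooled read counts."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    freq = alt_reads / total_reads
    return min(freq, 1.0 - freq)

def allele_count_difference(maf_pool, maf_individual, n_individuals):
    """Express the gap between two MAFs in whole alleles, out of 2N chromosomes."""
    chromosomes = 2 * n_individuals
    return abs(round(maf_pool * chromosomes) - round(maf_individual * chromosomes))

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (alt_reads, total_reads) per site and hypothetical array MAFs.
pooled_counts = [(3, 100), (10, 200), (1, 50), (25, 250)]
array_mafs = [0.03, 0.04, 0.02, 0.11]

pool_mafs = [pooled_maf(a, t) for a, t in pooled_counts]
r = pearson_r(pool_mafs, array_mafs)
```

    In a pool of N diploid individuals, "one allele difference" corresponds to a MAF gap of 1/(2N), which is what `allele_count_difference` reports.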

  15. A simple method for semi-random DNA amplicon fragmentation using the methylation-dependent restriction enzyme MspJI.

    PubMed

    Shinozuka, Hiroshi; Cogan, Noel O I; Shinozuka, Maiko; Marshall, Alexis; Kay, Pippa; Lin, Yi-Han; Spangenberg, German C; Forster, John W

    2015-04-11

    Fragmentation at random nucleotide locations is an essential process for preparation of DNA libraries to be used on massively parallel short-read DNA sequencing platforms. Although instruments for physical shearing, such as the Covaris S2 focused-ultrasonicator system, and products for enzymatic shearing, such as the Nextera technology and NEBNext dsDNA Fragmentase kit, are commercially available, a simple and inexpensive method is desirable for high-throughput sequencing library preparation. MspJI is a recently characterised restriction enzyme which recognises the sequence motif CNNR (where R = G or A) when the first base is modified to 5-methylcytosine or 5-hydroxymethylcytosine. A semi-random enzymatic DNA amplicon fragmentation method was developed based on the unique cleavage properties of MspJI. In this method, random incorporation of 5-methyl-2'-deoxycytidine-5'-triphosphate is achieved through DNA amplification with DNA polymerase, followed by DNA digestion with MspJI. Due to the recognition sequence of the enzyme, DNA amplicons are fragmented in a relatively sequence-independent manner. The size range of the resulting fragments could be controlled by optimising the 5-methyl-2'-deoxycytidine-5'-triphosphate concentration in the reaction mixture. A library suitable for sequencing using the Illumina MiSeq platform was prepared and processed using the proposed method. Alignment of generated short reads to a reference sequence demonstrated a relatively high level of random fragmentation. The proposed method may be performed with standard laboratory equipment. Although the uniformity of coverage was slightly inferior to that of the Covaris physical shearing procedure, the method's efficiencies of cost and labour may make it more suitable than existing approaches for implementation in large-scale sequencing activities, such as bacterial artificial chromosome (BAC)-based genome sequence assembly, pan-genomic studies and locus-targeted genotyping-by-sequencing.
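
    As a rough illustration of why the CNNR motif gives near sequence-independent fragmentation, the sketch below lists every candidate CNNR site in an amplicon on both strands. It only finds motif positions; whether a site is actually cut depends on the first cytosine being methylated, which this code does not model. The function names are hypothetical and this is not the authors' analysis pipeline.

```python
import re

# Lookahead so that overlapping CNNR matches are all reported (R = A or G).
CNNR = re.compile(r"(?=(C[ACGT]{2}[AG]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def cnnr_sites(seq):
    """0-based start positions of CNNR motifs on the given strand."""
    return [m.start() for m in CNNR.finditer(seq.upper())]

def cnnr_sites_both_strands(seq):
    """Forward-strand sites, plus reverse-strand sites mapped to the
    forward coordinate of the base paired with the motif's cytosine."""
    forward = cnnr_sites(seq)
    n = len(seq)
    reverse = sorted(n - 1 - p for p in cnnr_sites(reverse_complement(seq)))
    return forward, reverse

forward_sites, reverse_sites = cnnr_sites_both_strands("ACGTGCAAA")
```

    Because the motif constrains only two of four positions, candidate sites are dense in most sequences, so the fragment size is governed mainly by how many cytosines carry the methyl group.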

  16. A feasibility study of colorectal cancer diagnosis via circulating tumor DNA derived CNV detection.

    PubMed

    Molparia, Bhuvan; Oliveira, Glenn; Wagner, Jennifer L; Spencer, Emily G; Torkamani, Ali

    2018-01-01

    Circulating tumor DNA (ctDNA) has shown great promise as a biomarker for early detection of cancer. However, due to the low abundance of ctDNA, especially at early stages, it is hard to detect with high accuracy while keeping sequencing costs low. Here we present a pilot stage study to detect large-scale somatic copy number variations (CNVs), which contribute more molecules to ctDNA signal compared to point mutations, via cell free DNA sequencing. We show that it is possible to detect somatic CNVs in early stage colorectal cancer (CRC) patients and subsequently discriminate them from normal patients. With 25 normal and 24 CRC samples, we achieve 100% specificity (lower bound confidence interval: 86%) and ~79% sensitivity (95% confidence interval: 63%-95%), though the performance should be considered with caution given the limited sample size. We report a lack of concordance between the CNVs detected via cfDNA sequencing and CNVs identified in parent tissue samples. However, recent findings suggest that a lack of concordance is expected for CNVs in CRC because of their sub-clonal nature. Finally, the CNVs we detect very likely contribute to cancer progression as they lie in functionally important regions, and have been shown to be associated with CRC specifically. This study paves the path for a larger scale exploration of the potential of CNV detection for both diagnoses and prognoses of cancer.
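
    The reported performance figures can be reproduced with a short calculation. The sketch below rests on two assumptions not stated in the abstract: that ~79% sensitivity corresponds to 19 of 24 CRC samples detected, and that the intervals are a normal-approximation (Wald) interval for sensitivity and an exact binomial lower bound for specificity. These choices happen to match the quoted numbers, but the abstract does not say which methods the authors used.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

def exact_lower_bound_all_successes(n, alpha=0.05):
    """Two-sided exact (Clopper-Pearson) lower bound when all n trials succeed."""
    return (alpha / 2) ** (1 / n)

normals, true_negatives = 25, 25   # 100% specificity
cases, true_positives = 24, 19     # assumed count: 19/24 is about 79%

specificity = true_negatives / normals
specificity_lower = exact_lower_bound_all_successes(normals)
sensitivity = true_positives / cases
sens_low, sens_high = wald_interval(true_positives, cases)
```

    With only 24 cases the sensitivity interval spans roughly 30 percentage points, which is the quantitative basis for the abstract's caution about sample size.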

  17. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    PubMed

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. Among the more important structures in a DNA sequence are repeat-related features. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale on a PC.
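
    The abstract does not define its recurrence time statistic, so the following is only one simple reading of the idea, offered as an assumption: for a fixed word length k, record the distance between each k-mer and its previous occurrence, and tabulate how often each distance arises. Peaks in that table point to periodicities and repeats. The function name and word length are hypothetical.

```python
from collections import Counter

def recurrence_times(seq, k):
    """Tabulate gaps between successive occurrences of identical k-mers.

    Returns a Counter mapping gap length -> number of times that gap occurs.
    """
    last_seen = {}
    gaps = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            gaps[i - last_seen[word]] += 1
        last_seen[word] = i
    return gaps

# A perfect period-3 repeat yields only gaps of length 3.
gaps = recurrence_times("ACGACGACG", 3)
```

    A single pass with a dictionary keeps the cost linear in sequence length, consistent with the abstract's remark that whole-genome analysis is feasible on a PC.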

  18. How life changes itself: the Read-Write (RW) genome.

    PubMed

    Shapiro, James A

    2013-09-01

    The genome has traditionally been treated as a Read-Only Memory (ROM) subject to change by copying errors and accidents. In this review, I propose that we need to change that perspective and understand the genome as an intricately formatted Read-Write (RW) data storage system constantly subject to cellular modifications and inscriptions. Cells operate under changing conditions and are continually modifying themselves by genome inscriptions. These inscriptions occur over three distinct time-scales (cell reproduction, multicellular development and evolutionary change) and involve a variety of different processes at each time scale (forming nucleoprotein complexes, epigenetic formatting and changes in DNA sequence structure). Research dating back to the 1930s has shown that genetic change is the result of cell-mediated processes, not simply accidents or damage to the DNA. This cell-active view of genome change applies to all scales of DNA sequence variation, from point mutations to large-scale genome rearrangements and whole genome duplications (WGDs). This conceptual change to active cell inscriptions controlling RW genome functions has profound implications for all areas of the life sciences. © 2013 Elsevier B.V. All rights reserved.

  19. De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum)

    PubMed Central

    2011-01-01

    Background: Transcriptome sequencing data has become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales - a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations Fagopyrum species have not been the subject of large-scale sequencing projects. Results: Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 thousand (F. esculentum) and 229 thousand (F. tataricum) reads with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousand contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationships of the Caryophyllales and asterids have now gained high support from nuclear gene sequences. Conclusions: 454 transcriptome sequencing and de novo assembly were performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated. PMID:21232141

  20. Megabase sequencing of human genome by ordered-shotgun-sequencing (OSS) strategy

    NASA Astrophysics Data System (ADS)

    Chen, Ellson Y.

    1997-05-01

    So far we have used the OSS strategy to sequence over 2 megabases of DNA in large-insert clones from regions of human X chromosomes with different characteristic levels of GC content. The method starts by randomly fragmenting a BAC, YAC or PAC to 8-12 kb pieces and subcloning those into lambda phage. Insert-ends of these clones are sequenced and overlapped to create a partial map. Complete sequencing is then done on a minimal tiling path of selected subclones, recursively focusing on those at the edges of contigs to facilitate mergers of clones across the entire target. To reduce manual labor, PCR processes have been adapted to prepare sequencing templates throughout the entire operation. The streamlined process can thus lend itself to further automation. The OSS approach is suitable for large-scale genomic sequencing, providing considerable flexibility in the choice of subclones or regions for more or less intensive sequencing. For example, subclones containing contaminating host cell DNA or cloning vector can be recognized and ignored with minimal sequencing effort; regions overlapping a neighboring clone already sequenced need not be redone; and segments containing tandem repeats or long repetitive sequences can be spotted early on and targeted for additional attention.
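
    The "minimal tiling path" step can be sketched as a classic interval-cover problem. The code below assumes each subclone has already been placed on the target as a half-open (start, end) interval, for example from the end-sequence map, and uses a greedy choice (always extend coverage as far as possible). The abstract does not say how the authors selected their path, so this is an illustration of the concept, with hypothetical coordinates.

```python
def minimal_tiling_path(clones, target_start, target_end):
    """Greedy minimum set of (start, end) clones covering [target_start, target_end).

    Raises ValueError if the clones leave a gap in the target.
    """
    clones = sorted(clones)
    path = []
    covered_to = target_start
    i = 0
    while covered_to < target_end:
        best = None
        # Among clones starting at or before the covered edge, keep the one
        # that reaches furthest to the right.
        while i < len(clones) and clones[i][0] <= covered_to:
            if best is None or clones[i][1] > best[1]:
                best = clones[i]
            i += 1
        if best is None or best[1] <= covered_to:
            raise ValueError(f"gap in coverage at position {covered_to}")
        path.append(best)
        covered_to = best[1]
    return path

# Hypothetical subclone placements (kb) on a 30 kb target.
subclones = [(0, 10), (5, 12), (8, 20), (11, 18), (15, 30), (19, 30)]
path = minimal_tiling_path(subclones, 0, 30)
```

    A gap error from this routine corresponds to the situation the abstract addresses by focusing on subclones at contig edges until neighbouring contigs merge.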

  1. Integrated sequencing of exome and mRNA of large-sized single cells.

    PubMed

    Wang, Lily Yan; Guo, Jiajie; Cao, Wei; Zhang, Meng; He, Jiankui; Li, Zhoufang

    2018-01-10

    Current approaches to single-cell DNA-RNA integrated sequencing make it difficult to call SNPs, because a large amount of DNA and RNA is lost during DNA-RNA separation. Here, we performed simultaneous single-cell exome and transcriptome sequencing on individual mouse oocytes. Using microinjection, we kept the nuclei intact to avoid DNA loss, while retaining the cytoplasm inside the cell membrane, to maximize the amount of DNA and RNA captured from the single cell. We then conducted exome-sequencing on the isolated nuclei and mRNA-sequencing on the enucleated cytoplasm. For single oocytes, exome-seq can cover up to 92% of exome region with an average sequencing depth of 10+, while mRNA-sequencing reveals more than 10,000 expressed genes in enucleated cytoplasm, with similar performance for intact oocytes. This approach provides unprecedented opportunities to study DNA-RNA regulation, such as RNA editing at single nucleotide level in oocytes. In future, this method can also be applied to other large cells, including neurons, large dendritic cells and large tumour cells for integrated exome and transcriptome sequencing.

  2. Rapid and efficient cDNA library screening by self-ligation of inverse PCR products (SLIP).

    PubMed

    Hoskins, Roger A; Stapleton, Mark; George, Reed A; Yu, Charles; Wan, Kenneth H; Carlson, Joseph W; Celniker, Susan E

    2005-12-02

    cDNA cloning is a central technology in molecular biology. cDNA sequences are used to determine mRNA transcript structures, including splice junctions, open reading frames (ORFs) and 5'- and 3'-untranslated regions (UTRs). cDNA clones are valuable reagents for functional studies of genes and proteins. Expressed Sequence Tag (EST) sequencing is the method of choice for recovering cDNAs representing many of the transcripts encoded in a eukaryotic genome. However, EST sequencing samples a cDNA library at random, and it recovers transcripts with low expression levels inefficiently. We describe a PCR-based method for directed screening of plasmid cDNA libraries. We demonstrate its utility in a screen of libraries used in our Drosophila EST projects for 153 transcription factor genes that were not represented by full-length cDNA clones in our Drosophila Gene Collection. We recovered high-quality, full-length cDNAs for 72 genes and variously compromised clones for an additional 32 genes. The method can be used at any scale, from the isolation of cDNA clones for a particular gene of interest, to the improvement of large gene collections in model organisms and the human. Finally, we discuss the relative merits of directed cDNA library screening and RT-PCR approaches.

  3. Simultaneous non-contiguous deletions using large synthetic DNA and site-specific recombinases

    PubMed Central

    Krishnakumar, Radha; Grose, Carissa; Haft, Daniel H.; Zaveri, Jayshree; Alperovich, Nina; Gibson, Daniel G.; Merryman, Chuck; Glass, John I.

    2014-01-01

    Toward achieving rapid and large scale genome modification directly in a target organism, we have developed a new genome engineering strategy that uses a combination of bioinformatics aided design, large synthetic DNA and site-specific recombinases. Using Cre recombinase we swapped a target 126-kb segment of the Escherichia coli genome with a 72-kb synthetic DNA cassette, thereby effectively eliminating over 54 kb of genomic DNA from three non-contiguous regions in a single recombination event. We observed complete replacement of the native sequence with the modified synthetic sequence through the action of the Cre recombinase and no competition from homologous recombination. Because of the versatility and high-efficiency of the Cre-lox system, this method can be used in any organism where this system is functional as well as adapted to use with other highly precise genome engineering systems. Compared to present-day iterative approaches in genome engineering, we anticipate this method will greatly speed up the creation of reduced, modularized and optimized genomes through the integration of deletion analyses data, transcriptomics, synthetic biology and site-specific recombination. PMID:24914053

  4. Applications of species accumulation curves in large-scale biological data analysis.

    PubMed

    Deng, Chao; Daley, Timothy; Smith, Andrew D

    2015-09-01

    The species accumulation curve, or collector's curve, of a population gives the expected number of observed species or distinct classes as a function of sampling effort. Species accumulation curves allow researchers to assess and compare diversity across populations or to evaluate the benefits of additional sampling. Traditional applications have focused on ecological populations but emerging large-scale applications, for example in DNA sequencing, are orders of magnitude larger and present new challenges. We developed a method to estimate accumulation curves for predicting the complexity of DNA sequencing libraries. This method uses rational function approximations to a classical non-parametric empirical Bayes estimator due to Good and Toulmin [Biometrika, 1956, 43, 45-63]. Here we demonstrate how the same approach can be highly effective in other large-scale applications involving biological data sets. These include estimating microbial species richness, immune repertoire size, and k-mer diversity for genome assembly applications. We show how the method can be modified to address populations containing an effectively infinite number of species where saturation cannot practically be attained. We also introduce a flexible suite of tools implemented as an R package that make these methods broadly accessible.
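
    For orientation, the classical Good-Toulmin estimator that the abstract builds on can be written in a few lines. If n_j is the number of classes observed exactly j times, the expected number of new classes after increasing the sampling effort by a fraction t is the alternating sum of (-1)^(j+1) t^j n_j over j. The sketch below implements only this classical series, which is reliable for 0 <= t <= 1; the rational function approximations the authors use to extrapolate further are not reproduced here, and the function names are hypothetical (this is not the API of their R package).

```python
from collections import Counter

def counts_histogram(observations):
    """n_j: number of distinct classes seen exactly j times."""
    per_class = Counter(observations)
    return Counter(per_class.values())

def good_toulmin(n_j, t):
    """Expected number of new classes if sampling effort grows by a fraction t."""
    if not 0 <= t <= 1:
        raise ValueError("the classical series is only reliable for 0 <= t <= 1")
    return sum((-1) ** (j + 1) * t ** j * n for j, n in n_j.items())

# Toy sample: A seen 3 times, B twice, C, D and E once each.
n_j = counts_histogram("AAABBCDE")
new_if_doubled = good_toulmin(n_j, 1)
new_if_half_more = good_toulmin(n_j, 0.5)
```

    For t > 1 the terms t^j grow without bound and the alternating sum becomes unstable, which is the difficulty that motivates the approximations described in the abstract.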

  5. Applications of species accumulation curves in large-scale biological data analysis

    PubMed Central

    Deng, Chao; Daley, Timothy; Smith, Andrew D

    2016-01-01

    The species accumulation curve, or collector’s curve, of a population gives the expected number of observed species or distinct classes as a function of sampling effort. Species accumulation curves allow researchers to assess and compare diversity across populations or to evaluate the benefits of additional sampling. Traditional applications have focused on ecological populations but emerging large-scale applications, for example in DNA sequencing, are orders of magnitude larger and present new challenges. We developed a method to estimate accumulation curves for predicting the complexity of DNA sequencing libraries. This method uses rational function approximations to a classical non-parametric empirical Bayes estimator due to Good and Toulmin [Biometrika, 1956, 43, 45–63]. Here we demonstrate how the same approach can be highly effective in other large-scale applications involving biological data sets. These include estimating microbial species richness, immune repertoire size, and k-mer diversity for genome assembly applications. We show how the method can be modified to address populations containing an effectively infinite number of species where saturation cannot practically be attained. We also introduce a flexible suite of tools implemented as an R package that make these methods broadly accessible. PMID:27252899

  6. Static and Dynamic Properties of DNA Confined in Nanochannels

    NASA Astrophysics Data System (ADS)

    Gupta, Damini

    Next-generation sequencing (NGS) techniques have considerably reduced the cost of high-throughput DNA sequencing. However, it is challenging to detect large-scale genomic variations by NGS due to short read lengths. Genome mapping can easily detect large-scale structural variations because it operates on extremely large intact molecules of DNA with adequate resolution. One of the promising methods of genome mapping is based on confining large DNA molecules inside a nanochannel whose cross-sectional dimensions are approximately 50 nm. Even though this genome mapping technology has been commercialized, the current understanding of the polymer physics of DNA in nanochannel confinement is based on theories and lacks much-needed experimental support. The results of this dissertation are aimed at providing a detailed experimental understanding of equilibrium properties of nanochannel-confined DNA molecules. The results are divided into three parts. In the first part, we evaluate the role of channel shape on thermodynamic properties of channel-confined DNA molecules using a combination of fluorescence microscopy and simulations. Specifically, we show that a high aspect ratio of rectangular channels significantly alters the chain statistics as compared to an equivalent square channel with the same cross-sectional area. In the second part, we present experimental evidence that weak excluded volume effects arise in DNA nanochannel confinement, which form the physical basis for the extended de Gennes regime. We also show how confinement spectroscopy and simulations can be combined to reduce molecular weight dispersity effects arising from shearing, photo-cleavage, and nonuniform staining of DNA. Finally, the third part of the thesis concerns the dynamic properties of nanochannel-confined DNA. We directly measure the center-of-mass diffusivity of single DNA molecules in confinement and show that it is necessary to modify the classical results of de Gennes to account for local chain stiffness of DNA in order to explain the experimental results. In the end, we believe that our findings from the experimental test of the phase diagram for channel-confined DNA, with careful control over molecular weight dispersity, channel geometry, and electrostatic interactions, will provide a firm foundation for the emerging genome mapping technology.

  7. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing

    PubMed Central

    Hykin, Sarah M.; Bi, Ke; McGuire, Jimmy A.

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens, particularly for use in phylogenetic analyses, have been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for genetic analysis. PMID:26505622

  8. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    PubMed

    Hykin, Sarah M; Bi, Ke; McGuire, Jimmy A

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens, particularly for use in phylogenetic analyses, have been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for genetic analysis.

  9. Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

    PubMed Central

    Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

    2003-01-01

    To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979

  10. Perturbations in DNA structure upon interaction with porphyrins revealed by chemical probes, DNA footprinting and molecular modelling.

    PubMed

    Ford, K G; Neidle, S

    1995-06-01

    The interactions of several porphyrins with a 74 base-pair DNA sequence have been examined by footprinting and chemical protection methods. Tetra-(4-N-methyl-(pyridyl)) porphyrin (TMPy), two of its metal complexes and tetra-(4-trimethylanilinium) porphyrin (TMAP) bind to closely similar AT-rich sequences. The three TMPy ligands produce modest changes in DNA structure and base accessibility on binding, in contrast to the large-scale conformational changes observed with TMAP. Molecular modelling studies have been performed on TMPy and TMAP bound in the AT-rich minor groove of an oligonucleotide. These have shown that significant structural change is needed to accommodate the bulky trimethyl substituent groups of TMAP, in contrast to the facile minor groove fit of TMPy.

  11. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

    2005-01-01

    GenBank is a comprehensive database that contains publicly available DNA sequences for more than 165,000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in the UK and the DNA Data Bank of Japan helps to ensure worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at http://www.ncbi.nlm.nih.gov.

  12. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

    2006-01-01

    GenBank (R) is a comprehensive database that contains publicly available DNA sequences for more than 205 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the Web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at www.ncbi.nlm.nih.gov.

  13. Captured metagenomics: large-scale targeting of genes based on ‘sequence capture’ reveals functional diversity in soils

    PubMed Central

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

    2015-01-01

    Microbial enzyme diversity is key to understanding many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to the large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity and their taxonomic distribution correlated well with the traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach at large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. PMID:26490729

  14. Genome-wide map of Apn1 binding sites under oxidative stress in Saccharomyces cerevisiae.

    PubMed

    Morris, Lydia P; Conley, Andrew B; Degtyareva, Natalya; Jordan, I King; Doetsch, Paul W

    2017-11-01

    The DNA in cells is continuously exposed to reactive oxygen species resulting in toxic and mutagenic DNA damage. Although the repair of oxidative DNA damage occurs primarily through the base excision repair (BER) pathway, the nucleotide excision repair (NER) pathway processes some of the same lesions. In addition, damage tolerance mechanisms, such as recombination and translesion synthesis, enable cells to tolerate oxidative DNA damage, especially when BER and NER capacities are exceeded. Thus, disruption of BER alone or disruption of BER and NER in Saccharomyces cerevisiae leads to increased mutations as well as large-scale genomic rearrangements. Previous studies demonstrated that a particular region of chromosome II is susceptible to chronic oxidative stress-induced chromosomal rearrangements, suggesting the existence of DNA damage and/or DNA repair hotspots. Here we investigated the relationship between oxidative damage and genomic instability utilizing chromatin immunoprecipitation combined with DNA microarray technology to profile DNA repair sites along yeast chromosomes under different oxidative stress conditions. We targeted the major yeast AP endonuclease Apn1 as a representative BER protein. Our results indicate that Apn1 target sequences are enriched for cytosine and guanine nucleotides. We predict that BER protects these sites in the genome because guanines and cytosines are thought to be especially susceptible to oxidative attack, thereby preventing large-scale genome destabilization from chronic accumulation of DNA damage. Information from our studies should provide insight into how regional deployment of oxidative DNA damage management systems along chromosomes protects against large-scale rearrangements. Copyright © 2017 John Wiley & Sons, Ltd.

  15. Fluctuations in the DNA double helix

    NASA Astrophysics Data System (ADS)

    Peyrard, M.; López, S. C.; Angelov, D.

    2007-08-01

    DNA is not the static entity suggested by the famous double helix structure. It shows large fluctuational openings, in which the bases, which contain the genetic code, are temporarily open. Therefore it is an interesting system to study the effect of nonlinearity on the physical properties of a system. A simple model for DNA, at a mesoscopic scale, can be investigated by computer simulation, in the same spirit as the original work of Fermi, Pasta and Ulam. These calculations raise fundamental questions in statistical physics because they show a temporary breaking of equipartition of energy, regions with large amplitude fluctuations being able to coexist with regions where the fluctuations are very small, even when the model is studied in the canonical ensemble. This phenomenon can be related to nonlinear excitations in the model. The ability of the model to describe the actual properties of DNA is discussed by comparing theoretical and experimental results for the probability that base pairs open at a given temperature in specific DNA sequences. These studies give us indications on the proper description of the effect of the sequence in the mesoscopic model.

  16. CORALINA: a universal method for the generation of gRNA libraries for CRISPR-based screening.

    PubMed

    Köferle, Anna; Worf, Karolina; Breunig, Christopher; Baumann, Valentin; Herrero, Javier; Wiesbeck, Maximilian; Hutter, Lukas H; Götz, Magdalena; Fuchs, Christiane; Beck, Stephan; Stricker, Stefan H

    2016-11-14

    The bacterial CRISPR system is fast becoming the most popular genetic and epigenetic engineering tool due to its universal applicability and adaptability. The desire to deploy CRISPR-based methods in a large variety of species and contexts has created an urgent need for the development of easy, time- and cost-effective methods enabling large-scale screening approaches. Here we describe CORALINA (comprehensive gRNA library generation through controlled nuclease activity), a method for the generation of comprehensive gRNA libraries for CRISPR-based screens. CORALINA gRNA libraries can be derived from any source of DNA without the need of complex oligonucleotide synthesis. We show the utility of CORALINA for human and mouse genomic DNA, its reproducibility in covering the most relevant genomic features including regulatory, coding and non-coding sequences and confirm the functionality of CORALINA generated gRNAs. The simplicity and cost-effectiveness make CORALINA suitable for any experimental system. The unprecedented sequence complexities obtainable with CORALINA libraries are a necessary pre-requisite for less biased large scale genomic and epigenomic screens.

  17. Atomic-scale imaging of DNA using scanning tunnelling microscopy.

    PubMed

    Driscoll, R J; Youngquist, M G; Baldeschwieler, J D

    1990-07-19

    The scanning tunnelling microscope (STM) has been used to visualize DNA under water, under oil and in air. Images of single-stranded DNA have shown that submolecular resolution is possible. Here we describe atomic-resolution imaging of duplex DNA. Topographic STM images of uncoated duplex DNA on a graphite substrate obtained in ultra-high vacuum are presented that show double-helical structure, base pairs, and atomic-scale substructure. Experimental STM profiles show excellent correlation with atomic contours of the van der Waals surface of A-form DNA derived from X-ray crystallography. A comparison of variations in the barrier to quantum mechanical tunnelling (barrier-height) with atomic-scale topography shows correlation over the phosphate-sugar backbone but anticorrelation over the base pairs. This relationship may be due to the different chemical characteristics of parts of the molecule. Further investigation of this phenomenon should lead to a better understanding of the physics of imaging adsorbates with the STM and may prove useful in sequencing DNA. The improved resolution compared with previously published STM images of DNA may be attributable to ultra-high vacuum, high data-pixel density, slow scan rate, a fortuitously clean and sharp tip and/or a relatively dilute and extremely clean sample solution. This work demonstrates the potential of the STM for characterization of large biomolecular structures, but additional development will be required to make such high resolution imaging of DNA and other large molecules routine.

  18. Comprehensive analysis of RNA-protein interactions by high-throughput sequencing-RNA affinity profiling.

    PubMed

    Tome, Jacob M; Ozer, Abdullah; Pagano, John M; Gheba, Dan; Schroth, Gary P; Lis, John T

    2014-06-01

    RNA-protein interactions play critical roles in gene regulation, but methods to quantitatively analyze these interactions at a large scale are lacking. We have developed a high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay by adapting a high-throughput DNA sequencer to quantify the binding of fluorescently labeled protein to millions of RNAs anchored to sequenced cDNA templates. Using HiTS-RAP, we measured the affinity of mutagenized libraries of GFP-binding and NELF-E-binding aptamers to their respective targets and identified critical regions of interaction. Mutations additively affected the affinity of the NELF-E-binding aptamer, whose interaction depended mainly on a single-stranded RNA motif, but not that of the GFP aptamer, whose interaction depended primarily on secondary structure.

  19. Best practices for mapping replication origins in eukaryotic chromosomes.

    PubMed

    Besnard, Emilie; Desprat, Romain; Ryan, Michael; Kahli, Malik; Aladjem, Mirit I; Lemaitre, Jean-Marc

    2014-09-02

    Understanding the regulatory principles ensuring complete DNA replication in each cell division is critical for deciphering the mechanisms that maintain genomic stability. Recent advances in genome sequencing technology facilitated complete mapping of DNA replication sites and helped move the field from observing replication patterns at a handful of single loci to analyzing replication patterns genome-wide. These advances address issues, such as the relationship between replication initiation events, transcription, and chromatin modifications, and identify potential replication origin consensus sequences. This unit summarizes the technological and fundamental aspects of replication profiling and briefly discusses novel insights emerging from mining large datasets, published in the last 3 years, and also describes DNA replication dynamics on a whole-genome scale. Copyright © 2014 John Wiley & Sons, Inc.

  20. Filling Gaps in Biodiversity Knowledge for Macrofungi: Contributions and Assessment of an Herbarium Collection DNA Barcode Sequencing Project

    PubMed Central

    Osmundson, Todd W.; Robert, Vincent A.; Schoch, Conrad L.; Baker, Lydia J.; Smith, Amy; Robich, Giovanni; Mizzan, Luca; Garbelotto, Matteo M.

    2013-01-01

    Despite recent advances spearheaded by molecular approaches and novel technologies, species description and DNA sequence information are significantly lagging for fungi compared to many other groups of organisms. Large scale sequencing of vouchered herbarium material can aid in closing this gap. Here, we describe an effort to obtain broad ITS sequence coverage of the approximately 6000 macrofungal-species-rich herbarium of the Museum of Natural History in Venice, Italy. Our goals were to investigate issues related to large sequencing projects, develop heuristic methods for assessing the overall performance of such a project, and evaluate the prospects of such efforts to reduce the current gap in fungal biodiversity knowledge. The effort generated 1107 sequences submitted to GenBank, including 416 previously unrepresented taxa and 398 sequences exhibiting a best BLAST match to an unidentified environmental sequence. Specimen age and taxon affected sequencing success, and subsequent work on failed specimens showed that an ITS1 mini-barcode greatly increased sequencing success without greatly reducing the discriminating power of the barcode. Similarity comparisons and nonmetric multidimensional scaling ordinations based on pairwise distance matrices proved to be useful heuristic tools for validating the overall accuracy of specimen identifications, flagging potential misidentifications, and identifying taxa in need of additional species-level revision. Comparison of within- and among-species nucleotide variation showed a strong increase in species discriminating power at 1–2% dissimilarity, and identified potential barcoding issues (same sequence for different species and vice-versa). All sequences are linked to a vouchered specimen, and results from this study have already prompted revisions of species-sequence assignments in several taxa. PMID:23638077

  1. Filling gaps in biodiversity knowledge for macrofungi: contributions and assessment of an herbarium collection DNA barcode sequencing project.

    PubMed

    Osmundson, Todd W; Robert, Vincent A; Schoch, Conrad L; Baker, Lydia J; Smith, Amy; Robich, Giovanni; Mizzan, Luca; Garbelotto, Matteo M

    2013-01-01

    Despite recent advances spearheaded by molecular approaches and novel technologies, species description and DNA sequence information are significantly lagging for fungi compared to many other groups of organisms. Large scale sequencing of vouchered herbarium material can aid in closing this gap. Here, we describe an effort to obtain broad ITS sequence coverage of the approximately 6000 macrofungal-species-rich herbarium of the Museum of Natural History in Venice, Italy. Our goals were to investigate issues related to large sequencing projects, develop heuristic methods for assessing the overall performance of such a project, and evaluate the prospects of such efforts to reduce the current gap in fungal biodiversity knowledge. The effort generated 1107 sequences submitted to GenBank, including 416 previously unrepresented taxa and 398 sequences exhibiting a best BLAST match to an unidentified environmental sequence. Specimen age and taxon affected sequencing success, and subsequent work on failed specimens showed that an ITS1 mini-barcode greatly increased sequencing success without greatly reducing the discriminating power of the barcode. Similarity comparisons and nonmetric multidimensional scaling ordinations based on pairwise distance matrices proved to be useful heuristic tools for validating the overall accuracy of specimen identifications, flagging potential misidentifications, and identifying taxa in need of additional species-level revision. Comparison of within- and among-species nucleotide variation showed a strong increase in species discriminating power at 1-2% dissimilarity, and identified potential barcoding issues (same sequence for different species and vice-versa). All sequences are linked to a vouchered specimen, and results from this study have already prompted revisions of species-sequence assignments in several taxa.

  2. VIP Barcoding: composition vector-based software for rapid species identification based on DNA barcoding.

    PubMed

    Fan, Long; Hui, Jerome H L; Yu, Zu Guo; Chu, Ka Hou

    2014-07-01

    Species identification based on short sequences of DNA markers, that is, DNA barcoding, has emerged as an integral part of modern taxonomy. However, software for the analysis of large and multilocus barcoding data sets is scarce. The Basic Local Alignment Search Tool (BLAST) is currently the fastest tool capable of handling large databases (e.g. >5000 sequences), but its accuracy is a concern and has been criticized for its local optimization. However, current more accurate software requires sequence alignment or complex calculations, which are time-consuming when dealing with large data sets during data preprocessing or during the search stage. Therefore, it is imperative to develop a practical program for both accurate and scalable species identification for DNA barcoding. In this context, we present VIP Barcoding: a user-friendly software in graphical user interface for rapid DNA barcoding. It adopts a hybrid, two-stage algorithm. First, an alignment-free composition vector (CV) method is utilized to reduce searching space by screening a reference database. The alignment-based K2P distance nearest-neighbour method is then employed to analyse the smaller data set generated in the first stage. In comparison with other software, we demonstrate that VIP Barcoding has (i) higher accuracy than Blastn and several alignment-free methods and (ii) higher scalability than alignment-based distance methods and character-based methods. These results suggest that this platform is able to deal with both large-scale and multilocus barcoding data with accuracy and can contribute to DNA barcoding for modern taxonomy. VIP Barcoding is free and available at http://msl.sls.cuhk.edu.hk/vipbarcoding/. © 2014 John Wiley & Sons Ltd.
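
    The two-stage idea in this abstract (an alignment-free screen followed by an alignment-based K2P nearest-neighbour search) can be illustrated with a minimal Python sketch. It is not the VIP Barcoding implementation: the screen below uses plain k-mer count vectors with cosine similarity rather than the published composition vector method, the sequences are assumed to be pre-aligned and of equal length, and the names `kmer_vector`, `cosine`, `k2p_distance`, `identify` and the `shortlist` parameter are hypothetical.

```python
import math
from collections import Counter

PURINES = {"A", "G"}

def kmer_vector(seq, k=3):
    """Count overlapping k-mers (a simplified stand-in for a composition vector)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine(u, v):
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(u[x] * v[x] for x in u if x in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def k2p_distance(a, b):
    """Kimura 2-parameter distance for two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    # Transition: A<->G or C<->T; transversion: purine<->pyrimidine.
    p = sum(1 for x, y in pairs if x != y and (x in PURINES) == (y in PURINES)) / n
    q = sum(1 for x, y in pairs if x != y and (x in PURINES) != (y in PURINES)) / n
    a_term, b_term = 1 - 2 * p - q, 1 - 2 * q
    if a_term <= 0 or b_term <= 0:
        return math.inf  # sequences too divergent for the K2P correction
    return -0.5 * math.log(a_term * math.sqrt(b_term))

def identify(query, references, shortlist=2, k=3):
    """Stage 1: keep the `shortlist` most similar references by k-mer cosine.
    Stage 2: return the shortlisted reference with the smallest K2P distance."""
    qv = kmer_vector(query, k)
    ranked = sorted(references,
                    key=lambda name: cosine(qv, kmer_vector(references[name], k)),
                    reverse=True)
    return min(ranked[:shortlist], key=lambda name: k2p_distance(query, references[name]))

# Toy reference library (made-up sequences, for illustration only).
references = {
    "sp_A": "ACGTACGTACGT",
    "sp_B": "ACGTACGTACGA",
    "sp_C": "TTTTGGGGCCCC",
}
best = identify("ACGTACGTACGA", references)
```

    The screen keeps the expensive distance calculation confined to a short candidate list, which is the scalability argument the abstract makes; the shortlist size trades speed against the risk of discarding the true match.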

  3. Exploring bacterial epigenomics in the next-generation sequencing era: a new approach for an emerging frontier.

    PubMed

    Chen, Poyin; Jeannotte, Richard; Weimer, Bart C

    2014-05-01

    Epigenetics has an important role for the success of foodborne pathogen persistence in diverse host niches. Substantial challenges exist in determining DNA methylation to situation-specific phenotypic traits. DNA modification, mediated by restriction-modification systems, functions as an immune response against antagonistic external DNA, and bacteriophage-acquired methyltransferases (MTase) and orphan MTases - those lacking the cognate restriction endonuclease - facilitate evolution of new phenotypes via gene expression modulation via DNA and RNA modifications, including methylation and phosphorothioation. Recent establishment of large-scale genome sequencing projects will result in a significant increase in genome availability that will lead to new demands for data analysis including new predictive bioinformatics approaches that can be verified with traditional scientific rigor. Sequencing technologies that detect modification coupled with mass spectrometry to discover new adducts is a powerful tactic to study bacterial epigenetics, which is poised to make novel and far-reaching discoveries that link biological significance and the bacterial epigenome. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9.

    PubMed

    Sternberg, Samuel H; Redding, Sy; Jinek, Martin; Greene, Eric C; Doudna, Jennifer A

    2014-03-06

    The clustered regularly interspaced short palindromic repeats (CRISPR)-associated enzyme Cas9 is an RNA-guided endonuclease that uses RNA-DNA base-pairing to target foreign DNA in bacteria. Cas9-guide RNA complexes are also effective genome engineering agents in animals and plants. Here we use single-molecule and bulk biochemical experiments to determine how Cas9-RNA interrogates DNA to find specific cleavage sites. We show that both binding and cleavage of DNA by Cas9-RNA require recognition of a short trinucleotide protospacer adjacent motif (PAM). Non-target DNA binding affinity scales with PAM density, and sequences fully complementary to the guide RNA but lacking a nearby PAM are ignored by Cas9-RNA. Competition assays provide evidence that DNA strand separation and RNA-DNA heteroduplex formation initiate at the PAM and proceed directionally towards the distal end of the target sequence. Furthermore, PAM interactions trigger Cas9 catalytic activity. These results reveal how Cas9 uses PAM recognition to quickly identify potential target sites while scanning large DNA molecules, and to regulate scission of double-stranded DNA.
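
    The PAM requirement described above can be illustrated with a small Python sketch that reports a guide-matching site only when it is immediately followed by a PAM. The abstract says only that the PAM is a short trinucleotide; the `NGG` pattern used here is the commonly cited motif for Streptococcus pyogenes Cas9 and is an assumption of this sketch, as are the function names, the 20-nt protospacer and the restriction to a single strand.

```python
import re

def pam_positions(dna, pam_regex="[ACGT]GG"):
    """Return start indexes of every (possibly overlapping) PAM match on the given strand."""
    return [m.start() for m in re.finditer(f"(?=({pam_regex}))", dna.upper())]

def candidate_targets(dna, protospacer):
    """Return start indexes where `protospacer` occurs directly upstream of a PAM.

    Matches without an adjacent PAM are skipped, mirroring the observation that
    fully complementary sequences lacking a nearby PAM are ignored.
    """
    dna, protospacer = dna.upper(), protospacer.upper()
    n = len(protospacer)
    hits = []
    for pos in pam_positions(dna):
        start = pos - n
        if start >= 0 and dna[start:pos] == protospacer:
            hits.append(start)
    return hits

# Toy example: the same 20-nt sequence appears twice, but only the first copy has a PAM.
protospacer = "ACGTACGTACGTACGTACGT"
dna = "TTTT" + protospacer + "TGG" + "CCCC" + protospacer + "TAA"
sites = candidate_targets(dna, protospacer)
```

    Searching for PAMs first and checking the flanking sequence second follows the order of events the abstract reports, in which PAM recognition precedes strand separation and heteroduplex formation.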

  5. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9

    NASA Astrophysics Data System (ADS)

    Sternberg, Samuel H.; Redding, Sy; Jinek, Martin; Greene, Eric C.; Doudna, Jennifer A.

    2014-03-01

    The clustered regularly interspaced short palindromic repeats (CRISPR)-associated enzyme Cas9 is an RNA-guided endonuclease that uses RNA-DNA base-pairing to target foreign DNA in bacteria. Cas9-guide RNA complexes are also effective genome engineering agents in animals and plants. Here we use single-molecule and bulk biochemical experiments to determine how Cas9-RNA interrogates DNA to find specific cleavage sites. We show that both binding and cleavage of DNA by Cas9-RNA require recognition of a short trinucleotide protospacer adjacent motif (PAM). Non-target DNA binding affinity scales with PAM density, and sequences fully complementary to the guide RNA but lacking a nearby PAM are ignored by Cas9-RNA. Competition assays provide evidence that DNA strand separation and RNA-DNA heteroduplex formation initiate at the PAM and proceed directionally towards the distal end of the target sequence. Furthermore, PAM interactions trigger Cas9 catalytic activity. These results reveal how Cas9 uses PAM recognition to quickly identify potential target sites while scanning large DNA molecules, and to regulate scission of double-stranded DNA.

  6. Using complementary DNA from MyoD-transduced fibroblasts to sequence large muscle genes.

    PubMed

    Waddell, Leigh B; Monnier, Nicole; Cooper, Sandra T; North, Kathryn N; Clarke, Nigel F

    2011-08-01

    Large muscle genes are often sequenced using complementary DNA (cDNA) made from muscle messenger RNA (mRNA) to reduce the cost and workload associated with sequencing from genomic DNA. Two potential barriers are the availability of a frozen muscle biopsy, and difficulties in detecting nonsense mutations due to nonsense-mediated mRNA decay (NMD). We present patient examples showing that use of MyoD-transduced fibroblasts as a source of muscle-specific mRNA overcomes these potential difficulties in sequencing large muscle-related genes. Copyright © 2011 Wiley Periodicals, Inc.

  7. Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas

    PubMed Central

    Hou, Yu; Guo, Huahu; Cao, Chen; Li, Xianlong; Hu, Boqiang; Zhu, Ping; Wu, Xinglong; Wen, Lu; Tang, Fuchou; Huang, Yanyi; Peng, Jirun

    2016-01-01

    Single-cell genome, DNA methylome, and transcriptome sequencing methods have been separately developed. However, to accurately analyze the mechanism by which transcriptome, genome and DNA methylome regulate each other, these omic methods need to be performed in the same single cell. Here we demonstrate a single-cell triple omics sequencing technique, scTrio-seq, that can be used to simultaneously analyze the genomic copy-number variations (CNVs), DNA methylome, and transcriptome of an individual mammalian cell. We show that large-scale CNVs cause proportional changes in RNA expression of genes within the gained or lost genomic regions, whereas these CNVs generally do not affect DNA methylation in these regions. Furthermore, we applied scTrio-seq to 25 single cancer cells derived from a human hepatocellular carcinoma tissue sample. We identified two subpopulations within these cells based on CNVs, DNA methylome, or transcriptome of individual cells. Our work offers a new avenue of dissecting the complex contribution of genomic and epigenomic heterogeneities to the transcriptomic heterogeneity within a population of cells. PMID:26902283

  8. GenBank

    PubMed Central

    Benson, Dennis A.; Karsch-Mizrachi, Ilene; Lipman, David J.; Ostell, James; Wheeler, David L.

    2007-01-01

    GenBank (R) is a comprehensive database that contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage (). PMID:17202161

  9. ExprAlign - the identification of ESTs in non-model species by alignment of cDNA microarray expression profiles

    PubMed Central

    2009-01-01

    Background: Sequence identification of ESTs from non-model species offers distinct challenges particularly when these species have duplicated genomes and when they are phylogenetically distant from sequenced model organisms. For the common carp, an environmental model of aquacultural interest, large numbers of ESTs remained unidentified using BLAST sequence alignment. We have used the expression profiles from large-scale microarray experiments to suggest gene identities. Results: Expression profiles from ~700 cDNA microarrays describing responses of 7 major tissues to multiple environmental stressors were used to define a co-expression landscape. This was based on the Pearson correlation coefficient relating each gene with all other genes, from which a network description provided clusters of highly correlated genes as 'mountains'. We show that these contain genes with known identities and genes with unknown identities, and that the correlation constitutes evidence of identity in the latter. This procedure has suggested identities for 522 of 2701 unknown carp EST sequences. We also discriminate several common carp genes and gene isoforms that were not discriminated by BLAST sequence alignment alone. Precision in identification was substantially improved by use of data from multiple tissues and treatments. Conclusion: The detailed analysis of co-expression landscapes is a sensitive technique for suggesting an identity for the large number of BLAST unidentified cDNAs generated in EST projects. It is capable of detecting even subtle changes in expression profiles, and thereby of distinguishing genes with a common BLAST identity into different identities. It benefits from the use of multiple treatments or contrasts, and from the large-scale microarray data. PMID:19939286
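
    The core measure in this abstract, the Pearson correlation between expression profiles, can be sketched in a few lines of Python. This is a deliberately reduced illustration and not the ExprAlign procedure: instead of building a co-expression network and clustering it, it simply suggests the best-correlated known gene for an unknown EST. The function names, the `min_r` threshold and the toy profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)

def suggest_identity(unknown_profile, known_profiles, min_r=0.9):
    """Return the known gene whose profile correlates best with the unknown EST,
    or None when no correlation reaches `min_r`."""
    best_name, best_r = None, min_r
    for name, profile in known_profiles.items():
        r = pearson(unknown_profile, profile)
        if r >= best_r:
            best_name, best_r = name, r
    return best_name

# Toy profiles across five arrays (made-up values, for illustration only).
known = {
    "gene_up": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_down": [5.0, 4.0, 3.0, 2.0, 1.0],
}
suggestion = suggest_identity([2.0, 4.0, 6.0, 8.0, 10.0], known)
```

    Because a correlation estimated from few arrays is noisy, a scheme like this depends on many conditions per gene, which is consistent with the abstract's note that precision improved with data from multiple tissues and treatments.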

  10. Application of DNA barcodes in wildlife conservation in Tropical East Asia.

    PubMed

    Wilson, John-James; Sing, Kong-Wah; Lee, Ping-Shin; Wee, Alison K S

    2016-10-01

    Over the past 50 years, Tropical East Asia has lost more biodiversity than any tropical region. Tropical East Asia is a megadiverse region with an acute taxonomic impediment. DNA barcodes are short standardized DNA sequences used for taxonomic purposes and have the potential to lessen the challenges of biodiversity inventory and assessments in regions where they are most needed. We reviewed DNA barcoding efforts in Tropical East Asia relative to other tropical regions. We suggest DNA barcodes (or metabarcodes from next-generation sequencers) may be especially useful for characterizing and connecting species-level biodiversity units in inventories encompassing taxa lacking formal description (particularly arthropods) and in large-scale, minimal-impact approaches to vertebrate monitoring and population assessments through secondary sources of DNA (invertebrate derived DNA and environmental DNA). We suggest interest and capacity for DNA barcoding are slowly growing in Tropical East Asia, particularly among the younger generation of researchers who can connect with the barcoding analogy and understand the need for new approaches to the conservation challenges being faced. © 2016 Society for Conservation Biology.

  11. A versatile phenotyping system and analytics platform reveals diverse temporal responses to water availability in Setaria

    USDA-ARS's Scientific Manuscript database

    With rapid advances in DNA sequencing, phenotyping has become the rate-limiting step in using large-scale genomic data to understand and improve agricultural crops. Here, the Bellwether Phenotyping platform for controlled-environment plant growth and automated, multimodal phenotyping is described. T...

  12. Wolbachia and DNA barcoding insects: patterns, potential, and problems.

    PubMed

    Smith, M Alex; Bertrand, Claudia; Crosby, Kate; Eveleigh, Eldon S; Fernandez-Triana, Jose; Fisher, Brian L; Gibbs, Jason; Hajibabaei, Mehrdad; Hallwachs, Winnie; Hind, Katharine; Hrcek, Jan; Huang, Da-Wei; Janda, Milan; Janzen, Daniel H; Li, Yanwei; Miller, Scott E; Packer, Laurence; Quicke, Donald; Ratnasingham, Sujeevan; Rodriguez, Josephine; Rougerie, Rodolphe; Shaw, Mark R; Sheffield, Cory; Stahlhut, Julie K; Steinke, Dirk; Whitfield, James; Wood, Monty; Zhou, Xin

    2012-01-01

    Wolbachia is a genus of bacterial endosymbionts that impacts the breeding systems of their hosts. Wolbachia can confuse the patterns of mitochondrial variation, including DNA barcodes, because it influences the pathways through which mitochondria are inherited. We examined the extent to which these endosymbionts are detected in routine DNA barcoding, assessed their impact upon the insect sequence divergence and identification accuracy, and considered the variation present in Wolbachia COI. Using both standard PCR assays (Wolbachia surface coding protein, wsp) and bacterial COI fragments, we found evidence of Wolbachia in insect total genomic extracts created for DNA barcoding library construction. When >2 million insect COI trace files were examined on the Barcode of Life Datasystem (BOLD), Wolbachia COI was present in 0.16% of the cases. It is possible to generate Wolbachia COI using standard insect primers; however, that amplicon was never confused with the COI of the host. Wolbachia alleles recovered were predominantly Supergroup A and were broadly distributed geographically and phylogenetically. We conclude that the presence of the Wolbachia DNA in total genomic extracts made from insects is unlikely to compromise the accuracy of the DNA barcode library; in fact, the ability to query this DNA library (the database and the extracts) for endosymbionts is one of the ancillary benefits of such a large scale endeavor, for which we provide several examples. It is our conclusion that regular assays for Wolbachia presence and type can, and should, be adopted by large scale insect barcoding initiatives. While COI is one of the five multi-locus sequence typing (MLST) genes used for categorizing Wolbachia, there is limited overlap with the eukaryotic DNA barcode region.

  13. Optimal word sizes for dissimilarity measures and estimation of the degree of dissimilarity between DNA sequences.

    PubMed

    Wu, Tiee-Jian; Huang, Ying-Hsueh; Li, Lung-An

    2005-11-15

    Several measures of DNA sequence dissimilarity have been developed. The purpose of this paper is 3-fold. Firstly, we compare the performance of several word-based or alignment-based methods. Secondly, we give a general guideline for choosing the window size and determining the optimal word sizes for several word-based measures at different window sizes. Thirdly, we use a large-scale simulation method to simulate data from the distribution of SK-LD (symmetric Kullback-Leibler discrepancy). These simulated data can be used to estimate the degree of dissimilarity beta between any pair of DNA sequences. Our study shows (1) for whole sequence similarity/dissimilarity identification the window size taken should be as large as possible, but probably not >3000, as restricted by CPU time in practice, (2) for each measure the optimal word size increases with window size, (3) when the optimal word size is used, SK-LD performance is superior in both simulation and real data analysis, (4) the estimate of beta based on SK-LD can be used to filter out quickly a large number of dissimilar sequences and speed alignment-based database search for similar sequences and (5) the estimate of beta is also applicable in local similarity comparison situations. For example, it can help in selecting oligo probes with high specificity and, therefore, has potential in probe design for microarrays. The algorithm SK-LD, the estimate of beta and simulation software are implemented in MATLAB code, and are available at http://www.stat.ncku.edu.tw/tjwu
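
    As a rough illustration of a word-based measure of the kind compared here, the sketch below computes a symmetric Kullback-Leibler discrepancy between the k-word frequency distributions of two sequences. It is written in Python rather than the authors' MATLAB, and it is not their implementation: the pseudocount used to avoid zero frequencies, the default word size and the absence of any normalising constant are assumptions.

```python
from collections import Counter
from itertools import product
from math import log

def word_freqs(seq, k, pseudocount=1.0):
    """Relative frequencies of all 4**k DNA words in seq.

    A pseudocount (an assumption of this sketch) keeps every
    frequency positive so the logarithms below are defined.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(w) for w in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words) + pseudocount * len(words)
    return {w: (counts[w] + pseudocount) / total for w in words}

def sk_ld(seq_a, seq_b, k=2):
    """Symmetric KL discrepancy: KL(p||q) + KL(q||p) over k-words."""
    p = word_freqs(seq_a, k)
    q = word_freqs(seq_b, k)
    return sum((p[w] - q[w]) * log(p[w] / q[w]) for w in p)

same = sk_ld("ACGTACGTACGT", "ACGTACGTACGT")
near = sk_ld("ACGTACGTACGT", "ACGTACGTACGA")
far = sk_ld("ACGTACGTACGT", "AAAAAAAAAAAA")
```

    The value is zero for identical word distributions and grows as they diverge, which is what makes it usable as a quick filter before alignment.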

  14. DNA sequence chromatogram browsing using JAVA and CORBA.

    PubMed

    Parsons, J D; Buehler, E; Hillier, L

    1999-03-01

    DNA sequence chromatograms (traces) are the primary data source for all large-scale genomic and expressed sequence tags (ESTs) sequencing projects. Access to the sequencing trace assists many later analyses, for example contig assembly and polymorphism detection, but obtaining and using traces is problematic. Traces are not collected and published centrally, they are much larger than the base calls derived from them, and viewing them requires the interactivity of a local graphical client with local data. To provide efficient global access to DNA traces, we developed a client/server system based on flexible Java components integrated into other applications including an applet for use in a WWW browser and a stand-alone trace viewer. Client/server interaction is facilitated by CORBA middleware which provides a well-defined interface, a naming service, and location independence. The software is packaged as a Jar file available from the following URL: http://www.ebi.ac.uk/jparsons. Links to working examples of the trace viewers can be found at http://corba.ebi.ac.uk/EST. All the Washington University mouse EST traces are available for browsing at the same URL.

  15. Using GenBank.

    PubMed

    Wheeler, David

    2007-01-01

    GenBank(R) is a comprehensive database of publicly available DNA sequences for more than 205,000 named organisms and for more than 60,000 within the embryophyta, obtained through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Daily data exchange with the European Molecular Biology Laboratory (EMBL) in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases with taxonomy, genome, mapping, protein structure, and domain information and the biomedical journal literature through PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available through FTP. GenBank usage scenarios ranging from local analyses of the data available through FTP to online analyses supported by the NCBI Web-based tools are discussed. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at http://www.ncbi.nlm.nih.gov.
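
    For readers who want to script the online access mentioned above, the following minimal sketch builds a request for one GenBank record in FASTA format. It assumes NCBI's public E-utilities `efetch` endpoint, which the abstract itself does not name; the accession shown is an arbitrary choice for illustration, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: NCBI E-utilities efetch (not named in the abstract).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build the efetch URL for one nucleotide record."""
    params = {"db": db, "id": accession,
              "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)

def fetch_record(accession):
    """Download the record as text (requires network access)."""
    with urlopen(efetch_url(accession)) as response:
        return response.read().decode("utf-8")

url = efetch_url("U00096")  # illustrative accession only
```

    Bulk analyses are better served by the FTP releases described in the abstract than by many individual requests.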

  16. Unlinking the methylome pattern from nucleotide sequence, revealed by large-scale in vivo genome engineering and methylome editing in medaka fish

    PubMed Central

    Nakamura, Ryohei; Uno, Ayako; Kumagai, Masahiko; Fukushima, Hiroto S.; Morishita, Shinichi; Takeda, Hiroyuki

    2017-01-01

    The heavily methylated vertebrate genomes are punctuated by stretches of poorly methylated DNA sequences that usually mark gene regulatory regions. It is known that the methylation state of these regions confers transcriptional control over their associated genes. Given its governance on the transcriptome, cellular functions and identity, genome-wide DNA methylation pattern is tightly regulated and evidently predefined. However, how the methylation pattern is determined in vivo remains enigmatic. Based on in silico and in vitro evidence, recent studies proposed that the regional hypomethylated state is primarily determined by local DNA sequence, e.g., high CpG density and presence of specific transcription factor binding sites. Nonetheless, the dependency of DNA methylation on nucleotide sequence has not been carefully validated in vertebrates in vivo. Herein, with the use of medaka (Oryzias latipes) as a model, the sequence dependency of DNA methylation was intensively tested in vivo. Our statistical modeling confirmed the strong statistical association between nucleotide sequence pattern and methylation state in the medaka genome. However, by manipulating the methylation state of a number of genomic sequences and reintegrating them into medaka embryos, we demonstrated that artificially conferred DNA methylation states were predominantly and robustly maintained in vivo, regardless of their sequences and endogenous states. This feature was also observed in the medaka transgene that had passed across generations. Thus, despite the observed statistical association, nucleotide sequence was unable to autonomously determine its own methylation state in medaka in vivo. Our results apparently argue against the notion of the governance on the DNA methylation by nucleotide sequence, but instead suggest the involvement of other epigenetic factors in defining and maintaining the DNA methylation landscape. Further investigation in other vertebrate models in vivo will be needed for the generalization of our observations made in medaka. PMID:29267279

  17. Fractal landscape analysis of DNA walks

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E.

    1992-01-01

    By mapping nucleotide sequences onto a "DNA walk", we uncovered remarkably long-range power law correlations [Nature 356 (1992) 168] that imply a new scale invariant property of DNA. We found such long-range correlations in intron-containing genes and in non-transcribed regulatory DNA sequences, but not in cDNA sequences or intron-less genes. In this paper, we present more explicit evidence to support our findings.
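
    The "DNA walk" mapping can be sketched as follows. The abstract does not state the stepping rule, so the purine/pyrimidine convention used here (step up for C or T, step down for A or G) is an assumption, as are the function names. The second function gives the root-mean-square fluctuation of the walk over a window of length l, the quantity whose growth with l is examined for power-law behaviour.

```python
from math import sqrt

def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C, T), -1 for a purine (A, G).

    Ambiguous symbols such as N are skipped (a choice of this sketch).
    """
    y, walk = 0, []
    for base in seq.upper():
        if base in "CT":
            y += 1
        elif base in "AG":
            y -= 1
        else:
            continue
        walk.append(y)
    return walk

def fluctuation(walk, l):
    """RMS fluctuation of the displacement y(i + l) - y(i) about its mean."""
    deltas = [walk[i + l] - walk[i] for i in range(len(walk) - l)]
    mean = sum(deltas) / len(deltas)
    return sqrt(sum((d - mean) ** 2 for d in deltas) / len(deltas))

walk = dna_walk("CTAG")
periodic = fluctuation(dna_walk("CA" * 50), 2)
```

    For an uncorrelated sequence the fluctuation grows roughly as the square root of l; a larger exponent over a wide range of l is what indicates long-range correlation.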

  18. From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes.

    PubMed

    Kwok, Hin; Chiang, Alan Kwok Shing

    2016-02-24

    Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of the first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, an increasing number of EBV genomes have been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explored ways through which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and discovery of potentially pathogenic variants in specific diseases.

  19. The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika

    2010-01-27

    Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences are important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~30K unique sequences (UniSeqs) representing ~19K clusters were generated from ~98K high quality ESTs from a set of tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66 percent of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases. Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is equally diverse to the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in EST library sequencing approaches, and thus represent a rich resource for studies of environmental genomics.

  20. GrigoraSNPs: Optimized Analysis of SNPs for DNA Forensics.

    PubMed

    Ricke, Darrell O; Shcherbina, Anna; Michaleas, Adam; Fremont-Smith, Philip

    2018-04-16

    High-throughput sequencing (HTS) of single nucleotide polymorphisms (SNPs) enables additional DNA forensic capabilities not attainable using traditional STR panels. However, the inclusion of sets of loci selected for mixture analysis, extended kinship, phenotype, biogeographic ancestry prediction, etc., can result in large panel sizes that are difficult to analyze in a rapid fashion. GrigoraSNPs was developed to address the allele-calling bottleneck that was encountered when analyzing SNP panels with more than 5000 loci using HTS. GrigoraSNPs uses MapReduce parallel data processing on multiple computational threads plus a novel locus-identification hashing strategy leveraging target sequence tags. This tool optimizes the SNP calling module of the DNA analysis pipeline with runtimes that scale linearly with the number of HTS reads. Results are compared with SNP analysis pipelines implemented with SAMtools and GATK. GrigoraSNPs removes a computational bottleneck for processing forensic samples with large HTS SNP panels. Published 2018. This article is a U.S. Government work and is in the public domain in the USA.
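
    The general idea of hashing on sequence tags and combining per-read results in a map/reduce style can be sketched as below. This is not the GrigoraSNPs implementation: the tag length, the panel layout (a fixed-length tag followed by an offset to the SNP position), the locus names and all function names are hypothetical, and the sketch runs on a single thread.

```python
from collections import Counter
from functools import reduce

TAG_LEN = 8  # hypothetical tag length

def build_index(panel):
    """Hash table from tag sequence to (locus name, SNP offset after the tag)."""
    index = {}
    for name, (tag, offset) in panel.items():
        assert len(tag) == TAG_LEN
        index[tag] = (name, offset)
    return index

def map_read(read, index):
    """Map step: count (locus, allele) observations within one read."""
    found = Counter()
    for i in range(len(read) - TAG_LEN + 1):
        hit = index.get(read[i:i + TAG_LEN])   # constant-time tag lookup
        if hit is None:
            continue
        name, offset = hit
        pos = i + TAG_LEN + offset
        if pos < len(read):
            found[(name, read[pos])] += 1
    return found

def call_alleles(reads, index):
    """Reduce step: merge the per-read counters into one tally."""
    return reduce(lambda a, b: a + b,
                  (map_read(r, index) for r in reads), Counter())

panel = {"demo_locus_1": ("ACGTACGT", 0), "demo_locus_2": ("TTGACCAA", 2)}
reads = ["GGACGTACGTAC", "ACGTACGTGG", "CCTTGACCAAGGTC"]
tally = call_alleles(reads, build_index(panel))
```

    Because each read is scanned once and each window costs one hash lookup, the work grows linearly with the number of reads, and the map step is independent per read, so it could be distributed across threads or processes.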

  1. Large-Scale Low-Cost NGS Library Preparation Using a Robust Tn5 Purification and Tagmentation Protocol

    PubMed Central

    Hennig, Bianca P.; Velten, Lars; Racke, Ines; Tu, Chelsea Szu; Thoms, Matthias; Rybin, Vladimir; Besir, Hüseyin; Remans, Kim; Steinmetz, Lars M.

    2017-01-01

    Efficient preparation of high-quality sequencing libraries that well represent the biological sample is a key step for using next-generation sequencing in research. Tn5 enables fast, robust, and highly efficient processing of limited input material while scaling to the parallel processing of hundreds of samples. Here, we present a robust Tn5 transposase purification strategy based on an N-terminal His6-Sumo3 tag. We demonstrate that libraries prepared with our in-house Tn5 are of the same quality as those processed with a commercially available kit (Nextera XT), while they dramatically reduce the cost of large-scale experiments. We introduce improved purification strategies for two versions of the Tn5 enzyme. The first version carries the previously reported point mutations E54K and L372P, and stably produces libraries of constant fragment size distribution, even if the Tn5-to-input molecule ratio varies. The second Tn5 construct carries an additional point mutation (R27S) in the DNA-binding domain. This construct allows for adjustment of the fragment size distribution based on enzyme concentration during tagmentation, a feature that opens new opportunities for use of Tn5 in customized experimental designs. We demonstrate the versatility of our Tn5 enzymes in different experimental settings, including a novel single-cell polyadenylation site mapping protocol as well as ultralow input DNA sequencing. PMID:29118030

  2. Analysis of DNA Sequences by an Optical Time-Integrating Correlator: Proposal

    DTIC Science & Technology

    1991-11-01

    OF THE PROBLEM AND CURRENT TECHNOLOGY 2 3.0 TIME-INTEGRATING CORRELATOR 2 4.0 REPRESENTATIONS OF THE DNA BASES 8 5.0 DNA ANALYSIS STRATEGY 8 6.0... DNA bases where each base is represented by a 7-bits long pseudorandom sequence. 9 Figure 5: The flow of data in a DNA analysis system based on an...logarithmic scale and a linear scale. 15 x LIST OF TABLES PAGE Table 1: Short representations of the DNA bases where each base is represented by 7-bits

  3. Scaling in nature: From DNA through heartbeats to weather

    NASA Astrophysics Data System (ADS)

    Havlin, S.; Buldyrev, S. V.; Bunde, A.; Goldberger, A. L.; Ivanov, P. Ch.; Peng, C.-K.; Stanley, H. E.

    1999-12-01

    The purpose of this talk is to describe some recent progress in applying scaling concepts to various systems in nature. We review several systems characterized by scaling laws such as DNA sequences, heartbeat rates and weather variations. We discuss the finding that the exponent α quantifying the scaling in DNA is smaller for coding than for noncoding sequences. We also discuss the application of fractal scaling analysis to the dynamics of heartbeat regulation, and report the recent finding that the scaling exponent α is smaller during sleep periods compared to wake periods. We also discuss the recent findings that suggest a universal scaling exponent characterizing the weather fluctuations.

  4. Scaling in nature: from DNA through heartbeats to weather

    NASA Technical Reports Server (NTRS)

    Havlin, S.; Buldyrev, S. V.; Bunde, A.; Goldberger, A. L.; Peng, C. K.; Stanley, H. E.

    1999-01-01

    The purpose of this report is to describe some recent progress in applying scaling concepts to various systems in nature. We review several systems characterized by scaling laws such as DNA sequences, heartbeat rates and weather variations. We discuss the finding that the exponent alpha quantifying the scaling in DNA is smaller for coding than for noncoding sequences. We also discuss the application of fractal scaling analysis to the dynamics of heartbeat regulation, and report the recent finding that the scaling exponent alpha is smaller during sleep periods compared to wake periods. We also discuss the recent findings that suggest a universal scaling exponent characterizing the weather fluctuations.

  5. Micronuclear DNA of Oxytricha nova contains sequences with autonomously replicating activity in Saccharomyces cerevisiae.

    PubMed Central

    Colombo, M M; Swanton, M T; Donini, P; Prescott, D M

    1984-01-01

    Oxytricha nova is a hypotrichous ciliate with micronuclei and macronuclei. Micronuclei, which contain large, chromosomal-sized DNA, are genetically inert but undergo meiosis and exchange during cell mating. Macronuclei, which contain only small, gene-sized DNA molecules, provide all of the nuclear RNA needed to run the cell. After cell mating the macronucleus is derived from a micronucleus, a derivation that includes excision of the genes from chromosomes and elimination of the remaining DNA. The eliminated DNA includes all of the repetitious sequences and approximately 95% of the unique sequences. We cloned large restriction fragments from the micronucleus that confer replication ability on a replication-deficient plasmid in Saccharomyces cerevisiae. Sequences that confer replication ability are called autonomously replicating sequences. The frequency and effectiveness of autonomously replicating sequences in micronuclear DNA are similar to those reported for DNAs of other organisms introduced into yeast cells. Of the 12 micronuclear fragments with autonomously replicating sequence activity, 9 also showed homology to macronuclear DNA, indicating that they contain a macronuclear gene sequence. We conclude from this that autonomously replicating sequence activity is nonrandomly distributed throughout micronuclear DNA and is preferentially associated with those regions of micronuclear DNA that contain genes. PMID:6092934

  6. Design of DNA pooling to allow incorporation of covariates in rare variants analysis.

    PubMed

    Guan, Weihua; Li, Chun

    2014-01-01

    Rapid advances in next-generation sequencing technologies facilitate genetic association studies of an increasingly wide array of rare variants. To capture the rare or less common variants, a large number of individuals will be needed. However, the cost of a large scale study using whole genome or exome sequencing is still high. DNA pooling can serve as a cost-effective approach, but with a potential limitation that the identity of individual genomes would be lost and therefore individual characteristics and environmental factors could not be adjusted in association analysis, which may result in power loss and a biased estimate of genetic effect. For case-control studies, we propose a design strategy for pool creation and an analysis strategy that allows covariate adjustment, using multiple imputation technique. Simulations show that our approach can obtain reasonable estimate for genotypic effect with only slight loss of power compared to the much more expensive approach of sequencing individual genomes. Our design and analysis strategies enable more powerful and cost-effective sequencing studies of complex diseases, while allowing incorporation of covariate adjustment.

  7. Quantitative characterization of conformational-specific protein-DNA binding using a dual-spectral interferometric imaging biosensor

    NASA Astrophysics Data System (ADS)

    Zhang, Xirui; Daaboul, George G.; Spuhler, Philipp S.; Dröge, Peter; Ünlü, M. Selim

    2016-03-01

    DNA-binding proteins play crucial roles in the maintenance and functions of the genome and yet, their specific binding mechanisms are not fully understood. Recently, it was discovered that DNA-binding proteins recognize specific binding sites to carry out their functions through an indirect readout mechanism by recognizing and capturing DNA conformational flexibility and deformation. High-throughput DNA microarray-based methods that provide large-scale protein-DNA binding information have shown effective and comprehensive analysis of protein-DNA binding affinities, but do not provide information of DNA conformational changes in specific protein-DNA complexes. Building on the high-throughput capability of DNA microarrays, we demonstrate a quantitative approach that simultaneously measures the amount of protein binding to DNA and nanometer-scale DNA conformational change induced by protein binding in a microarray format. Both measurements rely on spectral interferometry on a layered substrate using a single optical instrument in two distinct modalities. In the first modality, we quantitate the amount of binding of protein to surface-immobilized DNA in each DNA spot using a label-free spectral reflectivity technique that accurately measures the surface densities of protein and DNA accumulated on the substrate. In the second modality, for each DNA spot, we simultaneously measure DNA conformational change using a fluorescence vertical sectioning technique that determines average axial height of fluorophores tagged to specific nucleotides of the surface-immobilized DNA. 
The approach presented in this paper, when combined with current high-throughput DNA microarray-based technologies, has the potential to serve as a rapid and simple method for quantitative and large-scale characterization of conformational specific protein-DNA interactions. Electronic supplementary information (ESI) available: DNA sequences and nomenclature (Table 1S); SDS-PAGE assay of IHF stock solution (Fig. 1S); determination of the concentration of IHF stock solution by Bradford assay (Fig. 2S); equilibrium binding isotherm fitting results of other DNA sequences (Table 2S); calculation of dissociation constants (Fig. 3S, 4S; Table 2S); geometric model for quantitation of DNA bending angle induced by specific IHF binding (Fig. 4S); customized flow cell assembly (Fig. 5S); real-time measurement of average fluorophore height change by SSFM (Fig. 6S); summary of binding parameters obtained from additive isotherm model fitting (Table 3S); average surface densities of 10 dsDNA spots and bound IHF at equilibrium (Table 4S); effects of surface densities on the binding and bending of dsDNA (Tables 5S, 6S and Fig. 7S-10S). See DOI: 10.1039/c5nr06785e

  8. Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape.

    PubMed

    Dai, Hanjun; Umarov, Ramzan; Kuwahara, Hiroyuki; Li, Yu; Song, Le; Gao, Xin

    2017-11-15

    An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF-DNA binding affinity landscape still remains a challenging problem. Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods. Our program is freely available at https://github.com/ramzan1990/sequence2vec. Contact: xin.gao@kaust.edu.sa or lsong@cc.gatech.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  9. Highly parallel single-molecule amplification approach based on agarose droplet polymerase chain reaction for efficient and cost-effective aptamer selection.

    PubMed

    Zhang, Wei Yun; Zhang, Wenhua; Liu, Zhiyuan; Li, Cong; Zhu, Zhi; Yang, Chaoyong James

    2012-01-03

    We have developed a novel method for efficiently screening affinity ligands (aptamers) from a complex single-stranded DNA (ssDNA) library by employing single-molecule emulsion polymerase chain reaction (PCR) based on the agarose droplet microfluidic technology. In a typical systematic evolution of ligands by exponential enrichment (SELEX) process, the enriched library is sequenced first, and tens to hundreds of aptamer candidates are analyzed via a bioinformatic approach. Possible candidates are then chemically synthesized, and their binding affinities are measured individually. Such a process is time-consuming, labor-intensive, inefficient, and expensive. To address these problems, we have developed a highly efficient single-molecule approach for aptamer screening using our agarose droplet microfluidic technology. Statistically diluted ssDNA of the pre-enriched library evolved through conventional SELEX against cancer biomarker Shp2 protein was encapsulated into individual uniform agarose droplets for droplet PCR to generate clonal agarose beads. The binding capacity of amplified ssDNA from each clonal bead was then screened via high-throughput fluorescence cytometry. DNA clones with high binding capacity and low K(d) were chosen as the aptamer and can be directly used for downstream biomedical applications. We have identified an ssDNA aptamer that selectively recognizes Shp2 with a K(d) of 24.9 nM. Compared to a conventional sequencing-chemical synthesis-screening work flow, our approach avoids large-scale DNA sequencing and expensive, time-consuming DNA synthesis of large populations of DNA candidates. The agarose droplet microfluidic approach is thus highly efficient and cost-effective for molecular evolution approaches and will find wide application in molecular evolution technologies, including mRNA display, phage display, and so on. © 2011 American Chemical Society

  10. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

    2008-01-01

    GenBank (R) is a comprehensive database that contains publicly available nucleotide sequences for more than 260 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.

  11. GenBank

    PubMed Central

    Benson, Dennis A.; Karsch-Mizrachi, Ilene; Lipman, David J.; Ostell, James; Wheeler, David L.

    2008-01-01

    GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 260 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov. PMID:18073190

  12. A programming language for composable DNA circuits

    PubMed Central

    Phillips, Andrew; Cardelli, Luca

    2009-01-01

    Recently, a range of information-processing circuits have been implemented in DNA by using strand displacement as their main computational mechanism. Examples include digital logic circuits and catalytic signal amplification circuits that function as efficient molecular detectors. As new paradigms for DNA computation emerge, the development of corresponding languages and tools for these paradigms will help to facilitate the design of DNA circuits and their automatic compilation to nucleotide sequences. We present a programming language for designing and simulating DNA circuits in which strand displacement is the main computational mechanism. The language includes basic elements of sequence domains, toeholds and branch migration, and assumes that strands do not possess any secondary structure. The language is used to model and simulate a variety of circuits, including an entropy-driven catalytic gate, a simple gate motif for synthesizing large-scale circuits and a scheme for implementing an arbitrary system of chemical reactions. The language is a first step towards the design of modelling and simulation tools for DNA strand displacement, which complements the emergence of novel implementation strategies for DNA computing. PMID:19535415

  13. Diversification of transcription factor-DNA interactions and the evolution of gene regulatory networks.

    PubMed

    Rogers, Julia M; Bulyk, Martha L

    2018-04-25

    Sequence-specific transcription factors (TFs) bind short DNA sequences in the genome to regulate the expression of target genes. In the last decade, numerous technical advances have enabled the determination of the DNA-binding specificities of many of these factors. Large-scale screens of many TFs enabled the creation of databases of TF DNA-binding specificities, typically represented as position weight matrices (PWMs). Although great progress has been made in determining and predicting binding specificities systematically, there are still many surprises to be found when studying a particular TF's interactions with DNA in detail. Paralogous TFs' binding specificities can differ in subtle ways, in a manner that is not immediately apparent from looking at their PWMs. These differences affect gene regulatory outputs and enable TFs to rewire transcriptional networks over evolutionary time. This review discusses recent observations made in the study of TF-DNA interactions that highlight the importance of continued in-depth analysis of TF-DNA interactions and their inherent complexity. This article is categorized under: Biological Mechanisms > Regulatory Biology. © 2018 Wiley Periodicals, Inc.
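The position weight matrix representation mentioned in this abstract can be illustrated with a minimal Python sketch. The counts, pseudocount, uniform background frequency, and function names below are assumptions chosen for illustration; they are not taken from the review or from any particular PWM database.

```python
import math


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_site(sequence, pwm):
    """Slide the PWM along the sequence and return (score, offset) of the best window.

    The sequence is assumed to contain only the letters A, C, G and T.
    """
    width = len(pwm)
    scored = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)


# Hypothetical binding-site counts for a 4-bp motif resembling "TGCA".
counts = [
    {"T": 9, "A": 1},
    {"G": 8, "C": 2},
    {"C": 10},
    {"A": 7, "G": 3},
]
pwm = pwm_from_counts(counts)
score, offset = best_site("AATGCATT", pwm)
print(f"best window starts at offset {offset} with log-odds score {score:.2f}")
```

Because a PWM scores each position independently, two paralogous factors with near-identical matrices can still differ in ways this kind of scoring does not capture, which is the limitation the abstract points to.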

  14. A programming language for composable DNA circuits.

    PubMed

    Phillips, Andrew; Cardelli, Luca

    2009-08-06

    Recently, a range of information-processing circuits have been implemented in DNA by using strand displacement as their main computational mechanism. Examples include digital logic circuits and catalytic signal amplification circuits that function as efficient molecular detectors. As new paradigms for DNA computation emerge, the development of corresponding languages and tools for these paradigms will help to facilitate the design of DNA circuits and their automatic compilation to nucleotide sequences. We present a programming language for designing and simulating DNA circuits in which strand displacement is the main computational mechanism. The language includes basic elements of sequence domains, toeholds and branch migration, and assumes that strands do not possess any secondary structure. The language is used to model and simulate a variety of circuits, including an entropy-driven catalytic gate, a simple gate motif for synthesizing large-scale circuits and a scheme for implementing an arbitrary system of chemical reactions. The language is a first step towards the design of modelling and simulation tools for DNA strand displacement, which complements the emergence of novel implementation strategies for DNA computing.

  15. Evolutionary dynamics of selfish DNA explains the abundance distribution of genomic subsequences

    PubMed Central

    Sheinman, Michael; Ramisch, Anna; Massip, Florian; Arndt, Peter F.

    2016-01-01

    Since the sequencing of large genomes, many statistical features of their sequences have been found. One intriguing feature is that certain subsequences are much more abundant than others. In fact, abundances of subsequences of a given length are distributed with a scale-free power-law tail, resembling properties of human texts, such as Zipf’s law. Despite recent efforts, the understanding of this phenomenon is still lacking. Here we find that selfish DNA elements, such as those belonging to the Alu family of repeats, dominate the power-law tail. Interestingly, for the Alu elements the power-law exponent increases with the length of the considered subsequences. Motivated by these observations, we develop a model of selfish DNA expansion. The predictions of this model qualitatively and quantitatively agree with the empirical observations. This allows us to estimate parameters for the process of selfish DNA spreading in a genome during its evolution. The obtained results shed light on how evolution of selfish DNA elements shapes non-trivial statistical properties of genomes. PMID:27488939
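The abundance distribution of subsequences discussed here can be computed with a short Python sketch. This is only an illustration on a toy string, with hypothetical function names; the authors' model of selfish DNA expansion is not reproduced.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping subsequence of length k."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def abundance_distribution(counts):
    """Map abundance n to the number of distinct k-mers seen exactly n times."""
    return Counter(counts.values())


# Toy sequence; a real analysis would stream a whole genome.
toy = "ACGTACGTACGA"
counts = kmer_counts(toy, 4)
dist = abundance_distribution(counts)
for abundance in sorted(dist):
    print(abundance, dist[abundance])
```

On a genome-scale input, plotting the second mapping on log-log axes is the usual way to inspect whether the tail looks like a power law.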

  16. A DNA methylation map of human cancer at single base-pair resolution.

    PubMed

    Vidal, E; Sayols, S; Moran, S; Guillaumet-Adkins, A; Schroeder, M P; Royo, R; Orozco, M; Gut, M; Gut, I; Lopez-Bigas, N; Heyn, H; Esteller, M

    2017-10-05

    Although single base-pair resolution DNA methylation landscapes for embryonic and different somatic cell types provided important insights into epigenetic dynamics and cell-type specificity, such comprehensive profiling is incomplete across human cancer types. This prompted us to perform genome-wide DNA methylation profiling of 22 samples derived from normal tissues and associated neoplasms, including primary tumors and cancer cell lines. Unlike their invariant normal counterparts, cancer samples exhibited highly variable CpG methylation levels in a large proportion of the genome, involving progressive changes during tumor evolution. The whole-genome sequencing results from selected samples were replicated in a large cohort of 1112 primary tumors of various cancer types using genome-scale DNA methylation analysis. Specifically, we determined DNA hypermethylation of promoters and enhancers regulating tumor-suppressor genes, with potential cancer-driving effects. DNA hypermethylation events showed evidence of positive selection, mutual exclusivity and tissue specificity, suggesting their active participation in neoplastic transformation. Our data highlight the extensive changes in DNA methylation that occur in cancer onset, progression and dissemination.

  17. Extracting DNA from FFPE Tissue Biospecimens Using User-Friendly Automated Technology: Is There an Impact on Yield or Quality?

    PubMed

    Mathieson, William; Guljar, Nafia; Sanchez, Ignacio; Sroya, Manveer; Thomas, Gerry A

    2018-05-03

    DNA extracted from formalin-fixed, paraffin-embedded (FFPE) tissue blocks is amenable to analytical techniques, including sequencing. DNA extraction protocols are typically long and complex, often involving an overnight proteinase K digest. Automated platforms that shorten and simplify the process are therefore an attractive proposition for users wanting a faster turn-around or to process large numbers of biospecimens. It is, however, unclear whether automated extraction systems return poorer DNA yields or quality than manual extractions performed by experienced technicians. We extracted DNA from 42 FFPE clinical tissue biospecimens using the QiaCube (Qiagen) and ExScale (ExScale Biospecimen Solutions) automated platforms, comparing DNA yields and integrities with those from manual extractions. The QIAamp DNA FFPE Spin Column Kit was used for manual and QiaCube DNA extractions and the ExScale extractions were performed using two of the manufacturer's magnetic bead kits: one extracting DNA only and the other simultaneously extracting DNA and RNA. In all automated extraction methods, DNA yields and integrities (assayed using DNA Integrity Numbers from a 4200 TapeStation and the qPCR-based Illumina FFPE QC Assay) were poorer than in the manual method, with the QiaCube system performing better than the ExScale system. However, ExScale was fastest, offered the highest reproducibility when extracting DNA only, and required the least intervention or technician experience. Thus, the extraction methods have different strengths and weaknesses, would appeal to different users with different requirements, and therefore, we cannot recommend one method over another.

  18. Palaeoproteomics for human evolution studies

    NASA Astrophysics Data System (ADS)

    Welker, Frido

    2018-06-01

    The commonplace sequencing of Neanderthal, Denisovan and ancient modern human DNA continues to revolutionize our understanding of hominin phylogeny and interaction(s). The challenge with older fossils is that the progressive fragmentation of DNA even under optimal conditions, a function of time and temperature, results in ever shorter fragments of DNA. This process continues until no DNA can be sequenced or reliably aligned. Ancient proteins ultimately suffer a similar fate, but are a potential alternative source of biomolecular sequence data to investigate hominin phylogeny given their slower rate of fragmentation. In addition, ancient proteins have been proposed to potentially provide insights into in vivo biological processes and can be used to provide additional ecological information through large scale ZooMS (Zooarchaeology by Mass Spectrometry) screening of unidentifiable bone fragments. However, as initially with ancient DNA, most ancient protein research has focused on Late Pleistocene or Holocene samples from Europe. In addition, only a limited number of studies on hominin remains have been published. Here, an updated review on ancient protein analysis in human evolutionary contexts is given, including the identification of specific knowledge gaps and existing analytical limits, as well as potential avenues to overcome these.

  19. Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases

    PubMed Central

    Schadt, Eric E.; Banerjee, Onureena; Fang, Gang; Feng, Zhixing; Wong, Wing H.; Zhang, Xuegong; Kislyuk, Andrey; Clark, Tyson A.; Luong, Khai; Keren-Paz, Alona; Chess, Andrew; Kumar, Vipin; Chen-Plotkin, Alice; Sondheimer, Neal; Korlach, Jonas; Kasarskis, Andrew

    2013-01-01

    Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types. PMID:23093720

  20. Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases.

    PubMed

    Schadt, Eric E; Banerjee, Onureena; Fang, Gang; Feng, Zhixing; Wong, Wing H; Zhang, Xuegong; Kislyuk, Andrey; Clark, Tyson A; Luong, Khai; Keren-Paz, Alona; Chess, Andrew; Kumar, Vipin; Chen-Plotkin, Alice; Sondheimer, Neal; Korlach, Jonas; Kasarskis, Andrew

    2013-01-01

    Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types.

  1. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

    2007-01-01

    GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage (www.ncbi.nlm.nih.gov).

  2. ITS1: a DNA barcode better than ITS2 in eukaryotes?

    PubMed

    Wang, Xin-Cun; Liu, Chang; Huang, Liang; Bengtsson-Palme, Johan; Chen, Haimei; Zhang, Jian-Hui; Cai, Dayong; Li, Jian-Qin

    2015-05-01

    A DNA barcode is a short piece of DNA sequence used for species determination and discovery. The internal transcribed spacer (ITS/ITS2) region has been proposed as the standard DNA barcode for fungi and seed plants and has been widely used in DNA barcoding analyses for other biological groups, for example algae, protists and animals. The ITS region consists of both ITS1 and ITS2 regions. Here, a large-scale meta-analysis was carried out to compare ITS1 and ITS2 from three aspects: PCR amplification, DNA sequencing and species discrimination, in terms of the presence of DNA barcoding gaps, species discrimination efficiency, sequence length distribution, GC content distribution and primer universality. In total, 85 345 sequence pairs in 10 major groups of eukaryotes, including ascomycetes, basidiomycetes, liverworts, mosses, ferns, gymnosperms, monocotyledons, eudicotyledons, insects and fishes, covering 611 families, 3694 genera, and 19 060 species, were analysed. Using similarity-based methods, we calculated species discrimination efficiencies for ITS1 and ITS2 in all major groups, families and genera. Using Fisher's exact test, we found that ITS1 has significantly higher efficiencies than ITS2 in 17 of the 47 families and 20 of the 49 genera, which are sample-rich. By in silico PCR amplification evaluation, primer universality of the extensively applied ITS1 primers was found to be superior to that of ITS2 primers. Additionally, shorter amplification product length and lower GC content were found to be two other advantages of ITS1 for sequencing. In summary, ITS1 represents a better DNA barcode than ITS2 for eukaryotic species. © 2014 John Wiley & Sons Ltd.

  3. An improved DNA force field for ssDNA interactions with gold nanoparticles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, Xiankai; Huai, Ping; Fan, Chunhai

    The widespread applications of single-stranded DNA (ssDNA) conjugated gold nanoparticles (AuNPs) have spurred an increasing interest in the interactions between ssDNA and AuNPs. Despite extensive studies using the most sophisticated experimental techniques, the detailed molecular mechanisms still remain largely unknown. Large scale molecular dynamics (MD) simulations can thus be used to supplement experiments by providing complementary information about ssDNA-AuNP interactions. However, up to now, all modern force fields for DNA were developed based on the properties of double-stranded DNA (dsDNA) molecules, which have hydrophilic outer backbones “protecting” hydrophobic inner nucleobases from water. Without the double-helix structure of dsDNA and thus the “protection” by the outer backbone, the nucleobases of ssDNA are directly exposed to solvent, and their behavior in water is very different from that of dsDNA, especially at the interface with nanoparticles. In this work, we have improved the force field of ssDNA for use with nanoparticles, such as AuNPs, based on recent experimental results and quantum mechanics calculations. With the new improved force field, we demonstrated that a poly(A) sequence adsorbed on a AuNP surface is much more stable than a poly(T) sequence, which is consistent with recent experimental observations. On the contrary, the current standard force fields, including AMBER03, CHARMM27, and OPLSAA, all gave erroneous results as compared to experiments. The current improved force field is expected to have wide applications in the study of ssDNA with nanomaterials including AuNPs, which might help promote the development of ssDNA-based biosensors and other bionano-devices.

  4. An improved DNA force field for ssDNA interactions with gold nanoparticles

    NASA Astrophysics Data System (ADS)

    Jiang, Xiankai; Gao, Jun; Huynh, Tien; Huai, Ping; Fan, Chunhai; Zhou, Ruhong; Song, Bo

    2014-06-01

    The widespread applications of single-stranded DNA (ssDNA) conjugated gold nanoparticles (AuNPs) have spurred an increasing interest in the interactions between ssDNA and AuNPs. Despite extensive studies using the most sophisticated experimental techniques, the detailed molecular mechanisms still remain largely unknown. Large scale molecular dynamics (MD) simulations can thus be used to supplement experiments by providing complementary information about ssDNA-AuNP interactions. However, up to now, all modern force fields for DNA were developed based on the properties of double-stranded DNA (dsDNA) molecules, which have hydrophilic outer backbones "protecting" hydrophobic inner nucleobases from water. Without the double-helix structure of dsDNA and thus the "protection" by the outer backbone, the nucleobases of ssDNA are directly exposed to solvent, and their behavior in water is very different from that of dsDNA, especially at the interface with nanoparticles. In this work, we have improved the force field of ssDNA for use with nanoparticles, such as AuNPs, based on recent experimental results and quantum mechanics calculations. With the new improved force field, we demonstrated that a poly(A) sequence adsorbed on a AuNP surface is much more stable than a poly(T) sequence, which is consistent with recent experimental observations. On the contrary, the current standard force fields, including AMBER03, CHARMM27, and OPLSAA, all gave erroneous results as compared to experiments. The current improved force field is expected to have wide applications in the study of ssDNA with nanomaterials including AuNPs, which might help promote the development of ssDNA-based biosensors and other bionano-devices.

  5. An improved DNA force field for ssDNA interactions with gold nanoparticles.

    PubMed

    Jiang, Xiankai; Gao, Jun; Huynh, Tien; Huai, Ping; Fan, Chunhai; Zhou, Ruhong; Song, Bo

    2014-06-21

    The widespread applications of single-stranded DNA (ssDNA) conjugated gold nanoparticles (AuNPs) have spurred an increasing interest in the interactions between ssDNA and AuNPs. Despite extensive studies using the most sophisticated experimental techniques, the detailed molecular mechanisms still remain largely unknown. Large scale molecular dynamics (MD) simulations can thus be used to supplement experiments by providing complementary information about ssDNA-AuNP interactions. However, up to now, all modern force fields for DNA were developed based on the properties of double-stranded DNA (dsDNA) molecules, which have hydrophilic outer backbones "protecting" hydrophobic inner nucleobases from water. Without the double-helix structure of dsDNA and thus the "protection" by the outer backbone, the nucleobases of ssDNA are directly exposed to solvent, and their behavior in water is very different from that of dsDNA, especially at the interface with nanoparticles. In this work, we have improved the force field of ssDNA for use with nanoparticles, such as AuNPs, based on recent experimental results and quantum mechanics calculations. With the new improved force field, we demonstrated that a poly(A) sequence adsorbed on a AuNP surface is much more stable than a poly(T) sequence, which is consistent with recent experimental observations. On the contrary, the current standard force fields, including AMBER03, CHARMM27, and OPLSAA, all gave erroneous results as compared to experiments. The current improved force field is expected to have wide applications in the study of ssDNA with nanomaterials including AuNPs, which might help promote the development of ssDNA-based biosensors and other bionano-devices.

  6. Sequencing of the large dsDNA genome of Oryctes rhinoceros nudivirus using multiple displacement amplification of nanogram amounts of virus DNA.

    PubMed

    Wang, Yongjie; Kleespies, Regina G; Ramle, Moslim B; Jehle, Johannes A

    2008-09-01

    The genomic sequence analysis of many large dsDNA viruses is hampered by the lack of enough sample materials. Here, we report a whole genome amplification of the Oryctes rhinoceros nudivirus (OrNV) isolate Ma07 starting from as few as about 10 ng of purified viral DNA by application of phi29 DNA polymerase- and exonuclease-resistant random hexamer-based multiple displacement amplification (MDA) method. About 60 microg of high molecular weight DNA with fragment sizes of up to 25 kbp was amplified. A genomic DNA clone library was generated using the product DNA. After 8-fold sequencing coverage, the 127,615 bp of OrNV whole genome was sequenced successfully. The results demonstrate that the MDA-based whole genome amplification enables rapid access to genomic information from exiguous virus samples.

  7. TaxI: a software tool for DNA barcoding using distance methods

    PubMed Central

    Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel

    2005-01-01

    DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments, this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
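The general idea of comparing a query against each reference through separate pairwise alignments can be sketched in Python as follows. This is not TaxI's implementation: the Needleman-Wunsch scoring parameters, the uncorrected p-distance, the function names, and the toy sequences are all assumptions made for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(a, b):
    """Fraction of mismatches among aligned columns that contain no gap."""
    aln_a, aln_b = global_align(a, b)
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def closest_reference(query, references):
    """Align the query to each reference separately and return the nearest one."""
    distances = {name: p_distance(query, seq) for name, seq in references.items()}
    return min(distances, key=distances.get), distances


# Hypothetical reference set; species_A differs from the query by one deletion.
references = {"species_A": "ACGTGCA", "species_B": "TCGATGCC"}
best, distances = closest_reference("ACGTTGCA", references)
print(best, distances)
```

Since every comparison is an independent pairwise alignment, indel-rich markers such as 16S rRNA never need to be placed into one multiple alignment, and the work grows linearly with the number of references.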

  8. Cardiovascular genetics: technological advancements and applicability for dilated cardiomyopathy.

    PubMed

    Kummeling, G J M; Baas, A F; Harakalova, M; van der Smagt, J J; Asselbergs, F W

    2015-07-01

    Genetics plays an important role in the pathophysiology of cardiovascular diseases, and is increasingly being integrated into clinical practice. Since 2008, both the capacity and the cost-efficiency of mutation screening of DNA have increased dramatically owing to the technological advances brought by next-generation sequencing. Hence, the discovery rate of genetic defects in cardiovascular genetics has grown rapidly and the financial threshold for gene diagnostics has been lowered, making large-scale DNA sequencing broadly accessible. In this review, the genetic variants, mutations and inheritance models are briefly introduced, after which an overview is provided of current clinical and technological applications in gene diagnostics and research for cardiovascular disease and, in particular, dilated cardiomyopathy. Finally, a reflection on the future perspectives in cardiogenetics is given.

  9. Predicting Hydrologic Function With Aquatic Gene Fragments

    NASA Astrophysics Data System (ADS)

    Good, S. P.; URycki, D. R.; Crump, B. C.

    2018-03-01

    Recent advances in microbiology techniques, such as genetic sequencing, allow for rapid and cost-effective collection of large quantities of genetic information carried within water samples. Here we posit that the unique composition of aquatic DNA material within a water sample contains relevant information about hydrologic function at multiple temporal scales. In this study, machine learning was used to develop discharge prediction models trained on the relative abundance of bacterial taxa classified into operational taxonomic units (OTUs) based on 16S rRNA gene sequences from six large arctic rivers. We term this approach "genohydrology," and show that OTU relative abundances can be used to predict river discharge at monthly and longer timescales. Based on a single DNA sample from each river, the average Nash-Sutcliffe efficiency (NSE) for predicted mean monthly discharge values throughout the year was 0.84, while the NSE for predicted discharge values across different return intervals was 0.67. These are considerable improvements over predictions based only on the area-scaled mean specific discharge of five similar rivers, which had average NSE values of 0.64 and -0.32 for seasonal and recurrence interval discharge values, respectively. The genohydrology approach demonstrates that genetic diversity within the aquatic microbiome is a large and underutilized data resource with benefits for prediction of hydrologic function.
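The Nash-Sutcliffe efficiency reported in this abstract is one minus the ratio of the squared prediction error to the variance of the observations about their mean. A minimal Python sketch follows; the discharge values are made up for illustration and are not data from the study.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - sum((o - p)^2) / sum((o - mean(o))^2)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1 - residual / variance


# Made-up monthly discharge values in arbitrary units.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
nse = nash_sutcliffe(observed, predicted)
baseline = nash_sutcliffe(observed, [25.0] * 4)  # always predicting the observed mean
print(f"model NSE = {nse:.3f}, mean-only baseline NSE = {baseline:.3f}")
```

An NSE of 1 indicates a perfect match, 0 means the model does no better than predicting the observed mean, and negative values such as the -0.32 quoted above mean it does worse than that baseline.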

  10. Genome-wide Mapping Reveals Conservation of Promoter DNA Methylation Following Chicken Domestication

    PubMed Central

    Li, Qinghe; Wang, Yuanyuan; Hu, Xiaoxiang; Zhao, Yaofeng; Li, Ning

    2015-01-01

    It is well known that environment influences DNA methylation; however, the extent of heritable DNA methylation variation following animal domestication remains largely unknown. Using meDIP-chip, we mapped the promoter methylomes for 23,316 genes in muscle tissues of ancestral and domestic chickens. We systematically examined the variation of promoter DNA methylation in terms of different breeds, differentially expressed genes, SNPs and genes undergoing genetic selection sweeps. While considerable changes in DNA sequence and gene expression programs were prevalent, we found that the inter-strain DNA methylation patterns were highly conserved in promoter regions between the wild and domestic chicken breeds. Our data suggest a global preservation of DNA methylation between the wild and domestic chicken breeds at either a genome-wide or locus-specific scale in chick muscle tissues. PMID:25735894

  11. High-throughput, pooled sequencing identifies mutations in NUBPL and FOXRED1 in human complex I deficiency

    PubMed Central

    Calvo, Sarah E; Tucker, Elena J; Compton, Alison G; Kirby, Denise M; Crawford, Gabriel; Burtt, Noel P; Rivas, Manuel A; Guiducci, Candace; Bruno, Damien L; Goldberger, Olga A; Redman, Michelle C; Wiltshire, Esko; Wilson, Callum J; Altshuler, David; Gabriel, Stacey B; Daly, Mark J; Thorburn, David R; Mootha, Vamsi K

    2010-01-01

    Discovering the molecular basis of mitochondrial respiratory chain disease is challenging given the large number of both mitochondrial and nuclear genes involved. We report a strategy of focused candidate gene prediction, high-throughput sequencing, and experimental validation to uncover the molecular basis of mitochondrial complex I (CI) disorders. We created five pools of DNA from a cohort of 103 patients and then performed deep sequencing of 103 candidate genes to spotlight 151 rare variants predicted to impact protein function. We used confirmatory experiments to establish genetic diagnoses in 22% of previously unsolved cases, and discovered that defects in NUBPL and FOXRED1 can cause CI deficiency. Our study illustrates how large-scale sequencing, coupled with functional prediction and experimental validation, can reveal novel disease-causing mutations in individual patients. PMID:20818383

  12. Characterization of North American Armillaria species: Genetic relationships determined by ribosomal DNA sequences and AFLP markers

    Treesearch

    M. -S. Kim; N. B. Klopfenstein; J. W. Hanna; G. I. McDonald

    2006-01-01

    Phylogenetic and genetic relationships among 10 North American Armillaria species were analysed using sequence data from ribosomal DNA (rDNA), including intergenic spacer (IGS-1), internal transcribed spacers with associated 5.8S (ITS + 5.8S), and nuclear large subunit rDNA (nLSU), and amplified fragment length polymorphism (AFLP) markers. Based on rDNA sequence data,...

  13. GST-PRIME: an algorithm for genome-wide primer design.

    PubMed

    Leister, Dario; Varotto, Claudio

    2007-01-01

    The profiling of mRNA expression based on DNA arrays has become a powerful tool to study genome-wide transcription of genes in a number of organisms. GST-PRIME is a software package created to facilitate large-scale primer design for the amplification of probes to be immobilized on arrays for transcriptome analyses, even though it can be also applied in low-throughput approaches. GST-PRIME allows highly efficient, direct amplification of gene-sequence tags (GSTs) from genomic DNA (gDNA), starting from annotated genome or transcript sequences. GST-PRIME provides a customer-friendly platform for automatic primer design, and despite the relative simplicity of the algorithm, experimental tests in the model plant species Arabidopsis thaliana confirmed the reliability of the software. This chapter describes the algorithm used for primer design, its input and output files, and the installation of the standalone package and its use.

  14. Long-range correlations and charge transport properties of DNA sequences

    NASA Astrophysics Data System (ADS)

    Liu, Xiao-liang; Ren, Yi; Xie, Qiong-tao; Deng, Chao-sheng; Xu, Hui

    2010-04-01

    By using Hurst's analysis and transfer approach, the rescaled range functions and Hurst exponents of human chromosome 22 and enterobacteria phage lambda DNA sequences are investigated and the transmission coefficients, Landauer resistances and Lyapunov coefficients of finite segments based on above genomic DNA sequences are calculated. In a comparison with quasiperiodic and random artificial DNA sequences, we find that λ-DNA exhibits anticorrelation behavior characterized by a Hurst exponent 0.5
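    The rescaled range (R/S) analysis named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the purine/pyrimidine mapping to +1/-1, the window sizes, and the function names are all assumptions made here. An exponent near 0.5 indicates an uncorrelated series, lower values indicate anticorrelation, and higher values indicate persistent correlation.

```python
import math
import random


def dna_to_walk(seq):
    """Map purines (A, G) to +1 and pyrimidines (C, T) to -1 (an assumed encoding)."""
    return [1.0 if base in "AG" else -1.0 for base in seq.upper()]


def rescaled_range(chunk):
    """Return R/S for one window, or None if the window has zero variance."""
    n = len(chunk)
    mean = sum(chunk) / n
    cum = lo = hi = 0.0
    for x in chunk:
        cum += x - mean          # cumulative deviation from the window mean
        lo = min(lo, cum)
        hi = max(hi, cum)
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    return (hi - lo) / std if std > 0 else None


def hurst_exponent(series, window_sizes=(8, 16, 32, 64)):
    """Estimate H as the least-squares slope of log(mean R/S) against log(window size)."""
    xs, ys = [], []
    for n in window_sizes:
        values = []
        for start in range(0, len(series) - n + 1, n):
            rs = rescaled_range(series[start:start + n])
            if rs is not None:
                values.append(rs)
        if values:
            xs.append(math.log(n))
            ys.append(math.log(sum(values) / len(values)))
    if len(xs) < 2:
        raise ValueError("need at least two usable window sizes")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    random.seed(0)
    random_seq = "".join(random.choice("ACGT") for _ in range(4096))
    print("random sequence H:", hurst_exponent(dna_to_walk(random_seq)))
    print("alternating sequence H:", hurst_exponent(dna_to_walk("AC" * 256)))
```

    A strictly alternating purine/pyrimidine sequence is the extreme anticorrelated case, so its estimate sits far below that of a random sequence. Short windows bias R/S estimates upward, so values from small windows should be read comparatively rather than as absolute exponents.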

  15. A DNA 'barcode blitz': rapid digitization and sequencing of a natural history collection.

    PubMed

    Hebert, Paul D N; Dewaard, Jeremy R; Zakharov, Evgeny V; Prosser, Sean W J; Sones, Jayme E; McKeown, Jaclyn T A; Mantle, Beth; La Salle, John

    2013-01-01

    DNA barcoding protocols require the linkage of each sequence record to a voucher specimen that has, whenever possible, been authoritatively identified. Natural history collections would seem an ideal resource for barcode library construction, but they have never seen large-scale analysis because of concerns linked to DNA degradation. The present study examines the strength of this barrier, carrying out a comprehensive analysis of moth and butterfly (Lepidoptera) species in the Australian National Insect Collection. Protocols were developed that enabled tissue samples, specimen data, and images to be assembled rapidly. Using these methods, a five-person team processed 41,650 specimens representing 12,699 species in 14 weeks. Subsequent molecular analysis took about six months, reflecting the need for multiple rounds of PCR as sequence recovery was impacted by age, body size, and collection protocols. Despite these variables and the fact that specimens averaged 30.4 years old, barcode records were obtained from 86% of the species. In fact, one or more barcode compliant sequences (>487 bp) were recovered from virtually all species represented by five or more individuals, even when the youngest was 50 years old. By assembling specimen images, distributional data, and DNA barcode sequences on a web-accessible informatics platform, this study has greatly advanced accessibility to information on thousands of species. Moreover, much of the specimen data became publicly accessible within days of its acquisition, while most sequence results saw release within three months. As such, this study reveals the speed with which DNA barcode workflows can mobilize biodiversity data, often providing the first web-accessible information for a species. These results further suggest that existing collections can enable the rapid development of a comprehensive DNA barcode library for the most diverse compartment of terrestrial biodiversity - insects.

  16. "First generation" automated DNA sequencing technology.

    PubMed

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  17. Cytology of DNA Replication Reveals Dynamic Plasticity of Large-Scale Chromatin Fibers.

    PubMed

    Deng, Xiang; Zhironkina, Oxana A; Cherepanynets, Varvara D; Strelkova, Olga S; Kireev, Igor I; Belmont, Andrew S

    2016-09-26

    In higher eukaryotic interphase nuclei, the 100- to >1,000-fold linear compaction of chromatin is difficult to reconcile with its function as a template for transcription, replication, and repair. It is challenging to imagine how DNA and RNA polymerases with their associated molecular machinery would move along the DNA template without transient decondensation of observed large-scale chromatin "chromonema" fibers [1]. Transcription or "replication factory" models [2], in which polymerases remain fixed while DNA is reeled through, are similarly difficult to conceptualize without transient decondensation of these chromonema fibers. Here, we show how a dynamic plasticity of chromatin folding within large-scale chromatin fibers allows DNA replication to take place without significant changes in the global large-scale chromatin compaction or shape of these large-scale chromatin fibers. Time-lapse imaging of lac-operator-tagged chromosome regions shows no major change in the overall compaction of these chromosome regions during their DNA replication. Improved pulse-chase labeling of endogenous interphase chromosomes yields a model in which the global compaction and shape of large-Mbp chromatin domains remains largely invariant during DNA replication, with DNA within these domains undergoing significant movements and redistribution as they move into and then out of adjacent replication foci. In contrast to hierarchical folding models, this dynamic plasticity of large-scale chromatin organization explains how localized changes in DNA topology allow DNA replication to take place without an accompanying global unfolding of large-scale chromatin fibers while suggesting a possible mechanism for maintaining epigenetic programming of large-scale chromatin domains throughout DNA replication. Copyright © 2016 Elsevier Ltd. All rights reserved.

  18. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

    2010-01-01

    GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bi-monthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI homepage: www.ncbi.nlm.nih.gov.

  19. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

    2009-01-01

    GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank(R) staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.

  20. P-Hint-Hunt: a deep parallelized whole genome DNA methylation detection tool.

    PubMed

    Peng, Shaoliang; Yang, Shunyun; Gao, Ming; Liao, Xiangke; Liu, Jie; Yang, Canqun; Wu, Chengkun; Yu, Wenqiang

    2017-03-14

    Whole genome DNA methylation detection is one of the most important parts of epigenetics research, and a growing number of studies have used it to find significant relationships between DNA methylation and several typical diseases, such as cancers and diabetes. In many of those studies, mapping bisulfite-treated sequences to the whole genome has been the main method for studying DNA cytosine methylation. However, most of today's relevant tools suffer from inaccuracy and long running times. In our study, we designed a new DNA methylation prediction tool ("Hint-Hunt") to solve these problems. Using an optimal complex alignment computation and Smith-Waterman matrix dynamic programming, Hint-Hunt can analyze and predict DNA methylation status. However, when Hint-Hunt is applied to large-scale datasets, it still suffers from slow speed and low temporal-spatial efficiency. To address the cost of Smith-Waterman dynamic programming and the low temporal-spatial efficiency, we further designed a deep parallelized whole genome DNA methylation detection tool ("P-Hint-Hunt") on the Tianhe-2 (TH-2) supercomputer. To the best of our knowledge, P-Hint-Hunt is the first parallel DNA methylation detection tool with a high speed-up for processing large-scale datasets, and it can run on both CPUs and Intel Xeon Phi coprocessors. Moreover, we deployed and evaluated Hint-Hunt and P-Hint-Hunt on the TH-2 supercomputer at different scales. The experimental results show that our tools eliminate the deviation caused by bisulfite treatment in the mapping procedure, and the multi-level parallel program yields a 48-fold speed-up with 64 threads. P-Hint-Hunt achieves deep acceleration on the CPU and Intel Xeon Phi heterogeneous platform, taking full advantage of multi-core (CPU) and many-core (Phi) processors.
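    The Smith-Waterman dynamic programming mentioned in this abstract can be illustrated with a small sketch. It is not the Hint-Hunt implementation: the scoring values, the linear gap penalty, and the optional `bisulfite` flag (which treats a read T opposite a reference C as a match, since bisulfite treatment converts unmethylated cytosines so that they are read as T) are assumptions chosen here for illustration.

```python
def is_match(ref_base, read_base, bisulfite=False):
    """Exact match, or (optionally) a reference C read as T after bisulfite conversion."""
    if ref_base == read_base:
        return True
    return bisulfite and ref_base == "C" and read_base == "T"


def smith_waterman(ref, read, match=2, mismatch=-1, gap=-2, bisulfite=False):
    """Return (best local score, aligned ref segment, aligned read segment)."""
    rows, cols = len(ref) + 1, len(read) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; every cell is floored at 0, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if is_match(ref[i - 1], read[j - 1], bisulfite) else mismatch
            cell = max(0,
                       score[i - 1][j - 1] + sub,   # align the two bases
                       score[i - 1][j] + gap,       # gap in the read
                       score[i][j - 1] + gap)       # gap in the reference
            score[i][j] = cell
            if cell > best:
                best, best_pos = cell, (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    out_ref, out_read = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if is_match(ref[i - 1], read[j - 1], bisulfite) else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_ref.append(ref[i - 1])
            out_read.append(read[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_ref.append(ref[i - 1])
            out_read.append("-")
            i -= 1
        else:
            out_ref.append("-")
            out_read.append(read[j - 1])
            j -= 1
    return best, "".join(reversed(out_ref)), "".join(reversed(out_read))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTAAA", "GGGACGTCCC"))
    print(smith_waterman("ACGT", "ATGT", bisulfite=True))
```

    The matrix fill costs time proportional to the product of the two sequence lengths, which is why whole-genome use of this recurrence motivates the parallelization the abstract describes.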

  1. Gone with the currents: lack of genetic differentiation at the circum-continental scale in the Antarctic krill Euphausia superba

    PubMed Central

    2011-01-01

    Background: Southern Ocean fauna represent a significant amount of global biodiversity, whose origin may be linked to glacial cycles determining local extinction/eradication with ice advance, survival of refugee populations and post-glacial re-colonization. This pattern implies high potential for differentiation in benthic shelf species with limited dispersal, yet consequences for pelagic organisms are less clear. The present study investigates levels of genetic variation and population structure of the Antarctic krill Euphausia superba using mitochondrial DNA and EST-linked microsatellite markers for an unprecedentedly comprehensive sampling of its populations over a circum-Antarctic range. Results: MtDNA (ND1) sequences and EST-linked microsatellite markers indicated no clear sign of genetic structure among populations over large geographic scales, despite considerable power to detect differences inferred from forward-time simulations. Based on ND1, few instances of genetic heterogeneity, not significant after correction for multiple tests, were detected between geographic or temporal samples. Neutrality tests and mismatch distribution based on mtDNA sequences revealed strong evidence of past population expansion. Significant positive values of the parameter g (a measure of population growth) were obtained from microsatellite markers using a coalescent-based genealogical method and suggested a recent start (60 000 - 40 000 years ago) for the expansion. Conclusions: The results provide evidence of lack of genetic heterogeneity of Antarctic krill at large geographic scales and unequivocal support for recent population expansion. Lack of genetic structuring likely reflects the tight link between krill and circum-Antarctic ocean currents and is consistent with the hypothesis that differentiation processes in Antarctic species are largely influenced by dispersal potential, whereas small-scale spatial and temporal differentiation might be due to local conditions leading to genetic patchiness. The signal of recent population growth suggests differential impact of glacial cycles on pelagic Antarctic species, which experienced population expansion during glaciations with increased available habitat, versus sedentary benthic shelf species. EST-linked microsatellites provide new perspectives to complement the results based on mtDNA and suggest that data-mining of EST libraries will be a useful approach to facilitate use of microsatellites for additional species. PMID:21486439

  2. A Toolkit for bulk PCR-based marker design from next-generation sequence data: application for development of a framework linkage map in bulb onion (Allium cepa L.)

    PubMed Central

    2012-01-01

    Background: Although modern sequencing technologies permit the ready detection of numerous DNA sequence variants in any organisms, converting such information to PCR-based genetic markers is hampered by a lack of simple, scalable tools. Onion is an example of an under-researched crop with a complex, heterozygous genome where genome-based research has previously been hindered by limited sequence resources and genetic markers. Results: We report the development of generic tools for large-scale web-based PCR-based marker design in the Galaxy bioinformatics framework, and their application for development of next-generation genetics resources in a wide cross of bulb onion (Allium cepa L.). Transcriptome sequence resources were developed for the homozygous doubled-haploid bulb onion line ‘CUDH2150’ and the genetically distant Indian landrace ‘Nasik Red’, using 454™ sequencing of normalised cDNA libraries of leaf and shoot. Read mapping of ‘Nasik Red’ reads onto ‘CUDH2150’ assemblies revealed 16836 indel and SNP polymorphisms that were mined for portable PCR-based marker development. Tools for detection of restriction polymorphisms and primer set design were developed in BioPython and adapted for use in the Galaxy workflow environment, enabling large-scale and targeted assay design. Using PCR-based markers designed with these tools, a framework genetic linkage map of over 800 cM spanning all chromosomes was developed in a subset of 93 F2 progeny from a very large F2 family developed from the ‘Nasik Red’ x ‘CUDH2150’ inter-cross. The utility of tools and genetic resources developed was tested by designing markers to transcription factor-like polymorphic sequences. Bin mapping these markers using a subset of 10 progeny confirmed the ability to place markers within 10 cM bins, enabling increased efficiency in marker assignment and targeted map refinement. The major genetic loci conditioning red bulb colour (R) and fructan content (Frc) were located on this map by QTL analysis. Conclusions: The generic tools developed for the Galaxy environment enable rapid development of sets of PCR assays targeting sequence variants identified from Illumina and 454 sequence data. They enable non-specialist users to validate and exploit large volumes of next-generation sequence data using basic equipment. PMID:23157543

  3. A toolkit for bulk PCR-based marker design from next-generation sequence data: application for development of a framework linkage map in bulb onion (Allium cepa L.).

    PubMed

    Baldwin, Samantha; Revanna, Roopashree; Thomson, Susan; Pither-Joyce, Meeghan; Wright, Kathryn; Crowhurst, Ross; Fiers, Mark; Chen, Leshi; Macknight, Richard; McCallum, John A

    2012-11-19

    Although modern sequencing technologies permit the ready detection of numerous DNA sequence variants in any organisms, converting such information to PCR-based genetic markers is hampered by a lack of simple, scalable tools. Onion is an example of an under-researched crop with a complex, heterozygous genome where genome-based research has previously been hindered by limited sequence resources and genetic markers. We report the development of generic tools for large-scale web-based PCR-based marker design in the Galaxy bioinformatics framework, and their application for development of next-generation genetics resources in a wide cross of bulb onion (Allium cepa L.). Transcriptome sequence resources were developed for the homozygous doubled-haploid bulb onion line 'CUDH2150' and the genetically distant Indian landrace 'Nasik Red', using 454™ sequencing of normalised cDNA libraries of leaf and shoot. Read mapping of 'Nasik Red' reads onto 'CUDH2150' assemblies revealed 16836 indel and SNP polymorphisms that were mined for portable PCR-based marker development. Tools for detection of restriction polymorphisms and primer set design were developed in BioPython and adapted for use in the Galaxy workflow environment, enabling large-scale and targeted assay design. Using PCR-based markers designed with these tools, a framework genetic linkage map of over 800 cM spanning all chromosomes was developed in a subset of 93 F(2) progeny from a very large F(2) family developed from the 'Nasik Red' x 'CUDH2150' inter-cross. The utility of tools and genetic resources developed was tested by designing markers to transcription factor-like polymorphic sequences. Bin mapping these markers using a subset of 10 progeny confirmed the ability to place markers within 10 cM bins, enabling increased efficiency in marker assignment and targeted map refinement. The major genetic loci conditioning red bulb colour (R) and fructan content (Frc) were located on this map by QTL analysis. The generic tools developed for the Galaxy environment enable rapid development of sets of PCR assays targeting sequence variants identified from Illumina and 454 sequence data. They enable non-specialist users to validate and exploit large volumes of next-generation sequence data using basic equipment.
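    The idea of detecting restriction polymorphisms between two alleles, as described in this abstract, can be sketched in a few lines. This is an illustrative, dependency-free sketch and not the authors' Galaxy tools: the function names and the three-enzyme table are assumptions made here, and Biopython's own `Bio.Restriction` module would be the natural choice for a full enzyme catalogue.

```python
# Recognition sites of three common enzymes. All three sites are palindromic,
# so scanning one strand is sufficient for this illustration.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of a recognition site in seq."""
    seq = seq.upper()
    width = len(site)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == site)


def differential_enzymes(allele_a, allele_b, sites=SITES):
    """Return {enzyme: (cuts in allele_a, cuts in allele_b)} where the counts differ.

    An enzyme listed here would give different fragment patterns for the two
    alleles, which is the basis of a cleaved amplified polymorphic sequence assay.
    """
    result = {}
    for name, site in sites.items():
        n_a = count_sites(allele_a, site)
        n_b = count_sites(allele_b, site)
        if n_a != n_b:
            result[name] = (n_a, n_b)
    return result


if __name__ == "__main__":
    # Hypothetical alleles differing by one SNP (T/C) inside an EcoRI site.
    allele_a = "TTTGAATTCAAA"
    allele_b = "TTTGAATCCAAA"
    print(differential_enzymes(allele_a, allele_b))
```

    In practice the amplicon boundaries set by the primers also matter, since a usable assay needs fragments that can be resolved on a gel; that step is omitted here.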

  4. Defining the transcriptome assembly and its use for genome dynamics and transcriptome profiling studies in pigeonpea (Cajanus cajan L.)

    USDA-ARS's Scientific Manuscript database

    This study reports generation of large-scale genomic resources for pigeonpea, a so-called ‘orphan crop species’ of the semi-arid tropic regions. Roche FLX/454 sequencing of a normalized cDNA pool prepared from 31 tissues produced 494,353 short transcript reads (STRs). Cluster analysi...

  5. A DNA methylation map of human cancer at single base-pair resolution

    PubMed Central

    Vidal, E; Sayols, S; Moran, S; Guillaumet-Adkins, A; Schroeder, M P; Royo, R; Orozco, M; Gut, M; Gut, I; Lopez-Bigas, N; Heyn, H; Esteller, M

    2017-01-01

    Although single base-pair resolution DNA methylation landscapes for embryonic and different somatic cell types provided important insights into epigenetic dynamics and cell-type specificity, such comprehensive profiling is incomplete across human cancer types. This prompted us to perform genome-wide DNA methylation profiling of 22 samples derived from normal tissues and associated neoplasms, including primary tumors and cancer cell lines. Unlike their invariant normal counterparts, cancer samples exhibited highly variable CpG methylation levels in a large proportion of the genome, involving progressive changes during tumor evolution. The whole-genome sequencing results from selected samples were replicated in a large cohort of 1112 primary tumors of various cancer types using genome-scale DNA methylation analysis. Specifically, we determined DNA hypermethylation of promoters and enhancers regulating tumor-suppressor genes, with potential cancer-driving effects. DNA hypermethylation events showed evidence of positive selection, mutual exclusivity and tissue specificity, suggesting their active participation in neoplastic transformation. Our data highlight the extensive changes in DNA methylation that occur in cancer onset, progression and dissemination. PMID:28581523

  6. Minimap2: pairwise alignment for nucleotide sequences.

    PubMed

    Li, Heng

    2018-05-10

    Recent advances in sequencing technologies promise ultra-long reads of ∼100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput, and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs are unable to process such data at scale, or do so inefficiently, which presses for the development of new alignment algorithms. Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. Availability: https://github.com/lh3/minimap2. Contact: hengli@broadinstitute.org.
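    The "concave gap cost" mentioned in this abstract can be illustrated with a two-piece affine function, a common way to approximate a concave penalty: short gaps follow a cheap-to-open, expensive-to-extend line, and long gaps follow an expensive-to-open, cheap-to-extend line. The sketch below is an assumption-laden illustration rather than Minimap2's code; the function name and the parameter values (q=4, e=2, q2=24, e2=1) are chosen here for illustration only.

```python
def two_piece_gap_cost(length, q=4, e=2, q2=24, e2=1):
    """Cost of a gap of `length` bases under a two-piece affine (concave) model.

    The cost is the cheaper of two affine penalties:
      q  + length * e    (favours short gaps)
      q2 + length * e2   (favours long gaps)
    """
    if length <= 0:
        return 0
    return min(q + length * e, q2 + length * e2)


def affine_gap_cost(length, q=4, e=2):
    """Ordinary single-piece affine cost, for comparison."""
    return q + length * e if length > 0 else 0


if __name__ == "__main__":
    for gap_length in (1, 10, 20, 100, 1000):
        print(gap_length,
              affine_gap_cost(gap_length),
              two_piece_gap_cost(gap_length))
```

    With these illustrative parameters the two lines cross at a gap length of 20, so a 1000-base deletion costs about half of what the single-piece affine model would charge. That makes long INDELs cheaper relative to a run of mismatches or several shorter gaps.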

  7. Phylogenetic relationships of the Gomphales based on nuc-25S-rDNA, mit-12S-rDNA, and mit-atp6-DNA combined sequences

    Treesearch

    Admir J. Giachini; Kentaro Hosaka; Eduardo Nouhra; Joseph Spatafora; James M. Trappe

    2010-01-01

    Phylogenetic relationships among Geastrales, Gomphales, Hysterangiales, and Phallales were estimated via combined sequences: nuclear large subunit ribosomal DNA (nuc-25S-rDNA), mitochondrial small subunit ribosomal DNA (mit-12S-rDNA), and mitochondrial atp6 DNA (mit-atp6-DNA). Eighty-one taxa comprising 19 genera and 58 species...

  8. DNA and RNA sequencing by nanoscale reading through programmable electrophoresis and nanoelectrode-gated tunneling and dielectric detection

    DOEpatents

    Lee, James W.; Thundat, Thomas G.

    2005-06-14

    An apparatus and method for performing nucleic acid (DNA and/or RNA) sequencing on a single molecule. The genetic sequence information is obtained by probing through a DNA or RNA molecule base by base at nanometer scale as though looking through a strip of movie film. This DNA sequencing nanotechnology has the theoretical capability of performing DNA sequencing at a maximal rate of about 1,000,000 bases per second. This enhanced performance is made possible by a series of innovations including: novel applications of a fine-tuned nanometer gap for passage of a single DNA or RNA molecule; thin layer microfluidics for sample loading and delivery; and programmable electric fields for precise control of DNA or RNA movement. Detection methods include nanoelectrode-gated tunneling current measurements, dielectric molecular characterization, and atomic force microscopy/electrostatic force microscopy (AFM/EFM) probing for nanoscale reading of the nucleic acid sequences.

  9. A Hybrid Approach for the Automated Finishing of Bacterial Genomes

    PubMed Central

    Robins, William P.; Chin, Chen-Shan; Webster, Dale; Paxinos, Ellen; Hsu, David; Ashby, Meredith; Wang, Susana; Peluso, Paul; Sebra, Robert; Sorenson, Jon; Bullard, James; Yen, Jackie; Valdovino, Marie; Mollova, Emilia; Luong, Khai; Lin, Steven; LaMay, Brianna; Joshi, Amruta; Rowe, Lori; Frace, Michael; Tarr, Cheryl L.; Turnsek, Maryann; Davis, Brigid M; Kasarskis, Andrew; Mekalanos, John J.; Waldor, Matthew K.; Schadt, Eric E.

    2013-01-01

    Dramatic improvements in DNA sequencing technology have revolutionized our ability to characterize most genomic diversity. However, accurate resolution of large structural events has remained challenging due to the comparatively shorter read lengths of second-generation technologies. Emerging third-generation sequencing technologies, which yield markedly increased read length on rapid time scales and for low cost, have the potential to address assembly limitations. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at > 99.9% accuracy. Complex regions with clinically significant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 reference we obtain 14 and 8 scaffolds greater than 1kb, respectively, correcting several errors in the underlying source data. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly. PMID:22750883

  10. Management of familial cancer: sequencing, surveillance and society.

    PubMed

    Samuel, Nardin; Villani, Anita; Fernandez, Conrad V; Malkin, David

    2014-12-01

    The clinical management of familial cancer begins with recognition of patterns of cancer occurrence suggestive of genetic susceptibility in a proband or pedigree, to enable subsequent investigation of the underlying DNA mutations. In this regard, next-generation sequencing of DNA continues to transform cancer diagnostics, by enabling screening for cancer-susceptibility genes in the context of known and emerging familial cancer syndromes. Increasingly, not only are candidate cancer genes sequenced, but also entire 'healthy' genomes are mapped in children with cancer and their family members. Although large-scale genomic analysis is considered intrinsic to the success of cancer research and discovery, a number of accompanying ethical and technical issues must be addressed before this approach can be adopted widely in personalized therapy. In this Perspectives article, we describe our views on how the emergence of new sequencing technologies and cancer surveillance strategies is altering the framework for the clinical management of hereditary cancer. Genetic counselling and disclosure issues are discussed, and strategies for approaching ethical dilemmas are proposed.

  11. Developmental and Subcellular Organization of Single-Cell C₄ Photosynthesis in Bienertia sinuspersici Determined by Large-Scale Proteomics and cDNA Assembly from 454 DNA Sequencing.

    PubMed

    Offermann, Sascha; Friso, Giulia; Doroshenk, Kelly A; Sun, Qi; Sharpe, Richard M; Okita, Thomas W; Wimmer, Diana; Edwards, Gerald E; van Wijk, Klaas J

    2015-05-01

    Kranz C4 species strictly depend on separation of primary and secondary carbon fixation reactions in different cell types. In contrast, the single-cell C4 (SCC4) species Bienertia sinuspersici utilizes intracellular compartmentation including two physiologically and biochemically different chloroplast types; however, information on identity, localization, and induction of proteins required for this SCC4 system is currently very limited. In this study, we determined the distribution of photosynthesis-related proteins and the induction of the C4 system during development by label-free proteomics of subcellular fractions and leaves of different developmental stages. This was enabled by inferring a protein sequence database from 454 sequencing of Bienertia cDNAs. Large-scale proteome rearrangements were observed as C4 photosynthesis developed during leaf maturation. The proteomes of the two chloroplasts are different with differential accumulation of linear and cyclic electron transport components, primary and secondary carbon fixation reactions, and a triose-phosphate shuttle that is shared between the two chloroplast types. This differential protein distribution pattern suggests the presence of a mRNA or protein-sorting mechanism for nuclear-encoded, chloroplast-targeted proteins in SCC4 species. The combined information was used to provide a comprehensive model for NAD-ME type carbon fixation in SCC4 species.

  12. Spatial Representativeness of Environmental DNA Metabarcoding Signal for Fish Biodiversity Assessment in a Natural Freshwater System.

    PubMed

    Civade, Raphaël; Dejean, Tony; Valentini, Alice; Roset, Nicolas; Raymond, Jean-Claude; Bonin, Aurélie; Taberlet, Pierre; Pont, Didier

    2016-01-01

    In the last few years, the study of environmental DNA (eDNA) has drawn attention for many reasons, including its advantages for monitoring and conservation purposes. So far, in aquatic environments, most of eDNA research has focused on the detection of single species using species-specific markers. Recently, species inventories based on the analysis of a single generalist marker targeting a larger taxonomic group (eDNA metabarcoding) have proven useful for bony fish and amphibian biodiversity surveys. This approach involves in situ filtering of large volumes of water followed by amplification and sequencing of a short discriminative fragment from the 12S rDNA mitochondrial gene. In this study, we went one step further by investigating the spatial representativeness (i.e. ecological reliability and signal variability in space) of eDNA metabarcoding for large-scale fish biodiversity assessment in a freshwater system including lentic and lotic environments. We tested the ability of this approach to characterize large-scale organization of fish communities along a longitudinal gradient, from a lake to the outflowing river. First, our results confirm that eDNA metabarcoding is more efficient than a single traditional sampling campaign to detect species presence, especially in rivers. Second, the species list obtained using this approach is comparable to the one obtained when cumulating all traditional sampling sessions since 1995 and 1988 for the lake and the river, respectively. In conclusion, eDNA metabarcoding gives a faithful description of local fish biodiversity in the study system, more specifically within a range of a few kilometers along the river in our study conditions, i.e. longer than a traditional fish sampling site.

  13. Large scale DNA microsequencing device

    DOEpatents

    Foote, Robert S.

    1997-01-01

    A microminiature sequencing apparatus and method provide means for simultaneously obtaining sequences of plural polynucleotide strands. The apparatus comprises a microchip into which plural channels have been etched using standard lithographic procedures and chemical wet etching. The channels include a reaction well and a separating section. Enclosing the channels is accomplished by bonding a transparent cover plate over the apparatus. A first oligonucleotide strand is chemically affixed to the apparatus through an alkyl chain. Subsequent nucleotides are selected by complementary base pair bonding. A target nucleotide strand is used to produce a family of labelled sequencing strands in each channel which are separated in the separating section. During or following separation the sequences are determined using appropriate detection means.

  14. Large scale DNA microsequencing device

    DOEpatents

    Foote, Robert S.

    1999-01-01

    A microminiature sequencing apparatus and method provide means for simultaneously obtaining sequences of plural polynucleotide strands. The apparatus comprises a microchip into which plural channels have been etched using standard lithographic procedures and chemical wet etching. The channels include a reaction well and a separating section. Enclosing the channels is accomplished by bonding a transparent cover plate over the apparatus. A first oligonucleotide strand is chemically affixed to the apparatus through an alkyl chain. Subsequent nucleotides are selected by complementary base pair bonding. A target nucleotide strand is used to produce a family of labelled sequencing strands in each channel which are separated in the separating section. During or following separation the sequences are determined using appropriate detection means.

  15. Large scale DNA microsequencing device

    DOEpatents

    Foote, R.S.

    1999-08-31

    A microminiature sequencing apparatus and method provide means for simultaneously obtaining sequences of plural polynucleotide strands. The apparatus comprises a microchip into which plural channels have been etched using standard lithographic procedures and chemical wet etching. The channels include a reaction well and a separating section. Enclosing the channels is accomplished by bonding a transparent cover plate over the apparatus. A first oligonucleotide strand is chemically affixed to the apparatus through an alkyl chain. Subsequent nucleotides are selected by complementary base pair bonding. A target nucleotide strand is used to produce a family of labelled sequencing strands in each channel which are separated in the separating section. During or following separation the sequences are determined using appropriate detection means. 11 figs.

  16. iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model

    PubMed Central

    Lin, Wei-Zhong; Fang, Jian-An; Xiao, Xuan; Chou, Kuo-Chen

    2011-01-01

    DNA-binding proteins play crucial roles in various cellular processes. Developing high-throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in the field of genome annotation. Although many efforts have been made in this regard, further effort is needed to enhance the prediction power. By incorporating features extracted from protein sequences via the “grey model” into the general form of pseudo amino acid composition, and by adopting the random forest operation engine, we proposed a new predictor, called iDNA-Prot, for identifying uncharacterized proteins as DNA-binding proteins or non-DNA-binding proteins based on their amino acid sequence information alone. The overall success rate of iDNA-Prot was 83.96%, obtained via jackknife tests on a newly constructed stringent benchmark dataset in which none of the proteins included has pairwise sequence identity to any other in the same subset. In addition to achieving a high success rate, iDNA-Prot requires remarkably less computational time than the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins. As a user-friendly web-server, iDNA-Prot is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results. PMID:21935457

  17. Next-Generation Sequencing Platforms

    NASA Astrophysics Data System (ADS)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  18. Versatile and Programmable DNA Logic Gates on Universal and Label-Free Homogeneous Electrochemical Platform.

    PubMed

    Ge, Lei; Wang, Wenxiao; Sun, Ximei; Hou, Ting; Li, Feng

    2016-10-04

    Herein, a novel universal and label-free homogeneous electrochemical platform is demonstrated, on which a complete set of DNA-based two-input Boolean logic gates (OR, NAND, AND, NOR, INHIBIT, IMPLICATION, XOR, and XNOR) is constructed by simply and rationally deploying the designed DNA polymerization/nicking machines without complicated sequence modulation. Single-stranded DNA is employed as the proof-of-concept target/input to initiate or prevent the DNA polymerization/nicking cyclic reactions on these DNA machines to synthesize numerous intact G-quadruplex sequences or binary G-quadruplex subunits as the output. The generated output strands then self-assemble into G-quadruplexes that render remarkable decrease to the diffusion current response of methylene blue and, thus, provide the amplified homogeneous electrochemical readout signal not only for the logic gate operations but also for the ultrasensitive detection of the target/input. This system represents the first example of homogeneous electrochemical logic operation. Importantly, the proposed homogeneous electrochemical logic gates possess the input/output homogeneity and share a constant output threshold value. Moreover, the modular design of DNA polymerization/nicking machines enables the adaptation of these homogeneous electrochemical logic gates to various input and output sequences. The results of this study demonstrate the versatility and universality of the label-free homogeneous electrochemical platform in the design of biomolecular logic gates and provide a potential platform for the further development of large-scale DNA-based biocomputing circuits and advanced biosensors for multiple molecular targets.
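
    The eight two-input gates named in this abstract can be written out as truth tables. The sketch below is purely logical and says nothing about the electrochemical readout. The names `GATES` and `truth_table` are illustrative, and the input order assumed for INHIBIT (A AND NOT B) and IMPLICATION (A implies B) is a convention chosen here, not something the abstract specifies.

```python
# Truth tables for the two-input Boolean gates listed in the abstract.
# Inputs and outputs are 0 or 1. The asymmetric gates (INHIBIT, IMPLICATION)
# use an assumed input order; the paper may assign the inputs differently.
GATES = {
    "OR": lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),
    "AND": lambda a, b: a & b,
    "NOR": lambda a, b: 1 - (a | b),
    "INHIBIT": lambda a, b: a & (1 - b),      # assumed: A AND NOT B
    "IMPLICATION": lambda a, b: (1 - a) | b,  # assumed: A implies B
    "XOR": lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),
}


def truth_table(name):
    """Return outputs for inputs (0,0), (0,1), (1,0), (1,1), in that order."""
    gate = GATES[name]
    return [gate(a, b) for a in (0, 1) for b in (0, 1)]


if __name__ == "__main__":
    for name in GATES:
        print(f"{name:12s}", truth_table(name))
```

    Each gate maps the four input combinations to a single output bit, which in the described platform would correspond to the presence or absence of the G-quadruplex output signal relative to the shared threshold.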

  19. A large scale analysis of cDNA in Arabidopsis thaliana: generation of 12,028 non-redundant expressed sequence tags from normalized and size-selected cDNA libraries.

    PubMed

    Asamizu, E; Nakamura, Y; Sato, S; Tabata, S

    2000-06-30

    For comprehensive analysis of genes expressed in the model dicotyledonous plant, Arabidopsis thaliana, expressed sequence tags (ESTs) were accumulated. Normalized and size-selected cDNA libraries were constructed from aboveground organs, flower buds, roots, green siliques and liquid-cultured seedlings, respectively, and a total of 14,026 5'-end ESTs and 39,207 3'-end ESTs were obtained. The 3'-end ESTs could be clustered into 12,028 non-redundant groups. Similarity search of the non-redundant ESTs against the public non-redundant protein database indicated that 4816 groups show similarity to genes of known function, 1864 to hypothetical genes, and the remaining 5348 are novel sequences. Gene coverage by the non-redundant ESTs was analyzed using the annotated genomic sequences of approximately 10 Mb on chromosomes 3 and 5. A total of 923 regions were hit by at least one EST, among which only 499 regions were hit by the ESTs deposited in the public database. The result indicates that the EST source generated in this project complements the EST data in the public database and facilitates new gene discovery.

  20. Statistical genetics concepts and approaches in schizophrenia and related neuropsychiatric research.

    PubMed

    Schork, Nicholas J; Greenwood, Tiffany A; Braff, David L

    2007-01-01

    Statistical genetics is a research field that focuses on mathematical models and statistical inference methodologies that relate genetic variations (ie, naturally occurring human DNA sequence variations or "polymorphisms") to particular traits or diseases (phenotypes) usually from data collected on large samples of families or individuals. The ultimate goal of such analysis is the identification of genes and genetic variations that influence disease susceptibility. Although of extreme interest and importance, the fact that many genes and environmental factors contribute to neuropsychiatric diseases of public health importance (eg, schizophrenia, bipolar disorder, and depression) complicates relevant studies and suggests that very sophisticated mathematical and statistical modeling may be required. In addition, large-scale contemporary human DNA sequencing and related projects, such as the Human Genome Project and the International HapMap Project, as well as the development of high-throughput DNA sequencing and genotyping technologies have provided statistical geneticists with a great deal of very relevant and appropriate information and resources. Unfortunately, the use of these resources and their interpretation are not straightforward when applied to complex, multifactorial diseases such as schizophrenia. In this brief and largely nonmathematical review of the field of statistical genetics, we describe many of the main concepts, definitions, and issues that motivate contemporary research. We also provide a discussion of the most pressing contemporary problems that demand further research if progress is to be made in the identification of genes and genetic variations that predispose to complex neuropsychiatric diseases.

  1. SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets.

    PubMed

    Yu, Qiang; Wei, Dingbang; Huo, Hongwei

    2018-06-18

    Given a set of t n-length DNA sequences, q satisfying 0 < q ≤ 1, and l and d satisfying 0 ≤ d < l < n, the quorum planted motif search (qPMS) finds l-length strings that occur in at least qt input sequences with up to d mismatches and is mainly used to locate transcription factor binding sites in DNA sequences. Existing qPMS algorithms have been able to efficiently process small standard datasets (e.g., t = 20 and n = 600), but they are too time consuming to process large DNA datasets, such as ChIP-seq datasets that contain thousands of sequences or more. We analyze the effects of t and q on the time performance of qPMS algorithms and find that a large t or a small q causes a longer computation time. Based on this information, we improve the time performance of existing qPMS algorithms by selecting a sample sequence set D' with a small t and a large q from the large input dataset D and then executing qPMS algorithms on D'. A sample sequence selection algorithm named SamSelect is proposed. The experimental results on both simulated and real data show (1) that SamSelect can select D' efficiently and (2) that the qPMS algorithms executed on D' can find implanted or real motifs in a significantly shorter time than when executed on D. We improve the ability of existing qPMS algorithms to process large DNA datasets from the perspective of selecting high-quality sample sequence sets so that the qPMS algorithms can find motifs in a short time in the selected sample sequence set D', rather than take an unfeasibly long time to search the original sequence set D. Our motif discovery method is an approximate algorithm.
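
    The qPMS problem definition at the start of this abstract can be made concrete with a brute-force sketch. This is not SamSelect and not any of the existing qPMS algorithms the abstract refers to; it simply enumerates all 4^l candidate strings, so it is only usable for very small l. The function names (`hamming`, `occurs`, `naive_qpms`) are illustrative.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, seq, d):
    """True if some window of seq with len(motif) letters is within d mismatches."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def naive_qpms(seqs, l, d, q):
    """Exhaustive qPMS: return every l-length string over ACGT that occurs,
    with up to d mismatches, in at least q * t of the t input sequences."""
    need = q * len(seqs)
    hits = []
    for cand in product("ACGT", repeat=l):
        motif = "".join(cand)
        if sum(occurs(motif, s, d) for s in seqs) >= need:
            hits.append(motif)
    return hits


if __name__ == "__main__":
    seqs = ["ACGTACGT", "TTACGATT", "GGGGGGGG"]
    print(naive_qpms(seqs, l=4, d=0, q=0.5))
```

    The cost of this exhaustive search grows with both the number of sequences t and the number of candidates, which illustrates why the abstract's strategy of running a qPMS algorithm on a smaller sample set D' can save time.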

  2. Diverse Applications of Environmental DNA Methods in Parasitology.

    PubMed

    Bass, David; Stentiford, Grant D; Littlewood, D T J; Hartikainen, Hanna

    2015-10-01

    Nucleic acid extraction and sequencing of genes from organisms within environmental samples encompasses a variety of techniques collectively referred to as environmental DNA or 'eDNA'. The key advantages of eDNA analysis include the detection of cryptic or otherwise elusive organisms, large-scale sampling with fewer biases than specimen-based methods, and generation of data for molecular systematics. These are particularly relevant for parasitology because parasites can be difficult to locate and are morphologically intractable and genetically divergent. However, parasites have rarely been the focus of eDNA studies. Focusing on eukaryote parasites, we review the increasing diversity of the 'eDNA toolbox'. Combining eDNA methods with complementary tools offers much potential to understand parasite communities, disease risk, and parasite roles in broader ecosystem processes such as food web structuring and community assembly. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.

  3. Sequencing and functional validation of the JGI Brachypodium distachyon T-DNA collection

    USDA-ARS's Scientific Manuscript database

    Brachypodium distachyon is a powerful experimental model for the grasses with a large and growing collection of genomic and experimental resources. We have added to these resources by greatly expanding the number of sequence-indexed T-DNA lines. We sequenced 21,165 T-DNA lines, 15,569 of which were ...

  4. Guidelines for whole genome bisulphite sequencing of intact and FFPET DNA on the Illumina HiSeq X Ten.

    PubMed

    Nair, Shalima S; Luu, Phuc-Loi; Qu, Wenjia; Maddugoda, Madhavi; Huschtscha, Lily; Reddel, Roger; Chenevix-Trench, Georgia; Toso, Martina; Kench, James G; Horvath, Lisa G; Hayes, Vanessa M; Stricker, Phillip D; Hughes, Timothy P; White, Deborah L; Rasko, John E J; Wong, Justin J-L; Clark, Susan J

    2018-05-28

    Comprehensive genome-wide DNA methylation profiling is critical to gain insights into epigenetic reprogramming during development and disease processes. Among the different genome-wide DNA methylation technologies, whole genome bisulphite sequencing (WGBS) is considered the gold standard for assaying genome-wide DNA methylation at single base resolution. However, the high sequencing cost to achieve the optimal depth of coverage limits its application in both basic and clinical research. To achieve 15× coverage of the human methylome, using WGBS, requires approximately three lanes of 100-bp-paired-end Illumina HiSeq 2500 sequencing. It is important, therefore, for advances in sequencing technologies to be developed to enable cost-effective high-coverage sequencing. In this study, we provide an optimised WGBS methodology, from library preparation to sequencing and data processing, to enable 16-20× genome-wide coverage per single lane of HiSeq X Ten, HCS 3.3.76. To process and analyse the data, we developed a WGBS pipeline (METH10X) that is fast and can call SNPs. We performed WGBS on both high-quality intact DNA and degraded DNA from formalin-fixed paraffin-embedded tissue. First, we compared different library preparation methods on the HiSeq 2500 platform to identify the best method for sequencing on the HiSeq X Ten. Second, we optimised the PhiX and genome spike-ins to achieve higher quality and coverage of WGBS data on the HiSeq X Ten. Third, we performed integrated whole genome sequencing (WGS) and WGBS of the same DNA sample in a single lane of HiSeq X Ten to improve data output. Finally, we compared methylation data from the HiSeq 2500 and HiSeq X Ten and found high concordance (Pearson r > 0.9×). Together we provide a systematic, efficient and complete approach to perform and analyse WGBS on the HiSeq X Ten. Our protocol allows for large-scale WGBS studies at reasonable processing time and cost on the HiSeq X Ten platform.

  5. GENESUS: a two-step sequence design program for DNA nanostructure self-assembly.

    PubMed

    Tsutsumi, Takanobu; Asakawa, Takeshi; Kanegami, Akemi; Okada, Takao; Tahira, Tomoko; Hayashi, Kenshi

    2014-01-01

    DNA has been recognized as an ideal material for bottom-up construction of nanometer scale structures by self-assembly. The generation of sequences optimized for unique self-assembly (GENESUS) program reported here is a straightforward method for generating sets of strand sequences optimized for self-assembly of arbitrarily designed DNA nanostructures by a generate-candidates-and-choose-the-best strategy. A scalable procedure to prepare single-stranded DNA having arbitrary sequences is also presented. Strands for the assembly of various structures were designed and successfully constructed, validating both the program and the procedure.

  6. Sequence and Structure Dependent DNA-DNA Interactions

    NASA Astrophysics Data System (ADS)

    Kopchick, Benjamin; Qiu, Xiangyun

    Molecular forces between dsDNA strands are largely dominated by electrostatics and have been extensively studied. Quantitative knowledge has been accumulated on how DNA-DNA interactions are modulated by varied biological constituents such as ions, cationic ligands, and proteins. Despite its central role in biology, the sequence of DNA has not received substantial attention, and "random" DNA sequences are typically used in biophysical studies. However, ~50% of the human genome is composed of non-random-sequence DNAs, particularly repetitive sequences. Furthermore, covalent modifications of DNA such as methylation play key roles in gene functions. Such DNAs with specific sequences or modifications often take on structures other than the canonical B-form. Here we present a series of quantitative measurements of the DNA-DNA forces with the osmotic stress method on different DNA sequences, from short repeats to the most frequent sequences in the genome, and to modifications such as bromination and methylation. We observe peculiar behaviors that appear to be strongly correlated with the incurred structural changes. We speculate on the causes in terms of differences in the hydration shell and DNA surface structures.

  7. Enzymatic Synthesis of Self-assembled Dicer Substrate RNA Nanostructures for Programmable Gene Silencing.

    PubMed

    Jang, Bora; Kim, Boyoung; Kim, Hyunsook; Kwon, Hyokyoung; Kim, Minjeong; Seo, Yunmi; Colas, Marion; Jeong, Hansaem; Jeong, Eun Hye; Lee, Kyuri; Lee, Hyukjin

    2018-06-08

    Enzymatic synthesis of RNA nanostructures is achieved by isothermal rolling circle transcription (RCT). Each arm of RNA nanostructures provides a functional role of Dicer substrate RNA inducing sequence specific RNA interference (RNAi). Three different RNAi sequences (GFP, RFP, and BFP) are incorporated within the three-arm junction RNA nanostructures (Y-RNA). The template and helper DNA strands are designed for the large-scale in vitro synthesis of RNA strands to prepare self-assembled Y-RNA. Interestingly, Dicer processing of Y-RNA is highly influenced by its physical structure and different gene silencing activity is achieved depending on its arm length and overhang. In addition, enzymatic synthesis allows the preparation of various Y-RNA structures using a single DNA template offering on demand regulation of multiple target genes.

  8. Almost 20 years of Neanderthal palaeogenetics: adaptation, admixture, diversity, demography and extinction

    PubMed Central

    Sánchez-Quinto, Federico; Lalueza-Fox, Carles

    2015-01-01

    Nearly two decades since the first retrieval of Neanderthal DNA, recent advances in next-generation sequencing technologies have allowed the generation of high-coverage genomes from two archaic hominins, a Neanderthal and a Denisovan, as well as a complete mitochondrial genome from remains which probably represent early members of the Neanderthal lineage. This genomic information, coupled with diversity exome data from several Neanderthal specimens is shedding new light on evolutionary processes such as the genetic basis of Neanderthal and modern human-specific adaptations—including morphological and behavioural traits—as well as the extent and nature of the admixture events between them. An emerging picture is that Neanderthals had a long-term small population size, lived in small and isolated groups and probably practised inbreeding at times. Deleterious genetic effects associated with these demographic factors could have played a role in their extinction. The analysis of DNA from further remains making use of new large-scale hybridization-capture-based methods as well as of new approaches to discriminate contaminant DNA sequences will provide genetic information in spatial and temporal scales that could help clarify the Neanderthal's—and our very own—evolutionary history. PMID:25487326

  9. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

    2011-01-01

    GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 380,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system that integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.

  10. Long-range barcode labeling-sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Feng; Zhang, Tao; Singh, Kanwar K.

    Methods for sequencing single large DNA molecules by clonal multiple displacement amplification using barcoded primers. Sequences are binned based on barcode sequences and sequenced using a microdroplet-based method for sequencing large polynucleotide templates to enable assembly of haplotype-resolved complex genomes and metagenomes.
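
    The binning step mentioned in this record can be sketched in a few lines. The sketch assumes that each read starts with a fixed-length barcode and that barcodes are matched exactly, with no error correction; the record does not state either detail, and the function name `bin_by_barcode` and the parameter `barcode_len` are hypothetical.

```python
from collections import defaultdict


def bin_by_barcode(reads, barcode_len):
    """Group reads by their leading barcode (assumed fixed-length prefix).

    Returns a dict mapping each barcode to the list of read remainders
    (the read with its barcode removed). Reads that are not longer than
    the barcode carry no insert sequence and are skipped.
    """
    bins = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_len:
            continue
        bins[read[:barcode_len]].append(read[barcode_len:])
    return dict(bins)


if __name__ == "__main__":
    reads = ["AAGTTTACG", "AAGCCCGTA", "CCTGGGATC", "AAG"]
    for barcode, group in bin_by_barcode(reads, barcode_len=3).items():
        print(barcode, group)
```

    Each bin would then be assembled separately, so reads derived from the same original molecule stay together.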

  11. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale.

    PubMed

    Liu, Siyang; Huang, Shujia; Rao, Junhua; Ye, Weijian; Krogh, Anders; Wang, Jun

    2015-01-01

    Comprehensive recognition of genomic variation in one individual is important for understanding disease and developing personalized medication and treatment. Many tools based on DNA re-sequencing exist for identification of single nucleotide polymorphisms, small insertions and deletions (indels) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction of population-scale pan-genomes. Our study also highlights the usefulness of the de novo assembly strategy for definition of genome structure.

  12. Compressing DNA sequence databases with coil.

    PubMed

    White, W Timothy J; Hendy, Michael D

    2008-05-20

    Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression - an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression - the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  13. Compressing DNA sequence databases with coil

    PubMed Central

    White, W Timothy J; Hendy, Michael D

    2008-01-01

    Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work. PMID:18489794

  14. Large-scale transcriptome characterization and mass discovery of SNPs in globe artichoke and its related taxa.

    PubMed

    Scaglione, Davide; Lanteri, Sergio; Acquadro, Alberto; Lai, Zhao; Knapp, Steven J; Rieseberg, Loren; Portis, Ezio

    2012-10-01

    Cynara cardunculus (2n = 2× = 34) is a member of the Asteraceae family that contributes significantly to the agricultural economy of the Mediterranean basin. The species includes two cultivated varieties, globe artichoke and cardoon, which are grown mainly for food. Cynara cardunculus is an orphan crop species whose genome/transcriptome has been relatively unexplored, especially in comparison to other Asteraceae crops. Hence, there is a significant need to improve its genomic resources through the identification of novel genes and sequence-based markers, to design new breeding schemes aimed at increasing quality and crop productivity. We report the outcome of cDNA sequencing and assembly for eleven accessions of C. cardunculus. Sequencing of three mapping parental genotypes using Roche 454-Titanium technology generated 1.7 × 10⁶ reads, which were assembled into 38,726 reference transcripts covering 32 Mbp. Putative enzyme-encoding genes were annotated using the KEGG-database. Transcription factors and candidate resistance genes were surveyed as well. Paired-end sequencing was done for cDNA libraries of eight other representative C. cardunculus accessions on an Illumina Genome Analyzer IIx, generating 46 × 10⁶ reads. Alignment of the IGA and 454 reads to reference transcripts led to the identification of 195,400 SNPs with a Bayesian probability exceeding 95%; a validation rate of 90% was obtained by Sanger-sequencing of a subset of contigs. These results demonstrate that the integration of data from different NGS platforms enables large-scale transcriptome characterization, along with massive SNP discovery. This information will contribute to the dissection of key agricultural traits in C. cardunculus and facilitate the implementation of marker-assisted selection programs. © 2012 The Authors. Plant Biotechnology Journal © 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd.

  15. Preparation of metagenomic libraries from naturally occurring marine viruses.

    PubMed

    Solonenko, Sergei A; Sullivan, Matthew B

    2013-01-01

    Microbes are now well recognized as major drivers of the biogeochemical cycling that fuels the Earth, and their viruses (phages) are known to be abundant and important in microbial mortality, horizontal gene transfer, and modulating microbial metabolic output. Investigation of environmental phages has been frustrated by an inability to culture the vast majority of naturally occurring diversity coupled with the lack of robust, quantitative, culture-independent methods for studying this uncultured majority. However, for double-stranded DNA phages, a quantitative viral metagenomic sample-to-sequence workflow now exists. Here, we review these advances with special emphasis on the technical details of preparing DNA sequencing libraries for metagenomic sequencing from environmentally relevant low-input DNA samples. Library preparation steps broadly involve manipulating the sample DNA by fragmentation, end repair and adaptor ligation, size fractionation, and amplification. One critical area of future research and development is parallel advances for alternate nucleic acid types such as single-stranded DNA and RNA viruses that are also abundant in nature. Combinations of recent advances in fragmentation (e.g., acoustic shearing and tagmentation), ligation reactions (adaptor-to-template ratio reference table availability), size fractionation (non-gel-sizing), and amplification (linear amplification for deep sequencing and linker amplification protocols) enhance our ability to generate quantitatively representative metagenomic datasets from low-input DNA samples. Such datasets are already providing new insights into the role of viruses in marine systems and will continue to do so as new environments are explored and synergies and paradigms emerge from large-scale comparative analyses. © 2013 Elsevier Inc. All rights reserved.

  16. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome

    PubMed Central

    2009-01-01

    Background: Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results: We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion: We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. 
We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes. PMID:19656416

  17. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome.

    PubMed

    Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg

    2009-08-06

    Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. 
The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes.

  18. Simulations Using Random-Generated DNA and RNA Sequences

    ERIC Educational Resources Information Center

    Bryce, C. F. A.

    1977-01-01

    Using a very simple computer program written in BASIC, a very large number of random-generated DNA or RNA sequences are obtained. Students use these sequences to predict complementary sequences and translational products, evaluate base compositions, determine frequencies of particular triplet codons, and suggest possible secondary structures.…
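
    The exercises described above (complementary sequences, base composition, triplet codon frequencies) can be sketched in a few lines. The original program was written in BASIC and is not reproduced in the record; the Python below is only an illustrative stand-in, and every function name in it is an assumption made for this sketch.

```python
import random
from collections import Counter

# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def random_dna(length, rng):
    """Return a random DNA string with each base drawn uniformly."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def complement(seq):
    """Base-by-base complement, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Complementary strand read in its own 5'->3' direction."""
    return complement(seq)[::-1]


def transcribe(seq):
    """RNA version of a DNA string (T replaced by U)."""
    return seq.replace("T", "U")


def base_composition(seq):
    """Fraction of each base in the sequence."""
    counts = Counter(seq)
    return {base: counts.get(base, 0) / len(seq) for base in "ACGT"}


def codon_counts(seq):
    """Count non-overlapping triplets in reading frame 0."""
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


rng = random.Random(1977)  # fixed seed so a class sees the same sequence
seq = random_dna(30, rng)
print("DNA:        ", seq)
print("complement: ", complement(seq))
print("as RNA:     ", transcribe(seq))
print("composition:", base_composition(seq))
print("codons:     ", codon_counts(seq).most_common(3))
```

    Translation into amino acids would additionally need a codon table, which is omitted here to keep the sketch short.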

  19. Control of DNA-Functionalized Nanoparticle Assembly

    NASA Astrophysics Data System (ADS)

    Olvera de La Cruz, Monica

    Directed crystallization of a large variety of nanoparticles, including proteins, via DNA hybridization kinetics has led to unique materials with a broad range of crystal symmetries. The nanoparticles are functionalized with DNA chains that link neighboring functionalized units. The shape of the nanoparticle, the DNA length, the sequence of the hybridizing DNA linker and the grafting density determine the crystal symmetries and lattice spacing. By carefully selecting these parameters one can, in principle, achieve all the symmetries found for both atomic and colloidal crystals of asymmetric shapes as well as new symmetries, and drive transitions between them. A scale-accurate coarse-grained model with explicit DNA chains provides the design parameters, including degree of hybridization, to achieve specific crystal structures. The model also provides surface energy values to determine the shape of defect-free single crystals with macroscopic anisotropic properties, as well as the parameters to develop colloidal models that reproduce both the shape of single crystals and their growth kinetics.

  20. Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.

    PubMed

    Salzberg, Steven L; Dunning Hotopp, Julie C; Delcher, Arthur L; Pop, Mihai; Smith, Douglas R; Eisen, Michael B; Nelson, William C

    2005-01-01

    The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of discovering additional genomic sequences beyond those originally sequenced. In particular, if the source DNA for a sequencing project came from a species that was colonized by another organism, then the project may yield substantial amounts of genomic DNA, including near-complete genomes, from the symbiotic or parasitic organism. By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly: Drosophila ananassae, D. simulans, and D. mojavensis. We extracted all sequences with partial matches to a previously sequenced Wolbachia strain and assembled those sequences using customized software. For one of the three new species, the data recovered were sufficient to produce an assembly that covers more than 95% of the genome; for a second species the data produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80% of the genome; and for the third species the data cover approximately 6-7% of the genome. The results of this study reveal an unexpected benefit of depositing raw data in a central genome sequence repository: new species can be discovered within this data. The differences between these three new Wolbachia genomes and the previously sequenced strain revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes. The three new genomes, with annotation, have been deposited in GenBank.
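
    To give a feel for what the three coverage figures above imply about sequencing depth, the sketch below uses the idealized Lander-Waterman (Poisson) model, in which reads land uniformly at random and the expected covered fraction at depth c is 1 - e^(-c). This is an assumption introduced for illustration; the abstract does not say how the authors estimated coverage, and real shotgun data deviate from the model.

```python
import math


def expected_covered_fraction(depth):
    """Expected fraction of positions hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-depth)


def depth_for_fraction(fraction):
    """Depth at which the model predicts the given covered fraction."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction)


# Coverage figures quoted in the abstract for the three Wolbachia genomes.
for label, fraction in [("more than 95%", 0.95),
                        ("75%", 0.75), ("80%", 0.80),
                        ("6%", 0.06), ("7%", 0.07)]:
    print(f"{label:>14}: about {depth_for_fraction(fraction):.2f}-fold depth")
```

    Under this model, 95% coverage corresponds to roughly 3-fold depth, which is consistent with the abstract's description of the second genome as a 'light shotgun' sampling at lower depth.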

  1. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes.

    PubMed

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. 
Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes.

  2. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)—A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes

    PubMed Central

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. 
Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes. PMID:29250096

  3. A new method to cluster genomes based on cumulative Fourier power spectrum.

    PubMed

    Dong, Rui; Zhu, Ziyue; Yin, Changchuan; He, Rong L; Yau, Stephen S-T

    2018-06-20

    Analyzing phylogenetic relationships using mathematical methods has always been important in bioinformatics. Quantitative research may interpret the raw biological data in a precise way. Multiple Sequence Alignment (MSA) is used frequently to analyze biological evolution, but it is very time-consuming. When the scale of the data is large, alignment methods cannot finish the calculation in a reasonable time. Therefore, we present a new method that uses moments of the cumulative Fourier power spectrum to cluster DNA sequences. Each sequence is translated into a vector in Euclidean space. Distances between the vectors can reflect the relationships between sequences. The mapping between the spectra and the moment vector is one-to-one, which means that no information is lost in the power spectra during the calculation. We cluster and classify several datasets including Influenza A, primates, and human rhinovirus (HRV) datasets to build up the phylogenetic trees. Results show that the newly proposed cumulative Fourier power spectrum method is much faster and more accurate than MSA and another alignment-free method based on k-mers. The research provides new insights into the study of phylogeny, evolution, and efficient DNA comparison algorithms for large genomes. The computer programs for the cumulative Fourier power spectrum are available on GitHub (https://github.com/YaulabTsinghua/cumulative-Fourier-power-spectrum). Copyright © 2018. Published by Elsevier B.V.
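
    The general idea (sequence to indicator signals, to power spectrum, to cumulative spectrum, to a short moment vector compared by Euclidean distance) might be sketched as follows. This is one plausible reading of the abstract, not the authors' implementation: the choice of moment orders, the normalization by total power, and all function names are assumptions, and with only three moments per base the mapping here is not one-to-one. The authors' own programs are at the GitHub address given in the abstract.

```python
import cmath
import math


def power_spectrum(indicator):
    """|DFT|^2 of a 0/1 signal for frequencies k = 1 .. n-1 (k = 0 skipped)."""
    n = len(indicator)
    spectrum = []
    for k in range(1, n):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(indicator))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def cumulative(values):
    """Running sum of a list."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out


def moment_vector(seq, orders=(1, 2, 3)):
    """Concatenate low-order moments of the cumulative spectrum per base."""
    if len(seq) < 2:
        raise ValueError("sequence must have at least two bases")
    vec = []
    for base in "ACGT":
        indicator = [1.0 if b == base else 0.0 for b in seq]
        cps = cumulative(power_spectrum(indicator))
        # Assumed normalization: divide by total power so that sequences
        # of different lengths give comparable values.
        scale = cps[-1] if cps[-1] > 1e-12 else 1.0
        for j in orders:
            vec.append(sum((c / scale) ** j for c in cps) / len(cps))
    return vec


def distance(seq_a, seq_b):
    """Euclidean distance between the two moment vectors."""
    return math.dist(moment_vector(seq_a), moment_vector(seq_b))


print(distance("ACGTACGTACGT", "ACGTACGTACGA"))
print(distance("ACGTACGTACGT", "AAAACCCCGGGG"))
```

    The naive DFT above costs O(n^2) per base and is only meant for short toy inputs; an FFT would be the natural replacement for genome-scale sequences.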

  4. Solving satisfiability problems using a novel microarray-based DNA computer.

    PubMed

    Lin, Che-Hsin; Cheng, Hsiao-Ping; Yang, Chang-Biau; Yang, Chia-Ning

    2007-01-01

    An algorithm based on a modified sticker model, combined with an advanced MEMS-based microarray technology, is demonstrated to solve the SAT problem, which has long served as a benchmark in DNA computing. Unlike conventional DNA computing algorithms, which need an initial data pool covering both correct and incorrect answers and then execute a series of separation procedures to destroy the unwanted ones, we built solutions in parts to satisfy one clause per step, and eventually solved the entire Boolean formula step by step. No time-consuming sample preparation procedures or delicate sample-application equipment were required for the computing process. Moreover, experimental results show that the bound DNA sequences can withstand the chemical solutions used during the computing processes, so the proposed method should be useful in dealing with large-scale problems.
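
    The clause-by-clause strategy can be illustrated in software. The sketch below is only an analogy to the wet-lab procedure, which the abstract does not detail: it grows partial truth assignments one clause at a time instead of filtering a pool of all candidate assignments. Literals are written as signed integers (3 for x3, -3 for NOT x3), a convention assumed here.

```python
def extend_by_clause(partials, clause):
    """Extend each partial assignment so that it satisfies one more clause."""
    extended = set()
    for partial in partials:
        assignment = dict(partial)
        for literal in clause:
            var, value = abs(literal), literal > 0
            # Keep the extension only if it does not contradict
            # a value already chosen for this variable.
            if assignment.get(var, value) == value:
                new_assignment = dict(assignment)
                new_assignment[var] = value
                extended.add(frozenset(new_assignment.items()))
    return extended


def solve_sat(clauses):
    """Return partial assignments satisfying every clause ([] if none exist)."""
    partials = {frozenset()}  # start from the empty assignment
    for clause in clauses:
        partials = extend_by_clause(partials, clause)
        if not partials:
            return []
    return [dict(p) for p in partials]


# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
formula = [[1, -2], [2, 3], [-1, -3]]
for solution in solve_sat(formula):
    print(dict(sorted(solution.items())))
```

    Variables left out of a returned assignment are free: any value for them still satisfies the formula.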

  5. Algorithms for optimizing cross-overs in DNA shuffling.

    PubMed

    He, Lu; Friedman, Alan M; Bailey-Kellogg, Chris

    2012-03-21

    DNA shuffling generates combinatorial libraries of chimeric genes by stochastically recombining parent genes. The resulting libraries are subjected to large-scale genetic selection or screening to identify those chimeras with favorable properties (e.g., enhanced stability or enzymatic activity). While DNA shuffling has been applied quite successfully, it is limited by its homology-dependent, stochastic nature. Consequently, it is used only with parents of sufficient overall sequence identity, and provides no control over the resulting chimeric library. This paper presents efficient methods to extend the scope of DNA shuffling to handle significantly more diverse parents and to generate more predictable, optimized libraries. Our CODNS (cross-over optimization for DNA shuffling) approach employs polynomial-time dynamic programming algorithms to select codons for the parental amino acids, allowing for zero or a fixed number of conservative substitutions. We first present efficient algorithms to optimize the local sequence identity or the nearest-neighbor approximation of the change in free energy upon annealing, objectives that were previously optimized by computationally-expensive integer programming methods. We then present efficient algorithms for more powerful objectives that seek to localize and enhance the frequency of recombination by producing "runs" of common nucleotides either overall or according to the sequence diversity of the resulting chimeras. We demonstrate the effectiveness of CODNS in choosing codons and allocating substitutions to promote recombination between parents targeted in earlier studies: two GAR transformylases (41% amino acid sequence identity), two very distantly related DNA polymerases, Pol X and β (15%), and beta-lactamases of varying identity (26-47%). 
Our methods provide the protein engineer with a new approach to DNA shuffling that supports substantially more diverse parents, is more deterministic, and generates more predictable and more diverse chimeric libraries.

  6. Germline whole exome sequencing and large-scale replication identifies FANCM as a likely high grade serous ovarian cancer susceptibility gene.

    PubMed

    Dicks, Ed; Song, Honglin; Ramus, Susan J; Oudenhove, Elke Van; Tyrer, Jonathan P; Intermaggio, Maria P; Kar, Siddhartha; Harrington, Patricia; Bowtell, David D; Group, Aocs Study; Cicek, Mine S; Cunningham, Julie M; Fridley, Brooke L; Alsop, Jennifer; Jimenez-Linan, Mercedes; Piskorz, Anna; Goranova, Teodora; Kent, Emma; Siddiqui, Nadeem; Paul, James; Crawford, Robin; Poblete, Samantha; Lele, Shashi; Sucheston-Campbell, Lara; Moysich, Kirsten B; Sieh, Weiva; McGuire, Valerie; Lester, Jenny; Odunsi, Kunle; Whittemore, Alice S; Bogdanova, Natalia; Dürst, Matthias; Hillemanns, Peter; Karlan, Beth Y; Gentry-Maharaj, Aleksandra; Menon, Usha; Tischkowitz, Marc; Levine, Douglas; Brenton, James D; Dörk, Thilo; Goode, Ellen L; Gayther, Simon A; Pharoah, D P Paul

    2017-08-01

    We analyzed whole exome sequencing data in germline DNA from 412 high grade serous ovarian cancer (HGSOC) cases from The Cancer Genome Atlas Project and identified 5,517 genes harboring a predicted deleterious germline coding mutation in at least one HGSOC case. Gene-set enrichment analysis showed enrichment for genes involved in DNA repair (p = 1.8×10⁻³). Twelve DNA repair genes - APEX1, APLF, ATX, EME1, FANCL, FANCM, MAD2L2, PARP2, PARP3, POLN, RAD54L and SMUG1 - were prioritized for targeted sequencing in up to 3,107 HGSOC cases, 1,491 cases of other epithelial ovarian cancer (EOC) subtypes and 3,368 unaffected controls of European origin. We estimated mutation prevalence for each gene and tested for associations with disease risk. Mutations were identified in both cases and controls in all genes except MAD2L2, where we found no evidence of mutations in controls. In FANCM we observed a higher mutation frequency in HGSOC cases compared to controls (29/3,107 cases, 0.96 percent; 13/3,368 controls, 0.38 percent; P=0.008) with little evidence for association with other subtypes (6/1,491, 0.40 percent; P=0.82). The relative risk of HGSOC associated with deleterious FANCM mutations was estimated to be 2.5 (95% CI 1.3 - 5.0; P=0.006). In summary, whole exome sequencing of EOC cases with large-scale replication in case-control studies has identified FANCM as a likely novel susceptibility gene for HGSOC, with mutations associated with a moderate increase in risk. These data may have clinical implications for risk prediction and prevention approaches for high-grade serous ovarian cancer in the future and a significant impact on reducing disease mortality.

  7. DNA sequence analysis with droplet-based microfluidics

    PubMed Central

    Abate, Adam R.; Hung, Tony; Sperling, Ralph A.; Mary, Pascaline; Rotem, Assaf; Agresti, Jeremy J.; Weiner, Michael A.; Weitz, David A.

    2014-01-01

    Droplet-based microfluidic techniques can form and process micrometer scale droplets at thousands per second. Each droplet can house an individual biochemical reaction, allowing millions of reactions to be performed in minutes with small amounts of total reagent. This versatile approach has been used for engineering enzymes, quantifying concentrations of DNA in solution, and screening protein crystallization conditions. Here, we use it to read the sequences of DNA molecules with a FRET-based assay. Using probes of different sequences, we interrogate a target DNA molecule for polymorphisms. With a larger probe set, additional polymorphisms can be interrogated as well as targets of arbitrary sequence. PMID:24185402

  8. Large scale DNA microsequencing device

    DOEpatents

    Foote, R.S.

    1997-08-26

    A microminiature sequencing apparatus and method provide a means for simultaneously obtaining sequences of plural polynucleotide strands. The apparatus consists of a microchip into which plural channels have been etched using standard lithographic procedures and chemical wet etching. The channels include a reaction well and a separating section. Enclosing the channels is accomplished by bonding a transparent cover plate over the apparatus. A first oligonucleotide strand is chemically affixed to the apparatus through an alkyl chain. Subsequent nucleotides are selected by complementary base pair bonding. A target nucleotide strand is used to produce a family of labelled sequencing strands in each channel which are separated in the separating section. During or following separation the sequences are determined using appropriate detection means. 17 figs.

  9. Emerging pathogens in the fish farming industry and sequencing-based pathogen discovery.

    PubMed

    Tengs, Torstein; Rimstad, Espen

    2017-10-01

    The use of large-scale DNA/RNA sequencing has become an integral part of biomedical research. Reduced sequencing costs and the availability of efficient computational resources have led to a revolution in how problems concerning genomics and transcriptomics are addressed. Sequencing-based pathogen discovery represents one example of how genetic data can now be used in ways that were previously considered infeasible. Emerging pathogens affect both human and animal health due to a multitude of factors, including globalization, a shifting environment and an increasing human population. Fish farming represents a relevant, interesting and challenging system to study emerging pathogens. This review summarizes recent progress in pathogen discovery using sequence data, with particular emphasis on viruses in Atlantic salmon (Salmo salar). Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  11. Genomics: The Science and Technology Behind the Human Genome Project (by Charles R. Cantor and Cassandra L. Smith)

    NASA Astrophysics Data System (ADS)

    Serra, Martin J. (reviewer)

    2000-01-01

    Genomics is one of the most rapidly expanding areas of science. This book is an outgrowth of a series of lectures given by one of the former heads (CRC) of the Human Genome Initiative. The book is designed to reach a wide audience, from biologists with little chemical or physical science background through engineers, computer scientists, and physicists with little current exposure to the chemical or biological principles of genetics. The text starts with a basic review of the chemical and biological properties of DNA. However, without either a biochemistry background or a supplemental biochemistry text, this chapter and much of the rest of the text would be difficult to digest. The second chapter is designed to put DNA into the context of the larger chromosomal unit. Specialized chromosomal structures and sequences (centromeres, telomeres) are introduced, leading to a section on chromosome organization and purification. The next 4 chapters cover the physical (hybridization, electrophoresis), chemical (polymerase chain reaction), and biological (genetic) techniques that provide the backbone of genomic analysis. These chapters cover in significant detail the fundamental principles underlying each technique and provide a firm background for the remainder of the text. Chapters 7-9 consider the need and methods for the development of physical maps. Chapter 7 primarily discusses chromosomal localization techniques, including in situ hybridization, FISH, and chromosome paintings. The next two chapters focus on the development of libraries and clones. In particular, Chapter 9 considers the limitations of current mapping and clone production. The current state and future of DNA sequencing is covered in the next three chapters. The first considers the current methods of DNA sequencing - especially gel-based methods of analysis, although other possible approaches (mass spectrometry) are introduced. 
Much of the chapter addresses the limitations of current methods, including analysis of error in sequencing and current bottlenecks in the sequencing effort. The next chapter describes the steps necessary to scale current technologies for the sequencing of entire genomes. Chapter 12 examines alternate methods for DNA sequencing. Initially, methods of single-molecule sequencing and sequencing by microscopy are introduced; the majority of the chapter is devoted to the development of DNA sequencing methods using chip microarrays and hybridization. The remaining chapters (13-15) consider the uses and analysis of DNA sequence information. The initial focus is on the identification of genes. Several examples are given of the use of DNA sequence information for diagnosis of inherited or infectious diseases. The sequence-specific manipulation of DNA is discussed in Chapter 14. The final chapter deals with the implications of large-scale sequencing, including methods for identifying genes and finding errors in DNA sequences, to the development of computer algorithms for the interpretation of DNA sequence information. The text figures are black and white line drawings that, although clearly done, seem a bit primitive for 1999. While I appreciated the simplicity of the drawings, many students accustomed to more colorful presentations will find them wanting. The four color figures in the center of the text seem an afterthought and add little to the text's clarity. Each chapter has a set of additional reading sources, mostly primary sources. Often, specialized topics are offset into boxes that provide clarification and amplification without cluttering the text. An appendix includes a list of the Web-based database resources. As an undergraduate instructor who has previously taught biochemistry, molecular biology, and a course on the human genome, I found many interesting tidbits and amplifications throughout the text. 
I would recommend this book as a text for an advanced undergraduate or beginning graduate course in genomics. Although the text works through several examples of genetic and genome analysis, additional problem/homework sets would need to be developed to ensure student comprehension. The text steers clear of the ethical implications of the Human Genome Initiative and remains true to its subtitle, The Science and Technology.

  12. Constructing DNA Barcode Sets Based on Particle Swarm Optimization.

    PubMed

    Wang, Bin; Zheng, Xuedong; Zhou, Shihua; Zhou, Changjun; Wei, Xiaopeng; Zhang, Qiang; Wei, Ziqi

    2018-01-01

    Following the completion of the human genome project, a large amount of high-throughput bio-data was generated. To analyze these data, massively parallel sequencing, namely next-generation sequencing, was rapidly developed. DNA barcodes are used to identify the ownership between sequences and samples when they are attached at the beginning or end of sequencing reads. Constructing DNA barcode sets provides the candidate DNA barcodes for this application. To increase the accuracy of DNA barcode sets, a particle swarm optimization (PSO) algorithm has been modified and used to construct the DNA barcode sets in this paper. Compared with the extant results, some lower bounds of DNA barcode sets are improved. The results show that the proposed algorithm is effective in constructing DNA barcode sets.
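
    A barcode set is useful when any two barcodes differ in enough positions that a few sequencing errors cannot turn one into another. The sketch below builds such a set with a plain greedy scan under a minimum Hamming distance. It deliberately swaps in a much simpler technique than the modified particle swarm optimization of the paper, whose details the abstract does not give, and it ignores other constraints (such as GC content) that practical barcode designs often add. Function names are assumptions.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_barcodes(length, min_distance):
    """Scan all words in lexicographic order, keeping each one that is
    at least min_distance away from every barcode kept so far."""
    chosen = []
    for candidate in product("ACGT", repeat=length):
        code = "".join(candidate)
        if all(hamming(code, other) >= min_distance for other in chosen):
            chosen.append(code)
    return chosen


barcodes = greedy_barcodes(length=4, min_distance=3)
print(len(barcodes), "barcodes, for example:", barcodes[:4])
```

    The size of any set built this way is a lower bound on the largest possible set for that length and distance, which is the quantity the abstract reports improving. The exhaustive scan grows as 4^length, so it is only practical for short barcodes.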

  13. [Current situation and prospect of breast cancer liquid biopsy].

    PubMed

    Zhou, B; Xin, L; Xu, L; Ye, J M; Liu, Y H

    2018-02-01

    Liquid biopsy is a diagnostic approach based on analyzing body fluid samples. Peripheral blood is the most common sample; urine, saliva, pleural effusion and ascites are also used. At present, liquid biopsy is mainly used in the diagnosis and treatment of neoplasms. Compared with traditional tissue biopsy, liquid biopsy is minimally invasive, convenient to sample and easy to repeat. Liquid biopsy mainly comprises detection of circulating tumor cells and circulating tumor DNA (ctDNA). Detection of ctDNA requires sensitive and accurate methods. Progress in next-generation sequencing (NGS) and digital PCR has advanced ctDNA research. In 2016, Nature published the results of a whole-genome sequencing study of breast cancer. The study found 1,628 mutations in 93 protein-coding genes that may be driver mutations of breast cancer. These results provided a new platform for breast cancer ctDNA studies. In recent years, many studies have used ctDNA detection to monitor therapeutic effect and guide treatment. NGS is a promising technique for accessing genetic information and guiding targeted therapy. It must be emphasized that ctDNA detection using NGS is still at the research stage. It is important to standardize ctDNA detection techniques and to perform prospective clinical studies. The time is not yet ripe for using ctDNA detection to guide large-scale clinical practice in breast cancer.

  14. cDNA cloning and heterologous expression of a wheat proteinase inhibitor of subtilisin and chymotrypsin (WSCI) that interferes with digestive enzymes of insect pests.

    PubMed

    Di Gennaro, Simone; Ficca, Anna G; Panichi, Daniela; Poerio, Elia

    2005-04-01

    A cDNA encoding the proteinase inhibitor WSCI (wheat subtilisin/chymotrypsin inhibitor) was isolated by RT-PCR. Degenerate oligonucleotide primers were designed based on the amino acid sequence of WSCI and on the nucleotide sequences of the two homologous inhibitors (CI-2A and CI-2B) isolated from barley. For large-scale production, wsci cDNA was cloned into the E. coli vector pGEX-2T. The fusion protein GST-WSCI was efficiently produced in the bacterial expression system and, like the native inhibitor, was capable of inhibiting bacterial subtilisin, mammalian chymotrypsins and chymotrypsin-like activities present in crude extracts of a number of insect larvae (Helicoverpa armigera, Plodia interpunctella and Tenebrio molitor). The recombinant protein was also able to interfere with chymotrypsin-like activity isolated from immature wheat caryopses. These findings support a physiological role for this inhibitor during grain maturation.

  15. Dynamic DNA cytosine methylation in the Populus trichocarpa genome: tissue-level variation and relationship to gene expression

    PubMed Central

    2012-01-01

    Background: DNA cytosine methylation is an epigenetic modification that has been implicated in many biological processes. However, large-scale epigenomic studies have been applied to very few plant species, and variability in methylation among specialized tissues and its relationship to gene expression is poorly understood. Results: We surveyed DNA methylation from seven distinct tissue types (vegetative bud, male inflorescence [catkin], female catkin, leaf, root, xylem, phloem) in the reference tree species black cottonwood (Populus trichocarpa). Using 5-methyl-cytosine DNA immunoprecipitation followed by Illumina sequencing (MeDIP-seq), we mapped a total of 129,360,151 36- or 32-mer reads to the P. trichocarpa reference genome. We validated MeDIP-seq results by bisulfite sequencing, and compared methylation and gene expression using published microarray data. Qualitative DNA methylation differences among tissues were obvious on a chromosome scale. Methylated genes had lower expression than unmethylated genes, but genes with methylation in transcribed regions ("gene body methylation") had even lower expression than genes with promoter methylation. Promoter methylation was more frequent than gene body methylation in all tissues except male catkins. Male catkins differed in demethylation of particular transposable element categories, in level of gene body methylation, and in expression range of genes with methylated transcribed regions. Tissue-specific gene expression patterns were correlated with both gene body and promoter methylation. Conclusions: We found striking differences among tissues in methylation, which were apparent at the chromosomal scale and when genes and transposable elements were examined. In contrast to other studies in plants, gene body methylation had a more repressive effect on transcription than promoter methylation. PMID:22251412

  16. Assay for identification of heterozygous single-nucleotide polymorphism (Ala67Thr) in human poliovirus receptor gene.

    PubMed

    Nandi, Shyam Sundar; Sharma, Deepa Kailash; Deshpande, Jagadish M

    2016-07-01

    It is important to understand the role of cell surface receptors in susceptibility to infectious diseases. CD155, a member of the immunoglobulin superfamily, serves as the poliovirus receptor (PVR). A heterozygous (Ala67Thr) polymorphism in CD155 has been suggested as a risk factor for paralytic outcome of poliovirus infection. The present study describes the development of a screening test to detect this single-nucleotide polymorphism (SNP) in the CD155 gene. New primers were designed for PCR, sequencing and SNP analysis of exon 2 of the CD155 gene. DNA extracted from either whole blood (n=75) or cells from the oral cavity (n=75) was used for standardization and validation of the SNP assay. DNA sequencing was used as the gold standard method. A new SNP assay for detection of the heterozygous Ala67Thr genotype was developed and validated by testing 150 DNA samples. Heterozygous CD155 was detected in 27.33 per cent (41/150) of DNA samples tested by both the SNP detection assay and sequencing. The SNP detection assay was thus successfully developed for identification of the Ala67Thr polymorphism in the human PVR/CD155 gene and will be useful for large-scale screening of DNA samples.

  17. Progress in ion torrent semiconductor chip based sequencing.

    PubMed

    Merriman, Barry; Rothberg, Jonathan M

    2012-12-01

    In order for next-generation sequencing to become widely used as a diagnostic in the healthcare industry, sequencing instrumentation will need to be mass produced with a high degree of quality and economy. One way to achieve this is to recast DNA sequencing in a format that fully leverages the manufacturing base created for computer chips, complementary metal-oxide semiconductor chip fabrication, which is the current pinnacle of large-scale, high-quality, low-cost manufacturing of high technology. To achieve this, ideally the entire sensory apparatus of the sequencer would be embodied in a standard semiconductor chip, manufactured in the same fab facilities used for logic and memory chips. Recently, such a sequencing chip, and the associated sequencing platform, has been developed and commercialized by Ion Torrent, a division of Life Technologies, Inc. Here we provide an overview of this semiconductor chip based sequencing technology, and summarize the progress made since its commercial introduction. We describe in detail the progress in chip scaling, sequencing throughput, read length, and accuracy. We also summarize the enhancements in the associated platform, including sample preparation, data processing, and engagement of the broader development community through open source and crowdsourcing initiatives. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. 3' rapid amplification of cDNA ends (RACE) walking for rapid structural analysis of large transcripts.

    PubMed

    Ozawa, Tatsuhiko; Kondo, Masato; Isobe, Masaharu

    2004-01-01

    The 3' rapid amplification of cDNA ends (3' RACE) is widely used to isolate cDNA with unknown 3' flanking sequences. However, conventional 3' RACE often fails to amplify cDNA from a large transcript if there is a long distance between the 5' gene-specific primer and the poly(A) stretch, since conventional 3' RACE uses a 3' oligo-dT-containing primer complementary to the poly(A) tail of mRNA for first-strand cDNA synthesis. To overcome this problem, we have developed an improved 3' RACE method suitable for the isolation of cDNA derived from very large transcripts. By using an oligonucleotide containing a random 9-mer together with a GC-rich sequence for suppression PCR in first-strand cDNA synthesis, we have been able to amplify cDNA from a very large transcript, such as that of the microtubule-actin crosslinking factor 1 (MACF1) gene, which encodes a transcript of 20 kb in size. When there is no splicing variant, our highly specific amplification allows us to perform direct sequencing of 3' RACE products without requiring cloning in bacterial hosts. Thus, this stepwise 3' RACE walking will help rapid characterization of the 3' structure of a gene, even when it encodes a very large transcript.

  19. Noninvasive prenatal diagnosis of common aneuploidies by semiconductor sequencing

    PubMed Central

    Liao, Can; Yin, Ai-hua; Peng, Chun-fang; Fu, Fang; Yang, Jie-xia; Li, Ru; Chen, Yang-yi; Luo, Dong-hong; Zhang, Yong-ling; Ou, Yan-mei; Li, Jian; Wu, Jing; Mai, Ming-qin; Hou, Rui; Wu, Frances; Luo, Hongrong; Li, Dong-zhi; Liu, Hai-liang; Zhang, Xiao-zhuang; Zhang, Kang

    2014-01-01

    Massively parallel sequencing (MPS) of cell-free fetal DNA from maternal plasma has revolutionized our ability to perform noninvasive prenatal diagnosis. This approach avoids the risk of fetal loss associated with more invasive diagnostic procedures. The present study developed an effective method for noninvasive prenatal diagnosis of common chromosomal aneuploidies using a benchtop semiconductor sequencing platform (SSP), which relies on the MPS platform but offers advantages over existing noninvasive screening techniques. A total of 2,275 pregnant subjects was included in the study; of these, 515 subjects who had full karyotyping results were used in a retrospective analysis, and 1,760 subjects without karyotyping were analyzed in a prospective study. In the retrospective study, all 55 fetal trisomy 21 cases were identified using the SSP with a sensitivity and specificity of 99.94% and 99.46%, respectively. The SSP also detected 16 trisomy 18 cases with 100% sensitivity and 99.24% specificity and 3 trisomy 13 cases with 100% sensitivity and 100% specificity. Furthermore, 15 fetuses with sex chromosome aneuploidies (10 45,X, 2 47,XYY, 2 47,XXX, and 1 47,XXY) were detected. In the prospective study, nine fetuses with trisomy 21, three with trisomy 18, three with trisomy 13, and one with 45,X were detected. To our knowledge, this is the first large-scale clinical study to systematically identify chromosomal aneuploidies based on cell-free fetal DNA using the SSP and provides an effective strategy for large-scale noninvasive screening for chromosomal aneuploidies in a clinical setting. PMID:24799683

  20. Optical Materials with a Genome: Nanophotonics with DNA-Stabilized Silver Clusters

    NASA Astrophysics Data System (ADS)

    Copp, Stacy M.

    Fluorescent silver clusters with unique rod-like geometries are stabilized by DNA. The sizes and colors of these clusters, or AgN-DNA, are selected by DNA base sequence, which can tune peak emission from blue-green into the near-infrared. Combined with DNA nanostructures, AgN-DNA promise exciting applications in nanophotonics and sensing. Until recently, however, a lack of understanding of the mechanisms controlling AgN-DNA fluorescence has challenged such applications. This dissertation discusses progress toward understanding the role of DNA as a "genome" for silver clusters and toward using DNA to achieve atomic-scale precision of silver cluster size and nanometer-scale precision of silver cluster position on a DNA breadboard. We also investigate sensitivity of AgN-DNA to local solvent environment, with an eye toward applications in chemical and biochemical sensing. Using robotic techniques to generate large data sets, we show that fluorescent silver clusters are templated by certain DNA base motifs that select "magic-sized" cluster cores of enhanced stabilities. The linear arrangement of bases on the phosphate backbone imposes a unique rod-like geometry on the clusters. Harnessing machine learning and bioinformatics techniques, we also demonstrate that sequences of DNA templates can be selected to stabilize silver clusters with desired optical properties, including high fluorescence intensity and specific fluorescence wavelengths, with much higher rates of success as compared to current strategies. The discovered base motifs can also be used to design modular DNA host strands that enable individual silver clusters with atomically precise sizes to bind at specific programmed locations on a DNA nanostructure. We show that DNA-mediated nanoscale arrangement enables near-field coupling of distinct clusters, demonstrated by dual-color cluster assemblies exhibiting resonant energy transfer.
    These results demonstrate a new degree of control over the optical properties and relative positions of nanoparticles, selected almost solely by the sequence of DNA. AgN-DNA are promising chemical and biochemical sensors due to the sensitivity of their fluorescence to local environment. However, the mechanisms behind many sensing schemes are not understood, and the nature of the excited state of the silver cluster itself remains unknown. To probe the fluorescence mechanisms of AgN-DNA, we investigate the behavior of purified solutions of these clusters in various solvents. We find that standard models for fluorophore solvatochromism, including the Lippert-Mataga model, do not describe AgN-DNA fluorescence because such models neglect specific interactions between the cluster and surrounding solvent molecules. Fluorescence colors are well-modeled by Mie-Gans theory, suggesting that the local dielectric environment of the cluster does play a role in fluorescence, although additional specific solvent interactions and cluster shape changes may also determine fluorescence color and intensity. These results suggest that AgN-DNA may be sensitive to changes in local dielectric environment on nanometer length scales and may also act as sensors for small molecules with affinity for DNA.

  1. Equilibrium properties of DNA and other semiflexible polymers confined in nanochannels

    NASA Astrophysics Data System (ADS)

    Muralidhar, Abhiram

    Recent developments in next-generation sequencing (NGS) techniques have opened the door for low-cost, high-throughput sequencing of genomes. However, these developments have also exposed the inability of NGS to track large-scale genomic information, which is extremely important for understanding the relationship between genotype and phenotype. Genome mapping offers a reliable way to obtain information about large-scale structural variations in a given genome. A promising variant of genome mapping involves confining single DNA molecules in nanochannels whose cross-sectional dimensions are approximately 50 nm. Despite the development and commercialization of nanochannel-based genome mapping technology, the polymer physics of DNA in confinement is only beginning to be understood. Apart from its biological relevance, DNA is also used as a model polymer in experiments by polymer physicists. Indeed, the seminal experiments by Reisner et al. (2005) of DNA confined in nanochannels of different widths revealed discrepancies with the classical theories of Odijk and de Gennes for polymer confinement. Picking up from the conclusions of the dissertation of Tree (2014), this dissertation addresses a number of key outstanding problems in the area of nanoconfined DNA. Adopting a Monte Carlo chain growth technique known as the pruned-enriched Rosenbluth method, we examine the equilibrium and near-equilibrium properties of DNA and other semiflexible polymers in nanochannel confinement. We begin by analyzing the dependence of various thermodynamic properties of confined semiflexible polymers on molecular weight. This allows us to point out the finite size effects that can occur when using low molecular weight DNA in experiments. We then analyze the statistics of backfolding and hairpin formation in the context of existing theories and discuss how our results can be used to engineer better conditions for genome mapping.
    Finally, we elucidate the diffusion behavior of confined semiflexible polymers by comparing and contrasting our results for asymptotically long chains with other similar studies in the literature. We expect our findings to be beneficial not only to the design of better genome mapping devices, but also to the fundamental understanding of semiflexible polymers in confinement.
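
    The abstract above names a chain growth technique, the pruned-enriched Rosenbluth method. The sketch below is not that method and not the dissertation's model: it shows only the underlying Rosenbluth weighting idea for a fully flexible self-avoiding walk on an unconfined 2D square lattice, with no pruning, no enrichment, no bending stiffness and no channel walls. All names are hypothetical.

```python
import random

# Allowed lattice steps on a 2D square lattice.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def grow_chain(n_steps, rng):
    """Grow one self-avoiding walk step by step.

    Returns (weight, squared end-to-end distance). A weight of 0.0
    means the walk became trapped before reaching n_steps.
    """
    pos = (0, 0)
    visited = {pos}
    weight = 1.0
    for _ in range(n_steps):
        free = [(pos[0] + dx, pos[1] + dy) for dx, dy in MOVES
                if (pos[0] + dx, pos[1] + dy) not in visited]
        if not free:
            return 0.0, 0
        # Rosenbluth factor: the number of free sites corrects for the
        # bias introduced by only ever stepping onto unvisited sites.
        weight *= len(free)
        pos = rng.choice(free)
        visited.add(pos)
    return weight, pos[0] ** 2 + pos[1] ** 2


def mean_square_extension(n_steps, n_samples, seed=0):
    """Weighted average of the squared end-to-end distance."""
    rng = random.Random(seed)
    total_w = 0.0
    total_wr2 = 0.0
    for _ in range(n_samples):
        w, r2 = grow_chain(n_steps, rng)
        total_w += w
        total_wr2 += w * r2
    return total_wr2 / total_w
```

    For long chains the weights in this plain scheme vary over many orders of magnitude and most walks become trapped, so the estimate is dominated by a few samples. Pruning low-weight chains and duplicating high-weight ones during growth is the refinement that the pruned-enriched variant adds to control that problem.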

  2. Dimensions of biodiversity in the Earth mycobiome.

    PubMed

    Peay, Kabir G; Kennedy, Peter G; Talbot, Jennifer M

    2016-07-01

    Fungi represent a large proportion of the genetic diversity on Earth and fungal activity influences the structure of plant and animal communities, as well as rates of ecosystem processes. Large-scale DNA-sequencing datasets are beginning to reveal the dimensions of fungal biodiversity, which seem to be fundamentally different to bacteria, plants and animals. In this Review, we describe the patterns of fungal biodiversity that have been revealed by molecular-based studies. Furthermore, we consider the evidence that supports the roles of different candidate drivers of fungal diversity at a range of spatial scales, as well as the role of dispersal limitation in maintaining regional endemism and influencing local community assembly. Finally, we discuss the ecological mechanisms that are likely to be responsible for the high heterogeneity that is observed in fungal communities at local scales.

  3. A phylogenetic analysis of armored scale insects (Hemiptera: Diaspididae), based upon nuclear, mitochondrial, and endosymbiont gene sequences.

    PubMed

    Andersen, Jeremy C; Wu, Jin; Gruwell, Matthew E; Gwiazdowski, Rodger; Santana, Sharlene E; Feliciano, Natalie M; Morse, Geoffrey E; Normark, Benjamin B

    2010-12-01

    Armored scale insects (Hemiptera: Diaspididae) are among the most invasive insects in the world. They have unusual genetic systems, including diverse types of paternal genome elimination (PGE) and parthenogenesis. Intimate relationships with their host plants and bacterial endosymbionts make them potentially important subjects for the study of co-evolution. Here, we expand upon recent phylogenetic work (Morse and Normark, 2006) by analyzing armored scale and endosymbiont DNA sequences from 125 species of armored scale insect, represented by 253 samples and eight outgroup species. We used fragments of four different gene regions: the nuclear protein-coding gene Elongation Factor 1α (EF1α), the large ribosomal subunit (28S) rDNA, a mitochondrial region spanning parts of cytochrome oxidase I (COI) and cytochrome oxidase II (COII), and the small ribosomal subunit (16S) rDNA from the primary bacterial endosymbiont Uzinura diaspidicola. Maximum likelihood and Bayesian analyses were performed, producing highly congruent topological results. A comparison of two datasets, one with and one without missing data, found that missing data had little effect on topology. Our results broadly corroborate several major features of the existing classification, although we do not find any of the subfamilies, tribes or subtribes to be monophyletic as currently constituted. Using ancestral state reconstruction we estimate that the ancestral armored scale had the late PGE sex system, and it may also have been pupillarial, though results differed between reconstruction methods. These results highlight the need for a complete revision of this family, and provide the groundwork for future taxonomic work in armored scale insects. Copyright © 2010 Elsevier Inc. All rights reserved.

  4. cgDNAweb: a web interface to the cgDNA sequence-dependent coarse-grain model of double-stranded DNA.

    PubMed

    De Bruin, Lennart; Maddocks, John H

    2018-06-14

    The sequence-dependent statistical mechanical properties of fragments of double-stranded DNA are believed to be pertinent to its biological function at length scales from a few base pairs (or bp) to a few hundreds of bp, e.g. indirect read-out protein binding sites, nucleosome positioning sequences, phased A-tracts, etc. In turn, the equilibrium statistical mechanics behaviour of DNA depends upon its ground state configuration, or minimum free energy shape, as well as on its fluctuations as governed by its stiffness (in an appropriate sense). We here present cgDNAweb, which provides browser-based interactive visualization of the sequence-dependent ground states of double-stranded DNA molecules, as predicted by the underlying cgDNA coarse-grain rigid-base model of fragments with arbitrary sequence. The cgDNAweb interface is specifically designed to facilitate comparison between ground state shapes of different sequences. The server is freely available at cgDNAweb.epfl.ch with no login requirement.

  5. Analysis of protein-coding genetic variation in 60,706 humans.

    PubMed

    Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G

    2016-08-18

    Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

  6. Sequence and Analysis of the Tomato JOINTLESS Locus

    PubMed Central

    Mao, Long; Begum, Dilara; Goff, Stephen A.; Wing, Rod A.

    2001-01-01

    A 119-kb bacterial artificial chromosome from the JOINTLESS locus on the tomato (Lycopersicon esculentum) chromosome 11 contained 15 putative genes. Repetitive sequences in this region include one copia-like LTR retrotransposon, 13 simple sequence repeats, three copies of a novel type III foldback transposon, and four putative short DNA repeats. Database searches showed that the foldback transposon and the short DNA repeats seemed to be associated preferentially with genes. The predicted tomato genes were compared with the complete Arabidopsis genome. Eleven out of 15 tomato open reading frames were found to be colinear with segments on five Arabidopsis bacterial artificial chromosome/P1-derived artificial chromosome clones. The synteny patterns, however, did not reveal duplicated segments in Arabidopsis, where over half of the genome is duplicated. Our analysis indicated that the microsynteny between the tomato and Arabidopsis genomes was still conserved at a very small scale but was complicated by the large number of gene families in the Arabidopsis genome. PMID:11457984

  7. DNA sequence-dependent mechanics and protein-assisted bending in repressor-mediated loop formation

    PubMed Central

    Boedicker, James Q.; Garcia, Hernan G.; Johnson, Stephanie; Phillips, Rob

    2014-01-01

    As the chief informational molecule of life, DNA is subject to extensive physical manipulations. The energy required to deform double-helical DNA depends on sequence, and this mechanical code of DNA influences gene regulation, such as through nucleosome positioning. Here we examine the sequence-dependent flexibility of DNA in bacterial transcription factor-mediated looping, a context for which the role of sequence remains poorly understood. Using a suite of synthetic constructs repressed by the Lac repressor and two well-known sequences that show large flexibility differences in vitro, we make precise statistical mechanical predictions as to how DNA sequence influences loop formation and test these predictions using in vivo transcription and in vitro single-molecule assays. Surprisingly, sequence-dependent flexibility does not affect in vivo gene regulation. By theoretically and experimentally quantifying the relative contributions of sequence and the DNA-bending protein HU to DNA mechanical properties, we reveal that bending by HU dominates DNA mechanics and masks intrinsic sequence-dependent flexibility. Such a quantitative understanding of how mechanical regulatory information is encoded in the genome will be a key step towards a predictive understanding of gene regulation at single-base pair resolution. PMID:24231252

  8. Comprehensive definition of genome features in Spirodela polyrhiza by high-depth physical mapping and short-read DNA sequencing strategies.

    PubMed

    Michael, Todd P; Bryant, Douglas; Gutierrez, Ryan; Borisjuk, Nikolai; Chu, Philomena; Zhang, Hanzhong; Xia, Jing; Zhou, Junfei; Peng, Hai; El Baidouri, Moaine; Ten Hallers, Boudewijn; Hastie, Alex R; Liang, Tiffany; Acosta, Kenneth; Gilbert, Sarah; McEntee, Connor; Jackson, Scott A; Mockler, Todd C; Zhang, Weixiong; Lam, Eric

    2017-02-01

    Spirodela polyrhiza is a fast-growing aquatic monocot with highly reduced morphology, genome size and number of protein-coding genes. Considering these biological features of Spirodela and its basal position in the monocot lineage, understanding its genome architecture could shed light on plant adaptation and genome evolution. Like many draft genomes, however, the 158-Mb Spirodela genome sequence has not been resolved to chromosomes, and important genome characteristics have not been defined. Here we deployed rapid genome-wide physical maps combined with high-coverage short-read sequencing to resolve the 20 chromosomes of Spirodela and to empirically delineate its genome features. Our data revealed a dramatic reduction in the number of the rDNA repeat units in Spirodela to fewer than 100, which is even fewer than that reported for yeast. Consistent with its unique phylogenetic position, small RNA sequencing revealed 29 Spirodela-specific microRNA, with only two being shared with Elaeis guineensis (oil palm) and Musa balbisiana (banana). Combining DNA methylation data and small RNA sequencing enabled the accurate prediction of 20.5% long terminal repeats (LTRs) that doubled the previous estimate, and revealed a high Solo:Intact LTR ratio of 8.2. Interestingly, we found that Spirodela has the lowest global DNA methylation levels (9%) of any plant species tested. Taken together our results reveal a genome that has undergone reduction, likely through eliminating non-essential protein coding genes, rDNA and LTRs. In addition to delineating the genome features of this unique plant, the methodologies described and large-scale genome resources from this work will enable future evolutionary and functional studies of this basal monocot family. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.

  9. Molecular approach to annelid regeneration: cDNA subtraction cloning reveals various novel genes that are upregulated during the large-scale regeneration of the oligochaete, Enchytraeus japonensis.

    PubMed

    Myohara, Maroko; Niva, Cintia Carla; Lee, Jae Min

    2006-08-01

    To identify genes specifically activated during annelid regeneration, suppression subtractive hybridization was performed with cDNAs from regenerating and intact Enchytraeus japonensis, a terrestrial oligochaete that can regenerate a complete organism from small body fragments within 4-5 days. Filter array screening subsequently revealed that about 38% of the forward-subtracted cDNA clones contained genes that were upregulated during regeneration. Two hundred seventy-nine of these clones were sequenced and found to contain 165 different sequences (79 known and 86 unknown). Nine clones were fully sequenced and four of these sequences were matched to known genes for glutamine synthetase, glucosidase 1, retinal protein 4, and phosphoribosylaminoimidazole carboxylase, respectively. The remaining five clones encoded an unknown open-reading frame. The expression levels of these genes were highest during blastema formation. Our present results, therefore, demonstrate the great potential of annelids as a new experimental subject for the exploration of unknown genes that play critical roles in animal regeneration.

  10. Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana.

    PubMed

    Lin, X; Kaul, S; Rounsley, S; Shea, T P; Benito, M I; Town, C D; Fujii, C Y; Mason, T; Bowman, C L; Barnstead, M; Feldblyum, T V; Buell, C R; Ketchum, K A; Lee, J; Ronning, C M; Koo, H L; Moffat, K S; Cronin, L A; Shen, M; Pai, G; Van Aken, S; Umayam, L; Tallon, L J; Gill, J E; Adams, M D; Carrera, A J; Creasy, T H; Goodman, H M; Somerville, C R; Copenhaver, G P; Preuss, D; Nierman, W C; White, O; Eisen, J A; Salzberg, S L; Fraser, C M; Venter, J C

    1999-12-16

    Arabidopsis thaliana (Arabidopsis) is unique among plant model organisms in having a small genome (130-140 Mb), excellent physical and genetic maps, and little repetitive DNA. Here we report the sequence of chromosome 2 from the Columbia ecotype in two gap-free assemblies (contigs) of 3.6 and 16 megabases (Mb). The latter represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date. Chromosome 2 represents 15% of the genome and encodes 4,037 genes, 49% of which have no predicted function. Roughly 250 tandem gene duplications were found in addition to large-scale duplications of about 0.5 and 4.5 Mb between chromosomes 2 and 1 and between chromosomes 2 and 4, respectively. Sequencing of nearly 2 Mb within the genetically defined centromere revealed a low density of recognizable genes, and a high density and diverse range of vestigial and presumably inactive mobile elements. More unexpected is what appears to be a recent insertion of a continuous stretch of 75% of the mitochondrial genome into chromosome 2.

  11. Comparative 454 pyrosequencing of transcripts from two olive genotypes during fruit development

    PubMed Central

    Alagna, Fiammetta; D'Agostino, Nunzio; Torchia, Laura; Servili, Maurizio; Rao, Rosa; Pietrella, Marco; Giuliano, Giovanni; Chiusano, Maria Luisa; Baldoni, Luciana; Perrotta, Gaetano

    2009-01-01

    Background: Despite its primary economic importance, genomic information on olive tree is still lacking. 454 pyrosequencing was used to enrich the very few sequence data currently available for the Olea europaea species and to identify genes involved in expression of fruit quality traits. Results: Fruits of Coratina, a widely cultivated variety characterized by a very high phenolic content, and Tendellone, an oleuropein-lacking natural variant, were used as starting material for monitoring the transcriptome. Four different cDNA libraries were sequenced, respectively at the beginning and at the end of drupe development. A total of 261,485 reads were obtained, for an output of about 58 Mb. Raw sequence data were processed using a four step pipeline procedure and data were stored in a relational database with a web interface. Conclusion: Massively parallel sequencing of different fruit cDNA collections has provided large scale information about the structure and putative function of gene transcripts accumulated during fruit development. Comparative transcript profiling allowed the identification of differentially expressed genes with potential relevance in regulating the fruit metabolism and phenolic content during ripening. PMID:19709400

  12. Continuous Influx of Genetic Material from Host to Virus Populations

    PubMed Central

    Gilbert, Clément; Peccoud, Jean; Chateigner, Aurélien; Moumen, Bouziane

    2016-01-01

    Many genes of large double-stranded DNA viruses have a cellular origin, suggesting that host-to-virus horizontal transfer (HT) of DNA is recurrent. Yet, the frequency of these transfers has never been assessed in viral populations. Here we used ultra-deep DNA sequencing of 21 baculovirus populations extracted from two moth species to show that a large diversity of moth DNA sequences (n = 86) can integrate into viral genomes during the course of a viral infection. The majority of the 86 different moth DNA sequences are transposable elements (TEs, n = 69) belonging to 10 superfamilies of DNA transposons and three superfamilies of retrotransposons. The remaining 17 sequences are moth sequences of unknown nature. In addition to bona fide DNA transposition, we uncover microhomology-mediated recombination as a mechanism explaining integration of moth sequences into viral genomes. Many sequences integrated multiple times at multiple positions along the viral genome. We detected a total of 27,504 insertions of moth sequences in the 21 viral populations and we calculate that on average, 4.8% of viruses harbor at least one moth sequence in these populations. Despite this substantial proportion, no insertion of moth DNA was maintained in any viral population after 10 successive infection cycles. Hence, there is a constant turnover of host DNA inserted into viral genomes each time the virus infects a moth. Finally, we found that at least 21 of the moth TEs integrated into viral genomes underwent repeated horizontal transfers between various insect species, including some lepidopterans susceptible to baculoviruses. Our results identify host DNA influx as a potent source of genetic diversity in viral populations. They also support a role for baculoviruses as vectors of DNA HT between insects, and call for an evaluation of possible gene or TE spread when using viruses as biopesticides or gene delivery vectors. PMID:26829124

  13. Continuous Influx of Genetic Material from Host to Virus Populations.

    PubMed

    Gilbert, Clément; Peccoud, Jean; Chateigner, Aurélien; Moumen, Bouziane; Cordaux, Richard; Herniou, Elisabeth A

    2016-02-01

    Many genes of large double-stranded DNA viruses have a cellular origin, suggesting that host-to-virus horizontal transfer (HT) of DNA is recurrent. Yet, the frequency of these transfers has never been assessed in viral populations. Here we used ultra-deep DNA sequencing of 21 baculovirus populations extracted from two moth species to show that a large diversity of moth DNA sequences (n = 86) can integrate into viral genomes during the course of a viral infection. The majority of the 86 different moth DNA sequences are transposable elements (TEs, n = 69) belonging to 10 superfamilies of DNA transposons and three superfamilies of retrotransposons. The remaining 17 sequences are moth sequences of unknown nature. In addition to bona fide DNA transposition, we uncover microhomology-mediated recombination as a mechanism explaining integration of moth sequences into viral genomes. Many sequences integrated multiple times at multiple positions along the viral genome. We detected a total of 27,504 insertions of moth sequences in the 21 viral populations and we calculate that on average, 4.8% of viruses harbor at least one moth sequence in these populations. Despite this substantial proportion, no insertion of moth DNA was maintained in any viral population after 10 successive infection cycles. Hence, there is a constant turnover of host DNA inserted into viral genomes each time the virus infects a moth. Finally, we found that at least 21 of the moth TEs integrated into viral genomes underwent repeated horizontal transfers between various insect species, including some lepidopterans susceptible to baculoviruses. Our results identify host DNA influx as a potent source of genetic diversity in viral populations. They also support a role for baculoviruses as vectors of DNA HT between insects, and call for an evaluation of possible gene or TE spread when using viruses as biopesticides or gene delivery vectors.

  14. Limited Phylogeographic Signal in Sex-Linked and Autosomal Loci Despite Geographically, Ecologically, and Phenotypically Concordant Structure of mtDNA Variation in the Holarctic Avian Genus Eremophila

    PubMed Central

    Drovetski, Sergei V.; Raković, Marko; Semenov, Georgy; Fadeev, Igor V.; Red’kin, Yaroslav A.

    2014-01-01

    Phylogeographic studies of Holarctic birds are challenging because they involve vast geographic scale, complex glacial history, extensive phenotypic variation, and heterogeneous taxonomic treatment across countries, all of which require large sample sizes. Knowledge about the quality of phylogeographic information provided by different loci is crucial for study design. We use sequences of one mtDNA gene, one sex-linked intron, and one autosomal intron to elucidate large scale phylogeographic patterns in the Holarctic lark genus Eremophila. The mtDNA ND2 gene identified six geographically, ecologically, and phenotypically concordant clades in the Palearctic that diverged in the Early - Middle Pleistocene and suggested paraphyly of the horned lark (E. alpestris) with respect to the Temminck's lark (E. bilopha). In the Nearctic, ND2 identified five subclades which diverged in the Late Pleistocene. They overlapped geographically and were not concordant phenotypically or ecologically. Nuclear alleles provided little information on geographic structuring of genetic variation in horned larks beyond supporting the monophyly of Eremophila and paraphyly of the horned lark. Multilocus species trees based on two nuclear or all three loci provided poor support for haplogroups identified by mtDNA. The node ages calculated using mtDNA were consistent with the available paleontological data, whereas individual nuclear loci and multilocus species trees appeared to underestimate node ages. We argue that mtDNA is capable of discovering independent evolutionary units within avian taxa and can provide a reasonable phylogeographic hypothesis when geographic scale, geologic history, and phenotypic variation in the study system are too complex for proposing reasonable a priori hypotheses required for multilocus methods. Finally, we suggest splitting the currently recognized horned lark into five Palearctic and one Nearctic species. PMID:24498139

  15. Revealing Less Derived Nature of Cartilaginous Fish Genomes with Their Evolutionary Time Scale Inferred with Nuclear Genes

    PubMed Central

    Renz, Adina J.; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a lower substitution rate with a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon. PMID:23825540

  16. Revealing less derived nature of cartilaginous fish genomes with their evolutionary time scale inferred with nuclear genes.

    PubMed

    Renz, Adina J; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a lower substitution rate with a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon.

  17. Predicting conformational ensembles and genome-wide transcription factor binding sites from DNA sequences.

    PubMed

    Andrabi, Munazah; Hutchins, Andrew Paul; Miranda-Saavedra, Diego; Kono, Hidetoshi; Nussinov, Ruth; Mizuguchi, Kenji; Ahmad, Shandar

    2017-06-22

    DNA shape is emerging as an important determinant of transcription factor binding beyond just the DNA sequence. The only tool for large scale DNA shape estimates, DNAshape, was derived from Monte-Carlo simulations and predicts four broad and static DNA shape features: Propeller twist, Helical twist, Minor groove width and Roll. The contributions of other shape features, e.g. Shift, Slide and Opening, cannot be evaluated using DNAshape. Here, we report a novel method, DynaSeq, which predicts molecular dynamics-derived ensembles of a more exhaustive set of DNA shape features. We compared the DNAshape and DynaSeq predictions for the common features and applied both to predict the genome-wide binding sites of 1312 TFs available from protein interaction quantification (PIQ) data. The results indicate a good agreement between the two methods for the common shape features and point to advantages in using DynaSeq. Predictive models employing ensembles from individual conformational parameters revealed that base-pair opening - known to be important in strand separation - was the best predictor of transcription factor-binding sites (TFBS) followed by features employed by DNAshape. Of note, TFBS could be predicted not only from the features at the target motif sites, but also from those as far as 200 nucleotides away from the motif.

  18. A parallel and sensitive software tool for methylation analysis on multicore platforms.

    PubMed

    Tárraga, Joaquín; Pérez, Mariano; Orduña, Juan M; Duato, José; Medina, Ignacio; Dopazo, Joaquín

    2015-10-01

    DNA methylation analysis suffers from very long processing time, as the advent of Next-Generation Sequencers has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that performs the analysis of these samples. The existing software for methylation analysis does not seem to scale efficiently either with the size of the dataset or with the length of the reads to be analyzed. As it is expected that the sequencers will provide longer and longer reads in the near future, efficient and scalable methylation software should be developed. We present a new software tool, called HPG-Methyl, which efficiently maps bisulphite sequencing reads on DNA, analyzing DNA methylation. The strategy used by this software consists of leveraging the speed of the Burrows-Wheeler Transform to map a large number of DNA fragments (reads) rapidly, as well as the accuracy of the Smith-Waterman algorithm, which is exclusively employed to deal with the most ambiguous and shortest reads. Experimental results on platforms with Intel multicore processors show that HPG-Methyl significantly outperforms in both execution time and sensitivity state-of-the-art software such as Bismark, BS-Seeker or BSMAP, particularly for long bisulphite reads. The software is available in the form of C libraries and functions, together with instructions to compile and execute it, by sftp to anonymous@clariano.uv.es (password 'anonymous'). Contact: juan.orduna@uv.es or jdopazo@cipf.es. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  19. Biotechnological applications of mobile group II introns and their reverse transcriptases: gene targeting, RNA-seq, and non-coding RNA analysis.

    PubMed

    Enyeart, Peter J; Mohr, Georg; Ellington, Andrew D; Lambowitz, Alan M

    2014-01-13

    Mobile group II introns are bacterial retrotransposons that combine the activities of an autocatalytic intron RNA (a ribozyme) and an intron-encoded reverse transcriptase to insert site-specifically into DNA. They recognize DNA target sites largely by base pairing of sequences within the intron RNA and achieve high DNA target specificity by using the ribozyme active site to couple correct base pairing to RNA-catalyzed intron integration. Algorithms have been developed to program the DNA target site specificity of several mobile group II introns, allowing them to be made into 'targetrons.' Targetrons function for gene targeting in a wide variety of bacteria and typically integrate at efficiencies high enough to be screened easily by colony PCR, without the need for selectable markers. Targetrons have found wide application in microbiological research, enabling gene targeting and genetic engineering of bacteria that had been intractable to other methods. Recently, a thermostable targetron has been developed for use in bacterial thermophiles, and new methods have been developed for using targetrons to position recombinase recognition sites, enabling large-scale genome-editing operations, such as deletions, inversions, insertions, and 'cut-and-pastes' (that is, translocation of large DNA segments), in a wide range of bacteria at high efficiency. Using targetrons in eukaryotes presents challenges due to the difficulties of nuclear localization and sub-optimal magnesium concentrations, although supplementation with magnesium can increase integration efficiency, and directed evolution is being employed to overcome these barriers. Finally, spurred by new methods for expressing group II intron reverse transcriptases that yield large amounts of highly active protein, thermostable group II intron reverse transcriptases from bacterial thermophiles are being used as research tools for a variety of applications, including qRT-PCR and next-generation RNA sequencing (RNA-seq). The high processivity and fidelity of group II intron reverse transcriptases along with their novel template-switching activity, which can directly link RNA-seq adaptor sequences to cDNAs during reverse transcription, open new approaches for RNA-seq and the identification and profiling of non-coding RNAs, with potentially wide applications in research and biotechnology.

  20. Towards decoding the conifer giga-genome.

    PubMed

    Mackay, John; Dean, Jeffrey F D; Plomion, Christophe; Peterson, Daniel G; Cánovas, Francisco M; Pavy, Nathalie; Ingvarsson, Pär K; Savolainen, Outi; Guevara, M Ángeles; Fluch, Silvia; Vinceti, Barbara; Abarca, Dolores; Díaz-Sala, Carmen; Cervera, María-Teresa

    2012-12-01

    Several new initiatives have been launched recently to sequence conifer genomes including pines, spruces and Douglas-fir. Owing to the very large genome sizes ranging from 18 to 35 gigabases, sequencing even a single conifer genome had been considered unattainable until the recent throughput increases and cost reductions afforded by next generation sequencers. The purpose of this review is to describe the context for these new initiatives. A knowledge foundation has been acquired in several conifers of commercial and ecological interest through large-scale cDNA analyses, construction of genetic maps and gene mapping studies aiming to link phenotype and genotype. Exploratory sequencing in pines and spruces has pointed out some of the unique properties of these giga-genomes and suggested strategies that may be needed to extract value from their sequencing. The hope is that recent and pending developments in sequencing technology will contribute to rapidly filling the knowledge vacuum surrounding their structure, contents and evolution. Researchers are also making plans to use comparative analyses that will help to turn the data into a valuable resource for enhancing and protecting the world's conifer forests.

  1. Genome and methylome of the oleaginous diatom Cyclotella cryptica reveal genetic flexibility toward a high lipid phenotype

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Traller, Jesse C.; Cokus, Shawn J.; Lopez, David A.

    Here, improvement in the performance of eukaryotic microalgae for biofuel and bioproduct production is largely dependent on characterization of metabolic mechanisms within the cell. The marine diatom Cyclotella cryptica, which was originally identified in the Aquatic Species Program, is a promising strain of microalgae for large-scale production of biofuel and bioproducts, such as omega-3 fatty acids. As a result, we sequenced the nuclear genome and methylome of this oleaginous diatom to identify the genetic traits that enable substantial accumulation of triacylglycerol. The genome is comprised of highly methylated repetitive sequence, which does not significantly change under silicon starved lipid induction, and data further suggests the primary role of DNA methylation is to suppress DNA transposition. Annotation of pivotal glycolytic, lipid metabolism, and carbohydrate degradation processes reveal an expanded enzyme repertoire in C. cryptica that would allow for an increased metabolic capacity toward triacylglycerol production. Identification of previously unidentified genes, including those involved in carbon transport and chitin metabolism, provide potential targets for genetic manipulation of carbon flux to further increase its lipid phenotype. New genetic tools were developed, bringing this organism on a par with other microalgae in terms of genetic manipulation and characterization approaches. Furthermore, functional annotation and detailed cross-species comparison of key carbon rich processes in C. cryptica highlights the importance of enzymatic subcellular compartmentation for regulation of carbon flux, which is often overlooked in photosynthetic microeukaryotes. The availability of the genome sequence, as well as advanced genetic manipulation tools enable further development of this organism for deployment in large-scale production systems.

  2. Genome and methylome of the oleaginous diatom Cyclotella cryptica reveal genetic flexibility toward a high lipid phenotype

    DOE PAGES

    Traller, Jesse C.; Cokus, Shawn J.; Lopez, David A.; ...

    2016-11-25

    Here, improvement in the performance of eukaryotic microalgae for biofuel and bioproduct production is largely dependent on characterization of metabolic mechanisms within the cell. The marine diatom Cyclotella cryptica, which was originally identified in the Aquatic Species Program, is a promising strain of microalgae for large-scale production of biofuel and bioproducts, such as omega-3 fatty acids. As a result, we sequenced the nuclear genome and methylome of this oleaginous diatom to identify the genetic traits that enable substantial accumulation of triacylglycerol. The genome is comprised of highly methylated repetitive sequence, which does not significantly change under silicon starved lipid induction, and data further suggests the primary role of DNA methylation is to suppress DNA transposition. Annotation of pivotal glycolytic, lipid metabolism, and carbohydrate degradation processes reveal an expanded enzyme repertoire in C. cryptica that would allow for an increased metabolic capacity toward triacylglycerol production. Identification of previously unidentified genes, including those involved in carbon transport and chitin metabolism, provide potential targets for genetic manipulation of carbon flux to further increase its lipid phenotype. New genetic tools were developed, bringing this organism on a par with other microalgae in terms of genetic manipulation and characterization approaches. Furthermore, functional annotation and detailed cross-species comparison of key carbon rich processes in C. cryptica highlights the importance of enzymatic subcellular compartmentation for regulation of carbon flux, which is often overlooked in photosynthetic microeukaryotes. The availability of the genome sequence, as well as advanced genetic manipulation tools enable further development of this organism for deployment in large-scale production systems.

  3. Rational design of DNA sequences for nanotechnology, microarrays and molecular computers using Eulerian graphs.

    PubMed

    Pancoska, Petr; Moravek, Zdenek; Moll, Ute M

    2004-01-01

    Nucleic acids are molecules of choice for both established and emerging nanoscale technologies. These technologies benefit from large functional densities of 'DNA processing elements' that can be readily manufactured. To achieve the desired functionality, polynucleotide sequences are currently designed by a process that involves tedious and laborious filtering of potential candidates against a series of requirements and parameters. Here, we present a completely novel methodology for the rapid rational design of large sets of DNA sequences. This method allows for the direct implementation of very complex and detailed requirements for the generated sequences, thus avoiding 'brute force' filtering. At the same time, these sequences have narrow distributions of melting temperatures. The molecular part of the design process can be done without computer assistance, using an efficient 'human engineering' approach by drawing a single blueprint graph that represents all generated sequences. Moreover, the method eliminates the necessity for extensive thermodynamic calculations. Melting temperature can be calculated only once (or not at all). In addition, the isostability of the sequences is independent of the selection of a particular set of thermodynamic parameters. Applications are presented for DNA sequence designs for microarrays, universal microarray zip sequences and electron transfer experiments.

  4. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.

    PubMed

    Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D

    2017-01-01

    This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as a cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

  5. DNA Looping Facilitates Targeting of a Chromatin Remodeling Enzyme

    PubMed Central

    Yadon, Adam N; Singh, Badri Nath; Hampsey, Michael; Tsukiyama, Toshio

    2013-01-01

    Summary: ATP-dependent chromatin remodeling enzymes are highly abundant and play pivotal roles regulating DNA-dependent processes. The mechanisms by which they are targeted to specific loci have not been well understood on a genome-wide scale. Here we present evidence that a major targeting mechanism for the Isw2 chromatin remodeling enzyme to specific genomic loci is through sequence-specific transcription factor (TF)-dependent recruitment. Unexpectedly, Isw2 is recruited in a TF-dependent fashion to a large number of loci without TF binding sites. Using the 3C assay, we show that Isw2 can be targeted by Ume6- and TFIIB-dependent DNA looping. These results identify DNA looping as a previously unknown mechanism for the recruitment of a chromatin remodeling enzyme and define a novel function for DNA looping. We also present evidence suggesting that Ume6-dependent DNA looping is involved in chromatin remodeling and transcriptional repression, revealing a mechanism by which the three-dimensional folding of chromatin affects DNA-dependent processes. PMID:23478442

  6. Molecular Cytogenetics Guides Massively Parallel Sequencing of a Radiation-Induced Chromosome Translocation in Human Cells.

    PubMed

    Cornforth, Michael N; Anur, Pavana; Wang, Nicholas; Robinson, Erin; Ray, F Andrew; Bedford, Joel S; Loucas, Bradford D; Williams, Eli S; Peto, Myron; Spellman, Paul; Kollipara, Rahul; Kittler, Ralf; Gray, Joe W; Bailey, Susan M

    2018-05-11

    Chromosome rearrangements are large-scale structural variants that are recognized drivers of oncogenic events in cancers of all types. Cytogenetics allows for their rapid, genome-wide detection, but does not provide gene-level resolution. Massively parallel sequencing (MPS) promises DNA sequence-level characterization of the specific breakpoints involved, but is strongly influenced by bioinformatics filters that affect detection efficiency. We sought to characterize the breakpoint junctions of chromosomal translocations and inversions in the clonal derivatives of human cells exposed to ionizing radiation. Here, we describe the first successful use of DNA paired-end analysis to locate and sequence across the breakpoint junctions of a radiation-induced reciprocal translocation. The analyses employed, with varying degrees of success, several well-known bioinformatics algorithms, a task made difficult by the involvement of repetitive DNA sequences. As for underlying mechanisms, the results of Sanger sequencing suggested that the translocation in question was likely formed via microhomology-mediated non-homologous end joining (mmNHEJ). To our knowledge, this represents the first use of MPS to characterize the breakpoint junctions of a radiation-induced chromosomal translocation in human cells. Curiously, these same approaches were unsuccessful when applied to the analysis of inversions previously identified by directional genomic hybridization (dGH). We conclude that molecular cytogenetics continues to provide critical guidance for structural variant discovery, validation and in "tuning" analysis filters to enable robust breakpoint identification at the base pair level.

  7. Smooth DNA transport through a narrowed pore geometry.

    PubMed

    Carson, Spencer; Wilson, James; Aksimentiev, Aleksei; Wanunu, Meni

    2014-11-18

    Voltage-driven transport of double-stranded DNA through nanoscale pores holds much potential for applications in quantitative molecular biology and biotechnology, yet the microscopic details of translocation have proven to be challenging to decipher. Earlier experiments showed strong dependence of transport kinetics on pore size: fast regular transport in large pores (> 5 nm diameter), and slower yet heterogeneous transport time distributions in sub-5 nm pores, which imply a large positional uncertainty of the DNA in the pore as a function of the translocation time. In this work, we show that this anomalous transport is a result of DNA self-interaction, a phenomenon that is strictly pore-diameter dependent. We identify a regime in which DNA transport is regular, producing narrow and well-behaved dwell-time distributions that fit a simple drift-diffusion theory. Furthermore, a systematic study of the dependence of dwell time on DNA length reveals a single power-law scaling of 1.37 in the range of 35-20,000 bp. We highlight the resolution of our nanopore device by discriminating via single pulses 100 and 500 bp fragments in a mixture with >98% accuracy. When coupled to an appropriate sequence labeling method, our observation of smooth DNA translocation can pave the way for high-resolution DNA mapping and sizing applications in genomics.
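
    As a small worked illustration of the scaling reported above, the sketch below reads the "power-law scaling of 1.37" as the exponent in dwell time ∝ (DNA length)^1.37. That reading, the function name dwell_time_ratio and its parameters are assumptions made for this example and are not taken from the paper. Under that assumption, the ratio of dwell times for two fragment lengths depends only on the ratio of their lengths.

```python
def dwell_time_ratio(length_a_bp, length_b_bp, exponent=1.37):
    """Return t(a) / t(b), assuming dwell time scales as length ** exponent.

    The exponent default (1.37) is the value quoted in the abstract; treating
    it as a pure power law with no prefactor change is an assumption.
    """
    if length_a_bp <= 0 or length_b_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return (length_a_bp / length_b_bp) ** exponent


# The two fragment sizes discriminated in the abstract: 500 bp versus 100 bp.
ratio = dwell_time_ratio(500, 100)
print(f"500 bp vs 100 bp dwell-time ratio: {ratio:.2f}")
```

    Under this assumed reading, a 500 bp fragment would dwell roughly nine times longer than a 100 bp fragment, which is consistent with the two sizes being separable from single pulses. The estimate should only be read within the 35-20,000 bp range stated in the abstract.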

  8. Museum genomics: low-cost and high-accuracy genetic data from historical specimens.

    PubMed

    Rowe, Kevin C; Singhal, Sonal; Macmanes, Matthew D; Ayroles, Julien F; Morelli, Toni Lyn; Rubidge, Emily M; Bi, Ke; Moritz, Craig C

    2011-11-01

    Natural history collections are unparalleled repositories of geographical and temporal variation in faunal conditions. Molecular studies offer an opportunity to uncover much of this variation; however, genetic studies of historical museum specimens typically rely on extracting highly degraded and chemically modified DNA samples from skins, skulls or other dried samples. Despite this limitation, obtaining short fragments of DNA sequences using traditional PCR amplification of DNA has been the primary method for genetic study of historical specimens. Few laboratories have succeeded in obtaining genome-scale sequences from historical specimens and then only with considerable effort and cost. Here, we describe a low-cost approach using high-throughput next-generation sequencing to obtain reliable genome-scale sequence data from a traditionally preserved mammal skin and skull using a simple extraction protocol. We show that single-nucleotide polymorphisms (SNPs) from the genome sequences obtained independently from the skin and from the skull are highly repeatable compared to a reference genome. © 2011 Blackwell Publishing Ltd.

  9. Local Renyi entropic profiles of DNA sequences.

    PubMed

    Vinga, Susana; Almeida, Jonas S

    2007-10-16

    In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.
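
    The ingredients named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' Matlab code: it maps a sequence to Chaos Game Representation points and computes a global quadratic Rényi entropy from a Gaussian Parzen-window estimate. The corner assignment, the kernel width `sigma`, and the function names are assumptions, and the per-position entropic profile itself is not reproduced.

```python
import math

# Corner assignment is a convention; this particular mapping is an assumption.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos Game Representation: each point is the midpoint between the
    previous point and the corner assigned to the current base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def renyi2_entropy(points, sigma=0.1):
    """Quadratic Renyi entropy of a 2-D Gaussian Parzen-window density.

    For isotropic kernels of variance sigma**2, the integral of the squared
    density reduces to the mean over all point pairs of a Gaussian with
    variance 2 * sigma**2 evaluated at the pair difference.
    """
    n = len(points)
    if n == 0:
        raise ValueError("at least one point is required")
    norm = 1.0 / (4.0 * math.pi * sigma ** 2)
    total = 0.0
    for xi, yi in points:
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += norm * math.exp(-d2 / (4.0 * sigma ** 2))
    return -math.log(total / (n * n))
```

    A homopolymer collapses onto one corner of the map and yields a lower entropy than a mixed sequence, which is the sense in which the measure tracks randomness.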

  10. Local Renyi entropic profiles of DNA sequences

    PubMed Central

    Vinga, Susana; Almeida, Jonas S

    2007-01-01

    Background In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. Conclusion The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures. PMID:17939871

  11. Hofmeister series salts enhance purification of plasmid DNA by non-ionic detergents

    PubMed Central

    Lezin, George; Kuehn, Michael R.; Brunelli, Luca

    2011-01-01

    Ion-exchange chromatography is the standard technique used for plasmid DNA purification, an essential molecular biology procedure. Non-ionic detergents (NIDs) have been used for plasmid DNA purification, but it is unclear whether Hofmeister series salts (HSS) change the solubility and phase separation properties of specific NIDs, enhancing plasmid DNA purification. After scaling-up NID-mediated plasmid DNA isolation, we established that NIDs in HSS solutions minimize plasmid DNA contamination with protein. In addition, large-scale NID/HSS solutions eliminated LPS contamination of plasmid DNA more effectively than Qiagen ion-exchange columns. Large-scale NID isolation/NID purification generated increased yields of high quality DNA compared to alkali isolation/column purification. This work characterizes how HSS enhance NID-mediated plasmid DNA purification, and demonstrates that NID phase transition is not necessary for LPS removal from plasmid DNA. Specific NIDs such as IGEPAL CA-520 can be utilized for rapid, inexpensive and efficient laboratory-based large-scale plasmid DNA purification, outperforming Qiagen-based column procedures. PMID:21351074

  12. Stable isotope probing to study functional components of complex microbial ecosystems.

    PubMed

    Mazard, Sophie; Schäfer, Hendrik

    2014-01-01

    This protocol presents a method of dissecting the DNA or RNA of key organisms involved in a specific biochemical process within a complex ecosystem. Stable isotope probing (SIP) allows the labelling and separation of nucleic acids from community members that are involved in important biochemical transformations, yet are often not the most numerically abundant members of a community. This pure culture-independent technique circumvents limitations of traditional microbial isolation techniques or data mining from large-scale whole-community metagenomic studies to tease out the identities and genomic repertoires of microorganisms participating in biological nutrient cycles. SIP experiments can be applied to virtually any ecosystem and biochemical pathway under investigation provided a suitable stable isotope substrate is available. This versatile methodology allows a wide range of analyses to be performed, from fatty-acid analyses, community structure and ecology studies, and targeted metagenomics involving nucleic acid sequencing. SIP experiments provide an effective alternative to large-scale whole-community metagenomic studies by specifically targeting the organisms or biochemical transformations of interest, thereby reducing the sequencing effort and time-consuming bioinformatics analyses of large datasets.

  13. The paradox of HBV evolution as revealed from a 16th century mummy

    PubMed Central

    Duggan, Ana T.; Poinar, Debi; Poinar, Hendrik N.

    2018-01-01

    Hepatitis B virus (HBV) is a ubiquitous viral pathogen associated with large-scale morbidity and mortality in humans. However, there is considerable uncertainty over the time-scale of its origin and evolution. Initial shotgun data from a mid-16th century Italian child mummy, that was previously paleopathologically identified as having been infected with Variola virus (VARV, the agent of smallpox), showed no DNA reads for VARV yet did for hepatitis B virus (HBV). Previously, electron microscopy provided evidence for the presence of VARV in this sample, although similar analyses conducted here did not reveal any VARV particles. We attempted to enrich and sequence for both VARV and HBV DNA. Although we did not recover any reads identified as VARV, we were successful in reconstructing an HBV genome at 163.8X coverage. Strikingly, both the HBV sequence and that of the associated host mitochondrial DNA displayed a nearly identical cytosine deamination pattern near the termini of DNA fragments, characteristic of an ancient origin. In contrast, phylogenetic analyses revealed a close relationship between the putative ancient virus and contemporary HBV strains (of genotype D), at first suggesting contamination. In addressing this paradox we demonstrate that HBV evolution is characterized by a marked lack of temporal structure. This confounds attempts to use molecular clock-based methods to date the origin of this virus over the time-frame sampled so far, and means that phylogenetic measures alone cannot yet be used to determine HBV sequence authenticity. If genuine, this phylogenetic pattern indicates that the genotypes of HBV diversified long before the 16th century, and enables comparison of potential pathogenic similarities between modern and ancient HBV. These results have important implications for our understanding of the emergence and evolution of this common viral pathogen. PMID:29300782
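
    The cytosine deamination signal mentioned above is usually summarised as the fraction of reference C positions read as T, tabulated by distance from the 5' end of each fragment. The sketch below assumes gap-free, pre-aligned (reference, read) string pairs in 5'-to-3' orientation; the input format and function name are hypothetical and it is not the pipeline used in the study.

```python
def c_to_t_rate_by_position(alignments, max_pos=5):
    """Fraction of reference C bases read as T at each of the first
    `max_pos` positions from the 5' end.

    `alignments` is an iterable of (reference, read) pairs of equal length
    with no gaps. Positions with no reference C yield None.
    """
    c_total = [0] * max_pos
    c_to_t = [0] * max_pos
    for ref, read in alignments:
        if len(ref) != len(read):
            raise ValueError("reference and read must have equal length")
        for i in range(min(max_pos, len(ref))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [hits / total if total else None
            for hits, total in zip(c_to_t, c_total)]


if __name__ == "__main__":
    toy = [("CACGT", "TACGT"), ("CCGTA", "TCGTA"), ("CGCAT", "CGCAT")]
    print(c_to_t_rate_by_position(toy))
```

    An elevated rate at the first positions that decays toward the fragment interior is the pattern described as characteristic of ancient DNA.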

  14. Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens.

    PubMed

    Just, Rebecca S; Scheible, Melissa K; Fast, Spence A; Sturk-Andreaggi, Kimberly; Higginbotham, Jennifer L; Lyons, Elizabeth A; Bush, Jocelyn M; Peck, Michelle A; Ring, Joseph D; Diegoli, Toni M; Röck, Alexander W; Huber, Gabriela E; Nagl, Simone; Strobl, Christina; Zimmermann, Bettina; Parson, Walther; Irwin, Jodi A

    2014-05-01

    Forensic mitochondrial DNA (mtDNA) testing requires appropriate, high quality reference population data for estimating the rarity of questioned haplotypes and, in turn, the strength of the mtDNA evidence. Available reference databases (SWGDAM, EMPOP) currently include information from the mtDNA control region; however, novel methods that quickly and easily recover mtDNA coding region data are becoming increasingly available. Though these assays promise to both facilitate the acquisition of mitochondrial genome (mtGenome) data and maximize the general utility of mtDNA testing in forensics, the appropriate reference data and database tools required for their routine application in forensic casework are lacking. To address this deficiency, we have undertaken an effort to: (1) increase the large-scale availability of high-quality entire mtGenome reference population data, and (2) improve the information technology infrastructure required to access/search mtGenome data and employ them in forensic casework. Here, we describe the application of a data generation and analysis workflow to the development of more than 400 complete, forensic-quality mtGenomes from low DNA quantity blood serum specimens as part of a U.S. National Institute of Justice funded reference population databasing initiative. We discuss the minor modifications made to a published mtGenome Sanger sequencing protocol to maintain a high rate of throughput while minimizing manual reprocessing with these low template samples. The successful use of this semi-automated strategy on forensic-like samples provides practical insight into the feasibility of producing complete mtGenome data in a routine casework environment, and demonstrates that large (>2kb) mtDNA fragments can regularly be recovered from high quality but very low DNA quantity specimens. 
    Further, the detailed empirical data we provide on the amplification success rates across a range of DNA input quantities will be useful moving forward as PCR-based strategies for mtDNA enrichment are considered for targeted next-generation sequencing workflows. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  15. Inaugural Genomics Automation Congress and the coming deluge of sequencing data.

    PubMed

    Creighton, Chad J

    2010-10-01

    Presentations at Select Biosciences's first 'Genomics Automation Congress' (Boston, MA, USA) in 2010 focused on next-generation sequencing and the platforms and methodology around them. The meeting provided an overview of sequencing technologies, both new and emerging. Speakers shared their recent work on applying sequencing to profile cells for various levels of biomolecular complexity, including DNA sequences, DNA copy, DNA methylation, mRNA and microRNA. With sequencing time and costs continuing to drop dramatically, a virtual explosion of very large sequencing datasets is at hand, which will probably present challenges and opportunities for high-level data analysis and interpretation, as well as for information technology infrastructure.

  16. An improved divergent synthesis of comb-type branched oligodeoxyribonucleotides (bDNA) containing multiple secondary sequences.

    PubMed

    Horn, T; Chang, C A; Urdea, M S

    1997-12-01

    The divergent synthesis of branched DNA (bDNA) comb structures is described. This new type of bDNA contains one unique oligonucleotide, the primary sequence, covalently attached through a comb-like branch network to many identical copies of a different oligonucleotide, the secondary sequence. The bDNA comb structures were assembled on a solid support and several synthesis parameters were investigated and optimized. The bDNA comb molecules were characterized by polyacrylamide gel electrophoretic methods and by controlled cleavage at periodate-cleavable moieties incorporated during synthesis. The developed chemistry allows synthesis of bDNA comb molecules containing multiple secondary sequences. In the accompanying article we describe the synthesis and characterization of large bDNA combs containing all four deoxynucleotides for use as signal amplifiers in nucleic acid quantification assays.

  17. An improved divergent synthesis of comb-type branched oligodeoxyribonucleotides (bDNA) containing multiple secondary sequences.

    PubMed Central

    Horn, T; Chang, C A; Urdea, M S

    1997-01-01

    The divergent synthesis of branched DNA (bDNA) comb structures is described. This new type of bDNA contains one unique oligonucleotide, the primary sequence, covalently attached through a comb-like branch network to many identical copies of a different oligonucleotide, the secondary sequence. The bDNA comb structures were assembled on a solid support and several synthesis parameters were investigated and optimized. The bDNA comb molecules were characterized by polyacrylamide gel electrophoretic methods and by controlled cleavage at periodate-cleavable moieties incorporated during synthesis. The developed chemistry allows synthesis of bDNA comb molecules containing multiple secondary sequences. In the accompanying article we describe the synthesis and characterization of large bDNA combs containing all four deoxynucleotides for use as signal amplifiers in nucleic acid quantification assays. PMID:9365265

  18. Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights

    PubMed Central

    2011-01-01

    Background Transcription factors (TFs) play a central role in regulating gene expression by interacting with cis-regulatory DNA elements associated with their target genes. Recent surveys have examined the DNA binding specificities of most Saccharomyces cerevisiae TFs, but a comprehensive evaluation of their data has been lacking. Results We analyzed in vitro and in vivo TF-DNA binding data reported in previous large-scale studies to generate a comprehensive, curated resource of DNA binding specificity data for all characterized S. cerevisiae TFs. Our collection comprises DNA binding site motifs and comprehensive in vitro DNA binding specificity data for all possible 8-bp sequences. Investigation of the DNA binding specificities within the basic leucine zipper (bZIP) and VHT1 regulator (VHR) TF families revealed unexpected plasticity in TF-DNA recognition: intriguingly, the VHR TFs, newly characterized by protein binding microarrays in this study, recognize bZIP-like DNA motifs, while the bZIP TF Hac1 recognizes a motif highly similar to the canonical E-box motif of basic helix-loop-helix (bHLH) TFs. We identified several TFs with distinct primary and secondary motifs, which might be associated with different regulatory functions. Finally, integrated analysis of in vivo TF binding data with protein binding microarray data lends further support for indirect DNA binding in vivo by sequence-specific TFs. Conclusions The comprehensive data in this curated collection allow for more accurate analyses of regulatory TF-DNA interactions, in-depth structural studies of TF-DNA specificity determinants, and future experimental investigations of the TFs' predicted target genes and regulatory roles. PMID:22189060

  19. CoCoNUT: an efficient system for the comparison and analysis of genomes

    PubMed Central

    2008-01-01

    Background Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and heavily relies on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system CoCoNUT (Computational Comparative geNomics Utility Toolkit) that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion CoCoNUT is competitive with other software tools w.r.t. the quality of the results. The use of state of the art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large scale studies in comparative genomics. PMID:19014477

  20. DNA Extraction Protocols for Whole-Genome Sequencing in Marine Organisms.

    PubMed

    Panova, Marina; Aronsson, Henrik; Cameron, R Andrew; Dahl, Peter; Godhe, Anna; Lind, Ulrika; Ortega-Martinez, Olga; Pereyra, Ricardo; Tesson, Sylvie V M; Wrange, Anna-Lisa; Blomberg, Anders; Johannesson, Kerstin

    2016-01-01

    The marine environment harbors a large proportion of the total biodiversity on this planet, including the majority of the Earth's different phyla and classes. Studying the genomes of marine organisms can bring interesting insights into genome evolution. Today, almost all marine organismal groups are understudied with respect to their genomes. One potential reason is that extraction of high-quality DNA in sufficient amounts is challenging for many marine species. This is due to high polysaccharide content, polyphenols and other secondary metabolites that will inhibit downstream DNA library preparations. Consequently, protocols developed for vertebrates and plants do not always perform well for invertebrates and algae. In addition, many marine species have large population sizes and, as a consequence, highly variable genomes. Thus, to facilitate the sequence read assembly process during genome sequencing, it is desirable to obtain enough DNA from a single individual, which is a challenge in many species of invertebrates and algae. Here, we present DNA extraction protocols for seven marine species (four invertebrates, two algae, and a marine yeast), optimized to provide sufficient DNA quality and yield for de novo genome sequencing projects.

  1. MytiBase: a knowledgebase of mussel (M. galloprovincialis) transcribed sequences

    PubMed Central

    Venier, Paola; De Pittà, Cristiano; Bernante, Filippo; Varotto, Laura; De Nardi, Barbara; Bovo, Giuseppe; Roch, Philippe; Novoa, Beatriz; Figueras, Antonio; Pallavicini, Alberto; Lanfranchi, Gerolamo

    2009-01-01

    Background Although bivalves are among the most studied marine organisms due to their ecological role, economic importance and use in pollution biomonitoring, very little information is available on the genome sequences of mussels. This study reports the functional analysis of a large-scale Expressed Sequence Tag (EST) sequencing from different tissues of Mytilus galloprovincialis (the Mediterranean mussel) challenged with toxic pollutants, temperature and potentially pathogenic bacteria. Results We have constructed and sequenced seventeen cDNA libraries from different Mediterranean mussel tissues: gills, digestive gland, foot, anterior and posterior adductor muscle, mantle and haemocytes. A total of 24,939 clones were sequenced from these libraries generating 18,788 high-quality ESTs which were assembled into 2,446 overlapping clusters and 4,666 singletons resulting in a total of 7,112 non-redundant sequences. In particular, a high-quality normalized cDNA library (Nor01) was constructed as determined by the high rate of gene discovery (65.6%). Bioinformatic screening of the non-redundant M. galloprovincialis sequences identified 159 microsatellite-containing ESTs. Clusters, consensuses, related similarities and gene ontology searches have been organized in a dedicated, searchable database. Conclusion We defined the first species-specific catalogue of M. galloprovincialis ESTs including 7,112 unique transcribed sequences. Putative microsatellite markers were identified. This annotated catalogue represents a valuable platform for expression studies, marker validation and genetic linkage analysis for investigations in the biology of Mediterranean mussels. PMID:19203376

  2. Picoliter DNA Sequencing Chemistry on an Electrowetting-based Digital Microfluidic Platform

    PubMed Central

    Ferguson Welch, Erin R.; Lin, Yan-You; Madison, Andrew; Fair, R.B.

    2011-01-01

    The results of investigations into performing DNA sequencing chemistry on a picoliter-scale electrowetting digital microfluidic platform are reported. Pyrosequencing utilizes pyrophosphate produced during nucleotide base addition to initiate a process ending with detection through a chemiluminescence reaction using firefly luciferase. The intensity of light produced during the reaction can be quantified to determine the number of bases added to the DNA strand. The logic-based control and discrete fluid droplets of a digital microfluidic device lend themselves well to the pyrosequencing process. Bead-bound DNA is magnetically held in a single location, and wash or reagent droplets added or split from it to circumvent product dilution. Here we discuss the dispensing, control, and magnetic manipulation of the paramagnetic beads used to hold target DNA. We also demonstrate and characterize the picoliter-scale reaction of luciferase with adenosine triphosphate to represent the detection steps of pyrosequencing and all necessary alterations for working on this scale. PMID:21298802
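
    The statement that light intensity indicates the number of bases added can be illustrated with a toy decoder. This is a sketch under simple assumptions (signal strictly proportional to homopolymer length, a known single-base intensity, no noise model); the function name and the `unit_intensity` parameter are hypothetical and do not come from the paper.

```python
def decode_pyrogram(dispensation_order, intensities, unit_intensity=1.0):
    """Convert per-dispensation light intensities into a base sequence.

    Each nucleotide dispensation yields light roughly proportional to the
    number of bases incorporated, so the rounded ratio to the single-base
    intensity gives the homopolymer length (possibly zero).
    The result is the synthesised strand, i.e. the complement of the template.
    """
    if len(dispensation_order) != len(intensities):
        raise ValueError("one intensity is required per dispensed nucleotide")
    if unit_intensity <= 0:
        raise ValueError("unit_intensity must be positive")
    pieces = []
    for base, signal in zip(dispensation_order, intensities):
        run_length = max(0, round(signal / unit_intensity))
        pieces.append(base * run_length)
    return "".join(pieces)


if __name__ == "__main__":
    # T incorporated once, A not at all, C twice, G once.
    print(decode_pyrogram("TACG", [1.04, 0.02, 2.10, 0.97]))
```

    Real signals become less linear for long homopolymers, so simple rounding of this kind is only reasonable for short runs.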

  3. Large-scale oscillation of structure-related DNA sequence features in human chromosome 21

    NASA Astrophysics Data System (ADS)

    Li, Wentian; Miramontes, Pedro

    2006-08-01

    Human chromosome 21 is the only chromosome in the human genome that exhibits oscillation of the (G+C) content with a cycle length of hundreds of kilobases (kb) (500 kb near the right telomere). We aim at establishing the existence of a similar periodicity in structure-related sequence features in order to relate this (G+C)% oscillation to other biological phenomena. The following quantities are shown to oscillate with the same 500 kb periodicity in human chromosome 21: binding energy calculated by two sets of dinucleotide-based thermodynamic parameters, AA/TT and AAA/TTT bi- and tri-nucleotide density, 5'-TA-3' dinucleotide density, and signal for 10- or 11-base periodicity of AA/TT or AAA/TTT. These intrinsic quantities are related to structural features of the double helix of DNA molecules, such as base-pair binding, untwisting or unwinding, stiffness, and a putative tendency for nucleosome formation.
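
    A (G+C)% profile of the kind discussed here is typically obtained by averaging over fixed windows along the chromosome. The sketch below shows only that windowing step; the window and step sizes are free parameters chosen by the analyst (an assumption here, since the abstract does not state them), and the spectral analysis that would reveal a periodicity is not included.

```python
def gc_content_windows(seq, window, step=None):
    """Return (start, gc_fraction) for each full window along `seq`.

    `step` defaults to `window`, giving non-overlapping windows.
    Trailing bases that do not fill a whole window are ignored.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    step = window if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, gc / window))
    return profile


if __name__ == "__main__":
    print(gc_content_windows("GGCCAATT", window=4, step=2))
```

    To resolve an oscillation with a period near 500 kb, the window would need to be considerably shorter than that period.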

  4. msgbsR: An R package for analysing methylation-sensitive restriction enzyme sequencing data.

    PubMed

    Mayne, Benjamin T; Leemaqz, Shalem Y; Buckberry, Sam; Rodriguez Lopez, Carlos M; Roberts, Claire T; Bianco-Miotto, Tina; Breen, James

    2018-02-01

    Genotyping-by-sequencing (GBS) or restriction-site associated DNA marker sequencing (RAD-seq) is a practical and cost-effective method for analysing large genomes from high diversity species. This method of sequencing, coupled with methylation-sensitive enzymes (often referred to as methylation-sensitive restriction enzyme sequencing or MRE-seq), is an effective tool to study DNA methylation in parts of the genome that are inaccessible in other sequencing techniques or are not annotated in microarray technologies. Current software tools do not fulfil all methylation-sensitive restriction sequencing assays for determining differences in DNA methylation between samples. To fill this computational need, we present msgbsR, an R package that contains tools for the analysis of methylation-sensitive restriction enzyme sequencing experiments. msgbsR can be used to identify and quantify read counts at methylated sites directly from alignment files (BAM files) and enables verification of restriction enzyme cut sites with the correct recognition sequence of the individual enzyme. In addition, msgbsR assesses DNA methylation based on read coverage, similar to RNA sequencing experiments, rather than methylation proportion and is a useful tool in analysing differential methylation on large populations. The package is fully documented and available freely online as a Bioconductor package (https://bioconductor.org/packages/release/bioc/html/msgbsR.html).

  5. Extreme-Scale De Novo Genome Assembly

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Georganas, Evangelos; Hofmeyr, Steven; Egan, Rob

    De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMer, a high-quality end-to-end de novo assembler designed for extreme scale analysis, via efficient parallelization of the Meraculous code. Genome assembly software has many components, each of which stresses different components of a computer system. This chapter explains the computational challenges involved in each step of the HipMer pipeline, the key distributed data structures, and communication costs in detail. We present performance results of assembling the human genome and the large hexaploid wheat genome on large supercomputers up to tens of thousands of cores.
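
    An early stage of k-mer based assemblers is counting k-mers across all reads and discarding rare ones as likely sequencing errors. The single-process sketch below illustrates that idea only; it is not HipMer or Meraculous code, the `min_count` threshold is an assumed parameter, and a distributed implementation would partition the counting table across nodes.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse
    complement, so both strands are counted under one key."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers over all reads, skipping k-mers that contain
    characters other than A, C, G, T (for example N)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def solid_kmers(counts, min_count=2):
    """Keep k-mers seen at least `min_count` times; rarer ones are treated
    as probable errors."""
    return {kmer for kmer, n in counts.items() if n >= min_count}
```

    The retained k-mers would then serve as nodes of a de Bruijn graph from which contigs are traversed.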

  6. Computational solutions to large-scale data management and analysis

    PubMed Central

    Schadt, Eric E.; Linderman, Michael D.; Sorenson, Jon; Lee, Lawrence; Nolan, Garry P.

    2011-01-01

    Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist — such as cloud and heterogeneous computing — to successfully tackle our big data problems. PMID:20717155

  7. High-Throughput Block Optical DNA Sequence Identification.

    PubMed

    Sagar, Dodderi Manjunatha; Korshoj, Lee Erik; Hanson, Katrina Bethany; Chowdhury, Partha Pratim; Otoupal, Peter Britton; Chatterjee, Anushree; Nagpal, Prashant

    2018-01-01

    Optical techniques for molecular diagnostics or DNA sequencing generally rely on small molecule fluorescent labels, which utilize light with a wavelength of several hundred nanometers for detection. Developing a label-free optical DNA sequencing technique will require nanoscale focusing of light, a high-throughput and multiplexed identification method, and a data compression technique to rapidly identify sequences and analyze genomic heterogeneity for big datasets. Such a method should identify characteristic molecular vibrations using optical spectroscopy, especially in the "fingerprinting region" from ≈400-1400 cm⁻¹. Here, surface-enhanced Raman spectroscopy is used to demonstrate label-free identification of DNA nucleobases with multiplexed 3D plasmonic nanofocusing. While nanometer-scale mode volumes prevent identification of single nucleobases within a DNA sequence, the block optical technique can identify A, T, G, and C content in DNA k-mers. The content of each nucleotide in a DNA block can be a unique and high-throughput method for identifying sequences, genes, and other biomarkers as an alternative to single-letter sequencing. Additionally, coupling two complementary vibrational spectroscopy techniques (infrared and Raman) can improve block characterization. These results pave the way for developing a novel, high-throughput block optical sequencing method with lossy genomic data compression using k-mer identification from multiplexed optical data acquisition. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
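
    The notion of identifying a block by its A, T, G, and C content, and why that is a lossy representation, can be sketched as follows. The functions are illustrative and not taken from the paper; they show that distinct k-mers can share one composition and count how many compositions exist for a given k.

```python
from collections import Counter
from math import comb


def block_content(block):
    """Return the (A, T, G, C) counts of a DNA block; base order is discarded."""
    counts = Counter(block.upper())
    return tuple(counts.get(base, 0) for base in "ATGC")


def count_compositions(k):
    """Number of distinct (A, T, G, C) count vectors for blocks of length k,
    i.e. the number of ways to split k among four bases."""
    return comb(k + 3, 3)


if __name__ == "__main__":
    k = 4
    print(block_content("GATTACA"))
    print(f"{4 ** k} possible {k}-mers map onto {count_compositions(k)} compositions")
```

    Because many k-mers collapse onto one composition, a sequence would be identified from the pattern of contents across successive blocks rather than from any single block.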

  8. Dynamics of single-stranded DNA tethered to a solid

    NASA Astrophysics Data System (ADS)

    Radiom, Milad; Paul, Mark R.; Ducker, William A.

    2016-06-01

    Tethering is used to deliver specific biological and industrial functions. For example, single-stranded DNA (ssDNA) is tethered to polymerases and long sequences of double-stranded DNA (dsDNA) during replication, and to solids in DNA microarrays. However, tethering ssDNA to a large object limits not only the available ssDNA conformations, but also the range of time-scales over which the mechanical responses of ssDNA are important. In this work we examine the effect of tethering by measurement of the mechanical response of ssDNA that is tethered at each end to two separate atomic force microscope cantilevers in aqueous solution. Thermal motion of the cantilevers drives the ends of the ssDNA chain at frequencies near 2 kHz. The presence of a tethered molecule makes a large difference to the asymmetric cross-correlation of two cantilevers, which enables resolution of the mechanical properties in our experiments. By analysis of the correlated motion of the cantilevers we extract the friction and stiffness of the ssDNA. We find that the measured friction is much larger than the friction that is usually associated with the unencumbered motion of ssDNA. We also find that the measured relaxation time, ∼30 μs, is much greater than prior measurements of the free-molecule relaxation time. We attribute the difference to the loss of conformational possibilities as a result of constraining the ends of the ssDNA.

  9. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    PubMed

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. OpenMP programs cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and of interest for a large variety of input data files. We have developed our own load balancing and cache optimization techniques for the message passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various values of the input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.
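
    A tile-based organisation of optimal pairwise alignment can be sketched as below. This is a serial illustration assuming a Needleman-Wunsch global alignment with a linear gap penalty; the scoring values, the tile traversal, and the function name are assumptions, not the authors' method. Tiles on the same anti-diagonal depend only on earlier diagonals, which is what would allow them to be distributed across threads or processes.

```python
def nw_score_tiled(a, b, tile=2, match=1, mismatch=-1, gap=-1):
    """Global alignment score computed tile by tile in wavefront order."""
    if tile <= 0:
        raise ValueError("tile must be positive")
    n, m = len(a), len(b)
    # Dynamic-programming table; first row and column hold pure-gap scores.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    tiles_i = (n + tile - 1) // tile
    tiles_j = (m + tile - 1) // tile
    # Visit tiles one anti-diagonal at a time; tiles within a diagonal are
    # mutually independent and could be processed in parallel.
    for diag in range(tiles_i + tiles_j - 1):
        for ti in range(max(0, diag - tiles_j + 1), min(tiles_i, diag + 1)):
            tj = diag - ti
            for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
                for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                    sub = match if a[i - 1] == b[j - 1] else mismatch
                    score[i][j] = max(score[i - 1][j - 1] + sub,
                                      score[i - 1][j] + gap,
                                      score[i][j - 1] + gap)
    return score[n][m]
```

    The tile size changes only the traversal order, not the result, so it can be tuned for cache behaviour and load balance without affecting correctness.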

  10. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    PubMed Central

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. OpenMP programs, however, cannot scale beyond a single SMP node. Programs written in MPI can span more than one SMP node, but this paradigm carries the overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are notable across a large variety of input data files. We have developed our own load-balancing and cache-optimization technique for the message-passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various sizes of input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868

  11. Development of Genetic Markers in Eucalyptus Species by Target Enrichment and Exome Sequencing

    PubMed Central

    Dasgupta, Modhumita Ghosh; Dharanishanthi, Veeramuthu; Agarwal, Ishangi; Krutovsky, Konstantin V.

    2015-01-01

    The advent of next-generation sequencing has facilitated large-scale discovery, validation and assessment of genetic markers for high density genotyping. The present study was undertaken to identify markers in genes supposedly related to wood property traits in three Eucalyptus species. Ninety-four genes involved in xylogenesis were selected for hybridization probe based nuclear genomic DNA target enrichment and exome sequencing. Genomic DNA was isolated from the leaf tissues and used for on-array probe hybridization followed by Illumina sequencing. The raw sequence reads were trimmed and high-quality reads were mapped to the E. grandis reference sequence, and single nucleotide variants (SNVs) and insertions/deletions (InDels) were identified across the three species. The average read coverage was 216X and a total of 2294 SNVs and 479 InDels were discovered in E. camaldulensis, 2383 SNVs and 518 InDels in E. tereticornis, and 1228 SNVs and 409 InDels in E. grandis. Additionally, SNV calling and InDel detection were conducted in pair-wise comparisons of E. tereticornis vs. E. grandis, E. camaldulensis vs. E. tereticornis and E. camaldulensis vs. E. grandis. This study presents an efficient and high-throughput method for the development of genetic markers for family-based QTL and association analysis in Eucalyptus. PMID:25602379

  12. Modeling Structural Dynamics of Biomolecular Complexes by Coarse-Grained Molecular Simulations.

    PubMed

    Takada, Shoji; Kanada, Ryo; Tan, Cheng; Terakawa, Tsuyoshi; Li, Wenfei; Kenzaki, Hiroo

    2015-12-15

    Due to the hierarchic nature of biomolecular systems, their computational modeling calls for multiscale approaches, in which coarse-grained (CG) simulations are used to address long-time dynamics of large systems. Here, we review recent developments and applications of CG modeling methods, focusing on our methods primarily for proteins, DNA, and their complexes. These methods have been implemented in the CG biomolecular simulator, CafeMol. Our CG model has resolution such that ∼10 non-hydrogen atoms are grouped into one CG particle on average. For proteins, each amino acid is represented by one CG particle. For DNA, one nucleotide is simplified by three CG particles, representing sugar, phosphate, and base. The protein modeling is based on the idea that proteins have a globally funnel-like energy landscape, which is encoded in the structure-based potential energy function. We first describe two representative minimal models of proteins, called the elastic network model and the classic Go̅ model. We then present a more elaborate protein model, which extends the minimal model to incorporate sequence- and context-dependent local flexibility and nonlocal contacts. For DNA, we describe a model developed by de Pablo's group that was tuned to reproduce well sequence-dependent structural and thermodynamic experimental data for single- and double-stranded DNAs. Protein-DNA interactions are modeled either by the structure-based term for specific cases or by electrostatic and excluded volume terms for nonspecific cases. We also discuss the time scale mapping in CG molecular dynamics simulations. While the apparent single time step of our CGMD is about 10 times larger than that in the fully atomistic molecular dynamics for small-scale dynamics, large-scale motions can be further accelerated by two orders of magnitude with the use of a CG model and a low friction constant in Langevin dynamics. Next, we present four examples of applications. First, the classic Go̅ model was used to emulate one ATP cycle of a molecular motor, kinesin. Second, nonspecific protein-DNA binding was studied by a combination of elaborate protein and DNA models. Third, a transcription factor, p53, that contains highly fluctuating regions was simulated on two perpendicularly arranged DNA segments, addressing intersegmental transfer of p53. Fourth, we simulated structural dynamics of dinucleosomes connected by a linker DNA, finding distinct types of internucleosome docking and salt-concentration-dependent compaction. Finally, we discuss many of the limitations in the current approaches and future directions. In particular, more accurate electrostatic treatment and a phospholipid model that matches our CG resolutions are of immediate importance.

  13. Inferring coarse-grain histone-DNA interaction potentials from high-resolution structures of the nucleosome

    NASA Astrophysics Data System (ADS)

    Meyer, Sam; Everaers, Ralf

    2015-02-01

    The histone-DNA interaction in the nucleosome is a fundamental mechanism of genomic compaction and regulation, which remains largely unknown despite increasing structural knowledge of the complex. In this paper, we propose a framework for the extraction of a nanoscale histone-DNA force-field from a collection of high-resolution structures, which may be adapted to a larger class of protein-DNA complexes. We applied the procedure to a large crystallographic database extended by snapshots from molecular dynamics simulations. The comparison of the structural models first shows that, at histone-DNA contact sites, the DNA base-pairs are shifted outwards locally, consistent with locally repulsive forces exerted by the histones. The second step shows that the various force profiles of the structures under analysis derive locally from a unique, sequence-independent, quadratic repulsive force-field, while the sequence preferences are entirely due to internal DNA mechanics. We have thus obtained the first knowledge-derived nanoscale interaction potential for histone-DNA in the nucleosome. The conformations obtained by relaxation of nucleosomal DNA with high-affinity sequences in this potential accurately reproduce the experimental values of binding preferences. Finally, we address the more generic binding mechanisms relevant to the 80% of genomic sequences incorporated in nucleosomes, by computing the conformation of nucleosomal DNA with sequence-averaged properties. This conformation differs from those found in crystals, and the analysis suggests that repulsive histone forces are related to local stretch tension in nucleosomal DNA, mostly between adjacent contact points. This tension could play a role in the stability of the complex.

  14. Parallel human genome analysis: microarray-based expression monitoring of 1000 genes.

    PubMed Central

    Schena, M; Shalon, D; Heller, R; Chai, A; Brown, P O; Davis, R W

    1996-01-01

    Microarrays containing 1046 human cDNAs of unknown sequence were printed on glass with high-speed robotics. These 1.0-cm² DNA "chips" were used to quantitatively monitor differential expression of the cognate human genes using a highly sensitive two-color hybridization assay. Array elements that displayed differential expression patterns under given experimental conditions were characterized by sequencing. The identification of known and novel heat shock and phorbol ester-regulated genes in human T cells demonstrates the sensitivity of the assay. Parallel gene analysis with microarrays provides a rapid and efficient method for large-scale human gene discovery. PMID:8855227

  15. The genome of Eimeria spp., with special reference to Eimeria tenella--a coccidium from the chicken.

    PubMed

    Shirley, M W

    2000-04-10

    Eimeria spp. contain at least four genomes. The nuclear genome is best studied in the avian species Eimeria tenella and comprises about 60 Mbp DNA contained within ca. 14 chromosomes; other avian and lupine species appear to possess a nuclear genome of similar size. In addition, sequence data and hybridisation studies have provided direct evidence for extrachromosomal mitochondrial and plastid DNA genomes, and double-stranded RNA segments have also been described. The unique phenotype of "precocious" development that characterises some selected lines of Eimeria spp. not only provides the basis for the first generation of live attenuated vaccines, but offers a significant entrée into studies on the regulation of an apicomplexan life-cycle. With a view to identifying loci implicated in the trait of precocious development, a genetic linkage map of the genome of E. tenella is being constructed in this laboratory from analyses of the inheritance of over 400 polymorphic DNA markers in the progeny of a cross between complementary drug-resistant and precocious parents. Other projects that impinge directly or indirectly on the genome and/or genetics of Eimeria spp. are currently in progress in several laboratories, and include the derivation of expressed sequence tag data and the development of ancillary technologies such as transfection techniques. No large-scale genomic DNA sequencing projects have been reported.

  16. Interbreeding among deeply divergent mitochondrial lineages in the American cockroach (Periplaneta americana)

    NASA Astrophysics Data System (ADS)

    von Beeren, Christoph; Stoeckle, Mark Y.; Xia, Joyce; Burke, Griffin; Kronauer, Daniel J. C.

    2015-02-01

    DNA barcoding promises to be a useful tool to identify pest species assuming adequate representation of genetic variants in a reference library. Here we examined mitochondrial DNA barcodes in a global urban pest, the American cockroach (Periplaneta americana). Our sampling effort generated 284 cockroach specimens, most from New York City, plus 15 additional U.S. states and six other countries, enabling the first large-scale survey of P. americana barcode variation. Periplaneta americana barcode sequences (n = 247, including 24 GenBank records) formed a monophyletic lineage separate from other Periplaneta species. We found three distinct P. americana haplogroups with relatively small differences within (≤0.6%) and larger differences among groups (2.4%-4.7%). This could be interpreted as indicative of multiple cryptic species. However, nuclear DNA sequences (n = 77 specimens) revealed extensive gene flow among mitochondrial haplogroups, confirming a single species. This unusual genetic pattern likely reflects multiple introductions from genetically divergent source populations, followed by interbreeding in the invasive range. Our findings highlight the need for comprehensive reference databases in DNA barcoding studies, especially when dealing with invasive populations that might be derived from multiple genetically distinct source populations.
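
    The within-group and among-group percentages quoted above are pairwise sequence differences. The abstract does not say which distance measure was used, so the sketch below assumes the simplest one, the uncorrected p-distance between two pre-aligned barcode sequences. The function name and the handling of gaps and ambiguous bases are illustrative assumptions.

```python
def p_distance(seq1, seq2):
    """Percent of differing sites between two equal-length aligned sequences.

    Assumption: sites where either sequence has a gap or an ambiguous
    base are skipped rather than counted as differences.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = "ACGT"
    # Keep only sites where both sequences carry an unambiguous base.
    pairs = [(x, y) for x, y in zip(seq1.upper(), seq2.upper())
             if x in valid and y in valid]
    if not pairs:
        raise ValueError("no comparable sites")
    diffs = sum(1 for x, y in pairs if x != y)
    return 100.0 * diffs / len(pairs)
```

    Under this measure, two barcodes differing at one of four comparable sites give `p_distance("ACGT", "ACGA") == 25.0`.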

  17. Introduction to bioinformatics.

    PubMed

    Can, Tolga

    2014-01-01

    Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.

  18. Company profile: Complete Genomics Inc.

    PubMed

    Reid, Clifford

    2011-02-01

    Complete Genomics Inc. is a life sciences company that focuses on complete human genome sequencing. It is taking a completely different approach to DNA sequencing than other companies in the industry. Rather than building a general-purpose platform for sequencing all organisms and all applications, it has focused on a single application - complete human genome sequencing. The company's Complete Genomics Analysis Platform (CGA™ Platform) comprises an integrated package of biochemistry, instrumentation and software that sequences human genomes at the highest quality, lowest cost and largest scale available. Complete Genomics offers a turnkey service that enables customers to outsource their human genome sequencing to the company's genome sequencing center in Mountain View, CA, USA. Customers send in their DNA samples, the company does all the library preparation, DNA sequencing, assembly and variant analysis, and customers receive research-ready data that they can use for biological discovery.

  19. Quantitative comparison of DNA methylation assays for biomarker development and clinical applications.

    PubMed

    2016-07-01

    DNA methylation patterns are altered in numerous diseases and often correlate with clinically relevant information such as disease subtypes, prognosis and drug response. With suitable assays and after validation in large cohorts, such associations can be exploited for clinical diagnostics and personalized treatment decisions. Here we describe the results of a community-wide benchmarking study comparing the performance of all widely used methods for DNA methylation analysis that are compatible with routine clinical use. We shipped 32 reference samples to 18 laboratories in seven different countries. Researchers in those laboratories collectively contributed 21 locus-specific assays for an average of 27 predefined genomic regions, as well as six global assays. We evaluated assay sensitivity on low-input samples and assessed the assays' ability to discriminate between cell types. Good agreement was observed across all tested methods, with amplicon bisulfite sequencing and bisulfite pyrosequencing showing the best all-round performance. Our technology comparison can inform the selection, optimization and use of DNA methylation assays in large-scale validation studies, biomarker development and clinical diagnostics.

  20. Plasmonic nanoparticle lithography: Fast resist-free laser technique for large-scale sub-50 nm hole array fabrication

    NASA Astrophysics Data System (ADS)

    Pan, Zhenying; Yu, Ye Feng; Valuckas, Vytautas; Yap, Sherry L. K.; Vienne, Guillaume G.; Kuznetsov, Arseniy I.

    2018-05-01

    Cheap large-scale fabrication of ordered nanostructures is important for multiple applications in photonics and biomedicine including optical filters, solar cells, plasmonic biosensors, and DNA sequencing. Existing methods are either expensive or have strict limitations on the feature size and fabrication complexity. Here, we present a laser-based technique, plasmonic nanoparticle lithography, which is capable of rapid fabrication of large-scale arrays of sub-50 nm holes on various substrates. It is based on near-field enhancement and melting induced under ordered arrays of plasmonic nanoparticles, which are brought into contact with or in close proximity to a desired material and act as optical near-field lenses. The nanoparticles are arranged in ordered patterns on a flexible substrate and can be attached and removed from the patterned sample surface. At optimized laser fluence, the nanohole patterning process does not create any observable changes to the nanoparticles and they have been applied multiple times as reusable near-field masks. This resist-free nanolithography technique provides a simple and cheap solution for large-scale nanofabrication.

  1. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.

    This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

  2. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

    DOE PAGES

    Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.; ...

    2017-07-18

    This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

  3. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

    PubMed Central

    Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Richard A.; Brown, Steven D.

    2017-01-01

    This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences. PMID:28769883
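
    Of the gap-sequence properties listed above, GC content is the simplest to compute. The sketch below shows one common way to do it and is not the authors' pipeline; the function name and the choice to ignore ambiguous bases such as N are assumptions.

```python
def gc_content(seq):
    """Return the fraction of G and C among the unambiguous bases in seq.

    Assumption: characters other than A, C, G, and T (for example N)
    are excluded from both numerator and denominator.
    """
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "AT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return gc / total
```

    For example, `gc_content("ATGC")` evaluates to 0.5.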

  4. The landscape of actionable genomic alterations in cell-free circulating tumor DNA from 21,807 advanced cancer patients.

    PubMed

    Zill, Oliver A; Banks, Kimberly C; Fairclough, Stephen R; Mortimer, Stefanie; Vowles, James V; Mokhtari, Reza; Gandara, David R; Mack, Philip C; Odegaard, Justin I; Nagy, Rebecca J; Baca, Arthur M; Eltoukhy, Helmy; Chudova, Darya I; Lanman, Richard B; Talasaz, AmirAli

    2018-05-18

    Cell-free DNA (cfDNA) sequencing provides a non-invasive method for obtaining actionable genomic information to guide personalized cancer treatment, but the presence of multiple alterations in circulation related to treatment and tumor heterogeneity complicates the interpretation of the observed variants. Experimental Design: We describe the somatic mutation landscape of 70 cancer genes from cfDNA deep-sequencing analysis of 21,807 patients with treated, late-stage cancers across >50 cancer types. To facilitate interpretation of the genomic complexity of circulating tumor DNA in advanced, treated cancer patients, we developed methods to identify cfDNA copy-number driver alterations and cfDNA clonality. Patterns and prevalence of cfDNA alterations in major driver genes for non-small cell lung, breast, and colorectal cancer largely recapitulated those from tumor tissue sequencing compendia (TCGA and COSMIC; r=0.90-0.99), with the principal differences in alteration prevalence being due to patient treatment. This highly sensitive cfDNA sequencing assay revealed numerous subclonal tumor-derived alterations, expected as a result of clonal evolution, but leading to an apparent departure from mutual exclusivity in treatment-naïve tumors. Upon applying novel cfDNA clonality and copy-number driver identification methods, robust mutual exclusivity was observed among predicted truncal driver cfDNA alterations (FDR=5×10⁻⁷ for EGFR and ERBB2), in effect distinguishing tumor-initiating alterations from secondary alterations. Treatment-associated resistance, including both novel alterations and parallel evolution, was common in the cfDNA cohort and was enriched in patients with targetable driver alterations (>18.6% patients). Together these retrospective analyses of a large cfDNA sequencing data set reveal subclonal structures and emerging resistance in advanced solid tumors. Copyright ©2018, American Association for Cancer Research.

  5. Nuclear genomes distinguish cryptic species suggested by their DNA barcodes and ecology

    PubMed Central

    Janzen, Daniel H.; Burns, John M.; Cong, Qian; Hallwachs, Winnie; Dapkey, Tanya; Manjunath, Ramya; Hajibabaei, Mehrdad; Hebert, Paul D. N.; Grishin, Nick V.

    2017-01-01

    DNA sequencing brings another dimension to exploration of biodiversity, and large-scale mitochondrial DNA cytochrome oxidase I barcoding has exposed many potential new cryptic species. Here, we add complete nuclear genome sequencing to DNA barcoding, ecological distribution, natural history, and subtleties of adult color pattern and size to show that a widespread neotropical skipper butterfly known as Udranomia kikkawai (Weeks) comprises three different species in Costa Rica. Full-length barcodes obtained from all three century-old Venezuelan syntypes of U. kikkawai show that it is a rainforest species occurring from Costa Rica to Brazil. The two new species are Udranomia sallydaleyae Burns, a dry forest denizen occurring from Costa Rica to Mexico, and Udranomia tomdaleyi Burns, which occupies the junction between the rainforest and dry forest and currently is known only from Costa Rica. Whereas the three species are cryptic, differing but slightly in appearance, their complete nuclear genomes totaling 15 million aligned positions reveal significant differences consistent with their 0.00065-Mbp (million base pair) mitochondrial barcodes and their ecological diversification. DNA barcoding of tropical insects reared by a massive inventory suggests that the presence of cryptic species is a widespread phenomenon and that further studies will substantially increase current estimates of insect species richness. PMID:28716927

  6. Characteristics of the Lotus japonicus gene repertoire deduced from large-scale expressed sequence tag (EST) analysis.

    PubMed

    Asamizu, Erika; Nakamura, Yasukazu; Sato, Shusei; Tabata, Satoshi

    2004-02-01

    To perform a comprehensive analysis of genes expressed in a model legume, Lotus japonicus, a total of 74472 3'-end expressed sequence tags (EST) were generated from cDNA libraries produced from six different organs. Clustering of sequences was performed with an identity criterion of 95% for 50 bases, and a total of 20457 non-redundant sequences, 8503 contigs and 11954 singletons were generated. EST sequence coverage was analyzed by using the annotated L. japonicus genomic sequence and 1093 of the 1889 predicted protein-encoding genes (57.9%) were hit by the EST sequence(s). Gene content was compared to several plant species. Among the 8503 contigs, 471 were identified as sequences conserved only in leguminous species and these included several disease resistance-related genes. This suggested that in legumes, these genes may have evolved specifically to resist pathogen attack. The rate of gene sequence divergence was assessed by comparing similarity level and functional category based on the Gene Ontology (GO) annotation of Arabidopsis genes. This revealed that genes encoding ribosomal proteins, as well as those related to translation, photosynthesis, and cellular structure were more abundantly represented in the highly conserved class, and that genes encoding transcription factors and receptor protein kinases were abundantly represented in the less conserved class. To make the sequence information and the cDNA clones available to the research community, a Web database with useful services was created at http://www.kazusa.or.jp/en/plant/lotus/EST/.
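
    The clustering criterion above (95% identity over 50 bases) can be illustrated with a small pairwise check. The abstract does not name the clustering software or say how gaps were treated, so this sketch assumes an ungapped comparison: it asks whether any 50-base window along any relative shift of the two sequences matches at 95% or more. The function name and parameters are hypothetical.

```python
import math


def meets_identity_criterion(a, b, window=50, min_identity=0.95):
    """True if some ungapped window of `window` bases matches at >= min_identity."""
    # Integer match threshold; the small epsilon guards against float rounding.
    needed = math.ceil(window * min_identity - 1e-9)
    # Each shift places b at a different offset relative to a (one diagonal).
    for shift in range(-(len(b) - window), len(a) - window + 1):
        start_a = max(shift, 0)
        start_b = max(-shift, 0)
        overlap = min(len(a) - start_a, len(b) - start_b)
        if overlap < window:
            continue
        matches = [a[start_a + k] == b[start_b + k] for k in range(overlap)]
        # Slide a fixed-size window along this diagonal, updating the count.
        count = sum(matches[:window])
        if count >= needed:
            return True
        for k in range(window, overlap):
            count += matches[k] - matches[k - window]
            if count >= needed:
                return True
    return False
```

    With the default parameters the threshold is 48 matching bases out of 50, so two 50-base sequences that differ at two positions pass and two that differ at three positions do not.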

  7. Extracting DNA from 'jaws': high yield and quality from archived tiger shark (Galeocerdo cuvier) skeletal material.

    PubMed

    Nielsen, E E; Morgan, J A T; Maher, S L; Edson, J; Gauthier, M; Pepperell, J; Holmes, B J; Bennett, M B; Ovenden, J R

    2017-05-01

    Archived specimens are highly valuable sources of DNA for retrospective genetic/genomic analysis. However, often limited effort has been made to evaluate and optimize extraction methods, which may be crucial for downstream applications. Here, we assessed and optimized the usefulness of abundant archived skeletal material from sharks as a source of DNA for temporal genomic studies. Six different methods for DNA extraction, encompassing two different commercial kits and three different protocols, were applied to material, so-called bio-swarf, from contemporary and archived jaws and vertebrae of tiger sharks (Galeocerdo cuvier). Protocols were compared for DNA yield and quality using a qPCR approach. For jaw swarf, all methods provided relatively high DNA yield and quality, while large differences in yield between protocols were observed for vertebrae. Similar results were obtained from samples of white shark (Carcharodon carcharias). Application of the optimized methods to 38 museum and private angler trophy specimens dating back to 1912 yielded sufficient DNA for downstream genomic analysis for 68% of the samples. No clear relationships between age of samples, DNA quality and quantity were observed, likely reflecting different preparation and storage methods for the trophies. Trial sequencing of DNA capture genomic libraries using 20 000 baits revealed that a significant proportion of captured sequences were derived from tiger sharks. This study demonstrates that archived shark jaws and vertebrae are potential high-yield sources of DNA for genomic-scale analysis. It also highlights that even for similar tissue types, a careful evaluation of extraction protocols can vastly improve DNA yield. © 2016 John Wiley & Sons Ltd.

  8. Cloning and restriction enzyme mapping of ribosomal DNA of Giardia duodenalis, Giardia ardeae and Giardia muris.

    PubMed

    van Keulen, H; Campbell, S R; Erlandsen, S L; Jarroll, E L

    1991-06-01

    In an attempt to study Giardia at the DNA sequence level, the rRNA genes of three species, Giardia duodenalis, Giardia ardeae and Giardia muris were cloned and restriction enzyme maps were constructed. The rDNA repeats of these Giardia show completely different restriction enzyme recognition patterns. The size of the rDNA repeat ranges from approximately 5.6 kb in G. duodenalis to 7.6 kb in both G. muris and G. ardeae. These size differences are mainly attributable to the variation in length of the spacer. Minor differences exist among these Giardia in the sizes of their small subunit rRNA and the internal transcribed spacer between small and large subunit rRNA. The genetic maps were constructed by sequence analysis of the DNA around the 5' and 3' ends of the mature rRNA genes and between the rRNA covering the 5.8S rRNA gene and internal transcribed spacer. Comparison of the 5.8S rDNA and 3' end of large subunit rDNA from these three Giardia species showed considerable sequence variation, but the rDNA sequences of G. duodenalis and G. ardeae appear more closely related to each other than to G. muris.

  9. Molecular inversion probe assay for allelic quantitation

    PubMed Central

    Ji, Hanlee; Welch, Katrina

    2010-01-01

    Molecular inversion probe (MIP) technology has been demonstrated to be a robust platform for large-scale dual genotyping and copy number analysis. Applications in human genomic and genetic studies include the possibility of running dual germline genotyping and combined copy number variation ascertainment. MIPs analyze large numbers of specific genetic target sequences in parallel, relying on interrogation of a barcode tag, rather than direct hybridization of genomic DNA to an array. The MIP approach does not replace, but is complementary to many of the copy number technologies being performed today. Some specific advantages of MIP technology include: Less DNA required (37 ng vs. 250 ng), DNA quality less important, more dynamic range (amplifications detected up to copy number 60), allele specific information “cleaner” (less SNP crosstalk/contamination), and quality of markers better (fewer individual MIPs versus SNPs needed to identify copy number changes). MIPs can be considered a candidate gene (targeted whole genome) approach and can find specific areas of interest that otherwise may be missed with other methods. PMID:19488872

  10. Large-Scale Collection and Analysis of Full-Length cDNAs from Brachypodium distachyon and Integration with Pooideae Sequence Resources

    PubMed Central

    Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Takahashi, Fuminori; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo

    2013-01-01

    A comprehensive collection of full-length cDNAs is essential for correct structural gene annotation and functional analyses of genes. We constructed a mixed full-length cDNA library from 21 different tissues of Brachypodium distachyon Bd21, and obtained 78,163 high quality expressed sequence tags (ESTs) from both ends of ca. 40,000 clones (including 16,079 contigs). We updated gene structure annotations of Brachypodium genes based on full-length cDNA sequences in comparison with the latest publicly available annotations. About 10,000 non-redundant gene models were supported by full-length cDNAs; ca. 6,000 showed some transcription unit modifications. We also found ca. 580 novel gene models, including 362 newly identified in Bd21. Using the updated transcription start sites, we searched a total of 580 plant cis-motifs in the −3 kb promoter regions and determined a genome-wide Brachypodium promoter architecture. Furthermore, we integrated the Brachypodium full-length cDNAs and updated gene structures with available sequence resources in wheat and barley in a web-accessible database, the RIKEN Brachypodium FL cDNA database. The database represents a “one-stop” information resource for all genomic information in the Pooideae, facilitating functional analysis of genes in this model grass plant and seamless knowledge transfer to the Triticeae crops. PMID:24130698

  11. Geranyl diphosphate synthase large subunit, and methods of use

    DOEpatents

    Croteau, Rodney B.; Burke, Charles C.; Wildung, Mark R.

    2001-10-16

    A cDNA encoding geranyl diphosphate synthase large subunit from peppermint has been isolated and sequenced, and the corresponding amino acid sequence has been determined. Replicable recombinant cloning vehicles are provided which code for geranyl diphosphate synthase large subunit. In another aspect, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding geranyl diphosphate synthase large subunit. In yet another aspect, the present invention provides isolated, recombinant geranyl diphosphate synthase protein comprising an isolated, recombinant geranyl diphosphate synthase large subunit protein and an isolated, recombinant geranyl diphosphate synthase small subunit protein. Thus, systems and methods are provided for the recombinant expression of geranyl diphosphate synthase.

  12. Mitochondrial genome of the moon jelly Aurelia aurita (Cnidaria, Scyphozoa): A linear DNA molecule encoding a putative DNA-dependent DNA polymerase.

    PubMed

    Shao, Zhiyong; Graf, Shannon; Chaga, Oleg Y; Lavrov, Dennis V

    2006-10-15

    The 16,937-nucleotide sequence of the linear mitochondrial DNA (mtDNA) molecule of the moon jelly Aurelia aurita (Cnidaria, Scyphozoa), the first mtDNA sequence from the class Scyphozoa and the first sequence of a linear mtDNA from Metazoa, has been determined. This sequence contains genes for 13 energy pathway proteins, small and large subunit rRNAs, and methionine and tryptophan tRNAs. In addition, two open reading frames of 324 and 969 base pairs in length have been found. The deduced amino-acid sequence of one of them, ORF969, displays extensive sequence similarity with the polymerase (but not the exonuclease) domain of family B DNA polymerases, and this ORF has been tentatively identified as dnab. This is the first report of dnab in animal mtDNA. The genes in A. aurita mtDNA are arranged in two clusters with opposite transcriptional polarities, with transcription proceeding toward the ends of the molecule. The determined sequences at the ends of the molecule are nearly identical but inverted and lack any obvious potential secondary structures or telomere-like repeat elements. The acquisition of mitochondrial genomic data for the second class of Cnidaria allows us to reconstruct characteristic features of mitochondrial evolution in this animal phylum.

  13. Versatile Gene-Specific Sequence Tags for Arabidopsis Functional Genomics: Transcript Profiling and Reverse Genetics Applications

    PubMed Central

    Hilson, Pierre; Allemeersch, Joke; Altmann, Thomas; Aubourg, Sébastien; Avon, Alexandra; Beynon, Jim; Bhalerao, Rishikesh P.; Bitton, Frédérique; Caboche, Michel; Cannoot, Bernard; Chardakov, Vasil; Cognet-Holliger, Cécile; Colot, Vincent; Crowe, Mark; Darimont, Caroline; Durinck, Steffen; Eickhoff, Holger; de Longevialle, Andéol Falcon; Farmer, Edward E.; Grant, Murray; Kuiper, Martin T.R.; Lehrach, Hans; Léon, Céline; Leyva, Antonio; Lundeberg, Joakim; Lurin, Claire; Moreau, Yves; Nietfeld, Wilfried; Paz-Ares, Javier; Reymond, Philippe; Rouzé, Pierre; Sandberg, Goran; Segura, Maria Dolores; Serizet, Carine; Tabrett, Alexandra; Taconnat, Ludivine; Thareau, Vincent; Van Hummelen, Paul; Vercruysse, Steven; Vuylsteke, Marnik; Weingartner, Magdalena; Weisbeek, Peter J.; Wirta, Valtteri; Wittink, Floyd R.A.; Zabeau, Marc; Small, Ian

    2004-01-01

    Microarray transcript profiling and RNA interference are two new technologies crucial for large-scale gene function studies in multicellular eukaryotes. Both rely on sequence-specific hybridization between complementary nucleic acid strands, inciting us to create a collection of gene-specific sequence tags (GSTs) representing at least 21,500 Arabidopsis genes and which are compatible with both approaches. The GSTs were carefully selected to ensure that each of them shared no significant similarity with any other region in the Arabidopsis genome. They were synthesized by PCR amplification from genomic DNA. Spotted microarrays fabricated from the GSTs show good dynamic range, specificity, and sensitivity in transcript profiling experiments. The GSTs have also been transferred to bacterial plasmid vectors via recombinational cloning protocols. These cloned GSTs constitute the ideal starting point for a variety of functional approaches, including reverse genetics. We have subcloned GSTs on a large scale into vectors designed for gene silencing in plant cells. We show that in planta expression of GST hairpin RNA results in the expected phenotypes in silenced Arabidopsis lines. These versatile GST resources provide novel and powerful tools for functional genomics. PMID:15489341

  14. Relative information content of polymorphic microsatellites and mitochondrial DNA for inferring dispersal and population genetic structure in the olive sea snake, Aipysurus laevis.

    PubMed

    Lukoschek, V; Waycott, M; Keogh, J S

    2008-07-01

    Polymorphic microsatellites are widely considered more powerful for resolving population structure than mitochondrial DNA (mtDNA) markers, particularly for recently diverged lineages or geographically proximate populations. Weaker population subdivision for biparentally inherited nuclear markers than maternally inherited mtDNA may signal male-biased dispersal but can also be attributed to marker-specific evolutionary characteristics and sampling properties. We discriminated between these competing explanations with a population genetic study on olive sea snakes, Aipysurus laevis. A previous mtDNA study revealed strong regional population structure for A. laevis around northern Australia, where Pleistocene sea-level fluctuations have influenced the genetic signatures of shallow-water marine species. Divergences among phylogroups dated to the Late Pleistocene, suggesting recent range expansions by previously isolated matrilines. Fine-scale population structure within regions was, however, poorly resolved for mtDNA. In order to improve estimates of fine-scale genetic divergence and to compare population structure between nuclear and mtDNA, 354 olive sea snakes (previously sequenced for mtDNA) were genotyped for five microsatellite loci. F statistics and Bayesian multilocus genotype clustering analyses found similar regional population structure as mtDNA and, after standardizing microsatellite F statistics for high heterozygosities, regional divergence estimates were quantitatively congruent between marker classes. Over small spatial scales, however, microsatellites recovered almost no genetic structure and standardized F statistics were orders of magnitude smaller than for mtDNA. Three tests for male-biased dispersal were not significant, suggesting that recent demographic expansions to the typically large population sizes of A. laevis have prevented microsatellites from reaching mutation-drift equilibrium and local populations may still be diverging.
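
    As a rough illustration of the F statistics referred to above, the sketch below computes a single-locus F_ST as (H_T - H_S) / H_T from per-population allele frequencies. The function names and the equal weighting of populations are assumptions made for this example; it is not the authors' analysis, which also standardized the statistic for high heterozygosity.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst(pop_freqs):
    """Single-locus F_ST = (H_T - H_S) / H_T, weighting populations equally.

    pop_freqs: list of per-population allele-frequency lists (same allele order).
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # H_S: mean within-population expected heterozygosity
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies
    mean_freqs = [sum(f[a] for f in pop_freqs) / n_pops for a in range(n_alleles)]
    h_t = expected_heterozygosity(mean_freqs)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Two populations fixed for different alleles are fully differentiated.
print(fst([[1.0, 0.0], [0.0, 1.0]]))
# Two populations with identical frequencies show no differentiation.
print(fst([[0.5, 0.5], [0.5, 0.5]]))
```

    Because H_S approaches 1 for highly polymorphic markers, the unstandardized statistic is bounded well below 1 for microsatellites, which is one reason a standardization step of the kind the abstract mentions is needed before comparing marker classes.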

  15. Human structural variation: mechanisms of chromosome rearrangements

    PubMed Central

    Weckselblatt, Brooke; Rudd, M. Katharine

    2015-01-01

    Chromosome structural variation (SV) is a normal part of variation in the human genome, but some classes of SV can cause neurodevelopmental disorders. Analysis of the DNA sequence at SV breakpoints can reveal mutational mechanisms and risk factors for chromosome rearrangement. Large-scale SV breakpoint studies have become possible recently owing to advances in next-generation sequencing (NGS) including whole-genome sequencing (WGS). These findings have shed light on complex forms of SV such as triplications, inverted duplications, insertional translocations, and chromothripsis. Sequence-level breakpoint data resolve SV structure and determine how genes are disrupted, fused, and/or misregulated by breakpoints. Recent improvements in breakpoint sequencing have also revealed non-allelic homologous recombination (NAHR) between paralogous long interspersed nuclear element (LINE) or human endogenous retrovirus (HERV) repeats as a cause of deletions, duplications, and translocations. This review covers the genomic organization of simple and complex constitutional SVs, as well as the molecular mechanisms of their formation. PMID:26209074

  16. Modeling DNA bubble formation at the atomic scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beleva, V; Rasmussen, K. O.; Garcia, A. E.

    We describe the fluctuations of double stranded DNA molecules using a minimalist Go model over a wide range of temperatures. Minimalist models allow us to describe, at the atomic level, the opening and formation of bubbles in DNA double helices. This model includes all the geometrical constraints in helix melting imposed by the 3D structure of the molecule. The DNA forms melted bubbles within double helices. These bubbles form and break as a function of time. The equilibrium average number of broken base pairs shows a sharp change as a function of T. We observe a temperature profile of sequence-dependent bubble formation similar to those measured by Zeng et al. Long nucleic acid molecules melt partially through the formation of bubbles. It is known that CG rich sequences melt at higher temperatures than AT rich sequences. The melting temperature, however, is not solely determined by the CG content, but by the sequence through base stacking and solvent interactions. Recently, models that incorporate the sequence and nonlinear dynamics of DNA double strands have shown that DNA exhibits very rich dynamics. Recent extensions of the Bishop-Peyrard model show that fluctuations in the DNA structure lead to opening in localized regions, and that these regions in the DNA are associated with transcription initiation sites. 1D and 2D models of DNA may contain enough information about stacking and base pairing interactions, but lack the coupling between twisting, bending and base pair opening imposed by the double helical structure of DNA that all atom models easily describe. However, the complexity of the energy function used in all atom simulations (including solvent, ions, etc.) does not allow for the description of DNA folding/unfolding events that occur in the microsecond time scale.
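
    The point that melting temperature is not determined by CG content alone can be illustrated with the Wallace rule, a widely used rule of thumb for short oligonucleotides: Tm ≈ 2(A+T) + 4(G+C) in degrees Celsius. This is not the Go model described above; the function name is hypothetical, and the sketch only shows that a composition-based estimate assigns the same value to any two sequences with the same base counts, whatever their order.

```python
def wallace_tm(seq):
    """Wallace-rule melting temperature estimate (deg C) for a short oligo.

    Uses base counts only: 2 degrees per A/T and 4 degrees per G/C.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# GC-rich sequences get a higher estimate than AT-rich ones of equal length.
print(wallace_tm("ATATATAT"), wallace_tm("GCGCGCGC"))
# Same composition, different order: the estimate cannot tell them apart.
print(wallace_tm("AACCGGTT") == wallace_tm("ACGTACGT"))
```

    Order-dependent effects such as base stacking, which the abstract identifies as important, are invisible to an estimate of this kind.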

  17. Genomic sequencing of Pleistocene cave bears

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of ~1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  18. A streamlined method for analysing genome-wide DNA methylation patterns from low amounts of FFPE DNA.

    PubMed

    Ludgate, Jackie L; Wright, James; Stockwell, Peter A; Morison, Ian M; Eccles, Michael R; Chatterjee, Aniruddha

    2017-08-31

    Formalin fixed paraffin embedded (FFPE) tumor samples are a major source of DNA from patients in cancer research. However, FFPE is a challenging material to work with due to macromolecular fragmentation and nucleic acid crosslinking. FFPE tissue poses particular challenges for methylation analysis and for preparing sequencing-based libraries relying on bisulfite conversion. Successful bisulfite conversion is a key requirement for sequencing-based methylation analysis. Here we describe a complete and streamlined workflow for preparing next generation sequencing libraries for methylation analysis from FFPE tissues. This includes counting cells from FFPE blocks and extracting DNA from FFPE slides, testing bisulfite conversion efficiency with a polymerase chain reaction (PCR) based test, preparing reduced representation bisulfite sequencing libraries and massively parallel sequencing. The main features and advantages of this protocol are: an optimized method for extracting good quality DNA from FFPE tissues; an efficient bisulfite conversion and next generation sequencing library preparation protocol that uses 50 ng DNA from FFPE tissue; and incorporation of a PCR-based test to assess bisulfite conversion efficiency prior to sequencing. We provide a complete workflow and an integrated protocol for performing DNA methylation analysis at the genome-scale and we believe this will facilitate clinical epigenetic research that involves the use of FFPE tissue.
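
    The abstract assesses bisulfite conversion efficiency with a PCR-based test before sequencing. As a different, sequencing-based illustration of the same quantity, the sketch below estimates efficiency as the fraction of non-CpG cytosines that read as T, under the assumption (made for this example only) that non-CpG cytosines are unmethylated and that reads are forward-strand and aligned without gaps to the reference. All names are hypothetical.

```python
def non_cpg_c_positions(reference):
    """Indices of cytosines in the reference that are not followed by G.

    A C at the very end of the reference has unknown context and is
    treated as non-CpG here.
    """
    ref = reference.upper()
    return [i for i, b in enumerate(ref) if b == "C" and ref[i + 1:i + 2] != "G"]


def conversion_efficiency(reference, reads):
    """Fraction of non-CpG reference cytosines read as T across all reads.

    Each read must have the same length as the reference (gapless alignment).
    Bases other than C or T at those positions are ignored.
    Returns None when no informative base is observed.
    """
    positions = non_cpg_c_positions(reference)
    converted = 0
    total = 0
    for read in reads:
        read = read.upper()
        for i in positions:
            if read[i] == "T":      # converted cytosine
                converted += 1
                total += 1
            elif read[i] == "C":    # unconverted cytosine
                total += 1
    return converted / total if total else None


reference = "ACGTCATC"
reads = ["ACGTTATT", "ACGTCATT"]
print(conversion_efficiency(reference, reads))
```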

  19. Circular replication-associated protein encoding DNA viruses identified in the faecal matter of various animals in New Zealand.

    PubMed

    Steel, Olivia; Kraberger, Simona; Sikorski, Alyssa; Young, Laura M; Catchpole, Ryan J; Stevens, Aaron J; Ladley, Jenny J; Coray, Dorien S; Stainton, Daisy; Dayaram, Anisha; Julian, Laurel; van Bysterveldt, Katherine; Varsani, Arvind

    2016-09-01

    In recent years, innovations in molecular techniques and sequencing technologies have resulted in a rapid expansion in the number of known viral sequences, in particular those with circular replication-associated protein (Rep)-encoding single-stranded (CRESS) DNA genomes. CRESS DNA viruses are present in the virome of many ecosystems and are known to infect a wide range of organisms. A large number of the recently identified CRESS DNA viruses cannot be classified into any known viral families, indicating that the current view of CRESS DNA viral sequence space is greatly underestimated. Animal faecal matter has proven to be a particularly useful source for sampling CRESS DNA viruses in an ecosystem, as it is cost-effective and non-invasive. In this study a viral metagenomic approach was used to explore the diversity of CRESS DNA viruses present in the faeces of domesticated and wild animals in New Zealand. Thirty-eight complete CRESS DNA viral genomes and two circular molecules (that may be defective molecules or single components of multicomponent genomes) were identified from forty-nine individual animal faecal samples. Based on shared genome organisations and sequence similarities, eighteen of the isolates were classified as gemycircularviruses and twelve isolates were classified as smacoviruses. The remaining eight isolates lack significant sequence similarity with any members of known CRESS DNA virus groups. This research adds significantly to our knowledge of CRESS DNA viral diversity in New Zealand, emphasising the prevalence of CRESS DNA viruses in nature, and reinforcing the suggestion that a large proportion of CRESS DNA viruses are yet to be identified. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Fractal landscapes in biological systems: long-range correlations in DNA and interbeat heart intervals

    NASA Technical Reports Server (NTRS)

    Stanley, H. E.; Buldyrev, S. V.; Goldberger, A. L.; Hausdorff, J. M.; Havlin, S.; Mietus, J.; Sciortino, F.; Simons, M.

    1992-01-01

    Here we discuss recent advances in applying ideas of fractals and disordered systems to two topics of biological interest, both topics having in common the appearance of scale-free phenomena, i.e., correlations that have no characteristic length scale, typically exhibited by physical systems near a critical point and dynamical systems far from equilibrium. (i) DNA nucleotide sequences have traditionally been analyzed using models which incorporate the possibility of short-range nucleotide correlations. We found, instead, a remarkably long-range power law correlation. We found such long-range correlations in intron-containing genes and in non-transcribed regulatory DNA sequences as well as intragenomic DNA, but not in cDNA sequences or intron-less genes. We also found that the myosin heavy chain family gene evolution increases the fractal complexity of the DNA landscapes, consistent with the intron-late hypothesis of gene evolution. (ii) The healthy heartbeat is traditionally thought to be regulated according to the classical principle of homeostasis, whereby physiologic systems operate to reduce variability and achieve an equilibrium-like state. We found, however, that under normal conditions, beat-to-beat fluctuations in heart rate display long-range power law correlations.
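
    One way to picture a "DNA landscape" is to map a sequence onto a one-dimensional walk and measure how its fluctuations grow with window length. The sketch below assumes a step of +1 for pyrimidines (C, T) and -1 for purines (A, G); this mapping and the function names are assumptions made for illustration and are not stated in the abstract. For an uncorrelated sequence the fluctuation F(l) is expected to grow roughly as l^0.5, and a larger exponent would indicate long-range correlation.

```python
import math


def dna_walk(seq):
    """Cumulative walk: +1 per pyrimidine (C/T), -1 per purine (A/G).

    Any other symbol (e.g. N) contributes no step.
    """
    walk = [0]
    position = 0
    for base in seq.upper():
        if base in ("C", "T"):
            position += 1
        elif base in ("A", "G"):
            position -= 1
        walk.append(position)
    return walk


def fluctuation(walk, window):
    """Root-mean-square fluctuation of walk displacements over a window length."""
    if window < 1 or window >= len(walk):
        raise ValueError("window must satisfy 1 <= window < len(walk)")
    diffs = [walk[i + window] - walk[i] for i in range(len(walk) - window)]
    mean = sum(diffs) / len(diffs)
    variance = sum(d * d for d in diffs) / len(diffs) - mean * mean
    return math.sqrt(max(variance, 0.0))


walk = dna_walk("CTAG")
print(walk)
print(fluctuation(dna_walk("CACA"), 1))
```

    In practice F(l) would be evaluated over a range of window lengths and the exponent read off from the slope of log F(l) against log l.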

  1. Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing.

    PubMed

    Hargreaves, Adam D; Mulley, John F

    2015-01-01

    Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0-2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5' and 3' UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete) and Sanger-based ESTs (15/29). We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species.
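
    A raw error rate such as the roughly 12% quoted above can be thought of as the number of substitutions, insertions and deletions needed to turn a read into its reference, divided by the reference length. The sketch below uses plain Levenshtein distance as a minimal stand-in; the function names are hypothetical and the authors' actual error-rate calculation may differ (for example, by being based on alignments of reads to transcripts).

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(read, reference):
    """Edits per reference base needed to turn the read into the reference."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(read, reference) / len(reference)


# One substitution and one deletion against an 8-base reference.
print(error_rate("ACGAACG", "ACGTACGT"))
```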

  2. Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing

    PubMed Central

    Hargreaves, Adam D.

    2015-01-01

    Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0–2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5′ and 3′ UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete) and Sanger-based ESTs (15/29). We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species. PMID:26623194

  3. Sequencing to Station in 12 Months (Targeting Orbital 5 Launch, March 30th)

    NASA Technical Reports Server (NTRS)

    Smith, David J.; Burton, Aaron Steven

    2015-01-01

    The Biomolecule Sequencer is a Commercial Off-The-Shelf device developed by Oxford Nanopore Technologies and implements a method of DNA sequencing unlike that of any other current sequencer. The device measures changes in electrical current through a nanopore depending on the sequence of the DNA strand that is passing through it. Since the technology is built on nanometer-scale ion pores, the hardware itself is exceptionally small (3 x 1 x 5/8 inches), lightweight (less than 120 grams with USB cable), and powered only by a USB connection. The sequencing device is permanent, while the flow cells, to which the samples are added, are periodically replaced. The goal of our upcoming technology demonstration on ISS is to provide evidence that DNA sequencing in space is possible, which holds the exciting potential to enable the identification of microorganisms, monitor changes in microbes and humans in response to spaceflight, and possibly aid in the detection of DNA-based life elsewhere in the universe.

  4. Patterns of DNA barcode variation in Canadian marine molluscs.

    PubMed

    Layton, Kara K S; Martel, André L; Hebert, Paul D N

    2014-01-01

    Molluscs are the most diverse marine phylum and this high diversity has resulted in considerable taxonomic problems. Because the number of species in Canadian oceans remains uncertain, there is a need to incorporate molecular methods into species identifications. A 648 base pair segment of the cytochrome c oxidase subunit I gene has proven useful for the identification and discovery of species in many animal lineages. While the utility of DNA barcoding in molluscs has been demonstrated in other studies, this is the first effort to construct a DNA barcode registry for marine molluscs across such a large geographic area. This study examines patterns of DNA barcode variation in 227 species of Canadian marine molluscs. Intraspecific sequence divergences ranged from 0-26.4% and a barcode gap existed for most taxa. Eleven cases of relatively deep (>2%) intraspecific divergence were detected, suggesting the possible presence of overlooked species. Structural variation was detected in COI with indels found in 37 species, mostly bivalves. Some indels were present in divergent lineages, primarily in the region of the first external loop, suggesting certain areas are hotspots for change. Lastly, mean GC content varied substantially among orders (24.5%-46.5%), and showed a significant positive correlation with nearest neighbour distances. DNA barcoding is an effective tool for the identification of Canadian marine molluscs and for revealing possible cases of overlooked species. Some species with deep intraspecific divergence showed a biogeographic partition between lineages on the Atlantic, Arctic and Pacific coasts, suggesting the role of Pleistocene glaciations in the subdivision of their populations. Indels were prevalent in the barcode region of the COI gene in bivalves and gastropods. This study highlights the efficacy of DNA barcoding for providing insights into sequence variation across a broad taxonomic group on a large geographic scale.
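
    Two of the quantities reported above, GC content and sequence divergence, are simple to compute from aligned barcode sequences. The sketch below uses the uncorrected p-distance (proportion of differing sites) as a simple stand-in for divergence; barcoding studies often use a model-corrected distance instead, and the abstract does not say which was used here. Function names and the toy sequences are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    counted = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / counted if counted else 0.0


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = {"A", "C", "G", "T"}
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            if x != y:
                differences += 1
    return differences / compared if compared else 0.0


print(gc_content("ATGC"))
print(p_distance("ACGTACGT", "ACGTACGA"))
```

    With a 2% threshold of the kind mentioned in the abstract, a pair of conspecific sequences whose distance exceeds 0.02 would be flagged as a candidate for deep intraspecific divergence.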

  5. Streamlining the Design-to-Build Transition with Build-Optimization Software Tools.

    PubMed

    Oberortner, Ernst; Cheng, Jan-Fang; Hillson, Nathan J; Deutsch, Samuel

    2017-03-17

    Scaling-up capabilities for the design, build, and test of synthetic biology constructs holds great promise for the development of new applications in fuels, chemical production, or cellular-behavior engineering. Construct design is an essential component in this process; however, not every designed DNA sequence can be readily manufactured, even using state-of-the-art DNA synthesis methods. Current biological computer-aided design and manufacture tools (bioCAD/CAM) do not adequately consider the limitations of DNA synthesis technologies when generating their outputs. Designed sequences that violate DNA synthesis constraints may require substantial sequence redesign or lead to price-premiums and temporal delays, which adversely impact the efficiency of the DNA manufacturing process. We have developed a suite of build-optimization software tools (BOOST) to streamline the design-build transition in synthetic biology engineering workflows. BOOST incorporates knowledge of DNA synthesis success determinants into the design process to output ready-to-build sequences, preempting the need for sequence redesign. The BOOST web application is available at https://boost.jgi.doe.gov and its Application Program Interfaces (API) enable integration into automated, customized DNA design processes. The herein presented results highlight the effectiveness of BOOST in reducing DNA synthesis costs and timelines.
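
    To make the idea of DNA synthesis constraints concrete, the sketch below screens a designed sequence for two commonly cited problem features, extreme GC content and long homopolymer runs. It does not use or reproduce the BOOST API; the function names and the default thresholds are hypothetical values chosen for illustration, and real vendor constraints differ.

```python
import re


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated character in seq."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def synthesis_violations(seq, gc_min=0.25, gc_max=0.65, max_homopolymer=8):
    """Return a list of human-readable constraint violations (empty if none).

    The thresholds are illustrative placeholders, not vendor specifications.
    """
    seq = seq.upper()
    problems = []
    gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
    if not gc_min <= gc <= gc_max:
        problems.append(f"GC content {gc:.2f} outside [{gc_min}, {gc_max}]")
    run = longest_homopolymer(seq)
    if run > max_homopolymer:
        problems.append(f"homopolymer run of {run} exceeds {max_homopolymer}")
    return problems


print(synthesis_violations("ACGTACGTAC"))
print(synthesis_violations("A" * 12))
```

    A design tool could run checks of this kind before ordering and redesign only the flagged sequences, which is the general workflow the abstract describes.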

  6. NASBA: A detection and amplification system uniquely suited for RNA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sooknanan, R.; Malek, L.T.

    1995-06-01

    The invention of PCR (polymerase chain reaction) has revolutionized our ability to amplify and manipulate a nucleic acid sequence in vitro. The commercial rewards of this revolution have driven the development of other nucleic acid amplification and detection methodologies. This has created an alphabet soup of technologies that use different amplification methods, including NASBA (nucleic acid sequence-based amplification), LCR (ligase chain reaction), SDA (strand displacement amplification), QBR (Q-beta replicase), CPR (cycling probe reaction), and bDNA (branched DNA). Despite the differences in their processes, these amplification systems can be separated into two broad categories based on how they achieve their goal: sequence-based amplification systems, such as PCR, NASBA, and SDA, amplify a target nucleic acid sequence. Signal-based amplification systems, such as LCR, QBR, CPR and bDNA, amplify or alter a signal from a detection reaction that is target-dependent. While the various methods have relative strengths and weaknesses, only NASBA offers the unique ability to homogeneously amplify an RNA analyte in the presence of homologous genomic DNA under isothermal conditions. Since the detection of RNA sequences almost invariably measures biological activity, it is an excellent prognostic indicator of activities as diverse as virus production, gene expression, and cell viability. The isothermal nature of the reaction makes NASBA especially suitable for large-scale manual screening. These features extend NASBA's application range from research to commercial diagnostic applications. Field test kits are presently under development for human diagnostics as well as the burgeoning fields of food and environmental diagnostic testing. These developments suggest future integration of NASBA into robotic workstations for high-throughput screening as well. 17 refs., 1 tab.

  7. Mitochondrial Variation among the Aymara and the Signatures of Population Expansion in the Central Andes

    PubMed Central

    BATAI, KEN; WILLIAMS, SLOAN R.

    2015-01-01

    Objectives: The exploitation of marine resources and intensive agriculture led to a marked population increase early in central Andean prehistory. Constant historic and prehistoric population movements also characterize this region. These features undoubtedly affected regional genetic variation, but the exact nature of these effects remains uncertain. Methods: Mitochondrial DNA (mtDNA) hypervariable region I sequence variation in 61 Aymara individuals from La Paz, Bolivia, was analyzed and compared to sequences from 47 other South American populations to test hypotheses of whether increased female effective population size and gene flow influenced the mtDNA variation among central Andean populations. Results: The Aymara and Quechua were genetically diverse, showing evidence of population expansion and large effective population size, and a demographic expansion model fits the mtDNA variation found among central Andean populations well. Estimated migration rates and the results of AMOVA and multidimensional scaling analysis suggest that female gene flow was also an important factor, influencing genetic variation among the central Andeans as well as lowland populations from western South America. mtDNA variation in the south central Andes correlated better with geographic proximity than with language, and fit a population continuity model. Conclusion: The mtDNA data suggest that the central Andeans experienced population expansion, most likely because of rapid demographic expansion after the introduction of intensive agriculture, but the role of female gene flow needs to be further explored. PMID:24449040

  8. Bacterial discrimination by means of a universal array approach mediated by LDR (ligase detection reaction)

    PubMed Central

    Busti, Elena; Bordoni, Roberta; Castiglioni, Bianca; Monciardini, Paolo; Sosio, Margherita; Donadio, Stefano; Consolandi, Clarissa; Rossi Bernardi, Luigi; Battaglia, Cristina; De Bellis, Gianluca

    2002-01-01

    Background: PCR amplification of bacterial 16S rRNA genes provides the most comprehensive and flexible means of sampling bacterial communities. Sequence analysis of these cloned fragments can provide qualitative and quantitative insight into the microbial population under scrutiny, although this approach is not suited to large-scale screenings. Other methods, such as denaturing gradient gel electrophoresis, heteroduplex or terminal restriction fragment analysis, are rapid and therefore amenable to field-scale experiments. A very recent addition to these analytical tools is represented by microarray technology. Results: Here we present our results using a Universal DNA Microarray approach as an analytical tool for bacterial discrimination. The proposed procedure is based on the properties of the DNA ligation reaction and requires the design of two probes specific for each target sequence. One oligo carries a fluorescent label and the other a unique sequence (cZipCode or complementary ZipCode) which identifies a ligation product. Ligated fragments, obtained in the presence of a proper template (a PCR amplified fragment of the 16S rRNA gene), contain either the fluorescent label or the unique sequence and therefore are addressed to the location on the microarray where the ZipCode sequence has been spotted. Such an array is therefore "Universal", being unrelated to a specific molecular analysis. Here we present the design of probes specific for some groups of bacteria and their application to bacterial diagnostics. Conclusions: The combined use of selective probes, ligation reaction and the Universal Array approach yielded an analytical procedure with a good power of discrimination among bacteria. PMID:12243651

  9. Component identification of electron transport chains in curdlan-producing Agrobacterium sp. ATCC 31749 and its genome-specific prediction using comparative genome and phylogenetic trees analysis.

    PubMed

    Zhang, Hongtao; Setubal, Joao Carlos; Zhan, Xiaobei; Zheng, Zhiyong; Yu, Lijun; Wu, Jianrong; Chen, Dingqiang

    2011-06-01

    Agrobacterium sp. ATCC 31749 (formerly named Alcaligenes faecalis var. myxogenes) is a non-pathogenic aerobic soil bacterium used in large scale biotechnological production of curdlan. However, little is known about its genomic information. DNA partial sequence of electron transport chains (ETCs) protein genes were obtained in order to understand the components of ETC and genomic-specificity in Agrobacterium sp. ATCC 31749. Degenerate primers were designed according to ETC conserved sequences in other reported species. DNA partial sequences of ETC genes in Agrobacterium sp. ATCC 31749 were cloned by the PCR method using degenerate primers. Based on comparative genomic analysis, nine electron transport elements were ascertained, including NADH ubiquinone oxidoreductase, succinate dehydrogenase complex II, complex III, cytochrome c, ubiquinone biosynthesis protein ubiB, cytochrome d terminal oxidase, cytochrome bo terminal oxidase, cytochrome cbb (3)-type terminal oxidase and cytochrome caa (3)-type terminal oxidase. Similarity and phylogenetic analyses of these genes revealed that among fully sequenced Agrobacterium species, Agrobacterium sp. ATCC 31749 is closest to Agrobacterium tumefaciens C58. Based on these results a comprehensive ETC model for Agrobacterium sp. ATCC 31749 is proposed.

  10. Genetic characterisation of Taenia multiceps cysts from ruminants in Greece.

    PubMed

    Al-Riyami, Shumoos; Ioannidou, Evi; Koehler, Anson V; Hussain, Muhammad H; Al-Rawahi, Abdulmajeed H; Giadinis, Nektarios D; Lafi, Shawkat Q; Papadopoulos, Elias; Jabbar, Abdul

    2016-03-01

    This study was designed to genetically characterise the larval stage (coenurus) of Taenia multiceps from ruminants in Greece, utilising partial DNA regions within the cytochrome c oxidase subunit 1 (pcox1) and NADH dehydrogenase 1 (pnad1) mitochondrial (mt) genes, respectively. A molecular-phylogenetic approach was used to analyse the pcox1 and pnad1 amplicons derived from genomic DNA samples from individual cysts (n=105) from cattle (n=3), goats (n=5) and sheep (n=97). Results revealed five and six distinct electrophoretic profiles for pcox1 and pnad1, respectively, using single-strand conformation polymorphism. Direct sequencing of selected amplicons representing each of these profiles defined five haplotypes each for pcox1 and pnad1, among all 105 isolates. Phylogenetic analysis of individual sequence data for each locus, including a range of well-defined reference sequences, inferred that all isolates of T. multiceps cysts from ruminants in Greece clustered with previously published sequences from different continents. The present study provides a foundation for future large-scale studies on the epidemiology of T. multiceps in ruminants as well as dogs in Greece. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Optimization of cDNA-AFLP experiments using genomic sequence data.

    PubMed

    Kivioja, Teemu; Arvas, Mikko; Saloheimo, Markku; Penttilä, Merja; Ukkonen, Esko

    2005-06-01

    cDNA amplified fragment length polymorphism (cDNA-AFLP) is one of the few genome-wide level expression profiling methods capable of finding genes that have not yet been cloned or even predicted from sequence but have interesting expression patterns under the studied conditions. In cDNA-AFLP, a complex cDNA mixture is divided into small subsets using restriction enzymes and selective PCR. A large cDNA-AFLP experiment can require a substantial amount of resources, such as hundreds of PCR amplifications and gel electrophoresis runs, followed by manual cutting of a large number of bands from the gels. Our aim was to test whether this workload can be reduced by rational design of the experiment. We used the available genomic sequence information to optimize cDNA-AFLP experiments beforehand so that as many transcripts as possible could be profiled with a given amount of resources. Optimization of the selection of both restriction enzymes and selective primers for cDNA-AFLP experiments has not been performed previously. The in silico tests performed suggest that substantial amounts of resources can be saved by the optimization of cDNA-AFLP experiments.

  12. Effect of Noise on DNA Sequencing via Transverse Electronic Transport

    PubMed Central

    Krems, Matt; Zwolak, Michael; Pershin, Yuriy V.; Di Ventra, Massimiliano

    2009-01-01

    Previous theoretical studies have shown that measuring the transverse current across DNA strands while they translocate through a nanopore or channel may provide a statistically distinguishable signature of the DNA bases, and may thus allow for rapid DNA sequencing. However, fluctuations of the environment, such as ionic and DNA motion, introduce important scattering processes that may affect the viability of this approach to sequencing. To understand this issue, we have analyzed a simple model that captures the role of this complex environment in electronic dephasing and its ability to remove charge carriers from current-carrying states. We find that these effects do not strongly influence the current distributions due to the off-resonant nature of tunneling through the nucleotides, a result we expect to be a common feature of transport in molecular junctions. In particular, only large scattering strengths, as compared to the energetic gap between the molecular states and the Fermi level, significantly alter the form of the current distributions. Since this gap itself is quite large, the current distributions remain protected from this type of noise, further supporting the possibility of using transverse electronic transport measurements for DNA sequencing. PMID:19804730

  13. Vander Lugt correlation of DNA sequence data

    NASA Astrophysics Data System (ADS)

    Christens-Barry, William A.; Hawk, James F.; Martin, James C.

    1990-12-01

    DNA, the molecule containing the genetic code of an organism, is a linear chain of subunits. It is the sequence of subunits, of which there are four kinds, that constitutes the unique blueprint of an individual. This sequence is the focus of a large number of analyses performed by an army of geneticists, biologists, and computer scientists. Most of these analyses entail searches for specific subsequences within the larger set of sequence data. Thus, most analyses are essentially pattern recognition or correlation tasks. Yet, there are special features to such analysis that influence the strategy and methods of an optical pattern recognition approach. While the serial processing employed in digital electronic computers remains the main engine of sequence analyses, there is no fundamental reason that more efficient parallel methods cannot be used. We describe an approach using optical pattern recognition (OPR) techniques based on matched spatial filtering. This allows parallel comparison of large blocks of sequence data. In this study we have simulated a Vander Lugt architecture implementing our approach. Searches for specific target sequence strings within a block of DNA sequence from the ColE1 plasmid are performed.
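
    The idea of treating subsequence search as a correlation task can be illustrated digitally. The sketch below is not the optical Vander Lugt architecture from the record; it is a minimal software analogue that assumes a one-hot encoding of the four bases and a sliding dot product as the "matched filter". The function names (one_hot, correlate, find_matches) are hypothetical and chosen only for illustration.

```python
# Digital analogue of matched filtering for DNA subsequence search.
# Assumption: each base is one-hot encoded over the alphabet "ACGT", so the
# correlation score at an offset equals the number of matching positions.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 vectors."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def correlate(sequence, target):
    """Return the correlation score of target against every offset of sequence."""
    seq_vecs, tgt_vecs = one_hot(sequence), one_hot(target)
    scores = []
    for offset in range(len(sequence) - len(target) + 1):
        score = sum(
            s * t
            for i in range(len(target))
            for s, t in zip(seq_vecs[offset + i], tgt_vecs[i])
        )
        scores.append(score)
    return scores


def find_matches(sequence, target):
    """Offsets where the correlation peak reaches its maximum (exact match)."""
    return [
        i for i, score in enumerate(correlate(sequence, target))
        if score == len(target)
    ]


if __name__ == "__main__":
    print(find_matches("ACGTACGTTACG", "ACG"))
```

    In an optical correlator all offsets are evaluated at once; the loop over offsets here is the serial counterpart of that parallel comparison.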

  14. Phylogenetic congruence of armored scale insects (Hemiptera: Diaspididae) and their primary endosymbionts from the phylum Bacteroidetes.

    PubMed

    Gruwell, Matthew E; Morse, Geoffrey E; Normark, Benjamin B

    2007-07-01

    Insects in the sap-sucking hemipteran suborder Sternorrhyncha typically harbor maternally transmitted bacteria housed in a specialized organ, the bacteriome. In three of the four superfamilies of Sternorrhyncha (Aphidoidea, Aleyrodoidea, Psylloidea), the bacteriome-associated (primary) bacterial lineage is from the class Gammaproteobacteria (phylum Proteobacteria). The fourth superfamily, Coccoidea (scale insects), has a diverse array of bacterial endosymbionts whose affinities are largely unexplored. We have amplified fragments of two bacterial ribosomal genes from each of 68 species of armored scale insects (Diaspididae). In spite of initially using primers designed for Gammaproteobacteria, we consistently amplified sequences from a different bacterial phylum: Bacteroidetes. We use these sequences (16S and 23S, 2105 total base pairs), along with previously published sequences from the armored scale hosts (elongation factor 1alpha and 28S rDNA) to investigate phylogenetic congruence between the two clades. The Bayesian tree for the bacteria is roughly congruent with that of the hosts, with 67% of nodes identical. Partition homogeneity tests found no significant difference between the host and bacterial data sets. Of thirteen Shimodaira-Hasegawa tests, comparing the original Bayesian bacterial tree to bacterial trees with incongruent clades forced to match the host tree, 12 found no significant difference. A significant difference in topology was found only when the entire host tree was compared with the entire bacterial tree. For the bacterial data set, the treelengths of the most parsimonious host trees are only 1.8-2.4% longer than that of the most parsimonious bacterial trees. The high level of congruence between the topologies indicates that these Bacteroidetes are the primary endosymbionts of armored scale insects. 
To investigate the phylogenetic affinities of these endosymbionts, we aligned some of their 16S rDNA sequences with other known Bacteroidetes endosymbionts and with other similar sequences identified by BLAST searches. Although the endosymbionts of armored scales are only distantly related to the endosymbionts of the other sternorrhynchan insects, they are closely related to bacteria associated with eriococcid and margarodid scale insects, to cockroach and auchenorrhynchan endosymbionts (Blattabacterium and Sulcia), and to male-killing endosymbionts of ladybird beetles. We propose the name "Candidatus Uzinura diaspidicola" for the primary endosymbionts of armored scale insects.

  15. Cell transformation mediated by chromosomal deoxyribonucleic acid of polyoma virus-transformed cells.

    PubMed Central

    Della Valle, G; Fenton, R G; Basilico, C

    1981-01-01

    To study the mechanism of deoxyribonucleic acid (DNA)-mediated gene transfer, normal rat cells were transfected with total cellular DNA extracted from polyoma virus-transformed cells. This resulted in the appearance of the transformed phenotype in 1 × 10^-6 to 3 × 10^-6 of the transfected cells. Transformation was invariably associated with the acquisition of integrated viral DNA sequences characteristic of the donor DNA. This was caused not by the integration of free DNA molecules, but by the transfer of large DNA fragments (10 to 20 kilobases) containing linked cellular and viral sequences. Although Southern blot analysis showed that integration did not appear to occur in a homologous region of the recipient chromosome, the frequency of transformation was rather high when compared with that of purified polyoma DNA, perhaps due to "position" effects or to the high efficiency of recombination of large DNA fragments. PMID:6100965

  16. Brief Overview of a Decade of Genome-Wide Association Studies on Primary Hypertension.

    PubMed

    Azam, Afifah Binti; Azizan, Elena Aisha Binti

    2018-01-01

    Primary hypertension is widely believed to be a complex polygenic disorder with the manifestation influenced by the interactions of genomic and environmental factors making identification of susceptibility genes a major challenge. With major advancement in high-throughput genotyping technology, genome-wide association study (GWAS) has become a powerful tool for researchers studying genetically complex diseases. GWASs work through revealing links between DNA sequence variation and a disease or trait with biomedical importance. The human genome is a very long DNA sequence which consists of billions of nucleotides arranged in a unique way. A single base-pair change in the DNA sequence is known as a single nucleotide polymorphism (SNP). With the help of modern genotyping techniques such as chip-based genotyping arrays, thousands of SNPs can be genotyped easily. Large-scale GWASs, in which more than half a million of common SNPs are genotyped and analyzed for disease association in hundreds of thousands of cases and controls, have been broadly successful in identifying SNPs associated with heart diseases, diabetes, autoimmune diseases, and psychiatric disorders. It is however still debatable whether GWAS is the best approach for hypertension. The following is a brief overview on the outcomes of a decade of GWASs on primary hypertension.

  17. Does TATA matter? A structural exploration of the selectivity determinants in its complexes with TATA box-binding protein.

    PubMed Central

    Pastor, N; Pardo, L; Weinstein, H

    1997-01-01

    The binding of the TATA box-binding protein (TBP) to a TATA sequence in DNA is essential for eukaryotic basal transcription. TBP binds in the minor groove of DNA, causing a large distortion of the DNA helix. Given the apparent stereochemical equivalence of AT and TA basepairs in the minor groove, DNA deformability must play a significant role in binding site selection, because not all AT-rich sequences are bound effectively by TBP. To gain insight into the precise role that the properties of the TATA sequence have in determining the specificity of the DNA substrates of TBP, the solution structure and dynamics of seven DNA dodecamers have been studied by using molecular dynamics simulations. The analysis of the structural properties of basepair steps in these TATA sequences suggests a reason for the preference for alternating pyrimidine-purine (YR) sequences, but indicates that these properties cannot be the sole determinant of the sequence specificity of TBP. Rather, recognition depends on the interplay between the inherent deformability of the DNA and steric complementarity at the molecular interface. PMID:9251783

  18. DNA barcoding as a tool for coral reef conservation

    NASA Astrophysics Data System (ADS)

    Neigel, J.; Domingo, A.; Stake, J.

    2007-09-01

    DNA Barcoding (DBC) is a method for taxonomic identification of animals that is based entirely on the 5' portion of the mitochondrial gene, cytochrome oxidase subunit I (COI-5). It can be especially useful for identification of larval forms or incomplete specimens lacking diagnostic morphological characters. DBC can also facilitate the discovery of species and the definition of "molecular taxonomic units" in problematic groups. However, DBC is not a panacea for coral reef taxonomy. In two of the most ecologically important groups on coral reefs, the Anthozoa and Porifera, COI-5 sequences have diverged too little to be diagnostic for all species. Other problems for DBC include paraphyly in mitochondrial gene trees and lack of differentiation between hybrids and their maternal ancestors. DBC also depends on the availability of databases of COI-5 sequences, which are still in early stages of development. A global effort to barcode all fish species has demonstrated the importance of large-scale coordination and is yielding promising results. Whether or not COI-5 by itself is sufficient for species assignments has become a contentious question; it is generally advantageous to use sequences from multiple loci.

  19. Development of new strains and related SCAR markers for an edible mushroom, Hypsizygus marmoreus.

    PubMed

    Lee, Chang Y; Park, Jeong-Eun; Lee, Jia; Kim, Jong-Kuk; Ro, Hyeon-Su

    2012-02-01

    New fast-growing and less bitter varieties of Hypsizygus marmoreus were developed by crossing monokaryotic mycelia from a commercial strain (Hm1-1) and a wild strain (Hm3-10). Six of the better tasting new strains with a shorter cultivation period were selected from 400 crosses in a large-scale cultivation experiment. We attempted to develop sequence characterized amplified region (SCAR) markers to identify the new strain from other commercial strains. For the SCAR markers, we conducted molecular genetic analysis on a wild strain and the eight most cultivated H. marmoreus strains collected from various areas in East Asia by randomly amplified polymorphic DNA. Ten unique DNA bands for a commercial Hm1-1 strain and the Hm3-10 strain were extracted and their sequences were determined. Primer sets were designed based on the determined sequences. PCR reactions with the primer sets revealed that four primer sets successfully discriminated the new strains from other commercial strains and are thus suitable for commercial purposes. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  20. Striking Plasticity of CRISPR-Cas9 and Key Role of Non-target DNA, as Revealed by Molecular Simulations.

    PubMed

    Palermo, Giulia; Miao, Yinglong; Walker, Ross C; Jinek, Martin; McCammon, J Andrew

    2016-10-26

    The CRISPR (clustered regularly interspaced short palindromic repeats)-Cas9 system recently emerged as a transformative genome-editing technology that is innovating basic bioscience and applied medicine and biotechnology. The endonuclease Cas9 associates with a guide RNA to match and cleave complementary sequences in double stranded DNA, forming an RNA:DNA hybrid and a displaced non-target DNA strand. Although extensive structural studies are ongoing, the conformational dynamics of Cas9 and its interplay with the nucleic acids during association and DNA cleavage are largely unclear. Here, by employing multi-microsecond time scale molecular dynamics, we reveal the conformational plasticity of Cas9 and identify key determinants that allow its large-scale conformational changes during nucleic acid binding and processing. We show how the "closure" of the protein, which accompanies nucleic acid binding, fundamentally relies on highly coupled and specific motions of the protein domains, collectively initiating the prominent conformational changes needed for nucleic acid association. We further reveal a key role of the non-target DNA during the process of activation of the nuclease HNH domain, showing how the nontarget DNA positioning triggers local conformational changes that favor the formation of a catalytically competent Cas9. Finally, a remarkable conformational plasticity is identified as an intrinsic property of the HNH domain, constituting a necessary element that allows for the HNH repositioning. These novel findings constitute a reference for future experimental studies aimed at a full characterization of the dynamic features of the CRISPR-Cas9 system, and-more importantly-call for novel structure engineering efforts that are of fundamental importance for the rational design of new genome-engineering applications.

  1. Mapping DNA polymerase errors by single-molecule sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, David F.; Lu, Jenny; Chang, Seungwoo

    Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. Here, this allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.

  2. Mapping DNA polymerase errors by single-molecule sequencing

    DOE PAGES

    Lee, David F.; Lu, Jenny; Chang, Seungwoo; ...

    2016-05-16

    Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. Here, this allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.
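
    The barcoding strategy described in this record can be sketched in a few lines. The code below is not the authors' pipeline; it is a minimal illustration that assumes reads arrive as (barcode, sequence) pairs, that all reads in a barcode family have equal length, and that a per-position majority vote is an acceptable way to discard sequencing errors. The names consensus_by_barcode, call_errors and the min_reads threshold are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_reads=3):
    """Group reads by barcode and return {barcode: majority-vote consensus}.

    reads: iterable of (barcode, sequence) pairs.
    Families with fewer than min_reads reads, or with reads of unequal
    length, are skipped (a simplifying assumption of this sketch).
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_reads:
            continue
        if any(len(s) != len(seqs[0]) for s in seqs):
            continue
        # Majority base at each position; a sequencing error seen in only
        # one read of the family is outvoted by the other reads.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


def call_errors(template, consensus):
    """List (position, template_base, observed_base) mismatches."""
    return [
        (i, t, c) for i, (t, c) in enumerate(zip(template, consensus)) if t != c
    ]


if __name__ == "__main__":
    reads = [
        ("AAT", "ACGTAC"), ("AAT", "ACGTAC"), ("AAT", "ACCTAC"),
        ("GGC", "ACGAAC"), ("GGC", "ACGAAC"), ("GGC", "ACGAAC"),
    ]
    for barcode, seq in consensus_by_barcode(reads).items():
        print(barcode, seq, call_errors("ACGTAC", seq))
```

    A mismatch that survives the vote is present in every copy of the tagged product, so it is attributed to the polymerase rather than to the sequencer.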

  3. Smooth DNA Transport through a Narrowed Pore Geometry

    PubMed Central

    Carson, Spencer; Wilson, James; Aksimentiev, Aleksei; Wanunu, Meni

    2014-01-01

    Voltage-driven transport of double-stranded DNA through nanoscale pores holds much potential for applications in quantitative molecular biology and biotechnology, yet the microscopic details of translocation have proven to be challenging to decipher. Earlier experiments showed strong dependence of transport kinetics on pore size: fast regular transport in large pores (> 5 nm diameter), and slower yet heterogeneous transport time distributions in sub-5 nm pores, which imply a large positional uncertainty of the DNA in the pore as a function of the translocation time. In this work, we show that this anomalous transport is a result of DNA self-interaction, a phenomenon that is strictly pore-diameter dependent. We identify a regime in which DNA transport is regular, producing narrow and well-behaved dwell-time distributions that fit a simple drift-diffusion theory. Furthermore, a systematic study of the dependence of dwell time on DNA length reveals a single power-law scaling of 1.37 in the range of 35–20,000 bp. We highlight the resolution of our nanopore device by discriminating via single pulses 100 and 500 bp fragments in a mixture with >98% accuracy. When coupled to an appropriate sequence labeling method, our observation of smooth DNA translocation can pave the way for high-resolution DNA mapping and sizing applications in genomics. PMID:25418307
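
    The reported scaling can be turned into a quick worked calculation. The sketch below reads "power-law scaling of 1.37" as dwell time proportional to (DNA length)^1.37; since the abstract gives no prefactor, only ratios of dwell times are computed. The function name dwell_time_ratio is hypothetical.

```python
EXPONENT = 1.37  # power-law exponent reported in the abstract


def dwell_time_ratio(length_a_bp, length_b_bp, exponent=EXPONENT):
    """Expected dwell-time ratio t(a)/t(b), assuming t is proportional to L**exponent."""
    return (length_a_bp / length_b_bp) ** exponent


if __name__ == "__main__":
    # The 100 bp and 500 bp fragments discriminated in the abstract.
    print(round(dwell_time_ratio(500, 100), 2))
```

    Under this assumption a 500 bp fragment dwells roughly nine times longer than a 100 bp fragment, which is consistent with the two being separable from single pulses.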

  4. XS: a FASTQ read simulator.

    PubMed

    Pratas, Diogo; Pinho, Armando J; Rodrigues, João M O S

    2014-01-16

    The emerging next-generation sequencing (NGS) is bringing, besides the natural huge amounts of data, an avalanche of new specialized tools (for analysis, compression, alignment, among others) and large public and private network infrastructures. Therefore, a direct necessity of specific simulation tools for testing and benchmarking is rising, such as a flexible and portable FASTQ read simulator, without the need of a reference sequence, yet correctly prepared for producing approximately the same characteristics as real data. We present XS, a skilled FASTQ read simulation tool, flexible, portable (does not need a reference sequence) and tunable in terms of sequence complexity. It has several running modes, depending on the time and memory available, and is aimed at testing computing infrastructures, namely cloud computing of large-scale projects, and testing FASTQ compression algorithms. Moreover, XS offers the possibility of simulating the three main FASTQ components individually (headers, DNA sequences and quality-scores). XS provides an efficient and convenient method for fast simulation of FASTQ files, such as those from Ion Torrent (currently uncovered by other simulators), Roche-454, Illumina and ABI-SOLiD sequencing machines. This tool is publicly available at http://bioinformatics.ua.pt/software/xs/.
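
    To make the three FASTQ components concrete, the sketch below generates records without a reference sequence. It is not the XS algorithm: bases and qualities are drawn uniformly at random, so sequence complexity and platform-specific quality profiles are not modelled. The function name simulate_fastq, the "SIM" header prefix and the Q2 to Q40 quality range are assumptions made for illustration.

```python
import random


def simulate_fastq(n_reads, read_length, seed=0, instrument="SIM"):
    """Return n_reads FASTQ records as one string (header, bases, '+', qualities)."""
    rng = random.Random(seed)  # seeded so that output is reproducible
    records = []
    for i in range(1, n_reads + 1):
        header = f"@{instrument}:{i}"
        bases = "".join(rng.choice("ACGT") for _ in range(read_length))
        # Phred+33 encoding; scores drawn uniformly from Q2..Q40 ('#'..'I').
        quals = "".join(chr(33 + rng.randint(2, 40)) for _ in range(read_length))
        records.append(f"{header}\n{bases}\n+\n{quals}\n")
    return "".join(records)


if __name__ == "__main__":
    print(simulate_fastq(2, 20), end="")
```

    Each record occupies four lines, so a file of n reads has 4n lines; this is the property compression and infrastructure benchmarks typically rely on.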

  5. Click nucleic acid ligation: applications in biology and nanotechnology.

    PubMed

    El-Sagheer, Afaf H; Brown, Tom

    2012-08-21

    Biochemical strategies that use a combination of synthetic oligonucleotides, thermostable DNA polymerases, and DNA ligases can produce large DNA constructs up to 1 megabase in length. Although these ambitious targets are feasible biochemically, comparable technologies for the chemical synthesis of long DNA strands lag far behind. The best available chemical approach is the solid-phase phosphoramidite method, which can be used to assemble DNA strands up to 150 bases in length. Beyond this point, deficiencies in the chemistry make it impossible to produce pure DNA. A possible alternative approach to the chemical synthesis of large DNA strands is to join together carefully purified synthetic oligonucleotides by chemical methods. Click ligation by the copper-catalyzed azide-alkyne (CuAAC) reaction could facilitate this process. In this Account, we describe the synthesis, characterization, and applications of oligonucleotides prepared by click ligation. The alkyne and azide oligonucleotide strands can be prepared by standard protocols, and the ligation reaction is compatible with a wide range of chemical modifications to DNA and RNA. We have employed click ligation to synthesize DNA constructs up to 300 bases in length and much longer sequences are feasible. When the resulting triazole linkage is placed in a PCR template, various DNA polymerases correctly copy the entire base sequence. We have also successfully demonstrated both in vitro transcription and rolling circle amplification through the modified linkage. This linkage has shown in vivo biocompatibility: an antibiotic resistance gene containing triazole linkages functions in E. coli. Using click ligation, we have synthesized hairpin ribozymes up to 100 nucleotides in length and a hammerhead ribozyme with the triazole linkage located at the substrate cleavage site. At the opposite end of the length scale, click-ligated, cyclic mini-DNA duplexes have been used as models to study base pairing. 
Cyclic duplexes have potential therapeutic applications. They have extremely high thermodynamic stability, have increased resistance to enzymatic degradation, and have been investigated as decoys for regulatory proteins. For potential nanotechnology applications, we have synthesized double stranded DNA catenanes by click ligation. Other researchers have studied covalently fixed multistranded DNA constructs including triplexes and quadruplexes.

  6. Rapid isolation of microsatellite DNAs and identification of polymorphic mitochondrial DNA regions in the fish rotan (Perccottus glenii) invading European Russia

    USGS Publications Warehouse

    King, Timothy L.; Eackles, Michael S.; Reshetnikov, Andrey N.

    2015-01-01

    Human-mediated translocations and subsequent large-scale colonization by the invasive fish rotan (Perccottus glenii Dybowski, 1877; Perciformes, Odontobutidae), also known as Amur or Chinese sleeper, has resulted in dramatic transformations of small lentic ecosystems. However, no detailed genetic information exists on population structure, levels of effective movement, or relatedness among geographic populations of P. glenii within the European part of the range. We used massively parallel genomic DNA shotgun sequencing on the semiconductor-based Ion Torrent Personal Genome Machine (PGM) sequencing platform to identify nuclear microsatellite and mitochondrial DNA sequences in P. glenii from European Russia. Here we describe the characterization of nine nuclear microsatellite loci, ascertain levels of allelic diversity, heterozygosity, and demographic status of P. glenii collected from Ilev, Russia, one of several initial introduction points in European Russia. In addition, we mapped sequence reads to the complete P. glenii mitochondrial DNA sequence to identify polymorphic regions. Nuclear microsatellite markers developed for P. glenii yielded sufficient genetic diversity to: (1) produce unique multilocus genotypes; (2) elucidate structure among geographic populations; and (3) provide unique perspectives for analysis of population sizes and historical demographics. Among 4.9 million filtered P. glenii Ion Torrent PGM sequence reads, 11,304 mapped to the mitochondrial genome (NC_020350). This resulted in 100 % coverage of this genome to a mean coverage depth of 102X. A total of 130 variable sites were observed between the publicly available genome from China and the studied composite mitochondrial genome. Among these, 82 were diagnostic and monomorphic between the mitochondrial genomes and distributed among 15 genome regions. The polymorphic sites (N = 48) were distributed among 11 mitochondrial genome regions. 
Our results also indicate that sequence reads generated from two three-hour runs on the Ion Torrent PGM can generate a sufficient number of nuclear and mitochondrial markers to improve understanding of the evolutionary and ecological dynamics of non-model and in particular, invasive species.

  7. Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia.

    PubMed

    Shortt, Jonathan A; Card, Daren C; Schield, Drew R; Liu, Yang; Zhong, Bo; Castoe, Todd A; Carlton, Elizabeth J; Pollock, David D

    2017-01-01

    In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies. We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq) to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample. This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50-$200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for using modern genome-scale sampling to S. 
japonicum surveillance, and could be applied to other schistosome species and other parasitic helminths.

  8. Probing the structure and function of biopolymer-carbon nanotube hybrids with molecular dynamics

    NASA Astrophysics Data System (ADS)

    Johnson, Robert R.

    2009-12-01

    Nanoscience deals with the characterization and manipulation of matter on the atomic/molecular size scale in order to deepen our understanding of condensed matter and develop revolutionary technology. Meeting the demands of the rapidly advancing nanotechnological frontier requires novel, multifunctional nanoscale materials. Among the most promising nanomaterials to fulfill this need are biopolymer-carbon nanotube hybrids (Bio-CNT). Bio-CNT consists of a single-walled carbon nanotube (CNT) coated with a self-assembled layer of biopolymers such as DNA or protein. Experiments have demonstrated that these nanomaterials possess a wide range of technologically useful properties with applications in nanoelectronics, medicine, homeland security, environmental safety and microbiology. However, a fundamental understanding of the self-assembly mechanics, structure and energetics of Bio-CNT is lacking. The objective of this thesis is to address this deficiency through molecular dynamics (MD) simulation, which provides an atomic-scale window into the behavior of this unique nanomaterial. MD shows that Bio-CNT composed of single-stranded DNA (ssDNA) self-assembles via the formation of high affinity contacts between DNA bases and the CNT sidewall. Calculation of the base-CNT binding free energy by thermodynamic integration reveals that these contacts result from the attractive π-π stacking interaction. Binding affinities follow the trend G > A > T > C. MD reveals that long ssDNA sequences are driven into a helical wrapping about CNT with a sub-10 nm pitch by electrostatic and torsional interactions in the backbone. A large-scale replica exchange molecular dynamics simulation reveals that ssDNA-CNT hybrids are disordered. At room temperature, ssDNA can reside in several low-energy conformations that contain a sequence-specific arrangement of bases detached from the CNT surface.
MD demonstrates that protein-CNT hybrids composed of the Coxsackie-adenovirus receptor are biologically active and function as a nanobiosensor with specific recognition of Knob proteins from the adenovirus capsid. Simulation also shows that the rigid CNT damps structural fluctuations in bound proteins, which may have important ramifications for biosensing devices composed of protein-CNT hybrids. These results expand current knowledge of Bio-CNT and demonstrate the effectiveness of MD for investigations of nanobiomolecular systems.

  9. Sequence-dependent nanometer-scale conformational dynamics of individual RecBCD–DNA complexes

    PubMed Central

    Carter, Ashley R.; Seaberg, Maasa H.; Fan, Hsiu-Fang; Sun, Gang; Wilds, Christopher J.; Li, Hung-Wen; Perkins, Thomas T.

    2016-01-01

    RecBCD is a multifunctional enzyme that possesses both helicase and nuclease activities. To gain insight into the mechanism of its helicase function, RecBCD unwinding at low adenosine triphosphate (ATP) (2–4 μM) was measured using an optical-trapping assay featuring 1 base-pair (bp) precision. Instead of uniformly sized steps, we observed forward motion convolved with rapid, large-scale (∼4 bp) variations in DNA length. We interpret this motion as conformational dynamics of the RecBCD–DNA complex in an unwinding-competent state, arising, in part, by an enzyme-induced, back-and-forth motion relative to the dsDNA that opens and closes the duplex. Five observations support this interpretation. First, these dynamics were present in the absence of ATP. Second, the onset of the dynamics was coupled to RecBCD entering into an unwinding-competent state that required a sufficiently long 5′ strand to engage the RecD helicase. Third, the dynamics were modulated by the GC-content of the dsDNA. Fourth, the dynamics were suppressed by an engineered interstrand cross-link in the dsDNA that prevented unwinding. Finally, these dynamics were suppressed by binding of a specific non-hydrolyzable ATP analog. Collectively, these observations show that during unwinding, RecBCD binds to DNA in a dynamic mode that is modulated by the nucleotide state of the ATP-binding pocket. PMID:27220465

  10. An Exploration into Fern Genome Space.

    PubMed

    Wolf, Paul G; Sessa, Emily B; Marchant, Daniel Blaine; Li, Fay-Wei; Rothfels, Carl J; Sigel, Erin M; Gitzendanner, Matthew A; Visger, Clayton J; Banks, Jo Ann; Soltis, Douglas E; Soltis, Pamela S; Pryer, Kathleen M; Der, Joshua P

    2015-08-26

    Ferns are one of the few remaining major clades of land plants for which a complete genome sequence is lacking. Knowledge of genome space in ferns will enable broad-scale comparative analyses of land plant genes and genomes, provide insights into genome evolution across green plants, and shed light on genetic and genomic features that characterize ferns, such as their high chromosome numbers and large genome sizes. As part of an initial exploration into fern genome space, we used a whole genome shotgun sequencing approach to obtain low-density coverage (∼0.4X to 2X) for six fern species from the Polypodiales (Ceratopteris, Pteridium, Polypodium, Cystopteris), Cyatheales (Plagiogyria), and Gleicheniales (Dipteris). We explore these data to characterize the proportion of the nuclear genome represented by repetitive sequences (including DNA transposons, retrotransposons, ribosomal DNA, and simple repeats) and protein-coding genes, and to extract chloroplast and mitochondrial genome sequences. Such initial sweeps of fern genomes can provide information useful for selecting a promising candidate fern species for whole genome sequencing. We also describe variation of genomic traits across our sample and highlight some differences and similarities in repeat structure between ferns and seed plants. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  11. SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

    PubMed Central

    Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

    2001-01-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202

  12. Molecular microbial and chemical investigation of the bioremediation of two-phase olive mill waste using laboratory-scale bioreactors.

    PubMed

    Morillo, J A; Aguilera, M; Antízar-Ladislao, B; Fuentes, S; Ramos-Cormenzana, A; Russell, N J; Monteoliva-Sánchez, M

    2008-05-01

    Two-phase olive mill waste (TPOMW) is a semisolid effluent that is rich in contaminating polyphenols and is produced in large amounts by the olive oil industry. Laboratory-scale bioreactors were used to investigate the biodegradation of TPOMW by its indigenous microbiota. The effect of nutrient addition (inorganic N and P) and aeration of the bioreactors was studied. Microbial changes were investigated by PCR-temperature time gradient electrophoresis (TTGE) and by following the dynamics of polar lipid fatty acids (PLFA). The greatest decrease in the polyphenolic and organic matter contents of bioreactors was concomitant with an increase in the PLFA fungal/bacterial ratio. Amplicon sequences of nuclear ribosomal internal transcribed spacer region (ITS) and 16S rDNA allowed identification of fungal and bacterial types, respectively, by comparative DNA sequence analyses. Predominant fungi identified included members of the genera Penicillium, Candida, Geotrichum, Pichia, Cladosporium, and Ascochyta. A total of 14 bacterial genera were detected, with a dominance of organisms that have previously been associated with plant material. Overall, this work highlights that the indigenous microbiota within the bioreactors, through stimulation of the fungal fraction, is able to degrade the polyphenolic content without the inoculation of specific microorganisms.

  13. A laboratory information management system for DNA barcoding workflows.

    PubMed

    Vu, Thuy Duong; Eberhardt, Ursula; Szöke, Szániszló; Groenewald, Marizeth; Robert, Vincent

    2012-07-01

    This paper presents a laboratory information management system for DNA sequences (LIMS) created and based on the needs of a DNA barcoding project at the CBS-KNAW Fungal Biodiversity Centre (Utrecht, the Netherlands). DNA barcoding is a global initiative for species identification through simple DNA sequence markers. We aim at generating barcode data for all strains (or specimens) included in the collection (currently ca. 80 k). The LIMS has been developed to better manage large amounts of sequence data and to keep track of the whole experimental procedure. The system has allowed us to classify strains more efficiently as the quality of sequence data has improved, and as a result, up-to-date taxonomic names have been given to strains and more accurate correlation analyses have been carried out.

  14. The genomic organization of a human creatine transporter (CRTR) gene located in Xq28

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sandoval, N.; Bauer, D.; Brenner, V.

    1996-07-15

    During the course of a large-scale sequencing project in Xq28, a human creatine transporter (CRTR) gene was discovered. The gene is located approximately 36 kb centromeric to ALD. The gene contains 13 exons and spans about 8.5 kb of genomic DNA. Since the creatine transporter has a prominent function in muscular physiology, it is a candidate gene for Barth syndrome and infantile cardiomyopathy mapped to Xq28. 19 refs., 1 fig., 1 tab.

  15. Privacy Challenges of Genomic Big Data.

    PubMed

    Shen, Hong; Ma, Jian

    2017-01-01

    With the rapid advancement of high-throughput DNA sequencing technologies, genomics has become a big data discipline where large-scale genetic information of human individuals can be obtained efficiently and at low cost. However, such a massive amount of personal genomic data creates tremendous challenges for privacy, especially given the emergence of the direct-to-consumer (DTC) industry that provides genetic testing services. Here we review recent developments in genomic big data and their implications for privacy. We also discuss the current dilemmas and future challenges of genomic privacy.

  16. Structural Analysis of Biodiversity

    PubMed Central

    Sirovich, Lawrence; Stoeckle, Mark Y.; Zhang, Yu

    2010-01-01

    Large, recently-available genomic databases cover a wide range of life forms, suggesting opportunity for insights into genetic structure of biodiversity. In this study we refine our recently-described technique using indicator vectors to analyze and visualize nucleotide sequences. The indicator vector approach generates correlation matrices, dubbed Klee diagrams, which represent a novel way of assembling and viewing large genomic datasets. To explore its potential utility, here we apply the improved algorithm to a collection of almost 17000 DNA barcode sequences covering 12 widely-separated animal taxa, demonstrating that indicator vectors for classification gave correct assignment in all 11000 test cases. Indicator vector analysis revealed discontinuities corresponding to species- and higher-level taxonomic divisions, suggesting an efficient approach to classification of organisms from poorly-studied groups. As compared to standard distance metrics, indicator vectors preserve diagnostic character probabilities, enable automated classification of test sequences, and generate high-information density single-page displays. These results support application of indicator vectors for comparative analysis of large nucleotide data sets and raise prospect of gaining insight into broad-scale patterns in the genetic structure of biodiversity. PMID:20195371

  17. The mitochondrial genome of the pathogenic yeast Candida subhashii: GC-rich linear DNA with a protein covalently attached to the 5′ termini

    PubMed Central

    Fricova, Dominika; Valach, Matus; Farkas, Zoltan; Pfeiffer, Ilona; Kucsera, Judit; Tomaska, Lubomir; Nosek, Jozef

    2010-01-01

    As a part of our initiative aimed at a large-scale comparative analysis of fungal mitochondrial genomes, we determined the complete DNA sequence of the mitochondrial genome of the yeast Candida subhashii and found that it exhibits a number of peculiar features. First, the mitochondrial genome is represented by linear dsDNA molecules of uniform length (29 795 bp), with an unusually high content of guanine and cytosine residues (52.7 %). Second, the coding sequences lack introns; thus, the genome has a relatively compact organization. Third, the termini of the linear molecules consist of long inverted repeats and seem to contain a protein covalently bound to terminal nucleotides at the 5′ ends. This architecture resembles the telomeres in a number of linear viral and plasmid DNA genomes classified as invertrons, in which the terminal proteins serve as specific primers for the initiation of DNA synthesis. Finally, although the mitochondrial genome of C. subhashii contains essentially the same set of genes as other closely related pathogenic Candida species, we identified additional ORFs encoding two homologues of the family B protein-priming DNA polymerases and an unknown protein. The terminal structures and the genes for DNA polymerases are reminiscent of linear mitochondrial plasmids, indicating that this genome architecture might have emerged from fortuitous recombination between an ancestral, presumably circular, mitochondrial genome and an invertron-like element. PMID:20395267
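
    The GC content quoted above (52.7 %) is the share of guanine and cytosine among the unambiguous bases of a sequence. A minimal Python sketch of that calculation follows; the function name `gc_content` and the example sequences are illustrative assumptions and are not taken from the cited study.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases of a sequence.

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    """
    sequence = sequence.upper()
    gc = sum(1 for base in sequence if base in "GC")
    acgt = sum(1 for base in sequence if base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


# Hypothetical toy sequence: 4 of its 6 bases are G or C.
example_gc = gc_content("GGCCAT")
```

    Applied to a complete genome sequence read from a FASTA file, the same function would give the genome-wide figure reported in an abstract like this one.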

  18. Thermoelectric effect and its dependence on molecular length and sequence in single DNA molecules.

    PubMed

    Li, Yueqi; Xiang, Limin; Palma, Julio L; Asai, Yoshihiro; Tao, Nongjian

    2016-04-15

    Studying the thermoelectric effect in DNA is important for unravelling charge transport mechanisms and for developing relevant applications of DNA molecules. Here we report a study of the thermoelectric effect in single DNA molecules. By varying the molecular length and sequence, we tune the charge transport in DNA to either a hopping- or a tunnelling-dominated regime. The thermoelectric effect is small and insensitive to the molecular length in the hopping regime. In contrast, the thermoelectric effect is large and sensitive to the length in the tunnelling regime. These findings indicate that one may control the thermoelectric effect in DNA by varying its sequence and length. We describe the experimental results in terms of hopping and tunnelling charge transport models.

  19. Thermoelectric effect and its dependence on molecular length and sequence in single DNA molecules

    PubMed Central

    Li, Yueqi; Xiang, Limin; Palma, Julio L.; Asai, Yoshihiro; Tao, Nongjian

    2016-01-01

    Studying the thermoelectric effect in DNA is important for unravelling charge transport mechanisms and for developing relevant applications of DNA molecules. Here we report a study of the thermoelectric effect in single DNA molecules. By varying the molecular length and sequence, we tune the charge transport in DNA to either a hopping- or a tunnelling-dominated regime. The thermoelectric effect is small and insensitive to the molecular length in the hopping regime. In contrast, the thermoelectric effect is large and sensitive to the length in the tunnelling regime. These findings indicate that one may control the thermoelectric effect in DNA by varying its sequence and length. We describe the experimental results in terms of hopping and tunnelling charge transport models. PMID:27079152

  20. Chemical synthesis and characterization of branched oligodeoxyribonucleotides (bDNA) for use as signal amplifiers in nucleic acid quantification assays.

    PubMed

    Horn, T; Chang, C A; Urdea, M S

    1997-12-01

    The divergent synthesis of bDNA structures is described. This new type of branched DNA contains one unique oligonucleotide, the primary sequence, covalently attached through a comb-like branching network to many identical copies of a different oligonucleotide, the secondary sequence. The bDNA comb molecules were assembled on a solid support using parameters optimized for bDNA synthesis. The chemistry was used to synthesize bDNA comb molecules containing 15 secondary sequences. The bDNA comb molecules were elaborated by enzymatic ligation into branched amplification multimers, large bDNA molecules (a total of 1068 nt) containing an average of 36 repeated DNA oligomer sequences, each capable of hybridizing specifically to an alkaline phosphatase-labeled oligonucleotide. The bDNA comb molecules were characterized by electrophoretic methods and by controlled cleavage at periodate-cleavable moieties incorporated during synthesis. The branched amplification multimers have been used as signal amplifiers in nucleic acid quantification assays for detection of viral infection. It is possible to detect as few as 50 molecules with bDNA technology.

  1. Chemical synthesis and characterization of branched oligodeoxyribonucleotides (bDNA) for use as signal amplifiers in nucleic acid quantification assays.

    PubMed Central

    Horn, T; Chang, C A; Urdea, M S

    1997-01-01

    The divergent synthesis of bDNA structures is described. This new type of branched DNA contains one unique oligonucleotide, the primary sequence, covalently attached through a comb-like branching network to many identical copies of a different oligonucleotide, the secondary sequence. The bDNA comb molecules were assembled on a solid support using parameters optimized for bDNA synthesis. The chemistry was used to synthesize bDNA comb molecules containing 15 secondary sequences. The bDNA comb molecules were elaborated by enzymatic ligation into branched amplification multimers, large bDNA molecules (a total of 1068 nt) containing an average of 36 repeated DNA oligomer sequences, each capable of hybridizing specifically to an alkaline phosphatase-labeled oligonucleotide. The bDNA comb molecules were characterized by electrophoretic methods and by controlled cleavage at periodate-cleavable moieties incorporated during synthesis. The branched amplification multimers have been used as signal amplifiers in nucleic acid quantification assays for detection of viral infection. It is possible to detect as few as 50 molecules with bDNA technology. PMID:9365266

  2. Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of Clostridium autoethanogenum and analysis of CRISPR systems in industrial relevant Clostridia

    PubMed Central

    2014-01-01

    Background: Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results: A closed, high-quality genome sequence for C. autoethanogenum DSM10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, and nine copies of the rRNA gene operons. It has a low G + C content of 31.1%. Illumina, 454, and Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, CGAL, QUAST and REAPR bioinformatics tools and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number of repeats and their size, which in the case of the rRNA gene operons were ~5 kb. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum.
Other differences include important metabolic genes for central metabolism (such as an additional hydrogenase and the absence of a phosphoenolpyruvate synthase) and substrate utilization pathways (mannose and aromatics utilization) that might explain phenotypic differences between C. autoethanogenum and C. ljungdahlii. Conclusions: Single molecule sequencing will be increasingly used to produce finished microbial genomes. The complete genome will facilitate comparative genomics and functional genomics and support future comparisons between Clostridia and studies that examine the evolution of plasmids, bacteriophage and CRISPR systems. PMID:24655715

  3. Comparative genomic analysis of single-molecule sequencing and hybrid approaches for finishing the Clostridium autoethanogenum JA1-1 strain DSM 10061 genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, Steven D; Nagaraju, Shilpa; Utturkar, Sagar M

    Background: Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results: A closed, high-quality genome sequence for C. autoethanogenum DSM10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, and nine copies of the rRNA gene operons. It has a low G + C content of 31.1%. Illumina, 454, and Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, CGAL, QUAST and REAPR bioinformatics tools and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number of repeats and their size, which in the case of the rRNA gene operons were ~5 kb. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum.
Other differences include important metabolic genes for central metabolism (such as an additional hydrogenase and the absence of a phosphoenolpyruvate synthase) and substrate utilization pathways (mannose and aromatics utilization) that might explain phenotypic differences between C. autoethanogenum and C. ljungdahlii. Conclusions: Single molecule sequencing will be increasingly used to produce finished microbial genomes. The complete genome will facilitate comparative genomics and functional genomics and support future comparisons between Clostridia and studies that examine the evolution of plasmids, bacteriophage and CRISPR systems.

  4. A compact, in vivo screen of all 6-mers reveals drivers of tissue-specific expression and guides synthetic regulatory element design.

    PubMed

    Smith, Robin P; Riesenfeld, Samantha J; Holloway, Alisha K; Li, Qiang; Murphy, Karl K; Feliciano, Natalie M; Orecchia, Lorenzo; Oksenberg, Nir; Pollard, Katherine S; Ahituv, Nadav

    2013-07-18

    Large-scale annotation efforts have improved our ability to coarsely predict regulatory elements throughout vertebrate genomes. However, it is unclear how complex spatiotemporal patterns of gene expression driven by these elements emerge from the activity of short, transcription factor binding sequences. We describe a comprehensive promoter extension assay in which the regulatory potential of all 6 base-pair (bp) sequences was tested in the context of a minimal promoter. To enable this large-scale screen, we developed algorithms that use a reverse-complement aware decomposition of the de Bruijn graph to design a library of DNA oligomers incorporating every 6-bp sequence exactly once. Our library multiplexes all 4,096 unique 6-mers into 184 double-stranded 15-bp oligomers, which is sufficiently compact for in vivo testing. We injected each multiplexed construct into zebrafish embryos and scored GFP expression in 15 tissues at two developmental time points. Twenty-seven constructs produced consistent expression patterns, with the majority doing so in only one tissue. Functional sequences are enriched near biologically relevant genes, match motifs for developmental transcription factors, and are required for enhancer activity. By concatenating tissue-specific functional sequences, we generated completely synthetic enhancers for the notochord, epidermis, spinal cord, forebrain and otic lateral line, and show that short regulatory sequences do not always function modularly. This work introduces a unique in vivo catalog of short, functional regulatory sequences and demonstrates several important principles of regulatory element organization. Furthermore, we provide resources for designing compact, reverse-complement aware k-mer libraries.
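
    The idea of a "reverse-complement aware" library can be illustrated by grouping each k-mer with its reverse complement, since a double-stranded oligomer presents both at once. The Python sketch below only counts these equivalence classes for k = 6; it is not the authors' de Bruijn graph decomposition, and the names `canonical`, `kmer_classes` and `kmers_covered` are hypothetical helpers introduced for illustration.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def kmer_classes(k: int) -> dict:
    """Count all k-mers, their reverse-complement classes, and self-complementary k-mers."""
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    classes = {canonical(kmer) for kmer in all_kmers}
    palindromes = [kmer for kmer in all_kmers if kmer == reverse_complement(kmer)]
    return {
        "total": len(all_kmers),
        "classes": len(classes),
        "palindromes": len(palindromes),
    }


def kmers_covered(oligo: str, k: int) -> set:
    """Canonical k-mers presented by a double-stranded oligomer (either strand)."""
    return {canonical(oligo[i:i + k]) for i in range(len(oligo) - k + 1)}


summary = kmer_classes(6)
# summary == {"total": 4096, "classes": 2080, "palindromes": 64}
```

    A library design could then be checked by taking the union of `kmers_covered` over all oligomers and comparing it against the full set of classes.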

  5. Poly A- transcripts expressed in HeLa cells.

    PubMed

    Wu, Qingfa; Kim, Yeong C; Lu, Jian; Xuan, Zhenyu; Chen, Jun; Zheng, Yonglan; Zhou, Tom; Zhang, Michael Q; Wu, Chung-I; Wang, San Ming

    2008-07-30

    Transcripts expressed in eukaryotes are classified as poly A+ transcripts or poly A- transcripts based on the presence or absence of the 3' poly A tail. Most transcripts identified so far are poly A+ transcripts, whereas the poly A- transcripts remain largely unknown. We developed the TRD (Total RNA Detection) system for transcript identification. The system detects the transcripts through the following steps: 1) depleting the abundant ribosomal and small-size transcripts; 2) synthesizing cDNA without regard to the status of the 3' poly A tail; 3) applying the 454 sequencing technology for massive 3' EST collection from the cDNA; and 4) determining the genome origins of the detected transcripts by mapping the sequences to the human genome reference sequences. Using this system, we characterized the cytoplasmic transcripts from HeLa cells. Of the 13,467 distinct 3' ESTs analyzed, 24% are poly A-, 36% are poly A+, and 40% are bimorphic with poly A+ features but without the 3' poly A tail. Most of the poly A- 3' ESTs do not match known transcript sequences; they have a similar distribution pattern in the genome as the poly A+ and bimorphic 3' ESTs, and their mapped intergenic regions are evolutionarily conserved. Experiments confirmed the authenticity of the detected poly A- transcripts. Our study provides the first large-scale sequence evidence for the presence of poly A- transcripts in eukaryotes. The abundance of the poly A- transcripts highlights the need for comprehensive identification of these transcripts for decoding the transcriptome, annotating the genome and studying biological relevance of the poly A- transcripts.

  6. Cadmium sulfide nanocluster-based electrochemical stripping detection of DNA hybridization.

    PubMed

    Zhu, Ningning; Zhang, Aiping; He, Pingang; Fang, Yuzhi

    2003-03-01

    A novel, sensitive electrochemical DNA hybridization detection assay, using cadmium sulfide (CdS) nanoclusters as the oligonucleotide labeling tag, is described. The assay relies on the hybridization of the target DNA with the CdS nanocluster oligonucleotide DNA probe, followed by the dissolution of the CdS nanoclusters anchored on the hybrids and the indirect determination of the dissolved cadmium ions by sensitive anodic stripping voltammetry (ASV) at a mercury-coated glassy carbon electrode (GCE). The results showed that only a complementary sequence could form a double-stranded dsDNA-CdS with the DNA probe and give an obvious electrochemical response. A three-base mismatch sequence and non-complementary sequence had negligible response. The combination of the large number of cadmium ions released from each dsDNA hybrid with the remarkable sensitivity of the electrochemical stripping analysis for cadmium at mercury-film GCE allows detection at levels as low as 0.2 pmol L(-1) of the complementary sequence of DNA.

  7. Inquiry-based experiments for large-scale introduction to PCR and restriction enzyme digests.

    PubMed

    Johanson, Kelly E; Watt, Terry J

    2015-01-01

    Polymerase chain reaction and restriction endonuclease digest are important techniques that should be included in all Biochemistry and Molecular Biology laboratory curriculums. These techniques are frequently taught at an advanced level, requiring many hours of student and faculty time. Here we present two inquiry-based experiments that are designed for introductory laboratory courses and combine both techniques. In both approaches, students must determine the identity of an unknown DNA sequence, either a gene sequence or a primer sequence, based on a combination of PCR product size and restriction digest pattern. The experimental design is flexible, and can be adapted based on available instructor preparation time and resources, and both approaches can accommodate large numbers of students. We implemented these experiments in our courses with a combined total of 584 students and have an 85% success rate. Overall, students demonstrated an increase in their understanding of the experimental topics, ability to interpret the resulting data, and proficiency in general laboratory skills. © 2015 The International Union of Biochemistry and Molecular Biology.
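
    Identifying a sequence from its PCR product size and restriction digest pattern amounts to predicting fragment lengths for each candidate and comparing them with the bands on a gel. The Python sketch below shows that prediction for a linear amplicon; the function `digest`, the toy amplicon and the choice of EcoRI (which cuts G^AATTC) are assumptions made for illustration and are not taken from the published experiments.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases into the recognition site at which the
    top strand is cut (1 for EcoRI, which cuts G^AATTC).
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 84-bp amplicon containing two EcoRI sites.
amplicon = "ATGC" * 5 + "GAATTC" + "TTAA" * 10 + "GAATTC" + "CCGG" * 3
fragments = digest(amplicon, "GAATTC", 1)
# fragments == [21, 46, 17]; the lengths sum to the PCR product size (84 bp)
```

    Running `digest` on each candidate sequence and comparing the predicted lengths with the observed band sizes narrows down which candidate was amplified.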

  8. Brown and polar bear Y chromosomes reveal extensive male-biased gene flow within brother lineages.

    PubMed

    Bidon, Tobias; Janke, Axel; Fain, Steven R; Eiken, Hans Geir; Hagen, Snorre B; Saarma, Urmas; Hallström, Björn M; Lecomte, Nicolas; Hailer, Frank

    2014-06-01

    Brown and polar bears have become prominent examples in phylogeography, but previous phylogeographic studies relied largely on maternally inherited mitochondrial DNA (mtDNA) or were geographically restricted. The male-specific Y chromosome, a natural counterpart to mtDNA, has remained underexplored. Although this paternally inherited chromosome is indispensable for comprehensive analyses of phylogeographic patterns, technical difficulties and low variability have hampered its application in most mammals. We developed 13 novel Y-chromosomal sequence and microsatellite markers from the polar bear genome and screened these in a broad geographic sample of 130 brown and polar bears. We also analyzed a 390-kb-long Y-chromosomal scaffold using sequencing data from published male ursine genomes. Y chromosome evidence supports the emerging understanding that brown and polar bears started to diverge no later than the Middle Pleistocene. Contrary to mtDNA patterns, we found 1) brown and polar bears to be reciprocally monophyletic sister (or rather brother) lineages, without signals of introgression, 2) male-biased gene flow across continents and on phylogeographic time scales, and 3) male dispersal that links the Alaskan ABC islands population to mainland brown bears. Due to female philopatry, mtDNA provides a highly structured estimate of population differentiation, while male-biased gene flow is a homogenizing force for nuclear genetic variation. Our findings highlight the importance of analyzing both maternally and paternally inherited loci for a comprehensive view of phylogeographic history, and that mtDNA-based phylogeographic studies of many mammals should be reevaluated. Recent advances in sequencing technology render the analysis of Y-chromosomal variation feasible, even in nonmodel organisms. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.

  9. Multi-modulus algorithm based on global artificial fish swarm intelligent optimization of DNA encoding sequences.

    PubMed

    Guo, Y C; Wang, H; Wu, H P; Zhang, M Q

    2015-12-21

    To address the large mean square error (MSE) and slow convergence of the constant modulus algorithm (CMA) when equalizing multi-modulus signals, a multi-modulus algorithm (MMA) based on global artificial fish swarm (GAFS) intelligent optimization of DNA encoding sequences (GAFS-DNA-MMA) was proposed. To improve the convergence rate and reduce the MSE, the proposed algorithm adopted an encoding method based on DNA nucleotide chains to represent candidate solutions. Furthermore, the GAFS algorithm, with its fast convergence and global search ability, was used to find the best sequence. The real and imaginary parts of the initial optimal weight vector of MMA were obtained through DNA coding of the best sequence. The simulation results show that the proposed algorithm has a faster convergence speed and smaller MSE in comparison with the CMA, the MMA, and the AFS-DNA-MMA.

  10. Plant DNA sequences from feces: potential means for assessing diets of wild primates.

    PubMed

    Bradley, Brenda J; Stiller, Mathias; Doran-Sheehy, Diane M; Harris, Tara; Chapman, Colin A; Vigilant, Linda; Poinar, Hendrik

    2007-06-01

    Analyses of plant DNA in feces provide a promising, yet largely unexplored, means of documenting the diets of elusive primates. Here we demonstrate the promise and pitfalls of this approach using DNA extracted from fecal samples of wild western gorillas (Gorilla gorilla) and black and white colobus monkeys (Colobus guereza). From these DNA extracts we amplified, cloned, and sequenced small segments of chloroplast DNA (part of the rbcL gene) and plant nuclear DNA (ITS-2). The obtained sequences were compared to sequences generated from known plant samples and to those in GenBank to identify plant taxa in the feces. With further optimization, this method could provide a basic evaluation of minimum primate dietary diversity even when knowledge of local flora is limited. This approach may find application in studies characterizing the diets of poorly-known, unhabituated primate species or assaying consumer-resource relationships in an ecosystem. (c) 2007 Wiley-Liss, Inc.

  11. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    PubMed

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. 
We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal. Furthermore, NGS of historical DNA enables recovering crucial genetic information from old type specimens that to date have remained mostly unutilized and, thus, opens up a new frontier for taxonomic research as well.

  12. Chemical biology on the genome.

    PubMed

    Balasubramanian, Shankar

    2014-08-15

    In this article I discuss studies towards understanding the structure and function of DNA in the context of genomes from the perspective of a chemist. The first area I describe concerns the studies that led to the invention and subsequent development of a method for sequencing DNA on a genome scale at high speed and low cost, now known as Solexa/Illumina sequencing. The second theme will feature the four-stranded DNA structure known as a G-quadruplex with a focus on its fundamental properties, its presence in cellular genomic DNA and the prospects for targeting such a structure in cells with small molecules. The final topic for discussion is naturally occurring chemically modified DNA bases with an emphasis on chemistry for decoding (or sequencing) such modifications in genomic DNA. The genome is a fruitful topic to be further elucidated by the creation and application of chemical approaches. Copyright © 2014 Elsevier Ltd. All rights reserved.

  13. Rapid DNA Sequencing by Direct Nanoscale Reading of Nucleotide Bases on Individual DNA Chains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, James Weifu; Meller, Amit

    2007-01-01

    Since the independent invention of DNA sequencing by Sanger and by Gilbert 30 years ago, it has grown from a small scale technique capable of reading several kilobase-pair of sequence per day into today's multibillion dollar industry. This growth has spurred the development of new sequencing technologies that do not involve either electrophoresis or Sanger sequencing chemistries. Sequencing by Synthesis (SBS) involves multiple parallel micro-sequencing addition events occurring on a surface, where data from each round is detected by imaging. New High Throughput Technologies for DNA Sequencing and Genomics is the second volume in the Perspectives in Bioanalysis series, which looks at the electroanalytical chemistry of nucleic acids and proteins, development of electrochemical sensors and their application in biomedicine and in the new fields of genomics and proteomics. The authors have expertly formatted the information for a wide variety of readers, including new developments that will inspire students and young scientists to create new tools for science and medicine in the 21st century. Reviews of complementary developments in Sanger and SBS sequencing chemistries, capillary electrophoresis and microdevice integration, MS sequencing and applications set the framework for the book.

  14. The study of human Y chromosome variation through ancient DNA.

    PubMed

    Kivisild, Toomas

    2017-05-01

    High throughput sequencing methods have completely transformed the study of human Y chromosome variation by offering a genome-scale view on genetic variation retrieved from ancient human remains in context of a growing number of high coverage whole Y chromosome sequence data from living populations from across the world. The ancient Y chromosome sequences are providing us the first exciting glimpses into the past variation of male-specific compartment of the genome and the opportunity to evaluate models based on previously made inferences from patterns of genetic variation in living populations. Analyses of the ancient Y chromosome sequences are challenging not only because of issues generally related to ancient DNA work, such as DNA damage-induced mutations and low content of endogenous DNA in most human remains, but also because of specific properties of the Y chromosome, such as its highly repetitive nature and high homology with the X chromosome. Shotgun sequencing of uniquely mapping regions of the Y chromosomes to sufficiently high coverage is still challenging and costly in poorly preserved samples. To increase the coverage of specific target SNPs capture-based methods have been developed and used in recent years to generate Y chromosome sequence data from hundreds of prehistoric skeletal remains. Besides the prospects of testing directly as how much genetic change in a given time period has accompanied changes in material culture the sequencing of ancient Y chromosomes allows us also to better understand the rate at which mutations accumulate and get fixed over time. This review considers genome-scale evidence on ancient Y chromosome diversity that has recently started to accumulate in geographic areas favourable to DNA preservation. More specifically the review focuses on examples of regional continuity and change of the Y chromosome haplogroups in North Eurasia and in the New World.

  15. Extraordinary Structured Noncoding RNAs Revealed by Bacterial Metagenome Analysis

    PubMed Central

    Weinberg, Zasha; Perreault, Jonathan; Meyer, Michelle M.; Breaker, Ronald R.

    2012-01-01

    Estimates of the total number of bacterial species [1-3] suggest that existing DNA sequence databases carry only a tiny fraction of the total amount of DNA sequence space represented by this division of life. Indeed, environmental DNA samples have been shown to encode many previously unknown classes of proteins [4] and RNAs [5]. Bioinformatics searches [6-10] of genomic DNA from bacteria commonly identify novel noncoding RNAs (ncRNAs) [10-12] such as riboswitches [13,14]. In rare instances, RNAs that exhibit more extensive sequence and structural conservation across a wide range of bacteria are encountered [15,16]. Given that large structured RNAs are known to carry out complex biochemical functions such as protein synthesis and RNA processing reactions, identifying more RNAs of great size and intricate structure is likely to reveal additional biochemical functions that can be achieved by RNA. We applied an updated computational pipeline [17] to discover ncRNAs that rival the known large ribozymes in size and structural complexity or that are among the most abundant RNAs in bacteria that encode them. These RNAs would have been difficult or impossible to detect without examining environmental DNA sequences, suggesting that numerous RNAs with extraordinary size, structural complexity, or other exceptional characteristics remain to be discovered in unexplored sequence space. PMID:19956260

  16. Construction of a robust microarray from a non-model species (largemouth bass) using pyrosequencing technology

    PubMed Central

    Garcia-Reyero, Natàlia; Griffitt, Robert J.; Liu, Li; Kroll, Kevin J.; Farmerie, William G.; Barber, David S.; Denslow, Nancy D.

    2009-01-01

    A novel custom microarray for largemouth bass (Micropterus salmoides) was designed with sequences obtained from a normalized cDNA library using the 454 Life Sciences GS-20 pyrosequencer. This approach yielded in excess of 58 million bases of high-quality sequence. The sequence information was combined with 2,616 reads obtained by traditional suppressive subtractive hybridizations to derive a total of 31,391 unique sequences. Annotation and coding sequences were predicted for these transcripts where possible. 16,350 annotated transcripts were selected as target sequences for the design of the custom largemouth bass oligonucleotide microarray. The microarray was validated by examining the transcriptomic response in male largemouth bass exposed to 17β-œstradiol. Transcriptomic responses were assessed in liver and gonad, and indicated gene expression profiles typical of exposure to œstradiol. The results demonstrate the potential to rapidly create the tools necessary to assess large scale transcriptional responses in non-model species, paving the way for expanded impact of toxicogenomics in ecotoxicology. PMID:19936325

  17. Nanopore-based fourth-generation DNA sequencing technology.

    PubMed

    Feng, Yanxiao; Zhang, Yuechuan; Ying, Cuifeng; Wang, Deqiang; Du, Chunlei

    2015-02-01

    Nanopore-based sequencers, as the fourth-generation DNA sequencing technology, have the potential to quickly and reliably sequence the entire human genome for less than $1000, and possibly for even less than $100. The single-molecule techniques used by this technology allow us to further study the interaction between DNA and protein, as well as between protein and protein. Nanopore analysis opens a new door to molecular biology investigation at the single-molecule scale. In this article, we have reviewed academic achievements in nanopore technology from the past as well as the latest advances, including both biological and solid-state nanopores, and discussed their recent and potential applications. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.

  18. Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline.

    PubMed

    Reid, Jeffrey G; Carroll, Andrew; Veeraraghavan, Narayanan; Dahdouli, Mahmoud; Sundquist, Andreas; English, Adam; Bainbridge, Matthew; White, Simon; Salerno, William; Buhay, Christian; Yu, Fuli; Muzny, Donna; Daly, Richard; Duyk, Geoff; Gibbs, Richard A; Boerwinkle, Eric

    2014-01-29

    Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results. To address these challenges, we have developed the Mercury analysis pipeline and deployed it in local hardware and the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples.

  19. Mapping the Space of Genomic Signatures

    PubMed Central

    Kari, Lila; Hill, Kathleen A.; Sayem, Abu S.; Karamichalis, Rallis; Bryans, Nathaniel; Davis, Katelyn; Dattani, Nikesh S.

    2015-01-01

    We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. 
This method also correctly finds the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan, and the chimp), and that the sequence most different from it in this dataset belongs to a cucumber. PMID:26000734
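
    The Chaos Game Representation underlying this method can be sketched as follows. This is a minimal illustration, not the authors' implementation: the corner assignment (A, C, G, T on the corners of the unit square), the function names, and the choice to count points on a 2^k by 2^k grid are assumptions, and the DSSIM distance and MDS steps of the paper are not reproduced here.

```python
# Assumed corner layout of the unit square; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> list[tuple[float, float]]:
    """Chaos game: start at the centre and move halfway to each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_image(seq: str, k: int) -> list[list[int]]:
    """Count CGR points on a 2^k x 2^k grid; each cell corresponds to one k-mer."""
    size = 2 ** k
    image = [[0] * size for _ in range(size)]
    # The first k-1 points still depend on the arbitrary starting position.
    for x, y in cgr_points(seq)[k - 1:]:
        row = min(int(y * size), size - 1)
        col = min(int(x * size), size - 1)
        image[row][col] += 1
    return image


print(cgr_image("ACGT", 1))  # [[1, 1], [1, 1]]
```

    Because each grid cell collects the occurrences of one k-mer, two such images can be compared with any image distance without aligning the underlying sequences.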

  20. Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.

    PubMed

    Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil

    2015-07-17

    In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.

  1. A New Perspective on Polyploid Fragaria (Strawberry) Genome Composition Based on Large-Scale, Multi-Locus Phylogenetic Analysis

    PubMed Central

    Yang, Yilong

    2017-01-01

    The subgenomic compositions of the octoploid (2n = 8× = 56) strawberry (Fragaria) species, including the economically important cultivated species Fragaria x ananassa, have been a topic of long-standing interest. Phylogenomic approaches utilizing next-generation sequencing technologies offer a new window into species relationships and the subgenomic compositions of polyploids. We have conducted a large-scale phylogenetic analysis of Fragaria (strawberry) species using the Fluidigm Access Array system and 454 sequencing platform. About 24 single-copy or low-copy nuclear genes distributed across the genome were amplified and sequenced from 96 genomic DNA samples representing 16 Fragaria species from diploid (2×) to decaploid (10×), including the most extensive sampling of octoploid taxa yet reported. Individual gene trees were constructed by different tree-building methods. Mosaic genomic structures of diploid Fragaria species consisting of sequences at different phylogenetic positions were observed. Our findings support the presence in octoploid species of genetic signatures from at least five diploid ancestors (F. vesca, F. iinumae, F. bucharica, F. viridis, and at least one additional allele contributor of unknown identity), and question the extent to which distinct subgenomes are preserved over evolutionary time in the allopolyploid Fragaria species. In addition, our data support divergence between the two wild octoploid species, F. virginiana and F. chiloensis. PMID:29045639

  2. The DNA Methylome of Human Peripheral Blood Mononuclear Cells

    PubMed Central

    Ye, Mingzhi; Zheng, Hancheng; Yu, Jian; Wu, Honglong; Sun, Jihua; Zhang, Hongyu; Chen, Quan; Luo, Ruibang; Chen, Minfeng; He, Yinghua; Jin, Xin; Zhang, Qinghui; Yu, Chang; Zhou, Guangyu; Sun, Jinfeng; Huang, Yebo; Zheng, Huisong; Cao, Hongzhi; Zhou, Xiaoyu; Guo, Shicheng; Hu, Xueda; Li, Xin; Kristiansen, Karsten; Bolund, Lars; Xu, Jiujin; Wang, Wen; Yang, Huanming; Wang, Jian; Li, Ruiqiang; Beck, Stephan; Wang, Jun; Zhang, Xiuqing

    2010-01-01

    DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies. PMID:21085693

  3. Molecular Precision at Micrometer Length Scales: Hierarchical Assembly of DNA-Protein Nanostructures.

    PubMed

    Schiffels, Daniel; Szalai, Veronika A; Liddle, J Alexander

    2017-07-25

    Robust self-assembly across length scales is a ubiquitous feature of biological systems but remains challenging for synthetic structures. Taking a cue from biology, where disparate molecules work together to produce large, functional assemblies, we demonstrate how to engineer microscale structures with nanoscale features: Our self-assembly approach begins by using DNA polymerase to controllably create double-stranded DNA (dsDNA) sections on a single-stranded template. The single-stranded DNA (ssDNA) sections are then folded into a mechanically flexible skeleton by the origami method. This process simultaneously shapes the structure at the nanoscale and directs the large-scale geometry. The DNA skeleton guides the assembly of RecA protein filaments, which provides rigidity at the micrometer scale. We use our modular design strategy to assemble tetrahedral, rectangular, and linear shapes of defined dimensions. This method enables the robust construction of complex assemblies, greatly extending the range of DNA-based self-assembly methods.

  4. A genomic regulatory network for development

    NASA Technical Reports Server (NTRS)

    Davidson, Eric H.; Rast, Jonathan P.; Oliveri, Paola; Ransick, Andrew; Calestani, Cristina; Yuh, Chiou-Hwa; Minokawa, Takuya; Amore, Gabriele; Hinman, Veronica; Arenas-Mena, Cesar

    2002-01-01

    Development of the body plan is controlled by large networks of regulatory genes. A gene regulatory network that controls the specification of endoderm and mesoderm in the sea urchin embryo is summarized here. The network was derived from large-scale perturbation analyses, in combination with computational methodologies, genomic data, cis-regulatory analysis, and molecular embryology. The network contains over 40 genes at present, and each node can be directly verified at the DNA sequence level by cis-regulatory analysis. Its architecture reveals specific and general aspects of development, such as how given cells generate their ordained fates in the embryo and why the process moves inexorably forward in developmental time.

  5. Conformational heterogeneity and bubble dynamics in single bacterial transcription initiation complexes

    PubMed Central

    Duchi, Diego; Gryte, Kristofer; Robb, Nicole C; Morichaud, Zakia; Sheppard, Carol; Wigneshweraraj, Sivaramesh

    2018-01-01

    Transcription initiation is a major step in gene regulation for all organisms. In bacteria, the promoter DNA is first recognized by RNA polymerase (RNAP) to yield an initial closed complex. This complex subsequently undergoes conformational changes resulting in DNA strand separation to form a transcription bubble and an RNAP-promoter open complex; however, the series and sequence of conformational changes, and the factors that influence them are unclear. To address the conformational landscape and transitions in transcription initiation, we applied single-molecule Förster resonance energy transfer (smFRET) on immobilized Escherichia coli transcription open complexes. Our results revealed the existence of two stable states within RNAP–DNA complexes in which the promoter DNA appears to adopt closed and partially open conformations, and we observed large-scale transitions in which the transcription bubble fluctuated between open and closed states; these transitions, which occur roughly on the 0.1 s timescale, are distinct from the millisecond-timescale dynamics previously observed within diffusing open complexes. Mutational studies indicated that the σ70 region 3.2 of the RNAP significantly affected the bubble dynamics. Our results have implications for many steps of transcription initiation, and support a bend-load-open model for the sequence of transitions leading to bubble opening during open complex formation. PMID:29177430

  6. A computer aided thermodynamic approach for predicting the formation of Z-DNA in naturally occurring sequences

    NASA Technical Reports Server (NTRS)

    Ho, P. S.; Ellison, M. J.; Quigley, G. J.; Rich, A.

    1986-01-01

    The ease with which a particular DNA segment adopts the left-handed Z-conformation depends largely on the sequence and on the degree of negative supercoiling to which it is subjected. We describe a computer program (Z-hunt) that is designed to search long sequences of naturally occurring DNA and retrieve those nucleotide combinations of up to 24 bp in length which show a strong propensity for Z-DNA formation. Incorporated into Z-hunt is a statistical mechanical model based on empirically determined energetic parameters for the B to Z transition accumulated to date. The Z-forming potential of a sequence is assessed by ranking its behavior as a function of negative superhelicity relative to the behavior of similar sized randomly generated nucleotide sequences assembled from over 80,000 combinations. The program makes it possible to compare directly the Z-forming potential of sequences with different base compositions and different sequence lengths. Using Z-hunt, we have analyzed the DNA sequences of the bacteriophage phi X174, plasmid pBR322, the animal virus SV40 and the replicative form of the eukaryotic adenovirus-2. The results are compared with those previously obtained by others from experiments designed to locate Z-DNA forming regions in these sequences using probes which show specificity for the left-handed DNA conformation.

  7. APPLICATION OF DNA MICROARRAYS TO REPRODUCTIVE TOXICOLOGY AND THE DEVELOPMENT OF A TESTIS ARRAY

    EPA Science Inventory

    With the advent of sequence information for entire mammalian genomes, it is now possible to analyze gene expression and gene polymorphisms on a genomic scale. The primary tool for analysis of gene expression is the DNA microarray. We have used commercially available cDNA micro...

  8. Error Rate Comparison during Polymerase Chain Reaction by DNA Polymerase

    DOE PAGES

    McInerney, Peter; Adams, Paul; Hadi, Masood Z.

    2014-01-01

    As larger-scale cloning projects become more prevalent, there is an increasing need for comparisons among high fidelity DNA polymerases used for PCR amplification. All polymerases marketed for PCR applications are tested for fidelity properties (i.e., error rate determination) by vendors, and numerous literature reports have addressed PCR enzyme fidelity. Nonetheless, it is often difficult to make direct comparisons among different enzymes due to numerous methodological and analytical differences from study to study. We have measured the error rates for 6 DNA polymerases commonly used in PCR applications, including 3 polymerases typically used for cloning applications requiring high fidelity. Error rate measurement values reported here were obtained by direct sequencing of cloned PCR products. The strategy employed here allows interrogation of error rate across a very large DNA sequence space, since 94 unique DNA targets were used as templates for PCR cloning. Of the six enzymes included in the study (Taq polymerase, AccuPrime-Taq High Fidelity, KOD Hot Start, cloned Pfu polymerase, Phusion Hot Start, and Pwo polymerase), we find the lowest error rates with Pfu, Phusion, and Pwo polymerases. Error rates are comparable for these 3 enzymes and are >10x lower than the error rate observed with Taq polymerase. Mutation spectra are reported, with the 3 high fidelity enzymes displaying broadly similar types of mutations. For these enzymes, transition mutations predominate, with little bias observed for type of transition.
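
    The abstract does not state how the error rates were normalized. A common convention, assumed here, expresses polymerase fidelity as mutations per base per template doubling. The sketch below follows that convention; the function names and all numbers are hypothetical and are not taken from the study.

```python
import math


def template_doublings(template_amount: float, product_amount: float) -> float:
    """Number of doublings needed to go from template to product (same units)."""
    return math.log2(product_amount / template_amount)


def error_rate(mutations: int, bases_sequenced: int, doublings: float) -> float:
    """Errors per base per doubling (assumed normalization, see lead-in)."""
    return mutations / (bases_sequenced * doublings)


# Hypothetical example: 10 mutations found in 100,000 sequenced bases
# after a 1024-fold amplification (10 doublings).
d = template_doublings(1.0, 1024.0)
print(d, error_rate(10, 100_000, d))
```

    Normalizing by doublings rather than by cycle count matters because late PCR cycles are less than fully efficient, so cycle count overstates the number of replication events.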

  9. DnaSAM: Software to perform neutrality testing for large datasets with complex null models.

    PubMed

    Eckert, Andrew J; Liechty, John D; Tearse, Brandon R; Pande, Barnaly; Neale, David B

    2010-05-01

    Patterns of DNA sequence polymorphisms can be used to understand the processes of demography and adaptation within natural populations. High-throughput generation of DNA sequence data has historically been the bottleneck with respect to data processing and experimental inference. Advances in marker technologies have largely solved this problem. Currently, the limiting step is computational, with most molecular population genetic software allowing a gene-by-gene analysis through a graphical user interface. An easy-to-use analysis program that allows both high-throughput processing of multiple sequence alignments along with the flexibility to simulate data under complex demographic scenarios is currently lacking. We introduce a new program, named DnaSAM, which allows high-throughput estimation of DNA sequence diversity and neutrality statistics from experimental data along with the ability to test those statistics via Monte Carlo coalescent simulations. These simulations are conducted using the ms program, which is able to incorporate several genetic parameters (e.g. recombination) and demographic scenarios (e.g. population bottlenecks). The output is a set of diversity and neutrality statistics with associated probability values under a user-specified null model that are stored in easy to manipulate text file. © 2009 Blackwell Publishing Ltd.

  10. Effect of ionic strength and cationic DNA affinity binders on the DNA sequence selective alkylation of guanine N7-positions by nitrogen mustards

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hartley, J.A.; Forrow, S.M.; Souhami, R.L.

    Large variations in alkylation intensities exist among guanines in a DNA sequence following treatment with chemotherapeutic alkylating agents such as nitrogen mustards, and the substituent attached to the reactive group can impose a distinct sequence preference for reaction. In order to understand further the structural and electrostatic factors which determine the sequence selectivity of alkylation reactions, the effect of increased ionic strength, the intercalator ethidium bromide, AT-specific minor groove binders distamycin A and netropsin, and the polyamine spermine on guanine N7-alkylation by L-phenylalanine mustard (L-Pam), uracil mustard (UM), and quinacrine mustard (QM) was investigated with a modification of the guanine-specific chemical cleavage technique for DNA sequencing. The result differed with both the nitrogen mustard and the cationic agent used. The effect, which resulted in both enhancement and suppression of alkylation sites, was most striking in the case of netropsin and distamycin A, which differed from each other. DNA footprinting indicated that selective binding to AT sequences in the minor groove of DNA can have long-range effects on the alkylation pattern of DNA in the major groove.

  11. Production of recombinant streptokinase from Streptococcus pyogenes isolate and its potential for thrombolytic therapy.

    PubMed

    Assiri, Abdullah S; El-Gamal, Basiouny A; Hafez, Elsayed E; Haidara, Mohamed A

    2014-12-01

    To produce an effective recombinant streptokinase (rSK) from pathogenic Streptococcus pyogenes isolate in yeast, and evaluate its potential for thrombolytic therapy. This study was conducted from November 2012 to December 2013 at King Khalid University, Abha, Kingdom of Saudi Arabia (KSA). Throat swabs collected from 45 pharyngitis patients in Asser Central Hospital, Abha, KSA were used to isolate Streptococcus pyogenes. The bacterial DNA was used for amplification of the streptokinase gene (1200 bp). The gene was cloned and in vitro transcribed in a eukaryotic expression vector that was transformed into yeast Pichia pastoris SMD1168, and the rSK protein was purified and tested for its thrombolytic activity. The Streptococcus pyogenes strain was isolated and its DNA nucleotide sequence revealed similarity to other Streptococcus pyogenes in GenBank. Sequencing of the amplified gene based on DNA nucleotide sequence revealed a SK gene closely related to other SK genes in GenBank. However, based on deduced amino acid sequence, the gene formed a separate cluster different from clusters formed by other examined genes, suggesting a new bacterial isolate and accordingly a new gene. The purified protein showed 82% clot lysis compared to a commercial SK (81%) at an enzyme concentration of 2000 U/ml. The present yeast rSK showed similar thrombolytic activity in vitro as that of a commercial SK, suggesting its potential for thrombolytic therapy and large scale production.

  12. DNA Compass: a secure, client-side site for navigating personal genetic information

    PubMed Central

    Curnin, Charles; Gordon, Assaf; Erlich, Yaniv

    2017-01-01

    Abstract Motivation: Millions of individuals have access to raw genomic data using direct-to-consumer companies. The advent of large-scale sequencing projects, such as the Precision Medicine Initiative, will further increase the number of individuals with access to their own genomic information. However, querying genomic data requires a computer terminal and computational skill to analyze the data—an impediment for the general public. Results: DNA Compass is a website designed to empower the public by enabling simple navigation of personal genomic data. Users can query the status of their genomic variants for over 1658 markers or tens of millions of documented single nucleotide polymorphisms (SNPs). DNA Compass presents the relevant genotypes of the user side-by-side with explanatory scientific resources. The genotype data never leaves the user’s computer, a feature that provides improved security and performance. More than 12 000 unique users, mainly from the general genetic genealogy community, have already used DNA Compass, demonstrating its utility. Availability and Implementation: DNA Compass is freely available on https://compass.dna.land. Contact: yaniv@cs.columbia.edu PMID:28334237

  13. A large-scale full-length cDNA analysis to explore the budding yeast transcriptome

    PubMed Central

    Miura, Fumihito; Kawaguchi, Noriko; Sese, Jun; Toyoda, Atsushi; Hattori, Masahira; Morishita, Shinichi; Ito, Takashi

    2006-01-01

    We performed a large-scale cDNA analysis to explore the transcriptome of the budding yeast Saccharomyces cerevisiae. We sequenced two cDNA libraries, one from the cells exponentially growing in a minimal medium and the other from meiotic cells. Both libraries were generated by using a vector-capping method that allows the accurate mapping of transcription start sites (TSSs). Consequently, we identified 11,575 TSSs associated with 3,638 annotated genomic features, including 3,599 ORFs, to suggest that most yeast genes have two or more TSSs. In addition, we identified 45 previously undescribed introns, including those affecting current ORF annotations and those spliced alternatively. Furthermore, the analysis revealed 667 transcription units in the intergenic regions and transcripts derived from antisense strands of 367 known features. We also found that 348 ORFs carry TSSs in their 3′-halves to generate sense transcripts starting from inside the ORFs. These results indicate that the budding yeast transcriptome is considerably more complex than previously thought, and it shares many recently revealed characteristics with the transcriptomes of mammals and other higher eukaryotes. Thus, the genome-wide active transcription that generates novel classes of transcripts appears to be an intrinsic feature of the eukaryotic cells. The budding yeast will serve as a versatile model for the studies on these aspects of transcriptome, and the full-length cDNA clones can function as an invaluable resource in such studies. PMID:17101987

  14. Lineage Tracking for Probing Heritable Phenotypes at Single-Cell Resolution

    PubMed Central

    Cottinet, Denis; Condamine, Florence; Bremond, Nicolas; Griffiths, Andrew D.; Rainey, Paul B.; de Visser, J. Arjan G. M.; Baudry, Jean; Bibette, Jérôme

    2016-01-01

    Determining the phenotype and genotype of single cells is central to understanding microbial evolution. DNA sequencing technologies allow the detection of mutants at high resolution, but similar approaches for phenotypic analyses are still lacking. We show that a drop-based millifluidic system enables the detection of heritable phenotypic changes in evolving bacterial populations. At time intervals, cells were sampled and individually compartmentalized in 100 nL drops. Growth through 15 generations was monitored using a fluorescent protein reporter. Amplification of heritable changes, via growth, over multiple generations yields phenotypically distinct clusters reflecting variation relevant for evolution. To demonstrate the utility of this approach, we follow the evolution of Escherichia coli populations during 30 days of starvation. Phenotypic diversity was observed to rapidly increase upon starvation with the emergence of heritable phenotypes. Mutations corresponding to each phenotypic class were identified by DNA sequencing. This scalable lineage-tracking technology opens the door to large-scale phenotyping methods with special utility for microbiology and microbial population biology. PMID:27077662

  15. Lineage Tracking for Probing Heritable Phenotypes at Single-Cell Resolution.

    PubMed

    Cottinet, Denis; Condamine, Florence; Bremond, Nicolas; Griffiths, Andrew D; Rainey, Paul B; de Visser, J Arjan G M; Baudry, Jean; Bibette, Jérôme

    2016-01-01

    Determining the phenotype and genotype of single cells is central to understanding microbial evolution. DNA sequencing technologies allow the detection of mutants at high resolution, but similar approaches for phenotypic analyses are still lacking. We show that a drop-based millifluidic system enables the detection of heritable phenotypic changes in evolving bacterial populations. At time intervals, cells were sampled and individually compartmentalized in 100 nL drops. Growth through 15 generations was monitored using a fluorescent protein reporter. Amplification of heritable changes, via growth, over multiple generations yields phenotypically distinct clusters reflecting variation relevant for evolution. To demonstrate the utility of this approach, we follow the evolution of Escherichia coli populations during 30 days of starvation. Phenotypic diversity was observed to rapidly increase upon starvation with the emergence of heritable phenotypes. Mutations corresponding to each phenotypic class were identified by DNA sequencing. This scalable lineage-tracking technology opens the door to large-scale phenotyping methods with special utility for microbiology and microbial population biology.

  16. Prevalence of somatic mitochondrial mutations and spatial distribution of mitochondria in non-small cell lung cancer.

    PubMed

    Kazdal, Daniel; Harms, Alexander; Endris, Volker; Penzel, Roland; Kriegsmann, Mark; Eichhorn, Florian; Muley, Thomas; Stenzinger, Albrecht; Pfarr, Nicole; Weichert, Wilko; Warth, Arne

    2017-07-11

    Mitochondria are considered relevant players in many tumour entities and first data indicate beneficial effects of mitochondria-targeted antioxidants in both cancer prevention and anticancer therapies. To further dissect the potential roles of mitochondria in NSCLC we comprehensively analysed somatic mitochondrial mutations, determined the spatial distribution of mitochondrial DNA within complete tumour sections and investigated the mitochondrial load in a large-scale approach. Whole mitochondrial genome sequencing of 26 matched tumour and non-neoplastic tissue samples extended by reviewing published data of 326 cases. Systematical stepwise real-time PCR quantification of mitochondrial DNA covering 16 whole surgical tumour sections. Immunohistochemical determination of the mitochondrial load in 171 adenocarcinoma and 145 squamous cell carcinoma. Our results demonstrate very low recurrences (max. 1.7%) and a broad distribution of 456 different somatic mitochondrial mutations. Large inter- and intra-tumour heterogeneity were seen for mitochondrial DNA copy numbers in conjunction with a correlation to the predominant histological growth pattern. Furthermore, tumour cells had significantly higher mitochondrial level compared to adjacent stroma, whereas differences between tumour entities were negligible. Non-evident somatic mitochondrial mutations and highly varying mitochondrial DNA level delineate challenges for the approach of mitochondria-targeted anticancer therapies in NSCLC.

  17. Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats.

    PubMed

    Fungtammasan, Arkarachai; Tomaszkiewicz, Marta; Campos-Sánchez, Rebeca; Eckert, Kristin A; DeGiorgio, Michael; Makova, Kateryna D

    2016-10-01

    Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA-DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Evaluating the role of coherent delocalized phonon-like modes in DNA cyclization

    DOE PAGES

    Alexandrov, Ludmil B.; Rasmussen, Kim Ø.; Bishop, Alan R.; ...

    2017-08-29

    The innate flexibility of a DNA sequence is quantified by the Jacobson-Stockmayer J-factor, which measures the propensity for DNA loop formation. Recent studies of ultra-short DNA sequences revealed a discrepancy of up to six orders of magnitude between experimentally measured and theoretically predicted J-factors. These large differences suggest that, in addition to the elastic moduli of the double helix, other factors contribute to loop formation. We develop a new theoretical model that explores how coherent delocalized phonon-like modes in DNA provide single-stranded "flexible hinges" to assist in loop formation. We also combine the Czapla-Swigon-Olson structural model of DNA with our extended Peyrard-Bishop-Dauxois model and, without changing any of the parameters of the two models, apply this new computational framework to 86 experimentally characterized DNA sequences. Our results demonstrate that the new computational framework can predict J-factors within an order of magnitude of experimental measurements for most ultra-short DNA sequences, while continuing to accurately describe the J-factors of longer sequences. Furthermore, we demonstrate that our computational framework can be used to describe the cyclization of DNA sequences that contain a base pair mismatch. Overall, our results support the conclusion that coherent delocalized phonon-like modes play an important role in DNA cyclization.

  19. Evaluating the role of coherent delocalized phonon-like modes in DNA cyclization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alexandrov, Ludmil B.; Rasmussen, Kim Ø.; Bishop, Alan R.

    The innate flexibility of a DNA sequence is quantified by the Jacobson-Stockmayer J-factor, which measures the propensity for DNA loop formation. Recent studies of ultra-short DNA sequences revealed a discrepancy of up to six orders of magnitude between experimentally measured and theoretically predicted J-factors. These large differences suggest that, in addition to the elastic moduli of the double helix, other factors contribute to loop formation. We develop a new theoretical model that explores how coherent delocalized phonon-like modes in DNA provide single-stranded "flexible hinges" to assist in loop formation. We also combine the Czapla-Swigon-Olson structural model of DNA with our extended Peyrard-Bishop-Dauxois model and, without changing any of the parameters of the two models, apply this new computational framework to 86 experimentally characterized DNA sequences. Our results demonstrate that the new computational framework can predict J-factors within an order of magnitude of experimental measurements for most ultra-short DNA sequences, while continuing to accurately describe the J-factors of longer sequences. Furthermore, we demonstrate that our computational framework can be used to describe the cyclization of DNA sequences that contain a base pair mismatch. Overall, our results support the conclusion that coherent delocalized phonon-like modes play an important role in DNA cyclization.

  20. mtDNAmanager: a Web-based tool for the management and quality analysis of mitochondrial DNA control-region sequences

    PubMed Central

    Lee, Hwan Young; Song, Injee; Ha, Eunho; Cho, Sung-Bae; Yang, Woo Ick; Shin, Kyoung-Jin

    2008-01-01

    Background For the past few years, scientific controversy has surrounded the large number of errors in forensic and literature mitochondrial DNA (mtDNA) data. However, recent research has shown that using mtDNA phylogeny and referring to known mtDNA haplotypes can be useful for checking the quality of sequence data. Results We developed a Web-based bioinformatics resource "mtDNAmanager" that offers a convenient interface supporting the management and quality analysis of mtDNA sequence data. The mtDNAmanager performs computations on mtDNA control-region sequences to estimate the most-probable mtDNA haplogroups and retrieves similar sequences from a selected database. By the phased designation of the most-probable haplogroups (both expected and estimated haplogroups), mtDNAmanager enables users to systematically detect errors whilst allowing for confirmation of the presence of clear key diagnostic mutations and accompanying mutations. The query tools of mtDNAmanager also facilitate database screening with two options of "match" and "include the queried nucleotide polymorphism". In addition, mtDNAmanager provides Web interfaces for users to manage and analyse their own data in batch mode. Conclusion The mtDNAmanager will provide systematic routines for mtDNA sequence data management and analysis via easily accessible Web interfaces, and thus should be very useful for population, medical and forensic studies that employ mtDNA analysis. mtDNAmanager can be accessed at . PMID:19014619

  1. UV-Visible Spectroscopy-Based Quantification of Unlabeled DNA Bound to Gold Nanoparticles.

    PubMed

    Baldock, Brandi L; Hutchison, James E

    2016-12-20

    DNA-functionalized gold nanoparticles have been increasingly applied as sensitive and selective analytical probes and biosensors. The DNA ligands bound to a nanoparticle dictate its reactivity, making it essential to know the type and number of DNA strands bound to the nanoparticle surface. Existing methods used to determine the number of DNA strands per gold nanoparticle (AuNP) require that the sequences be fluorophore-labeled, which may affect the DNA surface coverage and reactivity of the nanoparticle and/or require specialized equipment and other fluorophore-containing reagents. We report a UV-visible-based method to conveniently and inexpensively determine the number of DNA strands attached to AuNPs of different core sizes. When this method is used in tandem with a fluorescence dye assay, it is possible to determine the ratio of two unlabeled sequences of different lengths bound to AuNPs. Two sizes of citrate-stabilized AuNPs (5 and 12 nm) were functionalized with mixtures of short (5 base) and long (32 base) disulfide-terminated DNA sequences, and the ratios of sequences bound to the AuNPs were determined using the new method. The long DNA sequence was present as a lower proportion of the ligand shell than in the ligand exchange mixture, suggesting it had a lower propensity to bind the AuNPs than the short DNA sequence. The ratio of DNA sequences bound to the AuNPs was not the same for the large and small AuNPs, which suggests that the radius of curvature had a significant influence on the assembly of DNA strands onto the AuNPs.

  2. High-Throughput Sequencing Reveals Drastic Changes in Fungal Communities in the Phyllosphere of Norway Spruce (Picea abies) Following Invasion of the Spruce Bud Scale (Physokermes piceae).

    PubMed

    Menkis, Audrius; Marčiulynas, Adas; Gedminas, Artūras; Lynikienė, Jūratė; Povilaitienė, Aistė

    2015-11-01

    The aim of this study was to assess the diversity and composition of fungal communities in damaged and undamaged shoots of Norway spruce (Picea abies) following recent invasion of the spruce bud scale (Physokermes piceae) in Lithuania. Sampling was done in July 2013 and included 50 random lateral shoots from ten random trees in each of five visually undamaged and five damaged 40-50-year-old pure stands of P. abies. DNA was isolated from 500 individual shoots, subjected to amplification of the internal transcribed spacer of fungal ribosomal DNA (ITS rDNA), barcoded and sequenced. Clustering of 149,426 high-quality sequences resulted in 1193 non-singleton contigs of which 1039 (87.1 %) were fungal. In total, there were 893 fungal taxa in damaged shoots and 608 taxa in undamaged shoots (p < 0.0001). Furthermore, 431 (41.5 %) fungal taxa were exclusively in damaged shoots, 146 (14.0 %) were exclusively in undamaged shoots, and 462 (44.5 %) were common to both types of samples. Correspondence analysis showed that study sites representing damaged and undamaged shoots were separated from each other, indicating that the fungal communities were largely different and, therefore, heavily affected by P. piceae. In conclusion, the results demonstrated that invasive alien tree pests may have a profound effect on fungal mycobiota associated with the phyllosphere of P. abies, and therefore, in addition to their direct negative effect owing to physical damage of the tissue, they may also indirectly determine health, sustainability and, ultimately, distribution of the forest tree species.

  3. Molecular dynamics studies on the DNA-binding process of ERG.

    PubMed

    Beuerle, Matthias G; Dufton, Neil P; Randi, Anna M; Gould, Ian R

    2016-11-15

    The ETS family of transcription factors regulate gene targets by binding to a core GGAA DNA-sequence. The ETS factor ERG is required for homeostasis and lineage-specific functions in endothelial cells, some subset of haemopoietic cells and chondrocytes; its ectopic expression is linked to oncogenesis in multiple tissues. To date details of the DNA-binding process of ERG including DNA-sequence recognition outside the core GGAA-sequence are largely unknown. We combined available structural and experimental data to perform molecular dynamics simulations to study the DNA-binding process of ERG. In particular we were able to reproduce the ERG DNA-complex with a DNA-binding simulation starting in an unbound configuration with a final root-mean-square-deviation (RMSD) of 2.1 Å to the core ETS domain DNA-complex crystal structure. This allowed us to elucidate the relevance of amino acids involved in the formation of the ERG DNA-complex and to identify Arg385 as a novel key residue in the DNA-binding process. Moreover we were able to show that water-mediated hydrogen bonds are present between ERG and DNA in our simulations and that those interactions have the potential to achieve sequence recognition outside the GGAA core DNA-sequence. The methodology employed in this study shows the promising capabilities of modern molecular dynamics simulations in the field of protein DNA-interactions.

  4. Large-Scale Mitochondrial DNA Analysis of the Domestic Goat Reveals Six Haplogroups with High Diversity

    PubMed Central

    Naderi, Saeid; Rezaei, Hamid-Reza; Taberlet, Pierre; Zundel, Stéphanie; Rafat, Seyed-Abbas; Naghash, Hamid-Reza; El-Barody, Mohamed A. A.; Ertugrul, Okan; Pompanon, François

    2007-01-01

    Background From the beginning of domestication, the transportation of domestic animals resulted in genetic and demographic processes that explain their present distribution and genetic structure. Thus studying the present genetic diversity helps to better understand the history of domestic species. Methodology/Principal Findings The genetic diversity of domestic goats has been characterized with 2430 individuals from all over the old world, including 946 new individuals from regions poorly studied until now (mainly the Fertile Crescent). These individuals represented 1540 haplotypes for the HVI segment of the mitochondrial DNA (mtDNA) control region. This large-scale study allowed the establishment of a clear nomenclature of the goat maternal haplogroups. Only five of the six previously defined groups of haplotypes were divergent enough to be considered as different haplogroups. Moreover a new mitochondrial group has been localized around the Fertile Crescent. All groups showed very high haplotype diversity. Most of this diversity was distributed among groups and within geographic regions. The weak geographic structure may result from the worldwide distribution of the dominant A haplogroup (more than 90% of the individuals). The large-scale distribution of other haplogroups (except one), may be related to human migration. The recent fragmentation of local goat populations into discrete breeds is not detectable with mitochondrial markers. The estimation of demographic parameters from mismatch analyses showed that all groups had a recent demographic expansion corresponding roughly to the period when domestication took place. But even with a large data set it remains difficult to give relative dates of expansion for different haplogroups because of large confidence intervals. Conclusions/Significance We propose standard criteria for the definition of the different haplogroups based on the result of mismatch analysis and on the use of sequences of reference. Such a method could be also applied for clarifying the nomenclature of mitochondrial haplogroups in other domestic species. PMID:17925860

  5. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    USGS Publications Warehouse

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
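
    The third step described above (finding sequences that contain simple repeats matching user-specified parameters) can be sketched with a regular-expression backreference. This is only an illustrative sketch, not SSR_pipeline's actual search algorithm; the function name, parameter names, and default thresholds are hypothetical, and compound repeats are not handled.

```python
import re


def find_microsatellites(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, repeat_count) for simple tandem repeats.

    Hypothetical helper for illustration only; thresholds are assumptions.
    """
    seq = seq.upper()
    # Lazy motif capture prefers the shortest repeating unit; the
    # backreference \1 then requires further tandem copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as AAAAAAAA reported as (AA)n.
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


print(find_microsatellites("TTACACACACACGG"))
```

    A backtracking regex is convenient for a sketch but may be slow on very large read sets; a production tool would likely use a dedicated scanning strategy.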

  6. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data.

    PubMed

    Miller, Mark P; Knaus, Brian J; Mullins, Thomas D; Haig, Susan M

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).

  7. Identifying currents in the gene pool for bacterial populations using an integrative approach.

    PubMed

    Tang, Jing; Hanage, William P; Fraser, Christophe; Corander, Jukka

    2009-08-01

    The evolution of bacterial populations has recently become considerably better understood due to large-scale sequencing of population samples. It has become clear that DNA sequences from a multitude of genes, as well as a broad sample coverage of a target population, are needed to obtain a relatively unbiased view of its genetic structure and the patterns of ancestry connected to the strains. However, the traditional statistical methods for evolutionary inference, such as phylogenetic analysis, are associated with several difficulties under such an extensive sampling scenario, in particular when a considerable amount of recombination is anticipated to have taken place. To meet the needs of large-scale analyses of population structure for bacteria, we introduce here several statistical tools for the detection and representation of recombination between populations. Also, we introduce a model-based description of the shape of a population in sequence space, in terms of its molecular variability and affinity towards other populations. Extensive real data from the genus Neisseria are utilized to demonstrate the potential of an approach where these population genetic tools are combined with a phylogenetic analysis. The statistical tools introduced here are freely available in BAPS 5.2 software, which can be downloaded from http://web.abo.fi/fak/mnf/mate/jc/software/baps.html.

  8. The EMBL nucleotide sequence database

    PubMed Central

    Stoesser, Guenter; Baker, Wendy; van den Broek, Alexandra; Camon, Evelyn; Garcia-Pastor, Maria; Kanz, Carola; Kulikova, Tamara; Lombard, Vincent; Lopez, Rodrigo; Parkinson, Helen; Redaschi, Nicole; Sterk, Peter; Stoehr, Peter; Tuli, Mary Ann

    2001-01-01

    The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. PMID:11125039

  9. Spliced DNA Sequences in the Paramecium Germline: Their Properties and Evolutionary Potential

    PubMed Central

    Catania, Francesco; McGrath, Casey L.; Doak, Thomas G.; Lynch, Michael

    2013-01-01

    Despite playing a crucial role in germline-soma differentiation, the evolutionary significance of developmentally regulated genome rearrangements (DRGRs) has received scant attention. An example of DRGR is DNA splicing, a process that removes segments of DNA interrupting genic and/or intergenic sequences. Perhaps best known for shaping immune-system genes in vertebrates, DNA splicing plays a central role in the life of ciliated protozoa, where thousands of germline DNA segments are eliminated after sexual reproduction to regenerate a functional somatic genome. Here, we identify and chronicle the properties of 5,286 sequences that putatively undergo DNA splicing (i.e., internal eliminated sequences [IESs]) across the genomes of three closely related species of the ciliate Paramecium (P. tetraurelia, P. biaurelia, and P. sexaurelia). The study reveals that these putative IESs share several physical characteristics. Although our results are consistent with excision events being largely conserved between species, episodes of differential IES retention/excision occur, may have a recent origin, and frequently involve coding regions. Our findings indicate interconversion between somatic (often coding) DNA sequences and noncoding IESs, and provide insights into the role of DNA splicing in creating potentially functional genetic innovation. PMID:23737328

  10. DNA tetrominoes: the construction of DNA nanostructures using self-organised heterogeneous deoxyribonucleic acids shapes.

    PubMed

    Ong, Hui San; Rahim, Mohd Syafiq; Firdaus-Raih, Mohd; Ramlan, Effirul Ikhwan

    2015-01-01

    The unique programmability of nucleic acids offers an alternative for constructing excitable and functional nanostructures. This work introduces an autonomous protocol to construct DNA Tetris shapes (L-Shape, B-Shape, T-Shape and I-Shape) using modular DNA blocks. The protocol exploits the rich number of sequence combinations available from the nucleic acid alphabets, thus allowing for diversity to be applied in designing various DNA nanostructures. Instead of a deterministic set of sequences corresponding to a particular design, the protocol promotes a large pool of DNA shapes that can assemble to conform to any desired structures. By utilising evolutionary programming in the design stage, DNA blocks are subjected to processes such as sequence insertion, deletion and base shifting in order to enrich the diversity of the resulting shapes based on a set of cascading filters. The optimisation algorithm allows mutation to be exerted indefinitely on the candidate sequences until they comply with all four fitness criteria. Generated candidates from the protocol are in agreement with the filter cascades and thermodynamic simulation. Further validation using gel electrophoresis indicated the formation of the designed shapes. These results support the plausibility of constructing DNA nanostructures in a more hierarchical, modular, and interchangeable manner.

  11. Assessing Diversity of DNA Structure-Related Sequence Features in Prokaryotic Genomes

    PubMed Central

    Huang, Yongjie; Mrázek, Jan

    2014-01-01

    Prokaryotic genomes are diverse in terms of their nucleotide and oligonucleotide composition as well as presence of various sequence features that can affect physical properties of the DNA molecule. We present a survey of local sequence patterns which have a potential to promote non-canonical DNA conformations (i.e. different from standard B-DNA double helix) and interpret the results in terms of relationships with organisms' habitats, phylogenetic classifications, and other characteristics. Our present work differs from earlier similar surveys not only by investigating a wider range of sequence patterns in a large number of genomes but also by using a more realistic null model to assess significant deviations. Our results show that simple sequence repeats and Z-DNA-promoting patterns are generally suppressed in prokaryotic genomes, whereas palindromes and inverted repeats are over-represented. Representation of patterns that promote Z-DNA and intrinsic DNA curvature increases with increasing optimal growth temperature (OGT), and decreases with increasing oxygen requirement. Additionally, representations of close direct repeats, palindromes and inverted repeats exhibit clear negative trends with increasing OGT. The observed relationships with environmental characteristics, particularly OGT, suggest possible evolutionary scenarios of structural adaptation of DNA to particular environmental niches. PMID:24408877

  12. Sequence Analysis of Leuconostoc mesenteroides Bacteriophage Φ1-A4 Isolated from an Industrial Vegetable Fermentation

    PubMed Central

    Lu, Z.; Altermann, E.; Breidt, F.; Kozyavkin, S.

    2010-01-01

    Vegetable fermentations rely on the proper succession of a variety of lactic acid bacteria (LAB). Leuconostoc mesenteroides initiates fermentation. As fermentation proceeds, L. mesenteroides dies off and other LAB complete the fermentation. Phages infecting L. mesenteroides may significantly influence the die-off of L. mesenteroides. However, no L. mesenteroides phages have been previously genetically characterized. Knowledge of more phage genome sequences may provide new insights into phage genomics, phage evolution, and phage-host interactions. We have determined the complete genome sequence of L. mesenteroides phage Φ1-A4, isolated from an industrial sauerkraut fermentation. The phage possesses a linear, double-stranded DNA genome consisting of 29,508 bp with a G+C content of 36%. Fifty open reading frames (ORFs) were predicted. Putative functions were assigned to 26 ORFs (52%), including 5 ORFs of structural proteins. The phage genome was modularly organized, containing DNA replication, DNA-packaging, head and tail morphogenesis, cell lysis, and DNA regulation/modification modules. In silico analyses showed that Φ1-A4 is a unique lytic phage with a large-scale genome inversion (∼30% of the genome). The genome inversion encompassed the lysis module, part of the structural protein module, and a cos site. The endolysin gene was flanked by two holin genes. The tail morphogenesis module was interspersed with cell lysis genes and other genes with unknown functions. The predicted amino acid sequences of the phage proteins showed little similarity to other phages, but functional analyses showed that Φ1-A4 clusters with several Lactococcus phages. To our knowledge, Φ1-A4 is the first genetically characterized L. mesenteroides phage. PMID:20118355

  13. Double-strand break repair processes drive evolution of the mitochondrial genome in Arabidopsis.

    PubMed

    Davila, Jaime I; Arrieta-Montiel, Maria P; Wamboldt, Yashitola; Cao, Jun; Hagmann, Joerg; Shedge, Vikas; Xu, Ying-Zhi; Weigel, Detlef; Mackenzie, Sally A

    2011-09-27

    The mitochondrial genome of higher plants is unusually dynamic, with recombination and nonhomologous end-joining (NHEJ) activities producing variability in size and organization. Plant mitochondrial DNA also generally displays much lower nucleotide substitution rates than mammalian or yeast systems. Arabidopsis displays these features and expedites characterization of the mitochondrial recombination surveillance gene MSH1 (MutS 1 homolog), lending itself to detailed study of de novo mitochondrial genome activity. In the present study, we investigated the underlying basis for unusual plant features as they contribute to rapid mitochondrial genome evolution. We obtained evidence of double-strand break (DSB) repair, including NHEJ, sequence deletions and mitochondrial asymmetric recombination activity in Arabidopsis wild-type and msh1 mutants on the basis of data generated by Illumina deep sequencing and confirmed by DNA gel blot analysis. On a larger scale, with mitochondrial comparisons across 72 Arabidopsis ecotypes, similar evidence of DSB repair activity differentiated ecotypes. Forty-seven repeat pairs were active in DNA exchange in the msh1 mutant. Recombination sites showed asymmetrical DNA exchange within lengths of 50- to 556-bp sharing sequence identity as low as 85%. De novo asymmetrical recombination involved heteroduplex formation, gene conversion and mismatch repair activities. Substoichiometric shifting by asymmetrical exchange created the appearance of rapid sequence gain and loss in association with particular repeat classes. Extensive mitochondrial genomic variation within a single plant species derives largely from DSB activity and its repair. Observed gene conversion and mismatch repair activity contribute to the low nucleotide substitution rates seen in these genomes. On a phenotypic level, these patterns of rearrangement likely contribute to the reproductive versatility of higher plants.

  14. CGDV: a webtool for circular visualization of genomics and transcriptomics data.

    PubMed

    Jha, Vineet; Singh, Gulzar; Kumar, Shiva; Sonawane, Amol; Jere, Abhay; Anamika, Krishanpal

    2017-10-24

    Interpretation of large-scale data is very challenging, and there is currently a scarcity of web tools that support automated visualization of a variety of high-throughput genomics and transcriptomics data for a wide variety of model organisms, along with user-defined karyotypes. A circular plot provides holistic visualization of high-throughput, large-scale data, but it is complex and challenging to generate because most of the available tools need informatics expertise to install and run. We have developed CGDV (Circos for Genomics and Transcriptomics Data Visualization), a webtool based on Circos, for seamless and automated visualization of a variety of large-scale genomics and transcriptomics data. CGDV takes the output of analyzed genomics or transcriptomics data in different formats, such as vcf, bed, xls, tab-delimited matrix text file, CNVnator raw output and gene fusion raw output, to plot a circular view of the sample data. CGDV takes care of generating the intermediate files required for Circos. CGDV is freely available at https://cgdv-upload.persistent.co.in/cgdv/. The circular plot for each data type is tailored to gain the best biological insights into the data. The inter-relationship between data points, homologous sequences, genes involved in fusion events, differential expression pattern, sequencing depth, types and size of variations and enrichment of DNA binding proteins can be seen using CGDV. CGDV thus helps biologists and bioinformaticians to visualize a variety of genomics and transcriptomics data seamlessly.

  15. preAssemble: a tool for automatic sequencer trace data processing.

    PubMed

    Adzhubei, Alexei A; Laerdahl, Jon K; Vlasova, Anna V

    2006-01-17

    Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment or sequencers. Each file contains information which can be interpreted by specialised software to reveal the sequence (base calling). This is done by the sequencer's proprietary software or by publicly available programs. Depending on the size of a sequencing project, the number of trace files can vary from just a few to thousands of files. Sequencing quality assessment on various criteria is important at the stage preceding clustering and contig assembly. Two major publicly available packages, Phred and Staden, are used by preAssemble to perform sequence quality processing. The preAssemble pre-assembly sequence processing pipeline has been developed for small- to large-scale automatic processing of DNA sequencer chromatogram (trace) data. The Staden Package Pregap4 module and the base-calling program Phred are utilized in the pipeline, which produces detailed and self-explanatory output that can be displayed with a web browser. preAssemble can be used successfully with very little previous experience; however, options for parameter tuning are provided for advanced users. preAssemble runs under UNIX and Linux operating systems. It is available for downloading and will run as stand-alone software. It can also be accessed on the Norwegian Salmon Genome Project web site, where preAssemble jobs can be run on the project server. preAssemble is a tool for quality assessment of sequences generated by automatic sequencing equipment. preAssemble is flexible, since both interactive jobs on the preAssemble server and the stand-alone downloadable version are available. Virtually no previous experience is necessary to run a default preAssemble job; on the other hand, options for parameter tuning are provided. Consequently, preAssemble can be used as efficiently for just several trace files as for large-scale sequence processing.
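
    To make the idea of quality assessment concrete, the sketch below uses the standard Phred definition Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The per-read filter and its thresholds (min_q, min_fraction) are assumptions chosen for illustration; they are not the criteria applied by preAssemble, Phred, or Pregap4.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality value to the estimated base-calling error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability to a Phred quality value: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def passes_quality(quals, min_q=20, min_fraction=0.9):
    """Hypothetical read filter: True if enough bases reach the quality threshold."""
    if not quals:
        return False
    good = sum(1 for q in quals if q >= min_q)
    return good / len(quals) >= min_fraction

# Q20 corresponds to an estimated 1 error in 100 base calls, Q30 to 1 in 1000.
p20 = phred_to_error_prob(20)
q_for_one_in_thousand = error_prob_to_phred(0.001)
```

    A filter of this kind would run before clustering and contig assembly, which is the stage at which the abstract places quality assessment.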

  16. Massively parallel sequencing of forensic STRs: Considerations of the DNA commission of the International Society for Forensic Genetics (ISFG) on minimal nomenclature requirements.

    PubMed

    Parson, Walther; Ballard, David; Budowle, Bruce; Butler, John M; Gettings, Katherine B; Gill, Peter; Gusmão, Leonor; Hares, Douglas R; Irwin, Jodi A; King, Jonathan L; Knijff, Peter de; Morling, Niels; Prinz, Mechthild; Schneider, Peter M; Neste, Christophe Van; Willuweit, Sascha; Phillips, Christopher

    2016-05-01

    The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data that provide a precise description of the repeat allele structure of a STR marker and variants that may reside in the flanking areas of the repeat region. When a STR contains a complex arrangement of repeat motifs, the level of genetic polymorphism revealed by the sequence data can increase substantially. As repeat structures can be complex and include substitutions, insertions, deletions, variable tandem repeat arrangements of multiple nucleotide motifs, and flanking region SNPs, established capillary electrophoresis (CE) allele descriptions must be supplemented by a new system of STR allele nomenclature, which retains backward compatibility with the CE data that currently populate national DNA databases and that will continue to be produced for the coming years. Thus, there is a pressing need to produce a standardized framework for describing complex sequences that enables comparison with currently used repeat allele nomenclature derived from conventional CE systems. It is important to discern three levels of information in hierarchical order: (i) the sequence, (ii) the alignment, and (iii) the nomenclature of STR sequence data. We propose a sequence (text) string format as the minimal requirement of data storage that laboratories should follow when adopting MPS of STRs. We further discuss the variant annotation and sequence comparison framework necessary to maintain compatibility among established and future data. This system must be easy to use and interpret by the DNA specialist, based on a universally accessible genome assembly, and in place before the uptake of MPS by the general forensic community starts to generate sequence data on a large scale.
While the established nomenclature for CE-based STR analysis will remain unchanged in the future, the nomenclature of sequence-based STR genotypes will need to follow updated rules and be generated by expert systems that translate MPS sequences to match CE conventions in order to guarantee compatibility between the different generations of STR data. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  17. Evaluating the feasibility of using candidate DNA barcodes in discriminating species of the large Asteraceae family.

    PubMed

    Gao, Ting; Yao, Hui; Song, Jingyuan; Zhu, Yingjie; Liu, Chang; Chen, Shilin

    2010-10-26

    Five DNA regions, namely, rbcL, matK, ITS, ITS2, and psbA-trnH, have been recommended as primary DNA barcodes for plants. Studies evaluating these regions for species identification in the large plant taxon, which includes a large number of closely related species, have rarely been reported. The feasibility of using the five proposed DNA regions was tested for discriminating plant species within Asteraceae, the largest family of flowering plants. Among these markers, ITS2 was the most useful in terms of universality, sequence variation, and identification capability in the Asteraceae family. The species discriminating power of ITS2 was also explored in a large pool of 3,490 Asteraceae sequences that represent 2,315 species belonging to 494 different genera. The result shows that ITS2 correctly identified 76.4% and 97.4% of plant samples at the species and genus levels, respectively. In addition, ITS2 displayed a variable ability to discriminate related species within different genera. ITS2 is the best DNA barcode for the Asteraceae family. This approach significantly broadens the application of DNA barcoding to resolve classification problems in the family Asteraceae at the genera and species levels.

  18. Detection of Merkel Cell Polyomavirus DNA in Serum Samples of Healthy Blood Donors

    PubMed Central

    Mazzoni, Elisa; Rotondo, John C.; Marracino, Luisa; Selvatici, Rita; Bononi, Ilaria; Torreggiani, Elena; Touzé, Antoine; Martini, Fernanda; Tognon, Mauro G.

    2017-01-01

    Merkel cell polyomavirus (MCPyV) has been detected in 80% of Merkel cell carcinomas (MCC). In the host, the MCPyV reservoir remains elusive. MCPyV DNA sequences were revealed in blood donor buffy coats. In this study, MCPyV DNA sequences were investigated in the sera (n = 190) of healthy blood donors. Two MCPyV DNA sequences, coding for the viral oncoprotein large T antigen (LT), were investigated using polymerase chain reaction (PCR) methods and DNA sequencing. Circulating MCPyV sequences were detected in sera with a prevalence of 2.6% (5/190), at low-DNA viral load, which is in the range of 1–4 and 1–5 copies/μl by real-time PCR and droplet digital PCR, respectively. DNA sequencing carried out in the five MCPyV-positive samples indicated that the two MCPyV LT sequences which were analyzed belong to the MKL-1 strain. Circulating MCPyV LT sequences are present in blood donor sera. MCPyV-positive samples from blood donors could represent a potential vehicle for MCPyV infection in receivers, whereas an increase in viral load may occur with multiple blood transfusions. In certain patient conditions, such as immune-depression/suppression, additional disease or old age, transfusion of MCPyV-positive samples could be an additional risk factor for MCC onset. PMID:29238698

  19. Previously unknown and highly divergent ssDNA viruses populate the oceans.

    PubMed

    Labonté, Jessica M; Suttle, Curtis A

    2013-11-01

    Single-stranded DNA (ssDNA) viruses are economically important pathogens of plants and animals, and are widespread in oceans; yet, the diversity and evolutionary relationships among marine ssDNA viruses remain largely unknown. Here we present the results from a metagenomic study of composite samples from temperate (Saanich Inlet, 11 samples; Strait of Georgia, 85 samples) and subtropical (46 samples, Gulf of Mexico) seawater. Most sequences (84%) had no evident similarity to sequenced viruses. In total, 608 putative complete genomes of ssDNA viruses were assembled, almost doubling the number of ssDNA viral genomes in databases. These comprised 129 genetically distinct groups, each represented by at least one complete genome that had no recognizable similarity to each other or to other virus sequences. Given that the seven recognized families of ssDNA viruses have considerable sequence homology within them, this suggests that many of these genetic groups may represent new viral families. Moreover, nearly 70% of the sequences were similar to one of these genomes, indicating that most of the sequences could be assigned to a genetically distinct group. Most sequences fell within 11 well-defined gene groups, each sharing a common gene. Some of these encoded putative replication and coat proteins that had similarity to sequences from viruses infecting eukaryotes, suggesting that these were likely from viruses infecting eukaryotic phytoplankton and zooplankton.

  20. PopSc: Computing Toolkit for Basic Statistics of Molecular Population Genetics Simultaneously Implemented in Web-Based Calculator, Python and R

    PubMed Central

    Chen, Shi-Yi; Deng, Feilong; Huang, Ying; Li, Cao; Liu, Linhai; Jia, Xianbo; Lai, Song-Jia

    2016-01-01

    Although various computer tools have been elaborately developed to calculate a series of statistics in molecular population genetics for both small- and large-scale DNA data, there is no efficient and easy-to-use toolkit available yet for exclusively focusing on the steps of mathematical calculation. Here, we present PopSc, a bioinformatic toolkit for calculating 45 basic statistics in molecular population genetics, which could be categorized into three classes, including (i) genetic diversity of DNA sequences, (ii) statistical tests for neutral evolution, and (iii) measures of genetic differentiation among populations. In contrast to the existing computer tools, PopSc was designed to directly accept the intermediate metadata, such as allele frequencies, rather than the raw DNA sequences or genotyping results. PopSc is first implemented as the web-based calculator with user-friendly interface, which greatly facilitates the teaching of population genetics in class and also promotes the convenient and straightforward calculation of statistics in research. Additionally, we also provide the Python library and R package of PopSc, which can be flexibly integrated into other advanced bioinformatic packages of population genetics analysis. PMID:27792763
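
    The sketch below illustrates the kind of calculation that starts from allele frequencies rather than raw sequences, using two textbook statistics: Nei's expected heterozygosity (gene diversity) and Nei's G_ST for equally weighted populations. The function names and signatures are hypothetical and do not reflect the PopSc Python library or R package.

```python
def expected_heterozygosity(freqs, n_copies=None):
    """Gene diversity H = 1 - sum(p_i^2) from allele frequencies at one locus.

    If n_copies (number of sampled gene copies) is given, apply the
    small-sample correction n / (n - 1).
    """
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    h = 1.0 - sum(p * p for p in freqs)
    if n_copies is not None:
        h *= n_copies / (n_copies - 1)
    return h

def gst(pop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T, weighting all populations equally."""
    k = len(pop_freqs)
    # H_S: mean within-population diversity.
    hs = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    # H_T: diversity computed from the mean allele frequencies.
    n_alleles = len(pop_freqs[0])
    mean_freqs = [sum(f[i] for f in pop_freqs) / k for i in range(n_alleles)]
    ht = expected_heterozygosity(mean_freqs)
    return 0.0 if ht == 0 else (ht - hs) / ht

# Made-up frequencies: two populations fixed for different alleles.
h_balanced = expected_heterozygosity([0.5, 0.5])
g_fixed = gst([[1.0, 0.0], [0.0, 1.0]])
```

    Working from frequencies keeps the arithmetic separate from sequence parsing, which matches the abstract's stated focus on the calculation step alone.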

  1. PopSc: Computing Toolkit for Basic Statistics of Molecular Population Genetics Simultaneously Implemented in Web-Based Calculator, Python and R.

    PubMed

    Chen, Shi-Yi; Deng, Feilong; Huang, Ying; Li, Cao; Liu, Linhai; Jia, Xianbo; Lai, Song-Jia

    2016-01-01

    Although various computer tools have been elaborately developed to calculate a series of statistics in molecular population genetics for both small- and large-scale DNA data, there is no efficient and easy-to-use toolkit available yet for exclusively focusing on the steps of mathematical calculation. Here, we present PopSc, a bioinformatic toolkit for calculating 45 basic statistics in molecular population genetics, which could be categorized into three classes, including (i) genetic diversity of DNA sequences, (ii) statistical tests for neutral evolution, and (iii) measures of genetic differentiation among populations. In contrast to the existing computer tools, PopSc was designed to directly accept the intermediate metadata, such as allele frequencies, rather than the raw DNA sequences or genotyping results. PopSc is first implemented as the web-based calculator with user-friendly interface, which greatly facilitates the teaching of population genetics in class and also promotes the convenient and straightforward calculation of statistics in research. Additionally, we also provide the Python library and R package of PopSc, which can be flexibly integrated into other advanced bioinformatic packages of population genetics analysis.

  2. DNA barcodes for 1/1000 of the animal kingdom.

    PubMed

    Hebert, Paul D N; Dewaard, Jeremy R; Landry, Jean-François

    2010-06-23

    This study reports DNA barcodes for more than 1300 Lepidoptera species from the eastern half of North America, establishing that 99.3 per cent of these species possess diagnostic barcode sequences. Intraspecific divergences averaged just 0.43 per cent among this assemblage, but most values were lower. The mean was elevated by deep barcode divergences (greater than 2%) in 5.1 per cent of the species, often involving the sympatric occurrence of two barcode clusters. A few of these cases have been analysed in detail, revealing species overlooked by the current taxonomic system. This study also provided a large-scale test of the extent of regional divergence in barcode sequences, indicating that geographical differentiation in the Lepidoptera of eastern North America is small, even when comparisons involve populations as much as 2800 km apart. The present results affirm that a highly effective system for the identification of Lepidoptera in this region can be built with few records per species because of the limited intra-specific variation. As most terrestrial and marine taxa are likely to possess a similar pattern of population structure, an effective DNA-based identification system can be developed with modest effort.
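
    For readers unfamiliar with how a percentage divergence between two barcode sequences is obtained, the sketch below computes an uncorrected p-distance between two aligned sequences. This is an assumption made for illustration: the abstract does not state which distance measure the study used, and the function name p_distance and the example sequences are invented.

```python
def p_distance(a, b):
    """Percent of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * diffs / compared

# Made-up 10-bp alignment with a single substitution at the last site.
divergence = p_distance("ACGTACGTAC", "ACGTACGTAT")
```

    Under such a measure, a value above 2 would fall in the "deep divergence" class described in the abstract.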

  3. Application of next-generation sequencing for rapid marker development in molecular plant breeding: a case study on anthracnose disease resistance in Lupinus angustifolius L.

    PubMed Central

    2012-01-01

    Background: In the last 30 years, a number of DNA fingerprinting methods such as RFLP, RAPD, AFLP, SSR, DArT, have been extensively used in marker development for molecular plant breeding. However, it remains a daunting task to identify highly polymorphic and closely linked molecular markers for a target trait for molecular marker-assisted selection. The next-generation sequencing (NGS) technology is far more powerful than any existing generic DNA fingerprinting methods in generating DNA markers. In this study, we employed a grain legume crop Lupinus angustifolius (lupin) as a test case, and examined the utility of an NGS-based method of RAD (restriction-site associated DNA) sequencing as DNA fingerprinting for rapid, cost-effective marker development tagging a disease resistance gene for molecular breeding. Results: Twenty informative plants from a cross of RxS (disease resistant x susceptible) in lupin were subjected to RAD single-end sequencing by multiplex identifiers. The entire RAD sequencing products were resolved in two lanes of the 16-lanes per run sequencing platform Solexa HiSeq2000. A total of 185 million raw reads, approximately 17 Gb of sequencing data, were collected. Sequence comparison among the 20 test plants discovered 8207 SNP markers. Filtration of DNA sequencing data with marker identification parameters resulted in the discovery of 38 molecular markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective, simple PCR-based markers. Linkage analysis using marker genotyping data and disease resistance phenotyping data on a F8 population consisting of 186 individual plants confirmed that all these five markers were linked to the R gene.
Two of these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic distance of 0.9 centiMorgan (cM), and are now replacing the markers previously developed by a traditional DNA fingerprinting method for marker-assisted selection in the Australian national lupin breeding program. Conclusions: We demonstrated that more than 30 molecular markers linked to a target gene of agronomic trait of interest can be identified from a small portion (1/8) of one sequencing run on HiSeq2000 by applying NGS based RAD sequencing in marker development. The markers developed by the strategy described in this study are all co-dominant SNP markers, which can readily be converted into high throughput multiplex format or low-cost, simple PCR-based markers desirable for large scale marker implementation in plant breeding programs. The high density and closely linked molecular markers associated with a target trait help to overcome a major bottleneck for implementation of molecular markers on a wide range of germplasm in breeding programs. We conclude that application of NGS based RAD sequencing as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant breeding. The strategy does not require any prior genome knowledge or molecular information for the species under investigation, and it is applicable to other plant species. PMID:22805587

  4. Application of next-generation sequencing for rapid marker development in molecular plant breeding: a case study on anthracnose disease resistance in Lupinus angustifolius L.

    PubMed

    Yang, Huaan; Tao, Ye; Zheng, Zequn; Li, Chengdao; Sweetingham, Mark W; Howieson, John G

    2012-07-17

    In the last 30 years, a number of DNA fingerprinting methods such as RFLP, RAPD, AFLP, SSR, DArT, have been extensively used in marker development for molecular plant breeding. However, it remains a daunting task to identify highly polymorphic and closely linked molecular markers for a target trait for molecular marker-assisted selection. The next-generation sequencing (NGS) technology is far more powerful than any existing generic DNA fingerprinting methods in generating DNA markers. In this study, we employed a grain legume crop Lupinus angustifolius (lupin) as a test case, and examined the utility of an NGS-based method of RAD (restriction-site associated DNA) sequencing as DNA fingerprinting for rapid, cost-effective marker development tagging a disease resistance gene for molecular breeding. Twenty informative plants from a cross of RxS (disease resistant x susceptible) in lupin were subjected to RAD single-end sequencing by multiplex identifiers. The entire RAD sequencing products were resolved in two lanes of the 16-lanes per run sequencing platform Solexa HiSeq2000. A total of 185 million raw reads, approximately 17 Gb of sequencing data, were collected. Sequence comparison among the 20 test plants discovered 8207 SNP markers. Filtration of DNA sequencing data with marker identification parameters resulted in the discovery of 38 molecular markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective, simple PCR-based markers. Linkage analysis using marker genotyping data and disease resistance phenotyping data on a F8 population consisting of 186 individual plants confirmed that all these five markers were linked to the R gene. 
Two of these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic distance of 0.9 centiMorgan (cM), and are now replacing the markers previously developed by a traditional DNA fingerprinting method for marker-assisted selection in the Australian national lupin breeding program. We demonstrated that more than 30 molecular markers linked to a target gene of agronomic trait of interest can be identified from a small portion (1/8) of one sequencing run on HiSeq2000 by applying NGS based RAD sequencing in marker development. The markers developed by the strategy described in this study are all co-dominant SNP markers, which can readily be converted into high throughput multiplex format or low-cost, simple PCR-based markers desirable for large scale marker implementation in plant breeding programs. The high density and closely linked molecular markers associated with a target trait help to overcome a major bottleneck for implementation of molecular markers on a wide range of germplasm in breeding programs. We conclude that application of NGS based RAD sequencing as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant breeding. The strategy does not require any prior genome knowledge or molecular information for the species under investigation, and it is applicable to other plant species.

  5. Facile Site-Directed Mutagenesis of Large Constructs Using Gibson Isothermal DNA Assembly.

    PubMed

    Yonemoto, Isaac T; Weyman, Philip D

    2017-01-01

    Site-directed mutagenesis is a commonly used molecular biology technique to manipulate biological sequences, and is especially useful for studying sequence determinants of enzyme function or designing proteins with improved activity. We describe a strategy using Gibson Isothermal DNA Assembly to perform site-directed mutagenesis on large (>~20 kbp) constructs that are outside the effective range of standard techniques such as QuikChange II (Agilent Technologies), but more reliable than traditional cloning using restriction enzymes and ligation.

  6. Scaling up discovery of hidden diversity in fungi: impacts of barcoding approaches.

    PubMed

    Yahr, Rebecca; Schoch, Conrad L; Dentinger, Bryn T M

    2016-09-05

    The fungal kingdom is a hyperdiverse group of multicellular eukaryotes with profound impacts on human society and ecosystem function. The challenge of documenting and describing fungal diversity is exacerbated by their typically cryptic nature, their ability to produce seemingly unrelated morphologies from a single individual and their similarity in appearance to distantly related taxa. This multiplicity of hurdles resulted in the early adoption of DNA-based comparisons to study fungal diversity, including linking curated DNA sequence data to expertly identified voucher specimens. DNA-barcoding approaches in fungi were first applied in specimen-based studies for identification and discovery of taxonomic diversity, but are now widely deployed for community characterization based on sequencing of environmental samples. Collectively, fungal barcoding approaches have yielded important advances across biological scales and research applications, from taxonomic, ecological, industrial and health perspectives. A major outstanding issue is the growing problem of 'sequences without names' that are somewhat uncoupled from the traditional framework of fungal classification based on morphology and preserved specimens. This review summarizes some of the most significant impacts of fungal barcoding, its limitations, and progress towards the challenge of effective utilization of the exponentially growing volume of data gathered from high-throughput sequencing technologies. This article is part of the themed issue 'From DNA barcodes to biomes'. © 2016 The Authors.

  7. Mitochondrial DNA pattern of the fine shrimp Metapenaeus elegans (De Man, 1907) in the lagoon of Segara Anakan, Central Java, using Hind III

    NASA Astrophysics Data System (ADS)

    Nugraha, Fitra Arya Dwi; Holil, Kholifah; Kurniawan, Nia

    2017-05-01

    Ecological damage to the Lagoon of Segara Anakan, Central Java, together with large-scale and continuous exploitation, is threatening the sustainability of the fine shrimp, Metapenaeus elegans, and its resources. Information on genetic resources is crucial for establishing long-term conservation programs and preserving germplasm quality. This study aims to evaluate the number and size of the fragments produced by digestion with the restriction enzyme Hind III. Seven individuals of Metapenaeus elegans from the Lagoon of Segara Anakan were examined using Hind III. Amplification of mitochondrial DNA yielded a 950 bp product, and digestion with Hind III generated four fragments of 114 bp, 200 bp, 250 bp, and 386 bp, which formed a monomorphic pattern. The restriction pattern suggests homozygosity of the alleles cut by Hind III. Homozygosity indicates no variation in the DNA sequence.
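
    The fragment sizes reported above can be sanity-checked with a few lines of arithmetic. The sketch below only tests whether the four fragments account for the whole 950 bp amplicon; the function and variable names are illustrative and do not come from the study.

```python
# Fragment sizes (bp) reported in the abstract for the Hind III digest.
fragments_bp = [114, 200, 250, 386]
amplicon_bp = 950  # reported length of the amplified mitochondrial DNA


def digest_is_complete(fragments, amplicon):
    """Return True when the fragment lengths sum to the amplicon length."""
    return sum(fragments) == amplicon


# Four fragments from a linear amplicon imply three cut sites.
n_cut_sites = len(fragments_bp) - 1
total_bp = sum(fragments_bp)
```

    Under these numbers the fragments sum exactly to the amplicon length, which is what a complete digest of a linear product would be expected to show.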

  8. Mapping the yeast genome by melting in nanofluidic devices

    NASA Astrophysics Data System (ADS)

    Welch, Robert L.; Czolkos, Ilja; Sladek, Rob; Reisner, Walter

    2012-02-01

    Optical mapping of DNA provides large-scale genomic information that can be used to assemble contigs from next-generation sequencing, and to detect re-arrangements between single cells. A recent optical mapping technique called denaturation mapping has the unique advantage of using physical principles rather than the action of enzymes to probe genomic structure. The absence of reagents or reaction steps makes denaturation mapping simpler than other protocols. Denaturation mapping uses fluorescence microscopy to image the pattern of partial melting along a DNA molecule extended in a channel of cross-section ~100 nm at the heart of a nanofluidic device. We successfully aligned melting maps from single DNA molecules to a theoretical map of the yeast genome (11.6 Mbp) to identify their location. By aligning hundreds of molecules we assembled a consensus melting map of the yeast genome with 95% coverage.

  9. TaqMan Real-Time PCR Assays To Assess Arbuscular Mycorrhizal Responses to Field Manipulation of Grassland Biodiversity: Effects of Soil Characteristics, Plant Species Richness, and Functional Traits

    PubMed Central

    König, Stephan; Wubet, Tesfaye; Dormann, Carsten F.; Hempel, Stefan; Renker, Carsten; Buscot, François

    2010-01-01

    Large-scale (temporal and/or spatial) molecular investigations of the diversity and distribution of arbuscular mycorrhizal fungi (AMF) require considerable sampling efforts and high-throughput analysis. To facilitate such efforts, we have developed a TaqMan real-time PCR assay to detect and identify AMF in environmental samples. First, we screened the diversity in clone libraries, generated by nested PCR, of the nuclear ribosomal DNA internal transcribed spacer (ITS) of AMF in environmental samples. We then generated probes and forward primers based on the detected sequences, enabling AMF sequence type-specific detection in TaqMan multiplex real-time PCR assays. In comparisons to conventional clone library screening and Sanger sequencing, the TaqMan assay approach provided similar accuracy but higher sensitivity with cost and time savings. The TaqMan assays were applied to analyze the AMF community composition within plots of a large-scale plant biodiversity manipulation experiment, the Jena Experiment, primarily designed to investigate the interactive effects of plant biodiversity on element cycling and trophic interactions. The results show that environmental variables hierarchically shape AMF communities and that the sequence type spectrum is strongly affected by previous land use and disturbance, which appears to favor disturbance-tolerant members of the genus Glomus. The AMF species richness of disturbance-associated communities can be largely explained by richness of plant species and plant functional groups, while plant productivity and soil parameters appear to have only weak effects on the AMF community. PMID:20418424

  10. The Large Subunit rDNA Sequence of Plasmodiophora brassicae Does not Contain Intra-species Polymorphism

    PubMed Central

    Schwelm, Arne; Berney, Cédric; Dixelius, Christina; Bass, David; Neuhauser, Sigrid

    2016-01-01

    Clubroot disease caused by Plasmodiophora brassicae is one of the most important diseases of cultivated brassicas. P. brassicae occurs in pathotypes which differ in their aggressiveness towards their Brassica host plants. To date, no DNA-based method to distinguish these pathotypes has been described. In 2011, polymorphism within the 28S rDNA of P. brassicae was reported, which could potentially allow pathotypes to be distinguished without the need for time-consuming bioassays. However, isolates of P. brassicae from around the world analysed in this study do not show polymorphism in their LSU rDNA sequences. The previously described polymorphism most likely derived from soil-inhabiting Cercozoa, more specifically Neoheteromita-like glissomonads. Here we correct the LSU rDNA sequence of P. brassicae. By using FISH we demonstrate that our newly generated sequence belongs to the causal agent of clubroot disease. PMID:27750174

  11. Biotechnological mass production of DNA origami

    NASA Astrophysics Data System (ADS)

    Praetorius, Florian; Kick, Benjamin; Behler, Karl L.; Honemann, Maximilian N.; Weuster-Botz, Dirk; Dietz, Hendrik

    2017-12-01

    DNA nanotechnology, in particular DNA origami, enables the bottom-up self-assembly of micrometre-scale, three-dimensional structures with nanometre-precise features. These structures are customizable in that they can be site-specifically functionalized or constructed to exhibit machine-like or logic-gating behaviour. Their use has been limited to applications that require only small amounts of material (of the order of micrograms), owing to the limitations of current production methods. But many proposed applications, for example as therapeutic agents or in complex materials, could be realized if more material could be used. In DNA origami, a nanostructure is assembled from a very long single-stranded scaffold molecule held in place by many short single-stranded staple oligonucleotides. Only the bacteriophage-derived scaffold molecules are amenable to scalable and efficient mass production; the shorter staple strands are obtained through costly solid-phase synthesis or enzymatic processes. Here we show that single strands of DNA of virtually arbitrary length and with virtually arbitrary sequences can be produced in a scalable and cost-efficient manner by using bacteriophages to generate single-stranded precursor DNA that contains target strand sequences interleaved with self-excising ‘cassettes’, with each cassette comprising two Zn2+-dependent DNA-cleaving DNA enzymes. We produce all of the necessary single strands of DNA for several DNA origami using shaker-flask cultures, and demonstrate end-to-end production of macroscopic amounts of a DNA origami nanorod in a litre-scale stirred-tank bioreactor. Our method is compatible with existing DNA origami design frameworks and retains the modularity and addressability of DNA origami objects that are necessary for implementing custom modifications using functional groups. 
With all of the production and purification steps amenable to scaling, we expect that our method will expand the scope of DNA nanotechnology in many areas of science and technology.

  12. Biotechnological mass production of DNA origami.

    PubMed

    Praetorius, Florian; Kick, Benjamin; Behler, Karl L; Honemann, Maximilian N; Weuster-Botz, Dirk; Dietz, Hendrik

    2017-12-06

    DNA nanotechnology, in particular DNA origami, enables the bottom-up self-assembly of micrometre-scale, three-dimensional structures with nanometre-precise features. These structures are customizable in that they can be site-specifically functionalized or constructed to exhibit machine-like or logic-gating behaviour. Their use has been limited to applications that require only small amounts of material (of the order of micrograms), owing to the limitations of current production methods. But many proposed applications, for example as therapeutic agents or in complex materials, could be realized if more material could be used. In DNA origami, a nanostructure is assembled from a very long single-stranded scaffold molecule held in place by many short single-stranded staple oligonucleotides. Only the bacteriophage-derived scaffold molecules are amenable to scalable and efficient mass production; the shorter staple strands are obtained through costly solid-phase synthesis or enzymatic processes. Here we show that single strands of DNA of virtually arbitrary length and with virtually arbitrary sequences can be produced in a scalable and cost-efficient manner by using bacteriophages to generate single-stranded precursor DNA that contains target strand sequences interleaved with self-excising 'cassettes', with each cassette comprising two Zn2+-dependent DNA-cleaving DNA enzymes. We produce all of the necessary single strands of DNA for several DNA origami using shaker-flask cultures, and demonstrate end-to-end production of macroscopic amounts of a DNA origami nanorod in a litre-scale stirred-tank bioreactor. Our method is compatible with existing DNA origami design frameworks and retains the modularity and addressability of DNA origami objects that are necessary for implementing custom modifications using functional groups. 
With all of the production and purification steps amenable to scaling, we expect that our method will expand the scope of DNA nanotechnology in many areas of science and technology.

  13. Integrated massively parallel sequencing of 15 autosomal STRs and Amelogenin using a simplified library preparation approach.

    PubMed

    Xue, Jian; Wu, Riga; Pan, Yajiao; Wang, Shunxia; Qu, Baowang; Qin, Ying; Shi, Yuequn; Zhang, Chuchu; Li, Ran; Zhang, Liyan; Zhou, Cheng; Sun, Hongyu

    2018-04-02

    Massively parallel sequencing (MPS) technologies, also termed as next-generation sequencing (NGS), are becoming increasingly popular in study of short tandem repeats (STR). However, current library preparation methods are usually based on ligation or two-round PCR that requires more steps, making it time-consuming (about 2 days), laborious and expensive. In this study, a 16-plex STR typing system was designed with fusion primer strategy based on the Ion Torrent S5 XL platform which could effectively resolve the above challenges for forensic DNA database-type samples (bloodstains, saliva stains, etc.). The efficiency of this system was tested in 253 Han Chinese participants. The libraries were prepared without DNA isolation and adapter ligation, and the whole process only required approximately 5 h. The proportion of thoroughly genotyped samples in which all the 16 loci were successfully genotyped was 86% (220/256). Of the samples, 99.7% showed 100% concordance between NGS-based STR typing and capillary electrophoresis (CE)-based STR typing. The inconsistency might have been caused by off-ladder alleles and mutations in primer binding sites. Overall, this panel enabled the large-scale genotyping of the DNA samples with controlled quality and quantity because it is a simple, operation-friendly process flow that saves labor, time and costs. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Genome-wide analysis of Tol2 transposon reintegration in zebrafish.

    PubMed

    Kondrychyn, Igor; Garcia-Lecea, Marta; Emelyanov, Alexander; Parinov, Sergey; Korzh, Vladimir

    2009-09-08

    Tol2, a member of the hAT family of transposons, has become a useful tool for genetic manipulation of model animals, but information about its interactions with vertebrate genomes is still limited. Furthermore, published reports on Tol2 have mainly been based on random integration of the transposon system after co-injection of a plasmid DNA harboring the transposon and a transposase mRNA. It is important to understand how Tol2 would behave upon activation after integration into the genome. We performed a large-scale enhancer trap (ET) screen and generated 338 insertions of the Tol2 transposon-based ET cassette into the zebrafish genome. These insertions were generated by remobilizing the transposon from two different donor sites in two transgenic lines. We found that 39% of Tol2 insertions occurred in transcription units, mostly into introns. Analysis of the transposon target sites revealed no strict specificity at the DNA sequence level. However, Tol2 was prone to target AT-rich regions with weak palindromic consensus sequences centered at the insertion site. Our systematic analysis of sequential remobilizations of the Tol2 transposon from two independent sites within a vertebrate genome has revealed properties such as a tendency to integrate into transcription units and into AT-rich palindrome-like sequences. This information will influence the development of various applications involving DNA transposons and Tol2 in particular.

  15. An att site-based recombination reporter system for genome engineering and synthetic DNA assembly.

    PubMed

    Bland, Michael J; Ducos-Galand, Magaly; Val, Marie-Eve; Mazel, Didier

    2017-07-14

    Direct manipulation of the genome is a widespread technique for genetic studies and synthetic biology applications. The tyrosine and serine site-specific recombination systems of bacteriophages HK022 and ΦC31 are widely used for stable directional exchange and relocation of DNA sequences, making them valuable tools in these contexts. We have developed site-specific recombination tools that allow the direct selection of recombination events by embedding the attB site from each system within the β-lactamase resistance coding sequence (bla). The HK and ΦC31 tools were developed by placing the attB sites from each system into the signal peptide cleavage site coding sequence of bla. All possible open reading frames (ORFs) were inserted and tested for recombination efficiency and bla activity. Efficient recombination was observed for all tested ORFs (3 for HK, 6 for ΦC31) as shown through a cointegrate formation assay. The bla gene with the embedded attB site was functional for eight of the nine constructs tested. The HK/ΦC31 att-bla system offers a simple way to directly select recombination events, thus enhancing the use of site-specific recombination systems for carrying out precise, large-scale DNA manipulation, and adding useful tools to the genetics toolbox. We further show the power and flexibility of bla to be used as a reporter for recombination.

  16. Targeted enrichment of ancient pathogens yielding the pPCP1 plasmid of Yersinia pestis from victims of the Black Death.

    PubMed

    Schuenemann, Verena J; Bos, Kirsten; DeWitte, Sharon; Schmedes, Sarah; Jamieson, Joslyn; Mittnik, Alissa; Forrest, Stephen; Coombes, Brian K; Wood, James W; Earn, David J D; White, William; Krause, Johannes; Poinar, Hendrik N

    2011-09-20

    Although investigations of medieval plague victims have identified Yersinia pestis as the putative etiologic agent of the pandemic, methodological limitations have prevented large-scale genomic investigations to evaluate changes in the pathogen's virulence over time. We screened over 100 skeletal remains from Black Death victims of the East Smithfield mass burial site (1348-1350, London, England). Recent methods of DNA enrichment coupled with high-throughput DNA sequencing subsequently permitted reconstruction of ten full human mitochondrial genomes (16 kb each) and the full pPCP1 (9.6 kb) virulence-associated plasmid at high coverage. Comparisons of molecular damage profiles between endogenous human and Y. pestis DNA confirmed its authenticity as an ancient pathogen, thus representing the longest contiguous genomic sequence for an ancient pathogen to date. Comparison of our reconstructed plasmid against modern Y. pestis shows identity with several isolates matching the Medievalis biovar; however, our chromosomal sequences indicate the victims were infected with a Y. pestis variant that has not been previously reported. Our data reveal that the Black Death in medieval Europe was caused by a variant of Y. pestis that may no longer exist, and genetic data carried on its pPCP1 plasmid were not responsible for the purported epidemiological differences between ancient and modern forms of Y. pestis infections.

  17. Self-Organizing Hidden Markov Model Map (SOHMMM).

    PubMed

    Ferles, Christos; Stafylopatis, Andreas

    2013-12-01

    A hybrid approach combining the Self-Organizing Map (SOM) and the Hidden Markov Model (HMM) is presented. The Self-Organizing Hidden Markov Model Map (SOHMMM) establishes a cross-section between the theoretic foundations and algorithmic realizations of its constituents. The respective architectures and learning methodologies are fused in an attempt to meet the increasing requirements imposed by the properties of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and protein chain molecules. The fusion and synergy of the SOM unsupervised training and the HMM dynamic programming algorithms bring forth a novel on-line gradient descent unsupervised learning algorithm, which is fully integrated into the SOHMMM. Since the SOHMMM carries out probabilistic sequence analysis with little or no prior knowledge, it can have a variety of applications in clustering, dimensionality reduction and visualization of large-scale sequence spaces, and also, in sequence discrimination, search and classification. Two series of experiments based on artificial sequence data and splice junction gene sequences demonstrate the SOHMMM's characteristics and capabilities. Copyright © 2013 Elsevier Ltd. All rights reserved.

  18. Detection and interrogation of biomolecules via nanoscale probes: From fundamental physics to DNA sequencing

    NASA Astrophysics Data System (ADS)

    Zwolak, Michael

    2013-03-01

    A rapid and low-cost method to sequence DNA would revolutionize personalized medicine, where genetic information is used to diagnose, treat, and prevent diseases. There is a longstanding interest in nanopores as a platform for rapid interrogation of single DNA molecules. I will discuss a sequencing protocol based on the measurement of transverse electronic currents during the translocation of single-stranded DNA through nanopores. Using molecular dynamics simulations coupled to quantum mechanical calculations of the tunneling current, I will show that the DNA nucleotides are predicted to have distinguishable electronic signatures in experimentally realizable systems. Several recent experiments support our theoretical predictions. In addition to their possible impact in medicine and biology, the above methods offer ideal test beds to study open scientific issues in the relatively unexplored area at the interface between solids, liquids, and biomolecules at the nanometer length scale. http://mike.zwolak.org

  19. Design of a 9K illumina BeadChip for polar bears (Ursus maritimus) from RAD and transcriptome sequencing.

    PubMed

    Malenfant, René M; Coltman, David W; Davis, Corey S

    2015-05-01

    Single-nucleotide polymorphisms (SNPs) offer numerous advantages over anonymous markers such as microsatellites, including improved estimation of population parameters, finer-scale resolution of population structure and more precise genomic dissection of quantitative traits. However, many SNPs are needed to equal the resolution of a single microsatellite, and reliable large-scale genotyping of SNPs remains a challenge in nonmodel species. Here, we document the creation of a 9K Illumina Infinium BeadChip for polar bears (Ursus maritimus), which will be used to investigate: (i) the fine-scale population structure among Canadian polar bears and (ii) the genomic architecture of phenotypic traits in the Western Hudson Bay subpopulation. To this end, we used restriction-site associated DNA (RAD) sequencing from 38 bears across their circumpolar range, as well as blood/fat transcriptome sequencing of 10 individuals from Western Hudson Bay. Six-thousand RAD SNPs and 3000 transcriptomic SNPs were selected for the chip, based primarily on genomic spacing and gene function respectively. Of the 9000 SNPs ordered from Illumina, 8042 were successfully printed, and - after genotyping 1450 polar bears - 5441 of these SNPs were found to be well clustered and polymorphic. Using this array, we show rapid linkage disequilibrium decay among polar bears, we demonstrate that in a subsample of 78 individuals, our SNPs detect known genetic structure more clearly than 24 microsatellites genotyped for the same individuals and that these results are not driven by the SNP ascertainment scheme. Here, we present one of the first large-scale genotyping resources designed for a threatened species. © 2014 John Wiley & Sons Ltd.

  20. An evaluation of two-channel ChIP-on-chip and DNA methylation microarray normalization strategies

    PubMed Central

    2012-01-01

    Background The combination of chromatin immunoprecipitation with two-channel microarray technology enables genome-wide mapping of binding sites of DNA-interacting proteins (ChIP-on-chip) or sites with methylated CpG di-nucleotides (DNA methylation microarray). These powerful tools are the gateway to understanding gene transcription regulation. Since the goals of such studies, the sample preparation procedures, the microarray content and study design are all different from transcriptomics microarrays, the data pre-processing strategies traditionally applied to transcriptomics microarrays may not be appropriate. Particularly, the main challenge of the normalization of "regulation microarrays" is (i) to make the data of individual microarrays quantitatively comparable and (ii) to keep the signals of the enriched probes, representing DNA sequences from the precipitate, as distinguishable as possible from the signals of the un-enriched probes, representing DNA sequences largely absent from the precipitate. Results We compare several widely used normalization approaches (VSN, LOWESS, quantile, T-quantile, Tukey's biweight scaling, Peng's method) applied to a selection of regulation microarray datasets, ranging from DNA methylation to transcription factor binding and histone modification studies. Through comparison of the data distributions of control probes and gene promoter probes before and after normalization, and assessment of the power to identify known enriched genomic regions after normalization, we demonstrate that there are clear differences in performance between normalization procedures. Conclusion T-quantile normalization applied separately on the channels and Tukey's biweight scaling outperform other methods in terms of the conservation of enriched and un-enriched signal separation, as well as in identification of genomic regions known to be enriched. T-quantile normalization is preferable as it additionally improves comparability between microarrays. 
In contrast, popular normalization approaches like quantile, LOWESS, Peng's method and VSN normalization alter the data distributions of regulation microarrays to such an extent that using these approaches will impact the reliability of the downstream analysis substantially. PMID:22276688
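
    To make the comparison above more concrete, here is a minimal pure-Python sketch of plain quantile normalization, one of the approaches named in the abstract. It is a textbook version written for illustration, not the implementation evaluated in the study; the function name is hypothetical, and ties are broken by input order for simplicity.

```python
def quantile_normalize(columns):
    """Force every column (one per array or channel) onto the same distribution.

    columns: list of equal-length lists of numbers.
    Returns new lists; the inputs are left unmodified.
    """
    n_rows = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the values sharing each rank.
    rank_means = [
        sum(sc[rank] for sc in sorted_cols) / len(columns)
        for rank in range(n_rows)
    ]
    normalized = []
    for col in columns:
        # Row indices ordered by value; ties keep input order (a simplification).
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, row in enumerate(order):
            out[row] = rank_means[rank]
        normalized.append(out)
    return normalized


example = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

    After this step every column holds exactly the same set of values, which gives some intuition for how such a method could reshape the data distributions of arrays in which enriched and un-enriched probes are expected to differ.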

  1. Characterization of DNA-protein interactions using high-throughput sequencing data from pulldown experiments

    NASA Astrophysics Data System (ADS)

    Moreland, Blythe; Oman, Kenji; Curfman, John; Yan, Pearlly; Bundschuh, Ralf

    Methyl-binding domain (MBD) protein pulldown experiments have been a valuable tool in measuring the levels of methylated CpG dinucleotides. Due to the frequent use of this technique, high-throughput sequencing data sets are available that allow a detailed quantitative characterization of the underlying interaction between methylated DNA and MBD proteins. Analyzing such data sets, we first found that two such proteins cannot bind closer to each other than 2 bp, consistent with structural models of the DNA-protein interaction. Second, the large amount of sequencing data allowed us to find rather weak but nevertheless clearly statistically significant sequence preferences for several bases around the required CpG. These results demonstrate that pulldown sequencing is a high-precision tool in characterizing DNA-protein interactions. This material is based upon work supported by the National Science Foundation under Grant No. DMR-1410172.

  2. SeqCompress: an algorithm for biological sequence compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz; Bajwa, Hassan

    2014-10-01

    The growth of Next Generation Sequencing technologies presents significant research challenges, specifically the design of bioinformatics tools that handle massive amounts of data efficiently. The cost of storing biological sequence data has become a noticeable proportion of the total cost of data generation and analysis. In particular, the rate of increase in DNA sequencing is significantly outstripping the rate of increase in disk storage capacity, so sequence data may eventually exceed available storage capacity. It is essential to develop algorithms that handle large data sets via better memory management. This article presents a DNA sequence compression algorithm, SeqCompress, that copes with the space complexity of biological sequences. The algorithm is based on lossless data compression and uses a statistical model as well as arithmetic coding to compress DNA sequences. The proposed algorithm is compared with recent specialized compression tools for biological sequences. Experimental results show that the proposed algorithm has better compression gain than other existing algorithms. Copyright © 2014 Elsevier Inc. All rights reserved.
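
    The pairing of a statistical model with arithmetic coding can be illustrated by estimating the code length that an ideal arithmetic coder would approach when driven by a simple adaptive context model. The sketch below is not the SeqCompress algorithm: the order-k context model, the add-one prior, and all names are assumptions made for illustration, and it assumes the input contains only A, C, G and T.

```python
import math
from collections import defaultdict


def adaptive_code_length_bits(seq, k=2, alphabet="ACGT"):
    """Ideal code length (bits) under an adaptive order-k context model.

    An arithmetic coder fed these probabilities would produce output close
    to this length. Illustrative only; not the model used by SeqCompress.
    """
    # One count table per context, starting from an add-one (Laplace) prior.
    counts = defaultdict(lambda: {base: 1 for base in alphabet})
    bits = 0.0
    for i, base in enumerate(seq):
        context = seq[max(0, i - k):i]
        table = counts[context]
        p = table[base] / sum(table.values())
        bits -= math.log2(p)
        table[base] += 1  # update the model after coding the symbol
    return bits


repetitive = "ACGT" * 50
baseline_bits = 2 * len(repetitive)  # naive 2 bits per base
model_bits = adaptive_code_length_bits(repetitive)
```

    For a highly repetitive input such as the one above, the model-based estimate falls well below the naive two-bits-per-base encoding, which is the kind of gain a statistical model is meant to expose.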

  3. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    PubMed Central

    2011-01-01

    Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii-the diploid source of the wheat D genome, and with a genome size of 4.02 Gb, of which 90% is repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 was sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. 
To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml). PMID:21266061
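
    The SNP counts quoted in this abstract can be cross-checked against the stated total with simple arithmetic. The sketch below uses only the figures given above; the dictionary keys and variable names are illustrative.

```python
# Putative SNP counts per sequence class, as reported in the abstract.
putative_snps = {
    "gene sequences": 195_631,
    "uncharacterized single-copy regions": 155_580,
    "repeat junctions": 145_907,
}

# Validation rates (%) from the resequenced sample of 302 putative SNPs.
validated_pct = {
    "gene sequences": 84.0,
    "uncharacterized single-copy regions": 81.3,
    "repeat junctions": 88.0,
}

total_snps = sum(putative_snps.values())

# Implied false-positive rate per class: the share that failed validation.
false_positive_pct = {
    name: round(100.0 - pct, 1) for name, pct in validated_pct.items()
}
```

    The three classes sum to the 497,118 SNPs cited at the end of the abstract, and the implied false-positive rates range from 12.0% to 18.7%.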

  4. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    PubMed

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which are a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

  5. Analysis of mutational spectra by denaturant capillary electrophoresis

    PubMed Central

    Ekstrøm, Per O.; Khrapko, Konstantin; Li-Sucholeiki, Xiao-Cheng; Hunter, Ian W.; Thilly, William G.

    2009-01-01

    The numbers and kinds of point mutants within DNA from cells, tissues and human populations may be discovered for nearly any 75–250 bp DNA sequence. High fidelity DNA amplification incorporating a thermally stable DNA “clamp” is followed by separation by denaturing capillary electrophoresis (DCE). DCE allows for peak collection and verification sequencing. DCE in a cycling-temperature mode (CyDCE), e.g. ±5 °C, permits high resolution of mutant sequences using computer defined analytes without preliminary optimization experiments. DNA sequencers have been modified to permit higher throughput CyDCE, and a massively parallel, ~25,000-capillary system has been designed for pangenomic scans in large human populations. DCE has been used to define quantitative point mutational spectra for the study of a wide variety of genetic phenomena: errors of DNA polymerases, mutations induced in human cells by chemicals and irradiation, testing of human gene-common disease associations and the discovery of origins of point mutations in human development and carcinogenesis. PMID:18600220

  6. Cloning, genomic organization, and chromosomal localization of human citrate transport protein to the DiGeorge/velocardiofacial syndrome minimal critical region

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Goldmuntz, E.; Budarf, M.L.; Wang, Zhili

    1996-04-15

    DiGeorge syndrome (DGS) and velocardiofacial syndrome have been shown to be associated with microdeletions of chromosomal region 22q11. More recently, patients with conotruncal anomaly face syndrome and some nonsyndromic patients with isolated forms of conotruncal cardiac defects have been found to have 22q11 microdeletions as well. The commonly deleted region, called the DiGeorge chromosomal region (DGCR), spans approximately 1.2 Mb and is estimated to contain at least 30 genes. We report a computational approach for gene identification that makes use of large-scale sequencing of cosmids from a contig spanning the DGCR. Using this methodology, we have mapped the human homolog of a rodent citrate transport protein to the DGCR. We have isolated a partial cDNA containing the complete open reading frame and have determined the genomic structure by comparing the genomic sequence from the cosmid to the sequence of the cDNA clone. Whether the citrate transport protein can be implicated in the biological etiology of DGS or other 22q11 microdeletion syndromes remains to be defined. 36 refs., 3 figs., 1 tab.

  7. Epigenetics of prostate cancer.

    PubMed

    McKee, Tawnya C; Tricoli, James V

    2015-01-01

    The introduction of novel technologies that can be applied to the investigation of the molecular underpinnings of human cancer has allowed for new insights into the mechanisms associated with tumor development and progression. They have also advanced the diagnosis, prognosis and treatment of cancer. These technologies include microarray and other analysis methods for the generation of large-scale gene expression data on both mRNA and miRNA, next-generation DNA sequencing technologies utilizing a number of platforms to perform whole genome, whole exome, or targeted DNA sequencing to determine somatic mutational differences and gene rearrangements, and a variety of proteomic analysis platforms including liquid chromatography/mass spectrometry (LC/MS) analysis to survey alterations in protein profiles in tumors. One other important advancement has been our current ability to survey the methylome of human tumors in a comprehensive fashion through the use of sequence-based and array-based methylation analysis (Bock et al., Nat Biotechnol 28:1106-1114, 2010; Harris et al., Nat Biotechnol 28:1097-1105, 2010). The focus of this chapter is to present and discuss the evidence for key genes involved in prostate tumor development, progression, or resistance to therapy that are regulated by methylation-induced silencing.

  8. Poly A- Transcripts Expressed in HeLa Cells

    PubMed Central

    Lu, Jian; Xuan, Zhenyu; Chen, Jun; Zheng, Yonglan; Zhou, Tom; Zhang, Michael Q.; Wu, Chung-I; Wang, San Ming

    2008-01-01

    Background Transcripts expressed in eukaryotes are classified as poly A+ transcripts or poly A- transcripts based on the presence or absence of the 3′ poly A tail. Most transcripts identified so far are poly A+ transcripts, whereas the poly A- transcripts remain largely unknown. Methodology/Principal Findings We developed the TRD (Total RNA Detection) system for transcript identification. The system detects the transcripts through the following steps: 1) depleting the abundant ribosomal and small-size transcripts; 2) synthesizing cDNA without regard to the status of the 3′ poly A tail; 3) applying the 454 sequencing technology for massive 3′ EST collection from the cDNA; and 4) determining the genome origins of the detected transcripts by mapping the sequences to the human genome reference sequences. Using this system, we characterized the cytoplasmic transcripts from HeLa cells. Of the 13,467 distinct 3′ ESTs analyzed, 24% are poly A-, 36% are poly A+, and 40% are bimorphic with poly A+ features but without the 3′ poly A tail. Most of the poly A- 3′ ESTs do not match known transcript sequences; they have a similar distribution pattern in the genome as the poly A+ and bimorphic 3′ ESTs, and their mapped intergenic regions are evolutionarily conserved. Experiments confirmed the authenticity of the detected poly A- transcripts. Conclusion/Significance Our study provides the first large-scale sequence evidence for the presence of poly A- transcripts in eukaryotes. The abundance of the poly A- transcripts highlights the need for comprehensive identification of these transcripts for decoding the transcriptome, annotating the genome and studying biological relevance of the poly A- transcripts. PMID:18665230

  9. Development of a High-Throughput Resequencing Array for the Detection of Pathogenic Mutations in Osteogenesis Imperfecta

    PubMed Central

    Wang, Yao; Cui, Yazhou; Zhou, Xiaoyan; Han, Jinxiang

    2015-01-01

    Objective Osteogenesis imperfecta (OI) is a rare inherited skeletal disease, characterized by bone fragility and low bone density. The mutations in this disorder have been widely reported to be on various exonal hotspots of the candidate genes, including COL1A1, COL1A2, CRTAP, LEPRE1, and FKBP10, thus creating a great demand for precise genetic tests. However, large genome sizes make the process daunting and the analyses inefficient and expensive. Therefore, we aimed at developing a fast, accurate, efficient, and cheaper sequencing platform for OI diagnosis; and to this end, use of an advanced array-based technique was proposed. Method A CustomSeq Affymetrix Resequencing Array was established for high-throughput sequencing of five genes simultaneously. Genomic DNA extraction from 13 OI patients and 85 normal controls and amplification using long-range PCR (LR-PCR) were followed by DNA fragmentation and chip hybridization, according to standard Affymetrix protocols. Hybridization signals were determined using GeneChip Sequence Analysis Software (GSEQ). To examine the feasibility, the outcome of the new resequencing approach was validated by the conventional capillary sequencing method. Result The overall call rate using the resequencing array was 96–98% and the agreement between microarray and capillary sequencing was 99.99%. Eleven of the 13 OI patients with pathogenic mutations were successfully detected by the chip analysis without adjustment, and one mutation could also be identified using manual visual inspection. Conclusion A high-throughput resequencing array was developed that detects the disease-associated mutations in OI, providing a potential tool to facilitate large-scale genetic screening for OI patients. Through this method, a novel mutation was also found. PMID:25742658

  10. Chemocoding as an identification tool where morphological- and DNA-based methods fall short: Inga as a case study.

    PubMed

    Endara, María-José; Coley, Phyllis D; Wiggins, Natasha L; Forrister, Dale L; Younkin, Gordon C; Nicholls, James A; Pennington, R Toby; Dexter, Kyle G; Kidner, Catherine A; Stone, Graham N; Kursar, Thomas A

    2018-04-01

    The need for species identification and taxonomic discovery has led to the development of innovative technologies for large-scale plant identification. DNA barcoding has been useful, but fails to distinguish among many species in species-rich plant genera, particularly in tropical regions. Here, we show that chemical fingerprinting, or 'chemocoding', has great potential for plant identification in challenging tropical biomes. Using untargeted metabolomics in combination with multivariate analysis, we constructed species-level fingerprints, which we define as chemocoding. We evaluated the utility of chemocoding with species that were defined morphologically and subject to next-generation DNA sequencing in the diverse and recently radiated neotropical genus Inga (Leguminosae), both at single study sites and across broad geographic scales. Our results show that chemocoding is a robust method for distinguishing morphologically similar species at a single site and for identifying widespread species across continental-scale ranges. Given that species are the fundamental unit of analysis for conservation and biodiversity research, the development of accurate identification methods is essential. We suggest that chemocoding will be a valuable additional source of data for a quick identification of plants, especially for groups where other methods fall short. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.

  11. In Depth Characterization of Repetitive DNA in 23 Plant Genomes Reveals Sources of Genome Size Variation in the Legume Tribe Fabeae.

    PubMed

    Macas, Jiří; Novák, Petr; Pellicer, Jaume; Čížková, Jana; Koblížková, Andrea; Neumann, Pavel; Fuková, Iva; Doležel, Jaroslav; Kelly, Laura J; Leitch, Ilia J

    2015-01-01

    The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.

  12. Design and construction of functional AAV vectors.

    PubMed

    Gray, John T; Zolotukhin, Serge

    2011-01-01

    Using the basic principles of molecular biology and laboratory techniques presented in this chapter, researchers should be able to create a wide variety of AAV vectors for both clinical and basic research applications. Basic vector design concepts are covered for both protein coding gene expression and small non-coding RNA gene expression cassettes. AAV plasmid vector backbones (available via AddGene) are described, along with critical sequence details for a variety of modular expression components that can be inserted as needed for specific applications. Protocols are provided for assembling the various DNA components into AAV vector plasmids in Escherichia coli, as well as for transferring these vector sequences into baculovirus genomes for large-scale production of AAV in the insect cell production system.

  13. BLAST and FASTA similarity searching for multiple sequence alignment.

    PubMed

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry, that is, homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
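
    The point about expectation values and database size can be illustrated with the standard Karlin-Altschul relations, in which a raw score S becomes a bit score S' = (λS − ln K) / ln 2 and the expectation value is E = m · n · 2^(−S'). The following is a minimal sketch, not code from either package; the function names and the λ and K values in the usage note are illustrative assumptions.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw alignment score to bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bits`: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # lam and k are illustrative placeholders, not values read from any program.
    bits = bit_score(raw_score=120, lam=0.267, k=0.041)
    for db_len in (10**9, 10**7):
        print(db_len, expect_value(bits, query_len=300, db_len=db_len))
```

    Because E is proportional to the database length n, the same alignment score is 100 times more significant in a database 100 times smaller, which is one way to read the abstract's advice about searching smaller comprehensive databases.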

  14. DNA-encoded libraries - an efficient small molecule discovery technology for the biomedical sciences.

    PubMed

    Kunig, Verena; Potowski, Marco; Gohla, Anne; Brunschweiger, Andreas

    2018-06-27

    DNA-encoded compound libraries are a highly attractive technology for the discovery of small molecule protein ligands. These compound collections consist of small molecules covalently connected to individual DNA sequences carrying readable information about the compound structure. DNA-tagging allows for efficient synthesis, handling and interrogation of vast numbers of chemically synthesized, drug-like compounds. They are screened on proteins by an efficient, generic assay based on Darwinian principles of selection. To date, selection of DNA-encoded libraries allowed for the identification of numerous bioactive compounds. Some of these compounds uncovered hitherto unknown allosteric binding sites on target proteins; several compounds proved their value as chemical biology probes unraveling complex biology; and the first examples of clinical candidates that trace their ancestry to a DNA-encoded library were reported. Thus, DNA-encoded libraries proved their value for the biomedical sciences as a generic technology for the identification of bioactive drug-like molecules numerous times. However, large scale experiments showed that even the selection of billions of compounds failed to deliver bioactive compounds for the majority of proteins in an unbiased panel of target proteins. This raises the question of compound library design.

  15. Population genetics inside a cell: Mutations and mitochondrial genome maintenance

    NASA Astrophysics Data System (ADS)

    Goyal, Sidhartha; Shraiman, Boris; Gottschling, Dan

    2012-02-01

    In realistic ecological and evolutionary systems, natural selection acts on multiple levels, i.e. on individuals as well as on collections of individuals. An understanding of the evolutionary dynamics of such systems is limited in large part by the lack of experimental systems that can challenge theoretical models. Mitochondrial genomes (mtDNA) are subject to selection acting at the cellular as well as the organelle level. It is well accepted that mtDNA in the yeast Saccharomyces cerevisiae is unstable and can degrade over time scales comparable to the yeast cell division time. We use a recent technology designed in the Gottschling lab to extract DNA from populations of aged yeast cells, together with deep sequencing, to characterize mtDNA variation in a population of young and old cells. In tandem, we developed a stochastic model that includes the essential features of mitochondrial biology and provides a null model for expected mtDNA variation. Overall, we find that approximately 2% of the polymorphic loci show a significant increase in frequency as cells age, providing direct evidence for organelle-level selection. Such quantitative study of mtDNA dynamics is essential to understanding the propagation of mtDNA mutations linked to a spectrum of age-related diseases in humans.

  16. DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses.

    PubMed

    Zepeda-Mendoza, Marie Lisandra; Bohmann, Kristine; Carmona Baez, Aldo; Gilbert, M Thomas P

    2016-05-03

    DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5'-nucleotide tags to both primers producing double-tagged amplicons and the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way. We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate PCR replicates dissimilarity, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment, by applying it to two real datasets that vary in their complexity regarding number of samples, sequencing libraries, PCR replicates, and used tag combinations. Finally, we use a third mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe. DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built in a single or multiple sequencing libraries. 
It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets, and ultimately increases the ease of handling datasets from large-scale studies.
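
    The filtering idea described above (minimum copy number within a replicate, and reproducibility across PCR replicates) can be sketched in a few lines. This is a hypothetical illustration of the principle only, not DAMe's actual code or command-line interface; the function name, parameter names and thresholds are assumptions.

```python
from collections import Counter


def filter_by_replicates(replicates, min_copies=2, min_replicates=2):
    """Keep sequences seen with >= min_copies in at least min_replicates replicates.

    `replicates` is a list with one dict per PCR replicate of a sample,
    each mapping a unique sequence to its copy number in that replicate.
    """
    support = Counter()
    for replicate in replicates:
        for sequence, copies in replicate.items():
            if copies >= min_copies:
                support[sequence] += 1  # this replicate supports the sequence
    return {seq for seq, n in support.items() if n >= min_replicates}


if __name__ == "__main__":
    sample = [
        {"ACGT": 10, "ACGA": 1},
        {"ACGT": 8, "TTTT": 5},
        {"ACGT": 7, "ACGA": 3},
    ]
    print(filter_by_replicates(sample))
```

    In this toy sample only the sequence that recurs with enough copies in several replicates survives; sequences confined to one replicate, which are more likely PCR or sequencing errors, are dropped.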

  17. Large-scale parallel genome assembler over cloud computing environment.

    PubMed

    Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong

    2017-06-01

    The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of traditional HPC cluster.
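
    For readers unfamiliar with the de Bruijn graph on which assemblers of this kind are built, a single-machine sketch of graph construction follows. It only illustrates the data structure and assumes error-free reads; it is not GiGA's implementation, which distributes this work over Hadoop and Giraph, and the function name is hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a de Bruijn graph: each k-mer adds an edge prefix -> suffix.

    Nodes are (k-1)-mers; the value for a node lists the (k-1)-mers
    that follow it in at least one read (duplicates are kept).
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    for node, successors in de_bruijn_edges(["ACGTA", "CGTAC"], k=3).items():
        print(node, "->", successors)
```

    An assembler then walks paths through this graph to produce contigs; distributed designs partition the nodes across machines so that each edge traversal stays close to the data it needs.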

  18. Sequence Based Prediction of DNA-Binding Proteins Based on Hybrid Feature Selection Using Random Forest and Gaussian Naïve Bayes

    PubMed Central

    Lou, Wangchao; Wang, Xiaoqing; Chen, Fan; Chen, Yixiao; Jiang, Bo; Zhang, Hua

    2014-01-01

    Developing an efficient method for determination of the DNA-binding proteins, due to their vital roles in gene regulation, is becoming highly desired since it would be invaluable to advance our understanding of protein functions. In this study, we proposed a new method for the prediction of the DNA-binding proteins, by ranking features using random forest and applying wrapper-based feature selection with a forward best-first search strategy. The features comprise information from primary sequence, predicted secondary structure, predicted relative solvent accessibility, and position specific scoring matrix. The proposed method, called DBPPred, used Gaussian naïve Bayes as the underlying classifier since it outperformed five other classifiers, including decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. As a result, the proposed DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 according to the five-fold cross validation with ten runs on the training benchmark dataset PDB594. Subsequently, blind tests on the independent dataset PDB186 were performed with the proposed model trained on the entire PDB594 dataset and with five other existing methods (iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader); DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. The independent tests performed by the proposed DBPPred on a large dataset composed entirely of non-DNA-binding proteins and on two RNA-binding protein datasets also showed improved or comparable quality when compared with the relevant prediction methods. Moreover, we observed that for the majority of the features selected by the proposed method, the mean feature values differ statistically significantly between the DNA-binding and the non-DNA-binding proteins. All of the experimental results indicate that the proposed DBPPred can serve as an alternative predictor for large-scale determination of DNA-binding proteins. PMID:24475169
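
    The accuracy and Matthews correlation coefficient (MCC) figures quoted above are both derived from the counts of a binary confusion matrix. A minimal sketch of the two standard formulas follows; the function names are chosen here for illustration, and the counts in the usage note are made up rather than taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in [-1, 1]; 0.0 if undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Invented counts for a balanced two-class test set.
    print(accuracy(tp=40, tn=30, fp=20, fn=10), mcc(tp=40, tn=30, fp=20, fn=10))
```

    Unlike accuracy, MCC uses all four cells of the matrix, so it stays informative when the two classes are of unequal size.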

  19. Development and validation of a mixed-tissue oligonucleotide DNA microarray for Atlantic bluefin tuna, Thunnus thynnus (Linnaeus, 1758).

    PubMed

    Trumbić, Željka; Bekaert, Michaël; Taggart, John B; Bron, James E; Gharbi, Karim; Mladineo, Ivona

    2015-11-25

    The largest of the tuna species, Atlantic bluefin tuna (Thunnus thynnus), inhabits the North Atlantic Ocean and the Mediterranean Sea and is considered to be an endangered species, largely a consequence of overfishing. T. thynnus aquaculture, referred to as fattening or farming, is a capture based activity dependent on yearly renewal from the wild. Thus, the development of aquaculture practices independent of wild resources can provide an important contribution towards ensuring security and sustainability of this species in the longer-term. The development of such practices is today greatly assisted by large scale transcriptomic studies. We have used pyrosequencing technology to sequence a mixed-tissue normalised cDNA library, derived from adult T. thynnus. A total of 976,904 raw sequence reads were assembled into 33,105 unique transcripts having a mean length of 893 bases and an N50 of 870. Of these, 33.4% showed similarity to known proteins or gene transcripts and 86.6% of them were matched to the congeneric Pacific bluefin tuna (Thunnus orientalis) genome, compared to 70.3% for the more distantly related Nile tilapia (Oreochromis niloticus) genome. Transcript sequences were used to develop a novel 15 K Agilent oligonucleotide DNA microarray for T. thynnus and comparative tissue gene expression profiles were inferred for gill, heart, liver, ovaries and testes. Functional contrasts were strongest between gills and ovaries. Gills were particularly associated with immune system, signal transduction and cell communication, while ovaries displayed signatures of glycan biosynthesis, nucleotide metabolism, transcription, translation, replication and repair. Sequence data generated from a novel mixed-tissue T. thynnus cDNA library provide an important transcriptomic resource that can be further employed for study of various aspects of T. thynnus ecology and genomics, with strong applications in aquaculture. 
Tissue-specific gene expression profiles inferred through the use of novel oligo-microarray can serve in the design of new and more focused transcriptomic studies for future research of tuna physiology and assessment of the welfare in a production environment.
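
    The N50 statistic reported for the assembled transcripts is the length L such that sequences of length L or longer contain at least half of all assembled bases. A minimal sketch of the computation follows, with a hypothetical function name and toy lengths rather than the study's data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no sequence lengths supplied")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```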

  20. Increased Sensitivity of Diagnostic Mutation Detection by Re-analysis Incorporating Local Reassembly of Sequence Reads.

    PubMed

    Watson, Christopher M; Camm, Nick; Crinnion, Laura A; Clokie, Samuel; Robinson, Rachel L; Adlard, Julian; Charlton, Ruth; Markham, Alexander F; Carr, Ian M; Bonthron, David T

    2017-12-01

    Diagnostic genetic testing programmes based on next-generation DNA sequencing have resulted in the accrual of large datasets of targeted raw sequence data. Most diagnostic laboratories process these data through an automated variant-calling pipeline. Validation of the chosen analytical methods typically depends on confirming the detection of known sequence variants. Despite improvements in short-read alignment methods, current pipelines are known to be comparatively poor at detecting large insertion/deletion mutations. We performed clinical validation of a local reassembly tool, ABRA (assembly-based realigner), through retrospective reanalysis of a cohort of more than 2000 hereditary cancer cases. ABRA enabled detection of a 96-bp deletion, 4-bp insertion mutation in PMS2 that had been initially identified using a comparative read-depth approach. We applied an updated pipeline incorporating ABRA to the entire cohort of 2000 cases and identified one previously undetected pathogenic variant, a 23-bp duplication in PTEN. We demonstrate the effect of read length on the ability to detect insertion/deletion variants by comparing HiSeq2500 (2 × 101-bp) and NextSeq500 (2 × 151-bp) sequence data for a range of variants and thereby show that the limitations of shorter read lengths can be mitigated using appropriate informatics tools. This work highlights the need for ongoing development of diagnostic pipelines to maximize test sensitivity. We also draw attention to the large differences in computational infrastructure required to perform day-to-day versus large-scale reprocessing tasks.

  1. Mutation detection using automated fluorescence-based sequencing.

    PubMed

    Montgomery, Kate T; Iartchouck, Oleg; Li, Li; Perera, Anoja; Yassin, Yosuf; Tamburino, Alex; Loomis, Stephanie; Kucherlapati, Raju

    2008-04-01

    The development of high-throughput DNA sequencing techniques has made direct DNA sequencing of PCR-amplified genomic DNA a rapid and economical approach to the identification of polymorphisms that may play a role in disease. Point mutations as well as small insertions or deletions are readily identified by DNA sequencing. The mutations may be heterozygous (occurring in one allele while the other allele retains the normal sequence) or homozygous (occurring in both alleles). Sequencing alone cannot discriminate between true homozygosity and apparent homozygosity due to the loss of one allele due to a large deletion. In this unit, strategies are presented for using PCR amplification and automated fluorescence-based sequencing to identify sequence variation. The size of the project and laboratory preference and experience will dictate how the data is managed and which software tools are used for analysis. A high-throughput protocol is given that has been used to search for mutations in over 200 different genes at the Harvard Medical School - Partners Center for Genetics and Genomics (HPCGG, http://www.hpcgg.org/). Copyright 2008 by John Wiley & Sons, Inc.

  2. Genetic Variation in the Acorn Barnacle from Allozymes to Population Genomics

    PubMed Central

    Flight, Patrick A.; Rand, David M.

    2012-01-01

    Understanding the patterns of genetic variation within and among populations is a central problem in population and evolutionary genetics. We examine this question in the acorn barnacle, Semibalanus balanoides, in which the allozyme loci Mpi and Gpi have been implicated in balancing selection due to varying selective pressures at different spatial scales. We review the patterns of genetic variation at the Mpi locus, compare this to levels of population differentiation at mtDNA and microsatellites, and place these data in the context of genome-wide variation from high-throughput sequencing of population samples spanning the North Atlantic. Despite considerable geographic variation in the patterns of selection at the Mpi allozyme, this locus shows rather low levels of population differentiation at ecological and trans-oceanic scales (FST ∼ 5%). Pooled population sequencing was performed on samples from Rhode Island (RI), Maine (ME), and Southwold, England (UK). Analysis of more than 650 million reads identified approximately 335,000 high-quality SNPs in 19 million base pairs of the S. balanoides genome. Much variation is shared across the Atlantic, but there are significant examples of strong population differentiation among samples from RI, ME, and UK. An FST outlier screen of more than 22,000 contigs provided a genome-wide context for interpretation of earlier studies on allozymes, mtDNA, and microsatellites. FST values for allozymes, mtDNA and microsatellites are close to the genome-wide average for random SNPs, with the exception of the trans-Atlantic FST for mtDNA. The majority of FST outliers were unique between individual pairs of populations, but some genes show shared patterns of excess differentiation. These data indicate that gene flow is high, that selection is strong on a subset of genes, and that a variety of genes are experiencing diversifying selection at large spatial scales. This survey of polymorphism in S. balanoides provides a number of genomic tools that promise to make this a powerful model for ecological genomics of the rocky intertidal. PMID:22767487
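
    For readers unfamiliar with the statistic, the FST values quoted in this abstract measure how much of the total heterozygosity at a SNP is explained by differences between populations. The sketch below is only an illustration of the textbook definition FST = (HT - HS) / HT for one biallelic SNP in two equally weighted populations. It is not the estimator used in the study, which worked from pooled reads and would need corrections for pool size and read depth; the function name `fst_two_pops` and the example frequencies are hypothetical.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Textbook FST for one biallelic SNP, given the frequency of the same
    allele in two populations that are weighted equally (an assumption)."""
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if the two populations were one panmictic pool.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        # The SNP is monomorphic overall, so there is nothing to partition.
        return 0.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical allele frequencies, chosen only to show the range of values.
    for p1, p2 in [(0.5, 0.5), (0.4, 0.6), (0.0, 1.0)]:
        print(p1, p2, round(fst_two_pops(p1, p2), 3))
```

    Identical frequencies give 0 and fixed differences give 1, so a value near 5% corresponds to modest frequency differences such as 0.4 versus 0.6.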

  3. Genome-wide DNA polymorphisms in two cultivars of mei (Prunus mume sieb. et zucc.).

    PubMed

    Sun, Lidan; Zhang, Qixiang; Xu, Zongda; Yang, Weiru; Guo, Yu; Lu, Jiuxing; Pan, Huitang; Cheng, Tangren; Cai, Ming

    2013-10-06

    Mei (Prunus mume Sieb. et Zucc.) is a famous ornamental plant and fruit crop grown in East Asian countries. Limited genetic resources, especially molecular markers, have hindered the progress of mei breeding projects. Here, we performed low-depth whole-genome sequencing of Prunus mume 'Fenban' and Prunus mume 'Kouzi Yudie' to identify high-quality polymorphic markers between the two cultivars on a large scale. A total of 1464.1 Mb and 1422.1 Mb of 'Fenban' and 'Kouzi Yudie' sequencing data, respectively, were uniquely mapped to the mei reference genome with about 6-fold coverage. We detected a large number of putative polymorphic markers from the 196.9 Mb of sequencing data shared by the two cultivars, which together contained 200,627 SNPs, 4,900 InDels, and 7,063 SSRs. Among these markers, 38,773 SNPs, 174 InDels, and 418 SSRs were distributed in the 22.4 Mb CDS region, and 63.0% of these marker-containing CDS sequences were assigned to GO terms. Subsequently, 670 selected SNPs were validated using Agilent's SureSelect solution-phase hybridization assay. A subset of 599 SNPs was used to assess the genetic similarity of a panel of mei germplasm samples and a plum (P. salicina) cultivar, producing a set of informative diversity data. We also analyzed the frequency and distribution of detected InDels and SSRs in the mei genome and validated their usefulness as DNA markers. These markers were successfully amplified in the cultivars and in their segregating progeny. A large set of high-quality polymorphic SNPs, InDels, and SSRs was identified in parallel between 'Fenban' and 'Kouzi Yudie' using low-depth whole-genome sequencing. The study presents extensive data on these polymorphic markers, which can be useful for constructing high-resolution genetic maps, performing genome-wide association studies, and designing genomic selection strategies in mei.

  4. What Advances Are Being Made in DNA Sequencing?

    MedlinePlus

    ... to identify genetic variations; both methods rely on new technologies that allow rapid sequencing of large amounts of ... describes the different sequencing technologies and what the new technologies have meant for the study of the genetic ...

  5. Correlation of Local Effects of DNA Sequence and Position of Beta-Alanine Inserts with Polyamide-DNA Complex Binding Affinities and Kinetics

    PubMed Central

    Wang, Shuo; Nanjunda, Rupesh; Aston, Karl; Bashkin, James K.; Wilson, W. David

    2012-01-01

    In order to better understand the effects of β-alanine (β) substitution and the number of heterocycles on DNA binding affinity and selectivity, the interactions of an eight-ring hairpin polyamide (PA) and two β derivatives as well as a six-heterocycle analog have been investigated with their cognate DNA sequence, 5′-TGGCTT-3′. Binding selectivity and the effects of β have been investigated with the cognate and five mutant DNAs. A set of powerful and complementary methods have been employed for both energetic and structural evaluations: UV-melting, biosensor-surface plasmon resonance, isothermal titration calorimetry, circular dichroism and a DNA ligation ladder global structure assay. The reduced number of heterocycles in the six-ring PA weakens the binding affinity; however, the smaller PA aggregates significantly less than the larger PAs, and allows us to obtain the binding thermodynamics. The PA-DNA binding enthalpy is large and negative with a large negative ΔCp, and is the primary driving component of the Gibbs free energy. The complete SPR binding results clearly show that β substitutions can substantially weaken the binding affinity of hairpin PAs in a position-dependent manner. More importantly, the changes in PA binding to the mutant DNAs further confirm the position-dependent effects on PA-DNA interaction affinity. Comparison of mutant DNA sequences also shows a different effect in recognition of T•A versus A•T base pairs. The effects of DNA mutations on binding of a single PA as well as the effects of the position of β substitution on binding tell a clear and very important story about sequence dependent binding of PAs to DNA. PMID:23167504

  6. On the Sequence-Directed Nature of Human Gene Mutation: The Role of Genomic Architecture and the Local DNA Sequence Environment in Mediating Gene Mutations Underlying Human Inherited Disease

    PubMed Central

    Cooper, David N.; Bacolla, Albino; Férec, Claude; Vasquez, Karen M.; Kehrer-Sawatzki, Hildegard; Chen, Jian-Min

    2011-01-01

    Different types of human gene mutation may vary in size, from structural variants (SVs) to single base-pair substitutions, but what they all have in common is that their nature, size and location are often determined either by specific characteristics of the local DNA sequence environment or by higher-order features of the genomic architecture. The human genome is now recognized to contain ‘pervasive architectural flaws’ in that certain DNA sequences are inherently mutation-prone by virtue of their base composition, sequence repetitivity and/or epigenetic modification. Here we explore how the nature, location and frequency of different types of mutation causing inherited disease are shaped in large part, and often in remarkably predictable ways, by the local DNA sequence environment. The mutability of a given gene or genomic region may also be influenced indirectly by a variety of non-canonical (non-B) secondary structures whose formation is facilitated by the underlying DNA sequence. Since these non-B DNA structures can interfere with subsequent DNA replication and repair, and may serve to increase mutation frequencies in generalized fashion (i.e. both in the context of subtle mutations and SVs), they have the potential to serve as a unifying concept in studies of mutational mechanisms underlying human inherited disease. PMID:21853507

  7. Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq.

    PubMed

    Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G

    2014-11-29

    Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. 
In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.

  8. Characterization of (CA)n microsatellite repeats from large-insert clones.

    PubMed

    Litt, M; Browne, D

    2001-05-01

    The most laborious part of developing (CA)n microsatellite repeats as genetic markers is constructing DNA clones to permit determination of sequences flanking the microsatellites. When cosmids or large-insert phage clones are used as primary sources of (CA)n repeat markers, they have traditionally been subcloned into plasmid vectors such as pUC18 or M13mp18/19 cloning vectors to obtain fragments of suitable size for DNA sequencing. This unit presents an alternative approach whereby a set of degenerate sequencing primers that anneal directly to (CA)n microsatellites can be used to determine sequences that are inaccessible with vector-derived primers. Because the primers anneal to the repeat and not to the vector, they can be used with subclones containing inserts of several kilobases and should, in theory, always give sequence in the regions directly flanking the repeat. Degeneracy at the 3′ end of each of these primers prevents elongation of primers that have annealed out-of-register.

  9. Population genetics and molecular evolution of DNA sequences in transposable elements. I. A simulation framework.

    PubMed

    Kijima, T E; Innan, Hideki

    2013-11-01

    A population genetic simulation framework is developed to understand the behavior and molecular evolution of DNA sequences of transposable elements. Our model incorporates random transposition and excision of transposable element (TE) copies, two modes of selection against TEs, and degeneration of transpositional activity by point mutations. We first investigated the relationships between the behavior of the copy number of TEs and these parameters. Our results show that when selection is weak, the genome can maintain a relatively large number of TEs, but most of them are less active. In contrast, with strong selection, the genome can maintain only a limited number of TEs but the proportion of active copies is large. In such a case, there could be substantial fluctuations of the copy number over generations. We also explored how DNA sequences of TEs evolve through the simulations. In general, active copies form clusters around the original sequence, while less active copies have long branches specific to themselves, exhibiting a star-shaped phylogeny. It is demonstrated that the phylogeny of TE sequences could be informative to understand the dynamics of TE evolution.
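
    The kind of model this abstract describes can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not the authors' framework: it uses a haploid population of fixed size without recombination, a single multiplicative mode of selection against copy number (the paper considers two modes), and per-copy probabilities for transposition, excision and loss of activity. The function name `simulate_te` and all parameter names and default values are hypothetical.

```python
import random


def simulate_te(pop_size=100, generations=50, u=0.05, v=0.01, m=0.01,
                s=0.02, seed=0):
    """Toy forward simulation of TE copy number.

    Each individual is a pair (active, inactive) of copy counts.
    u: per-generation transposition probability of each active copy
    v: per-generation excision probability of each copy
    m: per-generation probability that an active copy loses activity
    s: fitness cost per copy, applied multiplicatively (an assumption)
    Returns a list of (mean active, mean inactive) per generation.
    """
    rng = random.Random(seed)
    pop = [(1, 0)] * pop_size  # every individual starts with one active copy
    history = []
    for _ in range(generations):
        # Selection: parents are sampled in proportion to (1 - s) ** copies.
        weights = [(1.0 - s) ** (a + i) for a, i in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        offspring = []
        for a, i in parents:
            # Each active copy may generate one new active copy.
            new = sum(rng.random() < u for _ in range(a))
            # Each existing copy survives excision with probability 1 - v.
            a_kept = sum(rng.random() >= v for _ in range(a))
            i_kept = sum(rng.random() >= v for _ in range(i))
            # Surviving active copies may degenerate into inactive ones.
            decayed = sum(rng.random() < m for _ in range(a_kept))
            offspring.append((a_kept - decayed + new, i_kept + decayed))
        pop = offspring
        history.append((sum(a for a, _ in pop) / pop_size,
                        sum(i for _, i in pop) / pop_size))
    return history


if __name__ == "__main__":
    for label, cost in [("weak selection", 0.001), ("strong selection", 0.2)]:
        active, inactive = simulate_te(s=cost, generations=200)[-1]
        print(label, "mean active:", active, "mean inactive:", inactive)
```

    Varying `s` while holding the other parameters fixed is one way to explore the contrast between weak and strong selection that the abstract reports, although this toy model makes no claim to reproduce the published results.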

  10. The application of magnetic bead hybridization for the recovery and STR amplification of degraded and inhibited forensic DNA.

    PubMed

    Wang, Jing; McCord, Bruce

    2011-06-01

    A common problem in the analysis of forensic DNA evidence is the presence of environmentally degraded and inhibited DNA. Such samples produce a variety of interpretational problems such as allele imbalance, allele dropout and sequence-specific inhibition. In an attempt to develop methods to enhance the recovery of this type of evidence, magnetic bead hybridization has been applied to extract and preconcentrate DNA sequences containing short tandem repeat (STR) alleles of interest. In this work, genomic DNA was fragmented by heating, and sequences associated with STR alleles were selectively hybridized to allele-specific biotinylated probes. Each biotinylated probe-DNA complex was bound to streptavidin-coated magnetic beads, enabling enrichment of target DNA sequences. Experiments conducted using degraded DNA samples, as well as samples containing a large concentration of inhibitory substances, showed good specificity and recovery of missing alleles. Based on the favorable results obtained with these specific probes, this method should prove useful as a tool to improve the recovery of alleles from degraded and inhibited DNA samples. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Two myxozoans from the urinary tract of topsmelt, Atherinops affinis

    USGS Publications Warehouse

    Sanders, Justin L.; Jaramillo, Alejandra G.; Ashford, Jacob E.; Feist, Stephen W.; Lafferty, Kevin D.; Kent, Michael L.

    2015-01-01

    Two myxozoan species were observed in the kidney of topsmelt, Atherinops affinis, during a survey of parasites of estuarine fishes in the Carpinteria Salt Marsh Reserve, California. Fish collected on three dates in 2012 and 2013 were sectioned and examined histologically. Large extrasporogonic stages occurred in the renal interstitium of several fish from the first two collections (5/8, 11/20, respectively), and, in some fish, these replaced over 80% of the kidney. In addition, presporogonic and polysporogonic stages occurred in the lumen of the renal tubules, collecting and mesonephric ducts. The latter contained subspherical spores with up to 4 polar capsules, consistent with the genus Chloromyxum. For the third collection (15 May 2013, n=30), we portioned kidneys for examination by histology, wet mount, and DNA extraction for small subunit ribosomal gene sequencing. Histology showed the large extrasporogonic forms in the kidney interstitium of 3 fish, and 2 other fish with subspherical myxospores in the lumen of the renal tubules with smooth valves and two spherical polar capsules consistent with the genus Sphaerospora. Chloromyxum-type myxospores were observed in the renal tubules of one fish by wet mount. Sequencing of the kidney tissue from this fish yielded a partial SSU rDNA sequence of 1769 bp. Phylogenetic reconstruction suggested this organism to be a novel species of Chloromyxum, most similar to Chloromyxum careni (84% similarity). In addition, subspherical myxospores with smooth valves and two spherical polar capsules consistent with the genus Sphaerospora were observed in wet mounts of 2 fish. Sequencing of the kidney tissue from 1 fish yielded a partial SSU rDNA sequence of 1937 bp. Phylogenetic reconstruction suggests this organism to be a novel species of Sphaerospora most closely related to Sphaerospora epinepheli (93%). 
We conclude that these organisms represent novel species of the genera Chloromyxum and Sphaerospora based on host, location, and SSU rDNA sequence. We further conclude that the large, histozoic extrasporogonic stages in the renal interstitium represent developmental stages of the Chloromyxum species for the following reasons: 1. Large extrasporogonic stages were only observed in fish with Chloromyxum-type spores developing within the renal tubules, 2. DNA sequence consistent with the Chloromyxum sp. was only detected in fish with the large extrasporogonic stages, and 3. Sphaerospora species have extrasporogonic forms, but they are considerably smaller and are composed of far fewer cells.

  12. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis.

    PubMed

    Jakupciak, John P; Wells, Jeffrey M; Karalus, Richard J; Pawlowski, David R; Lin, Jeffrey S; Feldman, Andrew B

    2013-01-01

    Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.

  13. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis

    PubMed Central

    Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.

    2013-01-01

    Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204

  14. Learning a weighted sequence model of the nucleosome core and linker yields more accurate predictions in Saccharomyces cerevisiae and Homo sapiens.

    PubMed

    Reynolds, Sheila M; Bilmes, Jeff A; Noble, William Stafford

    2010-07-08

    DNA in eukaryotes is packaged into a chromatin complex, the most basic element of which is the nucleosome. The precise positioning of the nucleosome cores allows for selective access to the DNA, and the mechanisms that control this positioning are important pieces of the gene expression puzzle. We describe a large-scale nucleosome pattern that jointly characterizes the nucleosome core and the adjacent linkers and is predominantly characterized by long-range oscillations in the mono-, di- and tri-nucleotide content of the DNA sequence, and we show that this pattern can be used to predict nucleosome positions in both Homo sapiens and Saccharomyces cerevisiae more accurately than previously published methods. Surprisingly, in both H. sapiens and S. cerevisiae, the most informative individual features are the mono-nucleotide patterns, although the inclusion of di- and tri-nucleotide features results in improved performance. Our approach combines a much longer pattern than has been previously used to predict nucleosome positioning from sequence (301 base pairs, centered at the position to be scored) with a novel discriminative classification approach that selectively weights the contributions from each of the input features. The resulting scores are relatively insensitive to local AT-content and can be used to accurately discriminate putative dyad positions from adjacent linker regions without requiring an additional dynamic programming step and without the attendant edge effects and assumptions about linker length modeling and overall nucleosome density. Our approach produces the best dyad-linker classification results published to date in H. sapiens, and outperforms two recently published models on a large set of S. cerevisiae nucleosome positions. Our results suggest that in both genomes, a comparable and relatively small fraction of nucleosomes are well-positioned and that these positions are predictable based on sequence alone. 
We believe that the bulk of the remaining nucleosomes follow a statistical positioning model.

  15. Magic Pools: Parallel Assessment of Transposon Delivery Vectors in Bacteria

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Hualan; Price, Morgan N.; Waters, Robert Jordan

    Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach for discovering the functions of bacterial genes. However, the development of a suitable TnSeq strategy for a given bacterium can be costly and time-consuming. To meet this challenge, we describe a part-based strategy for constructing libraries of hundreds of transposon delivery vectors, which we term “magic pools.” Within a magic pool, each transposon vector has a different combination of upstream sequences (promoters and ribosome binding sites) and antibiotic resistance markers as well as a random DNA barcode sequence, which allows the tracking of each vector during mutagenesis experiments. To identify an efficient vector for a given bacterium, we mutagenize it with a magic pool and sequence the resulting insertions; we then use this efficient vector to generate a large mutant library. We used the magic pool strategy to construct transposon mutant libraries in five genera of bacteria, including three genera of the phylum Bacteroidetes. IMPORTANCE: Molecular genetics is indispensable for interrogating the physiology of bacteria. However, the development of a functional genetic system for any given bacterium can be time-consuming. Here, we present a streamlined approach for identifying an effective transposon mutagenesis system for a new bacterium. Our strategy first involves the construction of hundreds of different transposon vector variants, which we term a “magic pool.” The efficacy of each vector in a magic pool is monitored in parallel using a unique DNA barcode that is introduced into each vector design. Using archived DNA “parts,” we next reassemble an effective vector for making a whole-genome transposon mutant library that is suitable for large-scale interrogation of gene function using competitive growth assays. Here, we demonstrate the utility of the magic pool system to make mutant libraries in five genera of bacteria.

  16. Magic Pools: Parallel Assessment of Transposon Delivery Vectors in Bacteria

    DOE PAGES

    Liu, Hualan; Price, Morgan N.; Waters, Robert Jordan; ...

    2018-01-16

    Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach for discovering the functions of bacterial genes. However, the development of a suitable TnSeq strategy for a given bacterium can be costly and time-consuming. To meet this challenge, we describe a part-based strategy for constructing libraries of hundreds of transposon delivery vectors, which we term “magic pools.” Within a magic pool, each transposon vector has a different combination of upstream sequences (promoters and ribosome binding sites) and antibiotic resistance markers as well as a random DNA barcode sequence, which allows the tracking of each vector during mutagenesis experiments. To identify an efficient vector for a given bacterium, we mutagenize it with a magic pool and sequence the resulting insertions; we then use this efficient vector to generate a large mutant library. We used the magic pool strategy to construct transposon mutant libraries in five genera of bacteria, including three genera of the phylum Bacteroidetes. IMPORTANCE: Molecular genetics is indispensable for interrogating the physiology of bacteria. However, the development of a functional genetic system for any given bacterium can be time-consuming. Here, we present a streamlined approach for identifying an effective transposon mutagenesis system for a new bacterium. Our strategy first involves the construction of hundreds of different transposon vector variants, which we term a “magic pool.” The efficacy of each vector in a magic pool is monitored in parallel using a unique DNA barcode that is introduced into each vector design. Using archived DNA “parts,” we next reassemble an effective vector for making a whole-genome transposon mutant library that is suitable for large-scale interrogation of gene function using competitive growth assays. Here, we demonstrate the utility of the magic pool system to make mutant libraries in five genera of bacteria.

  17. Billions of basepairs of recently expanded, repetitive sequences are eliminated from the somatic genome during copepod development.

    PubMed

    Sun, Cheng; Wyngaard, Grace; Walton, D Brian; Wichman, Holly A; Mueller, Rachel Lockridge

    2014-03-11

    Chromatin diminution is the programmed deletion of DNA from presomatic cell or nuclear lineages during development, producing single organisms that contain two different nuclear genomes. Phylogenetically diverse taxa undergo chromatin diminution: some ciliates, nematodes, copepods, and vertebrates. In cyclopoid copepods, chromatin diminution occurs in taxa with massively expanded germline genomes; depending on species, germline genome sizes range from 15–75 Gb, 12–74 Gb of which are lost from pre-somatic cell lineages at germline-soma differentiation. This is more than an order of magnitude more sequence than is lost from other taxa. To date, the sequences excised from copepods have not been analyzed using large-scale genomic datasets, and the processes underlying germline genomic gigantism in this clade, as well as the functional significance of chromatin diminution, have remained unknown. Here, we used high-throughput genomic sequencing and qPCR to characterize the germline and somatic genomes of Mesocyclops edax, a freshwater cyclopoid copepod with a germline genome of ~15 Gb and a somatic genome of ~3 Gb. We show that most of the excised DNA consists of repetitive sequences that are either 1) verifiable transposable elements (TEs), or 2) non-simple repeats of likely TE origin. Repeat elements in both genomes are skewed towards younger (i.e. less divergent) elements. Excised DNA is a non-random sample of the germline repeat element landscape; younger elements, and high frequency DNA transposons and LINEs, are disproportionately eliminated from the somatic genome. Our results suggest that germline genome expansion in M. edax reflects explosive repeat element proliferation, and that billions of base pairs of such repeats are deleted from the somatic genome every generation. 
Thus, we hypothesize that chromatin diminution is a mechanism that controls repeat element load, and that this load can evolve to be divergent between tissue types within single organisms.

  18. Billions of basepairs of recently expanded, repetitive sequences are eliminated from the somatic genome during copepod development

    PubMed Central

    2014-01-01

    Background: Chromatin diminution is the programmed deletion of DNA from presomatic cell or nuclear lineages during development, producing single organisms that contain two different nuclear genomes. Phylogenetically diverse taxa undergo chromatin diminution: some ciliates, nematodes, copepods, and vertebrates. In cyclopoid copepods, chromatin diminution occurs in taxa with massively expanded germline genomes; depending on species, germline genome sizes range from 15–75 Gb, 12–74 Gb of which are lost from pre-somatic cell lineages at germline-soma differentiation. This is more than an order of magnitude more sequence than is lost from other taxa. To date, the sequences excised from copepods have not been analyzed using large-scale genomic datasets, and the processes underlying germline genomic gigantism in this clade, as well as the functional significance of chromatin diminution, have remained unknown. Results: Here, we used high-throughput genomic sequencing and qPCR to characterize the germline and somatic genomes of Mesocyclops edax, a freshwater cyclopoid copepod with a germline genome of ~15 Gb and a somatic genome of ~3 Gb. We show that most of the excised DNA consists of repetitive sequences that are either 1) verifiable transposable elements (TEs), or 2) non-simple repeats of likely TE origin. Repeat elements in both genomes are skewed towards younger (i.e. less divergent) elements. Excised DNA is a non-random sample of the germline repeat element landscape; younger elements, and high frequency DNA transposons and LINEs, are disproportionately eliminated from the somatic genome. Conclusions: Our results suggest that germline genome expansion in M. edax reflects explosive repeat element proliferation, and that billions of base pairs of such repeats are deleted from the somatic genome every generation. 
Thus, we hypothesize that chromatin diminution is a mechanism that controls repeat element load, and that this load can evolve to be divergent between tissue types within single organisms. PMID:24618421

  19. Sequence-Specific Recognition of DNA by Proteins: Binding Motifs Discovered Using a Novel Statistical/Computational Analysis

    PubMed Central

    Jakubec, David; Laskowski, Roman A.; Vondrasek, Jiri

    2016-01-01

    Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue—amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein—DNA complexes by the means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774

  20. Single-molecule optical genome mapping of a human HapMap and a colorectal cancer cell line.

    PubMed

    Teo, Audrey S M; Verzotto, Davide; Yao, Fei; Nagarajan, Niranjan; Hillmer, Axel M

    2015-01-01

    Next-generation sequencing (NGS) technologies have changed our understanding of the variability of the human genome. However, the identification of genome structural variations based on NGS approaches with read lengths of 35-300 bases remains a challenge. Single-molecule optical mapping technologies allow the analysis of DNA molecules of up to 2 Mb and as such are suitable for the identification of large-scale genome structural variations, and for de novo genome assemblies when combined with short-read NGS data. Here we present optical mapping data for two human genomes: the HapMap cell line GM12878 and the colorectal cancer cell line HCT116. High molecular weight DNA was obtained by embedding GM12878 and HCT116 cells, respectively, in agarose plugs, followed by DNA extraction under mild conditions. Genomic DNA was digested with KpnI and 310,000 and 296,000 DNA molecules (≥ 150 kb and 10 restriction fragments), respectively, were analyzed per cell line using the Argus optical mapping system. Maps were aligned to the human reference by OPTIMA, a new glocal alignment method. Genome coverage of 6.8× and 5.7× was obtained, respectively; 2.9× and 1.7× more than the coverage obtained with previously available software. Optical mapping allows the resolution of large-scale structural variations of the genome, and the scaffold extension of NGS-based de novo assemblies. OPTIMA is an efficient new alignment method; our optical mapping data provide a resource for genome structure analyses of the human HapMap reference cell line GM12878, and the colorectal cancer cell line HCT116.

  1. Evolution via recombination: Cell-to-cell contact facilitates larger recombination events in Streptococcus pneumoniae.

    PubMed

    Cowley, Lauren A; Petersen, Fernanda C; Junges, Roger; Jimson D Jimenez, Med; Morrison, Donald A; Hanage, William P

    2018-06-01

    Homologous recombination in the genetic transformation model organism Streptococcus pneumoniae is thought to be important in the adaptation and evolution of this pathogen. While competent pneumococci are able to scavenge DNA added to laboratory cultures, large-scale transfers of multiple kb are rare under these conditions. We used whole genome sequencing (WGS) to map transfers in recombinants arising from contact of competent cells with non-competent 'target' cells, using strains with known genomes, distinguished by a total of ~16,000 SNPs. Experiments designed to explore the effect of environment on large scale recombination events used saturating purified donor DNA, short-term cell assemblages on Millipore filters, and mature biofilm mixed cultures. WGS of 22 recombinants for each environment mapped all SNPs that were identical between the recombinant and the donor but not the recipient. The mean recombination event size was found to be significantly larger in cell-to-cell contact cultures (4051 bp in filter assemblage and 3938 bp in biofilm co-culture versus 1815 bp with saturating DNA). Up to 5.8% of the genome was transferred, through 20 recombination events, to a single recipient, with the largest single event incorporating 29,971 bp. We also found that some recombination events are clustered, that these clusters are more likely to occur in cell-to-cell contact environments, and that they cause significantly increased linkage of genes as far apart as 60,000 bp. We conclude that pneumococcal evolution through homologous recombination is more likely to occur on a larger scale in environments that permit cell-to-cell contact.
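
    The SNP-based mapping described in this abstract (sites where the recombinant matches the donor but not the recipient) can be illustrated with a short Python sketch. This is not the authors' pipeline: the function name `donor_tracts`, the tuple layout, and the rule that consecutive donor-matching SNPs form one event are all assumptions made for illustration, and the positions in the usage note are invented.

```python
def donor_tracts(snps):
    """Group donor-derived SNPs into candidate recombination tracts.

    snps: iterable of (position, donor_allele, recipient_allele,
    recombinant_allele), sorted by position (hypothetical layout).
    Returns a list of (first_snp_pos, last_snp_pos) tuples, one per run of
    consecutive informative SNPs at which the recombinant carries the donor
    allele. The span between the outermost SNPs is only a minimum estimate
    of the true event size.
    """
    tracts = []
    start = end = None
    for pos, donor, recipient, recombinant in snps:
        if donor == recipient:
            continue  # site does not distinguish the two parental strains
        if recombinant == donor:
            if start is None:
                start = pos  # open a new tract
            end = pos
        elif start is not None:
            tracts.append((start, end))  # a recipient allele closes the tract
            start = end = None
    if start is not None:
        tracts.append((start, end))
    return tracts


def min_tract_sizes(tracts):
    """Minimum size in bp of each tract (inclusive of both boundary SNPs)."""
    return [end - start + 1 for start, end in tracts]
```

    With the invented input `[(100, "A", "G", "G"), (250, "C", "T", "C"), (900, "G", "A", "G"), (1500, "T", "C", "C"), (2100, "A", "T", "A")]`, the sketch reports two tracts, one spanning positions 250 to 900 and one consisting of the single SNP at 2100.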

  2. CpG PatternFinder: a Windows-based utility program for easy and rapid identification of the CpG methylation status of DNA.

    PubMed

    Xu, Yi-Hua; Manoharan, Herbert T; Pitot, Henry C

    2007-09-01

    The bisulfite genomic sequencing technique is one of the most widely used techniques to study sequence-specific DNA methylation because of its unambiguous ability to reveal DNA methylation status at single-nucleotide resolution. One characteristic feature of the bisulfite genomic sequencing technique is that a number of sample sequence files will be produced from a single DNA sample. The PCR products of bisulfite-treated DNA samples cannot be sequenced directly because they are heterogeneous in nature; therefore they should be cloned into suitable plasmids and then sequenced. This procedure generates an enormous number of sample DNA sequence files and adds extra bases belonging to the plasmids to the sequence, which causes problems in the final sequence comparison. Finding the methylation status for each CpG in each sample sequence is not an easy job. CpG PatternFinder was developed for this purpose. The main functions of CpG PatternFinder are: (i) to analyze the reference sequence to obtain CpG and non-CpG-C residue position information; (ii) to tailor sample sequence files (delete insertions and mark deletions from the sample sequence files) based on a configuration of ClustalW multiple alignment; (iii) to align sample sequence files with a reference file to obtain bisulfite conversion efficiency and CpG methylation status; and (iv) to produce graphics, highlighted aligned sequence text and a summary report which can be easily exported to the Microsoft Office suite. CpG PatternFinder is designed to operate cooperatively with BioEdit, a freeware program available on the internet. It can handle up to 100 files of sample DNA sequences simultaneously, and the total CpG pattern analysis process can be finished in minutes. CpG PatternFinder is an ideal software tool for DNA methylation studies to determine the differential methylation pattern in a large number of individuals in a population. Previously we developed the CpG Analyzer program; CpG PatternFinder is our further effort to create software tools for DNA methylation studies.
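
    Functions (i) and (iii) listed in this abstract can be sketched in a few lines of Python. This is an illustration of the general bisulfite logic, not the CpG PatternFinder code: the function name `cpg_methylation` is hypothetical, and the sketch assumes the sample read has already been tailored to reference coordinates (same length as the reference, deletions marked with `-`, no gaps in the reference) and reads the converted top strand, so that an unmethylated C appears as T.

```python
def cpg_methylation(reference, sample):
    """Call CpG methylation from one bisulfite read aligned to its reference.

    Returns (cpg_status, conversion_efficiency):
      cpg_status maps the 0-based position of the C in each reference CpG to
      "methylated" (read shows C), "unmethylated" (read shows T) or "unknown".
      conversion_efficiency is the fraction of non-CpG cytosines read as T,
      or None when no non-CpG cytosine is informative.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have equal length")
    ref, smp = reference.upper(), sample.upper()

    cpg_status = {}
    converted = 0    # non-CpG C read as T: conversion succeeded
    unconverted = 0  # non-CpG C still read as C: conversion failed

    for i, base in enumerate(ref):
        if base != "C":
            continue
        is_cpg = i + 1 < len(ref) and ref[i + 1] == "G"
        call = smp[i]
        if is_cpg:
            if call == "C":
                cpg_status[i] = "methylated"
            elif call == "T":
                cpg_status[i] = "unmethylated"
            else:
                cpg_status[i] = "unknown"  # deletion mark or sequencing error
        elif call == "T":
            converted += 1
        elif call == "C":
            unconverted += 1

    informative = converted + unconverted
    efficiency = converted / informative if informative else None
    return cpg_status, efficiency
```

    For the toy pair `reference = "ACGTCCATCGA"` and `sample = "ACGTTTATTGA"`, the CpG at position 1 is called methylated, the CpG at position 8 unmethylated, and both non-CpG cytosines are converted, giving an efficiency of 1.0.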

  3. Lesion bypass activity of DNA polymerase θ (POLQ) is an intrinsic property of the pol domain and depends on unique sequence inserts.

    PubMed

    Hogg, Matthew; Seki, Mineaki; Wood, Richard D; Doublié, Sylvie; Wallace, Susan S

    2011-01-21

    DNA polymerase θ (POLQ, polθ) is a large, multidomain DNA polymerase encoded in higher eukaryotic genomes. It is important for maintaining genetic stability in cells and helping protect cells from DNA damage caused by ionizing radiation. POLQ contains an N-terminal helicase-like domain, a large central domain of indeterminate function, and a C-terminal polymerase domain with sequence similarity to the A-family of DNA polymerases. The enzyme has several unique properties, including low fidelity and the ability to insert and extend past abasic sites and thymine glycol lesions. It is not known whether the abasic site bypass activity is an intrinsic property of the polymerase domain or whether helicase activity is also required. Three "insertion" sequence elements present in POLQ are not found in any other A-family DNA polymerase, and it has been proposed that they may lend some unique properties to POLQ. Here, we analyzed the activity of the DNA polymerase in the absence of each sequence insertion. We found that the pol domain is capable of highly efficient bypass of abasic sites in the absence of the helicase-like or central domains. Insertion 1 increases the processivity of the polymerase but has little, if any, bearing on the translesion synthesis properties of the enzyme. However, removal of insertions 2 and 3 reduces activity on undamaged DNA and completely abrogates the ability of the enzyme to bypass abasic sites or thymine glycol lesions. Copyright © 2010 Elsevier Ltd. All rights reserved.

  4. Complete sequence analysis of 18S rDNA based on genomic DNA extraction from individual Demodex mites (Acari: Demodicidae).

    PubMed

    Zhao, Ya-E; Xu, Ji-Ru; Hu, Li; Wu, Li-Ping; Wang, Zheng-Hang

    2012-05-01

    This study attempted, for the first time, complete sequence amplification and analysis of 18S ribosomal DNA (rDNA) for three Demodex species (Demodex folliculorum, Demodex brevis and Demodex canis) based on gDNA extraction from individual mites. The mites were treated with DNA Release Additive and Hot Start II DNA Polymerase so as to promote mite disruption and increase PCR specificity. Determination of D. folliculorum gDNA showed that the gDNA yield was highest at 1 mite and tended to decrease as the number of mites increased. The individual mite gDNA was successfully used for 18S rDNA fragment (about 900 bp) amplification examination. The alignments of 18S rDNA complete sequences of individual mite samples and those of pooled mite samples (≥1000 mites/sample) showed over 97% identities for each species, indicating that the gDNA extracted from a single individual mite was as satisfactory as that from pooled mites for PCR amplification. Further pairwise sequence analyses showed that average divergence, genetic distance, transition/transversion or phylogenetic tree could not effectively identify the three Demodex species, largely due to the differentiation in the D. canis isolates. It can be concluded that the individual Demodex mite gDNA can satisfy the molecular study of Demodex. 18S rDNA complete sequence is suitable for interfamily identification in Cheyletoidea, but whether it is suitable for intrafamily identification cannot be confirmed until the ascertainment of the types of Demodex mites parasitizing in dogs. Copyright © 2012 Elsevier Inc. All rights reserved.

  5. From cheek swabs to consensus sequences: an A to Z protocol for high-throughput DNA sequencing of complete human mitochondrial genomes

    PubMed Central

    2014-01-01

    Background Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users. Results Here we present an ‘A to Z’ protocol for obtaining complete human mitochondrial (mtDNA) genomes – from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling). Conclusions All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual ‘modules’ can be swapped out to suit available resources. PMID:24460871

  6. A High Quality Draft Consensus Sequence of the Genome of a Heterozygous Grapevine Variety

    PubMed Central

    Cartwright, Dustin A.; Cestaro, Alessandro; Pruss, Dmitry; Pindo, Massimo; FitzGerald, Lisa M.; Vezzulli, Silvia; Reid, Julia; Malacarne, Giulia; Iliev, Diana; Coppola, Giuseppina; Wardell, Bryan; Micheletti, Diego; Macalma, Teresita; Facci, Marco; Mitchell, Jeff T.; Perazzolli, Michele; Eldredge, Glenn; Gatto, Pamela; Oyzerski, Rozan; Moretto, Marco; Gutin, Natalia; Stefanini, Marco; Chen, Yang; Segala, Cinzia; Davenport, Christine; Demattè, Lorenzo; Mraz, Amy; Battilana, Juri; Stormo, Keith; Costa, Fabrizio; Tao, Quanzhou; Si-Ammour, Azeddine; Harkins, Tim; Lackey, Angie; Perbost, Clotilde; Taillon, Bruce; Stella, Alessandra; Solovyev, Victor; Fawcett, Jeffrey A.; Sterck, Lieven; Vandepoele, Klaas; Grando, Stella M.; Toppo, Stefano; Moser, Claudio; Lanchbury, Jerry; Bogden, Robert; Skolnick, Mark; Sgaramella, Vittorio; Bhatnagar, Satish K.; Fontana, Paolo; Gutin, Alexander; Van de Peer, Yves; Salamini, Francesco; Viola, Roberto

    2007-01-01

    Background Worldwide, grapes and their derived products have a large market. The cultivated grape species Vitis vinifera has potential to become a model for fruit tree genetics. Like many plant species, it is highly heterozygous, which is an additional challenge to modern whole genome shotgun sequencing. In this paper a high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir is presented. Principal Findings We estimate the genome size of V. vinifera to be 504.6 Mb. Genomic sequences corresponding to 477.1 Mb were assembled in 2,093 metacontigs and 435.1 Mb were anchored to the 19 linkage groups (LGs). The number of predicted genes is 29,585, of which 96.1% were assigned to LGs. This assembly of the grape genome provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens. Single nucleotide polymorphism (SNP) distribution was consistent with a diffuse haplotype structure across the genome. Of around 2,000,000 SNPs, 1,751,176 were mapped to chromosomes and one or more of them were identified in 86.7% of anchored genes. The relative age of grape duplicated genes was estimated, and this made it possible to reveal a relatively recent Vitis-specific large scale duplication event concerning at least 10 chromosomes (duplication not reported before). Conclusions Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), together with dedicated assembly programs, resolved a complex heterozygous genome. A consensus sequence of the genome and a set of mapped marker loci were generated. Homologous chromosomes of Pinot Noir differ by 11.2% of their DNA (hemizygous DNA plus chromosomal gaps). SNP markers are offered as a tool with the potential of introducing a new era in the molecular breeding of grape. PMID:18094749

  7. Retroviral DNA Integration Directed by HIV Integration Protein in Vitro

    NASA Astrophysics Data System (ADS)

    Bushman, Frederic D.; Fujiwara, Tamio; Craigie, Robert

    1990-09-01

    Efficient retroviral growth requires integration of a DNA copy of the viral RNA genome into a chromosome of the host. As a first step in analyzing the mechanism of integration of human immunodeficiency virus (HIV) DNA, a cell-free system was established that models the integration reaction. The in vitro system depends on the HIV integration (IN) protein, which was partially purified from insect cells engineered to express IN protein in large quantities. Integration was detected in a biological assay that scores the insertion of a linear DNA containing HIV terminal sequences into a λ DNA target. Some integration products generated in this assay contained five-base pair duplications of the target DNA at the recombination junctions, a characteristic of HIV integration in vivo; the remaining products contained aberrant junctional sequences that may have been produced in a variation of the normal reaction. These results indicate that HIV IN protein is the only viral protein required to insert model HIV DNA sequences into a target DNA in vitro.

  8. MendeLIMS: a web-based laboratory information management system for clinical genome sequencing.

    PubMed

    Grimes, Susan M; Ji, Hanlee P

    2014-08-27

    Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis. To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system is referred to as MendeLIMS, is easily implemented with open source tools and is also highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.

  9. Miniaturization Technologies for Efficient Single-Cell Library Preparation for Next-Generation Sequencing.

    PubMed

    Mora-Castilla, Sergio; To, Cuong; Vaezeslami, Soheila; Morey, Robert; Srinivasan, Srimeenakshi; Dumdie, Jennifer N; Cook-Andersen, Heidi; Jenkins, Joby; Laurent, Louise C

    2016-08-01

    As the cost of next-generation sequencing has decreased, library preparation costs have become a more significant proportion of the total cost, especially for high-throughput applications such as single-cell RNA profiling. Here, we have applied novel technologies to scale down reaction volumes for library preparation. Our system consisted of in vitro differentiated human embryonic stem cells representing two stages of pancreatic differentiation, for which we prepared multiple biological and technical replicates. We used the Fluidigm (San Francisco, CA) C1 single-cell Autoprep System for single-cell complementary DNA (cDNA) generation and an enzyme-based tagmentation system (Nextera XT; Illumina, San Diego, CA) with a nanoliter liquid handler (mosquito HTS; TTP Labtech, Royston, UK) for library preparation, reducing the reaction volume down to 2 µL and using as little as 20 pg of input cDNA. The resulting sequencing data were bioinformatically analyzed and correlated among the different library reaction volumes. Our results showed that decreasing the reaction volume did not interfere with the quality or the reproducibility of the sequencing data, and the transcriptional data from the scaled-down libraries allowed us to distinguish between single cells. Thus, we have developed a process to enable efficient and cost-effective high-throughput single-cell transcriptome sequencing. © 2016 Society for Laboratory Automation and Screening.

  10. Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe.

    PubMed

    Necci, Marco; Piovesan, Damiano; Tosatto, Silvio C E

    2016-12-01

    Intrinsic disorder (ID) in proteins has been extensively described for the last decade; a large-scale classification of ID in proteins is mostly missing. Here, we provide an extensive analysis of ID in the protein universe on the UniProt database derived from sequence-based predictions in MobiDB. Almost half the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of over 20 residues which are more abundant in Eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered and mostly found in Viruses. Most proteins have only one ID, with short ID evenly distributed along the sequence and long ID overrepresented in the center. The charged residue composition of Das and Pappu was used to classify ID proteins by structural propensities and corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both Prokaryotes and Eukaryotes. In Bacteria, they are confined in the nucleoid and in Viruses provide DNA binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in Eukaryotes but are involved in killing other organisms and cytolysis in Bacteria. The Undefined class is used by Bacteria to bind toxic substances and mediate transport and movement between and within organisms in Viruses. Fully disordered proteins behave similarly, but are enriched for glycine residues and extracellular structures. © 2016 The Protein Society.
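
    The region-length thresholds quoted in this abstract (ID regions of at least five residues, long regions of over 20 residues, fully disordered proteins) lend themselves to a small Python sketch. It does not reproduce MobiDB's file format or predictors: the per-residue string of `D` (disordered) and `S` (structured) flags, the function names, and the choice to count only regions of at least five residues towards the disordered fraction are assumptions made for illustration.

```python
from itertools import groupby


def disorder_regions(flags, min_len=5):
    """Return 0-based inclusive (start, end) spans of disordered runs.

    flags: string with one character per residue, 'D' for predicted
    disorder and anything else for order (hypothetical encoding).
    Runs shorter than min_len residues are ignored.
    """
    regions = []
    pos = 0
    for is_disordered, run in groupby(flags, key=lambda c: c == "D"):
        n = len(list(run))
        if is_disordered and n >= min_len:
            regions.append((pos, pos + n - 1))
        pos += n
    return regions


def summarize_disorder(flags, long_len=20):
    """Summarize one protein using the thresholds quoted in the abstract."""
    regions = disorder_regions(flags)
    lengths = [end - start + 1 for start, end in regions]
    return {
        "n_regions": len(regions),
        "has_long_region": any(n > long_len for n in lengths),  # over 20 residues
        "disordered_fraction": sum(lengths) / len(flags) if flags else 0.0,
        "fully_disordered": bool(flags) and set(flags) == {"D"},
    }
```

    A 100-residue toy protein with one 25-residue disordered run and one 3-residue run yields a single region, flagged as long, covering 25% of the sequence; the 3-residue run falls below the five-residue cut-off and is ignored.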

  11. A New Perspective on Polyploid Fragaria (Strawberry) Genome Composition Based on Large-Scale, Multi-Locus Phylogenetic Analysis.

    PubMed

    Yang, Yilong; Davis, Thomas M

    2017-12-01

    The subgenomic compositions of the octoploid (2n = 8× = 56) strawberry (Fragaria) species, including the economically important cultivated species Fragaria x ananassa, have been a topic of long-standing interest. Phylogenomic approaches utilizing next-generation sequencing technologies offer a new window into species relationships and the subgenomic compositions of polyploids. We have conducted a large-scale phylogenetic analysis of Fragaria (strawberry) species using the Fluidigm Access Array system and 454 sequencing platform. About 24 single-copy or low-copy nuclear genes distributed across the genome were amplified and sequenced from 96 genomic DNA samples representing 16 Fragaria species from diploid (2×) to decaploid (10×), including the most extensive sampling of octoploid taxa yet reported. Individual gene trees were constructed by different tree-building methods. Mosaic genomic structures of diploid Fragaria species consisting of sequences at different phylogenetic positions were observed. Our findings support the presence in octoploid species of genetic signatures from at least five diploid ancestors (F. vesca, F. iinumae, F. bucharica, F. viridis, and at least one additional allele contributor of unknown identity), and question the extent to which distinct subgenomes are preserved over evolutionary time in the allopolyploid Fragaria species. In addition, our data support divergence between the two wild octoploid species, F. virginiana and F. chiloensis. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  12. Is the extraction by Whatman FTA filter matrix technology and sequencing of large ribosomal subunit D1-D2 region sufficient for identification of clinical fungi?

    PubMed

    Kiraz, Nuri; Oz, Yasemin; Aslan, Huseyin; Erturan, Zayre; Ener, Beyza; Akdagli, Sevtap Arikan; Muslumanoglu, Hamza; Cetinkaya, Zafer

    2015-10-01

    Although conventional identification of pathogenic fungi is based on a combination of tests evaluating their morphological and biochemical characteristics, these tests can fail to identify less common species or to differentiate closely related species. In addition, these tests are time consuming, labour-intensive and require experienced personnel. We evaluated the feasibility and sufficiency of DNA extraction by Whatman FTA filter matrix technology and DNA sequencing of the D1-D2 region of the large ribosomal subunit gene for identification of clinical isolates of 21 yeasts and 160 moulds in our clinical mycology laboratory. While the yeast isolates were identified at species level with 100% homology, 102 (63.75%) clinically important mould isolates were identified at species level and 56 (35%) isolates at genus level against fungal sequences existing in DNA databases, and two (1.25%) isolates could not be identified. Consequently, Whatman FTA filter matrix technology was a useful method for extraction of fungal DNA: extremely rapid, practical and successful. Sequence analysis of the D1-D2 region of the large ribosomal subunit gene was found to be largely sufficient for identification to genus level for most clinical fungi. However, identification to species level, and especially discrimination of closely related species, may require additional analysis. © 2015 Blackwell Verlag GmbH.

  13. Biompha-LAMP: A New Rapid Loop-Mediated Isothermal Amplification Assay for Detecting Schistosoma mansoni in Biomphalaria glabrata Snail Host.

    PubMed

    Gandasegui, Javier; Fernández-Soto, Pedro; Hernández-Goenaga, Juan; López-Abán, Julio; Vicente, Belén; Muro, Antonio

    2016-12-01

    Schistosomiasis remains one of the most common endemic parasitic diseases affecting over 230 million people worldwide. Schistosoma mansoni is the main species causing intestinal and hepatic schistosomiasis and the fresh water pulmonate snails of the genus Biomphalaria are best known for their role as intermediate hosts of the parasite. The development of new molecular monitoring assays for large-scale screening of snails from transmission sites to detect the presence of schistosomes is an important point to consider for snail control interventions related to schistosomiasis elimination. Our work was focussed on developing and evaluating a new LAMP assay combined with a simple DNA extraction method to detect S. mansoni in experimentally infected snails as a diagnostic tool for field conditions. A LAMP assay using a set of six primers targeting a sequence of S. mansoni ribosomal intergenic spacer 28S-18S rRNA was designed. The detection limit of the LAMP assay was 0.1 fg of S. mansoni DNA at 63°C for 50 minutes. LAMP was evaluated by examining S. mansoni DNA in B. glabrata snails experimentally exposed to miracidia at different times post-exposure: early prepatent period (before cercarial shedding), light infections (snails exposed to a low number of miracidia) and detection of infected snails in pooled samples (within a group of uninfected snails). DNA for LAMP assays was obtained by using a commercial DNA extraction kit or a simple heat NaOH extraction method. We detected S. mansoni DNA in all groups of snails without requiring a complicated DNA extraction procedure. Our LAMP assay, named Biompha-LAMP, is specific, sensitive, rapid and potentially adaptable as a cost-effective method for screening of intermediate hosts infected with S. mansoni in both individual snails and pooled samples. The assay could be suitable for large-scale field surveys for schistosome control campaigns in endemic areas.

  14. Stochastic properties of radiation-induced DSB: DSB distributions in large scale chromatin loops, the HPRT gene and within the visible volumes of DNA repair foci.

    PubMed

    Ponomarev, Artem L; Costes, Sylvain V; Cucinotta, Francis A

    2008-11-01

    We computed probabilities to have multiple double-strand breaks (DSB), which are produced in DNA on a regional scale, and not in close vicinity, in volumes matching the size of DNA damage foci, of a large chromatin loop, and in the physical volume of DNA containing the HPRT (human hypoxanthine phosphoribosyltransferase) locus. The model is based on a Monte Carlo description of DSB formation by heavy ions in the spatial context of the entire human genome contained within the cell nucleus, as well as at the gene sequence level. We showed that a finite physical volume corresponding to a visible DNA repair focus, believed to be associated with one DSB, can contain multiple DSB due to heavy ion track structure and the DNA supercoiled topography. A corrective distribution was introduced, which was a conditional probability to have excess DSB in a focus volume, given that there was already one present. The corrective distribution was calculated for 19.5 MeV/amu N ions, 3.77 MeV/amu alpha-particles, 1000 MeV/amu Fe ions, and X-rays. The corrected initial DSB yield from the experimental data on DNA repair foci was calculated. The DSB yield based on the corrective function converts the focus yield into the DSB yield, which is comparable with the DSB yield based on the earlier PFGE experiments. The distribution of DSB within the physical limits of the HPRT gene was analyzed by a similar method as well. This corrective procedure shows the applicability of the model and empowers the researcher with a tool to better analyze focus statistics. The model enables researchers to analyze the DSB yield based on focus statistics in real experimental situations that lack one-to-one focus-to-DSB correspondence.
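
    The corrective distribution described in this abstract is a conditional probability: the chance of finding excess DSB in a focus volume given that at least one is present. The Python sketch below shows how such a distribution could be tabulated from simulated per-volume DSB counts and used to convert a focus yield into a DSB yield. It is not the authors' Monte Carlo track-structure model: the function names are hypothetical, the counts are assumed to come from some simulation that is not shown, and the numbers in the usage note are invented.

```python
from collections import Counter


def excess_dsb_distribution(counts):
    """Empirical P(excess = k | at least one DSB in the focus volume).

    counts: DSB counts per sampled focus-sized volume, e.g. from a Monte
    Carlo run (assumed input). Volumes with zero DSB form no focus and
    are excluded by the conditioning.
    """
    occupied = [n for n in counts if n >= 1]
    if not occupied:
        raise ValueError("no volume contains a DSB")
    tally = Counter(n - 1 for n in occupied)  # excess beyond the first DSB
    return {k: tally[k] / len(occupied) for k in sorted(tally)}


def corrected_dsb_yield(focus_yield, counts):
    """Scale an observed focus yield by the mean number of DSB per focus."""
    occupied = [n for n in counts if n >= 1]
    if not occupied:
        raise ValueError("no volume contains a DSB")
    dsb_per_focus = sum(occupied) / len(occupied)
    return focus_yield * dsb_per_focus
```

    For the invented counts `[0, 0, 1, 1, 1, 2, 1, 3, 0, 1]`, seven volumes hold at least one DSB; five of them hold exactly one, so the probability of no excess is 5/7, and the mean of 10/7 DSB per focus is the factor by which the focus yield would be scaled.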

  15. Biompha-LAMP: A New Rapid Loop-Mediated Isothermal Amplification Assay for Detecting Schistosoma mansoni in Biomphalaria glabrata Snail Host

    PubMed Central

    Gandasegui, Javier; Fernández-Soto, Pedro; Hernández-Goenaga, Juan; López-Abán, Julio; Vicente, Belén; Muro, Antonio

    2016-01-01

    Background Schistosomiasis remains one of the most common endemic parasitic diseases affecting over 230 million people worldwide. Schistosoma mansoni is the main species causing intestinal and hepatic schistosomiasis and the fresh water pulmonate snails of the genus Biomphalaria are best known for their role as intermediate hosts of the parasite. The development of new molecular monitoring assays for large-scale screening of snails from transmission sites to detect the presence of schistosomes is an important point to consider for snail control interventions related to schistosomiasis elimination. Our work was focussed on developing and evaluating a new LAMP assay combined with a simple DNA extraction method to detect S. mansoni in experimentally infected snails as a diagnostic tool for field conditions. Methodology/Principal findings A LAMP assay using a set of six primers targeting a sequence of S. mansoni ribosomal intergenic spacer 28S-18S rRNA was designed. The detection limit of the LAMP assay was 0.1 fg of S. mansoni DNA at 63°C for 50 minutes. LAMP was evaluated by examining S. mansoni DNA in B. glabrata snails experimentally exposed to miracidia at different times post-exposure: early prepatent period (before cercarial shedding), light infections (snails exposed to a low number of miracidia) and detection of infected snails in pooled samples (within a group of uninfected snails). DNA for LAMP assays was obtained by using a commercial DNA extraction kit or a simple heat NaOH extraction method. We detected S. mansoni DNA in all groups of snails without requiring a complicated DNA extraction procedure. Conclusions/Significance Our LAMP assay, named Biompha-LAMP, is specific, sensitive, rapid and potentially adaptable as a cost-effective method for screening of intermediate hosts infected with S. mansoni in both individual snails and pooled samples. The assay could be suitable for large-scale field surveys for schistosome control campaigns in endemic areas. PMID:27941967

  16. Comparing COI and ITS as DNA barcode markers for mushrooms and allies (Agaricomycotina).

    PubMed

    Dentinger, Bryn T M; Didukh, Maryna Y; Moncalvo, Jean-Marc

    2011-01-01

    DNA barcoding is an approach to rapidly identify species using short, standard genetic markers. The mitochondrial cytochrome oxidase I gene (COI) has been proposed as the universal barcode locus, but its utility for barcoding in mushrooms (ca. 20,000 species) has not been established. We succeeded in generating 167 partial COI sequences (~450 bp) representing ~100 morphospecies from ~650 collections of Agaricomycotina using several sets of new primers. Large introns (~1500 bp) at variable locations were detected in ~5% of the sequences we obtained. We suspect that widespread presence of large introns is responsible for our low PCR success (~30%) with this locus. We also sequenced the nuclear internal transcribed spacer rDNA regions (ITS) to compare with COI. Among the small proportion of taxa for which COI could be sequenced, COI and ITS perform similarly as a barcode. However, in a densely sampled set of closely related taxa, COI was less divergent than ITS and failed to distinguish all terminal clades. Given our results and the wealth of ITS data already available in public databases, we recommend that COI be abandoned in favor of ITS as the primary DNA barcode locus in mushrooms.

  17. Comparing COI and ITS as DNA Barcode Markers for Mushrooms and Allies (Agaricomycotina)

    PubMed Central

    Dentinger, Bryn T. M.; Didukh, Maryna Y.; Moncalvo, Jean-Marc

    2011-01-01

    DNA barcoding is an approach to rapidly identify species using short, standard genetic markers. The mitochondrial cytochrome oxidase I gene (COI) has been proposed as the universal barcode locus, but its utility for barcoding in mushrooms (ca. 20,000 species) has not been established. We succeeded in generating 167 partial COI sequences (∼450 bp) representing ∼100 morphospecies from ∼650 collections of Agaricomycotina using several sets of new primers. Large introns (∼1500 bp) at variable locations were detected in ∼5% of the sequences we obtained. We suspect that widespread presence of large introns is responsible for our low PCR success (∼30%) with this locus. We also sequenced the nuclear internal transcribed spacer rDNA regions (ITS) to compare with COI. Among the small proportion of taxa for which COI could be sequenced, COI and ITS perform similarly as a barcode. However, in a densely sampled set of closely related taxa, COI was less divergent than ITS and failed to distinguish all terminal clades. Given our results and the wealth of ITS data already available in public databases, we recommend that COI be abandoned in favor of ITS as the primary DNA barcode locus in mushrooms. PMID:21966418

  18. Report on the Human Genome Initiative for the Office of Health and Environmental Research

    DOE R&D Accomplishments Database

    Tinoco, I.; Cahill, G.; Cantor, C.; Caskey, T.; Dulbecco, R.; Engelhardt, D. L.; Hood, L.; Lerman, L. S.; Mendelsohn, M. L.; Sinsheimer, R. L.; Smith, T.; Soll, D.; Stormo, G.; White, R. L.

    1987-04-01

    The report urges DOE and the Nation to commit to a large, multi-year, multidisciplinary, technological undertaking to order and sequence the human genome. This effort will first require significant innovation in general capability to manipulate DNA, major new analytical methods for ordering and sequencing, theoretical developments in computer science and mathematical biology, and great expansions in our ability to store and manipulate the information and to interface it with other large and diverse genetic databases. The actual ordering and sequencing involves the coordinated processing of some 3 billion bases from a reference human genome. Science is poised on the rudimentary edge of being able to read and understand human genes. A concerted, broadly based, scientific effort to provide new methods of sufficient power and scale should transform this activity from an inefficient one-gene-at-a-time, single laboratory effort into a coordinated, worldwide, comprehensive reading of "the book of man". The effort will be extraordinary in scope and magnitude, but so will be the benefit to biological understanding, new technology and the diagnosis and treatment of human disease.

  19. A Dual-Mode Large-Arrayed CMOS ISFET Sensor for Accurate and High-Throughput pH Sensing in Biomedical Diagnosis.

    PubMed

    Huang, Xiwei; Yu, Hao; Liu, Xu; Jiang, Yu; Yan, Mei; Wu, Dongping

    2015-09-01

    The existing ISFET-based DNA sequencing detects hydrogen ions released during the polymerization of DNA strands on microbeads, which are scattered into a microwell array above the ISFET sensor with unknown distribution. However, false pH detection happens at empty microwells due to crosstalk from neighboring microbeads. In this paper, a dual-mode CMOS ISFET sensor is proposed to achieve accurate pH detection for DNA sequencing. Dual-mode sensing, with optical and chemical modes, is realized by integrating a CMOS image sensor (CIS) with an ISFET pH sensor, and is fabricated in a standard 0.18-μm CIS process. With accurate determination of microbead physical locations with CIS pixels by contact imaging, the dual-mode sensor can correlate local pH for one DNA slice at one location-determined microbead, which can result in improved pH detection accuracy. Moreover, toward high-throughput DNA sequencing, a correlated-double-sampling readout that supports large arrays for both modes is deployed to reduce pixel-to-pixel nonuniformity such as threshold voltage mismatch. The proposed CMOS dual-mode sensor is experimentally examined to show a well correlated pH map and optical image for microbeads with a pH sensitivity of 26.2 mV/pH, a fixed pattern noise (FPN) reduction from 4% to 0.3%, and a readout speed of 1200 frames/s. A dual-mode CMOS ISFET sensor with suppressed FPN for accurate large-arrayed pH sensing is proposed and demonstrated with state-of-the-art measured results toward accurate and high-throughput DNA sequencing. The developed dual-mode CMOS ISFET sensor has great potential for future personal genome diagnostics with high accuracy and low cost.
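
    The idea behind correlated double sampling can be illustrated with a minimal sketch. It assumes a simplified model in which each pixel adds a fixed offset (for example from threshold voltage mismatch) to both its reset sample and its signal sample; the offsets, the signal level, and the function name are hypothetical values chosen for illustration and are not taken from the paper.

```python
from statistics import pstdev


def correlated_double_sample(reset_frame, signal_frame):
    """Subtract each pixel's reset sample from its signal sample.

    Any offset that is identical in both samples of a pixel cancels out.
    """
    return [s - r for r, s in zip(reset_frame, signal_frame)]


# Hypothetical per-pixel offsets in volts (illustrative only).
offsets = [0.10, -0.05, 0.02, -0.08]
true_signal = 0.50  # same underlying signal at every pixel

reset = list(offsets)                        # reset sample: offset only
signal = [o + true_signal for o in offsets]  # signal sample: offset + signal
cds = correlated_double_sample(reset, signal)

raw_spread = pstdev(signal)  # pixel-to-pixel spread before subtraction
cds_spread = pstdev(cds)     # spread after subtraction (close to zero)
```

    In this idealized model the subtraction removes the fixed pattern entirely; a real readout would still show residual noise that is not common to both samples.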

  20. Biological nanopore MspA for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Manrao, Elizabeth A.

    Unlocking the information hidden in the human genome provides insight into the inner workings of complex biological systems and can be used to greatly improve health-care. In order to allow for widespread sequencing, new technologies are required that provide fast and inexpensive readings of DNA. Nanopore sequencing is a third generation DNA sequencing technology that is currently being developed to fulfill this need. In nanopore sequencing, a voltage is applied across a small pore in an electrolyte solution and the resulting ionic current is recorded. When DNA passes through the channel, the ionic current is partially blocked. If the DNA bases uniquely modulate the ionic current flowing through the channel, the time trace of the current can be related to the sequence of DNA passing through the pore. There are two main challenges to realizing nanopore sequencing: identifying a pore with sensitivity to single nucleotides and controlling the translocation of DNA through the pore so that the small single nucleotide current signatures are distinguishable from background noise. In this dissertation, I explore the use of Mycobacterium smegmatis porin A (MspA) for nanopore sequencing. In order to determine MspA's sensitivity to single nucleotides, DNA strands of various compositions are held in the pore as the resulting ionic current is measured. DNA is immobilized in MspA by attaching it to a large molecule which acts as an anchor. This technique confirms the single nucleotide resolution of the pore and additionally shows that MspA is sensitive to epigenetic modifications and single nucleotide polymorphisms. The forces from the electric field within MspA, the effective charge of nucleotides, and elasticity of DNA are estimated using a Freely Jointed Chain model of single stranded DNA. These results offer insight into the interactions of DNA within the pore. 
With the nucleotide sensitivity of MspA confirmed, a method is introduced to controllably pass DNA through the pore. Using a DNA polymerase, DNA strands are stepped through MspA one nucleotide at a time. The steps are observable as distinct levels on the ionic-current time-trace and are related to the DNA sequence. These experiments overcome the two fundamental challenges to realizing MspA nanopore sequencing and pave the way to the development of a commercial technology.

  1. DNA barcodes for two scale insect families, mealybugs (Hemiptera: Pseudococcidae) and armored scales (Hemiptera: Diaspididae).

    PubMed

    Park, D-S; Suh, S-J; Hebert, P D N; Oh, H-W; Hong, K-J

    2011-08-01

    Although DNA barcode coverage has grown rapidly for many insect orders, there are some groups, such as scale insects, where sequence recovery has been difficult. However, using a recently developed primer set, we recovered barcode records from 373 specimens, providing coverage for 75 species from 31 genera in two families. Overall success was >90% for mealybugs and >80% for armored scale species. The G·C content was very low in most species, averaging just 16.3%. Sequence divergences (K2P) between congeneric species averaged 10.7%, while intra-specific divergences averaged 0.97%. However, the latter value was inflated by high intra-specific divergence in nine taxa, cases that may indicate species overlooked by current taxonomic treatments. Our study establishes the feasibility of developing a comprehensive barcode library for scale insects and indicates that its construction will both create an effective system for identifying scale insects and reveal taxonomic situations worthy of deeper analysis.
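
    The two summary statistics quoted above, K2P divergence and G·C content, can be computed from aligned sequences as in the following minimal sketch. It uses the standard Kimura 2-parameter formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions; the function names and the handling of gaps and ambiguous bases are assumptions for illustration, not the authors' pipeline.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (an assumption)
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q)
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)


def gc_content(seq):
    """Fraction of unambiguous bases that are G or C."""
    bases = [c for c in seq.upper() if c in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases")
    return sum(c in "GC" for c in bases) / len(bases)
```

    Averaging k2p_distance over pairs of congeneric species, or over pairs of specimens within a species, would give figures comparable in kind to the 10.7% and 0.97% reported here.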

  2. Evaluating the feasibility of using candidate DNA barcodes in discriminating species of the large Asteraceae family

    PubMed Central

    2010-01-01

    Background Five DNA regions, namely, rbcL, matK, ITS, ITS2, and psbA-trnH, have been recommended as primary DNA barcodes for plants. Studies evaluating these regions for species identification in the large plant taxon, which includes a large number of closely related species, have rarely been reported. Results The feasibility of using the five proposed DNA regions was tested for discriminating plant species within Asteraceae, the largest family of flowering plants. Among these markers, ITS2 was the most useful in terms of universality, sequence variation, and identification capability in the Asteraceae family. The species discriminating power of ITS2 was also explored in a large pool of 3,490 Asteraceae sequences that represent 2,315 species belonging to 494 different genera. The result shows that ITS2 correctly identified 76.4% and 97.4% of plant samples at the species and genus levels, respectively. In addition, ITS2 displayed a variable ability to discriminate related species within different genera. Conclusions ITS2 is the best DNA barcode for the Asteraceae family. This approach significantly broadens the application of DNA barcoding to resolve classification problems in the family Asteraceae at the genera and species levels. PMID:20977734

  3. Impact of Lateral Transfers on the Genomes of Lepidoptera

    PubMed Central

    Drezen, Jean-Michel; Josse, Thibaut; Bézier, Annie; Gauthier, Jérémy; Huguet, Elisabeth

    2017-01-01

    Transfer of DNA sequences between species regardless of their evolutionary distance is very common in bacteria, but evidence that horizontal gene transfer (HGT) also occurs in multicellular organisms has been accumulating in the past few years. The actual extent of this phenomenon is underestimated due to frequent sequence filtering of “alien” DNA before genome assembly. However, recent studies based on genome sequencing have revealed, and experimentally verified, the presence of foreign DNA sequences in the genetic material of several species of Lepidoptera. Large DNA viruses, such as baculoviruses and the symbiotic viruses of parasitic wasps (bracoviruses), have the potential to mediate these transfers in Lepidoptera. In particular, using ultra-deep sequencing, newly integrated transposons have been identified within baculovirus genomes. Bacterial genes have also been acquired by genomes of Lepidoptera, as in other insects and nematodes. In addition, insertions of bracovirus sequences were present in the genomes of certain moth and butterfly lineages, that were likely corresponding to rearrangements of ancient integrations. The viral genes present in these sequences, sometimes of hymenopteran origin, have been co-opted by lepidopteran species to confer some protection against pathogens. PMID:29120392

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    McInerney, Peter; Adams, Paul; Hadi, Masood Z.

    As larger-scale cloning projects become more prevalent, there is an increasing need for comparisons among high fidelity DNA polymerases used for PCR amplification. All polymerases marketed for PCR applications are tested for fidelity properties (i.e., error rate determination) by vendors, and numerous literature reports have addressed PCR enzyme fidelity. Nonetheless, it is often difficult to make direct comparisons among different enzymes due to numerous methodological and analytical differences from study to study. We have measured the error rates for 6 DNA polymerases commonly used in PCR applications, including 3 polymerases typically used for cloning applications requiring high fidelity. Error rate measurement values reported here were obtained by direct sequencing of cloned PCR products. The strategy employed here allows interrogation of error rate across a very large DNA sequence space, since 94 unique DNA targets were used as templates for PCR cloning. Of the six enzymes included in the study, Taq polymerase, AccuPrime-Taq High Fidelity, KOD Hot Start, cloned Pfu polymerase, Phusion Hot Start, and Pwo polymerase, we find the lowest error rates with Pfu, Phusion, and Pwo polymerases. Error rates are comparable for these 3 enzymes and are >10x lower than the error rate observed with Taq polymerase. Mutation spectra are reported, with the 3 high fidelity enzymes displaying broadly similar types of mutations. For these enzymes, transition mutations predominate, with little bias observed for type of transition.
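
    The abstract does not state the formula behind its error rates. A common convention, assumed in this sketch, expresses polymerase fidelity as mutations per base pair per template doubling, with the number of doublings taken as log2 of the fold amplification. The function name and the example numbers are hypothetical.

```python
import math


def pcr_error_rate(mutations, bases_sequenced, fold_amplification):
    """Errors per base pair per template doubling (assumed convention).

    mutations          -- mutations observed in the sequenced clones
    bases_sequenced    -- total base pairs of cloned PCR product sequenced
    fold_amplification -- final template amount divided by starting amount
    """
    if fold_amplification <= 1:
        raise ValueError("fold amplification must exceed 1")
    doublings = math.log2(fold_amplification)
    return mutations / (bases_sequenced * doublings)


# Hypothetical example: 10 mutations in 100,000 bp after 1024-fold amplification.
example_rate = pcr_error_rate(10, 100_000, 1024)
```

    Normalizing by doublings rather than by cycle count matters because late PCR cycles rarely double the template, so cycle count tends to overstate the number of replication events.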

  5. Precise and selective sensing of DNA-DNA hybridization by graphene/Si-nanowires diode-type biosensors.

    PubMed

    Kim, Jungkil; Park, Shin-Young; Kim, Sung; Lee, Dae Hun; Kim, Ju Hwan; Kim, Jong Min; Kang, Hee; Han, Joong-Soo; Park, Jun Woo; Lee, Hosun; Choi, Suk-Ho

    2016-08-18

    Single-Si-nanowire (NW)-based DNA sensors have been recently developed, but their sensitivity is very limited because of high noise signals, originating from small source-drain current of the single Si NW. Here, we demonstrate that chemical-vapor-deposition-grown large-scale graphene/surface-modified vertical-Si-NW-arrays junctions can be utilized as diode-type biosensors for highly-sensitive and -selective detection of specific oligonucleotides. For this, a twenty-seven-base-long synthetic oligonucleotide, which is a fragment of human DENND2D promoter sequence, is first decorated as a probe on the surface of vertical Si-NW arrays, and then the complementary oligonucleotide is hybridized to the probe. This hybridization gives rise to a doping effect on the surface of Si NWs, resulting in the increase of the current in the biosensor. The current of the biosensor increases from 19 to 120% as the concentration of the target DNA varies from 0.1 to 500 nM. In contrast, no such response is observed with oligonucleotides of incompatible or mismatched sequences. Similar results are observed from photoluminescence microscopic images and spectra. The biosensors show very-uniform current changes with standard deviations ranging from ~1 to ~10% in ten-times endurance tests. These results are very promising for their applications in accurate, selective, and stable biosensing.

  6. Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux

    PubMed Central

    Lynch, Erin A.; Langille, Morgan G. I.; Darling, Aaron; Wilbanks, Elizabeth G.; Haltiner, Caitlin; Shao, Katie S. Y.; Starr, Michael O.; Teiling, Clotilde; Harkins, Timothy T.; Edwards, Robert A.; Eisen, Jonathan A.; Facciotti, Marc T.

    2012-01-01

    We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model group for the study of archaeal biology. The unique physiological properties of these organisms also make them good candidates for novel enzyme discovery for biotechnological applications. Seven genomes were sequenced to ∼20× coverage and assembled to an average of 50 contigs (range 5 scaffolds - 168 contigs). Comparisons of protein-coding gene complements revealed large-scale differences in COG functional group enrichment between these genera. Analysis of genes encoding machinery for DNA metabolism reveals genera-specific expansions of the general transcription factor TATA binding protein as well as a history of extensive duplication and horizontal transfer of the proliferating cell nuclear antigen. Insights gained from this study emphasize the importance of haloarchaea for investigation of archaeal biology. PMID:22848480

  7. ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases.

    PubMed

    Shen, Li; Shao, Ningyi; Liu, Xiaochuan; Nestler, Eric

    2014-04-15

    Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. We have developed ngs.plot - a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.

  8. ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases

    PubMed Central

    2014-01-01

    Background Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. Results We have developed ngs.plot – a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. Conclusions We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data. PMID:24735413

  9. A DNA-based pattern classifier with in vitro learning and associative recall for genomic characterization and biosensing without explicit sequence knowledge.

    PubMed

    Lee, Ju Seok; Chen, Junghuei; Deaton, Russell; Kim, Jin-Woo

    2014-01-01

    Genetic material extracted from in situ microbial communities has high promise as an indicator of biological system status. However, the challenge is to access genomic information from all organisms at the population or community scale to monitor the biosystem's state. Hence, there is a need for a better diagnostic tool that provides a holistic view of a biosystem's genomic status. Here, we introduce an in vitro methodology for genomic pattern classification of biological samples that taps large amounts of genetic information from all genes present and uses that information to detect changes in genomic patterns and classify them. We developed a biosensing protocol, termed Biological Memory, that has in vitro computational capabilities to "learn" and "store" genomic sequence information directly from genomic samples without knowledge of their explicit sequences, and that discovers differences in vitro between previously unknown inputs and learned memory molecules. The Memory protocol was designed and optimized based upon (1) common in vitro recombinant DNA operations using 20-base random probes, including polymerization, nuclease digestion, and magnetic bead separation, to capture a snapshot of the genomic state of a biological sample as a DNA memory and (2) the thermal stability of DNA duplexes between new input and the memory to detect similarities and differences. For efficient read out, a microarray was used as an output method. When the microarray-based Memory protocol was implemented to test its capability and sensitivity using genomic DNA from two model bacterial strains, i.e., Escherichia coli K12 and Bacillus subtilis, results indicate that the Memory protocol can "learn" input DNA, "recall" similar DNA, differentiate between dissimilar DNA, and detect relatively small concentration differences in samples. 
This study demonstrated not only the in vitro information processing capabilities of DNA, but also its promise as a genomic pattern classifier that could access information from all organisms in a biological system without explicit genomic information. The Memory protocol has high potential for many applications, including in situ biomonitoring of ecosystems, screening for diseases, biosensing of pathological features in water and food supplies, and non-biological information processing of memory devices, among many.

  10. DNA G-Wire Formation Using an Artificial Peptide is Controlled by Protease Activity.

    PubMed

    Usui, Kenji; Okada, Arisa; Sakashita, Shungo; Shimooka, Masayuki; Tsuruoka, Takaaki; Nakano, Shu-Ichi; Miyoshi, Daisuke; Mashima, Tsukasa; Katahira, Masato; Hamada, Yoshio

    2017-11-16

    The development of a switching system for guanine nanowire (G-wire) formation by external signals is important for nanobiotechnological applications. Here, we demonstrate a DNA nanostructural switch (G-wire <--> particles) using a designed peptide and a protease. The peptide consists of a PNA sequence for inducing DNA to form DNA-PNA hybrid G-quadruplex structures, and a protease substrate sequence acting as a switching module that is dependent on the activity of a particular protease. Micro-scale analyses via TEM and AFM showed that G-rich DNA alone forms G-wires in the presence of Ca2+, and that the peptide disrupted this formation, resulting in the formation of particles. The addition of the protease and digestion of the peptide regenerated the G-wires. Macro-scale analyses by DLS, zeta potential, CD, and gel filtration were in agreement with the microscopic observations. These results imply that the secondary structure change (DNA G-quadruplex <--> DNA/PNA hybrid structure) induces a change in the well-formed nanostructure (G-wire <--> particles). Our findings demonstrate a control system for forming DNA G-wire structures dependent on protease activity using designed peptides. Such systems hold promise for regulating the formation of nanowire for various applications, including electronic circuits for use in nanobiotechnologies.

  11. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities.

    PubMed

    Troshin, Peter V; Postis, Vincent Lg; Ashworth, Denise; Baldwin, Stephen A; McPherson, Michael J; Barton, Geoffrey J

    2011-03-07

    Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  12. Accurate, high-throughput typing of copy number variation using paralogue ratios from dispersed repeats

    PubMed Central

    Armour, John A. L.; Palla, Raquel; Zeeuwen, Patrick L. J. M.; den Heijer, Martin; Schalkwijk, Joost; Hollox, Edward J.

    2007-01-01

    Recent work has demonstrated an unexpected prevalence of copy number variation in the human genome, and has highlighted the part this variation may play in predisposition to common phenotypes. Some important genes vary in number over a high range (e.g. DEFB4, which commonly varies between two and seven copies), and have posed formidable technical challenges for accurate copy number typing, so that there are no simple, cheap, high-throughput approaches suitable for large-scale screening. We have developed a simple comparative PCR method based on dispersed repeat sequences, using a single pair of precisely designed primers to amplify products simultaneously from both test and reference loci, which are subsequently distinguished and quantified via internal sequence differences. We have validated the method for the measurement of copy number at DEFB4 by comparison of results from >800 DNA samples with copy number measurements by MAPH/REDVR, MLPA and array-CGH. The new Paralogue Ratio Test (PRT) method can require as little as 10 ng genomic DNA, appears to be comparable in accuracy to the other methods, and for the first time provides a rapid, simple and inexpensive method for copy number analysis, suitable for application to typing thousands of samples in large case-control association studies. PMID:17175532
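
    The arithmetic behind a paralogue ratio measurement can be sketched as follows. This assumes the simplest model: the reference locus is present at a known copy number (two per diploid genome), and the test-to-reference product ratio scales linearly with test-locus copy number, up to a calibration factor fitted from samples of known copy number. The function name, the calibration parameter, and the example signals are hypothetical; the published method may calibrate differently.

```python
def estimate_copy_number(test_signal, reference_signal,
                         reference_copies=2, calibration=1.0):
    """Estimate diploid copy number from a test/reference product ratio.

    calibration -- scale factor fitted from control samples of known
                   copy number (1.0 means equal amplification efficiency)
    """
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    ratio = test_signal / reference_signal
    return calibration * reference_copies * ratio


# Hypothetical peak areas: the test product is 1.5x the reference product.
raw_estimate = estimate_copy_number(300.0, 200.0)
called_copies = round(raw_estimate)
```

    Rounding to the nearest integer is the simplest calling rule; with noisy ratios, estimates that fall near a half-integer would be better flagged for repeat typing than forced to a call.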

  13. Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis.

    PubMed

    Journet, Etienne-Pascal; van Tuinen, Diederik; Gouzy, Jérome; Crespeau, Hervé; Carreau, Véronique; Farmer, Mary-Jo; Niebel, Andreas; Schiex, Thomas; Jaillon, Olivier; Chatagnier, Odile; Godiard, Laurence; Micheli, Fabienne; Kahn, Daniel; Gianinazzi-Pearson, Vivienne; Gamas, Pascal

    2002-12-15

    We report on a large-scale expressed sequence tag (EST) sequencing and analysis program aimed at characterizing the sets of genes expressed in roots of the model legume Medicago truncatula during interactions with either of two microsymbionts, the nitrogen-fixing bacterium Sinorhizobium meliloti or the arbuscular mycorrhizal fungus Glomus intraradices. We have designed specific tools for in silico analysis of EST data, in relation to chimeric cDNA detection, EST clustering, encoded protein prediction, and detection of differential expression. Our 21 473 5'- and 3'-ESTs could be grouped into 6359 EST clusters, corresponding to distinct virtual genes, along with 52 498 other M. truncatula ESTs available in the dbEST (NCBI) database that were recruited in the process. These clusters were manually annotated, using a specifically developed annotation interface. Analysis of EST cluster distribution in various M. truncatula cDNA libraries, supported by a refined R test to evaluate statistical significance and by 'electronic northern' representation, enabled us to identify a large number of novel genes predicted to be up- or down-regulated during either symbiotic root interaction. These in silico analyses provide a first global view of the genetic programs for root symbioses in M. truncatula. A searchable database has been built and can be accessed through a public interface.

  14. Large-scale mitochondrial DNA analysis in Southeast Asia reveals evolutionary effects of cultural isolation in the multi-ethnic population of Myanmar.

    PubMed

    Summerer, Monika; Horst, Jürgen; Erhart, Gertraud; Weißensteiner, Hansi; Schönherr, Sebastian; Pacher, Dominic; Forer, Lukas; Horst, David; Manhart, Angelika; Horst, Basil; Sanguansermsri, Torpong; Kloss-Brandstätter, Anita

    2014-01-28

    Myanmar is the largest country in mainland Southeast Asia with a population of 55 million people subdivided into more than 100 ethnic groups. Ruled by changing kingdoms and dynasties and lying on the trade route between India and China, Myanmar was influenced by numerous cultures. Since its independence from British occupation, tensions between the ruling Bamar and ethnic minorities increased. Our aim was to search for genetic footprints of Myanmar's geographic, historic and sociocultural characteristics and to contribute to the picture of human colonization by describing and dating of new mitochondrial DNA (mtDNA) haplogroups. Therefore, we sequenced the mtDNA control region of 327 unrelated donors and the complete mitochondrial genome of 44 selected individuals according to highest quality standards. Phylogenetic analyses of the entire mtDNA genomes uncovered eight new haplogroups and three unclassified basal M-lineages. The multi-ethnic population and the complex history of Myanmar were reflected in its mtDNA heterogeneity. Population genetic analyses of Burmese control region sequences combined with population data from neighboring countries revealed that the Myanmar haplogroup distribution showed a typical Southeast Asian pattern, but also Northeast Asian and Indian influences. The population structure of the extraordinarily diverse Bamar differed from that of the Karen people who displayed signs of genetic isolation. Migration analyses indicated a considerable genetic exchange with an overall positive migration balance from Myanmar to neighboring countries. Age estimates of the newly described haplogroups point to the existence of evolutionary windows where climatic and cultural changes gave rise to mitochondrial haplogroup diversification in Asia.

  15. Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia

    PubMed Central

    Shortt, Jonathan A.; Card, Daren C.; Schield, Drew R.; Liu, Yang; Zhong, Bo; Castoe, Todd A.

    2017-01-01

    Background In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies. Methodology/Principal Findings We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq) to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample. Conclusions/Significance This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50-$200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for applying modern genome-scale sampling to S. japonicum surveillance, and could be applied to other schistosome species and other parasitic helminths. PMID:28107347

  16. Ammonia-oxidizing bacteria dominate ammonia oxidation in a full-scale wastewater treatment plant revealed by DNA-based stable isotope probing.

    PubMed

    Pan, Kai-Ling; Gao, Jing-Feng; Li, Hong-Yu; Fan, Xiao-Yan; Li, Ding-Chang; Jiang, Hao

    2018-05-01

    A full-scale wastewater treatment plant (WWTP) with three separate treatment processes was selected to investigate the effects of seasonality and treatment process on the community structures of ammonia-oxidizing archaea (AOA) and bacteria (AOB). DNA-based stable isotope probing (DNA-SIP) was then applied to explore the active ammonia oxidizers. The results of high-throughput sequencing indicated that treatment processes varied AOB communities rather than AOA communities. AOA slightly outnumbered AOB in most of the samples, whose abundance was significantly correlated with temperature. DNA-SIP results showed that the majority of AOB amoA gene was labeled by 13C-substrate, while just a small amount of AOA amoA gene was labeled. As revealed by high-throughput sequencing of heavy DNA, Nitrosomonadaceae-like AOB, Nitrosomonas sp. NP1, Nitrosomonas oligotropha and Nitrosomonas marina were the active AOB, and Nitrososphaera viennensis dominated the active AOA. The results indicated that AOB, not AOA, dominated active ammonia oxidation in the test WWTP. Copyright © 2018 Elsevier Ltd. All rights reserved.

  17. Template-Directed Copolymerization, Random Walks along Disordered Tracks, and Fractals

    NASA Astrophysics Data System (ADS)

    Gaspard, Pierre

    2016-12-01

    In biology, template-directed copolymerization is the fundamental mechanism responsible for the synthesis of DNA, RNA, and proteins. More than 50 years have passed since the discovery of DNA structure and its role in coding genetic information. Yet, the kinetics and thermodynamics of information processing in DNA replication, transcription, and translation remain poorly understood. Challenging issues are the facts that DNA or RNA sequences constitute disordered media for the motion of polymerases or ribosomes while errors occur in copying the template. Here, it is shown that these issues can be addressed and sequence heterogeneity effects can be quantitatively understood within a framework revealing universal aspects of information processing at the molecular scale. In steady growth regimes, the local velocities of polymerases or ribosomes along the template are distributed as the continuous or fractal invariant set of a so-called iterated function system, which determines the copying error probabilities. The growth may become sublinear in time with a scaling exponent that can also be deduced from the iterated function system.

  18. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    PubMed

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
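
    The idea of generating a sequence library subset from a relational database can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module, and the table name `protein`, its columns, the helper names, and the toy records are all assumptions made for the example, not the actual seqdb_demo schema or data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical protein table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE protein ("
        " acc TEXT PRIMARY KEY,"   # accession (assumed column name)
        " taxon TEXT NOT NULL,"    # source organism (assumed column name)
        " seq TEXT NOT NULL)"      # amino-acid sequence
    )
    # Toy records invented for illustration only.
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [
            ("P1", "Escherichia coli", "MKTAYIAKQR"),
            ("P2", "Homo sapiens", "MEEPQSDPSV"),
            ("P3", "Escherichia coli", "MSKGEELFTG"),
        ],
    )
    conn.commit()
    return conn


def subset_fasta(conn, taxon):
    """Return a FASTA-formatted library restricted to one taxon."""
    rows = conn.execute(
        "SELECT acc, seq FROM protein WHERE taxon = ? ORDER BY acc",
        (taxon,),
    )
    return "".join(f">{acc}\n{seq}\n" for acc, seq in rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(subset_fasta(conn, "Escherichia coli"), end="")
```

    Searching against such a subset rather than the full library reduces the number of sequences compared, which is the mechanism the abstract credits for improved statistical significance.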

  19. Organization of 'nanocrystal molecules' using DNA

    NASA Astrophysics Data System (ADS)

    Alivisatos, A. Paul; Johnsson, Kai P.; Peng, Xiaogang; Wilson, Troy E.; Loweth, Colin J.; Bruchez, Marcel P.; Schultz, Peter G.

    1996-08-01

    Patterning matter on the nanometre scale is an important objective of current materials chemistry and physics. It is driven by both the need to further miniaturize electronic components and the fact that at the nanometre scale, materials properties are strongly size-dependent and thus can be tuned sensitively [1]. In nanoscale crystals, quantum size effects and the large number of surface atoms influence the chemical, electronic, magnetic and optical behaviour [2-4]. 'Top-down' (for example, lithographic) methods for nanoscale manipulation reach only to the upper end of the nanometre regime [5]; but whereas 'bottom-up' wet chemical techniques allow for the preparation of mono-disperse, defect-free crystallites just 1-10 nm in size [6-10], ways to control the structure of nanocrystal assemblies are scarce. Here we describe a strategy for the synthesis of 'nanocrystal molecules', in which discrete numbers of gold nanocrystals are organized into spatially defined structures based on Watson-Crick base-pairing interactions. We attach single-stranded DNA oligonucleotides of defined length and sequence to individual nanocrystals, and these assemble into dimers and trimers on addition of a complementary single-stranded DNA template. We anticipate that this approach should allow the construction of more complex two- and three-dimensional assemblies.

  20. Ubiquitous and gene-specific regulatory 5' sequences in a sea urchin histone DNA clone coding for histone protein variants.

    PubMed Central

    Busslinger, M; Portmann, R; Irminger, J C; Birnstiel, M L

    1980-01-01

    The DNA sequences of the entire structural H4, H3, H2A and H2B genes and of their 5' flanking regions have been determined in the histone DNA clone h19 of the sea urchin Psammechinus miliaris. In clone h19 the polarity of transcription and the relative arrangement of the histone genes is identical to that in clone h22 of the same species. The histone proteins encoded by h19 DNA differ in their primary structure from those encoded by clone h22 and have been compared to histone protein sequences of other sea urchin species as well as other eukaryotes. A comparative analysis of the 5' flanking DNA sequences of the structural histone genes in both clones revealed four ubiquitous sequence motifs: a pentameric element GATCC, followed at short distance by the Hogness box GTATAAATAG, a conserved sequence PyCATTCPu, in or near which the 5' ends of the mRNAs map in h22 DNA, and lastly a sequence A, containing the initiation codon. These sequences are also found, sometimes in modified version, in front of other eukaryotic genes transcribed by polymerase II. When prelude sequences of isocoding histone genes in clones h19 and h22 are compared, areas of homology are seen to extend beyond the ubiquitous sequence motifs towards the divergent AT-rich spacer and terminate between approximately 140 and 240 nucleotides away from the structural gene. These prelude regions contain quite large conservative sequence blocks which are specific for each type of histone gene. PMID:7443547
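
    Locating the first three motifs named above in a 5' flanking sequence can be sketched with regular expressions. This is an illustrative sketch, not the authors' method: it assumes Py means a pyrimidine (C or T) and Pu a purine (A or G), the motif labels and the function name `find_motifs` are hypothetical, and the test sequence is invented. The fourth motif (sequence A) is omitted because the abstract does not give its sequence.

```python
import re

# Motif patterns taken from the abstract; dictionary keys are hypothetical labels.
MOTIFS = {
    "pentamer": "GATCC",
    "hogness_box": "GTATAAATAG",
    "cap_site": "[CT]CATTC[AG]",  # PyCATTCPu, assuming Py = C/T and Pu = A/G
}


def find_motifs(seq):
    """Return sorted (start, label, matched_text) tuples, 0-based starts."""
    seq = seq.upper()
    hits = []
    for label, pattern in MOTIFS.items():
        # A lookahead with a capture group also reports overlapping matches.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((match.start(), label, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    toy = "AAGATCCTTGTATAAATAGCCTCATTCAGG"  # invented sequence, not from h19 or h22
    for start, label, text in find_motifs(toy):
        print(start, label, text)
```

    Exact matching misses the modified versions of these motifs that the abstract mentions, so a real scan would need degenerate patterns or a tolerance for mismatches.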
