Sample records for reference dna sequence

  1. The case for the continuing use of the revised Cambridge Reference Sequence (rCRS) and the standardization of notation in human mitochondrial DNA studies.

    PubMed

    Bandelt, Hans-Jürgen; Kloss-Brandstätter, Anita; Richards, Martin B; Yao, Yong-Gang; Logan, Ian

    2014-02-01

    Since the determination in 1981 of the sequence of the human mitochondrial DNA (mtDNA) genome, the Cambridge Reference Sequence (CRS), has been used as the reference sequence to annotate mtDNA in molecular anthropology, forensic science and medical genetics. The CRS was eventually upgraded to the revised version (rCRS) in 1999. This reference sequence is a convenient device for recording mtDNA variation, although it has often been misunderstood as a wild-type (WT) or consensus sequence by medical geneticists. Recently, there has been a proposal to replace the rCRS with the so-called Reconstructed Sapiens Reference Sequence (RSRS). Even if it had been estimated accurately, the RSRS would be a cumbersome substitute for the rCRS, as the new proposal fuses--and thus confuses--the two distinct concepts of ancestral lineage and reference point for human mtDNA. Instead, we prefer to maintain the rCRS and to report mtDNA profiles by employing the hitherto predominant circumfix style. Tree diagrams could display mutations by using either the profile notation (in conventional short forms where appropriate) or in a root-upwards way with two suffixes indicating ancestral and derived nucleotides. This would guard against misunderstandings about reporting mtDNA variation. It is therefore neither necessary nor sensible to change the present reference sequence, the rCRS, in any way. The proposed switch to RSRS would inevitably lead to notational chaos, mistakes and misinterpretations.

  2. Development of a reference material of a single DNA molecule for the quality control of PCR testing.

    PubMed

    Mano, Junichi; Hatano, Shuko; Futo, Satoshi; Yoshii, Junji; Nakae, Hiroki; Naito, Shigehiro; Takabatake, Reona; Kitta, Kazumi

    2014-09-02

    We developed a reference material of a single DNA molecule with a specific nucleotide sequence. The double-strand linear DNA which has PCR target sequences at the both ends was prepared as a reference DNA molecule, and we named the PCR targets on each side as confirmation sequence and standard sequence. The highly diluted solution of the reference molecule was dispensed into 96 wells of a plastic PCR plate to make the average number of molecules in a well below one. Subsequently, the presence or absence of the reference molecule in each well was checked by real-time PCR targeting for the confirmation sequence. After an enzymatic treatment of the reaction mixture in the positive wells for the digestion of PCR products, the resultant solution was used as the reference material of a single DNA molecule with the standard sequence. PCR analyses revealed that the prepared samples included only one reference molecule with high probability. The single-molecule reference material developed in this study will be useful for the absolute evaluation of a detection limit of PCR-based testing methods, the quality control of PCR analyses, performance evaluations of PCR reagents and instruments, and the preparation of an accurate calibration curve for real-time PCR quantitation.

  3. Next-Generation Sequencing Platforms

    NASA Astrophysics Data System (ADS)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  4. DNA Sequences over the Internet Provide Greater Speed and Accuracy for Health Sciences Reference Librarians.

    ERIC Educational Resources Information Center

    Harzbecker, Joseph, Jr.

    1993-01-01

    Describes the National Institute of Health's GenBank DNA sequence database and how it can be accessed through the Internet. A real reference question, which was answered successfully using the database, is reproduced to illustrate and elaborate on the potential of the Internet for information retrieval. (10 references) (KRN)

  5. Accelerating plant DNA barcode reference library construction using herbarium specimens: improved experimental techniques.

    PubMed

    Xu, Chao; Dong, Wenpan; Shi, Shuo; Cheng, Tao; Li, Changhao; Liu, Yanlei; Wu, Ping; Wu, Hongkun; Gao, Peng; Zhou, Shiliang

    2015-11-01

    A well-covered reference library is crucial for successful identification of species by DNA barcoding. The biggest difficulty in building such a reference library is the lack of materials of organisms. Herbarium collections are potentially an enormous resource of materials. In this study, we demonstrate that it is likely to build such reference libraries using the reconstructed (self-primed PCR amplified) DNA from the herbarium specimens. We used 179 rosaceous specimens to test the effects of DNA reconstruction, 420 randomly sampled specimens to estimate the usable percentage and another 223 specimens of true cherries (Cerasus, Rosaceae) to test the coverage of usable specimens to the species. The barcode rbcLb (the central four-sevenths of rbcL gene) and matK was each amplified in two halves and sequenced on Roche GS 454 FLX+. DNA from the herbarium specimens was typically shorter than 300 bp. DNA reconstruction enabled amplification fragments of 400-500 bp without bringing or inducing any sequence errors. About one-third of specimens in the national herbarium of China (PE) were proven usable after DNA reconstruction. The specimens in PE cover all Chinese true cherry species and 91.5% of vascular species listed in Flora of China. It is very possible to build well-covered reference libraries for DNA barcoding of vascular species in China. As exemplified in this study, DNA reconstruction and DNA-labelled next-generation sequencing can accelerate the construction of local reference libraries. By putting the local reference libraries together, a global library for DNA barcoding becomes closer to reality. © 2015 John Wiley & Sons Ltd.

  6. Navigating the tip of the genomic iceberg: Next-generation sequencing for plant systematics.

    PubMed

    Straub, Shannon C K; Parks, Matthew; Weitemier, Kevin; Fishbein, Mark; Cronn, Richard C; Liston, Aaron

    2012-02-01

    Just as Sanger sequencing did more than 20 years ago, next-generation sequencing (NGS) is poised to revolutionize plant systematics. By combining multiplexing approaches with NGS throughput, systematists may no longer need to choose between more taxa or more characters. Here we describe a genome skimming (shallow sequencing) approach for plant systematics. Through simulations, we evaluated optimal sequencing depth and performance of single-end and paired-end short read sequences for assembly of nuclear ribosomal DNA (rDNA) and plastomes and addressed the effect of divergence on reference-guided plastome assembly. We also used simulations to identify potential phylogenetic markers from low-copy nuclear loci at different sequencing depths. We demonstrated the utility of genome skimming through phylogenetic analysis of the Sonoran Desert clade (SDC) of Asclepias (Apocynaceae). Paired-end reads performed better than single-end reads. Minimum sequencing depths for high quality rDNA and plastome assemblies were 40× and 30×, respectively. Divergence from the reference significantly affected plastome assembly, but relatively similar references are available for most seed plants. Deeper rDNA sequencing is necessary to characterize intragenomic polymorphism. The low-copy fraction of the nuclear genome was readily surveyed, even at low sequencing depths. Nearly 160000 bp of sequence from three organelles provided evidence of phylogenetic incongruence in the SDC. Adoption of NGS will facilitate progress in plant systematics, as whole plastome and rDNA cistrons, partial mitochondrial genomes, and low-copy nuclear markers can now be efficiently obtained for molecular phylogenetics studies.

  7. Comprehensive red blood cell and platelet antigen prediction from whole genome sequencing: proof of principle

    PubMed Central

    Westhoff, Connie M.; Uy, Jon Michael; Aguad, Maria; Smeland‐Wagman, Robin; Kaufman, Richard M.; Rehm, Heidi L.; Green, Robert C.; Silberstein, Leslie E.

    2015-01-01

    BACKGROUND There are 346 serologically defined red blood cell (RBC) antigens and 33 serologically defined platelet (PLT) antigens, most of which have known genetic changes in 45 RBC or six PLT genes that correlate with antigen expression. Polymorphic sites associated with antigen expression in the primary literature and reference databases are annotated according to nucleotide positions in cDNA. This makes antigen prediction from next‐generation sequencing data challenging, since it uses genomic coordinates. STUDY DESIGN AND METHODS The conventional cDNA reference sequences for all known RBC and PLT genes that correlate with antigen expression were aligned to the human reference genome. The alignments allowed conversion of conventional cDNA nucleotide positions to the corresponding genomic coordinates. RBC and PLT antigen prediction was then performed using the human reference genome and whole genome sequencing (WGS) data with serologic confirmation. RESULTS Some major differences and alignment issues were found when attempting to convert the conventional cDNA to human reference genome sequences for the following genes: ABO, A4GALT, RHD, RHCE, FUT3, ACKR1 (previously DARC), ACHE, FUT2, CR1, GCNT2, and RHAG. However, it was possible to create usable alignments, which facilitated the prediction of all RBC and PLT antigens with a known molecular basis from WGS data. Traditional serologic typing for 18 RBC antigens were in agreement with the WGS‐based antigen predictions, providing proof of principle for this approach. CONCLUSION Detailed mapping of conventional cDNA annotated RBC and PLT alleles can enable accurate prediction of RBC and PLT antigens from whole genomic sequencing data. PMID:26634332

  8. Two new computational methods for universal DNA barcoding: a benchmark using barcode sequences of bacteria, archaea, animals, fungi, and land plants.

    PubMed

    Tanabe, Akifumi S; Toju, Hirokazu

    2013-01-01

    Taxonomic identification of biological specimens based on DNA sequence information (a.k.a. DNA barcoding) is becoming increasingly common in biodiversity science. Although several methods have been proposed, many of them are not universally applicable due to the need for prerequisite phylogenetic/machine-learning analyses, the need for huge computational resources, or the lack of a firm theoretical background. Here, we propose two new computational methods of DNA barcoding and show a benchmark for bacterial/archeal 16S, animal COX1, fungal internal transcribed spacer, and three plant chloroplast (rbcL, matK, and trnH-psbA) barcode loci that can be used to compare the performance of existing and new methods. The benchmark was performed under two alternative situations: query sequences were available in the corresponding reference sequence databases in one, but were not available in the other. In the former situation, the commonly used "1-nearest-neighbor" (1-NN) method, which assigns the taxonomic information of the most similar sequences in a reference database (i.e., BLAST-top-hit reference sequence) to a query, displays the highest rate and highest precision of successful taxonomic identification. However, in the latter situation, the 1-NN method produced extremely high rates of misidentification for all the barcode loci examined. In contrast, one of our new methods, the query-centric auto-k-nearest-neighbor (QCauto) method, consistently produced low rates of misidentification for all the loci examined in both situations. These results indicate that the 1-NN method is most suitable if the reference sequences of all potentially observable species are available in databases; otherwise, the QCauto method returns the most reliable identification results. The benchmark results also indicated that the taxon coverage of reference sequences is far from complete for genus or species level identification in all the barcode loci examined. Therefore, we need to accelerate the registration of reference barcode sequences to apply high-throughput DNA barcoding to genus or species level identification in biodiversity research.

  9. Two New Computational Methods for Universal DNA Barcoding: A Benchmark Using Barcode Sequences of Bacteria, Archaea, Animals, Fungi, and Land Plants

    PubMed Central

    Tanabe, Akifumi S.; Toju, Hirokazu

    2013-01-01

    Taxonomic identification of biological specimens based on DNA sequence information (a.k.a. DNA barcoding) is becoming increasingly common in biodiversity science. Although several methods have been proposed, many of them are not universally applicable due to the need for prerequisite phylogenetic/machine-learning analyses, the need for huge computational resources, or the lack of a firm theoretical background. Here, we propose two new computational methods of DNA barcoding and show a benchmark for bacterial/archeal 16S, animal COX1, fungal internal transcribed spacer, and three plant chloroplast (rbcL, matK, and trnH-psbA) barcode loci that can be used to compare the performance of existing and new methods. The benchmark was performed under two alternative situations: query sequences were available in the corresponding reference sequence databases in one, but were not available in the other. In the former situation, the commonly used “1-nearest-neighbor” (1-NN) method, which assigns the taxonomic information of the most similar sequences in a reference database (i.e., BLAST-top-hit reference sequence) to a query, displays the highest rate and highest precision of successful taxonomic identification. However, in the latter situation, the 1-NN method produced extremely high rates of misidentification for all the barcode loci examined. In contrast, one of our new methods, the query-centric auto-k-nearest-neighbor (QCauto) method, consistently produced low rates of misidentification for all the loci examined in both situations. These results indicate that the 1-NN method is most suitable if the reference sequences of all potentially observable species are available in databases; otherwise, the QCauto method returns the most reliable identification results. The benchmark results also indicated that the taxon coverage of reference sequences is far from complete for genus or species level identification in all the barcode loci examined. Therefore, we need to accelerate the registration of reference barcode sequences to apply high-throughput DNA barcoding to genus or species level identification in biodiversity research. PMID:24204702

  10. Basic quantitative polymerase chain reaction using real-time fluorescence measurements.

    PubMed

    Ares, Manuel

    2014-10-01

    This protocol uses quantitative polymerase chain reaction (qPCR) to measure the number of DNA molecules containing a specific contiguous sequence in a sample of interest (e.g., genomic DNA or cDNA generated by reverse transcription). The sample is subjected to fluorescence-based PCR amplification and, theoretically, during each cycle, two new duplex DNA molecules are produced for each duplex DNA molecule present in the sample. The progress of the reaction during PCR is evaluated by measuring the fluorescence of dsDNA-dye complexes in real time. In the early cycles, DNA duplication is not detected because inadequate amounts of DNA are made. At a certain threshold cycle, DNA-dye complexes double each cycle for 8-10 cycles, until the DNA concentration becomes so high and the primer concentration so low that the reassociation of the product strands blocks efficient synthesis of new DNA and the reaction plateaus. There are two types of measurements: (1) the relative change of the target sequence compared to a reference sequence and (2) the determination of molecule number in the starting sample. The first requires a reference sequence, and the second requires a sample of the target sequence with known numbers of the molecules of sequence to generate a standard curve. By identifying the threshold cycle at which a sample first begins to accumulate DNA-dye complexes exponentially, an estimation of the numbers of starting molecules in the sample can be extrapolated. © 2014 Cold Spring Harbor Laboratory Press.

  11. High-quality mtDNA control region sequences from 680 individuals sampled across the Netherlands to establish a national forensic mtDNA reference database.

    PubMed

    Chaitanya, Lakshmi; van Oven, Mannis; Brauer, Silke; Zimmermann, Bettina; Huber, Gabriela; Xavier, Catarina; Parson, Walther; de Knijff, Peter; Kayser, Manfred

    2016-03-01

    The use of mitochondrial DNA (mtDNA) for maternal lineage identification often marks the last resort when investigating forensic and missing-person cases involving highly degraded biological materials. As with all comparative DNA testing, a match between evidence and reference sample requires a statistical interpretation, for which high-quality mtDNA population frequency data are crucial. Here, we determined, under high quality standards, the complete mtDNA control-region sequences of 680 individuals from across the Netherlands sampled at 54 sites, covering the entire country with 10 geographic sub-regions. The complete mtDNA control region (nucleotide positions 16,024-16,569 and 1-576) was amplified with two PCR primers and sequenced with ten different sequencing primers using the EMPOP protocol. Haplotype diversity of the entire sample set was very high at 99.63% and, accordingly, the random-match probability was 0.37%. No population substructure within the Netherlands was detected with our dataset. Phylogenetic analyses were performed to determine mtDNA haplogroups. Inclusion of these high-quality data in the EMPOP database (accession number: EMP00666) will improve its overall data content and geographic coverage in the interest of all EMPOP users worldwide. Moreover, this dataset will serve as (the start of) a national reference database for mtDNA applications in forensic and missing person casework in the Netherlands. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  12. BOKP: A DNA Barcode Reference Library for Monitoring Herbal Drugs in the Korean Pharmacopeia

    PubMed Central

    Liu, Jinxin; Shi, Linchun; Song, Jingyuan; Sun, Wei; Han, Jianping; Liu, Xia; Hou, Dianyun; Yao, Hui; Li, Mingyue; Chen, Shilin

    2017-01-01

    Herbal drug authentication is an important task in traditional medicine; however, it is challenged by the limitations of traditional authentication methods and the lack of trained experts. DNA barcoding is conspicuous in almost all areas of the biological sciences and has already been added to the British pharmacopeia and Chinese pharmacopeia for routine herbal drug authentication. However, DNA barcoding for the Korean pharmacopeia still requires significant improvements. Here, we present a DNA barcode reference library for herbal drugs in the Korean pharmacopeia and developed a species identification engine named KP-IDE to facilitate the adoption of this DNA reference library for the herbal drug authentication. Using taxonomy records, specimen records, sequence records, and reference records, KP-IDE can identify an unknown specimen. Currently, there are 6,777 taxonomy records, 1,054 specimen records, 30,744 sequence records (ITS2 and psbA-trnH) and 285 reference records. Moreover, 27 herbal drug materials were collected from the Seoul Yangnyeongsi herbal medicine market to give an example for real herbal drugs authentications. Our study demonstrates the prospects of the DNA barcode reference library for the Korean pharmacopeia and provides future directions for the use of DNA barcoding for authenticating herbal drugs listed in other modern pharmacopeias. PMID:29326593

  13. Classification of Plant Associated Bacteria Using RIF, a Computationally Derived DNA Marker

    PubMed Central

    Schneider, Kevin L.; Marrero, Glorimar; Alvarez, Anne M.; Presting, Gernot G.

    2011-01-01

    A DNA marker that distinguishes plant associated bacteria at the species level and below was derived by comparing six sequenced genomes of Xanthomonas, a genus that contains many important phytopathogens. This DNA marker comprises a portion of the dnaA replication initiation factor (RIF). Unlike the rRNA genes, dnaA is a single copy gene in the vast majority of sequenced bacterial genomes, and amplification of RIF requires genus-specific primers. In silico analysis revealed that RIF has equal or greater ability to differentiate closely related species of Xanthomonas than the widely used ribosomal intergenic spacer region (ITS). Furthermore, in a set of 263 Xanthomonas, Ralstonia and Clavibacter strains, the RIF marker was directly sequenced in both directions with a success rate approximately 16% higher than that for ITS. RIF frameworks for Xanthomonas, Ralstonia and Clavibacter were constructed using 682 reference strains representing different species, subspecies, pathovars, races, hosts and geographic regions, and contain a total of 109 different RIF sequences. RIF sequences showed subspecific groupings but did not place strains of X. campestris or X. axonopodis into currently named pathovars nor R. solanacearum strains into their respective races, confirming previous conclusions that pathovar and race designations do not necessarily reflect genetic relationships. The RIF marker also was sequenced for 24 reference strains from three genera in the Enterobacteriaceae: Pectobacterium, Pantoea and Dickeya. RIF sequences of 70 previously uncharacterized strains of Ralstonia, Clavibacter, Pectobacterium and Dickeya matched, or were similar to, those of known reference strains, illustrating the utility of the frameworks to classify bacteria below the species level and rapidly match unknown isolates to reference strains. The RIF sequence frameworks are available at the online RIF database, RIFdb, and can be queried for diagnostic purposes with RIF sequences obtained from unknown strains in both chromatogram and FASTA format. PMID:21533033

  14. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    PubMed

    Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

    2010-05-07

    Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  15. Unraveling systematic inventory of Echinops (Asteraceae) with special reference to nrDNA ITS sequence-based molecular typing of Echinops abuzinadianus.

    PubMed

    Ali, M A; Al-Hemaid, F M; Lee, J; Hatamleh, A A; Gyulai, G; Rahman, M O

    2015-10-02

    The present study explored the systematic inventory of Echinops L. (Asteraceae) of Saudi Arabia, with special reference to the molecular typing of Echinops abuzinadianus Chaudhary, an endemic species to Saudi Arabia, based on the internal transcribed spacer (ITS) sequences (ITS1-5.8S-ITS2) of nuclear ribosomal DNA. A sequence similarity search using BLAST and a phylogenetic analysis of the ITS sequence of E. abuzinadianus revealed a high level of sequence similarity with E. glaberrimus DC. (section Ritropsis). The novel primary sequence and the secondary structure of ITS2 of E. abuzinadianus could potentially be used for molecular genotyping.

  16. Potential for DNA-based identification of Great Lakes fauna: match and mismatch between taxa inventories and DNA barcode libraries.

    PubMed

    Trebitz, Anett S; Hoffman, Joel C; Grant, George W; Billehus, Tyler M; Pilgrim, Erik M

    2015-07-22

    DNA-based identification of mixed-organism samples offers the potential to greatly reduce the need for resource-intensive morphological identification, which would be of value both to bioassessment and non-native species monitoring. The ability to assign species identities to DNA sequences found depends on the availability of comprehensive DNA reference libraries. Here, we compile inventories for aquatic metazoans extant in or threatening to invade the Laurentian Great Lakes and examine the availability of reference mitochondrial COI DNA sequences (barcodes) in the Barcode of Life Data System for them. We found barcode libraries largely complete for extant and threatening-to-invade vertebrates (100% of reptile, 99% of fish, and 92% of amphibian species had barcodes). In contrast, barcode libraries remain poorly developed for precisely those organisms where morphological identification is most challenging; 46% of extant invertebrates lacked reference barcodes with rates especially high among rotifers, oligochaetes, and mites. Lack of species-level identification for many aquatic invertebrates also is a barrier to matching DNA sequences with physical specimens. Attaining the potential for DNA-based identification of mixed-organism samples covering the breadth of aquatic fauna requires a concerted effort to build supporting barcode libraries and voucher collections.

  17. Potential for DNA-based identification of Great Lakes fauna: match and mismatch between taxa inventories and DNA barcode libraries

    NASA Astrophysics Data System (ADS)

    Trebitz, Anett S.; Hoffman, Joel C.; Grant, George W.; Billehus, Tyler M.; Pilgrim, Erik M.

    2015-07-01

    DNA-based identification of mixed-organism samples offers the potential to greatly reduce the need for resource-intensive morphological identification, which would be of value both to bioassessment and non-native species monitoring. The ability to assign species identities to DNA sequences found depends on the availability of comprehensive DNA reference libraries. Here, we compile inventories for aquatic metazoans extant in or threatening to invade the Laurentian Great Lakes and examine the availability of reference mitochondrial COI DNA sequences (barcodes) in the Barcode of Life Data System for them. We found barcode libraries largely complete for extant and threatening-to-invade vertebrates (100% of reptile, 99% of fish, and 92% of amphibian species had barcodes). In contrast, barcode libraries remain poorly developed for precisely those organisms where morphological identification is most challenging; 46% of extant invertebrates lacked reference barcodes with rates especially high among rotifers, oligochaetes, and mites. Lack of species-level identification for many aquatic invertebrates also is a barrier to matching DNA sequences with physical specimens. Attaining the potential for DNA-based identification of mixed-organism samples covering the breadth of aquatic fauna requires a concerted effort to build supporting barcode libraries and voucher collections.

  18. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor.

    PubMed

    Kohany, Oleksiy; Gentles, Andrew J; Hankus, Lukasz; Jurka, Jerzy

    2006-10-25

    Repbase is a reference database of eukaryotic repetitive DNA, which includes prototypic sequences of repeats and basic information described in annotations. Updating and maintenance of the database requires specialized tools, which we have created and made available for use with Repbase, and which may be useful as a template for other curated databases. We describe the software tools RepbaseSubmitter and Censor, which are designed to facilitate updating and screening the content of Repbase. RepbaseSubmitter is a java-based interface for formatting and annotating Repbase entries. It eliminates many common formatting errors, and automates actions such as calculation of sequence lengths and composition, thus facilitating curation of Repbase sequences. In addition, it has several features for predicting protein coding regions in sequences; searching and including Pubmed references in Repbase entries; and searching the NCBI taxonomy database for correct inclusion of species information and taxonomic position. Censor is a tool to rapidly identify repetitive elements by comparison to known repeats. It uses WU-BLAST for speed and sensitivity, and can conduct DNA-DNA, DNA-protein, or translated DNA-translated DNA searches of genomic sequence. Defragmented output includes a map of repeats present in the query sequence, with the options to report masked query sequence(s), repeat sequences found in the query, and alignments. Censor and RepbaseSubmitter are available as both web-based services and downloadable versions. They can be found at http://www.girinst.org/repbase/submission.html (RepbaseSubmitter) and http://www.girinst.org/censor/index.php (Censor).

  19. Low-Bandwidth and Non-Compute Intensive Remote Identification of Microbes from Raw Sequencing Reads

    PubMed Central

    Gautier, Laurent; Lund, Ole

    2013-01-01

    Cheap DNA sequencing may soon become routine not only for human genomes but also for practically anything requiring the identification of living organisms from their DNA: tracking of infectious agents, control of food products, bioreactors, or environmental samples. We propose a novel general approach to the analysis of sequencing data where a reference genome does not have to be specified. Using a distributed architecture we are able to query a remote server for hints about what the reference might be, transferring a relatively small amount of data. Our system consists of a server with known reference DNA indexed, and a client with raw sequencing reads. The client sends a sample of unidentified reads, and in return receives a list of matching references. Sequences for the references can be retrieved and used for exhaustive computation on the reads, such as alignment. To demonstrate this approach we have implemented a web server, indexing tens of thousands of publicly available genomes and genomic regions from various organisms and returning lists of matching hits from query sequencing reads. We have also implemented two clients: one running in a web browser, and one as a python script. Both are able to handle a large number of sequencing reads and from portable devices (the browser-based running on a tablet), perform its task within seconds, and consume an amount of bandwidth compatible with mobile broadband networks. Such client-server approaches could develop in the future, allowing a fully automated processing of sequencing data and routine instant quality check of sequencing runs from desktop sequencers. A web access is available at http://tapir.cbs.dtu.dk. The source code for a python command-line client, a server, and supplementary data are available at http://bit.ly/1aURxkc. PMID:24391826

  20. Low-bandwidth and non-compute intensive remote identification of microbes from raw sequencing reads.

    PubMed

    Gautier, Laurent; Lund, Ole

    2013-01-01

    Cheap DNA sequencing may soon become routine not only for human genomes but also for practically anything requiring the identification of living organisms from their DNA: tracking of infectious agents, control of food products, bioreactors, or environmental samples. We propose a novel general approach to the analysis of sequencing data where a reference genome does not have to be specified. Using a distributed architecture we are able to query a remote server for hints about what the reference might be, transferring a relatively small amount of data. Our system consists of a server with known reference DNA indexed, and a client with raw sequencing reads. The client sends a sample of unidentified reads, and in return receives a list of matching references. Sequences for the references can be retrieved and used for exhaustive computation on the reads, such as alignment. To demonstrate this approach we have implemented a web server, indexing tens of thousands of publicly available genomes and genomic regions from various organisms and returning lists of matching hits from query sequencing reads. We have also implemented two clients: one running in a web browser, and one as a python script. Both are able to handle a large number of sequencing reads and from portable devices (the browser-based running on a tablet), perform its task within seconds, and consume an amount of bandwidth compatible with mobile broadband networks. Such client-server approaches could develop in the future, allowing a fully automated processing of sequencing data and routine instant quality check of sequencing runs from desktop sequencers. A web access is available at http://tapir.cbs.dtu.dk. The source code for a python command-line client, a server, and supplementary data are available at http://bit.ly/1aURxkc.

  1. PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy.

    PubMed

    Decelle, Johan; Romac, Sarah; Stern, Rowena F; Bendif, El Mahdi; Zingone, Adriana; Audic, Stéphane; Guiry, Michael D; Guillou, Laure; Tessier, Désiré; Le Gall, Florence; Gourvil, Priscillia; Dos Santos, Adriana L; Probert, Ian; Vaulot, Daniel; de Vargas, Colomban; Christen, Richard

    2015-11-01

    Photosynthetic eukaryotes have a critical role as the main producers in most ecosystems of the biosphere. The ongoing environmental metabarcoding revolution opens the perspective for holistic ecosystems biological studies of these organisms, in particular the unicellular microalgae that often lack distinctive morphological characters and have complex life cycles. To interpret environmental sequences, metabarcoding necessarily relies on taxonomically curated databases containing reference sequences of the targeted gene (or barcode) from identified organisms. To date, no such reference framework exists for photosynthetic eukaryotes. In this study, we built the PhytoREF database that contains 6490 plastidial 16S rDNA reference sequences that originate from a large diversity of eukaryotes representing all known major photosynthetic lineages. We compiled 3333 amplicon sequences available from public databases and 879 sequences extracted from plastidial genomes, and generated 411 novel sequences from cultured marine microalgal strains belonging to different eukaryotic lineages. A total of 1867 environmental Sanger 16S rDNA sequences were also included in the database. Stringent quality filtering and a phylogeny-based taxonomic classification were applied for each 16S rDNA sequence. The database mainly focuses on marine microalgae, but sequences from land plants (representing half of the PhytoREF sequences) and freshwater taxa were also included to broaden the applicability of PhytoREF to different aquatic and terrestrial habitats. PhytoREF, accessible via a web interface (http://phytoref.fr), is a new resource in molecular ecology to foster the discovery, assessment and monitoring of the diversity of photosynthetic eukaryotes using high-throughput sequencing. © 2015 John Wiley & Sons Ltd.

  2. CpG PatternFinder: a Windows-based utility program for easy and rapid identification of the CpG methylation status of DNA.

    PubMed

    Xu, Yi-Hua; Manoharan, Herbert T; Pitot, Henry C

    2007-09-01

    The bisulfite genomic sequencing technique is one of the most widely used techniques to study sequence-specific DNA methylation because of its unambiguous ability to reveal DNA methylation status to the order of a single nucleotide. One characteristic feature of the bisulfite genomic sequencing technique is that a number of sample sequence files will be produced from a single DNA sample. The PCR products of bisulfite-treated DNA samples cannot be sequenced directly because they are heterogeneous in nature; therefore they should be cloned into suitable plasmids and then sequenced. This procedure generates an enormous number of sample DNA sequence files as well as adding extra bases belonging to the plasmids to the sequence, which will cause problems in the final sequence comparison. Finding the methylation status for each CpG in each sample sequence is not an easy job. As a result CpG PatternFinder was developed for this purpose. The main functions of the CpG PatternFinder are: (i) to analyze the reference sequence to obtain CpG and non-CpG-C residue position information. (ii) To tailor sample sequence files (delete insertions and mark deletions from the sample sequence files) based on a configuration of ClustalW multiple alignment. (iii) To align sample sequence files with a reference file to obtain bisulfite conversion efficiency and CpG methylation status. And, (iv) to produce graphics, highlighted aligned sequence text and a summary report which can be easily exported to Microsoft Office suite. CpG PatternFinder is designed to operate cooperatively with BioEdit, a freeware on the internet. It can handle up to 100 files of sample DNA sequences simultaneously, and the total CpG pattern analysis process can be finished in minutes. CpG PatternFinder is an ideal software tool for DNA methylation studies to determine the differential methylation pattern in a large number of individuals in a population. Previously we developed the CpG Analyzer program; CpG PatternFinder is our further effort to create software tools for DNA methylation studies.

  3. Homogeneity of the 16S rDNA sequence among geographically disparate isolates of Taylorella equigenitalis

    PubMed Central

    Matsuda, M; Tazumi, A; Kagawa, S; Sekizuka, T; Murayama, O; Moore, JE; Millar, BC

    2006-01-01

    Background At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis) are available, whose sequence differences occur at a few nucleotide positions. Thus it is important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximate full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. When cloning, sequencing and comparison of the approximate full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France, nucleotide sequence differences were demonstrated at the six loci in the 1,469 nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501 where substitutions and deletions were noted. PMID:16398935

  4. Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly

    PubMed Central

    Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka

    2010-01-01

    Background Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. Methodology We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. Conclusions The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877

  5. Advances in high throughput DNA sequence data compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz

    2016-06-01

    Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.

  6. Sequence-Dependent Persistence Length of Long DNA

    NASA Astrophysics Data System (ADS)

    Chuang, Hui-Min; Reifenberger, Jeffrey G.; Cao, Han; Dorfman, Kevin D.

    2017-12-01

    Using a high-throughput genome-mapping approach, we obtained circa 50 million measurements of the extension of internal human DNA segments in a 41 nm ×41 nm nanochannel. The underlying DNA sequences, obtained by mapping to the reference human genome, are 2.5-393 kilobase pairs long and contain percent GC contents between 32.5% and 60%. Using Odijk's theory for a channel-confined wormlike chain, these data reveal that the DNA persistence length increases by almost 20% as the percent GC content increases. The increased persistence length is rationalized by a model, containing no adjustable parameters, that treats the DNA as a statistical terpolymer with a sequence-dependent intrinsic persistence length and a sequence-independent electrostatic persistence length.

  7. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    PubMed Central

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697

  8. Plans and progress for building a Great Lakes fauna DNA barcode reference library

    EPA Science Inventory

    DNA reference libraries provide researchers with an important tool for assessing regional biodiversity by allowing unknown genetic sequences to be assigned identities, while also providing a means for taxonomists to validate identifications. Expanding the representation of Great...

  9. Developmental validation of a Nextera XT mitogenome Illumina MiSeq sequencing method for high-quality samples.

    PubMed

    Peck, Michelle A; Sturk-Andreaggi, Kimberly; Thomas, Jacqueline T; Oliver, Robert S; Barritt-Ross, Suzanne; Marshall, Charla

    2018-05-01

    Generating mitochondrial genome (mitogenome) data from reference samples in a rapid and efficient manner is critical to harnessing the greater power of discrimination of the entire mitochondrial DNA (mtDNA) marker. The method of long-range target enrichment, Nextera XT library preparation, and Illumina sequencing on the MiSeq is a well-established technique for generating mitogenome data from high-quality samples. To this end, a validation was conducted for this mitogenome method processing up to 24 samples simultaneously along with analysis in the CLC Genomics Workbench and utilizing the AQME (AFDIL-QIAGEN mtDNA Expert) tool to generate forensic profiles. This validation followed the Federal Bureau of Investigation's Quality Assurance Standards (QAS) for forensic DNA testing laboratories and the Scientific Working Group on DNA Analysis Methods (SWGDAM) validation guidelines. The evaluation of control DNA, non-probative samples, blank controls, mixtures, and nonhuman samples demonstrated the validity of this method. Specifically, the sensitivity was established at ≥25 pg of nuclear DNA input for accurate mitogenome profile generation. Unreproducible low-level variants were observed in samples with low amplicon yields. Further, variant quality was shown to be a useful metric for identifying sequencing error and crosstalk. Success of this method was demonstrated with a variety of reference sample substrates and extract types. These studies further demonstrate the advantages of using NGS techniques by highlighting the quantitative nature of heteroplasmy detection. The results presented herein from more than 175 samples processed in ten sequencing runs, show this mitogenome sequencing method and analysis strategy to be valid for the generation of reference data. Copyright © 2018 Elsevier B.V. All rights reserved.

  10. Oligonucleotide recombination enabled site-specific mutagenesis in bacteria

    USDA-ARS?s Scientific Manuscript database

    Recombineering refers to a strategy for engineering DNA sequences using a specialized mode of homologous recombination. This technology can be used for rapidly constructing precise changes in bacterial genome sequences in vivo. Oligo recombination is one type of recombineering that uses ssDNA olig...

  11. What's in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual

    USDA-ARS?s Scientific Manuscript database

    BACKGROUND: Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain u...

  12. Colony-PCR Is a Rapid Method for DNA Amplification of Hyphomycetes

    PubMed Central

    Walch, Georg; Knapp, Maria; Rainer, Georg; Peintner, Ursula

    2016-01-01

    Fungal pure cultures identified with both classical morphological methods and through barcoding sequences are a basic requirement for reliable reference sequences in public databases. Improved techniques for an accelerated DNA barcode reference library construction will result in considerably improved sequence databases covering a wider taxonomic range. Fast, cheap, and reliable methods for obtaining DNA sequences from fungal isolates are, therefore, a valuable tool for the scientific community. Direct colony PCR was already successfully established for yeasts, but has not been evaluated for a wide range of anamorphic soil fungi up to now, and a direct amplification protocol for hyphomycetes without tissue pre-treatment has not been published so far. Here, we present a colony PCR technique directly from fungal hyphae without previous DNA extraction or other prior manipulation. Seven hundred eighty-eight fungal strains from 48 genera were tested with a success rate of 86%. PCR success varied considerably: DNA of fungi belonging to the genera Cladosporium, Geomyces, Fusarium, and Mortierella could be amplified with high success. DNA of soil-borne yeasts was always successfully amplified. Absidia, Mucor, Trichoderma, and Penicillium isolates had noticeably lower PCR success. PMID:29376929

  13. High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs.

    PubMed

    Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus; Morling, Niels

    2016-01-01

    Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates automation of DNA sequencing.

  14. Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

    PubMed

    Borrayo, Ernesto; Mendizabal-Ruiz, E Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P; Morales, J Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.

  15. Genomic Signal Processing Methods for Computation of Alignment-Free Distances from DNA Sequences

    PubMed Central

    Borrayo, Ernesto; Mendizabal-Ruiz, E. Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P.; Morales, J. Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments. PMID:25393409

  16. Successful isolation and PCR amplification of DNA from National Institute of Standards and Technology herbal dietary supplement standard reference material powders and extracts.

    PubMed

    Cimino, Matthew T

    2010-03-01

    Twenty-four herbal dietary supplement powder and extract reference standards provided by the National Institute of Standards and Technology (NIST) were investigated using three different commercially available DNA extraction kits to evaluate DNA availability for downstream nucleotide-based applications. The material included samples of Camellia, Citrus, Ephedra, Ginkgo, Hypericum, Serenoa, And Vaccinium. Protocols from Qiagen, MoBio, and Phytopure were used to isolate and purify DNA from the NIST standards. The resulting DNA concentration was quantified using SYBR Green fluorometry. Each of the 24 samples yielded DNA, though the concentration of DNA from each approach was notably different. The Phytopure method consistently yielded more DNA. The average yield ratio was 22 : 3 : 1 (ng/microL; Phytopure : Qiagen : MoBio). Amplification of the internal transcribed spacer II region using PCR was ultimately successful in 22 of the 24 samples. Direct sequencing chromatograms of the amplified material suggested that most of the samples were comprised of mixtures. However, the sequencing chromatograms of 12 of the 24 samples were sufficient to confirm the identity of the target material. The successful extraction, amplification, and sequencing of DNA from these herbal dietary supplement extracts and powders supports a continued effort to explore nucleotide sequence-based tools for the authentication and identification of plants in dietary supplements. (c) Georg Thieme Verlag KG Stuttgart . New York.

  17. What are Whole Exome Sequencing and Whole Genome Sequencing?

    MedlinePlus

    ... the future. For more information about DNA sequencing technologies and their use: Genetics Home Reference discusses whether ... University in St. Louis describes the different sequencing technologies and what the new technologies have meant for ...

  18. Nucleotide Sequence Database Comparison for Routine Dermatophyte Identification by Internal Transcribed Spacer 2 Genetic Region DNA Barcoding.

    PubMed

    Normand, A C; Packeu, A; Cassagne, C; Hendrickx, M; Ranque, S; Piarroux, R

    2018-05-01

    Conventional dermatophyte identification is based on morphological features. However, recent studies have proposed to use the nucleotide sequences of the rRNA internal transcribed spacer (ITS) region as an identification barcode of all fungi, including dermatophytes. Several nucleotide databases are available to compare sequences and thus identify isolates; however, these databases often contain mislabeled sequences that impair sequence-based identification. We evaluated five of these databases on a clinical isolate panel. We selected 292 clinical dermatophyte strains that were prospectively subjected to an ITS2 nucleotide sequence analysis. Sequences were analyzed against the databases, and the results were compared to clusters obtained via DNA alignment of sequence segments. The DNA tree served as the identification standard throughout the study. According to the ITS2 sequence identification, the majority of strains (255/292) belonged to the genus Trichophyton , mainly T. rubrum complex ( n = 184), T. interdigitale ( n = 40), T. tonsurans ( n = 26), and T. benhamiae ( n = 5). Other genera included Microsporum (e.g., M. canis [ n = 21], M. audouinii [ n = 10], Nannizzia gypsea [ n = 3], and Epidermophyton [ n = 3]). Species-level identification of T. rubrum complex isolates was an issue. Overall, ITS DNA sequencing is a reliable tool to identify dermatophyte species given that a comprehensive and correctly labeled database is consulted. Since many inaccurate identification results exist in the DNA databases used for this study, reference databases must be verified frequently and amended in line with the current revisions of fungal taxonomy. Before describing a new species or adding a new DNA reference to the available databases, its position in the phylogenetic tree must be verified. Copyright © 2018 American Society for Microbiology.

  19. Few mitochondrial DNA sequences are inserted into the turkey (Meleagris gallopavo) nuclear genome: evolutionary analyses and informativity in the domestic lineage.

    PubMed

    Schiavo, G; Strillacci, M G; Ribani, A; Bovo, S; Roman-Ponce, S I; Cerolini, S; Bertolini, F; Bagnato, A; Fontanesi, L

    2018-06-01

    Mitochondrial DNA (mtDNA) insertions have been detected in the nuclear genome of many eukaryotes. These sequences are pseudogenes originated by horizontal transfer of mtDNA fragments into the nuclear genome, producing nuclear DNA sequences of mitochondrial origin (numt). In this study we determined the frequency and distribution of mtDNA-originated pseudogenes in the turkey (Meleagris gallopavo) nuclear genome. The turkey reference genome (Turkey_2.01) was aligned with the reference linearized mtDNA sequence using last. A total of 32 numt sequences (corresponding to 18 numt regions derived by unique insertional events) were identified in the turkey nuclear genome (size ranging from 66 to 1415 bp; identity against the modern turkey mtDNA corresponding region ranging from 62% to 100%). Numts were distributed in nine chromosomes and in one scaffold. They derived from parts of 10 mtDNA protein-coding genes, ribosomal genes, the control region and 10 tRNA genes. Seven numt regions reported in the turkey genome were identified in orthologues positions in the Gallus gallus genome and therefore were present in the ancestral genome that in the Cretaceous originated the lineages of the modern crown Galliformes. Five recently integrated turkey numts were validated by PCR in 168 turkeys of six different domestic populations. None of the analysed numts were polymorphic (i.e. absence of the inserted sequence, as reported in numts of recent integration in other species), suggesting that the reticulate speciation model is not useful for explaining the origin of the domesticated turkey lineage. © 2018 Stichting International Foundation for Animal Genetics.

  20. Arrays of nucleic acid probes on biological chips

    DOEpatents

    Chee, Mark; Cronin, Maureen T.; Fodor, Stephen P. A.; Huang, Xiaohua X.; Hubbell, Earl A.; Lipshutz, Robert J.; Lobban, Peter E.; Morris, MacDonald S.; Sheldon, Edward L.

    1998-11-17

    DNA chips containing arrays of oligonucleotide probes can be used to determine whether a target nucleic acid has a nucleotide sequence identical to or different from a specific reference sequence. The array of probes comprises probes exactly complementary to the reference sequence, as well as probes that differ by one or more bases from the exactly complementary probes.

  1. Palindromic Sequence Artifacts Generated during Next Generation Sequencing Library Preparation from Historic and Ancient DNA

    PubMed Central

    Star, Bastiaan; Nederbragt, Alexander J.; Hansen, Marianne H. S.; Skage, Morten; Gilfillan, Gregor D.; Bradbury, Ian R.; Pampoulie, Christophe; Stenseth, Nils Chr; Jakobsen, Kjetill S.; Jentoft, Sissel

    2014-01-01

    Degradation-specific processes and variation in laboratory protocols can bias the DNA sequence composition from samples of ancient or historic origin. Here, we identify a novel artifact in sequences from historic samples of Atlantic cod (Gadus morhua), which forms interrupted palindromes consisting of reverse complementary sequence at the 5′ and 3′-ends of sequencing reads. The palindromic sequences themselves have specific properties – the bases at the 5′-end align well to the reference genome, whereas extensive misalignments exists among the bases at the terminal 3′-end. The terminal 3′ bases are artificial extensions likely caused by the occurrence of hairpin loops in single stranded DNA (ssDNA), which can be ligated and amplified in particular library creation protocols. We propose that such hairpin loops allow the inclusion of erroneous nucleotides, specifically at the 3′-end of DNA strands, with the 5′-end of the same strand providing the template. We also find these palindromes in previously published ancient DNA (aDNA) datasets, albeit at varying and substantially lower frequencies. This artifact can negatively affect the yield of endogenous DNA in these types of samples and introduces sequence bias. PMID:24608104

  2. Reference karyotype and cytomolecular map for loblolly pine (Pinus taeda L.)

    Treesearch

    M. Nurul Islam-faridi; C. Dana Nelson; Thomas L. Kubisiak

    2007-01-01

    A reference karyotype is presented for loblolly pine (Pinus taeda L., subgenus Pinus , section Pinus, subsection Australes), based on fluorescent in situ hybridization (FISH), using 18s-28s rDNA, 5s rDNA, and Arabidopsis-type telomere repeat sequence (A-type TRS). Well...

  3. DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing.

    PubMed

    Parson, W; Gusmão, L; Hares, D R; Irwin, J A; Mayr, W R; Morling, N; Pokorak, E; Prinz, M; Salas, A; Schneider, P M; Parsons, T J

    2014-11-01

    The DNA Commission of the International Society of Forensic Genetics (ISFG) regularly publishes guidelines and recommendations concerning the application of DNA polymorphisms to the question of human identification. Previous recommendations published in 2000 addressed the analysis and interpretation of mitochondrial DNA (mtDNA) in forensic casework. While the foundations set forth in the earlier recommendations still apply, new approaches to the quality control, alignment and nomenclature of mitochondrial sequences, as well as the establishment of mtDNA reference population databases, have been developed. Here, we describe these developments and discuss their application to both mtDNA casework and mtDNA reference population databasing applications. While the generation of mtDNA for forensic casework has always been guided by specific standards, it is now well-established that data of the same quality are required for the mtDNA reference population data used to assess the statistical weight of the evidence. As a result, we introduce guidelines regarding sequence generation, as well as quality control measures based on the known worldwide mtDNA phylogeny, that can be applied to ensure the highest quality population data possible. For both casework and reference population databasing applications, the alignment and nomenclature of haplotypes is revised here and the phylogenetic alignment proffered as acceptable standard. In addition, the interpretation of heteroplasmy in the forensic context is updated, and the utility of alignment-free database searches for unbiased probability estimates is highlighted. Finally, we discuss statistical issues and define minimal standards for mtDNA database searches. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  4. Molecular Targeting of Prostate Cancer During Androgen Ablation: Inhibition of CHES1/FOXN3

    DTIC Science & Technology

    2013-05-01

    the DNA sequences (~25^6 reads/sample) were mapped to the human genome reference sequence (hg19...tumor the AR has a genomic abnormality, placing the novel sequence 3’ of the transcriptional start site. However, it is unclear if a genomic alteration...exon/intron organization of the CHES1 gene was determined by BLAST analysis of the human genome using the 1,473-bp CHES1 cDNA sequence

  5. Isolates of viral hemorrhagic septicemia virus from North America and Europe can be detected and distinguished by DNA probes

    USGS Publications Warehouse

    Batts, W.N.; Arakawa, C.K.; Bernard, J.; Winton, J.R.

    1993-01-01

    Biotinylated DNA probes were constructed to hybndize with speclfic sequences within the messenger RNA (mRNA) of the nucleoprotein (N) gene of vlral hemorrhagic septicemia virus (VHSV) reference strains from Europe (07-71) and North Arnenca (Makah) Probes were synthesized that were complementary to (1) a 29-nucleotide sequence near the center of the N gene conlmon to both the 07-71 and Makah reference strains of the virus (2) a unique 28- nucleotide sequence that followed the open readng frame of the Makah N gene mRNA most of which was absent In the 07-71 strain, and (3) a 22-nucleobde sequence wthin the 07-71 N gene that had 6 nllsmatches \

  6. High-throughput sequencing of three Lemnoideae (duckweeds) chloroplast genomes from total DNA.

    PubMed

    Wang, Wenqin; Messing, Joachim

    2011-01-01

    Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaption for aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-time coverage of chloroplast from total DNA were reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating a greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.

  7. High-Throughput Sequencing of Three Lemnoideae (Duckweeds) Chloroplast Genomes from Total DNA

    PubMed Central

    Wang, Wenqin; Messing, Joachim

    2011-01-01

    Background Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaption for aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. Methods We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. Conclusions This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-time coverage of chloroplast from total DNA were reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating a greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power. PMID:21931804

  8. DNA barcode goes two-dimensions: DNA QR code web server.

    PubMed

    Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.

  9. TaxI: a software tool for DNA barcoding using distance methods

    PubMed Central

    Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel

    2005-01-01

    DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid, but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as good as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755

  10. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner.

    PubMed

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan; Brent, Michael R

    2009-07-01

    The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics. We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner. Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/

  11. Filling reference gaps via assembling DNA barcodes using high-throughput sequencing-moving toward barcoding the world.

    PubMed

    Liu, Shanlin; Yang, Chentao; Zhou, Chengran; Zhou, Xin

    2017-12-01

    Over the past decade, biodiversity researchers have dedicated tremendous efforts to constructing DNA reference barcodes for rapid species registration and identification. Although analytical cost for standard DNA barcoding has been significantly reduced since early 2000, further dramatic reduction in barcoding costs is unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost not only led to unbalanced barcoding efforts around the globe, but also prevented high-throughput sequencing (HTS)-based taxonomic identification from applying binomial species names, which provide crucial linkages to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated by individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that were only a few nucleotides different from each other. Additionally, the new pipeline was much more sensitive in recovering amplicons at low quantity. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that didn't show clear bands on the electrophoresis gel. Moreover, sequencing results based on the single molecular sequencing platform Pacbio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna, leading to a feasible direction for DNA barcoding global biomes. © The Authors 2017. Published by Oxford University Press.

  12. Cloned plasmid DNA fragments as calibrators for controlling GMOs: different real-time duplex quantitative PCR methods.

    PubMed

    Taverniers, Isabel; Van Bockstaele, Erik; De Loose, Marc

    2004-03-01

    Analytical real-time PCR technology is a powerful tool for implementation of the GMO labeling regulations enforced in the EU. The quality of analytical measurement data obtained by quantitative real-time PCR depends on the correct use of calibrator and reference materials (RMs). For GMO methods of analysis, the choice of appropriate RMs is currently under debate. So far, genomic DNA solutions from certified reference materials (CRMs) are most often used as calibrators for GMO quantification by means of real-time PCR. However, due to some intrinsic features of these CRMs, errors may be expected in the estimations of DNA sequence quantities. In this paper, two new real-time PCR methods are presented for Roundup Ready soybean, in which two types of plasmid DNA fragments are used as calibrators. Single-target plasmids (STPs) diluted in a background of genomic DNA were used in the first method. Multiple-target plasmids (MTPs) containing both sequences in one molecule were used as calibrators for the second method. Both methods simultaneously detect a promoter 35S sequence as GMO-specific target and a lectin gene sequence as endogenous reference target in a duplex PCR. For the estimation of relative GMO percentages both "delta C(T)" and "standard curve" approaches are tested. Delta C(T) methods are based on direct comparison of measured C(T) values of both the GMO-specific target and the endogenous target. Standard curve methods measure absolute amounts of target copies or haploid genome equivalents. A duplex delta C(T) method with STP calibrators performed at least as well as a similar method with genomic DNA calibrators from commercial CRMs. Besides this, high quality results were obtained with a standard curve method using MTP calibrators. This paper demonstrates that plasmid DNA molecules containing either one or multiple target sequences form perfect alternative calibrators for GMO quantification and are especially suitable for duplex PCR reactions.

  13. PMS2 gene mutational analysis: direct cDNA sequencing to circumvent pseudogene interference.

    PubMed

    Wimmer, Katharina; Wernstedt, Annekatrin

    2014-01-01

    The presence of highly homologous pseudocopies can compromise the mutation analysis of a gene of interest. In particular, when using PCR-based strategies, pseudogene co-amplification has to be effectively prevented. This is often achieved by using primers designed to be parental gene specific according to the reference sequence and by applying stringent PCR conditions. However, there are cases in which this approach is of limited utility. For example, it has been shown that the PMS2 gene exchanges sequences with one of its pseudogenes, named PMS2CL. This results in functional PMS2 alleles containing pseudogene-derived sequences at their 3'-end and in nonfunctional PMS2CL pseudogene alleles that contain gene-derived sequences. Hence, the paralogues cannot be distinguished according to the reference sequence. This shortcoming can be effectively circumvented by using direct cDNA sequencing. This approach is based on the selective amplification of PMS2 transcripts in two overlapping 1.6-kb RT-PCR products. In addition to avoiding pseudogene co-amplification and allele dropout, this method has also the advantage that it allows to effectively identify deletions, splice mutations, and de novo retrotransposon insertions that escape the detection of most DNA-based mutation analysis protocols.

  14. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    PubMed Central

    2011-01-01

    Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii-the diploid source of the wheat D genome, and with a genome size of 4.02 Gb, of which 90% is repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 was sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml). PMID:21266061

  15. Simple and efficient identification of rare recessive pathologically important sequence variants from next generation exome sequence data.

    PubMed

    Carr, Ian M; Morgan, Joanne; Watson, Christopher; Melnik, Svitlana; Diggle, Christine P; Logan, Clare V; Harrison, Sally M; Taylor, Graham R; Pena, Sergio D J; Markham, Alexander F; Alkuraya, Fowzan S; Black, Graeme C M; Ali, Manir; Bonthron, David T

    2013-07-01

    Massively parallel ("next generation") DNA sequencing (NGS) has quickly become the method of choice for seeking pathogenic mutations in rare uncharacterized monogenic diseases. Typically, before DNA sequencing, protein-coding regions are enriched from patient genomic DNA, representing either the entire genome ("exome sequencing") or selected mapped candidate loci. Sequence variants, identified as differences between the patient's and the human genome reference sequences, are then filtered according to various quality parameters. Changes are screened against datasets of known polymorphisms, such as dbSNP and the 1000 Genomes Project, in the effort to narrow the list of candidate causative variants. An increasing number of commercial services now offer to both generate and align NGS data to a reference genome. This potentially allows small groups with limited computing infrastructure and informatics skills to utilize this technology. However, the capability to effectively filter and assess sequence variants is still an important bottleneck in the identification of deleterious sequence variants in both research and diagnostic settings. We have developed an approach to this problem comprising a user-friendly suite of programs that can interactively analyze, filter and screen data from enrichment-capture NGS data. These programs ("Agile Suite") are particularly suitable for small-scale gene discovery or for diagnostic analysis. © 2013 WILEY PERIODICALS, INC.

  16. IDLN-MSP: Idiolocal normalization of real-time methylation-specific PCR for genetic imbalanced DNA specimens.

    PubMed

    Santourlidis, Simeon; Ghanjati, Foued; Beermann, Agnes; Hermanns, Thomas; Poyet, Cédric

    2016-02-01

    Sensitive, accurate, and reliable measurements of tumor cell-specific DNA methylation changes are of fundamental importance in cancer diagnosis, prognosis, and monitoring. Real-time methylation-specific PCR (MSP) using intercalating dyes is an established method of choice for this purpose. Here we present a simple but crucial adaptation of this widely applied method that overcomes a major obstacle: genetic abnormalities in the DNA samples, such as aneuploidy or copy number variations, that could result in inaccurate results due to improper normalization if the copy numbers of the target and reference sequences are not the same. In our idiolocal normalization (IDLN) method, the locus for the normalizing, methylation-independent reference amplification is chosen close to the locus of the methylation-dependent target amplification. This ensures that the copy numbers of both the target and reference sequences will be identical in most cases if they are close enough to each other, resulting in accurate normalization and reliable comparative measurements of DNA methylation in clinical samples when using real-time MSP.

  17. Using herbarium-derived DNAs to assemble a large-scale DNA barcode library for the vascular plants of Canada.

    PubMed

    Kuzmina, Maria L; Braukmann, Thomas W A; Fazekas, Aron J; Graham, Sean W; Dewaard, Stephanie L; Rodrigues, Anuar; Bennett, Bruce A; Dickinson, Timothy A; Saarela, Jeffery M; Catling, Paul M; Newmaster, Steven G; Percy, Diana M; Fenneman, Erin; Lauron-Moreau, Aurélien; Ford, Bruce; Gillespie, Lynn; Subramanyam, Ragupathy; Whitton, Jeannette; Jennings, Linda; Metsger, Deborah; Warne, Connor P; Brown, Allison; Sears, Elizabeth; Dewaard, Jeremy R; Zakharov, Evgeny V; Hebert, Paul D N

    2017-12-01

    Constructing complete, accurate plant DNA barcode reference libraries can be logistically challenging for large-scale floras. Here we demonstrate the promise and challenges of using herbarium collections for building a DNA barcode reference library for the vascular plant flora of Canada. Our study examined 20,816 specimens representing 5076 of 5190 vascular plant species in Canada (98%). For 98% of the specimens, at least one of the DNA barcode regions was recovered from the plastid loci rbcL and matK and from the nuclear ITS2 region. We used beta regression to quantify the effects of age, type of preservation, and taxonomic affiliation (family) on DNA sequence recovery. Specimen age and method of preservation had significant effects on sequence recovery for all markers, but influenced some families more (e.g., Boraginaceae) than others (e.g., Asteraceae). Our DNA barcode library represents an unparalleled resource for metagenomic and ecological genetic research working on temperate and arctic biomes. An observed decline in sequence recovery with specimen age may be associated with poor primer matches, intragenomic variation (for ITS2), or inhibitory secondary compounds in some taxa.

  18. Using herbarium-derived DNAs to assemble a large-scale DNA barcode library for the vascular plants of Canada1

    PubMed Central

    Kuzmina, Maria L.; Braukmann, Thomas W. A.; Fazekas, Aron J.; Graham, Sean W.; Dewaard, Stephanie L.; Rodrigues, Anuar; Bennett, Bruce A.; Dickinson, Timothy A.; Saarela, Jeffery M.; Catling, Paul M.; Newmaster, Steven G.; Percy, Diana M.; Fenneman, Erin; Lauron-Moreau, Aurélien; Ford, Bruce; Gillespie, Lynn; Subramanyam, Ragupathy; Whitton, Jeannette; Jennings, Linda; Metsger, Deborah; Warne, Connor P.; Brown, Allison; Sears, Elizabeth; Dewaard, Jeremy R.; Zakharov, Evgeny V.; Hebert, Paul D. N.

    2017-01-01

    Premise of the study: Constructing complete, accurate plant DNA barcode reference libraries can be logistically challenging for large-scale floras. Here we demonstrate the promise and challenges of using herbarium collections for building a DNA barcode reference library for the vascular plant flora of Canada. Methods: Our study examined 20,816 specimens representing 5076 of 5190 vascular plant species in Canada (98%). For 98% of the specimens, at least one of the DNA barcode regions was recovered from the plastid loci rbcL and matK and from the nuclear ITS2 region. We used beta regression to quantify the effects of age, type of preservation, and taxonomic affiliation (family) on DNA sequence recovery. Results: Specimen age and method of preservation had significant effects on sequence recovery for all markers, but influenced some families more (e.g., Boraginaceae) than others (e.g., Asteraceae). Discussion: Our DNA barcode library represents an unparalleled resource for metagenomic and ecological genetic research working on temperate and arctic biomes. An observed decline in sequence recovery with specimen age may be associated with poor primer matches, intragenomic variation (for ITS2), or inhibitory secondary compounds in some taxa. PMID:29299394

  19. Sequencing degraded DNA from non-destructively sampled museum specimens for RAD-tagging and low-coverage shotgun phylogenetics.

    PubMed

    Tin, Mandy Man-Ying; Economo, Evan Philip; Mikheyev, Alexander Sergeyevich

    2014-01-01

    Ancient and archival DNA samples are valuable resources for the study of diverse historical processes. In particular, museum specimens provide access to biotas distant in time and space, and can provide insights into ecological and evolutionary changes over time. However, archival specimens are difficult to handle; they are often fragile and irreplaceable, and typically contain only short segments of denatured DNA. Here we present a set of tools for processing such samples for state-of-the-art genetic analysis. First, we report a protocol for minimally destructive DNA extraction of insect museum specimens, which produced sequenceable DNA from all of the samples assayed. The 11 specimens analyzed had fragmented DNA, rarely exceeding 100 bp in length, and could not be amplified by conventional PCR targeting the mitochondrial cytochrome oxidase I gene. Our approach made these samples amenable to analysis with commonly used next-generation sequencing-based molecular analytic tools, including RAD-tagging and shotgun genome re-sequencing. First, we used museum ant specimens from three species, each with its own reference genome, for RAD-tag mapping. Were able to use the degraded DNA sequences, which were sequenced in full, to identify duplicate reads and filter them prior to base calling. Second, we re-sequenced six Hawaiian Drosophila species, with millions of years of divergence, but with only a single available reference genome. Despite a shallow coverage of 0.37 ± 0.42 per base, we could recover a sufficient number of overlapping SNPs to fully resolve the species tree, which was consistent with earlier karyotypic studies, and previous molecular studies, at least in the regions of the tree that these studies could resolve. Although developed for use with degraded DNA, all of these techniques are readily applicable to more recent tissue, and are suitable for liquid handling automation.

  20. Fast single-pass alignment and variant calling using sequencing data

    USDA-ARS?s Scientific Manuscript database

    Sequencing research requires efficient computation. Few programs use already known information about DNA variants when aligning sequence data to the reference map. New program findmap.f90 reads the previous variant list before aligning sequence, calling variant alleles, and summing the allele counts...

  1. A 28,000 Years Old Cro-Magnon mtDNA Sequence Differs from All Potentially Contaminating Modern Sequences

    PubMed Central

    Caramelli, David; Milani, Lucio; Vai, Stefania; Modi, Alessandra; Pecchioli, Elena; Girardi, Matteo; Pilli, Elena; Lari, Martina; Lippi, Barbara; Ronchitelli, Annamaria; Mallegni, Francesco; Casoli, Antonella; Bertorelle, Giorgio; Barbujani, Guido

    2008-01-01

    Background DNA sequences from ancient speciments may in fact result from undetected contamination of the ancient specimens by modern DNA, and the problem is particularly challenging in studies of human fossils. Doubts on the authenticity of the available sequences have so far hampered genetic comparisons between anatomically archaic (Neandertal) and early modern (Cro-Magnoid) Europeans. Methodology/Principal Findings We typed the mitochondrial DNA (mtDNA) hypervariable region I in a 28,000 years old Cro-Magnoid individual from the Paglicci cave, in Italy (Paglicci 23) and in all the people who had contact with the sample since its discovery in 2003. The Paglicci 23 sequence, determined through the analysis of 152 clones, is the Cambridge reference sequence, and cannot possibly reflect contamination because it differs from all potentially contaminating modern sequences. Conclusions/Significance: The Paglicci 23 individual carried a mtDNA sequence that is still common in Europe, and which radically differs from those of the almost contemporary Neandertals, demonstrating a genealogical continuity across 28,000 years, from Cro-Magnoid to modern Europeans. Because all potential sources of modern DNA contamination are known, the Paglicci 23 sample will offer a unique opportunity to get insight for the first time into the nuclear genes of early modern Europeans. PMID:18628960

  2. DNA Barcode Goes Two-Dimensions: DNA QR Code Web Server

    PubMed Central

    Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, “DNA barcode” actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113

  3. DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique

    PubMed Central

    Li, Pinghao; Wang, Shuang; Kim, Jihoon; Xiong, Hongkai; Ohno-Machado, Lucila; Jiang, Xiaoqian

    2013-01-01

    Genome data are becoming increasingly important for modern medicine. As the rate of increase in DNA sequencing outstrips the rate of increase in disk storage capacity, the storage and data transferring of large genome data are becoming important concerns for biomedical researchers. We propose a two-pass lossless genome compression algorithm, which highlights the synthesis of complementary contextual models, to improve the compression performance. The proposed framework could handle genome compression with and without reference sequences, and demonstrated performance advantages over best existing algorithms. The method for reference-free compression led to bit rates of 1.720 and 1.838 bits per base for bacteria and yeast, which were approximately 3.7% and 2.6% better than the state-of-the-art algorithms. Regarding performance with reference, we tested on the first Korean personal genome sequence data set, and our proposed method demonstrated a 189-fold compression rate, reducing the raw file size from 2986.8 MB to 15.8 MB at a comparable decompression cost with existing algorithms. DNAcompact is freely available at https://sourceforge.net/projects/dnacompact/for research purpose. PMID:24282536

  4. An atlas of DNA methylation in diverse bovine tissues

    USDA-ARS?s Scientific Manuscript database

    We launched an effort to produce a reference cattle DNA methylation resource to improve animal production. We will employ experimental pipelines built around next generation sequencing technologies to map DNA methylation in cultured cells and primary tissues systems frequently involved in animal pro...

  5. STRBase: a short tandem repeat DNA database for the human identity testing community

    PubMed Central

    Ruitberg, Christian M.; Reeder, Dennis J.; Butler, John M.

    2001-01-01

    The National Institute of Standards and Technology (NIST) has compiled and maintained a Short Tandem Repeat DNA Internet Database (http://www.cstl.nist.gov/biotech/strbase/) since 1997 commonly referred to as STRBase. This database is an information resource for the forensic DNA typing community with details on commonly used short tandem repeat (STR) DNA markers. STRBase consolidates and organizes the abundant literature on this subject to facilitate on-going efforts in DNA typing. Observed alleles and annotated sequence for each STR locus are described along with a review of STR analysis technologies. Additionally, commercially available STR multiplex kits are described, published polymerase chain reaction (PCR) primer sequences are reported, and validation studies conducted by a number of forensic laboratories are listed. To supplement the technical information, addresses for scientists and hyperlinks to organizations working in this area are available, along with the comprehensive reference list of over 1300 publications on STRs used for DNA typing purposes. PMID:11125125

  6. Unique nucleotide sequence-guided assembly of repetitive DNA parts for synthetic biology applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Torella, JP; Lienert, F; Boehm, CR

    2014-08-07

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts, and they hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies-for example, repeated terminator and insulator sequences-that complicate recombination-based assembly. We and others have recently developed DNA assembly methods, which we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked withmore » UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly assembled constructs, or into high-quality combinatorial libraries in only 2-3 d. If the DNA parts must be generated from scratch, an additional 2-5 d are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques.« less

  7. Unique nucleotide sequence (UNS)-guided assembly of repetitive DNA parts for synthetic biology applications

    PubMed Central

    Torella, Joseph P.; Lienert, Florian; Boehm, Christian R.; Chen, Jan-Hung; Way, Jeffrey C.; Silver, Pamela A.

    2016-01-01

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts and hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies — for example repeated terminator and insulator sequences — that complicate recombination-based assembly. We and others have recently developed DNA assembly methods that we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly-assembled constructs, or into high-quality combinatorial libraries in only 2–3 days. If the DNA parts must be generated from scratch, an additional 2–5 days are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques. PMID:25101822

  8. Evaluation of droplet digital PCR for characterizing plasmid reference material used for quantifying ammonia oxidizers and denitrifiers.

    PubMed

    Dong, Lianhua; Meng, Ying; Wang, Jing; Liu, Yingying

    2014-02-01

    DNA reference materials of certified value have a critical function in many analytical processes of DNA measurement. Quantification of amoA genes in ammonia oxidizing bacteria (AOB) and archaea (AOA), and of nirS and nosZ genes in the denitrifiers is very important for determining their distribution and abundance in the natural environment. A plasmid reference material containing nirS, nosZ, amoA-AOB, and amoA-AOA is developed to provide a DNA standard with copy number concentration for ensuring comparability and reliability of quantification of these genes. Droplet digital PCR (ddPCR) was evaluated for characterization of the plasmid reference material. The result revealed that restriction endonuclease digestion of plasmids can improve amplification efficiency and minimize the measurement bias of ddPCR. Compared with the conformation of the plasmid, the size of the DNA fragment containing the target sequence and the location of the restriction site relative to the target sequence are not significant factors affecting plasmid quantification by ddPCR. Liquid chromatography-isotope dilution mass spectrometry (LC-IDMS) was used to provide independent data for quantifying the plasmid reference material. The copy number concentration of the digested plasmid determined by ddPCR agreed well with that determined by LC-IDMS, improving both the accuracy and reliability of the plasmid reference material. The reference value, with its expanded uncertainty (k = 2), of the plasmid reference material was determined to be (5.19 ± 0.41) × 10(9) copies μL(-1) by averaging the results of two independent measurements. Consideration of the factors revealed in this study can improve the reliability and accuracy of ddPCR; thus, this method has the potential to accurately quantify DNA reference materials.

  9. [Genome-scale sequence data processing and epigenetic analysis of DNA methylation].

    PubMed

    Wang, Ting-Zhang; Shan, Gao; Xu, Jian-Hong; Xue, Qing-Zhong

    2013-06-01

    A new approach recently developed for detecting cytosine DNA methylation (mC) and analyzing the genome-scale DNA methylation profiling, is called BS-Seq which is based on bisulfite conversion of genomic DNA combined with next-generation sequencing. The method can not only provide an insight into the difference of genome-scale DNA methylation among different organisms, but also reveal the conservation of DNA methylation in all contexts and nucleotide preference for different genomic regions, including genes, exons, and repetitive DNA sequences. It will be helpful to under-stand the epigenetic impacts of cytosine DNA methylation on the regulation of gene expression and maintaining silence of repetitive sequences, such as transposable elements. In this paper, we introduce the preprocessing steps of DNA methylation data, by which cytosine (C) and guanine (G) in the reference sequence are transferred to thymine (T) and adenine (A), and cytosine in reads is transferred to thymine, respectively. We also comprehensively review the main content of the DNA methylation analysis on the genomic scale: (1) the cytosine methylation under the context of different sequences; (2) the distribution of genomic methylcytosine; (3) DNA methylation context and the preference for the nucleotides; (4) DNA- protein interaction sites of DNA methylation; (5) degree of methylation of cytosine in the different structural elements of genes. DNA methylation analysis technique provides a powerful tool for the epigenome study in human and other species, and genes and environment interaction, and founds the theoretical basis for further development of disease diagnostics and therapeutics in human.

  10. Filling reference gaps via assembling DNA barcodes using high-throughput sequencing—moving toward barcoding the world

    PubMed Central

    Zhou, Chengran

    2017-01-01

    Abstract Over the past decade, biodiversity researchers have dedicated tremendous efforts to constructing DNA reference barcodes for rapid species registration and identification. Although analytical cost for standard DNA barcoding has been significantly reduced since early 2000, further dramatic reduction in barcoding costs is unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost not only led to unbalanced barcoding efforts around the globe, but also prevented high-throughput sequencing (HTS)–based taxonomic identification from applying binomial species names, which provide crucial linkages to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated by individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that were only a few nucleotides different from each other. Additionally, the new pipeline was much more sensitive in recovering amplicons at low quantity. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that didn’t show clear bands on the electrophoresis gel. Moreover, sequencing results based on the single molecular sequencing platform Pacbio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna, leading to a feasible direction for DNA barcoding global biomes. PMID:29077841

  11. Accelerated construction of a regional DNA-barcode reference library: Caddisflies (Trichoptera) in the Great Smoky Mountains National Park

    USGS Publications Warehouse

    Zhou, X.; Robinson, J.L.; Geraci, C.J.; Parker, C.R.; Flint, O.S.; Etnier, D.A.; Ruiter, D.; DeWalt, R.E.; Jacobus, L.M.; Hebert, P.D.N.

    2011-01-01

    Deoxyribonucleic acid (DNA) barcoding is an effective tool for species identification and lifestage association in a wide range of animal taxa. We developed a strategy for rapid construction of a regional DNA-barcode reference library and used the caddisflies (Trichoptera) of the Great Smoky Mountains National Park (GSMNP) as a model. Nearly 1000 cytochrome c oxidase subunit I (COI) sequences, representing 209 caddisfly species previously recorded from GSMNP, were obtained from the global Trichoptera Barcode of Life campaign. Most of these sequences were collected from outside the GSMNP area. Another 645 COI sequences, representing 80 species, were obtained from specimens collected in a 3-d bioblitz (short-term, intense sampling program) in GSMNP. The joint collections provided barcode coverage for 212 species, 91% of the GSMNP fauna. Inclusion of samples from other localities greatly expedited construction of the regional DNA-barcode reference library. This strategy increased intraspecific divergence and decreased average distances to nearest neighboring species, but the DNA-barcode library was able to differentiate 93% of the GSMNP Trichoptera species examined. Global barcoding projects will aid construction of regional DNA-barcode libraries, but local surveys make crucial contributions to progress by contributing rare or endemic species and full-length barcodes generated from high-quality DNA. DNA taxonomy is not a goal of our present work, but the investigation of COI divergence patterns in caddisflies is providing new insights into broader biodiversity patterns in this group and has directed attention to various issues, ranging from the need to re-evaluate species taxonomy with integrated morphological and molecular evidence to the necessity of an appropriate interpretation of barcode analyses and its implications in understanding species diversity (in contrast to a simple claim for barcoding failure).

  12. Evaluation of partial 16S ribosomal DNA sequencing for identification of nocardia species by using the MicroSeq 500 system with an expanded database.

    PubMed

    Cloud, Joann L; Conville, Patricia S; Croft, Ann; Harmsen, Dag; Witebsky, Frank G; Carroll, Karen C

    2004-02-01

    Identification of clinically significant nocardiae to the species level is important in patient diagnosis and treatment. A study was performed to evaluate Nocardia species identification obtained by partial 16S ribosomal DNA (rDNA) sequencing by the MicroSeq 500 system with an expanded database. The expanded portion of the database was developed from partial 5' 16S rDNA sequences derived from 28 reference strains (from the American Type Culture Collection and the Japanese Collection of Microorganisms). The expanded MicroSeq 500 system was compared to (i). conventional identification obtained from a combination of growth characteristics with biochemical and drug susceptibility tests; (ii). molecular techniques involving restriction enzyme analysis (REA) of portions of the 16S rRNA and 65-kDa heat shock protein genes; and (iii). when necessary, sequencing of a 999-bp fragment of the 16S rRNA gene. An unknown isolate was identified as a particular species if the sequence obtained by partial 16S rDNA sequencing by the expanded MicroSeq 500 system was 99.0% similar to that of the reference strain. Ninety-four nocardiae representing 10 separate species were isolated from patient specimens and examined by using the three different methods. Sequencing of partial 16S rDNA by the expanded MicroSeq 500 system resulted in only 72% agreement with conventional methods for species identification and 90% agreement with the alternative molecular methods. Molecular methods for identification of Nocardia species provide more accurate and rapid results than the conventional methods using biochemical and susceptibility testing. With an expanded database, the MicroSeq 500 system for partial 16S rDNA was able to correctly identify the human pathogens N. brasiliensis, N. cyriacigeorgica, N. farcinica, N. nova, N. otitidiscaviarum, and N. veterana.

  13. Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens.

    PubMed

    Just, Rebecca S; Scheible, Melissa K; Fast, Spence A; Sturk-Andreaggi, Kimberly; Higginbotham, Jennifer L; Lyons, Elizabeth A; Bush, Jocelyn M; Peck, Michelle A; Ring, Joseph D; Diegoli, Toni M; Röck, Alexander W; Huber, Gabriela E; Nagl, Simone; Strobl, Christina; Zimmermann, Bettina; Parson, Walther; Irwin, Jodi A

    2014-05-01

    Forensic mitochondrial DNA (mtDNA) testing requires appropriate, high quality reference population data for estimating the rarity of questioned haplotypes and, in turn, the strength of the mtDNA evidence. Available reference databases (SWGDAM, EMPOP) currently include information from the mtDNA control region; however, novel methods that quickly and easily recover mtDNA coding region data are becoming increasingly available. Though these assays promise to both facilitate the acquisition of mitochondrial genome (mtGenome) data and maximize the general utility of mtDNA testing in forensics, the appropriate reference data and database tools required for their routine application in forensic casework are lacking. To address this deficiency, we have undertaken an effort to: (1) increase the large-scale availability of high-quality entire mtGenome reference population data, and (2) improve the information technology infrastructure required to access/search mtGenome data and employ them in forensic casework. Here, we describe the application of a data generation and analysis workflow to the development of more than 400 complete, forensic-quality mtGenomes from low DNA quantity blood serum specimens as part of a U.S. National Institute of Justice funded reference population databasing initiative. We discuss the minor modifications made to a published mtGenome Sanger sequencing protocol to maintain a high rate of throughput while minimizing manual reprocessing with these low template samples. The successful use of this semi-automated strategy on forensic-like samples provides practical insight into the feasibility of producing complete mtGenome data in a routine casework environment, and demonstrates that large (>2kb) mtDNA fragments can regularly be recovered from high quality but very low DNA quantity specimens. Further, the detailed empirical data we provide on the amplification success rates across a range of DNA input quantities will be useful moving forward as PCR-based strategies for mtDNA enrichment are considered for targeted next-generation sequencing workflows. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.

  14. Sequence verification of synthetic DNA by assembly of sequencing reads

    PubMed Central

    Wilson, Mandy L.; Cai, Yizhi; Hanlon, Regina; Taylor, Samantha; Chevreux, Bastien; Setubal, João C.; Tyler, Brett M.; Peccoud, Jean

    2013-01-01

    Gene synthesis attempts to assemble user-defined DNA sequences with base-level precision. Verifying the sequences of construction intermediates and the final product of a gene synthesis project is a critical part of the workflow, yet one that has received the least attention. Sequence validation is equally important for other kinds of curated clone collections. Ensuring that the physical sequence of a clone matches its published sequence is a common quality control step performed at least once over the course of a research project. GenoREAD is a web-based application that breaks the sequence verification process into two steps: the assembly of sequencing reads and the alignment of the resulting contig with a reference sequence. GenoREAD can determine if a clone matches its reference sequence. Its sophisticated reporting features help identify and troubleshoot problems that arise during the sequence verification process. GenoREAD has been experimentally validated on thousands of gene-sized constructs from an ORFeome project, and on longer sequences including whole plasmids and synthetic chromosomes. Comparing GenoREAD results with those from manual analysis of the sequencing data demonstrates that GenoREAD tends to be conservative in its diagnostic. GenoREAD is available at www.genoread.org. PMID:23042248

  15. Pulling out the 1%: Whole-Genome Capture for the Targeted Enrichment of Ancient DNA Sequencing Libraries

    PubMed Central

    Carpenter, Meredith L.; Buenrostro, Jason D.; Valdiosera, Cristina; Schroeder, Hannes; Allentoft, Morten E.; Sikora, Martin; Rasmussen, Morten; Gravel, Simon; Guillén, Sonia; Nekhrizov, Georgi; Leshtakov, Krasimir; Dimitrova, Diana; Theodossiev, Nikola; Pettener, Davide; Luiselli, Donata; Sandoval, Karla; Moreno-Estrada, Andrés; Li, Yingrui; Wang, Jun; Gilbert, M. Thomas P.; Willerslev, Eske; Greenleaf, William J.; Bustamante, Carlos D.

    2013-01-01

    Most ancient specimens contain very low levels of endogenous DNA, precluding the shotgun sequencing of many interesting samples because of cost. Ancient DNA (aDNA) libraries often contain <1% endogenous DNA, with the majority of sequencing capacity taken up by environmental DNA. Here we present a capture-based method for enriching the endogenous component of aDNA sequencing libraries. By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold. Furthermore, we maintained coverage of the majority of regions sequenced in the precapture library. Intersection with the 1000 Genomes Project reference panel yielded an average of 50,723 SNPs (range 3,062–147,243) for the postcapture libraries sequenced with 1 million reads, compared with 13,280 SNPs (range 217–73,266) for the precapture libraries, increasing resolution in population genetic analyses. Our whole-genome capture approach makes it less costly to sequence aDNA from specimens containing very low levels of endogenous DNA, enabling the analysis of larger numbers of samples. PMID:24568772

  16. Human Contamination in Public Genome Assemblies.

    PubMed

    Kryukov, Kirill; Imanishi, Tadashi

    2016-01-01

    Contamination in genome assembly can lead to wrong or confusing results when using such genome as reference in sequence comparison. Although bacterial contamination is well known, the problem of human-originated contamination received little attention. In this study we surveyed 45,735 available genome assemblies for evidence of human contamination. We used lineage specificity to distinguish between contamination and conservation. We found that 154 genome assemblies contain fragments that with high confidence originate as contamination from human DNA. Majority of contaminating human sequences were present in the reference human genome assembly for over a decade. We recommend that existing contaminated genomes should be revised to remove contaminated sequence, and that new assemblies should be thoroughly checked for presence of human DNA before submitting them to public databases.

  17. Utility of 16S rDNA Sequencing for Identification of Rare Pathogenic Bacteria.

    PubMed

    Loong, Shih Keng; Khor, Chee Sieng; Jafar, Faizatul Lela; AbuBakar, Sazaly

    2016-11-01

    Phenotypic identification systems are established methods for laboratory identification of bacteria causing human infections. Here, the utility of phenotypic identification systems was compared against 16S rDNA identification method on clinical isolates obtained during a 5-year study period, with special emphasis on isolates that gave unsatisfactory identification. One hundred and eighty-seven clinical bacteria isolates were tested with commercial phenotypic identification systems and 16S rDNA sequencing. Isolate identities determined using phenotypic identification systems and 16S rDNA sequencing were compared for similarity at genus and species level, with 16S rDNA sequencing as the reference method. Phenotypic identification systems identified ~46% (86/187) of the isolates with identity similar to that identified using 16S rDNA sequencing. Approximately 39% (73/187) and ~15% (28/187) of the isolates showed different genus identity and could not be identified using the phenotypic identification systems, respectively. Both methods succeeded in determining the species identities of 55 isolates; however, only ~69% (38/55) of the isolates matched at species level. 16S rDNA sequencing could not determine the species of ~20% (37/187) of the isolates. The 16S rDNA sequencing is a useful method over the phenotypic identification systems for the identification of rare and difficult to identify bacteria species. The 16S rDNA sequencing method, however, does have limitation for species-level identification of some bacteria highlighting the need for better bacterial pathogen identification tools. © 2016 Wiley Periodicals, Inc.

  18. mtDNAmanager: a Web-based tool for the management and quality analysis of mitochondrial DNA control-region sequences

    PubMed Central

    Lee, Hwan Young; Song, Injee; Ha, Eunho; Cho, Sung-Bae; Yang, Woo Ick; Shin, Kyoung-Jin

    2008-01-01

    Background For the past few years, scientific controversy has surrounded the large number of errors in forensic and literature mitochondrial DNA (mtDNA) data. However, recent research has shown that using mtDNA phylogeny and referring to known mtDNA haplotypes can be useful for checking the quality of sequence data. Results We developed a Web-based bioinformatics resource "mtDNAmanager" that offers a convenient interface supporting the management and quality analysis of mtDNA sequence data. The mtDNAmanager performs computations on mtDNA control-region sequences to estimate the most-probable mtDNA haplogroups and retrieves similar sequences from a selected database. By the phased designation of the most-probable haplogroups (both expected and estimated haplogroups), mtDNAmanager enables users to systematically detect errors whilst allowing for confirmation of the presence of clear key diagnostic mutations and accompanying mutations. The query tools of mtDNAmanager also facilitate database screening with two options of "match" and "include the queried nucleotide polymorphism". In addition, mtDNAmanager provides Web interfaces for users to manage and analyse their own data in batch mode. Conclusion The mtDNAmanager will provide systematic routines for mtDNA sequence data management and analysis via easily accessible Web interfaces, and thus should be very useful for population, medical and forensic studies that employ mtDNA analysis. mtDNAmanager can be accessed at . PMID:19014619

  19. Assessing DNA Barcodes for Species Identification in North American Reptiles and Amphibians in Natural History Collections.

    PubMed

    Chambers, E Anne; Hebert, Paul D N

    2016-01-01

    High rates of species discovery and loss have led to the urgent need for more rapid assessment of species diversity in the herpetofauna. DNA barcoding allows for the preliminary identification of species based on sequence divergence. Prior DNA barcoding work on reptiles and amphibians has revealed higher biodiversity counts than previously estimated due to cases of cryptic and undiscovered species. Past studies have provided DNA barcodes for just 14% of the North American herpetofauna, revealing the need for expanded coverage. This study extends the DNA barcode reference library for North American herpetofauna, assesses the utility of this approach in aiding species delimitation, and examines the correspondence between current species boundaries and sequence clusters designated by the BIN system. Sequences were obtained from 730 specimens, representing 274 species (43%) from the North American herpetofauna. Mean intraspecific divergences were 1% and 3%, while average congeneric sequence divergences were 16% and 14% in amphibians and reptiles, respectively. BIN assignments corresponded with current species boundaries in 79% of amphibians, 100% of turtles, and 60% of squamates. Deep divergences (>2%) were noted in 35% of squamate and 16% of amphibian species, and low divergences (<2%) occurred in 12% of reptiles and 23% of amphibians, patterns reflected in BIN assignments. Sequence recovery declined with specimen age, and variation in recovery success was noted among collections. Within collections, barcodes effectively flagged seven mislabeled tissues, and barcode fragments were recovered from five formalin-fixed specimens. This study demonstrates that DNA barcodes can effectively flag errors in museum collections, while BIN splits and merges reveal taxa belonging to deeply diverged or hybridizing lineages. This study is the first effort to compile a reference library of DNA barcodes for herpetofauna on a continental scale.

  20. Assessing DNA Barcodes for Species Identification in North American Reptiles and Amphibians in Natural History Collections

    PubMed Central

    Chambers, E. Anne; Hebert, Paul D. N.

    2016-01-01

    Background High rates of species discovery and loss have led to the urgent need for more rapid assessment of species diversity in the herpetofauna. DNA barcoding allows for the preliminary identification of species based on sequence divergence. Prior DNA barcoding work on reptiles and amphibians has revealed higher biodiversity counts than previously estimated due to cases of cryptic and undiscovered species. Past studies have provided DNA barcodes for just 14% of the North American herpetofauna, revealing the need for expanded coverage. Methodology/Principal Findings This study extends the DNA barcode reference library for North American herpetofauna, assesses the utility of this approach in aiding species delimitation, and examines the correspondence between current species boundaries and sequence clusters designated by the BIN system. Sequences were obtained from 730 specimens, representing 274 species (43%) from the North American herpetofauna. Mean intraspecific divergences were 1% and 3%, while average congeneric sequence divergences were 16% and 14% in amphibians and reptiles, respectively. BIN assignments corresponded with current species boundaries in 79% of amphibians, 100% of turtles, and 60% of squamates. Deep divergences (>2%) were noted in 35% of squamate and 16% of amphibian species, and low divergences (<2%) occurred in 12% of reptiles and 23% of amphibians, patterns reflected in BIN assignments. Sequence recovery declined with specimen age, and variation in recovery success was noted among collections. Within collections, barcodes effectively flagged seven mislabeled tissues, and barcode fragments were recovered from five formalin-fixed specimens. Conclusions/Significance This study demonstrates that DNA barcodes can effectively flag errors in museum collections, while BIN splits and merges reveal taxa belonging to deeply diverged or hybridizing lineages. This study is the first effort to compile a reference library of DNA barcodes for herpetofauna on a continental scale. PMID:27116180

  1. mtDNA sequence diversity of Hazara ethnic group from Pakistan.

    PubMed

    Rakha, Allah; Fatima; Peng, Min-Sheng; Adan, Atif; Bi, Rui; Yasmin, Memona; Yao, Yong-Gang

    2017-09-01

    The present study was undertaken to investigate mitochondrial DNA (mtDNA) control region sequences of Hazaras from Pakistan, so as to generate mtDNA reference database for forensic casework in Pakistan and to analyze phylogenetic relationship of this particular ethnic group with geographically proximal populations. Complete mtDNA control region (nt 16024-576) sequences were generated through Sanger Sequencing for 319 Hazara individuals from Quetta, Baluchistan. The population sample set showed a total of 189 distinct haplotypes, belonging mainly to West Eurasian (51.72%), East & Southeast Asian (29.78%) and South Asian (18.50%) haplogroups. Compared with other populations from Pakistan, the Hazara population had a relatively high haplotype diversity (0.9945) and a lower random match probability (0.0085). The dataset has been incorporated into EMPOP database under accession number EMP00680. The data herein comprises the largest, and likely most thoroughly examined, control region mtDNA dataset from Hazaras of Pakistan. Copyright © 2017 Elsevier B.V. All rights reserved.

  2. Complete mtDNA sequencing reveals mutations m.9185T>C and m.13513G>A in three patients with Leigh syndrome.

    PubMed

    Pelnena, Dita; Burnyte, Birute; Jankevics, Eriks; Lace, Baiba; Dagyte, Evelina; Grigalioniene, Kristina; Utkus, Algirdas; Krumina, Zita; Rozentale, Jolanta; Adomaitiene, Irina; Stavusis, Janis; Pliss, Liana; Inashkina, Inna

    2017-12-12

    The most common mitochondrial disorder in children is Leigh syndrome, which is a progressive and genetically heterogeneous neurodegenerative disorder caused by mutations in nuclear genes or mitochondrial DNA (mtDNA). In the present study, a novel and robust method of complete mtDNA sequencing, which allows amplification of the whole mitochondrial genome, was tested. Complete mtDNA sequencing was performed in a cohort of patients with suspected mitochondrial mutations. Patients from Latvia and Lithuania (n = 92 and n = 57, respectively) referred by clinical geneticists were included. The de novo point mutations m.9185T>C and m.13513G>A, respectively, were detected in two patients with lactic acidosis and neurodegenerative lesions. In one patient with neurodegenerative lesions, the mutation m.9185T>C was identified. These mutations are associated with Leigh syndrome. The present data suggest that full-length mtDNA sequencing is recommended as a supplement to nuclear gene testing and enzymatic assays to enhance mitochondrial disease diagnostics.

  3. Local alignment of two-base encoded DNA sequence

    PubMed Central

    Homer, Nils; Merriman, Barry; Nelson, Stanley F

    2009-01-01

    Background DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732

  4. Diagnostics of Neisseriaceae and Moraxellaceae by Ribosomal DNA Sequencing: Ribosomal Differentiation of Medical Microorganisms

    PubMed Central

    Harmsen, Dag; Singer, Christian; Rothgänger, Jörg; Tønjum, Tone; Sybren de Hoog, Gerrit; Shah, Haroun; Albert, Jürgen; Frosch, Matthias

    2001-01-01

    Fast and reliable identification of microbial isolates is a fundamental goal of clinical microbiology. However, in the case of some fastidious gram-negative bacterial species, classical phenotype identification based on either metabolic, enzymatic, or serological methods is difficult, time-consuming, and/or inadequate. 16S or 23S ribosomal DNA (rDNA) bacterial sequencing will most often result in accurate speciation of isolates. Therefore, the objective of this study was to find a hypervariable rDNA stretch, flanked by strongly conserved regions, which is suitable for molecular species identification of members of the Neisseriaceae and Moraxellaceae. The inter- and intrageneric relationships were investigated using comparative sequence analysis of PCR-amplified partial 16S and 23S rDNAs from a total of 94 strains. When compared to the type species of the genera Acinetobacter, Moraxella, and Neisseria, an average of 30 polymorphic positions was observed within the partial 16S rDNA investigated (corresponding to Escherichia coli positions 54 to 510) for each species and an average of 11 polymorphic positions was observed within the 202 nucleotides of the 23S rDNA gene (positions 1400 to 1600). Neisseria macacae and Neisseria mucosa subsp. mucosa (ATCC 19696) had identical 16S and 23S rDNA sequences. Species clusters were heterogeneous in both genes in the case of Acinetobacter lwoffii, Moraxella lacunata, and N. mucosa. Neisseria meningitidis isolates failed to cluster only in the 23S rDNA subset. Our data showed that the 16S rDNA region is more suitable than the partial 23S rDNA for the molecular diagnosis of Neisseriaceae and Moraxellaceae and that a reference database should include more than one strain of each species. All sequence chromatograms and taxonomic and disease-related information are available as part of our ribosomal differentiation of medical microorganisms (RIDOM) web-based service (http://www.ridom.hygiene.uni-wuerzburg.de/). Users can submit a sequence and conduct a similarity search against the RIDOM reference database for microbial identification purposes. PMID:11230407

  5. A Hybrid Semi-Digital Transimpedance Amplifier With Noise Cancellation Technique for Nanopore-Based DNA Sequencing.

    PubMed

    Hsu, Chung-Lun; Jiang, Haowei; Venkatesh, A G; Hall, Drew A

    2015-10-01

    Over the past two decades, nanopores have been a promising technology for next generation deoxyribonucleic acid (DNA) sequencing. Here, we present a hybrid semi-digital transimpedance amplifier (HSD-TIA) to sense the minute current signatures introduced by single-stranded DNA (ssDNA) translocating through a nanopore, while discharging the baseline current using a semi-digital feedback loop. The amplifier achieves fast settling by adaptively tuning a DC compensation current when a step input is detected. A noise cancellation technique reduces the total input-referred current noise caused by the parasitic input capacitance. Measurement results show the performance of the amplifier with 31.6 M Ω mid-band gain, 950 kHz bandwidth, and 8.5 fA/ √Hz input-referred current noise, a 2× noise reduction due to the noise cancellation technique. The settling response is demonstrated by observing the insertion of a protein nanopore in a lipid bilayer. Using the nanopore, the HSD-TIA was able to measure ssDNA translocation events.

  6. HmtDB 2016: data update, a better performing query system and human mitochondrial DNA haplogroup predictor

    PubMed Central

    Clima, Rosanna; Preste, Roberto; Calabrese, Claudia; Diroma, Maria Angela; Santorsola, Mariangela; Scioscia, Gaetano; Simone, Domenico; Shen, Lishuang; Gasparre, Giuseppe; Attimonelli, Marcella

    2017-01-01

    The HmtDB resource hosts a database of human mitochondrial genome sequences from individuals with healthy and disease phenotypes. The database is intended to support both population geneticists as well as clinicians undertaking the task to assess the pathogenicity of specific mtDNA mutations. The wide application of next-generation sequencing (NGS) has provided an enormous volume of high-resolution data at a low price, increasing the availability of human mitochondrial sequencing data, which called for a cogent and significant expansion of HmtDB data content that has more than tripled in the current release. We here describe additional novel features, including: (i) a complete, user-friendly restyling of the web interface, (ii) links to the command-line stand-alone and web versions of the MToolBox package, an up-to-date tool to reconstruct and analyze human mitochondrial DNA from NGS data and (iii) the implementation of the Reconstructed Sapiens Reference Sequence (RSRS) as mitochondrial reference sequence. The overall update renders HmtDB an even more handy and useful resource as it enables a more rapid data access, processing and analysis. HmtDB is accessible at http://www.hmtdb.uniba.it/. PMID:27899581

  7. Are special read alignment strategies necessary and cost-effective when handling sequencing reads from patient-derived tumor xenografts?

    PubMed

    Tso, Kai-Yuen; Lee, Sau Dan; Lo, Kwok-Wai; Yip, Kevin Y

    2014-12-23

    Patient-derived tumor xenografts in mice are widely used in cancer research and have become important in developing personalized therapies. When these xenografts are subject to DNA sequencing, the samples could contain various amounts of mouse DNA. It has been unclear how the mouse reads would affect data analyses. We conducted comprehensive simulations to compare three alignment strategies at different mutation rates, read lengths, sequencing error rates, human-mouse mixing ratios and sequenced regions. We also sequenced a nasopharyngeal carcinoma xenograft and a cell line to test how the strategies work on real data. We found the "filtering" and "combined reference" strategies performed better than aligning reads directly to human reference in terms of alignment and variant calling accuracies. The combined reference strategy was particularly good at reducing false negative variants calls without significantly increasing the false positive rate. In some scenarios the performance gain of these two special handling strategies was too small for special handling to be cost-effective, but it was found crucial when false non-synonymous SNVs should be minimized, especially in exome sequencing. Our study systematically analyzes the effects of mouse contamination in the sequencing data of human-in-mouse xenografts. Our findings provide information for designing data analysis pipelines for these data.

  8. Sequence analysis of cultivated strawberry (Fragaria × ananassa Duch.) using microdissected single somatic chromosomes.

    PubMed

    Yanagi, Tomohiro; Shirasawa, Kenta; Terachi, Mayuko; Isobe, Sachiko

    2017-01-01

    Cultivated strawberry ( Fragaria  ×  ananassa Duch.) has homoeologous chromosomes because of allo-octoploidy. For example, two homoeologous chromosomes that belong to different sub-genome of allopolyploids have similar base sequences. Thus, when conducting de novo assembly of DNA sequences, it is difficult to determine whether these sequences are derived from the same chromosome. To avoid the difficulties associated with homoeologous chromosomes and demonstrate the possibility of sequencing allopolyploids using single chromosomes, we conducted sequence analysis using microdissected single somatic chromosomes of cultivated strawberry. Three hundred and ten somatic chromosomes of the Japanese octoploid strawberry 'Reiko' were individually selected under a light microscope using a microdissection system. DNA from 288 of the dissected chromosomes was successfully amplified using a DNA amplification kit. Using next-generation sequencing, we decoded the base sequences of the amplified DNA segments, and on the basis of mapping, we identified DNA sequences from 144 samples that were best matched to the reference genomes of the octoploid strawberry, F.  ×  ananassa , and the diploid strawberry, F. vesca . The 144 samples were classified into seven pseudo-molecules of F. vesca . The coverage rates of the DNA sequences from the single chromosome onto all pseudo-molecular sequences varied from 3 to 29.9%. We demonstrated an efficient method for sequence analysis of allopolyploid plants using microdissected single chromosomes. On the basis of our results, we believe that whole-genome analysis of allopolyploid plants can be enhanced using methodology that employs microdissected single chromosomes.

  9. A simple method for semi-random DNA amplicon fragmentation using the methylation-dependent restriction enzyme MspJI.

    PubMed

    Shinozuka, Hiroshi; Cogan, Noel O I; Shinozuka, Maiko; Marshall, Alexis; Kay, Pippa; Lin, Yi-Han; Spangenberg, German C; Forster, John W

    2015-04-11

    Fragmentation at random nucleotide locations is an essential process for preparation of DNA libraries to be used on massively parallel short-read DNA sequencing platforms. Although instruments for physical shearing, such as the Covaris S2 focused-ultrasonicator system, and products for enzymatic shearing, such as the Nextera technology and NEBNext dsDNA Fragmentase kit, are commercially available, a simple and inexpensive method is desirable for high-throughput sequencing library preparation. MspJI is a recently characterised restriction enzyme which recognises the sequence motif CNNR (where R = G or A) when the first base is modified to 5-methylcytosine or 5-hydroxymethylcytosine. A semi-random enzymatic DNA amplicon fragmentation method was developed based on the unique cleavage properties of MspJI. In this method, random incorporation of 5-methyl-2'-deoxycytidine-5'-triphosphate is achieved through DNA amplification with DNA polymerase, followed by DNA digestion with MspJI. Due to the recognition sequence of the enzyme, DNA amplicons are fragmented in a relatively sequence-independent manner. The size range of the resulting fragments was capable of control through optimisation of 5-methyl-2'-deoxycytidine-5'-triphosphate concentration in the reaction mixture. A library suitable for sequencing using the Illumina MiSeq platform was prepared and processed using the proposed method. Alignment of generated short reads to a reference sequence demonstrated a relatively high level of random fragmentation. The proposed method may be performed with standard laboratory equipment. Although the uniformity of coverage was slightly inferior to the Covaris physical shearing procedure, due to efficiencies of cost and labour, the method may be more suitable than existing approaches for implementation in large-scale sequencing activities, such as bacterial artificial chromosome (BAC)-based genome sequence assembly, pan-genomic studies and locus-targeted genotyping-by-sequencing.

  10. Scanning the human genome at kilobase resolution.

    PubMed

    Chen, Jun; Kim, Yeong C; Jung, Yong-Chul; Xuan, Zhenyu; Dworkin, Geoff; Zhang, Yanming; Zhang, Michael Q; Wang, San Ming

    2008-05-01

    Normal genome variation and pathogenic genome alteration frequently affect small regions in the genome. Identifying those genomic changes remains a technical challenge. We report here the development of the DGS (Ditag Genome Scanning) technique for high-resolution analysis of genome structure. The basic features of DGS include (1) use of high-frequent restriction enzymes to fractionate the genome into small fragments; (2) collection of two tags from two ends of a given DNA fragment to form a ditag to represent the fragment; (3) application of the 454 sequencing system to reach a comprehensive ditag sequence collection; (4) determination of the genome origin of ditags by mapping to reference ditags from known genome sequences; (5) use of ditag sequences directly as the sense and antisense PCR primers to amplify the original DNA fragment. To study the relationship between ditags and genome structure, we performed a computational study by using the human genome reference sequences as a model, and analyzed the ditags experimentally collected from the well-characterized normal human DNA GM15510 and the leukemic human DNA of Kasumi-1 cells. Our studies show that DGS provides a kilobase resolution for studying genome structure with high specificity and high genome coverage. DGS can be applied to validate genome assembly, to compare genome similarity and variation in normal populations, and to identify genomic abnormality including insertion, inversion, deletion, translocation, and amplification in pathological genomes such as cancer genomes.

  11. Mapping the binding site of aflatoxin B/sub 1/ in DNA: systematic analysis of the reactivity of aflatoxin B/sub 1/ with guanines in different DNA sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Benasutti, M.; Ejadi, S.; Whitlow, M.D.

    The mutagenic and carcinogenic chemical aflatoxin B/sub 1/ (AFB/sub 1/) reacts almost exclusively at the N(7)-position of guanine following activation to its reactive form, the 8,9-epoxide (AFB/sub 1/ oxide). In general N(7)-guanine adducts yield DNA strand breaks when heated in base, a property that serves as the basis for the Maxam-Gilbert DNA sequencing reaction specific for guanine. Using DNA sequencing methods, other workers have shown that AFB/sub 1/ oxide gives strand breaks at positions of guanines; however, the guanine bands varied in intensity. This phenomenon has been used to infer that AFB/sub 1/ oxide prefers to react with guanines inmore » some sequence contexts more than in others and has been referred to as sequence specificity of binding. Herein, data on the reaction of AFB/sub 1/ oxide with several synthetic DNA polymers with different sequences are presented, and (following hydrolysis) adduct levels are determine by high-pressure liquid chromatography. These results reveal that for AFB/sub 1/ oxide (1) the N(7)-guanine adduct is the major adduct found in all of the DNA polymers, (2) adduct levels vary in different sequences, and, thus, sequence specificity is also observed by this more direct method, and (3) the intensity of bands in DNA sequencing gels is likely to reflect adduct levels formed at the N(7)-position of guanine. Knowing this, a reinvestigation of the reactivity of guanines in different DNA sequences using DNA sequencing methods was undertaken. Methods are developed to determine the X (5'-side) base and the Y (3'-side) base are most influential in determining guanine reactivity. These rules in conjunction with molecular modeling studies were used to assess the binding sites that might be utilized by AFB/sub 1/ oxide in its reaction with DNA.« less

  12. Reference System of DNA and Protein Sequences on CD-ROM

    NASA Astrophysics Data System (ADS)

    Nasu, Hisanori; Ito, Toshiaki

    DNASIS-DBREF31 is a database for DNA and Protein sequences in the form of optical Compact Disk (CD) ROM, developed and commercialized by Hitachi Software Engineering Co., Ltd. Both nucleic acid base sequences and protein amino acid sequences can be retrieved from a single CD-ROM. Existing database is offered in the form of on-line service, floppy disks, or magnetic tape, all of which have some problems or other, such as usability or storage capacity. DNASIS-DBREF31 newly adopt a CD-ROM as a database device to realize a mass storage and personal use of the database.

  13. Median network analysis of defectively sequenced entire mitochondrial genomes from early and contemporary disease studies.

    PubMed

    Bandelt, Hans-Jürgen; Yao, Yong-Gang; Bravi, Claudio M; Salas, Antonio; Kivisild, Toomas

    2009-03-01

    Sequence analysis of the mitochondrial genome has become a routine method in the study of mitochondrial diseases. Quite often, the sequencing efforts in the search of pathogenic or disease-associated mutations are affected by technical and interpretive problems, caused by sample mix-up, contamination, biochemical problems, incomplete sequencing, misdocumentation and insufficient reference to previously published data. To assess data quality in case studies of mitochondrial diseases, it is recommended to compare any mtDNA sequence under consideration to their phylogenetically closest lineages available in the Web. The median network method has proven useful for visualizing potential problems with the data. We contrast some early reports of complete mtDNA sequences to more recent total mtDNA sequencing efforts in studies of various mitochondrial diseases. We conclude that the quality of complete mtDNA sequences generated in the medical field in the past few years is somewhat unsatisfactory and may even fall behind that of pioneer manual sequencing in the early nineties. Our study provides a paradigm for an a posteriori evaluation of sequence quality and for detection of potential problems with inferring a pathogenic status of a particular mutation.

  14. DNA Barcoding for Species Identification of Insect Skins: A Test on Chironomidae (Diptera) Pupal Exuviae

    PubMed Central

    Ekrem, Torbjørn; Stur, Elisabeth

    2017-01-01

    Abstract Chironomidae (Diptera) pupal exuviae samples are commonly used for biological monitoring of aquatic habitats. DNA barcoding has proved useful for species identification of chironomid life stages containing cellular tissue, but the barcoding success of chironomid pupal exuviae is unknown. We assessed whether standard DNA barcoding could be efficiently used for species identification of chironomid pupal exuviae when compared with morphological techniques and if there were differences in performance between temperate and tropical ecosystems, subfamilies, and tribes. PCR, sequence, and identification success differed significantly between geographic regions and taxonomic groups. For Norway, 27 out of 190 (14.2%) of pupal exuviae resulted in high-quality chironomid sequences that match species. For Costa Rica, 69 out of 190 (36.3%) Costa Rican pupal exuviae resulted in high-quality sequences, but none matched known species. Standard DNA barcoding of chironomid pupal exuviae had limited success in species identification of unknown specimens due to contaminations and lack of matching references in available barcode libraries, especially from Costa Rica. Therefore, we recommend future biodiversity studies that focus their efforts on understudied regions, to simultaneously use morphological and molecular identification techniques to identify all life stages of chironomids and populate the barcode reference library with identified sequences.

  15. The value of new genome references.

    PubMed

    Worley, Kim C; Richards, Stephen; Rogers, Jeffrey

    2017-09-15

    Genomic information has become a ubiquitous and almost essential aspect of biological research. Over the last 10-15 years, the cost of generating sequence data from DNA or RNA samples has dramatically declined and our ability to interpret those data increased just as remarkably. Although it is still possible for biologists to conduct interesting and valuable research on species for which genomic data are not available, the impact of having access to a high quality whole genome reference assembly for a given species is nothing short of transformational. Research on a species for which we have no DNA or RNA sequence data is restricted in fundamental ways. In contrast, even access to an initial draft quality genome (see below for definitions) opens a wide range of opportunities that are simply not available without that reference genome assembly. Although a complete discussion of the impact of genome sequencing and assembly is beyond the scope of this short paper, the goal of this review is to summarize the most common and highest impact contributions that whole genome sequencing and assembly has had on comparative and evolutionary biology. Copyright © 2016. Published by Elsevier Inc.

  16. Characterization of Microbial Population Structures in Recreational Waters and Primary Sources of Fecal Pollution with a Next-Generation Sequencing Approach

    EPA Science Inventory

    The invention of new approaches to DNA sequencing commonly referred to as next generation sequencing technologies is revolutionizing the study of microbial diversity. In this chapter, we discuss the characterization of microbial population structures in recreational waters and p...

  17. An expanded mammal mitogenome dataset from Southeast Asia

    PubMed Central

    Ramos-Madrigal, Jazmín; Peñaloza, Fernando; Liu, Shanlin; Mikkel-Holger, S. Sinding; Riddhi, P. Patel; Martins, Renata; Lenz, Dorina; Fickel, Jörns; Roos, Christian; Shamsir, Mohd Shahir; Azman, Mohammad Shahfiz; Burton, K. Lim; Stephen, J. Rossiter; Wilting, Andreas

    2017-01-01

    Abstract Southeast (SE) Asia is 1 of the most biodiverse regions in the world, and it holds approximately 20% of all mammal species. Despite this, the majority of SE Asia's genetic diversity is still poorly characterized. The growing interest in using environmental DNA to assess and monitor SE Asian species, in particular threatened mammals—has created the urgent need to expand the available reference database of mitochondrial barcode and complete mitogenome sequences. We have partially addressed this need by generating 72 new mitogenome sequences reconstructed from DNA isolated from a range of historical and modern tissue samples. Approximately 55 gigabases of raw sequence were generated. From this data, we assembled 72 complete mitogenome sequences, with an average depth of coverage of ×102.9 and ×55.2 for modern samples and historical samples, respectively. This dataset represents 52 species, of which 30 species had no previous mitogenome data available. The mitogenomes were geotagged to their sampling location, where known, to display a detailed geographical distribution of the species. Our new database of 52 taxa will strongly enhance the utility of environmental DNA approaches for monitoring mammals in SE Asia as it greatly increases the likelihoods that identification of metabarcoding sequencing reads can be assigned to reference sequences. This magnifies the confidence in species detections and thus allows more robust surveys and monitoring programmes of SE Asia's threatened mammal biodiversity. The extensive collections of historical samples from SE Asia in western and SE Asian museums should serve as additional valuable material to further enrich this reference database. PMID:28873965

  18. An expanded mammal mitogenome dataset from Southeast Asia.

    PubMed

    Mohd Salleh, Faezah; Ramos-Madrigal, Jazmín; Peñaloza, Fernando; Liu, Shanlin; Mikkel-Holger, S Sinding; Riddhi, P Patel; Martins, Renata; Lenz, Dorina; Fickel, Jörns; Roos, Christian; Shamsir, Mohd Shahir; Azman, Mohammad Shahfiz; Burton, K Lim; Stephen, J Rossiter; Wilting, Andreas; Gilbert, M Thomas P

    2017-08-01

    Southeast (SE) Asia is 1 of the most biodiverse regions in the world, and it holds approximately 20% of all mammal species. Despite this, the majority of SE Asia's genetic diversity is still poorly characterized. The growing interest in using environmental DNA to assess and monitor SE Asian species, in particular threatened mammals-has created the urgent need to expand the available reference database of mitochondrial barcode and complete mitogenome sequences. We have partially addressed this need by generating 72 new mitogenome sequences reconstructed from DNA isolated from a range of historical and modern tissue samples. Approximately 55 gigabases of raw sequence were generated. From this data, we assembled 72 complete mitogenome sequences, with an average depth of coverage of ×102.9 and ×55.2 for modern samples and historical samples, respectively. This dataset represents 52 species, of which 30 species had no previous mitogenome data available. The mitogenomes were geotagged to their sampling location, where known, to display a detailed geographical distribution of the species. Our new database of 52 taxa will strongly enhance the utility of environmental DNA approaches for monitoring mammals in SE Asia as it greatly increases the likelihoods that identification of metabarcoding sequencing reads can be assigned to reference sequences. This magnifies the confidence in species detections and thus allows more robust surveys and monitoring programmes of SE Asia's threatened mammal biodiversity. The extensive collections of historical samples from SE Asia in western and SE Asian museums should serve as additional valuable material to further enrich this reference database. © The Author 2017. Published by Oxford University Press.

  19. Finding needles in haystacks: linking scientific names, reference specimens and molecular data for Fungi.

    PubMed

    Schoch, Conrad L; Robbertse, Barbara; Robert, Vincent; Vu, Duong; Cardinali, Gianluigi; Irinyi, Laszlo; Meyer, Wieland; Nilsson, R Henrik; Hughes, Karen; Miller, Andrew N; Kirk, Paul M; Abarenkov, Kessy; Aime, M Catherine; Ariyawansa, Hiran A; Bidartondo, Martin; Boekhout, Teun; Buyck, Bart; Cai, Qing; Chen, Jie; Crespo, Ana; Crous, Pedro W; Damm, Ulrike; De Beer, Z Wilhelm; Dentinger, Bryn T M; Divakar, Pradeep K; Dueñas, Margarita; Feau, Nicolas; Fliegerova, Katerina; García, Miguel A; Ge, Zai-Wei; Griffith, Gareth W; Groenewald, Johannes Z; Groenewald, Marizeth; Grube, Martin; Gryzenhout, Marieka; Gueidan, Cécile; Guo, Liangdong; Hambleton, Sarah; Hamelin, Richard; Hansen, Karen; Hofstetter, Valérie; Hong, Seung-Beom; Houbraken, Jos; Hyde, Kevin D; Inderbitzin, Patrik; Johnston, Peter R; Karunarathna, Samantha C; Kõljalg, Urmas; Kovács, Gábor M; Kraichak, Ekaphan; Krizsan, Krisztina; Kurtzman, Cletus P; Larsson, Karl-Henrik; Leavitt, Steven; Letcher, Peter M; Liimatainen, Kare; Liu, Jian-Kui; Lodge, D Jean; Luangsa-ard, Janet Jennifer; Lumbsch, H Thorsten; Maharachchikumbura, Sajeewa S N; Manamgoda, Dimuthu; Martín, María P; Minnis, Andrew M; Moncalvo, Jean-Marc; Mulè, Giuseppina; Nakasone, Karen K; Niskanen, Tuula; Olariaga, Ibai; Papp, Tamás; Petkovits, Tamás; Pino-Bodas, Raquel; Powell, Martha J; Raja, Huzefa A; Redecker, Dirk; Sarmiento-Ramirez, J M; Seifert, Keith A; Shrestha, Bhushan; Stenroos, Soili; Stielow, Benjamin; Suh, Sung-Oui; Tanaka, Kazuaki; Tedersoo, Leho; Telleria, M Teresa; Udayanga, Dhanushka; Untereiner, Wendy A; Diéguez Uribeondo, Javier; Subbarao, Krishna V; Vágvölgyi, Csaba; Visagie, Cobus; Voigt, Kerstin; Walker, Donald M; Weir, Bevan S; Weiß, Michael; Wijayawardene, Nalin N; Wingfield, Michael J; Xu, J P; Yang, Zhu L; Zhang, Ning; Zhuang, Wen-Ying; Federhen, Scott

    2014-01-01

    DNA phylogenetic comparisons have shown that morphology-based species recognition often underestimates fungal diversity. Therefore, the need for accurate DNA sequence data, tied to both correct taxonomic names and clearly annotated specimen data, has never been greater. Furthermore, the growing number of molecular ecology and microbiome projects using high-throughput sequencing require fast and effective methods for en masse species assignments. In this article, we focus on selecting and re-annotating a set of marker reference sequences that represent each currently accepted order of Fungi. The particular focus is on sequences from the internal transcribed spacer region in the nuclear ribosomal cistron, derived from type specimens and/or ex-type cultures. Re-annotated and verified sequences were deposited in a curated public database at the National Center for Biotechnology Information (NCBI), namely the RefSeq Targeted Loci (RTL) database, and will be visible during routine sequence similarity searches with NR_prefixed accession numbers. A set of standards and protocols is proposed to improve the data quality of new sequences, and we suggest how type and other reference sequences can be used to improve identification of Fungi. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA177353. Published by Oxford University Press 2013. This work is written by US Government employees and is in the public domain in the US.

  20. Moorea BIOCODE barcode library as a tool for understanding predator-prey interactions: insights into the diet of common predatory coral reef fishes

    NASA Astrophysics Data System (ADS)

    Leray, M.; Boehm, J. T.; Mills, S. C.; Meyer, C. P.

    2012-06-01

    Identifying species involved in consumer-resource interactions is one of the main limitations in the construction of food webs. DNA barcoding of prey items in predator guts provides a valuable tool for characterizing trophic interactions, but the method relies on the availability of reference sequences to which prey sequences can be matched. In this study, we demonstrate that the COI sequence library of the Moorea BIOCODE project, an ecosystem-level barcode initiative, enables the identification of a large proportion of semi-digested fish, crustacean and mollusks found in the guts of three Hawkfish and two Squirrelfish species. While most prey remains lacked diagnostic morphological characters, 94% of the prey found in 67 fishes had >98% sequence similarity with BIOCODE reference sequences. Using this species-level prey identification, we demonstrate how DNA barcoding can provide insights into resource partitioning, predator feeding behaviors and the consequences of predation on ecosystem function.

  1. Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate.

    PubMed

    Buschmann, Tilo; Zhang, Rong; Brash, Douglas E; Bystrykh, Leonid V

    2014-08-07

    DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives.For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads as well as suggest possible improvements. In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof of concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples. Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on the false discovery rate statistics that allows a proper trade-off between sensitivity and precision to be chosen.

  2. Mitochondrial DNA control region sequences from Nairobi (Kenya): inferring phylogenetic parameters for the establishment of a forensic database.

    PubMed

    Brandstätter, Anita; Peterson, Christine T; Irwin, Jodi A; Mpoke, Solomon; Koech, Davy K; Parson, Walther; Parsons, Thomas J

    2004-10-01

    Large forensic mtDNA databases which adhere to strict guidelines for generation and maintenance, are not available for many populations outside of the United States and western Europe. We have established a high quality mtDNA control region sequence database for urban Nairobi as both a reference database for forensic investigations, and as a tool to examine the genetic variation of Kenyan sequences in the context of known African variation. The Nairobi sequences exhibited high variation and a low random match probability, indicating utility for forensic testing. Haplogroup identification and frequencies were compared with those reported from other published studies on African, or African-origin populations from Mozambique, Sierra Leone, and the United States, and suggest significant differences in the mtDNA compositions of the various populations. The quality of the sequence data in our study was investigated and supported using phylogenetic measures. Our data demonstrate the diversity and distinctiveness of African populations, and underline the importance of establishing additional forensic mtDNA databases of indigenous African populations.

  3. Assessing the utility of eDNA as a tool to survey reef-fish communities in the Red Sea

    NASA Astrophysics Data System (ADS)

    DiBattista, Joseph D.; Coker, Darren J.; Sinclair-Taylor, Tane H.; Stat, Michael; Berumen, Michael L.; Bunce, Michael

    2017-12-01

    Relatively small volumes of water may contain sufficient environmental DNA (eDNA) to detect target aquatic organisms via genetic sequencing. We therefore assessed the utility of eDNA to document the diversity of coral reef fishes in the central Red Sea. DNA from seawater samples was extracted, amplified using fish-specific 16S mitochondrial DNA primers, and sequenced using a metabarcoding workflow. DNA sequences were assigned to taxa using available genetic repositories or custom genetic databases generated from reference fishes. Our approach revealed a diversity of conspicuous, cryptobenthic, and commercially relevant reef fish at the genus level, with select genera in the family Labridae over-represented. Our approach, however, failed to capture a significant fraction of the fish fauna known to inhabit the Red Sea, which we attribute to limited spatial sampling, amplification stochasticity, and an apparent lack of sequencing depth. Given an increase in fish species descriptions, completeness of taxonomic checklists, and improvement in species-level assignment with custom genetic databases as shown here, we suggest that the Red Sea region may be ideal for further testing of the eDNA approach.

  4. From cheek swabs to consensus sequences: an A to Z protocol for high-throughput DNA sequencing of complete human mitochondrial genomes

    PubMed Central

    2014-01-01

    Background Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users. Results Here we present an ‘A to Z’ protocol for obtaining complete human mitochondrial (mtDNA) genomes – from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling). Conclusions All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual ‘modules’ can be swapped out to suit available resources. PMID:24460871

  5. Single haplotype assembly of the human genome from a hydatidiform mole.

    PubMed

    Steinberg, Karyn Meltz; Schneider, Valerie A; Graves-Lindsay, Tina A; Fulton, Robert S; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C; Church, Deanna M; Eichler, Evan E; Wilson, Richard K

    2014-12-01

    A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. © 2014 Steinberg et al.; Published by Cold Spring Harbor Laboratory Press.

  6. Single haplotype assembly of the human genome from a hydatidiform mole

    PubMed Central

    Steinberg, Karyn Meltz; Schneider, Valerie A.; Graves-Lindsay, Tina A.; Fulton, Robert S.; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A.; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C.; Church, Deanna M.; Eichler, Evan E.; Wilson, Richard K.

    2014-01-01

    A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. PMID:25373144

  7. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing

    PubMed Central

    Hykin, Sarah M.; Bi, Ke; McGuire, Jimmy A.

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens—particularly for use in phylogenetic analyses—has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for genetic analysis. PMID:26505622

  8. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    PubMed

    Hykin, Sarah M; Bi, Ke; McGuire, Jimmy A

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for genetic analysis.

  9. Identification of Genomic Insertion and Flanking Sequence of G2-EPSPS and GAT Transgenes in Soybean Using Whole Genome Sequencing Method.

    PubMed

    Guo, Bingfu; Guo, Yong; Hong, Huilong; Qiu, Li-Juan

    2016-01-01

    Molecular characterization of sequence flanking exogenous fragment insertion is essential for safety assessment and labeling of genetically modified organism (GMO). In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans GE-J16 and ZH10-6 based on whole genome sequencing (WGS) method. More than 22.4 Gb sequence data (∼21 × coverage) for each line was generated on Illumina HiSeq 2500 platform. The junction reads mapped to boundaries of T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with soybean reference genome and sequence of transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported that exogenous T-DNA fragments were integrated in positions of Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of genomic insertion sites of G2-EPSPS and GAT transgenes will facilitate the utilization of their glyphosate-tolerant traits in soybean breeding program. These results also demonstrated that WGS was a cost-effective and rapid method for identifying sites of T-DNA insertions and flanking sequences in soybean.

  10. The problems and promise of DNA barcodes for species diagnosis of primate biomaterials

    PubMed Central

    Lorenz, Joseph G; Jackson, Whitney E; Beck, Jeanne C; Hanner, Robert

    2005-01-01

    The Integrated Primate Biomaterials and Information Resource (www.IPBIR.org) provides essential research reagents to the scientific community by establishing, verifying, maintaining, and distributing DNA and RNA derived from primate cell cultures. The IPBIR uses mitochondrial cytochrome c oxidase subunit I sequences to verify the identity of samples for quality control purposes in the accession, cell culture, DNA extraction processes and prior to shipping to end users. As a result, IPBIR is accumulating a database of ‘DNA barcodes’ for many species of primates. However, this quality control process is complicated by taxon specific patterns of ‘universal primer’ failure, as well as the amplification or co-amplification of nuclear pseudogenes of mitochondrial origins. To overcome these difficulties, taxon specific primers have been developed, and reverse transcriptase PCR is utilized to exclude these extraneous sequences from amplification. DNA barcoding of primates has applications to conservation and law enforcement. Depositing barcode sequences in a public database, along with primer sequences, trace files and associated quality scores, makes this species identification technique widely accessible. Reference DNA barcode sequences should be derived from, and linked to, specimens of known provenance in web-accessible collections in order to validate this system of molecular diagnostics. PMID:16214744

  11. Identification of Belgian mosquito species (Diptera: Culicidae) by DNA barcoding.

    PubMed

    Versteirt, V; Nagy, Z T; Roelants, P; Denis, L; Breman, F C; Damiens, D; Dekoninck, W; Backeljau, T; Coosemans, M; Van Bortel, W

    2015-03-01

    Since its introduction in 2003, DNA barcoding has proven to be a promising method for the identification of many taxa, including mosquitoes (Diptera: Culicidae). Many mosquito species are potential vectors of pathogens, and correct identification in all life stages is essential for effective mosquito monitoring and control. To use DNA barcoding for species identification, a reliable and comprehensive reference database of verified DNA sequences is required. Hence, DNA sequence diversity of mosquitoes in Belgium was assessed using a 658 bp fragment of the mitochondrial cytochrome oxidase I (COI) gene, and a reference data set was established. Most species appeared as well-supported clusters. Intraspecific Kimura 2-parameter (K2P) distances averaged 0.7%, and the maximum observed K2P distance was 6.2% for Aedes koreicus. A small overlap between intra- and interspecific K2P distances for congeneric sequences was observed. Overall, the identification success using best match and the best close match criteria were high, that is above 98%. No clear genetic division was found between the closely related species Aedes annulipes and Aedes cantans, which can be confused using morphological identification only. The members of the Anopheles maculipennis complex, that is Anopheles maculipennis s.s. and An. messeae, were weakly supported as monophyletic taxa. This study showed that DNA barcoding offers a reliable framework for mosquito species identification in Belgium except for some closely related species. © 2014 John Wiley & Sons Ltd.

  12. Genome sequence determination and metagenomic characterization of a Dehalococcoides mixed culture grown on cis-1,2-dichloroethene.

    PubMed

    Yohda, Masafumi; Yagi, Osami; Takechi, Ayane; Kitajima, Mizuki; Matsuda, Hisashi; Miyamura, Naoaki; Aizawa, Tomoko; Nakajima, Mutsuyasu; Sunairi, Michio; Daiba, Akito; Miyajima, Takashi; Teruya, Morimi; Teruya, Kuniko; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Juan, Ayaka; Nakano, Kazuma; Aoyama, Misako; Terabayashi, Yasunobu; Satou, Kazuhito; Hirano, Takashi

    2015-07-01

    A Dehalococcoides-containing bacterial consortium that performed dechlorination of 0.20 mM cis-1,2-dichloroethene to ethene in 14 days was obtained from the sediment mud of the lotus field. To obtain detailed information of the consortium, the metagenome was analyzed using the short-read next-generation sequencer SOLiD 3. Matching the obtained sequence tags with the reference genome sequences indicated that the Dehalococcoides sp. in the consortium was highly homologous to Dehalococcoides mccartyi CBDB1 and BAV1. Sequence comparison with the reference sequence constructed from 16S rRNA gene sequences in a public database showed the presence of Sedimentibacter, Sulfurospirillum, Clostridium, Desulfovibrio, Parabacteroides, Alistipes, Eubacterium, Peptostreptococcus and Proteocatella in addition to Dehalococcoides sp. After further enrichment, the members of the consortium were narrowed down to almost three species. Finally, the full-length circular genome sequence of the Dehalococcoides sp. in the consortium, D. mccartyi IBARAKI, was determined by analyzing the metagenome with the single-molecule DNA sequencer PacBio RS. The accuracy of the sequence was confirmed by matching it to the tag sequences obtained by SOLiD 3. The genome is 1,451,062 nt and the number of CDS is 1566, which includes 3 rRNA genes and 47 tRNA genes. There exist twenty-eight RDase genes that are accompanied by the genes for anchor proteins. The genome exhibits significant sequence identity with other Dehalococcoides spp. throughout the genome, but there exists significant difference in the distribution RDase genes. The combination of a short-read next-generation DNA sequencer and a long-read single-molecule DNA sequencer gives detailed information of a bacterial consortium. Copyright © 2014 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

  13. 16S-23S rDNA intergenic spacer region polymorphism of Lactococcus garvieae, Lactococcus raffinolactis and Lactococcus lactis as revealed by PCR and nucleotide sequence analysis.

    PubMed

    Blaiotta, Giuseppe; Pepe, Olimpia; Mauriello, Gianluigi; Villani, Francesco; Andolfi, Rosamaria; Moschetti, Giancarlo

    2002-12-01

    The intergenic spacer region (ISR) between the 16S and 23S rRNA genes was tested as a tool for differentiating lactococci commonly isolated in a dairy environment. 17 reference strains, representing 11 different species belonging to the genera Lactococcus, Streptococcus, Lactobacillus, Enterococcus and Leuconostoc, and 127 wild streptococcal strains isolated during the whole fermentation process of "Fior di Latte" cheese were analyzed. After 16S-23S rDNA ISR amplification by PCR, species or genus-specific patterns were obtained for most of the reference strains tested. Moreover, results obtained after nucleotide analysis show that the 16S-23S rDNA ISR sequences vary greatly, in size and sequence, among Lactococcus garvieae, Lactococcus raffinolactis, Lactococcus lactis as well as other streptococci from dairy environments. Because of the high degree of inter-specific polymorphism observed, 16S-23S rDNA ISR can be considered a good potential target for selecting species-specific molecular assays, such as PCR primer or probes, for a rapid and extremely reliable differentiation of dairy lactococcal isolates.

  14. Quantum sequencing: opportunities and challenges

    NASA Astrophysics Data System (ADS)

    di Ventra, Massimiliano

    Personalized or precision medicine refers to the ability of tailoring drugs to the specific genome and transcriptome of each individual. It is however not yet feasible due the high costs and slow speed of present DNA sequencing methods. I will discuss a sequencing protocol that requires the measurement of the distributions of transverse tunneling currents during the translocation of single-stranded DNA into nanochannels. I will show that such a quantum sequencing approach can reach unprecedented speeds, without requiring any chemical preparation, amplification or labeling. I will discuss recent experiments that support these theoretical predictions, the advantages of this approach over other sequencing methods, and stress the challenges that need to be overcome to render it commercially viable.

  15. The campaign to DNA barcode all fishes, FISH-BOL.

    PubMed

    Ward, R D; Hanner, R; Hebert, P D N

    2009-02-01

    FISH-BOL, the Fish Barcode of Life campaign, is an international research collaboration that is assembling a standardized reference DNA sequence library for all fishes. Analysis is targeting a 648 base pair region of the mitochondrial cytochrome c oxidase I (COI) gene. More than 5000 species have already been DNA barcoded, with an average of five specimens per species, typically vouchers with authoritative identifications. The barcode sequence from any fish, fillet, fin, egg or larva can be matched against these reference sequences using BOLD; the Barcode of Life Data System (http://www.barcodinglife.org). The benefits of barcoding fishes include facilitating species identification, highlighting cases of range expansion for known species, flagging previously overlooked species and enabling identifications where traditional methods cannot be applied. Results thus far indicate that barcodes separate c. 98 and 93% of already described marine and freshwater fish species, respectively. Several specimens with divergent barcode sequences have been confirmed by integrative taxonomic analysis as new species. Past concerns in relation to the use of fish barcoding for species discrimination are discussed. These include hybridization, recent radiations, regional differentiation in barcode sequences and nuclear copies of the barcode region. However, current results indicate these issues are of little concern for the great majority of specimens.

  16. HmtDB 2016: data update, a better performing query system and human mitochondrial DNA haplogroup predictor.

    PubMed

    Clima, Rosanna; Preste, Roberto; Calabrese, Claudia; Diroma, Maria Angela; Santorsola, Mariangela; Scioscia, Gaetano; Simone, Domenico; Shen, Lishuang; Gasparre, Giuseppe; Attimonelli, Marcella

    2017-01-04

    The HmtDB resource hosts a database of human mitochondrial genome sequences from individuals with healthy and disease phenotypes. The database is intended to support both population geneticists as well as clinicians undertaking the task to assess the pathogenicity of specific mtDNA mutations. The wide application of next-generation sequencing (NGS) has provided an enormous volume of high-resolution data at a low price, increasing the availability of human mitochondrial sequencing data, which called for a cogent and significant expansion of HmtDB data content that has more than tripled in the current release. We here describe additional novel features, including: (i) a complete, user-friendly restyling of the web interface, (ii) links to the command-line stand-alone and web versions of the MToolBox package, an up-to-date tool to reconstruct and analyze human mitochondrial DNA from NGS data and (iii) the implementation of the Reconstructed Sapiens Reference Sequence (RSRS) as mitochondrial reference sequence. The overall update renders HmtDB an even more handy and useful resource as it enables a more rapid data access, processing and analysis. HmtDB is accessible at http://www.hmtdb.uniba.it/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. The genetic diversity of Epstein-Barr virus in the setting of transplantation relative to non-transplant settings: A feasibility study.

    PubMed

    Allen, Upton D; Hu, Pingzhao; Pereira, Sergio L; Robinson, Joan L; Paton, Tara A; Beyene, Joseph; Khodai-Booran, Nasser; Dipchand, Anne; Hébert, Diane; Ng, Vicky; Nalpathamkalam, Thomas; Read, Stanley

    2016-02-01

    This study examines EBV strains from transplant patients and patients with IM by sequencing major EBV genes. We also used NGS to detect EBV DNA within total genomic DNA, and to evaluate its genetic variation. Sanger sequencing of major EBV genes was used to compare SNVs from samples taken from transplant patients vs. patients with IM. We sequenced EBV DNA from a healthy EBV-seropositive individual on a HiSeq 2000 instrument. Data were mapped to the EBV reference genomes (AG876 and B95-8). The number of EBNA2 SNVs was higher than for EBNA1 and the other genes sequenced within comparable reference coordinates. For EBNA2, there was a median of 15 SNV among transplant samples compared with 10 among IM samples (p = 0.036). EBNA1 showed little variation between samples. For NGS, we identified 640 and 892 variants at an unadjusted p value of 5 × 10(-8) for AG876 and B95-8 genomes, respectively. We used complementary sequence strategies to examine EBV genetic diversity and its application to transplantation. The results provide the framework for further characterization of EBV strains and related outcomes after organ transplantation. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  18. Characterization of NIST human mitochondrial DNA SRM-2392 and SRM-2392-I standard reference materials by next generation sequencing.

    PubMed

    Riman, Sarah; Kiesler, Kevin M; Borsuk, Lisa A; Vallone, Peter M

    2017-07-01

    Standard Reference Materials SRM 2392 and 2392-I are intended to provide quality control when amplifying and sequencing human mitochondrial genome sequences. The National Institute of Standards and Technology (NIST) offers these SRMs to laboratories performing DNA-based forensic human identification, molecular diagnosis of mitochondrial diseases, mutation detection, evolutionary anthropology, and genetic genealogy. The entire mtGenome (∼16569bp) of SRM 2392 and 2392-I have previously been characterized at NIST by Sanger sequencing. Herein, we used the sensitivity, specificity, and accuracy offered by next generation sequencing (NGS) to: (1) re-sequence the certified values of the SRM 2392 and 2392-I; (2) confirm Sanger data with a high coverage new sequencing technology; (3) detect lower level heteroplasmies (<20%); and thus (4) support mitochondrial sequencing communities in the adoption of NGS methods. To obtain a consensus sequence for the SRMs as well as identify and control any bias, sequencing was performed using two NGS platforms and data was analyzed using different bioinformatics pipelines. Our results confirm five low level heteroplasmy sites that were not previously observed with Sanger sequencing: three sites in the GM09947A template in SRM 2392 and two sites in the HL-60 template in SRM 2392-I. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. ABI Base Recall: Automatic Correction and Ends Trimming of DNA Sequences.

    PubMed

    Elyazghi, Zakaria; Yazouli, Loubna El; Sadki, Khalid; Radouani, Fouzia

    2017-12-01

    Automated DNA sequencers produce chromatogram files in ABI format. When viewing chromatograms, some ambiguities are shown at various sites along the DNA sequences, because the program implemented in the sequencing machine and used to call bases cannot always precisely determine the right nucleotide, especially when it is represented by either a broad peak or a set of overlaying peaks. In such cases, a letter other than A, C, G, or T is recorded, most commonly N. Thus, DNA sequencing chromatograms need manual examination: checking for mis-calls and truncating the sequence when errors become too frequent. The purpose of this paper is to develop a program allowing the automatic correction of these ambiguities. This application is a Web-based program powered by Shiny and runs under R platform for an easy exploitation. As a part of the interface, we added the automatic ends clipping option, alignment against reference sequences, and BLAST. To develop and test our tool, we collected several bacterial DNA sequences from different laboratories within Institut Pasteur du Maroc and performed both manual and automatic correction. The comparison between the two methods was carried out. As a result, we note that our program, ABI base recall, accomplishes good correction with a high accuracy. Indeed, it increases the rate of identity and coverage and minimizes the number of mismatches and gaps, hence it provides solution to sequencing ambiguities and saves biologists' time and labor.

  20. Transposon-like properties of the major, long repetitive sequence family in the genome of Physarum polycephalum

    PubMed Central

    Pearston, Douglas H.; Gordon, Mairi; Hardman, Norman

    1985-01-01

    A family of long, highly-repetitive sequences, referred to previously as `HpaII-repeats', dominates the genome of the eukaryotic slime mould Physarum polycephalum. These sequences are found exclusively in scrambled clusters. They account for about one-half of the total complement of repetitive DNA in Physarum, and represent the major sequence component found in hypermethylated, 20-50 kb segments of Physarum genomic DNA that fail to be cleaved using the restriction endonuclease HpaII. The structure of this abundant repetitive element was investigated by analysing cloned segments derived from the hypermethylated genomic DNA compartment. We show that the `HpaII-repeat' forms part of a larger repetitive DNA structure, ∼8.6 kb in length, with several structural features in common with recognised eukaryotic transposable genetic elements. Scrambled clusters of the sequence probably arise as a result of transposition-like events, during which the element preferentially recombines in either orientation with target sites located in other copies of the same repeated sequence. The target sites for transposition/recombination are not related in sequence but in all cases studied they are potentially capable of promoting the formation of small `cruciforms' or `Z-DNA' structures which might be recognised during the recombination process. ImagesFig. 3.Fig. 4. PMID:16453652

  1. A high-throughput Sanger strategy for human mitochondrial genome sequencing

    PubMed Central

    2013-01-01

    Background A population reference database of complete human mitochondrial genome (mtGenome) sequences is needed to enable the use of mitochondrial DNA (mtDNA) coding region data in forensic casework applications. However, the development of entire mtGenome haplotypes to forensic data quality standards is difficult and laborious. A Sanger-based amplification and sequencing strategy that is designed for automated processing, yet routinely produces high quality sequences, is needed to facilitate high-volume production of these mtGenome data sets. Results We developed a robust 8-amplicon Sanger sequencing strategy that regularly produces complete, forensic-quality mtGenome haplotypes in the first pass of data generation. The protocol works equally well on samples representing diverse mtDNA haplogroups and DNA input quantities ranging from 50 pg to 1 ng, and can be applied to specimens of varying DNA quality. The complete workflow was specifically designed for implementation on robotic instrumentation, which increases throughput and reduces both the opportunities for error inherent to manual processing and the cost of generating full mtGenome sequences. Conclusions The described strategy will assist efforts to generate complete mtGenome haplotypes which meet the highest data quality expectations for forensic genetic and other applications. Additionally, high-quality data produced using this protocol can be used to assess mtDNA data developed using newer technologies and chemistries. Further, the amplification strategy can be used to enrich for mtDNA as a first step in sample preparation for targeted next-generation sequencing. PMID:24341507

  2. msgbsR: An R package for analysing methylation-sensitive restriction enzyme sequencing data.

    PubMed

    Mayne, Benjamin T; Leemaqz, Shalem Y; Buckberry, Sam; Rodriguez Lopez, Carlos M; Roberts, Claire T; Bianco-Miotto, Tina; Breen, James

    2018-02-01

    Genotyping-by-sequencing (GBS) or restriction-site associated DNA marker sequencing (RAD-seq) is a practical and cost-effective method for analysing large genomes from high diversity species. This method of sequencing, coupled with methylation-sensitive enzymes (often referred to as methylation-sensitive restriction enzyme sequencing or MRE-seq), is an effective tool to study DNA methylation in parts of the genome that are inaccessible in other sequencing techniques or are not annotated in microarray technologies. Current software tools do not fulfil all methylation-sensitive restriction sequencing assays for determining differences in DNA methylation between samples. To fill this computational need, we present msgbsR, an R package that contains tools for the analysis of methylation-sensitive restriction enzyme sequencing experiments. msgbsR can be used to identify and quantify read counts at methylated sites directly from alignment files (BAM files) and enables verification of restriction enzyme cut sites with the correct recognition sequence of the individual enzyme. In addition, msgbsR assesses DNA methylation based on read coverage, similar to RNA sequencing experiments, rather than methylation proportion and is a useful tool in analysing differential methylation on large populations. The package is fully documented and available freely online as a Bioconductor package ( https://bioconductor.org/packages/release/bioc/html/msgbsR.html ).

  3. Effect of endogenous reference genes on digital PCR assessment of genetically engineered canola events.

    PubMed

    Demeke, Tigst; Eng, Monika

    2018-05-01

    Droplet digital PCR (ddPCR) has been used for absolute quantification of genetically engineered (GE) events. Absolute quantification of GE events by duplex ddPCR requires the use of appropriate primers and probes for target and reference gene sequences in order to accurately determine the amount of GE materials. Single copy reference genes are generally preferred for absolute quantification of GE events by ddPCR. Study has not been conducted on a comparison of reference genes for absolute quantification of GE canola events by ddPCR. The suitability of four endogenous reference sequences ( HMG-I/Y , FatA(A), CruA and Ccf) for absolute quantification of GE canola events by ddPCR was investigated. The effect of DNA extraction methods and DNA quality on the assessment of reference gene copy numbers was also investigated. ddPCR results were affected by the use of single vs. two copy reference genes. The single copy, FatA(A), reference gene was found to be stable and suitable for absolute quantification of GE canola events by ddPCR. For the copy numbers measured, the HMG-I/Y reference gene was less consistent than FatA(A) reference gene. The expected ddPCR values were underestimated when CruA and Ccf (two copy endogenous Cruciferin sequences) were used because of high number of copies. It is important to make an adjustment if two copy reference genes are used for ddPCR in order to obtain accurate results. On the other hand, real-time quantitative PCR results were not affected by the use of single vs. two copy reference genes.

  4. Triazole-linked DNA as a primer surrogate in the synthesis of first-strand cDNA.

    PubMed

    Fujino, Tomoko; Yasumoto, Ken-ichi; Yamazaki, Naomi; Hasome, Ai; Sogawa, Kazuhiro; Isobe, Hiroyuki

    2011-11-04

    A phosphate-eliminated nonnatural oligonucleotide serves as a primer surrogate in reverse transcription reaction of mRNA. Despite of the nonnatural triazole linkages in the surrogate, the reverse transcriptase effectively elongated cDNA sequences on the 3'-downstream of the primer by transcription of the complementary sequence of mRNA. A structure-activity comparison with the reference natural oligonucleotides shows the superior priming activity of the surrogate containing triazole-linkages. The nonnatural linkages also protect the transcribed cDNA from digestion reactions with 5'-exonuclease and enable us to remove noise transcripts of unknown origins. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Molecular characterization of the probiotic strain Bacillus cereus var. toyoi NCIMB 40112 and differentiation from food poisoning strains.

    PubMed

    Klein, Günter

    2011-07-01

    Bacillus cereus var. toyoi strain NCIMB 40112 (Toyocerin), a probiotic authorized in the European Union as feed additive for swine, bovines, poultry, and rabbits, was characterized by DNA fingerprinting applying pulsed-field gel electrophoresis and multilocus sequence typing and was compared with reference strains (of clinical and environmental origins). The probiotic strain was clearly characterized by pulsed-field gel electrophoresis using the restriction enzymes Apa I and Sma I resulting in unique DNA patterns. The comparison to the clinical reference strain B. cereus DSM 4312 was done with the same restriction enzymes, and again a clear differentiation of the two strains was possible by the resulting DNA patterns. The use of the restriction enzymes Apa I and Sma I is recommended for further studies. Furthermore, multilocus sequence typing analysis revealed a sequence type (ST 111) that was different from all known STs of B. cereus strains from food poisoning incidents. Thus, a strain characterization and differentiation from food poisoning strains for the probiotic strain was possible. Copyright ©, International Association for Food Protection

  6. Prunus transcription factors: breeding perspectives

    PubMed Central

    Bianchi, Valmor J.; Rubio, Manuel; Trainotti, Livio; Verde, Ignazio; Bonghi, Claudio; Martínez-Gómez, Pedro

    2015-01-01

    Many plant processes depend on differential gene expression, which is generally controlled by complex proteins called transcription factors (TFs). In peach, 1533 TFs have been identified, accounting for about 5.5% of the 27,852 protein-coding genes. These TFs are the reference for the rest of the Prunus species. TF studies in Prunus have been performed on the gene expression analysis of different agronomic traits, including control of the flowering process, fruit quality, and biotic and abiotic stress resistance. These studies, using quantitative RT-PCR, have mainly been performed in peach, and to a lesser extent in other species, including almond, apricot, black cherry, Fuji cherry, Japanese apricot, plum, and sour and sweet cherry. Other tools have also been used in TF studies, including cDNA-AFLP, LC-ESI-MS, RNA, and DNA blotting or mapping. More recently, new tools assayed include microarray and high-throughput DNA sequencing (DNA-Seq) and RNA sequencing (RNA-Seq). New functional genomics opportunities include genome resequencing and the well-known synteny among Prunus genomes and transcriptomes. These new functional studies should be applied in breeding programs in the development of molecular markers. With the genome sequences available, some strategies that have been used in model systems (such as SNP genotyping assays and genotyping-by-sequencing) may be applicable in the functional analysis of Prunus TFs as well. In addition, the knowledge of the gene functions and position in the peach reference genome of the TFs represents an additional advantage. These facts could greatly facilitate the isolation of genes via QTL (quantitative trait loci) map-based cloning in the different Prunus species, following the association of these TFs with the identified QTLs using the peach reference genome. PMID:26124770

  7. Description of polymerase chain reaction and sequencing DNA Mycobacterium tuberculosis from specimen sputum of tuberculosis patients in Medan

    NASA Astrophysics Data System (ADS)

    Lily; Siregar, Y.; Ilyas, S.

    2018-03-01

    This study purposed to describe the product Polymerase Chain Reaction (PCR) and sequencing of DNA Mycobacterium (M.) tuberculosis from sputum of tuberculosis (TB) patients in Medan. Sputum was collected from patients that diagnosed with pulmonary TB by a physician. Specimen processed by PCR method of Li et al. and sequencing at Macrogen Laboratory. All of 12 product PCR were showed brightness bands at 126 base pair (bp). These results indicated similarity to the study of Li et al. Sequencing analysis showed the presence of a mutation and non-mutation groups of M. tuberculosis. The reference and outcome berange of the mutation and non-mutation of M. tuberculosis were 56-107, 59-85, 60-120 and 63-94, respectively. The percentage bp difference between the outcome and references for mutation and non-mutation were 3.448-6.569and 3.278-7.428%, respectively. In conclusion, the successful amplification of PCR products in a 1.5% agarose gel electrophoresis where all 12 sputa contained rpoB-positive M. tuberculosis and 0.644% difference was found between the outcome with reference bp of the mutation and non-mutation M. tuberculosis groups.

  8. FRESCO: Referential compression of highly similar sequences.

    PubMed

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitudes faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition,we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios way beyond state of the art, for instance,4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.

  9. DNA sequence similarity recognition by hybridization to short oligomers

    DOEpatents

    Milosavljevic, Aleksandar

    1999-01-01

    Methods are disclosed for the comparison of nucleic acid sequences. Data is generated by hybridizing sets of oligomers with target nucleic acids. The data thus generated is manipulated simultaneously with respect to both (i) matching between oligomers and (ii) matching between oligomers and putative reference sequences available in databases. Using data compression methods to manipulate this mutual information, sequences for the target can be constructed.

  10. Variation of 45S rDNA intergenic spacers in Arabidopsis thaliana.

    PubMed

    Havlová, Kateřina; Dvořáčková, Martina; Peiro, Ramon; Abia, David; Mozgová, Iva; Vansáčová, Lenka; Gutierrez, Crisanto; Fajkus, Jiří

    2016-11-01

    Approximately seven hundred 45S rRNA genes (rDNA) in the Arabidopsis thaliana genome are organised in two 4 Mbp-long arrays of tandem repeats arranged in head-to-tail fashion separated by an intergenic spacer (IGS). These arrays make up 5 % of the A. thaliana genome. IGS are rapidly evolving sequences and frequent rearrangements inside the rDNA loci have generated considerable interspecific and even intra-individual variability which allows to distinguish among otherwise highly conserved rRNA genes. The IGS has not been comprehensively described despite its potential importance in regulation of rDNA transcription and replication. Here we describe the detailed sequence variation in the complete IGS of A. thaliana WT plants and provide the reference/consensus IGS sequence, as well as genomic DNA analysis. We further investigate mutants dysfunctional in chromatin assembly factor-1 (CAF-1) (fas1 and fas2 mutants), which are known to have a reduced number of rDNA copies, and plant lines with restored CAF-1 function (segregated from a fas1xfas2 genetic background) showing major rDNA rearrangements. The systematic rDNA loss in CAF-1 mutants leads to the decreased variability of the IGS and to the occurrence of distinct IGS variants. We present for the first time a comprehensive and representative set of complete IGS sequences, obtained by conventional cloning and by Pacific Biosciences sequencing. Our data expands the knowledge of the A. thaliana IGS sequence arrangement and variability, which has not been available in full and in detail until now. This is also the first study combining IGS sequencing data with RFLP analysis of genomic DNA.

  11. Diff-seq: A high throughput sequencing-based mismatch detection assay for DNA variant enrichment and discovery

    PubMed Central

    Karas, Vlad O; Sinnott-Armstrong, Nicholas A; Varghese, Vici; Shafer, Robert W; Greenleaf, William J; Sherlock, Gavin

    2018-01-01

    Abstract Much of the within species genetic variation is in the form of single nucleotide polymorphisms (SNPs), typically detected by whole genome sequencing (WGS) or microarray-based technologies. However, WGS produces mostly uninformative reads that perfectly match the reference, while microarrays require genome-specific reagents. We have developed Diff-seq, a sequencing-based mismatch detection assay for SNP discovery without the requirement for specialized nucleic-acid reagents. Diff-seq leverages the Surveyor endonuclease to cleave mismatched DNA molecules that are generated after cross-annealing of a complex pool of DNA fragments. Sequencing libraries enriched for Surveyor-cleaved molecules result in increased coverage at the variant sites. Diff-seq detected all mismatches present in an initial test substrate, with specific enrichment dependent on the identity and context of the variation. Application to viral sequences resulted in increased observation of variant alleles in a biologically relevant context. Diff-Seq has the potential to increase the sensitivity and efficiency of high-throughput sequencing in the detection of variation. PMID:29361139

  12. Improved multiple displacement amplification (iMDA) and ultraclean reagents.

    PubMed

    Motley, S Timothy; Picuri, John M; Crowder, Chris D; Minich, Jeremiah J; Hofstadler, Steven A; Eshoo, Mark W

    2014-06-06

    Next-generation sequencing sample preparation requires nanogram to microgram quantities of DNA; however, many relevant samples are comprised of only a few cells. Genomic analysis of these samples requires a whole genome amplification method that is unbiased and free of exogenous DNA contamination. To address these challenges we have developed protocols for the production of DNA-free consumables including reagents and have improved upon multiple displacement amplification (iMDA). A specialized ethylene oxide treatment was developed that renders free DNA and DNA present within Gram positive bacterial cells undetectable by qPCR. To reduce DNA contamination in amplification reagents, a combination of ion exchange chromatography, filtration, and lot testing protocols were developed. Our multiple displacement amplification protocol employs a second strand-displacing DNA polymerase, improved buffers, improved reaction conditions and DNA free reagents. The iMDA protocol, when used in combination with DNA-free laboratory consumables and reagents, significantly improved efficiency and accuracy of amplification and sequencing of specimens with moderate to low levels of DNA. The sensitivity and specificity of sequencing of amplified DNA prepared using iMDA was compared to that of DNA obtained with two commercial whole genome amplification kits using 10 fg (~1-2 bacterial cells worth) of bacterial genomic DNA as a template. Analysis showed >99% of the iMDA reads mapped to the template organism whereas only 0.02% of the reads from the commercial kits mapped to the template. To assess the ability of iMDA to achieve balanced genomic coverage, a non-stochastic amount of bacterial genomic DNA (1 pg) was amplified and sequenced, and data obtained were compared to sequencing data obtained directly from genomic DNA. The iMDA DNA and genomic DNA sequencing had comparable coverage 99.98% of the reference genome at ≥1X coverage and 99.9% at ≥5X coverage while maintaining both balance and representation of the genome. The iMDA protocol in combination with DNA-free laboratory consumables, significantly improved the ability to sequence specimens with low levels of DNA. iMDA has broad utility in metagenomics, diagnostics, ancient DNA analysis, pre-implantation embryo screening, single-cell genomics, whole genome sequencing of unculturable organisms, and forensic applications for both human and microbial targets.

  13. Clinical Application of Picodroplet Digital PCR Technology for Rapid Detection of EGFR T790M in Next-Generation Sequencing Libraries and DNA from Limited Tumor Samples.

    PubMed

    Borsu, Laetitia; Intrieri, Julie; Thampi, Linta; Yu, Helena; Riely, Gregory; Nafa, Khedoudja; Chandramohan, Raghu; Ladanyi, Marc; Arcila, Maria E

    2016-11-01

    Although next-generation sequencing (NGS) is a robust technology for comprehensive assessment of EGFR-mutant lung adenocarcinomas with acquired resistance to tyrosine kinase inhibitors, it may not provide sufficiently rapid and sensitive detection of the EGFR T790M mutation, the most clinically relevant resistance biomarker. Here, we describe a digital PCR (dPCR) assay for rapid T790M detection on aliquots of NGS libraries prepared for comprehensive profiling, fully maximizing broad genomic analysis on limited samples. Tumor DNAs from patients with EGFR-mutant lung adenocarcinomas and acquired resistance to epidermal growth factor receptor inhibitors were prepared for Memorial Sloan-Kettering-Integrated Mutation Profiling of Actionable Cancer Targets sequencing, a hybrid capture-based assay interrogating 410 cancer-related genes. Precapture library aliquots were used for rapid EGFR T790M testing by dPCR, and results were compared with NGS and locked nucleic acid-PCR Sanger sequencing (reference high sensitivity method). Seventy resistance samples showed 99% concordance with the reference high sensitivity method in accuracy studies. Input as low as 2.5 ng provided a sensitivity of 1% and improved further with increasing DNA input. dPCR on libraries required less DNA and showed better performance than direct genomic DNA. dPCR on NGS libraries is a robust and rapid approach to EGFR T790M testing, allowing most economical utilization of limited material for comprehensive assessment. The same assay can also be performed directly on any limited DNA source and cell-free DNA. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  14. The Status, Quality, and Expansion of the NIH Full-Length cDNA Project: The Mammalian Gene Collection (MGC)

    PubMed Central

    2004-01-01

    The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5′-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline. PMID:15489334

  15. A genomic landscape of mitochondrial DNA insertions in the pig nuclear genome provides evolutionary signatures of interspecies admixture.

    PubMed

    Schiavo, Giuseppina; Hoffmann, Orsolya Ivett; Ribani, Anisa; Utzeri, Valerio Joe; Ghionda, Marco Ciro; Bertolini, Francesca; Geraci, Claudia; Bovo, Samuele; Fontanesi, Luca

    2017-10-01

    Nuclear DNA sequences of mitochondrial origin (numts) are derived by insertion of mitochondrial DNA (mtDNA), into the nuclear genome. In this study, we provide, for the first time, a genome picture of numts inserted in the pig nuclear genome. The Sus scrofa reference nuclear genome (Sscrofa10.2) was aligned with circularized and consensus mtDNA sequences using LAST software. A total of 430 numt sequences that may represent 246 different numt integration events (57 numt regions determined by at least two numt sequences and 189 singletons) were identified, covering about 0.0078% of the nuclear genome. Numt integration events were correlated (0.99) to the chromosome length. The longest numt sequence (about 11 kbp) was located on SSC2. Six numts were sequenced and PCR amplified in pigs of European commercial and local pig breeds, of the Chinese Meishan breed and in European wild boars. Three of them were polymorphic for the presence or absence of the insertion. Surprisingly, the estimated age of insertion of two of the three polymorphic numts was more ancient than that of the speciation time of the Sus scrofa, supporting that these polymorphic sites were originated from interspecies admixture that contributed to shape the pig genome. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  16. Comparative performance of the BGISEQ-500 vs Illumina HiSeq2500 sequencing platforms for palaeogenomic sequencing

    PubMed Central

    Mak, Sarah Siu Tze; Gopalakrishnan, Shyam; Carøe, Christian; Geng, Chunyu; Liu, Shanlin; Sinding, Mikkel-Holger S; Kuderna, Lukas F K; Zhang, Wenwei; Fu, Shujin; Vieira, Filipe G; Germonpré, Mietje; Bocherens, Hervé; Fedorov, Sergey; Petersen, Bent; Sicheritz-Pontén, Thomas; Marques-Bonet, Tomas; Zhang, Guojie; Jiang, Hui; Gilbert, M Thomas P

    2017-01-01

    Abstract Ancient DNA research has been revolutionized following development of next-generation sequencing platforms. Although a number of such platforms have been applied to ancient DNA samples, the Illumina series are the dominant choice today, mainly because of high production capacities and short read production. Recently a potentially attractive alternative platform for palaeogenomic data generation has been developed, the BGISEQ-500, whose sequence output are comparable with the Illumina series. In this study, we modified the standard BGISEQ-500 library preparation specifically for use on degraded DNA, then directly compared the sequencing performance and data quality of the BGISEQ-500 to the Illumina HiSeq2500 platform on DNA extracted from 8 historic and ancient dog and wolf samples. The data generated were largely comparable between sequencing platforms, with no statistically significant difference observed for parameters including level (P = 0.371) and average sequence length (P = 0718) of endogenous nuclear DNA, sequence GC content (P = 0.311), double-stranded DNA damage rate (v. 0.309), and sequence clonality (P = 0.093). Small significant differences were found in single-strand DNA damage rate (δS; slightly lower for the BGISEQ-500, P = 0.011) and the background rate of difference from the reference genome (θ; slightly higher for BGISEQ-500, P = 0.012). This may result from the differences in amplification cycles used to polymerase chain reaction–amplify the libraries. A significant difference was also observed in the mitochondrial DNA percentages recovered (P = 0.018), although we believe this is likely a stochastic effect relating to the extremely low levels of mitochondria that were sequenced from 3 of the samples with overall very low levels of endogenous DNA. Although we acknowledge that our analyses were limited to animal material, our observations suggest that the BGISEQ-500 holds the potential to represent a valid and potentially valuable alternative platform for palaeogenomic data generation that is worthy of future exploration by those interested in the sequencing and analysis of degraded DNA. PMID:28854615

  17. Mansonella ozzardi mitogenome and pseudogene characterisation provides new perspectives on filarial parasite systematics and CO-1 barcoding.

    PubMed

    Crainey, James Lee; Marín, Michel Abanto; Silva, Túllio Romão Ribeiro da; de Medeiros, Jansen Fernandes; Pessoa, Felipe Arley Costa; Santos, Yago Vinícius; Vicente, Ana Carolina Paulo; Luz, Sérgio Luiz Bessa

    2018-04-18

    Despite the broad distribution of M. ozzardi in Latin America and the Caribbean, there is still very little DNA sequence data available to study this neglected parasite's epidemiology. Mitochondrial DNA (mtDNA) sequences, especially the cytochrome oxidase (CO1) gene's barcoding region, have been targeted successfully for filarial diagnostics and for epidemiological, ecological and evolutionary studies. MtDNA-based studies can, however, be compromised by unrecognised mitochondrial pseudogenes, such as Numts. Here, we have used shot-gun Illumina-HiSeq sequencing to recover the first complete Mansonella genus mitogenome and to identify several mitochondrial-origin pseudogenes. Mitogenome phylogenetic analysis placed M. ozzardi in the Onchocercidae "ONC5" clade and suggested that Mansonella parasites are more closely related to Wuchereria and Brugia genera parasites than they are to Loa genus parasites. DNA sequence alignments, BLAST searches and conceptual translations have been used to compliment phylogenetic analysis showing that M. ozzardi from the Amazon and Caribbean regions are near-identical and that previously reported Peruvian M. ozzardi CO1 reference sequences are probably of pseudogene origin. In addition to adding a much-needed resource to the Mansonella genus's molecular tool-kit and providing evidence that some M. ozzardi CO1 sequence deposits are pseudogenes, our results suggest that all Neotropical M. ozzardi parasites are closely related.

  18. Genomic characterization reconfirms the taxonomic status of Lactobacillus parakefiri

    PubMed Central

    TANIZAWA, Yasuhiro; KOBAYASHI, Hisami; KAMINUMA, Eli; SAKAMOTO, Mitsuo; OHKUMA, Moriya; NAKAMURA, Yasukazu; ARITA, Masanori; TOHNO, Masanori

    2017-01-01

    Whole-genome sequencing was performed for Lactobacillus parakefiri JCM 8573T to confirm its hitherto controversial taxonomic position. Here, we report its first reliable reference genome. Genome-wide metrics, such as average nucleotide identity and digital DNA-DNA hybridization, and phylogenomic analysis based on multiple genes supported its taxonomic status as a distinct species in the genus Lactobacillus. The availability of a reliable genome sequence will aid future investigations on the industrial applications of L. parakefiri in functional foods such as kefir grains. PMID:28748134

  19. Intrinsic flexibility of B-DNA: the experimental TRX scale.

    PubMed

    Heddi, Brahim; Oguey, Christophe; Lavelle, Christophe; Foloppe, Nicolas; Hartmann, Brigitte

    2010-01-01

    B-DNA flexibility, crucial for DNA-protein recognition, is sequence dependent. Free DNA in solution would in principle be the best reference state to uncover the relation between base sequences and their intrinsic flexibility; however, this has long been hampered by a lack of suitable experimental data. We investigated this relationship by compiling and analyzing a large dataset of NMR (31)P chemical shifts in solution. These measurements reflect the BI <--> BII equilibrium in DNA, intimately correlated to helicoidal descriptors of the curvature, winding and groove dimensions. Comparing the ten complementary DNA dinucleotide steps indicates that some steps are much more flexible than others. This malleability is primarily controlled at the dinucleotide level, modulated by the tetranucleotide environment. Our analyses provide an experimental scale called TRX that quantifies the intrinsic flexibility of the ten dinucleotide steps in terms of Twist, Roll, and X-disp (base pair displacement). Applying the TRX scale to DNA sequences optimized for nucleosome formation reveals a 10 base-pair periodic alternation of stiff and flexible regions. Thus, DNA flexibility captured by the TRX scale is relevant to nucleosome formation, suggesting that this scale may be of general interest to better understand protein-DNA recognition.

  20. Normalization of environmental metagenomic DNA enhances the discovery of under-represented microbial community members.

    PubMed

    Ramond, J-B; Makhalanyane, T P; Tuffin, M I; Cowan, D A

    2015-04-01

    Normalization is a procedure classically employed to detect rare sequences in cellular expression profiles (i.e. cDNA libraries). Here, we present a normalization protocol involving the direct treatment of extracted environmental metagenomic DNA with S1 nuclease, referred to as normalization of metagenomic DNA: NmDNA. We demonstrate that NmDNA, prior to post hoc PCR-based experiments (16S rRNA gene T-RFLP fingerprinting and clone library), increased the diversity of sequences retrieved from environmental microbial communities by detection of rarer sequences. This approach could be used to enhance the resolution of detection of ecologically relevant rare members in environmental microbial assemblages and therefore is promising in enabling a better understanding of ecosystem functioning. This study is the first testing 'normalization' on environmental metagenomic DNA (mDNA). The aim of this procedure was to improve the identification of rare phylotypes in environmental communities. Using hypoliths as model systems, we present evidence that this post-mDNA extraction molecular procedure substantially enhances the detection of less common phylotypes and could even lead to the discovery of novel microbial genotypes within a given environment. © 2014 The Society for Applied Microbiology.

  1. EMPOP-quality mtDNA control region sequences from Kashmiri of Azad Jammu & Kashmir, Pakistan.

    PubMed

    Rakha, Allah; Peng, Min-Sheng; Bi, Rui; Song, Jiao-Jiao; Salahudin, Zeenat; Adan, Atif; Israr, Muhammad; Yao, Yong-Gang

    2016-11-01

    The mitochondrial DNA (mtDNA) control region (nucleotide position 16024-576) sequences were generated through Sanger sequencing method for 317 self-identified Kashmiris from all districts of Azad Jammu & Kashmir Pakistan. The population sample set showed a total of 251 haplotypes, with a relatively high haplotype diversity (0.9977) and a low random match probability (0.54%). The containing matrilineal lineages belonging to three different phylogeographic origins of Western Eurasian (48.9%), South Asian (47.0%) and East Asian (4.1%). The present study was compared to previous data from Pakistan and other worldwide populations (Central Asia, Western Asia, and East & Southeast Asia). The dataset is made available through EMPOP under accession number EMP00679 and will serve as an mtDNA reference database in forensic casework in Pakistan. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  2. 5-bp Classical Satellite DNA Loci from Chromosome-1 Instability in Cervical Neoplasia Detected by DNA Breakage Detection/Fluorescence in Situ Hybridization (DBD-FISH).

    PubMed

    Cortés-Gutiérrez, Elva I; Ortíz-Hernández, Brenda L; Dávila-Rodríguez, Martha I; Cerda-Flores, Ricardo M; Fernández, José Luis; López-Fernández, Carmen; Gosálvez, Jaime

    2013-02-19

    We aimed to evaluate the association between the progressive stages of cervical neoplasia and DNA damage in 5-bp classical satellite DNA sequences from chromosome-1 in cervical epithelium and in peripheral blood lymphocytes using DNA breakage detection/fluorescence in situ hybridization (DBD-FISH). A hospital-based unmatched case-control study was conducted in 2011 with a sample of 30 women grouped according to disease stage and selected according to histological diagnosis; 10 with low-grade squamous intraepithelial lesions (LG-SIL), 10 with high-grade SIL (HG-SIL), and 10 with no cervical lesions, from the Unidad Medica de Alta Especialidad of The Mexican Social Security Institute, IMSS, Mexico. Specific chromosome damage levels in 5-bp classical satellite DNA sequences from chromosome-1 were evaluated in cervical epithelium and peripheral blood lymphocytes using the DBD-FISH technique. Whole-genome DNA hybridization was used as a reference for the level of damage. Results of Kruskal-Wallis test showed a significant increase according to neoplastic development in both tissues. The instability of 5-bp classical satellite DNA sequences from chromosome-1 was evidenced using chromosome-orientation FISH. In conclusion, we suggest that the progression to malignant transformation involves an increase in the instability of 5-bp classical satellite DNA sequences from chromosome-1.

  3. Exome copy number variation detection: Use of a pool of unrelated healthy tissue as reference sample.

    PubMed

    Wenric, Stephane; Sticca, Tiberio; Caberg, Jean-Hubert; Josse, Claire; Fasquelle, Corinne; Herens, Christian; Jamar, Mauricette; Max, Stéphanie; Gothot, André; Caers, Jo; Bours, Vincent

    2017-01-01

    An increasing number of bioinformatic tools designed to detect CNVs (copy number variants) in tumor samples based on paired exome data where a matched healthy tissue constitutes the reference have been published in the recent years. The idea of using a pool of unrelated healthy DNA as reference has previously been formulated but not thoroughly validated. As of today, the gold standard for CNV calling is still aCGH but there is an increasing interest in detecting CNVs by exome sequencing. We propose to design a metric allowing the comparison of two CNV profiles, independently of the technique used and assessed the validity of using a pool of unrelated healthy DNA instead of a matched healthy tissue as reference in exome-based CNV detection. We compared the CNV profiles obtained with three different approaches (aCGH, exome sequencing with a matched healthy tissue as reference, exome sequencing with a pool of eight unrelated healthy tissue as reference) on three multiple myeloma samples. We show that the usual analyses performed to compare CNV profiles (deletion/amplification ratios and CNV size distribution) lack in precision when confronted with low LRR values, as they only consider the binary status of each CNV. We show that the metric-based distance constitutes a more accurate comparison of two CNV profiles. Based on these analyses, we conclude that a reliable picture of CNV alterations in multiple myeloma samples can be obtained from whole-exome sequencing in the absence of a matched healthy sample. © 2016 WILEY PERIODICALS, INC.

  4. Mitochondrial cytochrome c oxidase subunit 1 gene and nuclear rDNA regions of Enterobius vermicularis parasitic in captive chimpanzees with special reference to its relationship with pinworms in humans.

    PubMed

    Nakano, Tadao; Okamoto, Munehiro; Ikeda, Yatsukaho; Hasegawa, Hideo

    2006-12-01

    Sequences of mitochondrial cytochrome c oxidase subunit 1 (CO1) gene, nuclear internal transcribed spacer 2 (ITS2) region of ribosomal DNA (rDNA), and 5S rDNA of Enterobius vermicularis from captive chimpanzees in five zoos/institutions in Japan were analyzed and compared with those of pinworm eggs from humans in Japan. Three major types of variants appearing in both CO1 and ITS2 sequences, but showing no apparent connection, were observed among materials collected from the chimpanzees. Each one of them was also observed in pinworms in humans. Sequences of 5S rDNA were identical in the materials from chimpanzees and humans. Phylogenetic analysis of CO1 gene revealed three clusters with high bootstrap value, suggesting considerable divergence, presumably correlated with human evolution, has occurred in the human pinworms. The synonymy of E. gregorii with E. vermicularis is supported by the molecular evidence.

  5. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

    PubMed Central

    Wright, Imogen A.; Travers, Simon A.

    2014-01-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618

  6. JVM: Java Visual Mapping tool for next generation sequencing read.

    PubMed

    Yang, Ye; Liu, Juan

    2015-01-01

    We developed a program JVM (Java Visual Mapping) for mapping next generation sequencing read to reference sequence. The program is implemented in Java and is designed to deal with millions of short read generated by sequence alignment using the Illumina sequencing technology. It employs seed index strategy and octal encoding operations for sequence alignments. JVM is useful for DNA-Seq, RNA-Seq when dealing with single-end resequencing. JVM is a desktop application, which supports reads capacity from 1 MB to 10 GB.

  7. Specific primer design of mitochondrial 12S rRNA for species identification in raw meats

    NASA Astrophysics Data System (ADS)

    Cahyadi, M.; Puruhita; Barido, F. H.; Hertanto, B. S.

    2018-01-01

    Polymerase chain reaction (PCR) is a molecular technique that widely used in agriculture area including species identification in animal-based products for halalness and food safety reasons. Amplification of DNA using PCR needs a primer pair (forward and reverse primers) to isolate specific DNA fragment in the genome. This objective of this study was to design specific primer from mitochondrial 12S rRNA region for species identification in raw beef, pork and chicken meat. Three published sequences, HQ184045, JN601075, and KT626857, were downloaded from National Center for Biotechnology Information (NCBI) website. Furthermore, those reference sequences were used to design specific primer for bovine, pig, and chicken species using primer3 v.0.4.0. A total of 15 primer pairs were picked up from primer3 software. Of these, an universal forward primer and three reverse primers which are specific for bovine, pig, and chicken species were selected to be optimized using multiplex-PCR technique. The selected primers were namely UNIF (5’-ACC GCG GTC ATA CGA TTA AC-3’), SPR (5’-AGT GCG TCG GCT ATT GTA GG-3’), BBR (5’-GAA TTG GCA AGG GTT GGT AA-3’), and AR (5’-CGG TAT GTA CGT GCC TCA GA-3’). In addition, the PCR products were visualized using 2% agarose gels under the UV light and sequenced to be aligned with reference sequences using Clustal Omega. The result showed that those primers were specifically amplified mitochondrial 12S rRNA regions from bovine, pig, and chicken using PCR. It was indicated by the existence of 155, 357, and 611 bp of DNA bands for bovine, pig, and chicken species, respectively. Moreover, sequence analysis revealed that our sequences were identically similar with reference sequences. It can be concluded that mitochondrial 12S rRNA may be used as a genetic marker for species identification in meat products.

  8. G-Anchor: a novel approach for whole-genome comparative mapping utilizing evolutionary conserved DNA sequences.

    PubMed

    Lenis, Vasileios Panagiotis E; Swain, Martin; Larkin, Denis M

    2018-05-01

    Cross-species whole-genome sequence alignment is a critical first step for genome comparative analyses, ranging from the detection of sequence variants to studies of chromosome evolution. Animal genomes are large and complex, and whole-genome alignment is a computationally intense process, requiring expensive high-performance computing systems due to the need to explore extensive local alignments. With hundreds of sequenced animal genomes available from multiple projects, there is an increasing demand for genome comparative analyses. Here, we introduce G-Anchor, a new, fast, and efficient pipeline that uses a strictly limited but highly effective set of local sequence alignments to anchor (or map) an animal genome to another species' reference genome. G-Anchor makes novel use of a databank of highly conserved DNA sequence elements. We demonstrate how these elements may be aligned to a pair of genomes, creating anchors. These anchors enable the rapid mapping of scaffolds from a de novo assembled genome to chromosome assemblies of a reference species. Our results demonstrate that G-Anchor can successfully anchor a vertebrate genome onto a phylogenetically related reference species genome using a desktop or laptop computer within a few hours and with comparable accuracy to that achieved by a highly accurate whole-genome alignment tool such as LASTZ. G-Anchor thus makes whole-genome comparisons accessible to researchers with limited computational resources. G-Anchor is a ready-to-use tool for anchoring a pair of vertebrate genomes. It may be used with large genomes that contain a significant fraction of evolutionally conserved DNA sequences and that are not highly repetitive, polypoid, or excessively fragmented. G-Anchor is not a substitute for whole-genome aligning software but can be used for fast and accurate initial genome comparisons. G-Anchor is freely available and a ready-to-use tool for the pairwise comparison of two genomes.

  9. Towards a comprehensive barcode library for arctic life - Ephemeroptera, Plecoptera, and Trichoptera of Churchill, Manitoba, Canada

    PubMed Central

    2009-01-01

    Background This study reports progress in assembling a DNA barcode reference library for Ephemeroptera, Plecoptera, and Trichoptera ("EPTs") from a Canadian subarctic site, which is the focus of a comprehensive biodiversity inventory using DNA barcoding. These three groups of aquatic insects exhibit a moderate level of species diversity, making them ideal for testing the feasibility of DNA barcoding for routine biotic surveys. We explore the correlation between the morphological species delineations, DNA barcode-based haplotype clusters delimited by a sequence threshold (2%), and a threshold-free approach to biodiversity quantification--phylogenetic diversity. Results A DNA barcode reference library is built for 112 EPT species for the focal region, consisting of 2277 COI sequences. Close correspondence was found between EPT morphospecies and haplotype clusters as designated using a standard threshold value. Similarly, the shapes of taxon accumulation curves based upon haplotype clusters were very similar to those generated using phylogenetic diversity accumulation curves, but were much more computationally efficient. Conclusion The results of this study will facilitate other lines of research on northern EPTs and also bode well for rapidly conducting initial biodiversity assessments in unknown EPT faunas. PMID:20003245

  10. Museum genomics: low-cost and high-accuracy genetic data from historical specimens.

    PubMed

    Rowe, Kevin C; Singhal, Sonal; Macmanes, Matthew D; Ayroles, Julien F; Morelli, Toni Lyn; Rubidge, Emily M; Bi, Ke; Moritz, Craig C

    2011-11-01

    Natural history collections are unparalleled repositories of geographical and temporal variation in faunal conditions. Molecular studies offer an opportunity to uncover much of this variation; however, genetic studies of historical museum specimens typically rely on extracting highly degraded and chemically modified DNA samples from skins, skulls or other dried samples. Despite this limitation, obtaining short fragments of DNA sequences using traditional PCR amplification of DNA has been the primary method for genetic study of historical specimens. Few laboratories have succeeded in obtaining genome-scale sequences from historical specimens and then only with considerable effort and cost. Here, we describe a low-cost approach using high-throughput next-generation sequencing to obtain reliable genome-scale sequence data from a traditionally preserved mammal skin and skull using a simple extraction protocol. We show that single-nucleotide polymorphisms (SNPs) from the genome sequences obtained independently from the skin and from the skull are highly repeatable compared to a reference genome. © 2011 Blackwell Publishing Ltd.

  11. Quantitative Detection of Small Molecule/DNA Complexes Employing a Force-Based and Label-Free DNA-Microarray

    PubMed Central

    Ho, Dominik; Dose, Christian; Albrecht, Christian H.; Severin, Philip; Falter, Katja; Dervan, Peter B.; Gaub, Hermann E.

    2009-01-01

    Force-based ligand detection is a promising method to characterize molecular complexes label-free at physiological conditions. Because conventional implementations of this technique, e.g., based on atomic force microscopy or optical traps, are low-throughput and require extremely sensitive and sophisticated equipment, this approach has to date found only limited application. We present a low-cost, chip-based assay, which combines high-throughput force-based detection of dsDNA·ligand interactions with the ease of fluorescence detection. Within the comparative unbinding force assay, many duplicates of a target DNA duplex are probed against a defined reference DNA duplex each. The fractions of broken target and reference DNA duplexes are determined via fluorescence. With this assay, we investigated the DNA binding behavior of artificial pyrrole-imidazole polyamides. These small compounds can be programmed to target specific dsDNA sequences and distinguish between D- and L-DNA. We found that titration with polyamides specific for a binding motif, which is present in the target DNA duplex and not in the reference DNA duplex, reliably resulted in a shift toward larger fractions of broken reference bonds. From the concentration dependence nanomolar to picomolar dissociation constants of dsDNA·ligand complexes were determined, agreeing well with prior quantitative DNAase footprinting experiments. This finding corroborates that the forced unbinding of dsDNA in presence of a ligand is a nonequilibrium process that produces a snapshot of the equilibrium distribution between dsDNA and dsDNA·ligand complexes. PMID:19486688

  12. Single-cell template strand sequencing by Strand-seq enables the characterization of individual homologs.

    PubMed

    Sanders, Ashley D; Falconer, Ester; Hills, Mark; Spierings, Diana C J; Lansdorp, Peter M

    2017-06-01

    The ability to distinguish between genome sequences of homologous chromosomes in single cells is important for studies of copy-neutral genomic rearrangements (such as inversions and translocations), building chromosome-length haplotypes, refining genome assemblies, mapping sister chromatid exchange events and exploring cellular heterogeneity. Strand-seq is a single-cell sequencing technology that resolves the individual homologs within a cell by restricting sequence analysis to the DNA template strands used during DNA replication. This protocol, which takes up to 4 d to complete, relies on the directionality of DNA, in which each single strand of a DNA molecule is distinguished based on its 5'-3' orientation. Culturing cells in a thymidine analog for one round of cell division labels nascent DNA strands, allowing for their selective removal during genomic library construction. To preserve directionality of template strands, genomic preamplification is bypassed and labeled nascent strands are nicked and not amplified during library preparation. Each single-cell library is multiplexed for pooling and sequencing, and the resulting sequence data are aligned, mapping to either the minus or plus strand of the reference genome, to assign template strand states for each chromosome in the cell. The major adaptations to conventional single-cell sequencing protocols include harvesting of daughter cells after a single round of BrdU incorporation, bypassing of whole-genome amplification, and removal of the BrdU + strand during Strand-seq library preparation. By sequencing just template strands, the structure and identity of each homolog are preserved.

  13. Aptaligner: automated software for aligning pseudorandom DNA X-aptamers from next-generation sequencing data.

    PubMed

    Lu, Emily; Elizondo-Riojas, Miguel-Angel; Chang, Jeffrey T; Volk, David E

    2014-06-10

    Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. Thus, we sought an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length error and noise level cutoff features, is parallelized to run on multiple central processing units (cores), and sorts sequences from a single chip into projects and subprojects.

  14. Potential benefits from using a new reference map in genomic prediction

    USDA-ARS?s Scientific Manuscript database

    Many genomic studies in cattle have used the 2009 reference assembly from the University of Maryland (UMD3.1). A new USDA Agricultural Research Service-University of California, Davis (ARS-UCD) assembly based on longer DNA reads from the same cow (Dominette) should improve sequence alignment, imputa...

  15. Advances in DNA metabarcoding for food and wildlife forensic species identification.

    PubMed

    Staats, Martijn; Arulandhu, Alfred J; Gravendeel, Barbara; Holst-Jensen, Arne; Scholtens, Ingrid; Peelen, Tamara; Prins, Theo W; Kok, Esther

    2016-07-01

    Species identification using DNA barcodes has been widely adopted by forensic scientists as an effective molecular tool for tracking adulterations in food and for analysing samples from alleged wildlife crime incidents. DNA barcoding is an approach that involves sequencing of short DNA sequences from standardized regions and comparison to a reference database as a molecular diagnostic tool in species identification. In recent years, remarkable progress has been made towards developing DNA metabarcoding strategies, which involves next-generation sequencing of DNA barcodes for the simultaneous detection of multiple species in complex samples. Metabarcoding strategies can be used in processed materials containing highly degraded DNA e.g. for the identification of endangered and hazardous species in traditional medicine. This review aims to provide insight into advances of plant and animal DNA barcoding and highlights current practices and recent developments for DNA metabarcoding of food and wildlife forensic samples from a practical point of view. Special emphasis is placed on new developments for identifying species listed in the Convention on International Trade of Endangered Species (CITES) appendices for which reliable methods for species identification may signal and/or prevent illegal trade. Current technological developments and challenges of DNA metabarcoding for forensic scientists will be assessed in the light of stakeholders' needs.

  16. Implications of Hybridization, NUMTs, and Overlooked Diversity for DNA Barcoding of Eurasian Ground Squirrels

    PubMed Central

    Ermakov, Oleg A.; Simonov, Evgeniy; Surin, Vadim L.; Titov, Sergey V.; Brandler, Oleg V.; Ivanova, Natalia V.; Borisenko, Alex V.

    2015-01-01

    The utility of DNA Barcoding for species identification and discovery has catalyzed a concerted effort to build the global reference library; however, many animal groups of economical or conservational importance remain poorly represented. This study aims to contribute DNA barcode records for all ground squirrel species (Xerinae, Sciuridae, Rodentia) inhabiting Eurasia and to test efficiency of this approach for species discrimination. Cytochrome c oxidase subunit 1 (COI) gene sequences were obtained for 97 individuals representing 16 ground squirrel species of which 12 were correctly identified. Taxonomic allocation of some specimens within four species was complicated by geographically restricted mtDNA introgression. Exclusion of individuals with introgressed mtDNA allowed reaching a 91.6% identification success rate. Significant COI divergence (3.5–4.4%) was observed within the most widespread ground squirrel species (Spermophilus erythrogenys, S. pygmaeus, S. suslicus, Urocitellus undulatus), suggesting the presence of cryptic species. A single putative NUMT (nuclear mitochondrial pseudogene) sequence was recovered during molecular analysis; mitochondrial COI from this sample was amplified following re-extraction of DNA. Our data show high discrimination ability of 100 bp COI fragments for Eurasian ground squirrels (84.3%) with no incorrect assessments, underscoring the potential utility of the existing reference librariy for the development of diagnostic ‘mini-barcodes’. PMID:25617768

  17. The Past, Present, and Future of Human Centromere Genomics

    PubMed Central

    Aldrup-MacDonald, Megan E.; Sullivan, Beth A.

    2014-01-01

    The centromere is the chromosomal locus essential for chromosome inheritance and genome stability. Human centromeres are located at repetitive alpha satellite DNA arrays that compose approximately 5% of the genome. Contiguous alpha satellite DNA sequence is absent from the assembled reference genome, limiting current understanding of centromere organization and function. Here, we review the progress in centromere genomics spanning the discovery of the sequence to its molecular characterization and the work done during the Human Genome Project era to elucidate alpha satellite structure and sequence variation. We discuss exciting recent advances in alpha satellite sequence assembly that have provided important insight into the abundance and complex organization of this sequence on human chromosomes. In light of these new findings, we offer perspectives for future studies of human centromere assembly and function. PMID:24683489

  18. Methylsorb: a simple method for quantifying DNA methylation using DNA-gold affinity interactions.

    PubMed

    Sina, Abu Ali Ibn; Carrascosa, Laura G; Palanisamy, Ramkumar; Rauf, Sakandar; Shiddiky, Muhammad J A; Trau, Matt

    2014-10-21

    The analysis of DNA methylation is becoming increasingly important both in the clinic and also as a research tool to unravel key epigenetic molecular mechanisms in biology. Current methodologies for the quantification of regional DNA methylation (i.e., the average methylation over a region of DNA in the genome) are largely affected by comprehensive DNA sequencing methodologies which tend to be expensive, tedious, and time-consuming for many applications. Herein, we report an alternative DNA methylation detection method referred to as "Methylsorb", which is based on the inherent affinity of DNA bases to the gold surface (i.e., the trend of the affinity interactions is adenine > cytosine ≥ guanine > thymine).1 Since the degree of gold-DNA affinity interaction is highly sequence dependent, it provides a new capability to detect DNA methylation by simply monitoring the relative adsorption of bisulfite treated DNA sequences onto a gold chip. Because the selective physical adsorption of DNA fragments to gold enable a direct read-out of regional DNA methylation, the current requirement for DNA sequencing is obviated. To demonstrate the utility of this method, we present data on the regional methylation status of two CpG clusters located in the EN1 and MIR200B genes in MCF7 and MDA-MB-231 cells. The methylation status of these regions was obtained from the change in relative mass on gold surface with respect to relative adsorption of an unmethylated DNA source and this was detected using surface plasmon resonance (SPR) in a label-free and real-time manner. We anticipate that the simplicity of this method, combined with the high level of accuracy for identifying the methylation status of cytosines in DNA, could find broad application in biology and diagnostics.

  19. Molecular Phylogenetics of Trichostrongylus Species (Nematoda: Trichostrongylidae) from Humans of Mazandaran Province, Iran.

    PubMed

    Sharifdini, Meysam; Heidari, Zahra; Hesari, Zahra; Vatandoost, Sajad; Kia, Eshrat Beigom

    2017-06-01

    The present study was performed to analyze molecularly the phylogenetic positions of human-infecting Trichostrongylus species in Mazandaran Province, Iran, which is an endemic area for trichostrongyliasis. DNA from 7 Trichostrongylus infected stool samples were extracted by using in-house (IH) method. PCR amplification of ITS2-rDNA region was performed, and products were sequenced. Phylogenetic analysis of the nucleotide sequence data was performed using MEGA 5.0 software. Six out of 7 isolates had high similarity with Trichostrongylus colubriformis , while the other one showed high homology with Trichostrongylus axei registered in GenBank reference sequences. Intra-specific variations within isolates of T. colubriformis and T. axei amounted to 0-1.8% and 0-0.6%, respectively. Trichostrongylus species obtained in the present study were in a cluster with the relevant reference sequences from previous studies. BLAST analysis indicated that there was 100% homology among all 6 ITS2 sequences of T. colubriformis in the present study and most previously registered sequences of T. colubriformis from human, sheep, and goat isolates from Iran and also human isolates from Laos, Thailand, and France. The ITS2 sequence of T. axei exhibited 99.4% homology with the human isolate of T. axei from Thailand, sheep isolates from New Zealand and Iran, and cattle isolate from USA.

  20. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.

  1. mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences.

    PubMed

    Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E

    2013-08-15

    Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.

  2. Molecular characterization of eimeria species naturally infecting egyptian baldi chickens.

    PubMed

    Gadelhaq, Sahar M; Arafa, Waleed M; Aboelhadid, Shawky M

    2015-01-01

    Coccidiosis is a serious protozoal disease of poultry. The identification of Eimeria species has important implications for diagnosis and control as well as for epidemiology. The molecular characterization of Eimeria species infecting Egyptian baladi chickens was investigated. Eimeria species oocysts were harvested from intestines of naturally infected Egyptian baldi chickens. The morphometry characterization of oocysts along with COCCIMORPH software was done. The DNA was extracted initially by freezing and thawing then the prepared samples was subjected to commercial DNA kits. The DNA products were analyzed through conventional polymerase chain reaction by using amplified region (SCAR) marker. The PCR results confirmed the presence of 7 Eimeria species in the examined fecal samples of Egyptian baldi breed with their specific ampilicon sizes being E. acervulina (811bp), E. brunette (626bp), E. tenella (539bp), E. maxima (272bp), E. necatrix (200bp), E. mitis (327bp) and E. praecopx (354bp). A sequencing of the two most predominant species of Eimeria was done, on E. tenella and E. máxima. Analysis of the obtained sequences revealed high identities 99% between Egyptian isolates and the reference one. Similarly, E. maxima isolated from Egyptian baldi chickens showed 98% nucleotide identities with the reference strain. Only single nucleotide substitution was observed among the Egyptian E. tenella isolates (A181G) when compared to the reference one. The Egyptian isolates acquired 4 unique mutations (A68T, C164T, G190A and C227G) in compared with the reference sequence. This is the first time to identify the 7 species of Eimeria from Egyptian baladi chickens.

  3. Haplogroup relationships between domestic and wild sheep resolved using a mitogenome panel.

    PubMed

    Meadows, J R S; Hiendleder, S; Kijas, J W

    2011-04-01

    Five haplogroups have been identified in domestic sheep through global surveys of mitochondrial (mt) sequence variation, however these group classifications are often based on small fragments of the complete mtDNA sequence; partial control region or the cytochrome B gene. This study presents the complete mitogenome from representatives of each haplogroup identified in domestic sheep, plus a sample of their wild relatives. Comparison of the sequence successfully resolved the relationships between each haplogroup and provided insight into the relationship with wild sheep. The five haplogroups were characterised as branching independently, a radiation that shared a common ancestor 920,000 ± 190,000 years ago based on protein coding sequence. The utility of various mtDNA components to inform the true relationship between sheep was also examined with Bayesian, maximum likelihood and partitioned Bremmer support analyses. The control region was found to be the mtDNA component, which contributed the highest amount of support to the tree generated using the complete data set. This study provides the nucleus of a mtDNA mitogenome panel, which can be used to assess additional mitogenomes and serve as a reference set to evaluate small fragments of the mtDNA.

  4. Haplogroup relationships between domestic and wild sheep resolved using a mitogenome panel

    PubMed Central

    Meadows, J R S; Hiendleder, S; Kijas, J W

    2011-01-01

    Five haplogroups have been identified in domestic sheep through global surveys of mitochondrial (mt) sequence variation, however these group classifications are often based on small fragments of the complete mtDNA sequence; partial control region or the cytochrome B gene. This study presents the complete mitogenome from representatives of each haplogroup identified in domestic sheep, plus a sample of their wild relatives. Comparison of the sequence successfully resolved the relationships between each haplogroup and provided insight into the relationship with wild sheep. The five haplogroups were characterised as branching independently, a radiation that shared a common ancestor 920 000±190 000 years ago based on protein coding sequence. The utility of various mtDNA components to inform the true relationship between sheep was also examined with Bayesian, maximum likelihood and partitioned Bremmer support analyses. The control region was found to be the mtDNA component, which contributed the highest amount of support to the tree generated using the complete data set. This study provides the nucleus of a mtDNA mitogenome panel, which can be used to assess additional mitogenomes and serve as a reference set to evaluate small fragments of the mtDNA. PMID:20940734

  5. Global Analysis of Transcription Factor-Binding Sites in Yeast Using ChIP-Seq

    PubMed Central

    Lefrançois, Philippe; Gallagher, Jennifer E. G.; Snyder, Michael

    2016-01-01

    Transcription factors influence gene expression through their ability to bind DNA at specific regulatory elements. Specific DNA-protein interactions can be isolated through the chromatin immunoprecipitation (ChIP) procedure, in which DNA fragments bound by the protein of interest are recovered. ChIP is followed by high-throughput DNA sequencing (Seq) to determine the genomic provenance of ChIP DNA fragments and their relative abundance in the sample. This chapter describes a ChIP-Seq strategy adapted for budding yeast to enable the genome-wide characterization of binding sites of transcription factors (TFs) and other DNA-binding proteins in an efficient and cost-effective way. Yeast strains with epitope-tagged TFs are most commonly used for ChIP-Seq, along with their matching untagged control strains. The initial step of ChIP involves the cross-linking of DNA and proteins. Next, yeast cells are lysed and sonicated to shear chromatin into smaller fragments. An antibody against an epitope-tagged TF is used to pull down chromatin complexes containing DNA and the TF of interest. DNA is then purified and proteins degraded. Specific barcoded adapters for multiplex DNA sequencing are ligated to ChIP DNA. Short DNA sequence reads (28–36 base pairs) are parsed according to the barcode and aligned against the yeast reference genome, thus generating a nucleotide-resolution map of transcription factor-binding sites and their occupancy. PMID:25213249

  6. Alignment of high-throughput sequencing data inside in-memory databases.

    PubMed

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer supported DNA analysis is still an intensive time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both, HANA and the free database system MySQL, to compare execution time and memory management. To ensure that the results are comparable, MySQL has been running in memory as well, utilizing its integrated memory engine for database table creation. We implemented stored procedures, containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL which means, that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future.

  7. Arduino-based automation of a DNA extraction system.

    PubMed

    Kim, Kyung-Won; Lee, Mi-So; Ryu, Mun-Ho; Kim, Jong-Won

    2015-01-01

    There have been many studies to detect infectious diseases with the molecular genetic method. This study presents an automation process for a DNA extraction system based on microfluidics and magnetic bead, which is part of a portable molecular genetic test system. This DNA extraction system consists of a cartridge with chambers, syringes, four linear stepper actuators, and a rotary stepper actuator. The actuators provide a sequence of steps in the DNA extraction process, such as transporting, mixing, and washing for the gene specimen, magnetic bead, and reagent solutions. The proposed automation system consists of a PC-based host application and an Arduino-based controller. The host application compiles a G code sequence file and interfaces with the controller to execute the compiled sequence. The controller executes stepper motor axis motion, time delay, and input-output manipulation. It drives the stepper motor with an open library, which provides a smooth linear acceleration profile. The controller also provides a homing sequence to establish the motor's reference position, and hard limit checking to prevent any over-travelling. The proposed system was implemented and its functionality was investigated, especially regarding positioning accuracy and velocity profile.

  8. Authentication of Cordyceps sinensis by DNA Analyses: Comparison of ITS Sequence Analysis and RAPD-Derived Molecular Markers.

    PubMed

    Lam, Kelly Y C; Chan, Gallant K L; Xin, Gui-Zhong; Xu, Hong; Ku, Chuen-Fai; Chen, Jian-Ping; Yao, Ping; Lin, Huang-Quan; Dong, Tina T X; Tsim, Karl W K

    2015-12-15

    Cordyceps sinensis is an endoparasitic fungus widely used as a tonic and medicinal food in the practice of traditional Chinese medicine (TCM). In historical usage, Cordyceps specifically is referring to the species of C. sinensis. However, a number of closely related species are named themselves as Cordyceps, and they are sold commonly as C. sinensis. The substitutes and adulterants of C. sinensis are often introduced either intentionally or accidentally in the herbal market, which seriously affects the therapeutic effects or even leads to life-threatening poisoning. Here, we aim to identify Cordyceps by DNA sequencing technology. Two different DNA-based approaches were compared. The internal transcribed spacer (ITS) sequences and the random amplified polymorphic DNA (RAPD)-sequence characterized amplified region (SCAR) were developed here to authenticate different species of Cordyceps. Both approaches generally enabled discrimination of C. sinensis from others. The application of the two methods, supporting each other, increases the security of identification. For better reproducibility and faster analysis, the SCAR markers derived from the RAPD results provide a new method for quick authentication of Cordyceps.

  9. Quartz crystal microbalance (QCM) affinity biosensor for genetically modified organisms (GMOs) detection.

    PubMed

    Mannelli, Ilaria; Minunni, Maria; Tombelli, Sara; Mascini, Marco

    2003-03-01

    A DNA piezoelectric sensor has been developed for the detection of genetically modified organisms (GMOs). Single stranded DNA (ssDNA) probes were immobilised on the sensor surface of a quartz crystal microbalance (QCM) device and the hybridisation between the immobilised probe and the target complementary sequence in solution was monitored. The probe sequences were internal to the sequence of the 35S promoter (P) and Nos terminator (T), which are inserted sequences in the genome of GMOs regulating the transgene expression. Two different probe immobilisation procedures were applied: (a) a thiol-dextran procedure and (b) a thiol-derivatised probe and blocking thiol procedure. The system has been optimised using synthetic oligonucleotides, which were then applied to samples of plasmidic and genomic DNA isolated from the pBI121 plasmid, certified reference materials (CRM), and real samples amplified by the polymerase chain reaction (PCR). The analytical parameters of the sensor have been investigated (sensitivity, reproducibility, lifetime etc.). The results obtained showed that both immobilisation procedures enabled sensitive and specific detection of GMOs, providing a useful tool for screening analysis in food samples.

  10. Sequence and pattern of expression of a bovine homologue of a human mitochondrial transport protein associated with Grave's disease.

    PubMed

    Fiermonte, G; Runswick, M J; Walker, J E; Palmieri, F

    1992-01-01

    A human cDNA has been isolated previously from a thyroid library with the aid of serum from a patient with Grave's disease. It encodes a protein belonging to the mitochondrial metabolite carrier family, referred to as the Grave's disease carrier protein (GDC). Using primers based on this sequence, overlapping cDNAs encoding the bovine homologue of the GDC have been isolated from total bovine heart poly(A)+ cDNA. The bovine protein is 18 amino acids shorter than the published human sequence, but if a frame shift requiring the removal of one nucleotide is introduced into the human cDNA sequence, the human and bovine proteins become identical in their C-terminal regions, and 308 out of 330 amino acids are conserved over their entire sequences. The bovine cDNA has been used to investigate the expression of the GDC in various bovine tissues. In the tissues that were examined, the GDC is most strongly expressed in the thyroid, but substantial amounts of its mRNA were also detected in liver, lung and kidney, and lesser amounts in heart and skeletal muscle.

  11. Finding needles in haystacks: linking scientific names, reference specimens and molecular data for Fungi

    Treesearch

    C.L. Schoch; B. Robbertse; V. Robert; R.G. Haight; K. Kovacs; B. Leung; W. Meyer; R.H. Nilsson; K. Hughes; A.N. Miller; P.M. Kirk; K. Abarenkov; M.C. Aime; H.A. Ariyawansa; M. Bidartondo; T. Boekhout; B. Buyck; Q. Cai; J. Chen; A. Crespo; P.W. Crous; U. Damm; Z.W. De Beer; B.T.M. Dentinger; P.K. Divakar; M. Duenas; N. Feau; K. Fliegerova; M.A. Garcia; Z.-W. Ge; G.W. Griffith; J.Z. Groenewald; M. Groenewald; M. Grube; M. Gryzenhout; C. Gueidan; L. Guo; S. Hambleton; R. Hamelin; K. Hansen; V. Hofstetter; S.-B. Hong; J. Houbraken; K.D. Hyde; P. Inderbitzin; P.R. Johnston; S.C. Karunarathna; U. Koljalg; G.M. Kovacs; E. Kraichak; K. Krizsan; C.P. Kurtzman; K.-H. Larsson; S. Leavitt; P.M. Letcher; K. Liimatainen; J.-K. Liu; D.J. Lodge; J. Jennifer Luangsa-ard; H.T. Lumbsch; S.S.N. Maharachchikumbura; D. Manamgoda; M.P. Martin; A.M. Minnis; J.-M. Moncalvo; G. Mule; K.K. Nakasone; T. Niskanen; I. Olariaga; T. Papp; T. Petkovits; R. Pino-Bodas; M.J. Powell; H.A. Raja; D. Redecker; J.M. Sarmiento-Ramirez; K.A. Seifert; B. Shrestha; S. Stenroos; B. Stielow; S.-O. Suh; K. Tanaka; L. Tedersoo; M.T. Telleria; D. Udayanga; W.A. Untereiner; J. Dieguez Uribeondo; K.V. Subbarao; C. Vagvolgyi; C. Visagie; K. Voigt; D.M. Walker; B.S. Weir; M. Weiss; N.N. Wijayawardene; M.J. Wingfield; J.P. Xu; Z.L. Yang; N. Zhang; W.-Y. Zhuang; S. Federhen

    2014-01-01

    DNA phylogenetic comparisons have shown that morphology-based species recognition often underestimates fungal diversity. Therefore, the need for accurate DNA sequence data, tied to both correct taxonomic names and clearly annotated specimen data, has never been greater. Furthermore, the growing number of molecular ecology and microbiome projects using high-throughput...

  12. DNA Barcode Sequence Identification Incorporating Taxonomic Hierarchy and within Taxon Variability

    PubMed Central

    Little, Damon P.

    2011-01-01

    For DNA barcoding to succeed as a scientific endeavor an accurate and expeditious query sequence identification method is needed. Although a global multiple–sequence alignment can be generated for some barcoding markers (e.g. COI, rbcL), not all barcoding markers are as structurally conserved (e.g. matK). Thus, algorithms that depend on global multiple–sequence alignments are not universally applicable. Some sequence identification methods that use local pairwise alignments (e.g. BLAST) are unable to accurately differentiate between highly similar sequences and are not designed to cope with hierarchic phylogenetic relationships or within taxon variability. Here, I present a novel alignment–free sequence identification algorithm–BRONX–that accounts for observed within taxon variability and hierarchic relationships among taxa. BRONX identifies short variable segments and corresponding invariant flanking regions in reference sequences. These flanking regions are used to score variable regions in the query sequence without the production of a global multiple–sequence alignment. By incorporating observed within taxon variability into the scoring procedure, misidentifications arising from shared alleles/haplotypes are minimized. An explicit treatment of more inclusive terminals allows for separate identifications to be made for each taxonomic level and/or for user–defined terminals. BRONX performs better than all other methods when there is imperfect overlap between query and reference sequences (e.g. mini–barcode queries against a full–length barcode database). BRONX consistently produced better identifications at the genus–level for all query types. PMID:21857897

  13. Population structure of pigs determined by single nucleotide polymorphisms observed in assembled expressed sequence tags.

    PubMed

    Matsumoto, Toshimi; Okumura, Naohiko; Uenishi, Hirohide; Hayashi, Takeshi; Hamasima, Noriyuki; Awata, Takashi

    2012-01-01

    We have collected more than 190000 porcine expressed sequence tags (ESTs) from full-length complementary DNA (cDNA) libraries and identified more than 2800 single nucleotide polymorphisms (SNPs). In this study, we tentatively chose 222 SNPs observed in assembled ESTs to study pigs of different breeds; 104 were selected by comparing the cDNA sequences of a Meishan pig and samples of three-way cross pigs (Landrace, Large White, and Duroc: LWD), and 118 were selected from LWD samples. To evaluate the genetic variation between the chosen SNPs from pig breeds, we determined the genotypes for 192 pig samples (11 pig groups) from our DNA reference panel with matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Of the 222 reference SNPs, 186 were successfully genotyped. A neighbor-joining tree showed that the pig groups were classified into two large clusters, namely, Euro-American and East Asian pig populations. F-statistics and the analysis of molecular variance of Euro-American pig groups revealed that approximately 25% of the genetic variations occurred because of intergroup differences. As the F(IS) values were less than the F(ST) values(,) the clustering, based on the Bayesian inference, implied that there was strong genetic differentiation among pig groups and less divergence within the groups in our samples. © 2011 The Authors. Animal Science Journal © 2011 Japanese Society of Animal Science.

  14. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA.

    PubMed

    Wright, Imogen A; Travers, Simon A

    2014-07-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Creation of reference DNA barcode library and authentication of medicinal plant raw drugs used in Ayurvedic medicine.

    PubMed

    Vassou, Sophie Lorraine; Nithaniyal, Stalin; Raju, Balaji; Parani, Madasamy

    2016-07-18

    Ayurveda is a system of traditional medicine that originated in ancient India, and it is still in practice. Medicinal plants are the backbone of Ayurveda, which heavily relies on the plant-derived therapeutics. While Ayurveda is becoming more popular in several countries throughout the World, lack of authenticated medicinal plant raw drugs is a growing concern. Our aim was to DNA barcode the medicinal plants that are listed in the Ayurvedic Pharmacopoeia of India (API) to create a reference DNA barcode library, and to use the same to authenticate the raw drugs that are sold in markets. We have DNA barcoded 347 medicinal plants using rbcL marker, and curated rbcL DNA barcodes for 27 medicinal plants from public databases. These sequences were used to create Ayurvedic Pharmacopoeia of India - Reference DNA Barcode Library (API-RDBL). This library was used to authenticate 100 medicinal plant raw drugs, which were in the form of powders (82) and seeds (18). Ayurvedic Pharmacopoeia of India - Reference DNA Barcode Library (API-RDBL) was created with high quality and authentic rbcL barcodes for 374 out of the 395 medicinal plants that are included in the API. The rbcL DNA barcode differentiated 319 species (85 %) with the pairwise divergence ranging between 0.2 and 29.9 %. PCR amplification and DNA sequencing success rate of rbcL marker was 100 % even for the poorly preserved medicinal plant raw drugs that were collected from local markets. DNA barcoding revealed that only 79 % raw drugs were authentic, and the remaining 21 % samples were adulterated. Further, adulteration was found to be much higher with powders (ca. 25 %) when compared to seeds (ca. 5 %). The present study demonstrated the utility of DNA barcoding in authenticating medicinal plant raw drugs, and found that approximately one fifth of the market samples were adulterated. Powdered raw drugs, which are very difficult to be identified by taxonomists as well as common people, seem to be the easy target for adulteration. Developing a quality control protocol for medicinal plant raw drugs by incorporating DNA barcoding as a component is essential to ensure safety to the consumers.

  16. Mitochondrial DNA variant at HVI region as a candidate of genetic markers of type 2 diabetes

    NASA Astrophysics Data System (ADS)

    Gumilar, Gun Gun; Purnamasari, Yunita; Setiadi, Rahmat

    2016-02-01

    Mitochondrial DNA (mtDNA) is maternally inherited. mtDNA mutations which can contribute to the excess of maternal inheritance of type 2 diabetes. Due to the high mutation rate, one of the areas in the mtDNA that is often associated with the disease is the hypervariable region I (HVI). Therefore, this study was conducted to determine the genetic variants of human mtDNA HVI that related to the type 2 diabetes in four samples that were taken from four generations in one lineage. Steps being taken include the lyses of hair follicles, amplification of mtDNA HVI fragment using Polymerase Chain Reaction (PCR), detection of PCR products through agarose gel electrophoresis technique, the measurement of the concentration of mtDNA using UV-Vis spectrophotometer, determination of the nucleotide sequence via direct sequencing method and analysis of the sequencing results using SeqMan DNASTAR program. Based on the comparison between nucleotide sequence of samples and revised Cambridge Reference Sequence (rCRS) obtained six same mutations that these are C16147T, T16189C, C16193del, T16127C, A16235G, and A16293C. After comparing the data obtained to the secondary data from Mitomap and NCBI, it were found that two mutations, T16189C and T16217C, become candidates as genetic markers of type 2 diabetes even the mutations were found also in the generations of undiagnosed type 2 diabetes. The results of this study are expected to give contribution to the collection of human mtDNA database of genetic variants that associated to metabolic diseases, so that in the future it can be utilized in various fields, especially in medicine.

  17. Differential DNA Methylation Analysis without a Reference Genome.

    PubMed

    Klughammer, Johanna; Datlinger, Paul; Printz, Dieter; Sheffield, Nathan C; Farlik, Matthias; Hadler, Johanna; Fritsch, Gerhard; Bock, Christoph

    2015-12-22

    Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  18. Quantitative molecular diagnostic assays of grain washes for Claviceps purpurea are correlated with visual determinations of ergot contamination.

    PubMed

    Comte, Alexia; Gräfenhan, Tom; Links, Matthew G; Hemmingsen, Sean M; Dumonceaux, Tim J

    2017-01-01

    We examined the epiphytic microbiome of cereal grain using the universal barcode chaperonin-60 (cpn60). Microbial community profiling of seed washes containing DNA extracts prepared from field-grown cereal grain detected sequences from a fungus identified only to Class Sordariomycetes. To identify the fungal sequence and to improve the reference database, we determined cpn60 sequences from field-collected and reference strains of the ergot fungus, Claviceps purpurea. These data allowed us to identify this fungal sequence as deriving from C. purpurea, and suggested that C. purpurea DNA is readily detectable on agricultural commodities, including those for which ergot was not identified as a grading factor. To get a sense of the prevalence and level of C. purpurea DNA in cereal grains, we developed a quantitative PCR assay based on the fungal internal transcribed spacer (ITS) and applied it to 137 samples from the 2014 crop year. The amount of Claviceps DNA quantified correlated strongly with the proportion of ergot sclerotia identified in each grain lot, although there was evidence that non-target organisms were responsible for some false positives with the ITS-based assay. We therefore developed a cpn60-targeted loop-mediated isothermal amplification assay and applied it to the same grain wash samples. The time to positive displayed a significant, inverse correlation to ergot levels determined by visual ratings. These results indicate that both laboratory-based and field-adaptable molecular diagnostic assays can be used to detect and quantify pathogen load in bulk commodities using cereal grain washes.

  19. Quantitative molecular diagnostic assays of grain washes for Claviceps purpurea are correlated with visual determinations of ergot contamination

    PubMed Central

    Comte, Alexia; Gräfenhan, Tom; Links, Matthew G.; Hemmingsen, Sean M.

    2017-01-01

    We examined the epiphytic microbiome of cereal grain using the universal barcode chaperonin-60 (cpn60). Microbial community profiling of seed washes containing DNA extracts prepared from field-grown cereal grain detected sequences from a fungus identified only to Class Sordariomycetes. To identify the fungal sequence and to improve the reference database, we determined cpn60 sequences from field-collected and reference strains of the ergot fungus, Claviceps purpurea. These data allowed us to identify this fungal sequence as deriving from C. purpurea, and suggested that C. purpurea DNA is readily detectable on agricultural commodities, including those for which ergot was not identified as a grading factor. To get a sense of the prevalence and level of C. purpurea DNA in cereal grains, we developed a quantitative PCR assay based on the fungal internal transcribed spacer (ITS) and applied it to 137 samples from the 2014 crop year. The amount of Claviceps DNA quantified correlated strongly with the proportion of ergot sclerotia identified in each grain lot, although there was evidence that non-target organisms were responsible for some false positives with the ITS-based assay. We therefore developed a cpn60-targeted loop-mediated isothermal amplification assay and applied it to the same grain wash samples. The time to positive displayed a significant, inverse correlation to ergot levels determined by visual ratings. These results indicate that both laboratory-based and field-adaptable molecular diagnostic assays can be used to detect and quantify pathogen load in bulk commodities using cereal grain washes. PMID:28257512

  20. Using faecal DNA to determine consumption by kangaroos of plants considered palatable to sheep.

    PubMed

    Ho, K W; Krebs, G L; McCafferty, P; van Wyngaarden, S P; Addison, J

    2010-02-01

    Disagreement exists within the scientific community with regards to the level of competition for feed between sheep and kangaroos in the Australian rangelands. The greatest challenge to solving this debate is finding effective means of determining the composition of the diets of these potential grazing competitors. An option is to adopt a non-invasive approach that combines faecal collection and molecular techniques that focus on faecal DNA as the primary source of dietary information. As proof-of-concept, we show that a DNA reference data bank on plant species can be established. This DNA reference data bank was then used as a library to identify plant species in kangaroo faeces collected in the southern rangelands of Western Australia. To enhance the method development and to begin the investigation of competitive grazing between sheep and kangaroos, 16 plant species known to be palatable to sheep were initially targeted for collection. To ensure that only plant sequences were studied, PCR amplification was performed using a universal primer pair previously shown to be specific to the chloroplast transfer RNA leucine (trnL) UAA gene intron. Overall, genus-specific, single and differently sized amplicons were reliably and reproducibly generated; enabling the differentiation of reference plants by PCR product length heterogeneity. However, there were a few plants that could not be clearly differentiated on the basis of size alone. This prompted the adoption of a post-PCR step that enabled further differentiation according to base sequence variation. Restriction endonucleases make sequence-specific cleavages on DNA to produce discrete and reproducible fragments having unique sizes and base compositions. Their availability, affordability and simplicity-of-use put restriction enzyme sequence (RES) profiling as a logical post-PCR step for confirming plant species identity. We demonstrate that PCR-RES profiling of plant and faecal matter is useful for the identification of plants included in the diet of kangaroos. The limitations, potential and the opportunities created for researchers interested in investigating the diet of competing herbivores in the rangelands are discussed.

  1. Sequences of multiple bacterial genomes and a Chlamydia trachomatis genotype from direct sequencing of DNA derived from a vaginal swab diagnostic specimen.

    PubMed

    Andersson, P; Klein, M; Lilliebridge, R A; Giffard, P M

    2013-09-01

    Ultra-deep Illumina sequencing was performed on whole genome amplified DNA derived from a Chlamydia trachomatis-positive vaginal swab. Alignment of reads with reference genomes allowed robust SNP identification from the C. trachomatis chromosome and plasmid. This revealed that the C. trachomatis in the specimen was very closely related to the sequenced urogenital, serovar F, clade T1 isolate F-SW4. In addition, high genome-wide coverage was obtained for Prevotella melaninogenica, Gardnerella vaginalis, Clostridiales genomosp. BVAB3 and Mycoplasma hominis. This illustrates the potential of metagenome data to provide high resolution bacterial typing data from multiple taxa in a diagnostic specimen. ©2013 The Authors Clinical Microbiology and Infection ©2013 European Society of Clinical Microbiology and Infectious Diseases.

  2. The Release 6 reference sequence of the Drosophila melanogaster genome

    DOE PAGES

    Hoskins, Roger A.; Carlson, Joseph W.; Wan, Kenneth H.; ...

    2015-01-14

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy andmore » middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. In conclusion, further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.« less

  3. The Release 6 reference sequence of the Drosophila melanogaster genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoskins, Roger A.; Carlson, Joseph W.; Wan, Kenneth H.

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy andmore » middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. In conclusion, further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.« less

  4. Reconstruction of DNA sequences using genetic algorithms and cellular automata: towards mutation prediction?

    PubMed

    Mizas, Ch; Sirakoulis, G Ch; Mardiris, V; Karafyllidis, I; Glykos, N; Sandaltzopoulos, R

    2008-04-01

    Change of DNA sequence that fuels evolution is, to a certain extent, a deterministic process because mutagenesis does not occur in an absolutely random manner. So far, it has not been possible to decipher the rules that govern DNA sequence evolution due to the extreme complexity of the entire process. In our attempt to approach this issue we focus solely on the mechanisms of mutagenesis and deliberately disregard the role of natural selection. Hence, in this analysis, evolution refers to the accumulation of genetic alterations that originate from mutations and are transmitted through generations without being subjected to natural selection. We have developed a software tool that allows modelling of a DNA sequence as a one-dimensional cellular automaton (CA) with four states per cell which correspond to the four DNA bases, i.e. A, C, T and G. The four states are represented by numbers of the quaternary number system. Moreover, we have developed genetic algorithms (GAs) in order to determine the rules of CA evolution that simulate the DNA evolution process. Linear evolution rules were considered and square matrices were used to represent them. If DNA sequences of different evolution steps are available, our approach allows the determination of the underlying evolution rule(s). Conversely, once the evolution rules are deciphered, our tool may reconstruct the DNA sequence in any previous evolution step for which the exact sequence information was unknown. The developed tool may be used to test various parameters that could influence evolution. We describe a paradigm relying on the assumption that mutagenesis is governed by a near-neighbour-dependent mechanism. Based on the satisfactory performance of our system in the deliberately simplified example, we propose that our approach could offer a starting point for future attempts to understand the mechanisms that govern evolution. The developed software is open-source and has a user-friendly graphical input interface.

  5. Epigenetic modifications: basic mechanisms and role in cardiovascular disease (2013 Grover Conference series).

    PubMed

    Loscalzo, Joseph; Handy, Diane E

    2014-06-01

    Epigenetics refers to heritable traits that are not a consequence of DNA sequence. Three classes of epigenetic regulation exist: DNA methylation, histone modification, and noncoding RNA action. In the cardiovascular system, epigenetic regulation affects development, differentiation, and disease propensity or expression. Defining the determinants of epigenetic regulation offers opportunities for novel strategies for disease prevention and treatment.

  6. Interbreeding among deeply divergent mitochondrial lineages in the American cockroach (Periplaneta americana)

    NASA Astrophysics Data System (ADS)

    von Beeren, Christoph; Stoeckle, Mark Y.; Xia, Joyce; Burke, Griffin; Kronauer, Daniel J. C.

    2015-02-01

    DNA barcoding promises to be a useful tool to identify pest species assuming adequate representation of genetic variants in a reference library. Here we examined mitochondrial DNA barcodes in a global urban pest, the American cockroach (Periplaneta americana). Our sampling effort generated 284 cockroach specimens, most from New York City, plus 15 additional U.S. states and six other countries, enabling the first large-scale survey of P. americana barcode variation. Periplaneta americana barcode sequences (n = 247, including 24 GenBank records) formed a monophyletic lineage separate from other Periplaneta species. We found three distinct P. americana haplogroups with relatively small differences within (<=0.6%) and larger differences among groups (2.4%-4.7%). This could be interpreted as indicative of multiple cryptic species. However, nuclear DNA sequences (n = 77 specimens) revealed extensive gene flow among mitochondrial haplogroups, confirming a single species. This unusual genetic pattern likely reflects multiple introductions from genetically divergent source populations, followed by interbreeding in the invasive range. Our findings highlight the need for comprehensive reference databases in DNA barcoding studies, especially when dealing with invasive populations that might be derived from multiple genetically distinct source populations.

  7. Storage and utilization of HLA genomic data--new approaches to HLA typing.

    PubMed

    Helmberg, W

    2000-01-01

    Currently available DNA-based HLA typing assays can provide detailed information about sequence motifs of a tested sample. It is still a common practice, however, for information acquired by high-resolution sequence specific oligonucleotide probe (SSOP) typing or sequence specific priming (SSP) to be presented in a low-resolution serological format. Unfortunately, this representation can lead to significant loss of useful data in many cases. An alternative to assigning allele equivalents to suchDNA typing results is simply to store the observed typing pattern and utilize the information with the help of Virtual DNA Analysis (VDA). Interpretation of the stored typing patterns can then be updated based on newly defined alleles, assuming the sequence motifs detected by the typing reagents are known. Rather than updating reagent specificities in individual laboratories, such updates should be performed in a central, publicly available sequence database. By referring to this database, HLA genomic data can then be stored and transferred between laboratories without loss of information. The 13th International Histocompatibility Workshop offers an ideal opportunity to begin building this common database for the entire human MHC.

  8. Fingerprinting and quantification of GMOs in the agro-food sector.

    PubMed

    Taverniers, I; Van Bockstaele, E; De Loose, M

    2003-01-01

    Most strategies for analyzing GMOs in plants and derived food and feed products, are based on the polymerase chain reaction (PCR) technique. In conventional PCR methods, a 'known' sequence between two specific primers is amplified. To the contrary, with the 'anchor PCR' technique, unknown sequences adjacent to a known sequence, can be amplified. Because T-DNA/plant border sequences are being amplified, anchor PCR is the perfect tool for unique identification of transgenes, including non-authorized GMOs. In this work, anchor PCR was applied to characterize the 'transgene locus' and to clarify the complete molecular structure of at least six different commercial transgenic plants. Based on sequences of T-DNA/plant border junctions, obtained by anchor PCR, event specific primers were developed. The junction fragments, together with endogeneous reference gene targets, were cloned in plasmids. The latter were then used as event specific calibrators in real-time PCR, a new technique for the accurate relative quantification of GMOs. We demonstrate here the importance of anchor PCR for identification and the usefulness of plasmid DNA calibrators in quantification strategies for GMOs, throughout the agro-food sector.

  9. Correction of the lack of commutability between plasmid DNA and genomic DNA for quantification of genetically modified organisms using pBSTopas as a model.

    PubMed

    Zhang, Li; Wu, Yuhua; Wu, Gang; Cao, Yinglong; Lu, Changming

    2014-10-01

    Plasmid calibrators are increasingly applied for polymerase chain reaction (PCR) analysis of genetically modified organisms (GMOs). To evaluate the commutability between plasmid DNA (pDNA) and genomic DNA (gDNA) as calibrators, a plasmid molecule, pBSTopas, was constructed, harboring a Topas 19/2 event-specific sequence and a partial sequence of the rapeseed reference gene CruA. Assays of the pDNA showed similar limits of detection (five copies for Topas 19/2 and CruA) and quantification (40 copies for Topas 19/2 and 20 for CruA) as those for the gDNA. Comparisons of plasmid and genomic standard curves indicated that the slopes, intercepts, and PCR efficiency for pBSTopas were significantly different from CRM Topas 19/2 gDNA for quantitative analysis of GMOs. Three correction methods were used to calibrate the quantitative analysis of control samples using pDNA as calibrators: model a, or coefficient value a (Cva); model b, or coefficient value b (Cvb); and the novel model c or coefficient formula (Cf). Cva and Cvb gave similar estimated values for the control samples, and the quantitative bias of the low concentration sample exceeded the acceptable range within ±25% in two of the four repeats. Using Cfs to normalize the Ct values of test samples, the estimated values were very close to the reference values (bias -13.27 to 13.05%). In the validation of control samples, model c was more appropriate than Cva or Cvb. The application of Cf allowed pBSTopas to substitute for Topas 19/2 gDNA as a calibrator to accurately quantify the GMO.

  10. Molecular Characterization of Eimeria Species Naturally Infecting Egyptian Baldi Chickens

    PubMed Central

    GADELHAQ, Sahar M; ARAFA, Waleed M; ABOELHADID, Shawky M

    2015-01-01

    Background: Coccidiosis is a serious protozoal disease of poultry. The identification of Eimeria species has important implications for diagnosis and control as well as for epidemiology. The molecular characterization of Eimeria species infecting Egyptian baladi chickens was investigated. Methods: Eimeria species oocysts were harvested from intestines of naturally infected Egyptian baldi chickens. The morphometry characterization of oocysts along with COCCIMORPH software was done. The DNA was extracted initially by freezing and thawing then the prepared samples was subjected to commercial DNA kits. The DNA products were analyzed through conventional polymerase chain reaction by using amplified region (SCAR) marker. Results: The PCR results confirmed the presence of 7 Eimeria species in the examined fecal samples of Egyptian baldi breed with their specific ampilicon sizes being E. acervulina (811bp), E. brunette (626bp), E. tenella (539bp), E. maxima (272bp), E. necatrix (200bp), E. mitis (327bp) and E. praecopx (354bp). A sequencing of the two most predominant species of Eimeria was done, on E. tenella and E. máxima. Analysis of the obtained sequences revealed high identities 99% between Egyptian isolates and the reference one. Similarly, E. maxima isolated from Egyptian baldi chickens showed 98% nucleotide identities with the reference strain. Only single nucleotide substitution was observed among the Egyptian E. tenella isolates (A181G) when compared to the reference one. The Egyptian isolates acquired 4 unique mutations (A68T, C164T, G190A and C227G) in compared with the reference sequence. Conclusion: This is the first time to identify the 7 species of Eimeria from Egyptian baladi chickens. PMID:25904950

  11. Uncovering Trophic Interactions in Arthropod Predators through DNA Shotgun-Sequencing of Gut Contents

    PubMed Central

    Paula, Débora P.; Linard, Benjamin; Crampton-Platt, Alex; Srivathsan, Amrita; Timmermans, Martijn J. T. N.; Sujii, Edison R.; Pires, Carmen S. S.; Souza, Lucas M.; Andow, David A.; Vogler, Alfried P.

    2016-01-01

    Characterizing trophic networks is fundamental to many questions in ecology, but this typically requires painstaking efforts, especially to identify the diet of small generalist predators. Several attempts have been devoted to develop suitable molecular tools to determine predatory trophic interactions through gut content analysis, and the challenge has been to achieve simultaneously high taxonomic breadth and resolution. General and practical methods are still needed, preferably independent of PCR amplification of barcodes, to recover a broader range of interactions. Here we applied shotgun-sequencing of the DNA from arthropod predator gut contents, extracted from four common coccinellid and dermapteran predators co-occurring in an agroecosystem in Brazil. By matching unassembled reads against six DNA reference databases obtained from public databases and newly assembled mitogenomes, and filtering for high overlap length and identity, we identified prey and other foreign DNA in the predator guts. Good taxonomic breadth and resolution was achieved (93% of prey identified to species or genus), but with low recovery of matching reads. Two to nine trophic interactions were found for these predators, some of which were only inferred by the presence of parasitoids and components of the microbiome known to be associated with aphid prey. Intraguild predation was also found, including among closely related ladybird species. Uncertainty arises from the lack of comprehensive reference databases and reliance on low numbers of matching reads accentuating the risk of false positives. We discuss caveats and some future prospects that could improve the use of direct DNA shotgun-sequencing to characterize arthropod trophic networks. PMID:27622637

  12. DNApod: DNA polymorphism annotation database from next-generation sequence read archives.

    PubMed

    Mochizuki, Takako; Tanizawa, Yasuhiro; Fujisawa, Takatomo; Ohta, Tazro; Nikoh, Naruo; Shimizu, Tokurou; Toyoda, Atsushi; Fujiyama, Asao; Kurata, Nori; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu

    2017-01-01

    With the rapid advances in next-generation sequencing (NGS), datasets for DNA polymorphisms among various species and strains have been produced, stored, and distributed. However, reliability varies among these datasets because the experimental and analytical conditions used differ among assays. Furthermore, such datasets have been frequently distributed from the websites of individual sequencing projects. It is desirable to integrate DNA polymorphism data into one database featuring uniform quality control that is distributed from a single platform at a single place. DNA polymorphism annotation database (DNApod; http://tga.nig.ac.jp/dnapod/) is an integrated database that stores genome-wide DNA polymorphism datasets acquired under uniform analytical conditions, and this includes uniformity in the quality of the raw data, the reference genome version, and evaluation algorithms. DNApod genotypic data are re-analyzed whole-genome shotgun datasets extracted from sequence read archives, and DNApod distributes genome-wide DNA polymorphism datasets and known-gene annotations for each DNA polymorphism. This new database was developed for storing genome-wide DNA polymorphism datasets of plants, with crops being the first priority. Here, we describe our analyzed data for 679, 404, and 66 strains of rice, maize, and sorghum, respectively. The analytical methods are available as a DNApod workflow in an NGS annotation system of the DNA Data Bank of Japan and a virtual machine image. Furthermore, DNApod provides tables of links of identifiers between DNApod genotypic data and public phenotypic data. To advance the sharing of organism knowledge, DNApod offers basic and ubiquitous functions for multiple alignment and phylogenetic tree construction by using orthologous gene information.

  13. DNApod: DNA polymorphism annotation database from next-generation sequence read archives

    PubMed Central

    Mochizuki, Takako; Tanizawa, Yasuhiro; Fujisawa, Takatomo; Ohta, Tazro; Nikoh, Naruo; Shimizu, Tokurou; Toyoda, Atsushi; Fujiyama, Asao; Kurata, Nori; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu

    2017-01-01

    With the rapid advances in next-generation sequencing (NGS), datasets for DNA polymorphisms among various species and strains have been produced, stored, and distributed. However, reliability varies among these datasets because the experimental and analytical conditions used differ among assays. Furthermore, such datasets have been frequently distributed from the websites of individual sequencing projects. It is desirable to integrate DNA polymorphism data into one database featuring uniform quality control that is distributed from a single platform at a single place. DNA polymorphism annotation database (DNApod; http://tga.nig.ac.jp/dnapod/) is an integrated database that stores genome-wide DNA polymorphism datasets acquired under uniform analytical conditions, and this includes uniformity in the quality of the raw data, the reference genome version, and evaluation algorithms. DNApod genotypic data are re-analyzed whole-genome shotgun datasets extracted from sequence read archives, and DNApod distributes genome-wide DNA polymorphism datasets and known-gene annotations for each DNA polymorphism. This new database was developed for storing genome-wide DNA polymorphism datasets of plants, with crops being the first priority. Here, we describe our analyzed data for 679, 404, and 66 strains of rice, maize, and sorghum, respectively. The analytical methods are available as a DNApod workflow in an NGS annotation system of the DNA Data Bank of Japan and a virtual machine image. Furthermore, DNApod provides tables of links of identifiers between DNApod genotypic data and public phenotypic data. To advance the sharing of organism knowledge, DNApod offers basic and ubiquitous functions for multiple alignment and phylogenetic tree construction by using orthologous gene information. PMID:28234924

  14. Genetic heterogeneity of the dnaK gene locus including transcription terminator region (TTR) in Campylobacter lari.

    PubMed

    Shitara, M; Tsuboi, Y; Sekizuka, T; Tazumi, A; Moorei, J E; Millar, B C; Taneike, I; Matsuda, M

    2008-01-01

    Nucleotide sequences of approximately 3.1 kbp consisting of the full-length open reading frame (ORF) for grpE, a non-coding (NC) region and a putative ORF for the full-length dnaK gene (1860 bp) were identified from a urease-positive thermophilic Campylobacter (UPTC) CF89-12 isolate. Then, following the construction of a new degenerate polymerase chain reaction (PCR) primer pair for amplification of the dnaK structural gene, including the transcription terminator region of C. lari isolates, the dnaK region was amplified successfully, TA-cloned and sequenced in nine C. lari isolates. The dnaK gene sequences commenced with an ATG and terminated with a TAA in all 10 isolates, including CF89-12. In addition, the putative ORFs for the dnaK gene locus from seven UPTC isolates consisted of 1860 bases, and the four urease-negative (UN) C. lari isolates included C. lari RM2100 reference strain 1866. Interestingly, different probable ribosome binding sites and hypothetically intrinsic p-independent terminator structures were identified between the seven UPTC and four UN C. lari isolates, respectively. Moreover, it is interesting to note that 20 out of a total of 28 polymorphic sites occurred among amino acid sequences of the dnaK ORF from 11 C. lari isolates, identified to be alternatively UPTC-specific or UN C. lari-specific. In the neighbour-joining tree based on the nucleotide sequence information of the dnaK gene, C. lari forms two major distinct clusters consisting of UPTC and UN C. lari isolates, respectively, with UN C. lari being more closely related to other thermophilic campylobacters than to UPTC.

  15. Aquatic environmental DNA detects seasonal fish abundance and habitat preference in an urban estuary

    PubMed Central

    Soboleva, Lyubov; Charlop-Powers, Zachary

    2017-01-01

    The difficulty of censusing marine animal populations hampers effective ocean management. Analyzing water for DNA traces shed by organisms may aid assessment. Here we tested aquatic environmental DNA (eDNA) as an indicator of fish presence in the lower Hudson River estuary. A checklist of local marine fish and their relative abundance was prepared by compiling 12 traditional surveys conducted between 1988–2015. To improve eDNA identification success, 31 specimens representing 18 marine fish species were sequenced for two mitochondrial gene regions, boosting coverage of the 12S eDNA target sequence to 80% of local taxa. We collected 76 one-liter shoreline surface water samples at two contrasting estuary locations over six months beginning in January 2016. eDNA was amplified with vertebrate-specific 12S primers. Bioinformatic analysis of amplified DNA, using a reference library of GenBank and our newly generated 12S sequences, detected most (81%) locally abundant or common species and relatively few (23%) uncommon taxa, and corresponded to seasonal presence and habitat preference as determined by traditional surveys. Approximately 2% of fish reads were commonly consumed species that are rare or absent in local waters, consistent with wastewater input. Freshwater species were rarely detected despite Hudson River inflow. These results support further exploration and suggest eDNA will facilitate fine-scale geographic and temporal mapping of marine fish populations at relatively low cost. PMID:28403183

  16. Opening the treasure chest: A DNA-barcoding primer set for most higher taxa of Central European birds and mammals from museum collections

    PubMed Central

    Schäffer, Sylvia; Zachos, Frank E.

    2017-01-01

    DNA-barcoding is a rapidly developing method for efficiently identifying samples to species level by means of short standard DNA sequences. However, reliable species assignment requires the availability of a comprehensive DNA barcode reference library, and hence numerous initiatives aim at generating such barcode databases for particular taxa or geographic regions. Historical museum collections represent a potentially invaluable source for the DNA-barcoding of many taxa. This is particularly true for birds and mammals, for which collecting fresh (voucher) material is often very difficult to (nearly) impossible due to the special animal welfare and conservation regulations that apply to vertebrates in general, and birds and mammals in particular. Moreover, even great efforts might not guarantee sufficiently complete sampling of fresh material in a short period of time. DNA extracted from historical samples is usually degraded, such that only short fragments can be amplified, rendering the recovery of the barcoding region as a single fragment impossible. Here, we present a new set of primers that allows the efficient amplification and sequencing of the entire barcoding region in most higher taxa of Central European birds and mammals in six overlapping fragments, thus greatly increasing the value of historical museum collections for generating DNA barcode reference libraries. Applying our new primer set in recently established NGS protocols promises to further increase the efficiency of barcoding old bird and mammal specimens. PMID:28358863

  17. Opening the treasure chest: A DNA-barcoding primer set for most higher taxa of Central European birds and mammals from museum collections.

    PubMed

    Schäffer, Sylvia; Zachos, Frank E; Koblmüller, Stephan

    2017-01-01

    DNA-barcoding is a rapidly developing method for efficiently identifying samples to species level by means of short standard DNA sequences. However, reliable species assignment requires the availability of a comprehensive DNA barcode reference library, and hence numerous initiatives aim at generating such barcode databases for particular taxa or geographic regions. Historical museum collections represent a potentially invaluable source for the DNA-barcoding of many taxa. This is particularly true for birds and mammals, for which collecting fresh (voucher) material is often very difficult to (nearly) impossible due to the special animal welfare and conservation regulations that apply to vertebrates in general, and birds and mammals in particular. Moreover, even great efforts might not guarantee sufficiently complete sampling of fresh material in a short period of time. DNA extracted from historical samples is usually degraded, such that only short fragments can be amplified, rendering the recovery of the barcoding region as a single fragment impossible. Here, we present a new set of primers that allows the efficient amplification and sequencing of the entire barcoding region in most higher taxa of Central European birds and mammals in six overlapping fragments, thus greatly increasing the value of historical museum collections for generating DNA barcode reference libraries. Applying our new primer set in recently established NGS protocols promises to further increase the efficiency of barcoding old bird and mammal specimens.

  18. Genetic diversity of mtDNA D-loop sequences in four native Chinese chicken breeds.

    PubMed

    Guo, H W; Li, C; Wang, X N; Li, Z J; Sun, G R; Li, G X; Liu, X J; Kang, X T; Han, R L

    2017-10-01

    1. To explore the genetic diversity of Chinese indigenous chicken breeds, a 585 bp fragment of the mitochondrial DNA (mtDNA) region was sequenced in 102 birds from the Xichuan black-bone chicken, Yunyang black-bone chicken and Lushi chicken. In addition, 30 mtDNA D-loop sequences of Silkie fowls were downloaded from NCBI. The mtDNA D-loop sequence polymorphism and maternal origin of 4 chicken breeds were analysed in this study. 2. The results showed that a total of 33 mutation sites and 28 haplotypes were detected in the 4 chicken breeds. The haplotype diversity and nucleotide diversity of these 4 native breeds were 0.916 ± 0.014 and 0.012 ± 0.002, respectively. Three clusters were formed in 4 Chinese native chickens and 12 reference breeds. Both the Xichuan black-bone chicken and Yunyang black-bone chicken were grouped into one cluster. Four haplogroups (A, B, C and E) emerged in the median-joining network in these breeds. 3. It was concluded that these 4 Chinese chicken breeds had high genetic diversity. The phylogenetic tree and median network profiles showed that Chinese native chickens and its neighbouring countries had at least two maternal origins, one from Yunnan, China and another from Southeast Asia or its surrounding area.

  19. Massively parallel pyrosequencing of the mitochondrial genome with the 454 methodology in forensic genetics.

    PubMed

    Mikkelsen, Martin; Frank-Hansen, Rune; Hansen, Anders J; Morling, Niels

    2014-09-01

    of sequencing of whole mitochondrial genome, HV1 and HV2 DNA with the second generation system (SGS) Roche 454 GS Junior were compared with results of Sanger sequencing and SNP typing with SNaPshot single base extension detected with MALDI-TOF and capillary electrophoresis. We investigated the performance of the software analysis of the data, reproducibility, ability to sequence homopolymeric regions, detection of mixtures and heteroplasmy as well as the implications of the depth of coverage. We found full reproducibility between samples sequenced twice with SGS. We found close to full concordance between the mtDNA sequences of 26 samples obtained with (1) the 454 SGS method using a depth of coverage above 100 and (2) Sanger sequencing and SNP typing. The discrepancies were primarily observed in homopolymeric regions. The 454 SGS method was able to sequence 95% of the reads correctly in homopolymers up to 4 bases, and up to 6 bases could be sequenced with similar success if the results were carefully, visually inspected. The 454 technology was able to detect mixtures or heteroplasmy of approximately 10%. We detected previously unreported heteroplasmy in the GM9947A component of the NIST human mitochondrial DNA SRM-2392 standard reference material. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  20. DNA barcode data accurately assign higher spider taxa

    PubMed Central

    Coddington, Jonathan A.; Agnarsson, Ingi; Cheng, Ren-Chung; Čandek, Klemen; Driskell, Amy; Frick, Holger; Gregorič, Matjaž; Kostanjšek, Rok; Kropf, Christian; Kweskin, Matthew; Lokovšek, Tjaša; Pipan, Miha; Vidergar, Nina

    2016-01-01

    The use of unique DNA sequences as a method for taxonomic identification is no longer fundamentally controversial, even though debate continues on the best markers, methods, and technology to use. Although both existing databanks such as GenBank and BOLD, as well as reference taxonomies, are imperfect, in best case scenarios “barcodes” (whether single or multiple, organelle or nuclear, loci) clearly are an increasingly fast and inexpensive method of identification, especially as compared to manual identification of unknowns by increasingly rare expert taxonomists. Because most species on Earth are undescribed, a complete reference database at the species level is impractical in the near term. The question therefore arises whether unidentified species can, using DNA barcodes, be accurately assigned to more inclusive groups such as genera and families—taxonomic ranks of putatively monophyletic groups for which the global inventory is more complete and stable. We used a carefully chosen test library of CO1 sequences from 49 families, 313 genera, and 816 species of spiders to assess the accuracy of genus and family-level assignment. We used BLAST queries of each sequence against the entire library and got the top ten hits. The percent sequence identity was reported from these hits (PIdent, range 75–100%). Accurate assignment of higher taxa (PIdent above which errors totaled less than 5%) occurred for genera at PIdent values >95 and families at PIdent values ≥ 91, suggesting these as heuristic thresholds for accurate generic and familial identifications in spiders. Accuracy of identification increases with numbers of species/genus and genera/family in the library; above five genera per family and fifteen species per genus all higher taxon assignments were correct. We propose that using percent sequence identity between conventional barcode sequences may be a feasible and reasonably accurate method to identify animals to family/genus. However, the quality of the underlying database impacts accuracy of results; many outliers in our dataset could be attributed to taxonomic and/or sequencing errors in BOLD and GenBank. It seems that an accurate and complete reference library of families and genera of life could provide accurate higher level taxonomic identifications cheaply and accessibly, within years rather than decades. PMID:27547527

  1. Organizational heterogeneity of vertebrate genomes.

    PubMed

    Frenkel, Svetlana; Kirzhner, Valery; Korol, Abraham

    2012-01-01

    Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.

  2. MendeLIMS: a web-based laboratory information management system for clinical genome sequencing.

    PubMed

    Grimes, Susan M; Ji, Hanlee P

    2014-08-27

    Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis. To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system is referred to as MendeLIMS, is easily implemented with open source tools and is also highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.

  3. Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.

    PubMed

    Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción

    2016-02-27

    In the era of DNA throughput sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequence were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. A bioinformatics algorithm (GeneAssembler) is also provided as a Ruby gem for this class of analyses.

  4. Use of ancient sedimentary DNA as a novel conservation tool for high-altitude tropical biodiversity.

    PubMed

    Boessenkool, Sanne; McGlynn, Gayle; Epp, Laura S; Taylor, David; Pimentel, Manuel; Gizaw, Abel; Nemomissa, Sileshi; Brochmann, Christian; Popp, Magnus

    2014-04-01

    Conservation of biodiversity may in the future increasingly depend upon the availability of scientific information to set suitable restoration targets. In traditional paleoecology, sediment-based pollen provides a means to define preanthropogenic impact conditions, but problems in establishing the exact provenance and ecologically meaningful levels of taxonomic resolution of the evidence are limiting. We explored the extent to which the use of sedimentary ancient DNA (sedaDNA) may complement pollen data in reconstructing past alpine environments in the tropics. We constructed a record of afro-alpine plants retrieved from DNA preserved in sediment cores from 2 volcanic crater sites in the Albertine Rift, eastern Africa. The record extended well beyond the onset of substantial anthropogenic effects on tropical mountains. To ensure high-quality taxonomic inference from the sedaDNA sequences, we built an extensive DNA reference library covering the majority of the afro-alpine flora, by sequencing DNA from taxonomically verified specimens. Comparisons with pollen records from the same sediment cores showed that plant diversity recovered with sedaDNA improved vegetation reconstructions based on pollen records by revealing both additional taxa and providing increased taxonomic resolution. Furthermore, combining the 2 measures assisted in distinguishing vegetation change at different geographic scales; sedaDNA almost exclusively reflects local vegetation, whereas pollen can potentially originate from a wide area that in highlands in particular can span several ecozones. Our results suggest that sedaDNA may provide information on restoration targets and the nature and magnitude of human-induced environmental changes, including in high conservation priority, biodiversity hotspots, where understanding of preanthropogenic impact (or reference) conditions is highly limited. © 2013 Society for Conservation Biology.

  5. Plans and progress for building a Great Lakes fauna DNA ...

    EPA Pesticide Factsheets

    DNA reference libraries provide researchers with an important tool for assessing regional biodiversity by allowing unknown genetic sequences to be assigned identities, while also providing a means for taxonomists to validate identifications. Expanding the representation of Great Lakes species in such reference libraries is an explicit component of research at EPA’s Mid-Continent Ecology Division. Our DNA reference library building efforts began in 2012 with the goal of providing barcodes for at least 5 specimens of each native and nonindigenous fish and aquatic invertebrate species currently present in the Great Lakes. The approach is to pull taxonomically validated specimen for sequencing from EPA led sampling efforts of adult/juvenile fish, larval fish, benthic macroinvertebrates, and zooplankton; while also soliciting aid from state and federal agencies for tissue from “shopping list” organisms. The barcodes we generate are made available through the publicly accessible BOLD (Barcode of Life) database, and help inform a planned Great Lakes biodiversity inventory. To date, our submissions to BOLD are limited to fishes; of the 88 fish species listed as being present within Lake Superior, roughly half were successfully barcoded, while only 22 species met the desired quota of 5 barcoded specimens per species. As we continue to generate genomic information from our collections and the taxonomic representations become more complete, we will continue to

  6. Equid herpesvirus 9 (EHV-9) isolates from zebras in Ontario, Canada, 1989 to 2007.

    PubMed

    Rebelo, Ana Rita; Carman, Susy; Shapiro, Jan; van Dreumel, Tony; Hazlett, Murray; Nagy, Éva

    2015-04-01

    The objective of this study was to identify and partially characterize 3 equid herpesviruses that were isolated postmortem from zebras in Ontario, Canada in 1989, 2002, and 2007. These 3 virus isolates were characterized by plaque morphology, restriction fragment length polymorphism (RFLP) of their genomic deoxyribonucleic acid (DNA), real-time polymerase chain reaction (PCR) assay, and sequence analyses of the full length of the glycoprotein G (gG) gene (ORF70) and a portion of the DNA polymerase gene (ORF30). The isolates were also compared to 3 reference strains of equid herpesvirus 1 (EHV-1). Using rabbit kidney cells, the plaques for the isolates from the zebras were found to be much larger in size than the EHV-1 reference strains. The RFLP patterns of the zebra viruses differed among each other and from those of the EHV-1 reference strains. Real-time PCR and sequence analysis of a portion of the DNA polymerase gene determined that the herpesvirus isolates from the zebras contained a G at nucleotide 2254 and a corresponding N at amino acid position 752, which suggested that they could be neuropathogenic EHV-1 strains. However, subsequent phylogenetic analysis of the gG gene suggested that they were EHV-9 and not EHV-1.

  7. Investigation of occult hepatitis B virus infection in anti-hbc positive patients from a liver clinic.

    PubMed

    Martinez, Maria Carmela; Kok, Chee Choy; Baleriola, Cristina; Robertson, Peter; Rawlinson, William D

    2015-01-01

    Occult hepatitis B infection (OBI) is manifested by presence of very low levels (<200IU/mL) of Hepatitis B viral DNA (HBV DNA) in the blood and the liver while exhibiting undetectable HBV surface antigen (HBsAg). The molecular mechanisms underlying this occurrence are still not completely understood. This study investigated the prevalence of OBI in a high-risk Australian population and compared the HBV S gene sequences of our cohort with reference sequences. Serum from HBV DNA positive, HBsAg negative, and hepatitis B core antibody (anti-HBc) positive patients (study cohort) were obtained from samples tested at SEALS Serology Laboratory using the Abbott Architect, as part of screening and diagnostic testing. From a total of 228,108 samples reviewed, 1,451 patients were tested for all three OBI markers. Only 10 patients (0.69%) out of the 1,451 patients were found to fit the selection criteria for OBI. Sequence analysis of the HBV S gene from 5 suspected OBI infected patients showed increased sequence variability in the 'a' epitope of the major hydrophilic region compared to reference sequences. In addition, a total of eight consistent nucleotide substitutions resulting in seven amino acid changes were observed, and three patients had truncated S gene sequence. These mutations appeared to be stable and may result in alterations in HBsAg conformation. These may negatively impact the affinity of hepatitis B surface antibody (anti-HBs) and may explain the false negative results in serological HBV diagnosis. These changes may also enable the virus to persist in the liver by evading immune surveillance. Further studies on a bigger cohort are required to determine whether these amino acid variations have been acquired in the process of immune escape and serve as markers of OBI.

  8. Short-read, high-throughput sequencing technology for STR genotyping

    PubMed Central

    Bornman, Daniel M.; Hester, Mark E.; Schuetter, Jared M.; Kasoji, Manjula D.; Minard-Smith, Angela; Barden, Curt A.; Nelson, Scott C.; Godbold, Gene D.; Baker, Christine H.; Yang, Boyu; Walther, Jacquelyn E.; Tornes, Ivan E.; Yan, Pearlly S.; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L.; Young, Brian A.; Faith, Seth A.

    2013-01-01

    DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples. PMID:25621315

  9. Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data

    PubMed Central

    Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei

    2013-01-01

    Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042

  10. Bacterial Degraders of Coexisting Dichloromethane, Benzene, and Toluene, Identified by Stable-Isotope Probing.

    PubMed

    Yoshikawa, Miho; Zhang, Ming; Kurisu, Futoshi; Toyota, Koki

    2017-01-01

    Most bioremediation studies on volatile organic compounds (VOCs) have focused on a single contaminant or its derived compounds and degraders have been identified under single contaminant conditions. Bioremediation of multiple contaminants remains a challenging issue. To identify a bacterial consortium that degrades multiple VOCs (dichloromethane (DCM), benzene, and toluene), we applied DNA-stable isotope probing. For individual tests, we combined a 13 C-labeled VOC with other two unlabeled VOCs, and prepared three unlabeled VOCs as a reference. Over 11 days, DNA was periodically extracted from the consortia, and the bacterial community was evaluated by next-generation sequencing of bacterial 16S rRNA gene amplicons. Density gradient fractions of the DNA extracts were amplified by universal bacterial primers for the 16S rRNA gene sequences, and the amplicons were analyzed by terminal restriction fragment length polymorphism (T-RFLP) using restriction enzymes: Hha I and Msp I. The T-RFLP fragments were identified by 16S rRNA gene cloning and sequencing. Under all test conditions, the consortia were dominated by Rhodanobacter , Bradyrhizobium / Afipia , Rhizobium , and Hyphomicrobium . DNA derived from Hyphomicrobium and Propioniferax shifted toward heavier fractions under the condition added with 13 C-DCM and 13 C-benzene, respectively, compared with the reference, but no shifts were induced by 13 C-toluene addition. This implies that Hyphomicrobium and Propioniferax were the main DCM and benzene degraders, respectively, under the coexisting condition. The known benzene degrader Pseudomonas sp. was present but not actively involved in the degradation.

  11. A Comprehensive, Automatically Updated Fungal ITS Sequence Dataset for Reference-Based Chimera Control in Environmental Sequencing Efforts.

    PubMed

    Nilsson, R Henrik; Tedersoo, Leho; Ryberg, Martin; Kristiansson, Erik; Hartmann, Martin; Unterseher, Martin; Porter, Teresita M; Bengtsson-Palme, Johan; Walker, Donald M; de Sousa, Filipe; Gamper, Hannes Andres; Larsson, Ellen; Larsson, Karl-Henrik; Kõljalg, Urmas; Edgar, Robert C; Abarenkov, Kessy

    2015-01-01

    The nuclear ribosomal internal transcribed spacer (ITS) region is the most commonly chosen genetic marker for the molecular identification of fungi in environmental sequencing and molecular ecology studies. Several analytical issues complicate such efforts, one of which is the formation of chimeric-artificially joined-DNA sequences during PCR amplification or sequence assembly. Several software tools are currently available for chimera detection, but rely to various degrees on the presence of a chimera-free reference dataset for optimal performance. However, no such dataset is available for use with the fungal ITS region. This study introduces a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database for the molecular identification of fungi. This dataset supports chimera detection throughout the fungal kingdom and for full-length ITS sequences as well as partial (ITS1 or ITS2 only) datasets. The performance of the dataset on a large set of artificial chimeras was above 99.5%, and we subsequently used the dataset to remove nearly 1,000 compromised fungal ITS sequences from public circulation. The dataset is available at http://unite.ut.ee/repository.php and is subject to web-based third-party curation.

  12. A Comprehensive, Automatically Updated Fungal ITS Sequence Dataset for Reference-Based Chimera Control in Environmental Sequencing Efforts

    PubMed Central

    Nilsson, R. Henrik; Tedersoo, Leho; Ryberg, Martin; Kristiansson, Erik; Hartmann, Martin; Unterseher, Martin; Porter, Teresita M.; Bengtsson-Palme, Johan; Walker, Donald M.; de Sousa, Filipe; Gamper, Hannes Andres; Larsson, Ellen; Larsson, Karl-Henrik; Kõljalg, Urmas; Edgar, Robert C.; Abarenkov, Kessy

    2015-01-01

    The nuclear ribosomal internal transcribed spacer (ITS) region is the most commonly chosen genetic marker for the molecular identification of fungi in environmental sequencing and molecular ecology studies. Several analytical issues complicate such efforts, one of which is the formation of chimeric—artificially joined—DNA sequences during PCR amplification or sequence assembly. Several software tools are currently available for chimera detection, but rely to various degrees on the presence of a chimera-free reference dataset for optimal performance. However, no such dataset is available for use with the fungal ITS region. This study introduces a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database for the molecular identification of fungi. This dataset supports chimera detection throughout the fungal kingdom and for full-length ITS sequences as well as partial (ITS1 or ITS2 only) datasets. The performance of the dataset on a large set of artificial chimeras was above 99.5%, and we subsequently used the dataset to remove nearly 1,000 compromised fungal ITS sequences from public circulation. The dataset is available at http://unite.ut.ee/repository.php and is subject to web-based third-party curation. PMID:25786896

  13. Sequence-dependent effects in drug-DNA interaction: the crystal structure of Hoechst 33258 bound to the d(CGCAAATTTGCG)2 duplex.

    PubMed Central

    Spink, N; Brown, D G; Skelly, J V; Neidle, S

    1994-01-01

    The bis-benzimidazole drug Hoechst 33258 has been co-crystallized with the dodecanucleotide sequence d(CGCAAATTTGCG)2. The structure has been solved by molecular replacement and refined to an R factor of 18.5% for 2125 reflections collected on a Xentronics area detector. The drug is bound in the minor groove, at the five base-pair site 5'-ATTTG and is in a unique orientation. This is displaced by one base pair in the 5' direction compared to previously-determined structures of this drug with the sequence d(CGCGAATTCGCG)2. Reasons for this difference in behaviour are discussed in terms of several sequence-dependent structural features of the DNA, with particular reference to differences in propeller twist and minor-groove width. Images PMID:7515488

  14. Determination of fetal DNA fraction from the plasma of pregnant women using sequence read counts.

    PubMed

    Kim, Sung K; Hannum, Gregory; Geis, Jennifer; Tynan, John; Hogg, Grant; Zhao, Chen; Jensen, Taylor J; Mazloom, Amin R; Oeth, Paul; Ehrich, Mathias; van den Boom, Dirk; Deciu, Cosmin

    2015-08-01

    This study introduces a novel method, referred to as SeqFF, for estimating the fetal DNA fraction in the plasma of pregnant women and to infer the underlying mechanism that allows for such statistical modeling. Autosomal regional read counts from whole-genome massively parallel single-end sequencing of circulating cell-free DNA (ccfDNA) from the plasma of 25 312 pregnant women were used to train a multivariate model. The pretrained model was then applied to 505 pregnant samples to assess the performance of SeqFF against known methodologies for fetal DNA fraction calculations. Pearson's correlation between chromosome Y and SeqFF for pregnancies with male fetuses from two independent cohorts ranged from 0.932 to 0.938. Comparison between a single-nucleotide polymorphism-based approach and SeqFF yielded a Pearson's correlation of 0.921. Paired-end sequencing suggests that shorter ccfDNA, that is, less than 150 bp in length, is nonuniformly distributed across the genome. Regions exhibiting an increased proportion of short ccfDNA, which are more likely of fetal origin, tend to provide more information in the SeqFF calculations. SeqFF is a robust and direct method to determine fetal DNA fraction. Furthermore, the method is applicable to both male and female pregnancies and can greatly improve the accuracy of noninvasive prenatal testing for fetal copy number variation. © 2015 John Wiley & Sons, Ltd.

  15. TREE2FASTA: a flexible Perl script for batch extraction of FASTA sequences from exploratory phylogenetic trees.

    PubMed

    Sauvage, Thomas; Plouviez, Sophie; Schmidt, William E; Fredericq, Suzanne

    2018-03-05

    The body of DNA sequence data lacking taxonomically informative sequence headers is rapidly growing in user and public databases (e.g. sequences lacking identification and contaminants). In the context of systematics studies, sorting such sequence data for taxonomic curation and/or molecular diversity characterization (e.g. crypticism) often requires the building of exploratory phylogenetic trees with reference taxa. The subsequent step of segregating DNA sequences of interest based on observed topological relationships can represent a challenging task, especially for large datasets. We have written TREE2FASTA, a Perl script that enables and expedites the sorting of FASTA-formatted sequence data from exploratory phylogenetic trees. TREE2FASTA takes advantage of the interactive, rapid point-and-click color selection and/or annotations of tree leaves in the popular Java tree-viewer FigTree to segregate groups of FASTA sequences of interest to separate files. TREE2FASTA allows for both simple and nested segregation designs to facilitate the simultaneous preparation of multiple data sets that may overlap in sequence content.

  16. Preparation of reference material for UGT1A1 (TA)n polymorphism genotyping.

    PubMed

    Mlakar, Vid; Mlakar, Simona Jurković; Marc, Janja; Ostanek, Barbara

    2014-08-05

    Gilbert's syndrome is one of the most common metabolic syndromes in the human population characterised by mild unconjugated hyperbilirubinemia resulting from reduced activity of the bilirubin conjugating enzyme UDP-glucuronosyltransferase (UGT1A1). Although Gilbert's syndrome is usually quite benign UGT1A1(TA)n genotyping is important in exclusion of more serious causes of hyperbilirubinemia and since it has significant implications for personalised medicine. The aim of our study was to develop plasmid based reference materials which could be used for UGT1A1(TA)n genotyping. Plasmids were generated using recombinant DNA technology and their number of repeats as well as the entire sequence verified by Sanger sequencing. Their suitability as reference materials was tested using sizing by capillary electrophoresis and denaturing high performance liquid chromatography. Plasmids containing all four different alleles (TA)5, (TA)6, (TA)7 and (TA)8 that are present in the human population as well as a plasmid with (TA)4 repeats were successfully generated. Prepared plasmid reference materials allow the creation of all possible UGT1A1(TA)n polymorphism genotypes and can serve as an efficient substitute for the human genomic DNA reference material in routine genotyping and in the development of new genotyping tests. Copyright © 2014 Elsevier B.V. All rights reserved.

  17. Uncommonly isolated clinical Pseudomonas: identification and phylogenetic assignation.

    PubMed

    Mulet, M; Gomila, M; Ramírez, A; Cardew, S; Moore, E R B; Lalucat, J; García-Valdés, E

    2017-02-01

    Fifty-two Pseudomonas strains that were difficult to identify at the species level in the phenotypic routine characterizations employed by clinical microbiology laboratories were selected for genotypic-based analysis. Species level identifications were done initially by partial sequencing of the DNA dependent RNA polymerase sub-unit D gene (rpoD). Two other gene sequences, for the small sub-unit ribosonal RNA (16S rRNA) and for DNA gyrase sub-unit B (gyrB) were added in a multilocus sequence analysis (MLSA) study to confirm the species identifications. These sequences were analyzed with a collection of reference sequences from the type strains of 161 Pseudomonas species within an in-house multi-locus sequence analysis database. Whole-cell matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) analyses of these strains complemented the DNA sequenced-based phylogenetic analyses and were observed to be in accordance with the results of the sequence data. Twenty-three out of 52 strains were assigned to 12 recognized species not commonly detected in clinical specimens and 29 (56 %) were considered representatives of at least ten putative new species. Most strains were distributed within the P. fluorescens and P. aeruginosa lineages. The value of rpoD sequences in species-level identifications for Pseudomonas is emphasized. The correct species identifications of clinical strains is essential for establishing the intrinsic antibiotic resistance patterns and improved treatment plans.

  18. Environmental Barcoding: A Next-Generation Sequencing Approach for Biomonitoring Applications Using River Benthos

    PubMed Central

    Hajibabaei, Mehrdad; Shokralla, Shadi; Zhou, Xin; Singer, Gregory A. C.; Baird, Donald J.

    2011-01-01

    Timely and accurate biodiversity analysis poses an ongoing challenge for the success of biomonitoring programs. Morphology-based identification of bioindicator taxa is time consuming, and rarely supports species-level resolution especially for immature life stages. Much work has been done in the past decade to develop alternative approaches for biodiversity analysis using DNA sequence-based approaches such as molecular phylogenetics and DNA barcoding. On-going assembly of DNA barcode reference libraries will provide the basis for a DNA-based identification system. The use of recently introduced next-generation sequencing (NGS) approaches in biodiversity science has the potential to further extend the application of DNA information for routine biomonitoring applications to an unprecedented scale. Here we demonstrate the feasibility of using 454 massively parallel pyrosequencing for species-level analysis of freshwater benthic macroinvertebrate taxa commonly used for biomonitoring. We designed our experiments in order to directly compare morphology-based, Sanger sequencing DNA barcoding, and next-generation environmental barcoding approaches. Our results show the ability of 454 pyrosequencing of mini-barcodes to accurately identify all species with more than 1% abundance in the pooled mixture. Although the approach failed to identify 6 rare species in the mixture, the presence of sequences from 9 species that were not represented by individuals in the mixture provides evidence that DNA based analysis may yet provide a valuable approach in finding rare species in bulk environmental samples. We further demonstrate the application of the environmental barcoding approach by comparing benthic macroinvertebrates from an urban region to those obtained from a conservation area. Although considerable effort will be required to robustly optimize NGS tools to identify species from bulk environmental samples, our results indicate the potential of an environmental barcoding approach for biomonitoring programs. PMID:21533287

  19. Comparative molecular cytogenetic analyses of a major tandemly repeated DNA family and retrotransposon sequences in cultivated jute Corchorus species (Malvaceae).

    PubMed

    Begum, Rabeya; Zakrzewski, Falk; Menzel, Gerhard; Weber, Beatrice; Alam, Sheikh Shamimul; Schmidt, Thomas

    2013-07-01

    The cultivated jute species Corchorus olitorius and Corchorus capsularis are important fibre crops. The analysis of repetitive DNA sequences, comprising a major part of plant genomes, has not been carried out in jute but is useful to investigate the long-range organization of chromosomes. The aim of this study was the identification of repetitive DNA sequences to facilitate comparative molecular and cytogenetic studies of two jute cultivars and to develop a fluorescent in situ hybridization (FISH) karyotype for chromosome identification. A plasmid library was generated from C. olitorius and C. capsularis with genomic restriction fragments of 100-500 bp, which was complemented by targeted cloning of satellite DNA by PCR. The diversity of the repetitive DNA families was analysed comparatively. The genomic abundance and chromosomal localization of different repeat classes were investigated by Southern analysis and FISH, respectively. The cytosine methylation of satellite arrays was studied by immunolabelling. Major satellite repeats and retrotransposons have been identified from C. olitorius and C. capsularis. The satellite family CoSat I forms two undermethylated species-specific subfamilies, while the long terminal repeat (LTR) retrotransposons CoRetro I and CoRetro II show similarity to the Metaviridea of plant retroelements. FISH karyotypes were developed by multicolour FISH using these repetitive DNA sequences in combination with 5S and 18S-5·8S-25S rRNA genes which enable the unequivocal chromosome discrimination in both jute species. The analysis of the structure and diversity of the repeated DNA is crucial for genome sequence annotation. The reference karyotypes will be useful for breeding of jute and provide the basis for karyotyping homeologous chromosomes of wild jute species to reveal the genetic and evolutionary relationship between cultivated and wild Corchorus species.

  20. Isolation and sequence characterization of DNA-A genome of a new begomovirus strain associated with severe leaf curling symptoms of Jatropha curcas L.

    PubMed

    Chauhan, Sushma; Rahman, Hifzur; Mastan, Shaik G; Pamidimarri, D V N Sudheer; Reddy, Muppala P

    2018-07-20

    Begomoviruses belong to the family Geminiviridae are associated with several disease symptoms, such as mosaic and leaf curling in Jatropha curcas. The molecular characterization of these viral strains will help in developing management strategies to control the disease. In this study, J. curcas that was infected with begomovirus and showed acute leaf curling symptoms were identified. DNA-A segment from pathogenic viral strain was isolated and sequenced. The sequenced genome was assembled and characterized in detail. The full-length DNA-A sequence was covered by primer walking. The genome sequence showed the general organization of DNA-A from begomovirus by the distribution of ORFs in both viral and anti-viral strands. The genome size ranged from 2844 bp-2852 bp. Three strains with minor nucleotide variations were identified, and a phylogenetic analysis was performed by comparing the DNA-A segments from other reported begomovirus isolates. The maximum sequence similarity was observed with Euphorbia yellow mosaic virus (FN435995). In the phylogenetic tree, no clustering was observed with previously reported begomovirus strains isolated from J. curcas host. The strains isolated in this study belong to new begomoviral strain that elicits symptoms of leaf curling in J. curcas. The results indicate that the probable origin of the strains is from Jatropha mosaic virus infecting J. gassypifolia. The strains isolated in this study are referred as Jatropha curcas leaf curl India virus (JCLCIV) based on the major symptoms exhibited by host J. curcas. Copyright © 2018 Elsevier B.V. All rights reserved.

  1. Molecular Identification of Commercialized Medicinal Plants in Southern Morocco

    PubMed Central

    Krüger, Åsa; Rydberg, Anders; Abbad, Abdelaziz; Björk, Lars; Martin, Gary

    2012-01-01

    Background Medicinal plant trade is important for local livelihoods. However, many medicinal plants are difficult to identify when they are sold as roots, powders or bark. DNA barcoding involves using a short, agreed-upon region of a genome as a unique identifier for species– ideally, as a global standard. Research Question What is the functionality, efficacy and accuracy of the use of barcoding for identifying root material, using medicinal plant roots sold by herbalists in Marrakech, Morocco, as a test dataset. Methodology In total, 111 root samples were sequenced for four proposed barcode regions rpoC1, psbA-trnH, matK and ITS. Sequences were searched against a tailored reference database of Moroccan medicinal plants and their closest relatives using BLAST and Blastclust, and through inference of RAxML phylograms of the aligned market and reference samples. Principal Findings Sequencing success was high for rpoC1, psbA-trnH, and ITS, but low for matK. Searches using rpoC1 alone resulted in a number of ambiguous identifications, indicating insufficient DNA variation for accurate species-level identification. Combining rpoC1, psbA-trnH and ITS allowed the majority of the market samples to be identified to genus level. For a minority of the market samples, the barcoding identification differed significantly from previous hypotheses based on the vernacular names. Conclusions/Significance Endemic plant species are commercialized in Marrakech. Adulteration is common and this may indicate that the products are becoming locally endangered. Nevertheless the majority of the traded roots belong to species that are common and not known to be endangered. A significant conclusion from our results is that unknown samples are more difficult to identify than earlier suggested, especially if the reference sequences were obtained from different populations. A global barcoding database should therefore contain sequences from different populations of the same species to assure the reference sequences characterize the species throughout its distributional range. PMID:22761800

  2. Digital PCR methods improve detection sensitivity and measurement precision of low abundance mtDNA deletions.

    PubMed

    Belmonte, Frances R; Martin, James L; Frescura, Kristin; Damas, Joana; Pereira, Filipe; Tarnopolsky, Mark A; Kaufman, Brett A

    2016-04-28

    Mitochondrial DNA (mtDNA) mutations are a common cause of primary mitochondrial disorders, and have also been implicated in a broad collection of conditions, including aging, neurodegeneration, and cancer. Prevalent among these pathogenic variants are mtDNA deletions, which show a strong bias for the loss of sequence in the major arc between, but not including, the heavy and light strand origins of replication. Because individual mtDNA deletions can accumulate focally, occur with multiple mixed breakpoints, and in the presence of normal mtDNA sequences, methods that detect broad-spectrum mutations with enhanced sensitivity and limited costs have both research and clinical applications. In this study, we evaluated semi-quantitative and digital PCR-based methods of mtDNA deletion detection using double-stranded reference templates or biological samples. Our aim was to describe key experimental assay parameters that will enable the analysis of low levels or small differences in mtDNA deletion load during disease progression, with limited false-positive detection. We determined that the digital PCR method significantly improved mtDNA deletion detection sensitivity through absolute quantitation, improved precision and reduced assay standard error.

  3. Digital PCR methods improve detection sensitivity and measurement precision of low abundance mtDNA deletions

    PubMed Central

    Belmonte, Frances R.; Martin, James L.; Frescura, Kristin; Damas, Joana; Pereira, Filipe; Tarnopolsky, Mark A.; Kaufman, Brett A.

    2016-01-01

    Mitochondrial DNA (mtDNA) mutations are a common cause of primary mitochondrial disorders, and have also been implicated in a broad collection of conditions, including aging, neurodegeneration, and cancer. Prevalent among these pathogenic variants are mtDNA deletions, which show a strong bias for the loss of sequence in the major arc between, but not including, the heavy and light strand origins of replication. Because individual mtDNA deletions can accumulate focally, occur with multiple mixed breakpoints, and in the presence of normal mtDNA sequences, methods that detect broad-spectrum mutations with enhanced sensitivity and limited costs have both research and clinical applications. In this study, we evaluated semi-quantitative and digital PCR-based methods of mtDNA deletion detection using double-stranded reference templates or biological samples. Our aim was to describe key experimental assay parameters that will enable the analysis of low levels or small differences in mtDNA deletion load during disease progression, with limited false-positive detection. We determined that the digital PCR method significantly improved mtDNA deletion detection sensitivity through absolute quantitation, improved precision and reduced assay standard error. PMID:27122135

  4. Maternal Plasma DNA and RNA Sequencing for Prenatal Testing.

    PubMed

    Tamminga, Saskia; van Maarle, Merel; Henneman, Lidewij; Oudejans, Cees B M; Cornel, Martina C; Sistermans, Erik A

    2016-01-01

    Cell-free DNA (cfDNA) testing has recently become indispensable in diagnostic testing and screening. In the prenatal setting, this type of testing is often called noninvasive prenatal testing (NIPT). With a number of techniques, using either next-generation sequencing or single nucleotide polymorphism-based approaches, fetal cfDNA in maternal plasma can be analyzed to screen for rhesus D genotype, common chromosomal aneuploidies, and increasingly for testing other conditions, including monogenic disorders. With regard to screening for common aneuploidies, challenges arise when implementing NIPT in current prenatal settings. Depending on the method used (targeted or nontargeted), chromosomal anomalies other than trisomy 21, 18, or 13 can be detected, either of fetal or maternal origin, also referred to as unsolicited or incidental findings. For various biological reasons, there is a small chance of having either a false-positive or false-negative NIPT result, or no result, also referred to as a "no-call." Both pre- and posttest counseling for NIPT should include discussing potential discrepancies. Since NIPT remains a screening test, a positive NIPT result should be confirmed by invasive diagnostic testing (either by chorionic villus biopsy or by amniocentesis). As the scope of NIPT is widening, professional guidelines need to discuss the ethics of what to offer and how to offer. In this review, we discuss the current biochemical, clinical, and ethical challenges of cfDNA testing in the prenatal setting and its future perspectives including novel applications that target RNA instead of DNA. © 2016 Elsevier Inc. All rights reserved.

  5. Comparison of Ion Personal Genome Machine Platforms for the Detection of Variants in BRCA1 and BRCA2.

    PubMed

    Hwang, Sang Mee; Lee, Ki Chan; Lee, Min Seob; Park, Kyoung Un

    2018-01-01

    Transition to next generation sequencing (NGS) for BRCA1 / BRCA2 analysis in clinical laboratories is ongoing but different platforms and/or data analysis pipelines give different results resulting in difficulties in implementation. We have evaluated the Ion Personal Genome Machine (PGM) Platforms (Ion PGM, Ion PGM Dx, Thermo Fisher Scientific) for the analysis of BRCA1 /2. The results of Ion PGM with OTG-snpcaller, a pipeline based on Torrent mapping alignment program and Genome Analysis Toolkit, from 75 clinical samples and 14 reference DNA samples were compared with Sanger sequencing for BRCA1 / BRCA2 . Ten clinical samples and 14 reference DNA samples were additionally sequenced by Ion PGM Dx with Torrent Suite. Fifty types of variants including 18 pathogenic or variants of unknown significance were identified from 75 clinical samples and known variants of the reference samples were confirmed by Sanger sequencing and/or NGS. One false-negative results were present for Ion PGM/OTG-snpcaller for an indel variant misidentified as a single nucleotide variant. However, eight discordant results were present for Ion PGM Dx/Torrent Suite with both false-positive and -negative results. A 40-bp deletion, a 4-bp deletion and a 1-bp deletion variant was not called and a false-positive deletion was identified. Four other variants were misidentified as another variant. Ion PGM/OTG-snpcaller showed acceptable performance with good concordance with Sanger sequencing. However, Ion PGM Dx/Torrent Suite showed many discrepant results not suitable for use in a clinical laboratory, requiring further optimization of the data analysis for calling variants.

  6. A single amino-acid substitution in the Ets domain alters core DNA binding specificity of Ets1 to that of the related transcription factors Elf1 and E74.

    PubMed

    Bosselut, R; Levin, J; Adjadj, E; Ghysdael, J

    1993-11-11

    Ets proteins form a family of sequence specific DNA binding proteins which bind DNA through a 85 aminoacids conserved domain, the Ets domain, whose sequence is unrelated to any other characterized DNA binding domain. Unlike all other known Ets proteins, which bind specific DNA sequences centered over either GGAA or GGAT core motifs, E74 and Elf1 selectively bind to GGAA corecontaining sites. Elf1 and E74 differ from other Ets proteins in three residues located in an otherwise highly conserved region of the Ets domain, referred to as conserved region III (CRIII). We show that a restricted selectivity for GGAA core-containing sites could be conferred to Ets1 upon changing a single lysine residue within CRIII to the threonine found in Elf1 and E74 at this position. Conversely, the reciprocal mutation in Elf1 confers to this protein the ability to bind to GGAT core containing EBS. This, together with the fact that mutation of two invariant arginine residues in CRIII abolishes DNA binding, indicates that CRIII plays a key role in Ets domain recognition of the GGAA/T core motif and lead us to discuss a model of Ets proteins--core motif interaction.

  7. Identification of Medically Important Yeasts Using PCR-Based Detection of DNA Sequence Polymorphisms in the Internal Transcribed Spacer 2 Region of the rRNA Genes

    PubMed Central

    Chen, Y. C.; Eisner, J. D.; Kattar, M. M.; Rassoulian-Barrett, S. L.; LaFe, K.; Yarfitz, S. L.; Limaye, A. P.; Cookson, B. T.

    2000-01-01

    Identification of medically relevant yeasts can be time-consuming and inaccurate with current methods. We evaluated PCR-based detection of sequence polymorphisms in the internal transcribed spacer 2 (ITS2) region of the rRNA genes as a means of fungal identification. Clinical isolates (401), reference strains (6), and type strains (27), representing 34 species of yeasts were examined. The length of PCR-amplified ITS2 region DNA was determined with single-base precision in less than 30 min by using automated capillary electrophoresis. Unique, species-specific PCR products ranging from 237 to 429 bp were obtained from 92% of the clinical isolates. The remaining 8%, divided into groups with ITS2 regions which differed by ≤2 bp in mean length, all contained species-specific DNA sequences easily distinguishable by restriction enzyme analysis. These data, and the specificity of length polymorphisms for identifying yeasts, were confirmed by DNA sequence analysis of the ITS2 region from 93 isolates. Phenotypic and ITS2-based identification was concordant for 427 of 434 yeast isolates examined using sequence identity of ≥99%. Seven clinical isolates contained ITS2 sequences that did not agree with their phenotypic identification, and ITS2-based phylogenetic analyses indicate the possibility of new or clinically unusual species in the Rhodotorula and Candida genera. This work establishes an initial database, validated with over 400 clinical isolates, of ITS2 length and sequence polymorphisms for 34 species of yeasts. We conclude that size and restriction analysis of PCR-amplified ITS2 region DNA is a rapid and reliable method to identify clinically significant yeasts, including potentially new or emerging pathogenic species. PMID:10834993

  8. The blood DNA virome in 8,000 humans.

    PubMed

    Moustafa, Ahmed; Xie, Chao; Kirkness, Ewen; Biggs, William; Wong, Emily; Turpaz, Yaron; Bloom, Kenneth; Delwart, Eric; Nelson, Karen E; Venter, J Craig; Telenti, Amalio

    2017-03-01

    The characterization of the blood virome is important for the safety of blood-derived transfusion products, and for the identification of emerging pathogens. We explored non-human sequence data from whole-genome sequencing of blood from 8,240 individuals, none of whom were ascertained for any infectious disease. Viral sequences were extracted from the pool of sequence reads that did not map to the human reference genome. Analyses sifted through close to 1 Petabyte of sequence data and performed 0.5 trillion similarity searches. With a lower bound for identification of 2 viral genomes/100,000 cells, we mapped sequences to 94 different viruses, including sequences from 19 human DNA viruses, proviruses and RNA viruses (herpesviruses, anelloviruses, papillomaviruses, three polyomaviruses, adenovirus, HIV, HTLV, hepatitis B, hepatitis C, parvovirus B19, and influenza virus) in 42% of the study participants. Of possible relevance to transfusion medicine, we identified Merkel cell polyomavirus in 49 individuals, papillomavirus in blood of 13 individuals, parvovirus B19 in 6 individuals, and the presence of herpesvirus 8 in 3 individuals. The presence of DNA sequences from two RNA viruses was unexpected: Hepatitis C virus is revealing of an integration event, while the influenza virus sequence resulted from immunization with a DNA vaccine. Age, sex and ancestry contributed significantly to the prevalence of infection. The remaining 75 viruses mostly reflect extensive contamination of commercial reagents and from the environment. These technical problems represent a major challenge for the identification of novel human pathogens. Increasing availability of human whole-genome sequences will contribute substantial amounts of data on the composition of the normal and pathogenic human blood virome. Distinguishing contaminants from real human viruses is challenging.

  9. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    PubMed

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal. Furthermore, NGS of historical DNA enables recovering crucial genetic information from old type specimens that to date have remained mostly unutilized and, thus, opens up a new frontier for taxonomic research as well.

  10. Equine infectious anemia virus in naturally infected horses from the Brazilian Pantanal.

    PubMed

    Cursino, Andreia Elisa; Vilela, Ana Paula Pessoa; Franco-Luiz, Ana Paula Moreira; de Oliveira, Jaquelline Germano; Nogueira, Márcia Furlan; Júnior, João Pessoa Araújo; de Aguiar, Daniel Moura; Kroon, Erna Geessien

    2018-05-11

    Equine infectious anemia (EIA) has a worldwide distribution, and is widespread in Brazil. The Brazilian Pantanal presents with high prevalence comprising equine performance and indirectly the livestock industry, since the horses are used for cattle management. Although EIA is routinely diagnosed by the agar gel immunodiffusion test (AGID), this serological assay has some limitations, so PCR-based detection methods have the potential to overcome these limitations and act as complementary tests to those currently used. Considering the limited number of equine infectious anemia virus (EIAV) sequences which are available in public databases and the great genome variability, studies of EIAV detection and characterization molecular remain important. In this study we detected EIAV proviral DNA from 23 peripheral blood mononuclear cell (PBMCs) samples of naturally infected horses from Brazilian Pantanal using a semi-nested-PCR (sn-PCR). The serological profile of the animals was also evaluated by AGID and ELISA for gp90 and p26. Furthermore, the EIAV PCR amplified DNA was sequenced and phylogenetically analyzed. Here we describe the first EIAV sequences of the 5' LTR of the tat gene in naturally infected horses from Brazil, which presented with 91% similarity to EIAV reference sequences. The Brazilian EIAV sequences also presented variable nucleotide similarities among themselves, ranging from 93,5% to 100%. Phylogenetic analysis showed that Brazilian EIAV sequences grouped in a separate clade relative to other reference sequences. Thus this molecular detection and characterization may provide information about EIAV circulation in Brazilian territories and improve phylogenetic inferences.

  11. A single mini-barcode test to screen for Australian mammalian predators from environmental samples

    PubMed Central

    MacDonald, Anna J; Sarre, Stephen D

    2017-01-01

    Abstract Identification of species from trace samples is now possible through the comparison of diagnostic DNA fragments against reference DNA sequence databases. DNA detection of animals from non-invasive samples, such as predator faeces (scats) that contain traces of DNA from their species of origin, has proved to be a valuable tool for the management of elusive wildlife. However, application of this approach can be limited by the availability of appropriate genetic markers. Scat DNA is often degraded, meaning that longer DNA sequences, including standard DNA barcoding markers, are difficult to recover. Instead, targeted short diagnostic markers are required to serve as diagnostic mini-barcodes. The mitochondrial genome is a useful source of such trace DNA markers because it provides good resolution at the species level and occurs in high copy numbers per cell. We developed a mini-barcode based on a short (178 bp) fragment of the conserved 12S ribosomal ribonucleic acid mitochondrial gene sequence, with the goal of discriminating amongst the scats of large mammalian predators of Australia. We tested the sensitivity and specificity of our primers and can accurately detect and discriminate amongst quolls, cats, dogs, foxes, and devils from trace DNA samples. Our approach provides a cost-effective, time-efficient, and non-invasive tool that enables identification of all 8 medium-large mammal predators in Australia, including native and introduced species, using a single test. With modification, this approach is likely to be of broad applicability elsewhere. PMID:28810700

  12. Deep Sequencing of Plant and Animal DNA Contained within Traditional Chinese Medicines Reveals Legality Issues and Health Safety Concerns

    PubMed Central

    Coghlan, Megan L.; Haile, James; Houston, Jayne; Murray, Dáithí C.; White, Nicole E.; Moolhuijzen, Paula; Bellgard, Matthew I.; Bunce, Michael

    2012-01-01

    Traditional Chinese medicine (TCM) has been practiced for thousands of years, but only within the last few decades has its use become more widespread outside of Asia. Concerns continue to be raised about the efficacy, legality, and safety of many popular complementary alternative medicines, including TCMs. Ingredients of some TCMs are known to include derivatives of endangered, trade-restricted species of plants and animals, and therefore contravene the Convention on International Trade in Endangered Species (CITES) legislation. Chromatographic studies have detected the presence of heavy metals and plant toxins within some TCMs, and there are numerous cases of adverse reactions. It is in the interests of both biodiversity conservation and public safety that techniques are developed to screen medicinals like TCMs. Targeting both the p-loop region of the plastid trnL gene and the mitochondrial 16S ribosomal RNA gene, over 49,000 amplicon sequence reads were generated from 15 TCM samples presented in the form of powders, tablets, capsules, bile flakes, and herbal teas. Here we show that second-generation, high-throughput sequencing (HTS) of DNA represents an effective means to genetically audit organic ingredients within complex TCMs. Comparison of DNA sequence data to reference databases revealed the presence of 68 different plant families and included genera, such as Ephedra and Asarum, that are potentially toxic. Similarly, animal families were identified that include genera that are classified as vulnerable, endangered, or critically endangered, including Asiatic black bear (Ursus thibetanus) and Saiga antelope (Saiga tatarica). Bovidae, Cervidae, and Bufonidae DNA were also detected in many of the TCM samples and were rarely declared on the product packaging. This study demonstrates that deep sequencing via HTS is an efficient and cost-effective way to audit highly processed TCM products and will assist in monitoring their legality and safety especially when plant reference databases become better established. PMID:22511890

  13. Determination of a mutational spectrum

    DOEpatents

    Thilly, William G.; Keohavong, Phouthone

    1991-01-01

    A method of resolving (physically separating) mutant DNA from nonmutant DNA and a method of defining or establishing a mutational spectrum or profile of alterations present in nucleic acid sequences from a sample to be analyzed, such as a tissue or body fluid. The present method is based on the fact that it is possible, through the use of DGGE, to separate nucleic acid sequences which differ by only a single base change and on the ability to detect the separate mutant molecules. The present invention, in another aspect, relates to a method for determining a mutational spectrum in a DNA sequence of interest present in a population of cells. The method of the present invention is useful as a diagnostic or analytical tool in forensic science in assessing environmental and/or occupational exposures to potentially genetically toxic materials (also referred to as potential mutagens); in biotechnology, particularly in the study of the relationship between the amino acid sequence of enzymes and other biologically-active proteins or protein-containing substances and their respective functions; and in determining the effects of drugs, cosmetics and other chemicals for which toxicity data must be obtained.

  14. Typing Clostridium difficile strains based on tandem repeat sequences

    PubMed Central

    2009-01-01

    Background Genotyping of epidemic Clostridium difficile strains is necessary to track their emergence and spread. Portability of genotyping data is desirable to facilitate inter-laboratory comparisons and epidemiological studies. Results This report presents results from a systematic screen for variation in repetitive DNA in the genome of C. difficile. We describe two tandem repeat loci, designated 'TR6' and 'TR10', which display extensive sequence variation that may be useful for sequence-based strain typing. Based on an investigation of 154 C. difficile isolates comprising 75 ribotypes, tandem repeat sequencing demonstrated excellent concordance with widely used PCR ribotyping and equal discriminatory power. Moreover, tandem repeat sequences enabled the reconstruction of the isolates' largely clonal population structure and evolutionary history. Conclusion We conclude that sequence analysis of the two repetitive loci introduced here may be highly useful for routine typing of C. difficile. Tandem repeat sequence typing resolves phylogenetic diversity to a level equivalent to PCR ribotypes. DNA sequences may be stored in databases accessible over the internet, obviating the need for the exchange of reference strains. PMID:19133124

  15. The Apis mellifera Filamentous Virus Genome

    PubMed Central

    Gauthier, Laurent; Cornman, Scott; Hartmann, Ulrike; Cousserans, François; Evans, Jay D.; de Miranda, Joachim R.; Neumann, Peter

    2015-01-01

    A complete reference genome of the Apis mellifera Filamentous virus (AmFV) was determined using Illumina Hiseq sequencing. The AmFV genome is a double stranded DNA molecule of approximately 498,500 nucleotides with a GC content of 50.8%. It encompasses 247 non-overlapping open reading frames (ORFs), equally distributed on both strands, which cover 65% of the genome. While most of the ORFs lacked threshold sequence alignments to reference protein databases, twenty-eight were found to display significant homologies with proteins present in other large double stranded DNA viruses. Remarkably, 13 ORFs had strong similarity with typical baculovirus domains such as PIFs (per os infectivity factor genes: pif-1, pif-2, pif-3 and p74) and BRO (Baculovirus Repeated Open Reading Frame). The putative AmFV DNA polymerase is of type B, but is only distantly related to those of the baculoviruses. The ORFs encoding proteins involved in nucleotide metabolism had the highest percent identity to viral proteins in GenBank. Other notable features include the presence of several collagen-like, chitin-binding, kinesin and pacifastin domains. Due to the large size of the AmFV genome and the inconsistent affiliation with other large double stranded DNA virus families infecting invertebrates, AmFV may belong to a new virus family. PMID:26184284

  16. The Apis mellifera Filamentous Virus Genome.

    PubMed

    Gauthier, Laurent; Cornman, Scott; Hartmann, Ulrike; Cousserans, François; Evans, Jay D; de Miranda, Joachim R; Neumann, Peter

    2015-07-09

    A complete reference genome of the Apis mellifera Filamentous virus (AmFV) was determined using Illumina Hiseq sequencing. The AmFV genome is a double stranded DNA molecule of approximately 498,500 nucleotides with a GC content of 50.8%. It encompasses 247 non-overlapping open reading frames (ORFs), equally distributed on both strands, which cover 65% of the genome. While most of the ORFs lacked threshold sequence alignments to reference protein databases, twenty-eight were found to display significant homologies with proteins present in other large double stranded DNA viruses. Remarkably, 13 ORFs had strong similarity with typical baculovirus domains such as PIFs (per os infectivity factor genes: pif-1, pif-2, pif-3 and p74) and BRO (Baculovirus Repeated Open Reading Frame). The putative AmFV DNA polymerase is of type B, but is only distantly related to those of the baculoviruses. The ORFs encoding proteins involved in nucleotide metabolism had the highest percent identity to viral proteins in GenBank. Other notable features include the presence of several collagen-like, chitin-binding, kinesin and pacifastin domains. Due to the large size of the AmFV genome and the inconsistent affiliation with other large double stranded DNA virus families infecting invertebrates, AmFV may belong to a new virus family.

  17. DNA barcoding as a useful tool in the systematic study of wild bees of the tribe Augochlorini (Hymenoptera: Halictidae).

    PubMed

    González-Vaquero, Rocío Ana; Roig-Alsina, Arturo; Packer, Laurence

    2016-10-01

    Special care is needed in the delimitation and identification of halictid bee species, which are renowned for being morphologically monotonous. Corynura Spinola and Halictillus Moure (Halictidae: Augochlorini) contain species that are key elements in southern South American ecosystems. These bees are very difficult to identify due to close morphological similarity among species and high sexual dimorphism. We analyzed 170 barcode-compliant COI sequences from 19 species. DNA barcodes were useful to confirm gender associations and to detect two new cryptic species. Interspecific distances were significantly higher than those reported for other bees. Maximum intraspecific divergence was less than 1% in 14 species. Barcode index numbers (BINs) were useful to identify putative species that need further study. More than one BIN was assigned to five species. The name Corynura patagonica (Cockerell) probably refers to two cryptic species. The results suggest that Corynura and Halictillus species can be identified using DNA barcodes. The sequences of the species included in this study can be used as a reference to assess the identification of unknown specimens. This study provides additional support for the use of DNA barcodes in bee taxonomy and the identification of specimens, which is particularly relevant in insects of ecological importance such as pollinators.

  18. Housekeeping while brain's storming Validation of normalizing factors for gene expression studies in a murine model of traumatic brain injury

    PubMed Central

    Rhinn, Hervé; Marchand-Leroux, Catherine; Croci, Nicole; Plotkine, Michel; Scherman, Daniel; Escriou, Virginie

    2008-01-01

    Background Traumatic brain injury models are widely studied, especially through gene expression, either to further understand implied biological mechanisms or to assess the efficiency of potential therapies. A large number of biological pathways are affected in brain trauma models, whose elucidation might greatly benefit from transcriptomic studies. However the suitability of reference genes needed for quantitative RT-PCR experiments is missing for these models. Results We have compared five potential reference genes as well as total cDNA level monitored using Oligreen reagent in order to determine the best normalizing factors for quantitative RT-PCR expression studies in the early phase (0–48 h post-trauma (PT)) of a murine model of diffuse brain injury. The levels of 18S rRNA, and of transcripts of β-actin, glyceraldehyde-3P-dehydrogenase (GAPDH), β-microtubulin and S100β were determined in the injured brain region of traumatized mice sacrificed at 30 min, 3 h, 6 h, 12 h, 24 h and 48 h post-trauma. The stability of the reference genes candidates and of total cDNA was evaluated by three different methods, leading to the following rankings as normalization factors, from the most suitable to the less: by using geNorm VBA applet, we obtained the following sequence: cDNA(Oligreen); GAPDH > 18S rRNA > S100β > β-microtubulin > β-actin; by using NormFinder Excel Spreadsheet, we obtained the following sequence: GAPDH > cDNA(Oligreen) > S100β > 18S rRNA > β-actin > β-microtubulin; by using a Confidence-Interval calculation, we obtained the following sequence: cDNA(Oligreen) > 18S rRNA; GAPDH > S100β > β-microtubulin > β-actin. Conclusion This work suggests that Oligreen cDNA measurements, 18S rRNA and GAPDH or a combination of them may be used to efficiently normalize qRT-PCR gene expression in mouse brain trauma injury, and that β-actin and β-microtubulin should be avoided. The potential of total cDNA as measured by Oligreen as a first-intention normalizing factor with a broad field of applications is highlighted. Pros and cons of the three methods of normalization factors selection are discussed. A generic time- and cost-effective procedure for normalization factor validation is proposed. PMID:18611280

  19. Universal Readers Based on Hydrogen Bonding or π-π Stacking for Identification of DNA Nucleotides in Electron Tunnel Junctions.

    PubMed

    Biswas, Sovan; Sen, Suman; Im, JongOne; Biswas, Sudipta; Krstic, Predrag; Ashcroft, Brian; Borges, Chad; Zhao, Yanan; Lindsay, Stuart; Zhang, Peiming

    2016-12-27

    A reader molecule, which recognizes all the naturally occurring nucleobases in an electron tunnel junction, is required for sequencing DNA by a recognition tunneling (RT) technique, referred to as a universal reader. In the present study, we have designed a series of heterocyclic carboxamides based on hydrogen bonding and a large-sized pyrene ring based on a π-π stacking interaction as universal reader candidates. Each of these compounds was synthesized to bear a thiolated linker for attachment to metal electrodes and examined for their interactions with naturally occurring DNA nucleosides and nucleotides by 1 H NMR, ESI-MS, computational calculations, and surface plasmon resonance. RT measurements were carried out in a scanning tunnel microscope. All of these molecules generated electrical signals with DNA nucleotides in tunneling junctions under physiological conditions (phosphate buffered aqueous solution, pH 7.4). Using a support vector machine as a tool for data analysis, we found that these candidates distinguished among naturally occurring DNA nucleotides with the accuracy of pyrene (by π-π stacking interactions) > azole carboxamides (by hydrogen-bonding interactions). In addition, the pyrene reader operated efficiently in a larger tunnel junction. However, the azole carboxamide could read abasic (AP) monophosphate, a product from spontaneous base hydrolysis or an intermediate of base excision repair. Thus, we envision that sequencing DNA using both π-π stacking and hydrogen-bonding-based universal readers in parallel should generate more comprehensive genome sequences than sequencing based on either reader molecule alone.

  20. Serovar distribution of a DNA sequence involved in the antigenic relationship between Leptospira and equine cornea.

    PubMed

    Lucchesi, Paula M A; Parma, Alberto E; Arroyo, Guillermo H

    2002-01-01

    Horses infected with Leptospira present several clinical disorders, one of them being recurrent uveitis. A common endpoint of equine recurrent uveitis is blindness. Serovar pomona has often been incriminated, although others have also been reported. An antigenic relationship between this bacterium and equine cornea has been described in previous studies. A leptospiral DNA fragment that encodes cross-reacting epitopes was previously cloned and expressed in Escherichia coli. A region of that DNA fragment was subcloned and sequenced. Samples of leptospiral DNA from several sources were analysed by PCR with two primer pairs designed to amplify that region. Reference strains from serovars canicola, icterohaemorrhagiae, pomona, pyrogenes, wolffi, bataviae, sentot, hebdomadis and hardjo rendered products of the expected sizes with both pairs of primers. The specific DNA region was also amplified from isolates from Argentina belonging to serogroups Canicola and Pomona. Both L. biflexa serovar patoc and L. borgpetersenii serovar tarassovi rendered a negative result. The DNA sequence related to the antigen mimicry with equine cornea was not exclusively found in serovar pomona as it was also detected in several strains of Leptospira belonging to different serovars. The results obtained with L. biflexa serovar patoc strain Patoc I and L. borgpetersenii serovar tarassovi strain Perepelicin suggest that this sequence is not present in these strains, which belong to different genomospecies than those which gave positive results. This is an interesting finding since L. biflexa comprises nonpathogenic strains and serovar tarassovi has not been associated clinically with equine uveitis.

  1. Experimental mapping of DNA duplex shape enabled by global lineshape analyses of a nucleotide-independent nitroxide probe

    PubMed Central

    Ding, Yuan; Zhang, Xiaojun; Tham, Kenneth W.; Qin, Peter Z.

    2014-01-01

    Sequence-dependent variation in structure and dynamics of a DNA duplex, collectively referred to as ‘DNA shape’, critically impacts interactions between DNA and proteins. Here, a method based on the technique of site-directed spin labeling was developed to experimentally map shapes of two DNA duplexes that contain response elements of the p53 tumor suppressor. An R5a nitroxide spin label, which was covalently attached at a specific phosphate group, was scanned consecutively through the DNA duplex. X-band continuous-wave electron paramagnetic resonance spectroscopy was used to monitor rotational motions of R5a, which report on DNA structure and dynamics at the labeling site. An approach based on Pearson's coefficient analysis was developed to collectively examine the degree of similarity among the ensemble of R5a spectra. The resulting Pearson's coefficients were used to generate maps representing variation of R5a mobility along the DNA duplex. The R5a mobility maps were found to correlate with maps of certain DNA helical parameters, and were capable of revealing similarity and deviation in the shape of the two closely related DNA duplexes. Collectively, the R5a probe and the Pearson's coefficient-based lineshape analysis scheme yielded a generalizable method for examining sequence-dependent DNA shapes. PMID:25092920

  2. A communal catalogue reveals Earth's multiscale microbial diversity.

    PubMed

    Thompson, Luke R; Sanders, Jon G; McDonald, Daniel; Amir, Amnon; Ladau, Joshua; Locey, Kenneth J; Prill, Robert J; Tripathi, Anupriya; Gibbons, Sean M; Ackermann, Gail; Navas-Molina, Jose A; Janssen, Stefan; Kopylova, Evguenia; Vázquez-Baeza, Yoshiki; González, Antonio; Morton, James T; Mirarab, Siavash; Zech Xu, Zhenjiang; Jiang, Lingjing; Haroon, Mohamed F; Kanbar, Jad; Zhu, Qiyun; Jin Song, Se; Kosciolek, Tomasz; Bokulich, Nicholas A; Lefler, Joshua; Brislawn, Colin J; Humphrey, Gregory; Owens, Sarah M; Hampton-Marcell, Jarrad; Berg-Lyons, Donna; McKenzie, Valerie; Fierer, Noah; Fuhrman, Jed A; Clauset, Aaron; Stevens, Rick L; Shade, Ashley; Pollard, Katherine S; Goodwin, Kelly D; Jansson, Janet K; Gilbert, Jack A; Knight, Rob

    2017-11-23

    Our growing awareness of the microbial world's importance and diversity contrasts starkly with our limited understanding of its fundamental structure. Despite recent advances in DNA sequencing, a lack of standardized protocols and common analytical frameworks impedes comparisons among studies, hindering the development of global inferences about microbial life on Earth. Here we present a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project. Coordinated protocols and new analytical methods, particularly the use of exact sequences instead of clustered operational taxonomic units, enable bacterial and archaeal ribosomal RNA gene sequences to be followed across multiple studies and allow us to explore patterns of diversity at an unprecedented scale. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth's microbial diversity.

  3. Cutaneous Granulomas in Dolphins Caused by Novel Uncultivated Paracoccidioides brasiliensis

    PubMed Central

    Vilela, Raquel; Bossart, Gregory D.; St. Leger, Judy A.; Dalton, Leslie M.; Reif, John S.; Schaefer, Adam M.; McCarthy, Peter J.; Fair, Patricia A.

    2016-01-01

    Cutaneous granulomas in dolphins were believed to be caused by Lacazia loboi, which also causes a similar disease in humans. This hypothesis was recently challenged by reports that fungal DNA sequences from dolphins grouped this pathogen with Paracoccidioides brasiliensis. We conducted phylogenetic analysis of fungi from 6 bottlenose dolphins (Tursiops truncatus) with cutaneous granulomas and chains of yeast cells in infected tissues. Kex gene sequences of P. brasiliensis from dolphins showed 100% homology with sequences from cultivated P. brasiliensis, 73% with those of L. loboi, and 93% with those of P. lutzii. Parsimony analysis placed DNA sequences from dolphins within a cluster with human P. brasiliensis strains. This cluster was the sister taxon to P. lutzii and L. loboi. Our molecular data support previous findings and suggest that a novel uncultivated strain of P. brasiliensis restricted to cutaneous lesions in dolphins is probably the cause of lacaziosis/lobomycosis, herein referred to as paracoccidioidomycosis ceti. PMID:27869614

  4. Cutaneous Granulomas in Dolphins Caused by Novel Uncultivated Paracoccidioides brasiliensis.

    PubMed

    Vilela, Raquel; Bossart, Gregory D; St Leger, Judy A; Dalton, Leslie M; Reif, John S; Schaefer, Adam M; McCarthy, Peter J; Fair, Patricia A; Mendoza, Leonel

    2016-12-01

    Cutaneous granulomas in dolphins were believed to be caused by Lacazia loboi, which also causes a similar disease in humans. This hypothesis was recently challenged by reports that fungal DNA sequences from dolphins grouped this pathogen with Paracoccidioides brasiliensis. We conducted phylogenetic analysis of fungi from 6 bottlenose dolphins (Tursiops truncatus) with cutaneous granulomas and chains of yeast cells in infected tissues. Kex gene sequences of P. brasiliensis from dolphins showed 100% homology with sequences from cultivated P. brasiliensis, 73% with those of L. loboi, and 93% with those of P. lutzii. Parsimony analysis placed DNA sequences from dolphins within a cluster with human P. brasiliensis strains. This cluster was the sister taxon to P. lutzii and L. loboi. Our molecular data support previous findings and suggest that a novel uncultivated strain of P. brasiliensis restricted to cutaneous lesions in dolphins is probably the cause of lacaziosis/lobomycosis, herein referred to as paracoccidioidomycosis ceti.

  5. Evaluation of plasmid and genomic DNA calibrants used for the quantification of genetically modified organisms.

    PubMed

    Caprioara-Buda, M; Meyer, W; Jeynov, B; Corbisier, P; Trapmann, S; Emons, H

    2012-07-01

    The reliable quantification of genetically modified organisms (GMOs) by real-time PCR requires, besides thoroughly validated quantitative detection methods, sustainable calibration systems. The latter establishes the anchor points for the measured value and the measurement unit, respectively. In this paper, the suitability of two types of DNA calibrants, i.e. plasmid DNA and genomic DNA extracted from plant leaves, for the certification of the GMO content in reference materials as copy number ratio between two targeted DNA sequences was investigated. The PCR efficiencies and coefficients of determination of the calibration curves as well as the measured copy number ratios for three powder certified reference materials (CRMs), namely ERM-BF415e (NK603 maize), ERM-BF425c (356043 soya), and ERM-BF427c (98140 maize), originally certified for their mass fraction of GMO, were compared for both types of calibrants. In all three systems investigated, the PCR efficiencies of plasmid DNA were slightly closer to the PCR efficiencies observed for the genomic DNA extracted from seed powders rather than those of the genomic DNA extracted from leaves. Although the mean DNA copy number ratios for each CRM overlapped within their uncertainties, the DNA copy number ratios were significantly different using the two types of calibrants. Based on these observations, both plasmid and leaf genomic DNA calibrants would be technically suitable as anchor points for the calibration of the real-time PCR methods applied in this study. However, the most suitable approach to establish a sustainable traceability chain is to fix a reference system based on plasmid DNA.

  6. Utility of GenBank and the Barcode of Life Data Systems (BOLD) for the identification of forensically important Diptera from Belgium and France

    PubMed Central

    Sonet, Gontran; Jordaens, Kurt; Braet, Yves; Bourguignon, Luc; Dupont, Eréna; Backeljau, Thierry; De Meyer, Marc; Desmyter, Stijn

    2013-01-01

    Abstract Fly larvae living on dead corpses can be used to estimate post-mortem intervals. The identification of these flies is decisive in forensic casework and can be facilitated by using DNA barcodes provided that a representative and comprehensive reference library of DNA barcodes is available. We constructed a local (Belgium and France) reference library of 85 sequences of the COI DNA barcode fragment (mitochondrial cytochrome c oxidase subunit I gene), from 16 fly species of forensic interest (Calliphoridae, Muscidae, Fanniidae). This library was then used to evaluate the ability of two public libraries (GenBank and the Barcode of Life Data Systems – BOLD) to identify specimens from Belgian and French forensic cases. The public libraries indeed allow a correct identification of most specimens. Yet, some of the identifications remain ambiguous and some forensically important fly species are not, or insufficiently, represented in the reference libraries. Several search options offered by GenBank and BOLD can be used to further improve the identifications obtained from both libraries using DNA barcodes. PMID:24453564

  7. In Silico Identification of Protein Disulfide Isomerase Gene Families in the De Novo Assembled Transcriptomes of Four Different Species of the Genus Conus.

    PubMed

    Figueroa-Montiel, Andrea; Ramos, Marco A; Mares, Rosa E; Dueñas, Salvador; Pimienta, Genaro; Ortiz, Ernesto; Possani, Lourival D; Licea-Navarro, Alexei F

    2016-01-01

    Small peptides isolated from the venom of the marine snails belonging to the genus Conus have been largely studied because of their therapeutic value. These peptides can be classified in two groups. The largest one is composed by peptides rich in disulfide bonds, and referred to as conotoxins. Despite the importance of conotoxins given their pharmacology value, little is known about the protein disulfide isomerase (PDI) enzymes that are required to catalyze their correct folding. To discover the PDIs that may participate in the folding and structural maturation of conotoxins, the transcriptomes of the venom duct of four different species of Conus from the peninsula of Baja California (Mexico) were assembled. Complementary DNA (cDNA) libraries were constructed for each species and sequenced using a Genome Analyzer Illumina platform. The raw RNA-seq data was converted into transcript sequences using Trinity, a de novo assembler that allows the grouping of reads into contigs without a reference genome. An N50 value of 605 was established as a reference for future assemblies of Conus transcriptomes using this software. Transdecoder was used to extract likely coding sequences from Trinity transcripts, and PDI-specific sequence motif "APWCGHCK" was used to capture potential PDIs. An in silico analysis was performed to characterize the group of PDI protein sequences encoded by the duct-transcriptome of each species. The computational approach entailed a structural homology characterization, based on the presence of functional Thioredoxin-like domains. Four different PDI families were characterized, which are constituted by a total of 41 different gene sequences. The sequences had an average of 65% identity with other PDIs. Using MODELLER 9.14, the homology-based three-dimensional structure prediction of a subset of the sequences reported, showed the expected thioredoxin fold which was confirmed by a "simulated annealing" method.

  8. Reducing DNA context dependence in bacterial promoters

    PubMed Central

    Carr, Swati B.; Densmore, Douglas M.

    2017-01-01

    Variation in the DNA sequence upstream of bacterial promoters is known to affect the expression levels of the products they regulate, sometimes dramatically. While neutral synthetic insulator sequences have been found to buffer promoters from upstream DNA context, there are no established methods for designing effective insulator sequences with predictable effects on expression levels. We address this problem with Degenerate Insulation Screening (DIS), a novel method based on a randomized 36-nucleotide insulator library and a simple, high-throughput, flow-cytometry-based screen that randomly samples from a library of 436 potential insulated promoters. The results of this screen can then be compared against a reference uninsulated device to select a set of insulated promoters providing a precise level of expression. We verify this method by insulating the constitutive, inducible, and repressible promotors of a four transcriptional-unit inverter (NOT-gate) circuit, finding both that order dependence is largely eliminated by insulation and that circuit performance is also significantly improved, with a 5.8-fold mean improvement in on/off ratio. PMID:28422998

  9. Assessment of primer/template mismatch effects on real-time PCR amplification of target taxa for GMO quantification.

    PubMed

    Ghedira, Rim; Papazova, Nina; Vuylsteke, Marnik; Ruttink, Tom; Taverniers, Isabel; De Loose, Marc

    2009-10-28

    GMO quantification, based on real-time PCR, relies on the amplification of an event-specific transgene assay and a species-specific reference assay. The uniformity of the nucleotide sequences targeted by both assays across various transgenic varieties is an important prerequisite for correct quantification. Single nucleotide polymorphisms (SNPs) frequently occur in the maize genome and might lead to nucleotide variation in regions used to design primers and probes for reference assays. Further, they may affect the annealing of the primer to the template and reduce the efficiency of DNA amplification. We assessed the effect of a minor DNA template modification, such as a single base pair mismatch in the primer attachment site, on real-time PCR quantification. A model system was used based on the introduction of artificial mismatches between the forward primer and the DNA template in the reference assay targeting the maize starch synthase (SSIIb) gene. The results show that the presence of a mismatch between the primer and the DNA template causes partial to complete failure of the amplification of the initial DNA template depending on the type and location of the nucleotide mismatch. With this study, we show that the presence of a primer/template mismatch affects the estimated total DNA quantity to a varying degree.

  10. Evaluating multiplexed next-generation sequencing as a method in palynology for mixed pollen samples.

    PubMed

    Keller, A; Danner, N; Grimmer, G; Ankenbrand, M; von der Ohe, K; von der Ohe, W; Rost, S; Härtel, S; Steffan-Dewenter, I

    2015-03-01

    The identification of pollen plays an important role in ecology, palaeo-climatology, honey quality control and other areas. Currently, expert knowledge and reference collections are essential to identify pollen origin through light microscopy. Pollen identification through molecular sequencing and DNA barcoding has been proposed as an alternative approach, but the assessment of mixed pollen samples originating from multiple plant species is still a tedious and error-prone task. Next-generation sequencing has been proposed to avoid this hindrance. In this study we assessed mixed pollen probes through next-generation sequencing of amplicons from the highly variable, species-specific internal transcribed spacer 2 region of nuclear ribosomal DNA. Further, we developed a bioinformatic workflow to analyse these high-throughput data with a newly created reference database. To evaluate the feasibility, we compared results from classical identification based on light microscopy from the same samples with our sequencing results. We assessed in total 16 mixed pollen samples, 14 originated from honeybee colonies and two from solitary bee nests. The sequencing technique resulted in higher taxon richness (deeper assignments and more identified taxa) compared to light microscopy. Abundance estimations from sequencing data were significantly correlated with counted abundances through light microscopy. Simulation analyses of taxon specificity and sensitivity indicate that 96% of taxa present in the database are correctly identifiable at the genus level and 70% at the species level. Next-generation sequencing thus presents a useful and efficient workflow to identify pollen at the genus and species level without requiring specialised palynological expert knowledge. © 2014 German Botanical Society and The Royal Botanical Society of the Netherlands.

  11. [Bacterial diversity in sequencing batch biofilm reactor (SBBR) for landfill leachate treatment using PCR-DGGE].

    PubMed

    Xiao, Yong; Yang, Zhao-hui; Zeng, Guang-ming; Ma, Yan-he; Liu, You-sheng; Wang, Rong-juan; Xu, Zheng-yong

    2007-05-01

    For studying the bacterial diversity and the mechanism of denitrification in sequencing bath biofilm reactor (SBBR) treating landfill leachate to provide microbial evidence for technique improvements, total microbial DNA was extracted from samples which were collected from natural landfill leachate and biofilm of a SBBR that could efficiently remove NH4+ -N and COD of high concentration. 16S rDNA fragments were amplified from the total DNA successfully using a pair of universal bacterial 16S rDNA primer, GC341F and 907R, and then were used for denaturing gradient gel electrophoresis (DGGE) analysis. The bands in the gel were analyzed by statistical methods and excided from the gel for sequencing, and the sequences were used for homology analysis and then two phylogenetic trees were constructed using DNAStar software. Results indicated that the bacterial diversity of the biofilm in SBBR and the landfill leachate was abundant, and no obvious change of community structure happened during running in the biofilm, in which most bacteria came from the landfill leachate. There may be three different modes of denitrification in the reactor because several different nitrifying bacteria, denitrifying bacteria and anaerobic ammonia oxidation bacteria coexisted in it. The results provided some valuable references for studying microbiological mechanism of denitrification in SBBR.

  12. A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS

    PubMed Central

    Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T.; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J.; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A.; Lempicki, Richard A.; Huang, Da Wei

    2013-01-01

    PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20-kb) in contrast to the shorter reads produced by the first and second generation sequencing technologies. As a new platform, it is important to assess the sequencing error rate, as well as the quality control (QC) parameters associated with the PacBio sequence data. In this study, a mixture of 10 prior known, closely related DNA amplicons were sequenced using the PacBio RS sequencing platform. After aligning Circular Consensus Sequence (CCS) reads derived from the above sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC, and improved to 1.3% with an SVM based multi-parameter QC method. In addition, a De Novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are post error-corrected it is still necessary to perform appropriate QC on CCS reads in order to produce successful downstream bioinformatics analytical results. PMID:24179701

  13. A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS.

    PubMed

    Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A; Lempicki, Richard A; Huang, Da Wei

    2013-07-31

    PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20-kb) in contrast to the shorter reads produced by the first and second generation sequencing technologies. As a new platform, it is important to assess the sequencing error rate, as well as the quality control (QC) parameters associated with the PacBio sequence data. In this study, a mixture of 10 prior known, closely related DNA amplicons were sequenced using the PacBio RS sequencing platform. After aligning Circular Consensus Sequence (CCS) reads derived from the above sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC, and improved to 1.3% with an SVM based multi-parameter QC method. In addition, a De Novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are post error-corrected it is still necessary to perform appropriate QC on CCS reads in order to produce successful downstream bioinformatics analytical results.

  14. Cooperative heteroassembly of the adenoviral L4-22K and IVa2 proteins onto the viral packaging sequence DNA.

    PubMed

    Yang, Teng-Chieh; Maluf, Nasib Karl

    2012-02-21

    Human adenovirus (Ad) is an icosahedral, double-stranded DNA virus. Viral DNA packaging refers to the process whereby the viral genome becomes encapsulated by the viral particle. In Ad, activation of the DNA packaging reaction requires at least three viral components: the IVa2 and L4-22K proteins and a section of DNA within the viral genome, called the packaging sequence. Previous studies have shown that the IVa2 and L4-22K proteins specifically bind to conserved elements within the packaging sequence and that these interactions are absolutely required for the observation of DNA packaging. However, the equilibrium mechanism for assembly of IVa2 and L4-22K onto the packaging sequence has not been determined. Here we characterize the assembly of the IVa2 and L4-22K proteins onto truncated packaging sequence DNA by analytical sedimentation velocity and equilibrium methods. At limiting concentrations of L4-22K, we observe a species with two IVa2 monomers and one L4-22K monomer bound to the DNA. In this species, the L4-22K monomer is promoting positive cooperative interactions between the two bound IVa2 monomers. As L4-22K levels are increased, we observe a species with one IVa2 monomer and three L4-22K monomers bound to the DNA. To explain this result, we propose a model in which L4-22K self-assembly on the DNA competes with IVa2 for positive heterocooperative interactions, destabilizing binding of the second IVa2 monomer. Thus, we propose that L4-22K levels control the extent of cooperativity observed between adjacently bound IVa2 monomers. We have also determined the hydrodynamic properties of all observed stoichiometric species; we observe that species with three L4-22K monomers bound have more extended conformations than species with a single L4-22K bound. We suggest this might reflect a molecular switch that controls insertion of the viral DNA into the capsid.

  15. Deep COI sequencing of standardized benthic samples unveils overlooked diversity of Jordanian coral reefs in the northern Red Sea.

    PubMed

    Al-Rshaidat, Mamoon M D; Snider, Allison; Rosebraugh, Sydney; Devine, Amanda M; Devine, Thomas D; Plaisance, Laetitia; Knowlton, Nancy; Leray, Matthieu

    2016-09-01

    High-throughput sequencing (HTS) of DNA barcodes (metabarcoding), particularly when combined with standardized sampling protocols, is one of the most promising approaches for censusing overlooked cryptic invertebrate communities. We present biodiversity estimates based on sequencing of the cytochrome c oxidase subunit 1 (COI) gene for coral reefs of the Gulf of Aqaba, a semi-enclosed system in the northern Red Sea. Samples were obtained from standardized sampling devices (Autonomous Reef Monitoring Structures (ARMS)) deployed for 18 months. DNA barcoding of non-sessile specimens >2 mm revealed 83 OTUs in six phyla, of which only 25% matched a reference sequence in public databases. Metabarcoding of the 2 mm - 500 μm and sessile bulk fractions revealed 1197 OTUs in 15 animal phyla, of which only 4.9% matched reference barcodes. These results highlight the scarcity of COI data for cryptobenthic organisms of the Red Sea. Compared with data obtained using similar methods, our results suggest that Gulf of Aqaba reefs are less diverse than two Pacific coral reefs but much more diverse than an Atlantic oyster reef at a similar latitude. The standardized approaches used here show promise for establishing baseline data on biodiversity, monitoring the impacts of environmental change, and quantifying patterns of diversity at regional and global scales.

  16. Genome Analysis of the Domestic Dog (Korean Jindo) by Massively Parallel Sequencing

    PubMed Central

    Kim, Ryong Nam; Kim, Dae-Soo; Choi, Sang-Haeng; Yoon, Byoung-Ha; Kang, Aram; Nam, Seong-Hyeuk; Kim, Dong-Wook; Kim, Jong-Joo; Ha, Ji-Hong; Toyoda, Atsushi; Fujiyama, Asao; Kim, Aeri; Kim, Min-Young; Park, Kun-Hyang; Lee, Kang Seon; Park, Hong-Seog

    2012-01-01

    Although pioneering sequencing projects have shed light on the boxer and poodle genomes, a number of challenges need to be met before the sequencing and annotation of the dog genome can be considered complete. Here, we present the DNA sequence of the Jindo dog genome, sequenced to 45-fold average coverage using Illumina massively parallel sequencing technology. A comparison of the sequence to the reference boxer genome led to the identification of 4 675 437 single nucleotide polymorphisms (SNPs, including 3 346 058 novel SNPs), 71 642 indels and 8131 structural variations. Of these, 339 non-synonymous SNPs and 3 indels are located within coding sequences (CDS). In particular, 3 non-synonymous SNPs and a 26-bp deletion occur in the TCOF1 locus, implying that the difference observed in cranial facial morphology between Jindo and boxer dogs might be influenced by those variations. Through the annotation of the Jindo olfactory receptor gene family, we found 2 unique olfactory receptor genes and 236 olfactory receptor genes harbouring non-synonymous homozygous SNPs that are likely to affect smelling capability. In addition, we determined the DNA sequence of the Jindo dog mitochondrial genome and identified Jindo dog-specific mtDNA genotypes. This Jindo genome data upgrade our understanding of dog genomic architecture and will be a very valuable resource for investigating not only dog genetics and genomics but also human and dog disease genetics and comparative genomics. PMID:22474061

  17. Next-generation transcriptome sequencing, SNP discovery and validation in four market classes of peanut, Arachis hypogaea L.

    PubMed

    Chopra, Ratan; Burow, Gloria; Farmer, Andrew; Mudge, Joann; Simpson, Charles E; Wilkins, Thea A; Baring, Michael R; Puppala, Naveen; Chamberlin, Kelly D; Burow, Mark D

    2015-06-01

    Single-nucleotide polymorphisms, which can be identified in the thousands or millions from comparisons of transcriptome or genome sequences, are ideally suited for making high-resolution genetic maps, investigating population evolutionary history, and discovering marker-trait linkages. Despite significant results from their use in human genetics, progress in identification and use in plants, and particularly polyploid plants, has lagged. As part of a long-term project to identify and use SNPs suitable for these purposes in cultivated peanut, which is tetraploid, we generated transcriptome sequences of four peanut cultivars, namely OLin, New Mexico Valencia C, Tamrun OL07 and Jupiter, which represent the four major market classes of peanut grown in the world, and which are important economically to the US southwest peanut growing region. CopyDNA libraries of each genotype were used to generate 2 × 54 paired-end reads using an Illumina GAIIx sequencer. Raw reads were mapped to a custom reference consisting of Tifrunner 454 sequences plus peanut ESTs in GenBank, compromising 43,108 contigs; 263,840 SNP and indel variants were identified among four genotypes compared to the reference. A subset of 6 variants was assayed across 24 genotypes representing four market types using KASP chemistry to assess the criteria for SNP selection. Results demonstrated that transcriptome sequencing can identify SNPs usable as selectable DNA-based markers in complex polyploid species such as peanut. Criteria for effective use of SNPs as markers are discussed in this context.

  18. Within-genome evolution of REPINs: a new family of miniature mobile DNA in bacteria.

    PubMed

    Bertels, Frederic; Rainey, Paul B

    2011-06-01

    Repetitive sequences are a conserved feature of many bacterial genomes. While first reported almost thirty years ago, and frequently exploited for genotyping purposes, little is known about their origin, maintenance, or processes affecting the dynamics of within-genome evolution. Here, beginning with analysis of the diversity and abundance of short oligonucleotide sequences in the genome of Pseudomonas fluorescens SBW25, we show that over-represented short sequences define three distinct groups (GI, GII, and GIII) of repetitive extragenic palindromic (REP) sequences. Patterns of REP distribution suggest that closely linked REP sequences form a functional replicative unit: REP doublets are over-represented, randomly distributed in extragenic space, and more highly conserved than singlets. In addition, doublets are organized as inverted repeats, which together with intervening spacer sequences are predicted to form hairpin structures in ssDNA or mRNA. We refer to these newly defined entities as REPINs (REP doublets forming hairpins) and identify short reads from population sequencing that reveal putative transposition intermediates. The proximal relationship between GI, GII, and GIII REPINs and specific REP-associated tyrosine transposases (RAYTs), combined with features of the putative transposition intermediate, suggests a mechanism for within-genome dissemination. Analysis of the distribution of REPs in a range of RAYT-containing bacterial genomes, including Escherichia coli K-12 and Nostoc punctiforme, show that REPINs are a widely distributed, but hitherto unrecognized, family of miniature non-autonomous mobile DNA.

  19. Real-Time PCR-Based Quantitation Method for the Genetically Modified Soybean Line GTS 40-3-2.

    PubMed

    Kitta, Kazumi; Takabatake, Reona; Mano, Junichi

    2016-01-01

    This chapter describes a real-time PCR-based method for quantitation of the relative amount of genetically modified (GM) soybean line GTS 40-3-2 [Roundup Ready(®) soybean (RRS)] contained in a batch. The method targets a taxon-specific soybean gene (lectin gene, Le1) and the specific DNA construct junction region between the Petunia hybrida chloroplast transit peptide sequence and the Agrobacterium 5-enolpyruvylshikimate-3-phosphate synthase gene (epsps) sequence present in GTS 40-3-2. The method employs plasmid pMulSL2 as a reference material in order to quantify the relative amount of GTS 40-3-2 in soybean samples using a conversion factor (Cf) equal to the ratio of the RRS-specific DNA to the taxon-specific DNA in representative genuine GTS 40-3-2 seeds.

  20. Integrative analysis of 111 reference human epigenomes

    PubMed Central

    Kundaje, Anshul; Meuleman, Wouter; Ernst, Jason; Bilenky, Misha; Yen, Angela; Kheradpour, Pouya; Zhang, Zhizhuo; Heravi-Moussavi, Alireza; Liu, Yaping; Amin, Viren; Ziller, Michael J; Whitaker, John W; Schultz, Matthew D; Sandstrom, Richard S; Eaton, Matthew L; Wu, Yi-Chieh; Wang, Jianrong; Ward, Lucas D; Sarkar, Abhishek; Quon, Gerald; Pfenning, Andreas; Wang, Xinchen; Claussnitzer, Melina; Coarfa, Cristian; Harris, R Alan; Shoresh, Noam; Epstein, Charles B; Gjoneska, Elizabeta; Leung, Danny; Xie, Wei; Hawkins, R David; Lister, Ryan; Hong, Chibo; Gascard, Philippe; Mungall, Andrew J; Moore, Richard; Chuah, Eric; Tam, Angela; Canfield, Theresa K; Hansen, R Scott; Kaul, Rajinder; Sabo, Peter J; Bansal, Mukul S; Carles, Annaick; Dixon, Jesse R; Farh, Kai-How; Feizi, Soheil; Karlic, Rosa; Kim, Ah-Ram; Kulkarni, Ashwinikumar; Li, Daofeng; Lowdon, Rebecca; Mercer, Tim R; Neph, Shane J; Onuchic, Vitor; Polak, Paz; Rajagopal, Nisha; Ray, Pradipta; Sallari, Richard C; Siebenthall, Kyle T; Sinnott-Armstrong, Nicholas; Stevens, Michael; Thurman, Robert E; Wu, Jie; Zhang, Bo; Zhou, Xin; Beaudet, Arthur E; Boyer, Laurie A; De Jager, Philip; Farnham, Peggy J; Fisher, Susan J; Haussler, David; Jones, Steven; Li, Wei; Marra, Marco; McManus, Michael T; Sunyaev, Shamil; Thomson, James A; Tlsty, Thea D; Tsai, Li-Huei; Wang, Wei; Waterland, Robert A; Zhang, Michael; Chadwick, Lisa H; Bernstein, Bradley E; Costello, Joseph F; Ecker, Joseph R; Hirst, Martin; Meissner, Alexander; Milosavljevic, Aleksandar; Ren, Bing; Stamatoyannopoulos, John A; Wang, Ting; Kellis, Manolis

    2015-01-01

    The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but a similar reference has lacked for epigenomic studies. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection to-date of human epigenomes for primary cells and tissues. Here, we describe the integrative analysis of 111 reference human epigenomes generated as part of the program, profiled for histone modification patterns, DNA accessibility, DNA methylation, and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically-relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation, and human disease. PMID:25693563

  1. Integrative analysis of 111 reference human epigenomes.

    PubMed

    Kundaje, Anshul; Meuleman, Wouter; Ernst, Jason; Bilenky, Misha; Yen, Angela; Heravi-Moussavi, Alireza; Kheradpour, Pouya; Zhang, Zhizhuo; Wang, Jianrong; Ziller, Michael J; Amin, Viren; Whitaker, John W; Schultz, Matthew D; Ward, Lucas D; Sarkar, Abhishek; Quon, Gerald; Sandstrom, Richard S; Eaton, Matthew L; Wu, Yi-Chieh; Pfenning, Andreas R; Wang, Xinchen; Claussnitzer, Melina; Liu, Yaping; Coarfa, Cristian; Harris, R Alan; Shoresh, Noam; Epstein, Charles B; Gjoneska, Elizabeta; Leung, Danny; Xie, Wei; Hawkins, R David; Lister, Ryan; Hong, Chibo; Gascard, Philippe; Mungall, Andrew J; Moore, Richard; Chuah, Eric; Tam, Angela; Canfield, Theresa K; Hansen, R Scott; Kaul, Rajinder; Sabo, Peter J; Bansal, Mukul S; Carles, Annaick; Dixon, Jesse R; Farh, Kai-How; Feizi, Soheil; Karlic, Rosa; Kim, Ah-Ram; Kulkarni, Ashwinikumar; Li, Daofeng; Lowdon, Rebecca; Elliott, GiNell; Mercer, Tim R; Neph, Shane J; Onuchic, Vitor; Polak, Paz; Rajagopal, Nisha; Ray, Pradipta; Sallari, Richard C; Siebenthall, Kyle T; Sinnott-Armstrong, Nicholas A; Stevens, Michael; Thurman, Robert E; Wu, Jie; Zhang, Bo; Zhou, Xin; Beaudet, Arthur E; Boyer, Laurie A; De Jager, Philip L; Farnham, Peggy J; Fisher, Susan J; Haussler, David; Jones, Steven J M; Li, Wei; Marra, Marco A; McManus, Michael T; Sunyaev, Shamil; Thomson, James A; Tlsty, Thea D; Tsai, Li-Huei; Wang, Wei; Waterland, Robert A; Zhang, Michael Q; Chadwick, Lisa H; Bernstein, Bradley E; Costello, Joseph F; Ecker, Joseph R; Hirst, Martin; Meissner, Alexander; Milosavljevic, Aleksandar; Ren, Bing; Stamatoyannopoulos, John A; Wang, Ting; Kellis, Manolis

    2015-02-19

    The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

  2. Technical Considerations for Reduced Representation Bisulfite Sequencing with Multiplexed Libraries

    PubMed Central

    Chatterjee, Aniruddha; Rodger, Euan J.; Stockwell, Peter A.; Weeks, Robert J.; Morison, Ian M.

    2012-01-01

    Reduced representation bisulfite sequencing (RRBS), which couples bisulfite conversion and next generation sequencing, is an innovative method that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. Recent advances in the Illumina DNA sample preparation protocol and sequencing technology have vastly improved sequencing throughput capacity. Although the new Illumina technology is now widely used, the unique challenges associated with multiplexed RRBS libraries on this platform have not been previously described. We have made modifications to the RRBS library preparation protocol to sequence multiplexed libraries on a single flow cell lane of the Illumina HiSeq 2000. Furthermore, our analysis incorporates a bioinformatics pipeline specifically designed to process bisulfite-converted sequencing reads and evaluate the output and quality of the sequencing data generated from the multiplexed libraries. We obtained an average of 42 million paired-end reads per sample for each flow-cell lane, with a high unique mapping efficiency to the reference human genome. Here we provide a roadmap of modifications, strategies, and trouble shooting approaches we implemented to optimize sequencing of multiplexed libraries on an a RRBS background. PMID:23193365

  3. Production and certification of NIST Standard Reference Material 2372 Human DNA Quantitation Standard.

    PubMed

    Kline, Margaret C; Duewer, David L; Travis, John C; Smith, Melody V; Redman, Janette W; Vallone, Peter M; Decker, Amy E; Butler, John M

    2009-06-01

    Modern highly multiplexed short tandem repeat (STR) assays used by the forensic human-identity community require tight control of the initial amount of sample DNA amplified in the polymerase chain reaction (PCR) process. This, in turn, requires the ability to reproducibly measure the concentration of human DNA, [DNA], in a sample extract. Quantitative PCR (qPCR) techniques can determine the number of intact stretches of DNA of specified nucleotide sequence in an extremely small sample; however, these assays must be calibrated with DNA extracts of well-characterized and stable composition. By 2004, studies coordinated by or reported to the National Institute of Standards and Technology (NIST) indicated that a well-characterized, stable human DNA quantitation certified reference material (CRM) could help the forensic community reduce within- and among-laboratory quantitation variability. To ensure that the stability of such a quantitation standard can be monitored and that, if and when required, equivalent replacement materials can be prepared, a measurement of some stable quantity directly related to [DNA] is required. Using a long-established conventional relationship linking optical density (properly designated as decadic attenuance) at 260 nm with [DNA] in aqueous solution, NIST Standard Reference Material (SRM) 2372 Human DNA Quantitation Standard was issued in October 2007. This SRM consists of three quite different DNA extracts: a single-source male, a multiple-source female, and a mixture of male and female sources. All three SRM components have very similar optical densities, and thus very similar conventional [DNA]. The materials perform very similarly in several widely used gender-neutral assays, demonstrating that the combination of appropriate preparation methods and metrologically sound spectrophotometric measurements enables the preparation and certification of quantitation [DNA] standards that are both maintainable and of practical utility.

  4. Formation of cis-diamminedichloroplatinum(II) 1,2-intrastrand cross-links on DNA is flanking-sequence independent.

    PubMed

    Burstyn, J N; Heiger-Bernays, W J; Cohen, S M; Lippard, S J

    2000-11-01

    Mapping of cis-diamminedichloroplatinum(II) (cis-DDP, cisplatin) DNA adducts over >3000 nucleotides was carried out using a replication blockage assay. The sites of inhibition of modified T4 DNA polymerase, also referred to as stop sites, were analyzed to determine the effects of local sequence context on the distribution of intrastrand cisplatin cross-links. In a 3120 base fragment from replicative form M13mp18 DNA containing 24.6% guanine, 25.5% thymine, 26.9% adenine and 23.0% cytosine, 166 individual stop sites were observed at a bound platinum/nucleotide ratio of 1-2 per thousand. The majority of stop sites (90%) occurred at G(n>2) sequences and the remainder were located at sites containing an AG dinucleotide. For all of the GG sites present in the mapped sequences, including those with Gn(>)2, 89% blocked replication, whereas for the AG sites only 17% blocked replication. These blockage sites were independent of flanking nucleotides in a sequence of N(1)G*G*N(2) where N(1), N(2) = A, C, G, T and G*G* indicates a 1,2-intrastrand platinum cross-link. The absence of long-range sequence dependence was confirmed by monitoring the reaction of cisplatin with a plasmid containing an 800 bp insert of the human telomere repeat sequence (TTAGGG)(n). Platination reactions monitored at several formal platinum/nucleotide ratios or as a function of time reveal that the telomere insert was not preferentially damaged by cisplatin. Both replication blockage and telomere-insert plasmid platination experiments indicate that cisplatin 1,2-intrastrand adducts do not form preferentially at G-rich sequences in vitro.

  5. Using pseudoalignment and base quality to accurately quantify microbial community composition

    PubMed Central

    Novembre, John

    2018-01-01

    Pooled DNA from multiple unknown organisms arises in a variety of contexts, for example microbial samples from ecological or human health research. Determining the composition of pooled samples can be difficult, especially at the scale of modern sequencing data and reference databases. Here we propose a novel method for taxonomic profiling in pooled DNA that combines the speed and low-memory requirements of k-mer based pseudoalignment with a likelihood framework that uses base quality information to better resolve multiply mapped reads. We apply the method to the problem of classifying 16S rRNA reads using a reference database of known organisms, a common challenge in microbiome research. Using simulations, we show the method is accurate across a variety of read lengths, with different length reference sequences, at different sample depths, and when samples contain reads originating from organisms absent from the reference. We also assess performance in real 16S data, where we reanalyze previous genetic association data to show our method discovers a larger number of quantitative trait associations than other widely used methods. We implement our method in the software Karp, for k-mer based analysis of read pools, to provide a novel combination of speed and accuracy that is uniquely suited for enhancing discoveries in microbial studies. PMID:29659582

  6. Molecular taxonomy of phytopathogenic fungi: a case study in Peronospora.

    PubMed

    Göker, Markus; García-Blázquez, Gema; Voglmayr, Hermann; Tellería, M Teresa; Martín, María P

    2009-07-29

    Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, for this purpose clustering approaches to delineate molecular operational taxonomic units have been applied using arbitrary choices regarding the distance threshold values, and the clustering algorithms. Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in molecular identification of downy mildews. A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence.

  7. Molecular Taxonomy of Phytopathogenic Fungi: A Case Study in Peronospora

    PubMed Central

    Göker, Markus; García-Blázquez, Gema; Voglmayr, Hermann; Tellería, M. Teresa; Martín, María P.

    2009-01-01

    Background Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, for this purpose clustering approaches to delineate molecular operational taxonomic units have been applied using arbitrary choices regarding the distance threshold values, and the clustering algorithms. Methodology Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in molecular identification of downy mildews. Conclusions A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence. PMID:19641601

  8. High quality methylome-wide investigations through next-generation sequencing of DNA from a single archived dry blood spot

    PubMed Central

    Aberg, Karolina A.; Xie, Lin Y.; Nerella, Srilaxmi; Copeland, William E.; Costello, E. Jane; van den Oord, Edwin J.C.G.

    2013-01-01

    The potential importance of DNA methylation in the etiology of complex diseases has led to interest in the development of methylome-wide association studies (MWAS) aimed at interrogating all methylation sites in the human genome. When using blood as biomaterial for a MWAS the DNA is typically extracted directly from fresh or frozen whole blood that was collected via venous puncture. However, DNA extracted from dry blood spots may also be an alternative starting material. In the present study, we apply a methyl-CpG binding domain (MBD) protein enrichment-based technique in combination with next generation sequencing (MBD-seq) to assess the methylation status of the ~27 million CpGs in the human autosomal reference genome. We investigate eight methylomes using DNA from blood spots. This data are compared with 1,500 methylomes previously assayed with the same MBD-seq approach using DNA from whole blood. When investigating the sequence quality and the enrichment profile across biological features, we find that DNA extracted from blood spots gives comparable results with DNA extracted from whole blood. Only if the amount of starting material is ≤ 0.5µg DNA we observe a slight decrease in the assay performance. In conclusion, we show that high quality methylome-wide investigations using MBD-seq can be conducted in DNA extracted from archived dry blood spots without sacrificing quality and without bias in enrichment profile as long as the amount of starting material is sufficient. In general, the amount of DNA extracted from a single blood spot is sufficient for methylome-wide investigations with the MBD-seq approach. PMID:23644822

  9. High quality methylome-wide investigations through next-generation sequencing of DNA from a single archived dry blood spot.

    PubMed

    Aberg, Karolina A; Xie, Lin Y; Nerella, Srilaxmi; Copeland, William E; Costello, E Jane; van den Oord, Edwin J C G

    2013-05-01

    The potential importance of DNA methylation in the etiology of complex diseases has led to interest in the development of methylome-wide association studies (MWAS) aimed at interrogating all methylation sites in the human genome. When using blood as biomaterial for a MWAS the DNA is typically extracted directly from fresh or frozen whole blood that was collected via venous puncture. However, DNA extracted from dry blood spots may also be an alternative starting material. In the present study, we apply a methyl-CpG binding domain (MBD) protein enrichment-based technique in combination with next generation sequencing (MBD-seq) to assess the methylation status of the ~27 million CpGs in the human autosomal reference genome. We investigate eight methylomes using DNA from blood spots. This data are compared with 1,500 methylomes previously assayed with the same MBD-seq approach using DNA from whole blood. When investigating the sequence quality and the enrichment profile across biological features, we find that DNA extracted from blood spots gives comparable results with DNA extracted from whole blood. Only if the amount of starting material is ≤ 0.5µg DNA we observe a slight decrease in the assay performance. In conclusion, we show that high quality methylome-wide investigations using MBD-seq can be conducted in DNA extracted from archived dry blood spots without sacrificing quality and without bias in enrichment profile as long as the amount of starting material is sufficient. In general, the amount of DNA extracted from a single blood spot is sufficient for methylome-wide investigations with the MBD-seq approach.

  10. OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid.

    PubMed

    Poehlman, William L; Rynge, Mats; Branton, Chris; Balamurugan, D; Feltus, Frank A

    2016-01-01

    High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments.

  11. OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid

    PubMed Central

    Poehlman, William L.; Rynge, Mats; Branton, Chris; Balamurugan, D.; Feltus, Frank A.

    2016-01-01

    High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments. PMID:27499617

  12. Preparation of metagenomic libraries from naturally occurring marine viruses.

    PubMed

    Solonenko, Sergei A; Sullivan, Matthew B

    2013-01-01

    Microbes are now well recognized as major drivers of the biogeochemical cycling that fuels the Earth, and their viruses (phages) are known to be abundant and important in microbial mortality, horizontal gene transfer, and modulating microbial metabolic output. Investigation of environmental phages has been frustrated by an inability to culture the vast majority of naturally occurring diversity coupled with the lack of robust, quantitative, culture-independent methods for studying this uncultured majority. However, for double-stranded DNA phages, a quantitative viral metagenomic sample-to-sequence workflow now exists. Here, we review these advances with special emphasis on the technical details of preparing DNA sequencing libraries for metagenomic sequencing from environmentally relevant low-input DNA samples. Library preparation steps broadly involve manipulating the sample DNA by fragmentation, end repair and adaptor ligation, size fractionation, and amplification. One critical area of future research and development is parallel advances for alternate nucleic acid types such as single-stranded DNA and RNA viruses that are also abundant in nature. Combinations of recent advances in fragmentation (e.g., acoustic shearing and tagmentation), ligation reactions (adaptor-to-template ratio reference table availability), size fractionation (non-gel-sizing), and amplification (linear amplification for deep sequencing and linker amplification protocols) enhance our ability to generate quantitatively representative metagenomic datasets from low-input DNA samples. Such datasets are already providing new insights into the role of viruses in marine systems and will continue to do so as new environments are explored and synergies and paradigms emerge from large-scale comparative analyses. © 2013 Elsevier Inc. All rights reserved.

  13. Serovar distribution of a DNA sequence involved in the antigenic relationship between Leptospira and equine cornea

    PubMed Central

    Lucchesi, Paula MA; Parma, Alberto E; Arroyo, Guillermo H

    2002-01-01

    Background Horses infected with Leptospira present several clinical disorders, one of them being recurrent uveitis. A common endpoint of equine recurrent uveitis is blindness. Serovar pomona has often been incriminated, although others have also been reported. An antigenic relationship between this bacterium and equine cornea has been described in previous studies. A leptospiral DNA fragment that encodes cross-reacting epitopes was previously cloned and expressed in Escherichia coli. Results A region of that DNA fragment was subcloned and sequenced. Samples of leptospiral DNA from several sources were analysed by PCR with two primer pairs designed to amplify that region. Reference strains from serovars canicola, icterohaemorrhagiae, pomona, pyrogenes, wolffi, bataviae, sentot, hebdomadis and hardjo rendered products of the expected sizes with both pairs of primers. The specific DNA region was also amplified from isolates from Argentina belonging to serogroups Canicola and Pomona. Both L. biflexa serovar patoc and L. borgpetersenii serovar tarassovi rendered a negative result. Conclusions The DNA sequence related to the antigen mimicry with equine cornea was not exclusively found in serovar pomona as it was also detected in several strains of Leptospira belonging to different serovars. The results obtained with L. biflexa serovar patoc strain Patoc I and L. borgpetersenii serovar tarassovi strain Perepelicin suggest that this sequence is not present in these strains, which belong to different genomospecies than those which gave positive results. This is an interesting finding since L. biflexa comprises nonpathogenic strains and serovar tarassovi has not been associated clinically with equine uveitis. PMID:11869455

  14. Minimap2: pairwise alignment for nucleotide sequences.

    PubMed

    Li, Heng

    2018-05-10

    Recent advances in sequencing technologies promise ultra-long reads of ∼100 kilo bases (kb) in average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 mega bases (Mb) in length. Existing alignment programs are unable or inefficient to process such data at scale, which presses for the development of new alignment algorithms. Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥ 100bp in length, ≥1kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. https://github.com/lh3/minimap2. hengli@broadinstitute.org.

  15. Detection and decay rates of prey and prey symbionts in the gut of a predator through metagenomics.

    PubMed

    Paula, Débora P; Linard, Benjamin; Andow, David A; Sujii, Edison R; Pires, Carmen S S; Vogler, Alfried P

    2015-07-01

    DNA methods are useful to identify ingested prey items from the gut of predators, but reliable detection is hampered by low amounts of degraded DNA. PCR-based methods can retrieve minute amounts of starting material but suffer from amplification biases and cross-reactions with the predator and related species genomes. Here, we use PCR-free direct shotgun sequencing of total DNA isolated from the gut of the harlequin ladybird Harmonia axyridis at five time points after feeding on a single pea aphid Acyrthosiphon pisum. Sequence reads were matched to three reference databases: Insecta mitogenomes of 587 species, including H. axyridis sequenced here; A. pisum nuclear genome scaffolds; and scaffolds and complete genomes of 13 potential bacterial symbionts. Immediately after feeding, multicopy mtDNA of A. pisum was detected in tens of reads, while hundreds of matches to nuclear scaffolds were detected. Aphid nuclear DNA and mtDNA decayed at similar rates (0.281 and 0.11 h(-1) respectively), and the detectability periods were 32.7 and 23.1 h. Metagenomic sequencing also revealed thousands of reads of the obligate Buchnera aphidicola and facultative Regiella insecticola aphid symbionts, which showed exponential decay rates significantly faster than aphid DNA (0.694 and 0.80 h(-1) , respectively). However, the facultative aphid symbionts Hamiltonella defensa, Arsenophonus spp. and Serratia symbiotica showed an unexpected temporary increase in population size by 1-2 orders of magnitude in the predator guts before declining. Metagenomics is a powerful tool that can reveal complex relationships and the dynamics of interactions among predators, prey and their symbionts. © 2014 John Wiley & Sons Ltd.

  16. De novo assembly of human genomes with massively parallel short read sequencing.

    PubMed

    Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue; Qian, Wubin; Fang, Xiaodong; Shi, Zhongbin; Li, Yingrui; Li, Shengting; Shan, Gao; Kristiansen, Karsten; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun

    2010-02-01

    Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

  17. Adaptive efficient compression of genomes

    PubMed Central

    2012-01-01

    Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. However, memory requirements of the current algorithms are high and run times often are slow. In this paper, we propose an adaptive, parallel and highly efficient referential sequence compression method which allows fine-tuning of the trade-off between required memory and compression speed. When using 12 MB of memory, our method is for human genomes on-par with the best previous algorithms in terms of compression ratio (400:1) and compression speed. In contrast, it compresses a complete human genome in just 11 seconds when provided with 9 GB of main memory, which is almost three times faster than the best competitor while using less main memory. PMID:23146997

  18. BarraCUDA - a fast short read sequence aligner using graphics processing units

    PubMed Central

    2012-01-01

    Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497

  19. High Prevalence of Methanobrevibacter smithii and Methanosphaera stadtmanae Detected in the Human Gut Using an Improved DNA Detection Protocol

    PubMed Central

    Dridi, Bédis; Henry, Mireille; El Khéchine, Amel; Raoult, Didier; Drancourt, Michel

    2009-01-01

    Background The low and variable prevalence of Methanobrevibacter smithii and Methanosphaera stadtmanae DNA in human stool contrasts with the paramount role of these methanogenic Archaea in digestion processes. We hypothesized that this contrast is a consequence of the inefficiencies of current protocols for archaeon DNA extraction. We developed a new protocol for the extraction and PCR-based detection of M. smithii and M. stadtmanae DNA in human stool. Methodology/Principal Findings Stool specimens collected from 700 individuals were filtered, mechanically lysed twice, and incubated overnight with proteinase K prior to DNA extraction using a commercial DNA extraction kit. Total DNA was used as a template for quantitative real-time PCR targeting M. smithii and M. stadtmanae 16S rRNA and rpoB genes. Amplification of 16S rRNA and rpoB yielded positive detection of M. smithii in 95.7% and M. stadtmanae in 29.4% of specimens. Sequencing of 16S rRNA gene PCR products from 30 randomly selected specimens (15 for M. smithii and 15 for M. stadtmanae) yielded a sequence similarity of 99–100% using the reference M. smithii ATCC 35061 and M. stadtmanae DSM 3091 sequences. Conclusions/Significance In contrast to previous reports, these data indicate a high prevalence of the methanogens M. smithii and M. stadtmanae in the human gut, with the former being an almost ubiquitous inhabitant of the intestinal microbiome. PMID:19759898

  20. Highlights of DNA Barcoding in identification of salient microorganisms like fungi.

    PubMed

    Dulla, E L; Kathera, C; Gurijala, H K; Mallakuntla, T R; Srinivasan, P; Prasad, V; Mopati, R D; Jasti, P K

    2016-12-01

    Fungi, the second largest kingdom of eukaryotic life, are diverse and widespread. Fungi play a distinctive role in the production of different products on industrial scale, like fungal enzymes, antibiotics, fermented foods, etc., to give storage stability and improved health to meet major global challenges. To utilize algae perfectly for human needs, and to pave the way for getting a healthy relationship with fungi, it is important to identify them in a quick and robust manner with molecular-based identification system. So, there is a technique that aims to provide a well-organized method for species level identifications and to contribute powerfully to taxonomic and biodiversity research is DNA Barcoding. DNA Barcoding is generally achieved by the retrieval of a short DNA sequence - the 'barcode' - from a standard part of the genome and that barcode is then compared with a library of reference barcode sequences derived from individuals of known identity for identification. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  1. A Universal Method for Species Identification of Mammals Utilizing Next Generation Sequencing for the Analysis of DNA Mixtures

    PubMed Central

    Tillmar, Andreas O.; Dell'Amico, Barbara; Welander, Jenny; Holmlund, Gunilla

    2013-01-01

    Species identification can be interesting in a wide range of areas, for example, in forensic applications, food monitoring and in archeology. The vast majority of existing DNA typing methods developed for species determination, mainly focuses on a single species source. There are, however, many instances where all species from mixed sources need to be determined, even when the species in minority constitutes less than 1 % of the sample. The introduction of next generation sequencing opens new possibilities for such challenging samples. In this study we present a universal deep sequencing method using 454 GS Junior sequencing of a target on the mitochondrial gene 16S rRNA. The method was designed through phylogenetic analyses of DNA reference sequences from more than 300 mammal species. Experiments were performed on artificial species-species mixture samples in order to verify the method’s robustness and its ability to detect all species within a mixture. The method was also tested on samples from authentic forensic casework. The results showed to be promising, discriminating over 99.9 % of mammal species and the ability to detect multiple donors within a mixture and also to detect minor components as low as 1 % of a mixed sample. PMID:24358309

  2. Practical aspects of genetic identification of hallucinogenic and other poisonous mushrooms for clinical and forensic purposes

    PubMed Central

    Kowalczyk, Marek; Sekuła, Andrzej; Mleczko, Piotr; Olszowy, Zofia; Kujawa, Anna; Zubek, Szymon; Kupiec, Tomasz

    2015-01-01

    Aim To assess the usefulness of a DNA-based method for identifying mushroom species for application in forensic laboratory practice. Methods Two hundred twenty-one samples of clinical forensic material (dried mushrooms, food remains, stomach contents, feces, etc) were analyzed. ITS2 region of nuclear ribosomal DNA (nrDNA) was sequenced and the sequences were compared with reference sequences collected from the National Center for Biotechnology Information gene bank (GenBank). Sporological identification of mushrooms was also performed for 57 samples of clinical material. Results Of 221 samples, positive sequencing results were obtained for 152 (69%). The highest percentage of positive results was obtained for samples of dried mushrooms (96%) and food remains (91%). Comparison with GenBank sequences enabled identification of all samples at least at the genus level. Most samples (90%) were identified at the level of species or a group of closely related species. Sporological and molecular identification were consistent at the level of species or genus for 30% of analyzed samples. Conclusion Molecular analysis identified a larger number of species than sporological method. It proved to be suitable for analysis of evidential material (dried hallucinogenic mushrooms) in forensic genetic laboratories as well as to complement classical methods in the analysis of clinical material. PMID:25727040

  3. Practical aspects of genetic identification of hallucinogenic and other poisonous mushrooms for clinical and forensic purposes.

    PubMed

    Kowalczyk, Marek; Sekuła, Andrzej; Mleczko, Piotr; Olszowy, Zofia; Kujawa, Anna; Zubek, Szymon; Kupiec, Tomasz

    2015-02-01

    To assess the usefulness of a DNA-based method for identifying mushroom species for application in forensic laboratory practice. Two hundred twenty-one samples of clinical forensic material (dried mushrooms, food remains, stomach contents, feces, etc) were analyzed. ITS2 region of nuclear ribosomal DNA (nrDNA) was sequenced and the sequen-ces were compared with reference sequences collected from the National Center for Biotechnology Information gene bank (GenBank). Sporological identification of mushrooms was also performed for 57 samples of clinical material. Of 221 samples, positive sequencing results were obtained for 152 (69%). The highest percentage of positive results was obtained for samples of dried mushrooms (96%) and food remains (91%). Comparison with GenBank sequences enabled identification of all samples at least at the genus level. Most samples (90%) were identified at the level of species or a group of closely related species. Sporological and molecular identification were consistent at the level of species or genus for 30% of analyzed samples. Molecular analysis identified a larger number of species than sporological method. It proved to be suitable for analysis of evidential material (dried hallucinogenic mushrooms) in forensic genetic laboratories as well as to complement classical methods in the analysis of clinical material.

  4. Digital transcriptome profiling using selective hexamer priming for cDNA synthesis.

    PubMed

    Armour, Christopher D; Castle, John C; Chen, Ronghua; Babak, Tomas; Loerch, Patrick; Jackson, Stuart; Shah, Jyoti K; Dey, John; Rohl, Carol A; Johnson, Jason M; Raymond, Christopher K

    2009-09-01

    We developed a procedure for the preparation of whole transcriptome cDNA libraries depleted of ribosomal RNA from only 1 microg of total RNA. The method relies on a collection of short, computationally selected oligonucleotides, called 'not-so-random' (NSR) primers, to obtain full-length, strand-specific representation of nonribosomal RNA transcripts. In this study we validated the technique by profiling human whole brain and universal human reference RNA using ultra-high-throughput sequencing.

  5. Sequence analysis of ORF IV RTBV isolated from tungro infected Oryza sativa L. cv Ciherang

    NASA Astrophysics Data System (ADS)

    Hastilestari, Bernadetta Rina; Astuti, Dwi; Estiati, Amy; Nugroho, Satya

    2015-09-01

    The Effort to increase rice production is often constrained by pest and disease such as Tungro. The Tungro disease is caused by the joint infection with two dissimilar viruses; a bacil-form-DNA virus, the Rice tungro bacilliform virus(RTBV) and the spherical RNA virus, Rice tungro spherical virus (RTSV) and transmitted by Green leafhopper (Nephotettix virescens). The symptom of disease is caused by the presence of RTBV. The genome of RTBV consists of four Open reading frames (ORFs) which encode functional proteins. Of the four, ORF IV is unique because it exists only in RTBV. The most efficient method of generating disease resistance plants is to look for natural sources of resistance genes in wild or germplasm and then transfer the gene and the accompanying resistance in cultivated crop varieties. The aim of this study is, therefore, to isolate and analyze of 1170 bp gene of ORF 4 of Tungro virus isolated from an Indonesian rice cultivar, Ciherang (Oryza sativa L. cv Indica). DNA sequencing analysis using BLAST showed 94% similarity with the reference sequence gen bank Acc.M65026.1. The comparisons and mutation analysis of DNA sequences were discussed in this research.

  6. Simultaneous detection of transgenic DNA by surface plasmon resonance imaging with potential application to gene doping detection.

    PubMed

    Scarano, Simona; Ermini, Maria Laura; Spiriti, Maria Michela; Mascini, Marco; Bogani, Patrizia; Minunni, Maria

    2011-08-15

    Surface plasmon resonance imaging (SPRi) was used as the transduction principle for the development of optical-based sensing for transgenes detection in human cell lines. The objective was to develop a multianalyte, label-free, and real-time approach for DNA sequences that are identified as markers of transgenosis events. The strategy exploits SPRi sensing to detect the transgenic event by targeting selected marker sequences, which are present on shuttle vector backbone used to carry out the transfection of human embryonic kidney (HEK) cell lines. Here, we identified DNA sequences belonging to the Cytomegalovirus promoter and the Enhanced Green Fluorescent Protein gene. System development is discussed in terms of probe efficiency and influence of secondary structures on biorecognition reaction on sensor; moreover, optimization of PCR samples pretreatment was carried out to allow hybridization on biosensor, together with an approach to increase SPRi signals by in situ mass enhancement. Real-time PCR was also employed as reference technique for marker sequences detection on human HEK cells. We can foresee that the developed system may have potential applications in the field of antidoping research focused on the so-called gene doping.

  7. Sequencing and assembly of the 22-gb loblolly pine genome.

    PubMed

    Zimin, Aleksey; Stevens, Kristian A; Crepeau, Marc W; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L; de Jong, Pieter J; Neale, David B; Salzberg, Steven L; Yorke, James A; Langley, Charles H

    2014-03-01

    Conifers are the predominant gymnosperm. The size and complexity of their genomes has presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data were aided by condensing the enormous number of paired-end reads into a much smaller set of longer "super-reads," rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.

  8. 'FloraArray' for screening of specific DNA probes representing the characteristics of a certain microbial community.

    PubMed

    Yokoi, Takahide; Kaku, Yoshiko; Suzuki, Hiroyuki; Ohta, Masayuki; Ikuta, Hajime; Isaka, Kazuichi; Sumino, Tatsuo; Wagatsuma, Masako

    2007-08-01

    To investigate uncharacterized microbial communities, a custom DNA microarray named 'FloraArray' was developed for screening specific probes that would represent the characteristics of a microbial community. The array was prepared by spotting 2000 plasmid DNAs from a genomic shotgun library of a sludge sample on a DNA microarray. By comparative hybridization of the array with two different samples of genomic DNA, one from the activated sludge and the other from a nonactivated sludge sample of an anaerobic ammonium oxidation (anammox) bacterial community, specific spots were visualized as a definite fluctuating profile in an MA (differential intensity ratio vs. spot intensity) plot. About 300 spots of the array accounted for the candidate probes to represent anammox reaction of the activated sludge. After sequence analysis of the probes and examination of the results of blastn searches against the reported anammox reference sequence, complete matches were found for 161 probes (58.3%) and >90% matches were found for 242 probes (87.1%). These results demonstrate that 'FloraArray' could be a useful tool for screening specific DNA molecules of unknown microbial communities.

  9. Potential concerns with analytical Methods Used for the detection of Batrachochytrium salamandrivorans from archived DNA of amphibian swab samples, Oregon, USA

    USGS Publications Warehouse

    Iwanowicz, Deborah; Olson, Deanna H.; Adams, Michael J.; Adams, Cynthia; Anderson, Chauncey; Blaustein, Andrew R; Densmore, Christine L.; Figiel, Chester; Schill, William B.; Chestnut, Tara

    2017-01-01

    Taxonomic identification of pollen has historically been accomplished via light microscopy but requires specialized knowledge and reference collections, particularly when identification to lower taxonomic levels is necessary. Recently, next-generation sequencing technology has been used as a cost-effective alternative for identifying bee-collected pollen; however, this novel approach has not been tested on a spatially or temporally robust number of pollen samples. Here, we compare pollen identification results derived from light microscopy and DNA sequencing techniques with samples collected from honey bee colonies embedded within a gradient of intensive agricultural landscapes in the Northern Great Plains throughout the 2010–2011 growing seasons. We demonstrate that at all taxonomic levels, DNA sequencing was able to discern a greater number of taxa, and was particularly useful for the identification of infrequently detected species. Importantly, substantial phenological overlap did occur for commonly detected taxa using either technique, suggesting that DNA sequencing is an appropriate, and enhancing, substitutive technique for accurately capturing the breadth of bee-collected species of pollen present across agricultural landscapes. We also show that honey bees located in high and low intensity agricultural settings forage on dissimilar plants, though with overlap of the most abundantly collected pollen taxa. We highlight practical applications of utilizing sequencing technology, including addressing ecological issues surrounding land use, climate change, importance of taxa relative to abundance, and evaluating the impact of conservation program habitat enhancement efforts.

  10. Comparative molecular cytogenetic analyses of a major tandemly repeated DNA family and retrotransposon sequences in cultivated jute Corchorus species (Malvaceae)

    PubMed Central

    Begum, Rabeya; Zakrzewski, Falk; Menzel, Gerhard; Weber, Beatrice; Alam, Sheikh Shamimul; Schmidt, Thomas

    2013-01-01

    Background and Aims The cultivated jute species Corchorus olitorius and Corchorus capsularis are important fibre crops. The analysis of repetitive DNA sequences, comprising a major part of plant genomes, has not been carried out in jute but is useful to investigate the long-range organization of chromosomes. The aim of this study was the identification of repetitive DNA sequences to facilitate comparative molecular and cytogenetic studies of two jute cultivars and to develop a fluorescent in situ hybridization (FISH) karyotype for chromosome identification. Methods A plasmid library was generated from C. olitorius and C. capsularis with genomic restriction fragments of 100–500 bp, which was complemented by targeted cloning of satellite DNA by PCR. The diversity of the repetitive DNA families was analysed comparatively. The genomic abundance and chromosomal localization of different repeat classes were investigated by Southern analysis and FISH, respectively. The cytosine methylation of satellite arrays was studied by immunolabelling. Key Results Major satellite repeats and retrotransposons have been identified from C. olitorius and C. capsularis. The satellite family CoSat I forms two undermethylated species-specific subfamilies, while the long terminal repeat (LTR) retrotransposons CoRetro I and CoRetro II show similarity to the Metaviridea of plant retroelements. FISH karyotypes were developed by multicolour FISH using these repetitive DNA sequences in combination with 5S and 18S–5·8S–25S rRNA genes which enable the unequivocal chromosome discrimination in both jute species. Conclusions The analysis of the structure and diversity of the repeated DNA is crucial for genome sequence annotation. The reference karyotypes will be useful for breeding of jute and provide the basis for karyotyping homeologous chromosomes of wild jute species to reveal the genetic and evolutionary relationship between cultivated and wild Corchorus species. PMID:23666888

  11. Extensive sequencing of seven human genomes to characterize benchmark reference materials

    PubMed Central

    Zook, Justin M.; Catoe, David; McDaniel, Jennifer; Vang, Lindsay; Spies, Noah; Sidow, Arend; Weng, Ziming; Liu, Yuling; Mason, Christopher E.; Alexander, Noah; Henaff, Elizabeth; McIntyre, Alexa B.R.; Chandramohan, Dhruva; Chen, Feng; Jaeger, Erich; Moshrefi, Ali; Pham, Khoa; Stedman, William; Liang, Tiffany; Saghbini, Michael; Dzakula, Zeljko; Hastie, Alex; Cao, Han; Deikus, Gintaras; Schadt, Eric; Sebra, Robert; Bashir, Ali; Truty, Rebecca M.; Chang, Christopher C.; Gulbahce, Natali; Zhao, Keyan; Ghosh, Srinka; Hyland, Fiona; Fu, Yutao; Chaisson, Mark; Xiao, Chunlin; Trow, Jonathan; Sherry, Stephen T.; Zaranek, Alexander W.; Ball, Madeleine; Bobe, Jason; Estep, Preston; Church, George M.; Marks, Patrick; Kyriazopoulou-Panagiotopoulou, Sofia; Zheng, Grace X.Y.; Schnall-Levin, Michael; Ordonez, Heather S.; Mudivarti, Patrice A.; Giorda, Kristina; Sheng, Ying; Rypdal, Karoline Bjarnesdatter; Salit, Marc

    2016-01-01

    The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST) is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly. PMID:27271295

  12. Protein domain analysis of genomic sequence data reveals regulation of LRR related domains in plant transpiration in Ficus.

    PubMed

    Lang, Tiange; Yin, Kangquan; Liu, Jinyu; Cao, Kunfang; Cannon, Charles H; Du, Fang K

    2014-01-01

    Predicting protein domains is essential for understanding a protein's function at the molecular level. However, up till now, there has been no direct and straightforward method for predicting protein domains in species without a reference genome sequence. In this study, we developed a functionality with a set of programs that can predict protein domains directly from genomic sequence data without a reference genome. Using whole genome sequence data, the programming functionality mainly comprised DNA assembly in combination with next-generation sequencing (NGS) assembly methods and traditional methods, peptide prediction and protein domain prediction. The proposed new functionality avoids problems associated with de novo assembly due to micro reads and small single repeats. Furthermore, we applied our functionality for the prediction of leucine rich repeat (LRR) domains in four species of Ficus with no reference genome, based on NGS genomic data. We found that the LRRNT_2 and LRR_8 domains are related to plant transpiration efficiency, as indicated by the stomata index, in the four species of Ficus. The programming functionality established in this study provides new insights for protein domain prediction, which is particularly timely in the current age of NGS data expansion.

  13. DNA barcodes for bio-surveillance: regulated and economically important arthropod plant pests.

    PubMed

    Ashfaq, Muhammad; Hebert, Paul D N

    2016-11-01

    Many of the arthropod species that are important pests of agriculture and forestry are impossible to discriminate morphologically throughout all of their life stages. Some cannot be differentiated at any life stage. Over the past decade, DNA barcoding has gained increasing adoption as a tool to both identify known species and to reveal cryptic taxa. Although there has not been a focused effort to develop a barcode library for them, reference sequences are now available for 77% of the 409 species of arthropods documented on major pest databases. Aside from developing the reference library needed to guide specimen identifications, past barcode studies have revealed that a significant fraction of arthropod pests are a complex of allied taxa. Because of their importance as pests and disease vectors impacting global agriculture and forestry, DNA barcode results on these arthropods have significant implications for quarantine detection, regulation, and management. The current review discusses these implications in light of the presence of cryptic species in plant pests exposed by DNA barcoding.

  14. Comparison of methods for library construction and short read annotation of shellfish viral metagenomes.

    PubMed

    Wei, Hong-Ying; Huang, Sheng; Wang, Jiang-Yong; Gao, Fang; Jiang, Jing-Zhe

    2018-03-01

    The emergence and widespread use of high-throughput sequencing technologies have promoted metagenomic studies on environmental or animal samples. Library construction for metagenome sequencing and annotation of the produced sequence reads are important steps in such studies and influence the quality of metagenomic data. In this study, we collected some marine mollusk samples, such as Crassostrea hongkongensis, Chlamys farreri, and Ruditapes philippinarum, from coastal areas in South China. These samples were divided into two batches to compare two library construction methods for shellfish viral metagenome. Our analysis showed that reverse-transcribing RNA into cDNA and then amplifying it simultaneously with DNA by whole genome amplification (WGA) yielded a larger amount of DNA compared to using only WGA or WTA (whole transcriptome amplification). Moreover, higher quality libraries were obtained by agarose gel extraction rather than with AMPure bead size selection. However, the latter can also provide good results if combined with the adjustment of the filter parameters. This, together with its simplicity, makes it a viable alternative. Finally, we compared three annotation tools (BLAST, DIAMOND, and Taxonomer) and two reference databases (NCBI's NR and Uniprot's Uniref). Considering the limitations of computing resources and data transfer speed, we propose the use of DIAMOND with Uniref for annotating metagenomic short reads as its running speed can guarantee a good annotation rate. This study may serve as a useful reference for selecting methods for Shellfish viral metagenome library construction and read annotation.

  15. Genetic characterization of strains of Saccharomycescerevisiae responsible for 'refermentation' in Botrytis-affected wines.

    PubMed

    Divol, B; Miot-Sertier, C; Lonvaud-Funel, A

    2006-03-01

    Saccharomyces cerevisiae is responsible for alcoholic fermentation of wines. However, some strains can also spoil sweet Botrytis-affected wines. Three 'refermentation' strains were isolated during maturation. Characterization of those strains in regards to their fingerprint, rDNA sequence and resistance to SO2, which constituted the main source of stress in Botrytis-affected wines, was carried out. Refermentation strains could be clearly discriminated by interdelta fingerprinting. However, they exhibited close relationships by karyotyping. A part of RDN1 locus sequence was examined by using PCR-RFLP and PCR-DGGE. The resistance of refermentation strains to SO2 was performed by using real time quantitative PCR focusing on SSU1 gene. Results suggested that refermentation strains were heterozygote in 26S rDNA and their ITS1-5.8S rDNA-ITS2 region sequence revealed relationships with 'flor' strains. As described in the literature for flor strain, two out of three refermentation strains constitutively developed a higher level of SSU1 expression than the reference strains, improving their putative tolerance to SO2. Therefore, refermentation strains of S. cerevisiae had developed many strategies to survive during maturing sweet wines. Singularities in rDNA sequence and SSU1 overexpression revealed a natural adaptation. Moreover, genomic relationship between flor and refermentation strains suggested that stress sources could induced selection of survivor strains.

  16. A Hybrid Approach for the Automated Finishing of Bacterial Genomes

    PubMed Central

    Robins, William P.; Chin, Chen-Shan; Webster, Dale; Paxinos, Ellen; Hsu, David; Ashby, Meredith; Wang, Susana; Peluso, Paul; Sebra, Robert; Sorenson, Jon; Bullard, James; Yen, Jackie; Valdovino, Marie; Mollova, Emilia; Luong, Khai; Lin, Steven; LaMay, Brianna; Joshi, Amruta; Rowe, Lori; Frace, Michael; Tarr, Cheryl L.; Turnsek, Maryann; Davis, Brigid M; Kasarskis, Andrew; Mekalanos, John J.; Waldor, Matthew K.; Schadt, Eric E.

    2013-01-01

    Dramatic improvements in DNA sequencing technology have revolutionized our ability to characterize most genomic diversity. However, accurate resolution of large structural events has remained challenging due to the comparatively shorter read lengths of second-generation technologies. Emerging third-generation sequencing technologies, which yield markedly increased read length on rapid time scales and for low cost, have the potential to address assembly limitations. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at > 99.9% accuracy. Complex regions with clinically significant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 reference we obtain 14 and 8 scaffolds greater than 1kb, respectively, correcting several errors in the underlying source data. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly. PMID:22750883

  17. PFR²: a curated database of planktonic foraminifera 18S ribosomal DNA as a resource for studies of plankton ecology, biogeography and evolution.

    PubMed

    Morard, Raphaël; Darling, Kate F; Mahé, Frédéric; Audic, Stéphane; Ujiié, Yurika; Weiner, Agnes K M; André, Aurore; Seears, Heidi A; Wade, Christopher M; Quillévéré, Frédéric; Douady, Christophe J; Escarguel, Gilles; de Garidel-Thoron, Thibault; Siccha, Michael; Kucera, Michal; de Vargas, Colomban

    2015-11-01

    Planktonic foraminifera (Rhizaria) are ubiquitous marine pelagic protists producing calcareous shells with conspicuous morphology. They play an important role in the marine carbon cycle, and their exceptional fossil record serves as the basis for biochronostratigraphy and past climate reconstructions. A major worldwide sampling effort over the last two decades has resulted in the establishment of multiple large collections of cryopreserved individual planktonic foraminifera samples. Thousands of 18S rDNA partial sequences have been generated, representing all major known morphological taxa across their worldwide oceanic range. This comprehensive data coverage provides an opportunity to assess patterns of molecular ecology and evolution in a holistic way for an entire group of planktonic protists. We combined all available published and unpublished genetic data to build PFR(2), the Planktonic foraminifera Ribosomal Reference database. The first version of the database includes 3322 reference 18S rDNA sequences belonging to 32 of the 47 known morphospecies of extant planktonic foraminifera, collected from 460 oceanic stations. All sequences have been rigorously taxonomically curated using a six-rank annotation system fully resolved to the morphological species level and linked to a series of metadata. The PFR(2) website, available at http://pfr2.sb-roscoff.fr, allows downloading the entire database or specific sections, as well as the identification of new planktonic foraminiferal sequences. Its novel, fully documented curation process integrates advances in morphological and molecular taxonomy. It allows for an increase in its taxonomic resolution and assures that integrity is maintained by including a complete contingency tracking of annotations and assuring that the annotations remain internally consistent. © 2015 John Wiley & Sons Ltd.

  18. The Hemiptera (Insecta) of Canada: Constructing a Reference Library of DNA Barcodes

    PubMed Central

    Gwiazdowski, Rodger A.; Foottit, Robert G.; Maw, H. Eric L.; Hebert, Paul D. N.

    2015-01-01

    DNA barcode reference libraries linked to voucher specimens create new opportunities for high-throughput identification and taxonomic re-evaluations. This study provides a DNA barcode library for about 45% of the recognized species of Canadian Hemiptera, and the publically available R workflow used for its generation. The current library is based on the analysis of 20,851 specimens including 1849 species belonging to 628 genera and 64 families. These individuals were assigned to 1867 Barcode Index Numbers (BINs), sequence clusters that often coincide with species recognized through prior taxonomy. Museum collections were a key source for identified specimens, but we also employed high-throughput collection methods that generated large numbers of unidentified specimens. Many of these specimens represented novel BINs that were subsequently identified by taxonomists, adding barcode coverage for additional species. Our analyses based on both approaches includes 94 species not listed in the most recent Canadian checklist, representing a potential 3% increase in the fauna. We discuss the development of our workflow in the context of prior DNA barcode library construction projects, emphasizing the importance of delineating a set of reference specimens to aid investigations in cases of nomenclatural and DNA barcode discordance. The identification for each specimen in the reference set can be annotated on the Barcode of Life Data System (BOLD), allowing experts to highlight questionable identifications; annotations can be added by any registered user of BOLD, and instructions for this are provided. PMID:25923328

  19. Within-Genome Evolution of REPINs: a New Family of Miniature Mobile DNA in Bacteria

    PubMed Central

    Bertels, Frederic; Rainey, Paul B.

    2011-01-01

    Repetitive sequences are a conserved feature of many bacterial genomes. While first reported almost thirty years ago, and frequently exploited for genotyping purposes, little is known about their origin, maintenance, or processes affecting the dynamics of within-genome evolution. Here, beginning with analysis of the diversity and abundance of short oligonucleotide sequences in the genome of Pseudomonas fluorescens SBW25, we show that over-represented short sequences define three distinct groups (GI, GII, and GIII) of repetitive extragenic palindromic (REP) sequences. Patterns of REP distribution suggest that closely linked REP sequences form a functional replicative unit: REP doublets are over-represented, randomly distributed in extragenic space, and more highly conserved than singlets. In addition, doublets are organized as inverted repeats, which together with intervening spacer sequences are predicted to form hairpin structures in ssDNA or mRNA. We refer to these newly defined entities as REPINs (REP doublets forming hairpins) and identify short reads from population sequencing that reveal putative transposition intermediates. The proximal relationship between GI, GII, and GIII REPINs and specific REP-associated tyrosine transposases (RAYTs), combined with features of the putative transposition intermediate, suggests a mechanism for within-genome dissemination. Analysis of the distribution of REPs in a range of RAYT–containing bacterial genomes, including Escherichia coli K-12 and Nostoc punctiforme, show that REPINs are a widely distributed, but hitherto unrecognized, family of miniature non-autonomous mobile DNA. PMID:21698139

  20. Capturing chloroplast variation for molecular ecology studies: a simple next generation sequencing approach applied to a rainforest tree

    PubMed Central

    2013-01-01

    Background With high quantity and quality data production and low cost, next generation sequencing has the potential to provide new opportunities for plant phylogeographic studies on single and multiple species. Here we present an approach for in silicio chloroplast DNA assembly and single nucleotide polymorphism detection from short-read shotgun sequencing. The approach is simple and effective and can be implemented using standard bioinformatic tools. Results The chloroplast genome of Toona ciliata (Meliaceae), 159,514 base pairs long, was assembled from shotgun sequencing on the Illumina platform using de novo assembly of contigs. To evaluate its practicality, value and quality, we compared the short read assembly with an assembly completed using 454 data obtained after chloroplast DNA isolation. Sanger sequence verifications indicated that the Illumina dataset outperformed the longer read 454 data. Pooling of several individuals during preparation of the shotgun library enabled detection of informative chloroplast SNP markers. Following validation, we used the identified SNPs for a preliminary phylogeographic study of T. ciliata in Australia and to confirm low diversity across the distribution. Conclusions Our approach provides a simple method for construction of whole chloroplast genomes from shotgun sequencing of whole genomic DNA using short-read data and no available closely related reference genome (e.g. from the same species or genus). The high coverage of Illumina sequence data also renders this method appropriate for multiplexing and SNP discovery and therefore a useful approach for landscape level studies of evolutionary ecology. PMID:23497206

  1. Analysis of the whole mitochondrial genome: translation of the Ion Torrent Personal Genome Machine system to the diagnostic bench?

    PubMed

    Seneca, Sara; Vancampenhout, Kim; Van Coster, Rudy; Smet, Joél; Lissens, Willy; Vanlander, Arnaud; De Paepe, Boel; Jonckheere, An; Stouffs, Katrien; De Meirleir, Linda

    2015-01-01

    Next-generation sequencing (NGS), an innovative sequencing technology that enables the successful analysis of numerous gene sequences in a massive parallel sequencing approach, has revolutionized the field of molecular biology. Although NGS was introduced in a rather recent past, the technology has already demonstrated its potential and effectiveness in many research projects, and is now on the verge of being introduced into the diagnostic setting of routine laboratories to delineate the molecular basis of genetic disease in undiagnosed patient samples. We tested a benchtop device on retrospective genomic DNA (gDNA) samples of controls and patients with a clinical suspicion of a mitochondrial DNA disorder. This Ion Torrent Personal Genome Machine platform is a high-throughput sequencer with a fast turnaround time and reasonable running costs. We challenged the chemistry and technology with the analysis and processing of a mutational spectrum composed of samples with single-nucleotide substitutions, indels (insertions and deletions) and large single or multiple deletions, occasionally in heteroplasmy. The output data were compared with previously obtained conventional dideoxy sequencing results and the mitochondrial revised Cambridge Reference Sequence (rCRS). We were able to identify the majority of all nucleotide alterations, but three false-negative results were also encountered in the data set. At the same time, the poor performance of the PGM instrument in regions associated with homopolymeric stretches generated many false-positive miscalls demanding additional manual curation of the data.

  2. The Apis mellifera filamentous virus genome

    USDA-ARS?s Scientific Manuscript database

    A complete reference genome of the Apis mellifera Filamentous virus (AmFV) was determined using Illumina Hiseq sequencing. The AmFV genome is a double strand DNA molecule of approximately 498’500 nucleotides with a GC content of 50.8%. It encompasses 251 non overlapping open reading frames (ORFs), e...

  3. DNA barcode based wildlife forensics for resolving the origin of claw samples using a novel primer cocktail.

    PubMed

    Khedkar, Gulab D; Abhayankar, Shil Bapurao; Nalage, Dinesh; Ahmed, Shaikh Nadeem; Khedkar, Chandraprakash D

    2016-11-01

    Excessive wildlife hunting for commercial purposes can have negative impacts on biodiversity and may result in species extinction. To ensure compliance with legal statutes, forensic identification approaches relying on molecular markers may be used to identify the species of origin of animal material from hairs, claw, blood, bone, or meat. Using this approach, DNA sequences from the COI "barcoding" gene have been used to identify material from a number of domesticated animal species. However, many wild species of carnivores still present great challenges in generating COI barcodes using standard "universal" primer pairs. In the work presented here, the mitochondrial COI gene was successfully amplified using a novel primer cocktail, and the products were sequenced to determine the species of twenty one unknown samples of claw material collected as part of forensic wildlife case investigations. Sixteen of the unknown samples were recognized to have originated from either Panthera leo or P. pardus individuals. The remaining five samples could be identified only to the family level due to the absence of reference animal sequences. This is the first report on the use of COI sequences for the identification of P. pardus and P. leo from claw samples as part of forensic investigations in India. The study also highlights the need for adequate reference material to aid in the resolution of suspected cases of illegal wildlife harvesting.

  4. RefCNV: Identification of Gene-Based Copy Number Variants Using Whole Exome Sequencing.

    PubMed

    Chang, Lun-Ching; Das, Biswajit; Lih, Chih-Jian; Si, Han; Camalier, Corinne E; McGregor, Paul M; Polley, Eric

    2016-01-01

    With rapid advances in DNA sequencing technologies, whole exome sequencing (WES) has become a popular approach for detecting somatic mutations in oncology studies. The initial intent of WES was to characterize single nucleotide variants, but it was observed that the number of sequencing reads that mapped to a genomic region correlated with the DNA copy number variants (CNVs). We propose a method RefCNV that uses a reference set to estimate the distribution of the coverage for each exon. The construction of the reference set includes an evaluation of the sources of variability in the coverage distribution. We observed that the processing steps had an impact on the coverage distribution. For each exon, we compared the observed coverage with the expected normal coverage. Thresholds for determining CNVs were selected to control the false-positive error rate. RefCNV prediction correlated significantly (r = 0.96-0.86) with CNV measured by digital polymerase chain reaction for MET (7q31), EGFR (7p12), or ERBB2 (17q12) in 13 tumor cell lines. The genome-wide CNV analysis showed a good overall correlation (Spearman's coefficient = 0.82) between RefCNV estimation and publicly available CNV data in Cancer Cell Line Encyclopedia. RefCNV also showed better performance than three other CNV estimation methods in genome-wide CNV analysis.

  5. Environmental DNA (eDNA) metabarcoding assays to detect invasive invertebrate species in the Great Lakes.

    PubMed

    Klymus, Katy E; Marshall, Nathaniel T; Stepien, Carol A

    2017-01-01

    Describing and monitoring biodiversity comprise integral parts of ecosystem management. Recent research coupling metabarcoding and environmental DNA (eDNA) demonstrate that these methods can serve as important tools for surveying biodiversity, while significantly decreasing the time, expense and resources spent on traditional survey methods. The literature emphasizes the importance of genetic marker development, as the markers dictate the applicability, sensitivity and resolution ability of an eDNA assay. The present study developed two metabarcoding eDNA assays using the mtDNA 16S RNA gene with Illumina MiSeq platform to detect invertebrate fauna in the Laurentian Great Lakes and surrounding waterways, with a focus for use on invasive bivalve and gastropod species monitoring. We employed careful primer design and in vitro testing with mock communities to assess ability of the markers to amplify and sequence targeted species DNA, while retaining rank abundance information. In our mock communities, read abundances reflected the initial input abundance, with regressions having significant slopes (p<0.05) and high coefficients of determination (R2) for all comparisons. Tests on field environmental samples revealed similar ability of our markers to measure relative abundance. Due to the limited reference sequence data available for these invertebrate species, care must be taken when analyzing results and identifying sequence reads to species level. These markers extend eDNA metabarcoding research for molluscs and appear relevant to other invertebrate taxa, such as rotifers and bryozoans. Furthermore, the sphaeriid mussel assay is group-specific, exclusively amplifying bivalves in the Sphaeridae family and providing species-level identification. Our assays provide useful tools for managers and conservation scientists, facilitating early detection of invasive species as well as improving resolution of mollusc diversity.

  6. Environmental DNA (eDNA) metabarcoding assays to detect invasive invertebrate species in the Great Lakes

    PubMed Central

    Klymus, Katy E.; Marshall, Nathaniel T.

    2017-01-01

    Describing and monitoring biodiversity comprise integral parts of ecosystem management. Recent research coupling metabarcoding and environmental DNA (eDNA) demonstrate that these methods can serve as important tools for surveying biodiversity, while significantly decreasing the time, expense and resources spent on traditional survey methods. The literature emphasizes the importance of genetic marker development, as the markers dictate the applicability, sensitivity and resolution ability of an eDNA assay. The present study developed two metabarcoding eDNA assays using the mtDNA 16S RNA gene with Illumina MiSeq platform to detect invertebrate fauna in the Laurentian Great Lakes and surrounding waterways, with a focus for use on invasive bivalve and gastropod species monitoring. We employed careful primer design and in vitro testing with mock communities to assess ability of the markers to amplify and sequence targeted species DNA, while retaining rank abundance information. In our mock communities, read abundances reflected the initial input abundance, with regressions having significant slopes (p<0.05) and high coefficients of determination (R2) for all comparisons. Tests on field environmental samples revealed similar ability of our markers to measure relative abundance. Due to the limited reference sequence data available for these invertebrate species, care must be taken when analyzing results and identifying sequence reads to species level. These markers extend eDNA metabarcoding research for molluscs and appear relevant to other invertebrate taxa, such as rotifers and bryozoans. Furthermore, the sphaeriid mussel assay is group-specific, exclusively amplifying bivalves in the Sphaeridae family and providing species-level identification. Our assays provide useful tools for managers and conservation scientists, facilitating early detection of invasive species as well as improving resolution of mollusc diversity. PMID:28542313

  7. Geographical distribution of genetic polymorphism of the pathogen Histoplasma capsulatum isolated from infected bats, captured in a central zone of Mexico.

    PubMed

    Taylor, Maria Lucia; Chávez-Tapia, Catalina B; Rojas-Martínez, Alberto; del Rocio Reyes-Montes, Maria; del Valle, Mirian Bobadilla; Zúñiga, Gerardo

    2005-09-01

    Fourteen Histoplasma capsulatum isolates recovered from infected bats captured in Mexican caves and two human H. capsulatum reference strains were analyzed using random amplification of polymorphic DNA PCR-based and partial DNA sequences of four genes. Cluster analysis of random amplification of polymorphic DNA-patterns revealed differences for two H. capsulatum isolates of one migratory bat Tadarida brasiliensis. Three groups were identified by distance and maximum-parsimony analyses of arf, H-anti, ole, and tub1 H. capsulatum genes. Group I included most isolates from infected bats and one clinical strain from central Mexico; group II included the two isolates from T. brasiliensis; the human G-217B reference strain from USA formed an independent group III. Isolates from group II showed diversity in relation to groups I and III, suggesting a different H. capsulatum population.

  8. Rhizobium favelukesii sp. nov., isolated from the root nodules of alfalfa (Medicago sativa L).

    PubMed

    Torres Tejerizo, Gonzalo; Rogel, Marco Antonio; Ormeño-Orrillo, Ernesto; Althabegoiti, María Julia; Nilsson, Juliet Fernanda; Niehaus, Karsten; Schlüter, Andreas; Pühler, Alfred; Del Papa, María Florencia; Lagares, Antonio; Martínez-Romero, Esperanza; Pistorio, Mariano

    2016-11-01

    Strains LPU83T and Or191 of the genus Rhizobium were isolated from the root nodules of alfalfa, grown in acid soils from Argentina and the USA. These two strains, which shared the same plasmid pattern, lipopolysaccharide profile, insertion-sequence fingerprint, 16S rRNA gene sequence and PCR-fingerprinting pattern, were different from reference strains representing species of the genus Rhizobium with validly published names. On the basis of previously reported data and from new DNA-DNA hybridization results, phenotypic characterization and phylogenetic analyses, strains LPU83T and Or191 can be considered to be representatives of a novel species of the genus Rhizobium, for which the name Rhizobium favelukesii sp. nov. is proposed. The type strain of this species is LPU83T (=CECT 9014T=LMG 29160T), for which an improved draft-genome sequence is available.

  9. Identification of the skeletal remains of Martin Bormann by mtDNA analysis.

    PubMed

    Anslinger, K; Weichhold, G; Keil, W; Bayer, B; Eisenmenger, W

    2001-01-01

    Contrary to statements of an eye-witness who reported that Martin Bormann, the second most powerful man in the Third Reich, died on 2 May 1945 in Berlin, rumours persisted over the years that he had escaped from Germany after World War II. In 1972, skeletal remains were found during construction work, and by investigating the teeth and the bones experts concluded that they were from Bormann. Nevertheless, new rumours arose and in order to end this speculation we were commissioned to identify the skeletal remains by mitochondrial DNA analysis. The comparison of the sequence of HV1 and HV2 from the skeletal remains and a living maternal relative of Martin Bormann revealed no differences and this sequence was not found in 1,500 Caucasoid reference sequences. Based on this investigation, we support the hypothesis that the skeletal remains are those of Martin Bormann.

  10. Identical mitochondrial somatic mutations unique to chronic periodontitis and coronary artery disease

    PubMed Central

    Pallavi, Tokala; Chandra, Rampalli Viswa; Reddy, Aileni Amarender; Reddy, Bavigadda Harish; Naveen, Anumala

    2016-01-01

    Context: The inflammatory processes involved in chronic periodontitis and coronary artery diseases (CADs) are similar and produce reactive oxygen species that may result in similar somatic mutations in mitochondrial deoxyribonucleic acid (mtDNA). Aims: The aims of the present study were to identify somatic mtDNA mutations in periodontal and cardiac tissues from subjects undergoing coronary artery bypass surgery and determine what fraction was identical and unique to these tissues. Settings and Design: The study population consisted of 30 chronic periodontitis subjects who underwent coronary artery surgery after an angiogram had indicated CAD. Materials and Methods: Gingival tissue samples were taken from the site with deepest probing depth; coronary artery tissue samples were taken during the coronary artery bypass grafting procedures, and blood samples were drawn during this surgical procedure. These samples were stored under aseptic conditions and later transported for mtDNA analysis. Statistical Analysis Used: Complete mtDNA sequences were obtained and aligned with the revised Cambridge reference sequence (NC_012920) using sequence analysis and auto assembler tools. Results: Among the complete mtDNA sequences, a total of 162 variations were spread across the whole mitochondrial genome and present only in the coronary artery and the gingival tissue samples but not in the blood samples. Among the 162 variations, 12 were novel and four of the 12 novel variations were found in mitochondrial NADH dehydrogenase subunit 5 complex I gene (33.3%). Conclusions: Analysis of mtDNA mutations indicated 162 variants unique to periodontitis and CAD. Of these, 12 were novel and may have resulted from destructive oxidative forces common to these two diseases. PMID:27041832

  11. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    PubMed Central

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  12. Prediction of TF target sites based on atomistic models of protein-DNA complexes

    PubMed Central

    Angarica, Vladimir Espinosa; Pérez, Abel González; Vasconcelos, Ana T; Collado-Vides, Julio; Contreras-Moreira, Bruno

    2008-01-01

    Background The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition. PMID:18922190

  13. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

    PubMed

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-02-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  14. GPU-BSM: A GPU-Based Tool to Map Bisulfite-Treated Reads

    PubMed Central

    Manconi, Andrea; Orro, Alessandro; Manca, Emanuele; Armano, Giuliano; Milanesi, Luciano

    2014-01-01

    Cytosine DNA methylation is an epigenetic mark implicated in several biological processes. Bisulfite treatment of DNA is acknowledged as the gold standard technique to study methylation. This technique introduces changes in the genomic DNA by converting cytosines to uracils while 5-methylcytosines remain nonreactive. During PCR amplification 5-methylcytosines are amplified as cytosine, whereas uracils and thymines as thymine. To detect the methylation levels, reads treated with the bisulfite must be aligned against a reference genome. Mapping these reads to a reference genome represents a significant computational challenge mainly due to the increased search space and the loss of information introduced by the treatment. To deal with this computational challenge we devised GPU-BSM, a tool based on modern Graphics Processing Units. Graphics Processing Units are hardware accelerators that are increasingly being used successfully to accelerate general-purpose scientific applications. GPU-BSM is a tool able to map bisulfite-treated reads from whole genome bisulfite sequencing and reduced representation bisulfite sequencing, and to estimate methylation levels, with the goal of detecting methylation. Due to the massive parallelization obtained by exploiting graphics cards, GPU-BSM aligns bisulfite-treated reads faster than other cutting-edge solutions, while outperforming most of them in terms of unique mapped reads. PMID:24842718

  15. Plasma DNA tissue mapping by genome-wide methylation sequencing for noninvasive prenatal, cancer, and transplantation assessments

    PubMed Central

    Sun, Kun; Jiang, Peiyong; Chan, K. C. Allen; Wong, John; Cheng, Yvonne K. Y.; Liang, Raymond H. S.; Chan, Wai-kong; Ma, Edmond S. K.; Chan, Stephen L.; Cheng, Suk Hang; Chan, Rebecca W. Y.; Tong, Yu K.; Ng, Simon S. M.; Wong, Raymond S. M.; Hui, David S. C.; Leung, Tse Ngong; Leung, Tak Y.; Lai, Paul B. S.; Chiu, Rossa W. K.; Lo, Yuk Ming Dennis

    2015-01-01

    Plasma consists of DNA released from multiple tissues within the body. Using genome-wide bisulfite sequencing of plasma DNA and deconvolution of the sequencing data with reference to methylation profiles of different tissues, we developed a general approach for studying the major tissue contributors to the circulating DNA pool. We tested this method in pregnant women, patients with hepatocellular carcinoma, and subjects following bone marrow and liver transplantation. In most subjects, white blood cells were the predominant contributors to the circulating DNA pool. The placental contributions in the plasma of pregnant women correlated with the proportional contributions as revealed by fetal-specific genetic markers. The graft-derived contributions to the plasma in the transplant recipients correlated with those determined using donor-specific genetic markers. Patients with hepatocellular carcinoma showed elevated plasma DNA contributions from the liver, which correlated with measurements made using tumor-associated copy number aberrations. In hepatocellular carcinoma patients and in pregnant women exhibiting copy number aberrations in plasma, comparison of methylation deconvolution results using genomic regions with different copy number status pinpointed the tissue type responsible for the aberrations. In a pregnant woman diagnosed as having follicular lymphoma during pregnancy, methylation deconvolution indicated a grossly elevated contribution from B cells into the plasma DNA pool and localized B cells as the origin of the copy number aberrations observed in plasma. This method may serve as a powerful tool for assessing a wide range of physiological and pathological conditions based on the identification of perturbed proportional contributions of different tissues into plasma. PMID:26392541

  16. Molecular detection and characterization of Anaplasma platys in dogs and ticks in Cuba.

    PubMed

    Silva, Claudia Bezerra da; Santos, Huarrisson Azevedo; Navarrete, Maylín González; Ribeiro, Carla Carolina Dias Uzedo; Gonzalez, Belkis Corona; Zaldivar, Maykelin Fuentes; Pires, Marcus Sandes; Peckle, Maristela; Costa, Renata Lins da; Vitari, Gabriela Lopes Vivas; Massard, Carlos Luiz

    2016-07-01

    Canine cyclic thrombocytopenia, an infectious disease caused by Anaplasma platys is a worldwide dog health problem. This study aimed to detect and characterize A. platys deoxyribonucleic acid (DNA) in dogs and ticks from Cuba using molecular methods. The study was conducted in four cities of Cuba (Habana del Este, Boyeros, Cotorro and San José de las Lajas). Blood samples were collected from 100 dogs in these cities. The animals were inspected for the detection of tick infestation and specimens were collected. Genomic DNA was extracted from dog blood and ticks using a commercial kit. Genomic DNA samples from blood and ticks were tested by a nested polymerase chain reaction (nPCR) to amplify 678 base pairs (bp) from the 16S ribosomal DNA (rDNA) of A. platys. Positive samples in nPCR were also subjected to PCR to amplify a fragment of 580bp from the citrate synthase (gltA) gene and the products were sequenced. Only Rhipicephalus sanguineus sensu lato (s.l.) was found on dogs, and 10.20% (n=5/49) of these ticks plus sixteen percent (16.0%, n=16/100) of dogs were considered positive for A. platys by nPCR targeting the 16S rDNA gene. All analyzed gltA and 16S rDNA sequences showed a 99-100% identity with sequences of A. platys reported in around the world. Phylogenetic analysis showed two defined clusters for the 16S rDNA gene and three defined clusters for the gltA gene. Based on the gltA gene, the deduced amino acid sequence showed two mutations at positions 88 and 168 compared with the sequence DQ525687 (GenBank ID from Italian sample), used as a reference in the alignment. A preliminary study on the epidemiological aspects associated with infection by A. platys showed no statistical association with the variables studied (p>0.05). This is the first evidence of the presence of A. platys in dogs and ticks in Cuba. Further studies are needed to evaluate the epidemiological aspects of A. platys infection in Cuban dogs. Copyright © 2016 Elsevier GmbH. All rights reserved.

  17. A comparative study of ChIP-seq sequencing library preparation methods.

    PubMed

    Sundaram, Arvind Y M; Hughes, Timothy; Biondi, Shea; Bolduc, Nathalie; Bowman, Sarah K; Camilli, Andrew; Chew, Yap C; Couture, Catherine; Farmer, Andrew; Jerome, John P; Lazinski, David W; McUsic, Andrew; Peng, Xu; Shazand, Kamran; Xu, Feng; Lyle, Robert; Gilfillan, Gregor D

    2016-10-21

    ChIP-seq is the primary technique used to investigate genome-wide protein-DNA interactions. As part of this procedure, immunoprecipitated DNA must undergo "library preparation" to enable subsequent high-throughput sequencing. To facilitate the analysis of biopsy samples and rare cell populations, there has been a recent proliferation of methods allowing sequencing library preparation from low-input DNA amounts. However, little information exists on the relative merits, performance, comparability and biases inherent to these procedures. Notably, recently developed single-cell ChIP procedures employing microfluidics must also employ library preparation reagents to allow downstream sequencing. In this study, seven methods designed for low-input DNA/ChIP-seq sample preparation (Accel-NGS® 2S, Bowman-method, HTML-PCR, SeqPlex™, DNA SMART™, TELP and ThruPLEX®) were performed on five replicates of 1 ng and 0.1 ng input H3K4me3 ChIP material, and compared to a "gold standard" reference PCR-free dataset. The performance of each method was examined for the prevalence of unmappable reads, amplification-derived duplicate reads, reproducibility, and for the sensitivity and specificity of peak calling. We identified consistent high performance in a subset of the tested reagents, which should aid researchers in choosing the most appropriate reagents for their studies. Furthermore, we expect this work to drive future advances by identifying and encouraging use of the most promising methods and reagents. The results may also aid judgements on how comparable are existing datasets that have been prepared with different sample library preparation reagents.

  18. Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals.

    PubMed

    Taylor, Jeremy F; Whitacre, Lynsey K; Hoff, Jesse L; Tizioto, Polyana C; Kim, JaeWoo; Decker, Jared E; Schnabel, Robert D

    2016-08-17

    Decreasing sequencing costs and development of new protocols for characterizing global methylation, gene expression patterns and regulatory regions have stimulated the generation of large livestock datasets. Here, we discuss experiences in the analysis of whole-genome and transcriptome sequence data. We analyzed whole-genome sequence (WGS) data from 132 individuals from five canid species (Canis familiaris, C. latrans, C. dingo, C. aureus and C. lupus) and 61 breeds, three bison (Bison bison), 64 water buffalo (Bubalus bubalis) and 297 bovines from 17 breeds. By individual, data vary in extent of reference genome depth of coverage from 4.9X to 64.0X. We have also analyzed RNA-seq data for 580 samples representing 159 Bos taurus and Rattus norvegicus animals and 98 tissues. By aligning reads to a reference assembly and calling variants, we assessed effects of average depth of coverage on the actual coverage and on the number of called variants. We examined the identity of unmapped reads by assembling them and querying produced contigs against the non-redundant nucleic acids database. By imputing high-density single nucleotide polymorphism data on 4010 US registered Angus animals to WGS using Run4 of the 1000 Bull Genomes Project and assessing the accuracy of imputation, we identified misassembled reference sequence regions. We estimate that a 24X depth of coverage is required to achieve 99.5 % coverage of the reference assembly and identify 95 % of the variants within an individual's genome. Genomes sequenced to low average coverage (e.g., <10X) may fail to cover 10 % of the reference genome and identify <75 % of variants. About 10 % of genomic DNA or transcriptome sequence reads fail to align to the reference assembly. These reads include loci missing from the reference assembly and misassembled genes and interesting symbionts, commensal and pathogenic organisms. Assembly errors and a lack of annotation of functional elements significantly limit the utility of the current draft livestock reference assemblies. The Functional Annotation of Animal Genomes initiative seeks to annotate functional elements, while a 70X Pac-Bio assembly for cow is underway and may result in a significantly improved reference assembly.

  19. Analysis of the mitochondrial genome of cheetahs (Acinonyx jubatus) with neurodegenerative disease.

    PubMed

    Burger, Pamela A; Steinborn, Ralf; Walzer, Christian; Petit, Thierry; Mueller, Mathias; Schwarzenberger, Franz

    2004-08-18

    The complete mitochondrial genome of Acinonyx jubatus was sequenced and mitochondrial DNA (mtDNA) regions were screened for polymorphisms as candidates for the cause of a neurodegenerative demyelinating disease affecting captive cheetahs. The mtDNA reference sequences were established on the basis of the complete sequences of two diseased and two nondiseased animals as well as partial sequences of 26 further individuals. The A. jubatus mitochondrial genome is 17,047-bp long and shows a high sequence similarity (91%) to the domestic cat. Based on single nucleotide polymorphisms (SNPs) in the control region (CR) and pedigree information, the 18 myelopathic and 12 non-myelopathic cheetahs included in this study were classified into haplotypes I, II and III. In view of the phenotypic comparability of the neurodegenerative disease observed in cheetahs and human mtDNA-associated diseases, specific coding regions including the tRNAs leucine UUR, lysine, serine UCN, and partial complex I and V sequences were screened. We identified a heteroplasmic and a homoplasmic SNP at codon 507 in the subunit 5 (MTND5) of complex I. The heteroplasmic haplotype I-specific valine to methionine substitution represents a nonconservative amino acid change and was found in 11 myelopathic and eight non-myelopathic cheetahs with levels ranging from 29% to 79%. The homoplasmic conservative amino acid substitution valine to alanine was identified in two myelopathic animals of haplotype II. In addition, a synonymous SNP in the codon 76 of the MTND4L gene was found in the single haplotype III animal. The amino acid exchanges in the MTND5 gene were not associated with the occurrence of neurodegenerative disease in captive cheetahs.

  20. DNA analysis in the case of Kaspar Hauser.

    PubMed

    Weichhold, G M; Bark, J E; Korte, W; Eisenmenger, W; Sullivan, K M

    1998-01-01

    In 1828 a mysterious young man appeared in Nürnberg, Germany, who was barely able to speak or walk but could write down his name, Kaspar Hauser. He quickly became the centre of social interest but also the victim of intrigue. His appearance, his origin and assassination in 1833 were, and still are, the source of much debate. The most widely accepted theory postulates that Kaspar Hauser was the son of Grand Duke Carl von Baden and his wife Stephanie de Beauharnais, an adopted daughter of Napoleon Bonaparte. To check this theory, DNA analysis was performed on the clothes most likely worn by Kaspar Hauser when he was stabbed on December 14th, 1833. A suitable bloodstain from the underpants was divided and analysed independently by the Institute of Legal Medicine, University of Munich (ILM) and the Forensic Science Service Laboratory, Birmingham (FSS). Mitochondrial DNA (mtDNA) was sequenced from the bloodstain and from blood samples obtained from two living maternal relatives of Stephanie de Beauharnais. The sequence from the bloodstained clothing differed from the sequence found in both reference blood samples at seven confirmed positions. This proves that the bloodstain does not originate from a son of Stephanie de Beauharnais. Thus, it is becoming clear that Kaspar Hauser was not the Prince of Baden.

  1. GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies.

    PubMed

    Kim, Jeremie S; Senol Cali, Damla; Xin, Hongyi; Lee, Donghyuk; Ghose, Saugata; Alser, Mohammed; Hassan, Hasan; Ergin, Oguz; Alkan, Can; Mutlu, Onur

    2018-05-09

    Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i.e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment) to determine the origin of the read. A seed location filter comes into play before alignment, discarding seed locations that alignment would deem a poor match. The ideal seed location filter would discard all poor match locations prior to alignment such that there is no wasted computation on unnecessary alignments. We propose a novel seed location filtering algorithm, GRIM-Filter, optimized to exploit 3D-stacked memory systems that integrate computation within a logic layer stacked under memory layers, to perform processing-in-memory (PIM). GRIM-Filter quickly filters seed locations by 1) introducing a new representation of coarse-grained segments of the reference genome, and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for a sequence alignment error tolerance of 0.05, GRIM-Filter 1) reduces the false negative rate of filtering by 5.59x-6.41x, and 2) provides an end-to-end read mapper speedup of 1.81x-3.65x, compared to a state-of-the-art read mapper employing the best previous seed location filtering algorithm. GRIM-Filter exploits 3D-stacked memory, which enables the efficient use of processing-in-memory, to overcome the memory bandwidth bottleneck in seed location filtering. We show that GRIM-Filter significantly improves the performance of a state-of-the-art read mapper. GRIM-Filter is a universal seed location filter that can be applied to any read mapper. We hope that our results provide inspiration for new works to design other bioinformatics algorithms that take advantage of emerging technologies and new processing paradigms, such as processing-in-memory using 3D-stacked memory devices.

  2. Early Epstein-Barr Virus Genomic Diversity and Convergence toward the B95.8 Genome in Primary Infection.

    PubMed

    Weiss, Eric R; Lamers, Susanna L; Henderson, Jennifer L; Melnikov, Alexandre; Somasundaran, Mohan; Garber, Manuel; Selin, Liisa; Nusbaum, Chad; Luzuriaga, Katherine

    2018-01-15

    Over 90% of the world's population is persistently infected with Epstein-Barr virus. While EBV does not cause disease in most individuals, it is the common cause of acute infectious mononucleosis (AIM) and has been associated with several cancers and autoimmune diseases, highlighting a need for a preventive vaccine. At present, very few primary, circulating EBV genomes have been sequenced directly from infected individuals. While low levels of diversity and low viral evolution rates have been predicted for double-stranded DNA (dsDNA) viruses, recent studies have demonstrated appreciable diversity in common dsDNA pathogens (e.g., cytomegalovirus). Here, we report 40 full-length EBV genome sequences obtained from matched oral wash and B cell fractions from a cohort of 10 AIM patients. Both intra- and interpatient diversity were observed across the length of the entire viral genome. Diversity was most pronounced in viral genes required for establishing latent infection and persistence, with appreciable levels of diversity also detected in structural genes, including envelope glycoproteins. Interestingly, intrapatient diversity declined significantly over time ( P < 0.01), and this was particularly evident on comparison of viral genomes sequenced from B cell fractions in early primary infection and convalescence ( P < 0.001). B cell-associated viral genomes were observed to converge, becoming nearly identical to the B95.8 reference genome over time (Spearman rank-order correlation test; r = -0.5589, P = 0.0264). The reduction in diversity was most marked in the EBV latency genes. In summary, our data suggest independent convergence of diverse viral genome sequences toward a reference-like strain within a relatively short period following primary EBV infection. IMPORTANCE Identification of viral proteins with low variability and high immunogenicity is important for the development of a protective vaccine. Knowledge of genome diversity within circulating viral populations is a key step in this process, as is the expansion of intrahost genomic variation during infection. We report full-length EBV genomes sequenced from the blood and oral wash of 10 individuals early in primary infection and during convalescence. Our data demonstrate considerable diversity within the pool of circulating EBV strains, as well as within individual patients. Overall viral diversity decreased from early to persistent infection, particularly in latently infected B cells, which serve as the viral reservoir. Reduction in B cell-associated viral genome diversity coincided with a convergence toward a reference-like EBV genotype. Greater convergence positively correlated with time after infection, suggesting that the reference-like genome is the result of selection. Copyright © 2018 American Society for Microbiology.

  3. Differences in expression of retinal pigment epithelium mRNA between normal canines

    PubMed Central

    2004-01-01

    Abstract A reference database of differences in mRNA expression in normal healthy canine retinal pigment epithelium (RPE) has been established. This database identifies non-informative differences in mRNA expression that can be used in screening canine RPE for mutations associated with clinical effects on vision. Complementary DNA (cDNA) pools were prepared from mRNA harvested from RPE, amplified by PCR, and used in a subtractive hybridization protocol (representational differential analysis) to identify differences in RPE mRNA expression between canines. The effect of relatedness of the test canines on the frequency of occurrence of differences was evaluated by using 2 unrelated canines for comparison with 2 female sibling canines of blue heeler/bull terrier lineage. Differentially expressed cDNA species were cloned, sequenced, and identified by comparison to public database entries. The most frequently observed differentially expressed sequence from the unrelated canine comparison was cDNA with 21 base pairs (bp) identical to the human epithelial membrane protein 1 gene (present in 8 of 20 clones). Different clones from the same-sex sibling RPE contained repetitions of several short sequence motifs including the human epithelial membrane protein 1 (4 of 25 clones). Other prevalent differences between sibling RPE included sequences similar to a chicken genetic marker sequence motif (5 of 25), and 6 clones with homology to porcine major histocompatibility loci. In addition to identifying several repetitively occurring, noninformative, differentially expressed RPE mRNA species, the findings confirm that fewer differences occurred between siblings, highlighting the importance of using closely related subjects in representational difference analysis studies. PMID:15352545

  4. Base-resolution detection of N 4-methylcytosine in genomic DNA using 4mC-Tet-assisted-bisulfite-sequencing

    DOE PAGES

    Yu, Miao; Ji, Lexiang; Neumann, Drexel A.; ...

    2015-07-15

    Restriction-modification (R-M) systems pose a major barrier to DNA transformation and genetic engineering of bacterial species. Systematic identification of DNA methylation in R-M systems, including N 6-methyladenine (6mA), 5-methylcytosine (5mC) and N 4-methylcytosine (4mC), will enable strategies to make these species genetically tractable. Although single-molecule, real time (SMRT) sequencing technology is capable of detecting 4mC directly for any bacterial species regardless of whether an assembled genome exists or not, it is not as scalable to profiling hundreds to thousands of samples compared with the commonly used next-generation sequencing technologies. Here, we present 4mC-Tet-assisted bisulfite-sequencing (4mC-TAB-seq), a next-generation sequencing method thatmore » rapidly and cost efficiently reveals the genome-wide locations of 4mC for bacterial species with an available assembled reference genome. In 4mC-TAB-seq, both cytosines and 5mCs are read out as thymines, whereas only 4mCs are read out as cytosines, revealing their specific positions throughout the genome. We applied 4mC-TAB-seq to study the methylation of a member of the hyperthermophilc genus, Caldicellulosiruptor, in which 4mC-related restriction is a major barrier to DNA transformation from other species. Lastly, in combination with MethylC-seq, both 4mC- and 5mC-containing motifs are identified which can assist in rapid and efficient genetic engineering of these bacteria in the future.« less

  5. Diverse Applications of Environmental DNA Methods in Parasitology.

    PubMed

    Bass, David; Stentiford, Grant D; Littlewood, D T J; Hartikainen, Hanna

    2015-10-01

    Nucleic acid extraction and sequencing of genes from organisms within environmental samples encompasses a variety of techniques collectively referred to as environmental DNA or 'eDNA'. The key advantages of eDNA analysis include the detection of cryptic or otherwise elusive organisms, large-scale sampling with fewer biases than specimen-based methods, and generation of data for molecular systematics. These are particularly relevant for parasitology because parasites can be difficult to locate and are morphologically intractable and genetically divergent. However, parasites have rarely been the focus of eDNA studies. Focusing on eukaryote parasites, we review the increasing diversity of the 'eDNA toolbox'. Combining eDNA methods with complementary tools offers much potential to understand parasite communities, disease risk, and parasite roles in broader ecosystem processes such as food web structuring and community assembly. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.

  6. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses

    DOE PAGES

    Paez-Espino, David; Chen, I. -Min A.; Palaniappan, Krishna; ...

    2016-10-30

    Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from > 6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs aremore » grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparingwith external sequences, thus serving as an essential resource in the viral genomics community.« less

  7. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Paez-Espino, David; Chen, I. -Min A.; Palaniappan, Krishna

    Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from > 6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs aremore » grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparingwith external sequences, thus serving as an essential resource in the viral genomics community.« less

  8. DNA barcodes of the native ray-finned fishes in Taiwan.

    PubMed

    Chang, Chia-Hao; Shao, Kwang-Tsao; Lin, Han-Yang; Chiu, Yung-Chieh; Lee, Mao-Ying; Liu, Shih-Hui; Lin, Pai-Lei

    2017-07-01

    Species identification based on the DNA sequence of a fragment of the cytochrome c oxidase subunit I gene in the mitochondrial genome, DNA barcoding, is widely applied to assist in sustainable exploitation of fish resources and the protection of fish biodiversity. The aim of this study was to establish a reliable barcoding reference database of the native ray-finned fishes in Taiwan. A total of 2993 individuals, belonging to 1245 species within 637 genera, 184 families and 29 orders of ray-finned fishes and representing approximately 40% of the recorded ray-finned fishes in Taiwan, were PCR amplified at the barcode region and bidirectionally sequenced. The mean length of the 2993 barcodes is 549 bp. Mean congeneric K2P distance (15.24%) is approximately 10-fold higher than the mean conspecific one (1.51%), but approximately 1.4-fold less than the mean genetic distance between families (20.80%). The Barcode Index Number (BIN) discordance report shows that 2993 specimens represent 1275 BINs and, among them, 86 BINs are singletons, 570 BINs are taxonomically concordant, and the other 619 BINs are taxonomically discordant. Barcode gap analysis also revealed that more than 90% of the collected fishes in this study can be discriminated by DNA barcoding. Overall, the barcoding reference database established by this study reveals the need for taxonomic revisions and voucher specimen rechecks, in addition to assisting in the management of Taiwan's fish resources and diversity. © 2016 John Wiley & Sons Ltd.

  9. DNA barcoding of vouchered xylarium wood specimens of nine endangered Dalbergia species.

    PubMed

    Yu, Min; Jiao, Lichao; Guo, Juan; Wiedenhoeft, Alex C; He, Tuo; Jiang, Xiaomei; Yin, Yafang

    2017-12-01

    ITS2+ trnH - psbA was the best combination of DNA barcode to resolve the Dalbergia wood species studied. We demonstrate the feasibility of building a DNA barcode reference database using xylarium wood specimens. The increase in illegal logging and timber trade of CITES-listed tropical species necessitates the development of unambiguous identification methods at the species level. For these methods to be fully functional and deployable for law enforcement, they must work using wood or wood products. DNA barcoding of wood has been promoted as a promising tool for species identification; however, the main barrier to extensive application of DNA barcoding to wood is the lack of a comprehensive and reliable DNA reference library of barcodes from wood. In this study, xylarium wood specimens of nine Dalbergia species were selected from the Wood Collection of the Chinese Academy of Forestry and DNA was then extracted from them for further PCR amplification of eight potential DNA barcode sequences (ITS2, matK, trnL, trnH-psbA, trnV-trnM1, trnV-trnM2, trnC-petN, and trnS-trnG). The barcodes were tested singly and in combination for species-level discrimination ability by tree-based [neighbor-joining (NJ)] and distance-based (TaxonDNA) methods. We found that the discrimination ability of DNA barcodes in combination was higher than any single DNA marker among the Dalbergia species studied, with the best two-marker combination of ITS2+trnH-psbA analyzed with NJ trees performing the best (100% accuracy). These barcodes are relatively short regions (<350 bp) and amplification reactions were performed with high success (≥90%) using wood as the source material, a necessary factor to apply DNA barcoding to timber trade. The present results demonstrate the feasibility of using vouchered xylarium specimens to build DNA barcoding reference databases.

  10. Quantitative Viral Community DNA Analysis Reveals the Dominance of Single-Stranded DNA Viruses in Offshore Upper Bathyal Sediment from Tohoku, Japan

    PubMed Central

    Yoshida, Mitsuhiro; Mochizuki, Tomohiro; Urayama, Syun-Ichi; Yoshida-Takashima, Yukari; Nishi, Shinro; Hirai, Miho; Nomaki, Hidetaka; Takaki, Yoshihiro; Nunoura, Takuro; Takai, Ken

    2018-01-01

    Previous studies on marine environmental virology have primarily focused on double-stranded DNA (dsDNA) viruses; however, it has recently been suggested that single-stranded DNA (ssDNA) viruses are more abundant in marine ecosystems. In this study, we performed a quantitative viral community DNA analysis to estimate the relative abundance and composition of both ssDNA and dsDNA viruses in offshore upper bathyal sediment from Tohoku, Japan (water depth = 500 m). The estimated dsDNA viral abundance ranged from 3 × 106 to 5 × 106 genome copies per cm3 sediment, showing values similar to the range of fluorescence-based direct virus counts. In contrast, the estimated ssDNA viral abundance ranged from 1 × 108 to 3 × 109 genome copies per cm3 sediment, thus providing an estimation that the ssDNA viral populations represent 96.3–99.8% of the benthic total DNA viral assemblages. In the ssDNA viral metagenome, most of the identified viral sequences were associated with ssDNA viral families such as Circoviridae and Microviridae. The principle components analysis of the ssDNA viral sequence components from the sedimentary ssDNA viral metagenomic libraries found that the different depth viral communities at the study site all exhibited similar profiles compared with deep-sea sediment ones at other reference sites. Our results suggested that deep-sea benthic ssDNA viruses have been significantly underestimated by conventional direct virus counts and that their contributions to deep-sea benthic microbial mortality and geochemical cycles should be further addressed by such a new quantitative approach. PMID:29467725

  11. An integrated pipeline for next generation sequencing and annotation of the complete mitochondrial genome of the giant intestinal fluke, Fasciolopsis buski (Lankester, 1857) Looss, 1899

    PubMed Central

    Biswal, Devendra Kumar; Ghatani, Sudeep; Shylla, Jollin A.; Sahu, Ranjana; Mullapudi, Nandita

    2013-01-01

    Helminths include both parasitic nematodes (roundworms) and platyhelminths (trematode and cestode flatworms) that are abundant, and are of clinical importance. The genetic characterization of parasitic flatworms using advanced molecular tools is central to the diagnosis and control of infections. Although the nuclear genome houses suitable genetic markers (e.g., in ribosomal (r) DNA) for species identification and molecular characterization, the mitochondrial (mt) genome consistently provides a rich source of novel markers for informative systematics and epidemiological studies. In the last decade, there have been some important advances in mtDNA genomics of helminths, especially lung flukes, liver flukes and intestinal flukes. Fasciolopsis buski, often called the giant intestinal fluke, is one of the largest digenean trematodes infecting humans and found primarily in Asia, in particular the Indian subcontinent. Next-generation sequencing (NGS) technologies now provide opportunities for high throughput sequencing, assembly and annotation within a short span of time. Herein, we describe a high-throughput sequencing and bioinformatics pipeline for mt genomics for F. buski that emphasizes the utility of short read NGS platforms such as Ion Torrent and Illumina in successfully sequencing and assembling the mt genome using innovative approaches for PCR primer design as well as assembly. We took advantage of our NGS whole genome sequence data (unpublished so far) for F. buski and its comparison with available data for the Fasciola hepatica mtDNA as the reference genome for design of precise and specific primers for amplification of mt genome sequences from F. buski. A long-range PCR was carried out to create an NGS library enriched in mt DNA sequences. Two different NGS platforms were employed for complete sequencing, assembly and annotation of the F. buski mt genome. The complete mt genome sequences of the intestinal fluke comprise 14,118 bp and is thus the shortest trematode mitochondrial genome sequenced to date. The noncoding control regions are separated into two parts by the tRNA-Gly gene and don’t contain either tandem repeats or secondary structures, which are typical for trematode control regions. The gene content and arrangement are identical to that of F. hepatica. The F. buski mtDNA genome has a close resemblance with F. hepatica and has a similar gene order tallying with that of other trematodes. The mtDNA for the intestinal fluke is reported herein for the first time by our group that would help investigate Fasciolidae taxonomy and systematics with the aid of mtDNA NGS data. More so, it would serve as a resource for comparative mitochondrial genomics and systematic studies of trematode parasites. PMID:24255820

  12. Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver.

    PubMed

    Wymant, Chris; Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ong, Swee Hoe; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Berkhout, Ben; Cornelissen, Marion; Kellam, Paul; Reiss, Peter; Fraser, Christophe

    2018-01-01

    Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver.

  13. DNA Barcoding analysis of seafood accuracy in Washington, D.C. restaurants

    PubMed Central

    Stern, David B.; Castro Nallar, Eduardo; Rathod, Jason

    2017-01-01

    In Washington D.C., recent legislation authorizes citizens to test if products are properly represented and, if they are not, to bring a lawsuit for the benefit of the general public. Recent studies revealing the widespread phenomenon of seafood substitution across the United States make it a fertile area for consumer protection testing. DNA barcoding provides an accurate and cost-effective way to perform these tests, especially when tissue alone is available making species identification based on morphology impossible. In this study, we sequenced the 5′ barcoding region of the Cytochrome Oxidase I gene for 12 samples of vertebrate and invertebrate food items across six restaurants in Washington, D.C. and used multiple analytical methods to make identifications. These samples included several ambiguous menu listings, sequences with little genetic variation among closely related species and one sequence with no available reference sequence. Despite these challenges, we were able to make identifications for all samples and found that 33% were potentially mislabeled. While we found a high degree of mislabeling, the errors involved closely related species and we did not identify egregious substitutions as have been found in other cities. This study highlights the efficacy of DNA barcoding and robust analyses in identifying seafood items for consumer protection. PMID:28462038

  14. MULTIPLE ENZYME RESTRICTION FRAGMENT LENGTH POLYMORPHISM ANALYSIS FOR HIGH RESOLUTION DISTINCTION OF PSEUDOMONAS (SENSU STRICTO) 16S RRNA GENES

    EPA Science Inventory

    Pseudomonas specific 16S rDNA PCR amplification and multiple enzyme restriction fragment length polymorphism (MERFLP) analysis using a single digestion mixture of Alu I, Hinf I, Rsa I, and Tru 9I distinguished 150 published sequences and reference strains of authentic Pseudomonas...

  15. RefSeq microbial genomes database: new representation and annotation strategy.

    PubMed

    Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O'Neill, Kathleen; Tolstoy, Igor

    2014-01-01

    The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.

  16. Genetic signature analysis of Perkinsus marinus in Mexico suggests possible translocation from the Atlantic Ocean to the Pacific coast of Mexico.

    PubMed

    Ek-Huchim, Juan Pablo; Aguirre-Macedo, Ma Leopoldina; Améndola-Pimenta, Monica; Vidal-Martínez, Victor Manuel; Pérez-Vega, Juan Antonio; Simá-Alvarez, Raúl; Jiménez-García, Isabel; Zamora-Bustillos, Roberto; Rodríguez-Canul, Rossanna

    2017-08-02

    The protozoan Perkinsus marinus (Mackin, Owen & Collier) Levine, 1978 causes perkinsosis in the American oyster Crassostrea virginica Gmelin, 1791. This pathogen is present in cultured C. virginica from the Gulf of Mexico and has been reported recently in Saccostrea palmula (Carpenter, 1857), Crassostrea corteziensis (Hertlein, 1951) and Crassostrea gigas (Thunberg, 1793) from the Mexican Pacific coast. Transportation of fresh oysters for human consumption and repopulation could be implicated in the transmission and dissemination of this parasite across the Mexican Pacific coast. The aim of this study was two-fold. First, we evaluated the P. marinus infection parameters by PCR and RFTM (Ray's fluid thioglycollate medium) in C. virginica from four major lagoons (Términos Lagoon, Campeche; Carmen-Pajonal-Machona Lagoon complex, Tabasco; Mandinga Lagoon, Veracruz; and La Pesca Lagoon, Tamaulipas) from the Gulf of Mexico. Secondly, we used DNA sequence analyses of the ribosomal non-transcribed spacer (rNTS) region of P. marinus to determine the possible translocation of this species from the Gulf of Mexico to the Mexican Pacific coast. Perkinsus marinus prevalence by PCR was 57.7% (338 out of 586 oysters) and 38.2% (224 out of 586 oysters) by RFTM. The highest prevalence was observed in the Carmen-Pajonal-Machona Lagoon complex in the state of Tabasco (73% by PCR and 58% by RFTM) and the estimated weighted prevalence (WP) was less than 1.0 in the four lagoons. Ten unique rDNA-NTS sequences of P. marinus [termed herein the "P. marinus (Pm) haplotype"] were identified in the Gulf of Mexico sample. They shared 96-100% similarity with 18 rDNA-NTS sequences from the GenBank database which were derived from 16 Mexican Pacific coast infections and two sequences from the USA. The phylogenetic tree and the haplotype network showed that the P. marinus rDNA-NTS sequences from Mexico were distant from the rDNA-NTS sequences of P. marinus reported from the USA. The ten rDNA-NTS sequences described herein were restricted to specific locations displaying different geographical connections within the Gulf of Mexico; the Carmen-Pajonal-Machona Pm1 haplotype from the state of Tabasco shared a cluster with P. marinus isolates reported from the Mexican Pacific coast. The rDNA-NTS sequences of P. marinus from the state of Tabasco shared high similarity with the reference rDNA-NTS sequences from the Mexican Pacific coast. The high similarity suggests a transfer of oysters infected with P. marinus from the Mexican part of the Gulf of Mexico into the Mexican Pacific coast.

  17. Effects of different preservation methods on inter simple sequence repeat (ISSR) and random amplified polymorphic DNA (RAPD) molecular markers in botanic samples.

    PubMed

    Wang, Xiaolong; Li, Lin; Zhao, Jiaxin; Li, Fangliang; Guo, Wei; Chen, Xia

    2017-04-01

    To evaluate the effects of different preservation methods (stored in a -20°C ice chest, preserved in liquid nitrogen and dried in silica gel) on inter simple sequence repeat (ISSR) or random amplified polymorphic DNA (RAPD) analyses in various botanical specimens (including broad-leaved plants, needle-leaved plants and succulent plants) for different times (three weeks and three years), we used a statistical analysis based on the number of bands, genetic index and cluster analysis. The results demonstrate that methods used to preserve samples can provide sufficient amounts of genomic DNA for ISSR and RAPD analyses; however, the effect of different preservation methods on these analyses vary significantly, and the preservation time has little effect on these analyses. Our results provide a reference for researchers to select the most suitable preservation method depending on their study subject for the analysis of molecular markers based on genomic DNA. Copyright © 2017 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.

  18. Are commercial providers a viable option for clinical bacterial sequencing?

    PubMed

    Raven, Kathy; Blane, Beth; Churcher, Carol; Parkhill, Julian; Peacock, Sharon J

    2018-04-05

    Bacterial whole-genome sequencing in the clinical setting has the potential to bring major improvements to infection control and clinical practice. Sequencing instruments are not currently available in the majority of routine microbiology laboratories worldwide, but an alternative is to use external sequencing providers. To foster discussion around this we investigated whether send-out services were a viable option. Four providers offering MiSeq sequencing were selected based on cost and evaluated based on the service provided and sequence data quality. DNA was prepared from five methicillin-resistant Staphylococcus aureus (MRSA) isolates, four of which were investigated during a previously published outbreak in the UK together with a reference MRSA isolate (ST22 HO 5096 0412). Cost of sequencing per isolate ranged from £155 to £342 and turnaround times from DNA postage to arrival of sequence data ranged from 12 to 63 days. Comparison of commercially generated genomes against the original sequence data demonstrated very high concordance, with no more than one single nucleotide polymorphism (SNP) difference on core genome mapping between the original sequences and the new sequence for all four providers. Multilocus sequence type could not be assigned based on assembly for the two cheapest sequence providers due to fragmented assemblies probably caused by a lower output of sequence data per isolate. Our results indicate that external providers returned highly accurate genome data, but that improvements are required in turnaround time to make this a viable option for use in clinical practice.

  19. Analyses of Methylomes Derived from Meso-American Common Bean (Phaseolus vulgaris L.) Using MeDIP-Seq and Whole Genome Sodium Bisulfite-Sequencing.

    PubMed

    Crampton, Mollee; Sripathi, Venkateswara R; Hossain, Khwaja; Kalavacharla, Venu

    2016-01-01

    Common bean (Phaseolus vulgaris L.) is economically important for its high protein, fiber, and micronutrient contents, with a relatively small genome size of ∼587 Mb. Common bean is genetically diverse with two major gene pools, Meso-American and Andean. The phenotypic variability within common bean is partly attributed to the genetic diversity and epigenetic changes that are largely influenced by environmental factors. It is well established that an important epigenetic regulator of gene expression is DNA methylation. Here, we present results generated from two high-throughput sequencing technologies, methylated DNA immunoprecipitation-sequencing (MeDIP-seq) and whole genome bisulfite-sequencing (BS-Seq). Our analyses revealed that this Meso-American common bean displays similar methylation patterns as other previously published plant methylomes, with CG ∼50%, CHG ∼30%, and CHH ∼2.7% methylation, however, these differ from the common bean reference methylome of Andean origin. We identified higher CG methylation levels in both promoter and genic regions than CHG and CHH contexts. Moreover, we found relatively higher CG methylation levels in genes than in promoters. Conversely, the CHG and CHH methylation levels were highest in promoters than in genes. This is the first genome-wide DNA methylation profiling study in a Meso-American common bean cultivar ("Sierra") using NGS approaches. Our long-term goal is to generate genome-wide epigenomic maps in common bean focusing on chromatin accessibility, histone modifications, and DNA methylation.

  20. Analyses of Methylomes Derived from Meso-American Common Bean (Phaseolus vulgaris L.) Using MeDIP-Seq and Whole Genome Sodium Bisulfite-Sequencing

    PubMed Central

    Crampton, Mollee; Sripathi, Venkateswara R.; Hossain, Khwaja; Kalavacharla, Venu

    2016-01-01

    Common bean (Phaseolus vulgaris L.) is economically important for its high protein, fiber, and micronutrient contents, with a relatively small genome size of ∼587 Mb. Common bean is genetically diverse with two major gene pools, Meso-American and Andean. The phenotypic variability within common bean is partly attributed to the genetic diversity and epigenetic changes that are largely influenced by environmental factors. It is well established that an important epigenetic regulator of gene expression is DNA methylation. Here, we present results generated from two high-throughput sequencing technologies, methylated DNA immunoprecipitation-sequencing (MeDIP-seq) and whole genome bisulfite-sequencing (BS-Seq). Our analyses revealed that this Meso-American common bean displays similar methylation patterns as other previously published plant methylomes, with CG ∼50%, CHG ∼30%, and CHH ∼2.7% methylation, however, these differ from the common bean reference methylome of Andean origin. We identified higher CG methylation levels in both promoter and genic regions than CHG and CHH contexts. Moreover, we found relatively higher CG methylation levels in genes than in promoters. Conversely, the CHG and CHH methylation levels were highest in promoters than in genes. This is the first genome-wide DNA methylation profiling study in a Meso-American common bean cultivar (“Sierra”) using NGS approaches. Our long-term goal is to generate genome-wide epigenomic maps in common bean focusing on chromatin accessibility, histone modifications, and DNA methylation. PMID:27199997

  1. MinION Analysis and Reference Consortium: Phase 1 data release and analysis

    PubMed Central

    Eccles, David A.; Zalunin, Vadim; Urban, John M.; Piazza, Paolo; Bowden, Rory J.; Paten, Benedict; Mwaigwisya, Solomon; Batty, Elizabeth M.; Simpson, Jared T.; Snutch, Terrance P.

    2015-01-01

    The advent of a miniaturized DNA sequencing device with a high-throughput contextual sequencing capability embodies the next generation of large scale sequencing tools. The MinION™ Access Programme (MAP) was initiated by Oxford Nanopore Technologies™ in April 2014, giving public access to their USB-attached miniature sequencing device. The MinION Analysis and Reference Consortium (MARC) was formed by a subset of MAP participants, with the aim of evaluating and providing standard protocols and reference data to the community. Envisaged as a multi-phased project, this study provides the global community with the Phase 1 data from MARC, where the reproducibility of the performance of the MinION was evaluated at multiple sites. Five laboratories on two continents generated data using a control strain of Escherichia coli K-12, preparing and sequencing samples according to a revised ONT protocol. Here, we provide the details of the protocol used, along with a preliminary analysis of the characteristics of typical runs including the consistency, rate, volume and quality of data produced. Further analysis of the Phase 1 data presented here, and additional experiments in Phase 2 of E. coli from MARC are already underway to identify ways to improve and enhance MinION performance. PMID:26834992

  2. Validation of the (GTG)(5)-rep-PCR fingerprinting technique for rapid classification and identification of acetic acid bacteria, with a focus on isolates from Ghanaian fermented cocoa beans.

    PubMed

    De Vuyst, Luc; Camu, Nicholas; De Winter, Tom; Vandemeulebroecke, Katrien; Van de Perre, Vincent; Vancanneyt, Marc; De Vos, Paul; Cleenwerck, Ilse

    2008-06-30

    Amplification of repetitive bacterial DNA elements through the polymerase chain reaction (rep-PCR fingerprinting) using the (GTG)(5) primer, referred to as (GTG)(5)-PCR fingerprinting, was found a promising genotypic tool for rapid and reliable speciation of acetic acid bacteria (AAB). The method was evaluated with 64 AAB reference strains, including 31 type strains, and 132 isolates from Ghanaian, fermented cocoa beans, and was validated with DNA:DNA hybridization data. Most reference strains, except for example all Acetobacter indonesiensis strains and Gluconacetobacter liquefaciens LMG 1509, grouped according to their species designation, indicating the usefulness of this technique for identification to the species level. Moreover, exclusive patterns were obtained for most strains, suggesting that the technique can also be used for characterization below species level or typing of AAB strains. The (GTG)(5)-PCR fingerprinting allowed us to differentiate four major clusters among the fermented cocoa bean isolates, namely A. pasteurianus (cluster I, 100 isolates), A. syzygii- or A. lovaniensis-like (cluster II, 23 isolates), and A. tropicalis-like (clusters III and IV containing 4 and 5 isolates, respectively). A. syzygii-like and A. tropicalis-like strains from cocoa bean fermentations were reported for the first time. Validation of the method and indications for reclassifications of AAB species and existence of new Acetobacter species were obtained through 16S rRNA sequencing analyses and DNA:DNA hybridizations. Reclassifications refer to A. aceti LMG 1531, Ga. xylinus LMG 1518, and Ga. xylinus subsp. sucrofermentans LMG 18788(T).

  3. LINE-1 Elements in Structural Variation and Disease

    PubMed Central

    Beck, Christine R.; Garcia-Perez, José Luis; Badge, Richard M.; Moran, John V.

    2014-01-01

    The completion of the human genome reference sequence ushered in a new era for the study and discovery of human transposable elements. It now is undeniable that transposable elements, historically dismissed as junk DNA, have had an instrumental role in sculpting the structure and function of our genomes. In particular, long interspersed element-1 (LINE-1 or L1) and short interspersed elements (SINEs) continue to affect our genome, and their movement can lead to sporadic cases of disease. Here, we briefly review the types of transposable elements present in the human genome and their mechanisms of mobility. We next highlight how advances in DNA sequencing and genomic technologies have enabled the discovery of novel retrotransposons in individual genomes. Finally, we discuss how L1-mediated retrotransposition events impact human genomes. PMID:21801021

  4. Molecular and morphologic data reveal multiple species in Peromyscus pectoralis

    PubMed Central

    Bradley, Robert D.; Schmidly, David J.; Amman, Brian R.; Platt, Roy N.; Neumann, Kathy M.; Huynh, Howard M.; Muñiz-Martínez, Raúl; López-González, Celia; Ordóñez-Garza, Nicté

    2015-01-01

    DNA sequence and morphometric data were used to re-evaluate the taxonomy and systematics of Peromyscus pectoralis. Phylogenetic analyses (maximum likelihood and Bayesian inference) of DNA sequences from the mitochondrial cytochrome-b gene in 44 samples of P. pectoralis indicated 2 well-supported monophyletic clades. The 1st clade contained specimens from Texas historically assigned to P. p. laceianus; the 2nd was comprised of specimens previously referable to P. p. collinus, P. p. laceianus, and P. p. pectoralis obtained from northern and eastern Mexico. Levels of genetic variation (~7%) between these 2 clades indicated that the genetic divergence typically exceeded that reported for other species of Peromyscus. Samples of P. p. laceianus north and south of the Río Grande were not monophyletic. In addition, samples representing P. p. collinus and P. p. pectoralis formed 2 clades that differed genetically by 7.14%. Multivariate analyses of external and cranial measurements from 63 populations of P. pectoralis revealed 4 morpho-groups consistent with clades in the DNA sequence analysis: 1 from Texas and New Mexico assignable to P. p. laceianus; a 2nd from western and southern Mexico assignable to P. p. pectoralis; a 3rd from northern and central Mexico previously assigned to P. p. pectoralis but herein shown to represent an undescribed taxon; and a 4th from southeastern Mexico assignable to P. p. collinus. Based on the concordance of these results, populations from the United States are referred to as P. laceianus, whereas populations from Mexico are referred to as P. pectoralis (including some samples historically assigned to P. p. collinus, P. p. laceianus, and P. p. pectoralis). A new subspecies is described to represent populations south of the Río Grande in northern and central Mexico. Additional research is needed to discern if P. p. collinus warrants species recognition. PMID:26937045

  5. Differential mitochondrial DNA and gene expression in inherited retinal dysplasia in miniature Schnauzer dogs.

    PubMed

    Appleyard, Greg D; Forsyth, George W; Kiehlbauch, Laura M; Sigfrid, Kristen N; Hanik, Heather L J; Quon, Anita; Loewen, Matthew E; Grahn, Bruce H

    2006-05-01

    To investigate the molecular basis of inherited retinal dysplasia in miniature Schnauzers. Retina and retinal pigment epithelial tissues were collected from canine subjects at the age of 3 weeks. Total RNA isolated from these tissues was reverse transcribed to make representative cDNA pools that were compared for differences in gene expression by using a subtractive hybridization technique referred to as representational difference analysis (RDA). Expression differences identified by RDA were confirmed and quantified by real-time reverse-transcription PCR. Mitochondrial morphology from leukocytes and skeletal muscle of normal and affected miniature Schnauzers was examined by transmission electron microscopy. RDA screening of retinal pigment epithelial cDNA identified differences in mRNA transcript coding for two mitochondrial (mt) proteins--cytochrome oxidase subunit 1 and NADH dehydrogenase subunit 6--in affected dogs. Contrary to expectations, these identified sequences did not contain mutations. Based on the implication of mt-DNA-encoded proteins by the RDA experiments we used real-time PCR to compare the relative amounts of mt-DNA template in white blood cells from normal and affected dogs. White blood cells of affected dogs contained less than 30% of the normal amount of two specific mtDNA sequences, compared with the content of the nuclear-encoded glyceraldehyde-3-phosphate dehydrogenase (GA-3-PDH) reference gene. Retina and RPE tissue from affected dogs had reduced mRNA transcript levels for the two mitochondrial genes detected in the RDA experiment. Transcript levels for another mtDNA-encoded gene as well as the nuclear-encoded mitochondrial Tfam transcription factor were reduced in these tissues in affected dogs. Mitochondria from affected dogs were reduced in number and size and were unusually electron dense. Reduced levels of nuclear and mitochondrial transcripts in the retina and RPE of miniature Schnauzers affected with retinal dysplasia suggest that the pathogenesis of the disorder may arise from a lowered energy supply to the retina and RPE.

  6. Technical Report: Algorithm and Implementation for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McLoughlin, Kevin

    2016-01-11

    This report describes the design and implementation of an algorithm for estimating relative microbial abundances, together with confidence limits, using data from metagenomic DNA sequencing. For the background behind this project and a detailed discussion of our modeling approach for metagenomic data, we refer the reader to our earlier technical report, dated March 4, 2014. Briefly, we described a fully Bayesian generative model for paired-end sequence read data, incorporating the effects of the relative abundances, the distribution of sequence fragment lengths, fragment position bias, sequencing errors and variations between the sampled genomes and the nearest reference genomes. A distinctive featuremore » of our modeling approach is the use of a Chinese restaurant process (CRP) to describe the selection of genomes to be sampled, and thus the relative abundances. The CRP component is desirable for fitting abundances to reads that may map ambiguously to multiple targets, because it naturally leads to sparse solutions that select the best representative from each set of nearly equivalent genomes.« less

  7. Scar-less multi-part DNA assembly design automation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hillson, Nathan J.

    The present invention provides a method of a method of designing an implementation of a DNA assembly. In an exemplary embodiment, the method includes (1) receiving a list of DNA sequence fragments to be assembled together and an order in which to assemble the DNA sequence fragments, (2) designing DNA oligonucleotides (oligos) for each of the DNA sequence fragments, and (3) creating a plan for adding flanking homology sequences to each of the DNA oligos. In an exemplary embodiment, the method includes (1) receiving a list of DNA sequence fragments to be assembled together and an order in which tomore » assemble the DNA sequence fragments, (2) designing DNA oligonucleotides (oligos) for each of the DNA sequence fragments, and (3) creating a plan for adding optimized overhang sequences to each of the DNA oligos.« less

  8. Machine Learned Replacement of N-Labels for Basecalled Sequences in DNA Barcoding.

    PubMed

    Ma, Eddie Y T; Ratnasingham, Sujeevan; Kremer, Stefan C

    2018-01-01

    This study presents a machine learning method that increases the number of identified bases in Sanger Sequencing. The system post-processes a KB basecalled chromatogram. It selects a recoverable subset of N-labels in the KB-called chromatogram to replace with basecalls (A,C,G,T). An N-label correction is defined given an additional read of the same sequence, and a human finished sequence. Corrections are added to the dataset when an alignment determines the additional read and human agree on the identity of the N-label. KB must also rate the replacement with quality value of in the additional read. Corrections are only available during system training. Developing the system, nearly 850,000 N-labels are obtained from Barcode of Life Datasystems, the premier database of genetic markers called DNA Barcodes. Increasing the number of correct bases improves reference sequence reliability, increases sequence identification accuracy, and assures analysis correctness. Keeping with barcoding standards, our system maintains an error rate of percent. Our system only applies corrections when it estimates low rate of error. Tested on this data, our automation selects and recovers: 79 percent of N-labels from COI (animal barcode); 80 percent from matK and rbcL (plant barcodes); and 58 percent from non-protein-coding sequences (across eukaryotes).

  9. Simple methodology to directly genotype Trypanosoma cruzi discrete typing units in single and mixed infections from human blood samples.

    PubMed

    Bontempi, Iván A; Bizai, María L; Ortiz, Sylvia; Manattini, Silvia; Fabbro, Diana; Solari, Aldo; Diez, Cristina

    2016-09-01

    Different DNA markers to genotype Trypanosoma cruzi are now available. However, due to the low quantity of parasites present in biological samples, DNA markers with high copy number like kinetoplast minicircles are needed. The aim of this study was to complete a DNA assay called minicircle lineage specific-PCR (MLS-PCR) previously developed to genotype the T. cruzi DTUs TcV and TcVI, in order to genotype DTUs TcI and TcII and to improve TcVI detection. We screened kinetoplast minicircle hypervariable sequences from cloned PCR products from reference strains belonging to the mentioned DTUs using specific kDNA probes. With the four highly specific sequences selected, we designed primers to be used in the MLS-PCR to directly genotype T. cruzi from biological samples. High specificity and sensitivity were obtained when we evaluated the new approach for TcI, TcII, TcV and TcVI genotyping in twenty two T. cruzi reference strains. Afterward, we compared it with hybridization tests using specific kDNA probes in 32 blood samples from chronic chagasic patients from North Eastern Argentina. With both tests we were able to genotype 94% of the samples and the concordance between them was very good (kappa=0.855). The most frequent T. cruzi DTUs detected were TcV and TcVI, followed by TcII and much lower TcI. A unique T. cruzi DTU was detected in 18 samples meantime more than one in the remaining; being TcV and TcVI the most frequent association. A high percentage of mixed detections were obtained with both assays and its impact was discussed. Copyright © 2016 Elsevier B.V. All rights reserved.

  10. Dynamic DNA cytosine methylation in the Populus trichocarpa genome: tissue-level variation and relationship to gene expression

    PubMed Central

    2012-01-01

    Background DNA cytosine methylation is an epigenetic modification that has been implicated in many biological processes. However, large-scale epigenomic studies have been applied to very few plant species, and variability in methylation among specialized tissues and its relationship to gene expression is poorly understood. Results We surveyed DNA methylation from seven distinct tissue types (vegetative bud, male inflorescence [catkin], female catkin, leaf, root, xylem, phloem) in the reference tree species black cottonwood (Populus trichocarpa). Using 5-methyl-cytosine DNA immunoprecipitation followed by Illumina sequencing (MeDIP-seq), we mapped a total of 129,360,151 36- or 32-mer reads to the P. trichocarpa reference genome. We validated MeDIP-seq results by bisulfite sequencing, and compared methylation and gene expression using published microarray data. Qualitative DNA methylation differences among tissues were obvious on a chromosome scale. Methylated genes had lower expression than unmethylated genes, but genes with methylation in transcribed regions ("gene body methylation") had even lower expression than genes with promoter methylation. Promoter methylation was more frequent than gene body methylation in all tissues except male catkins. Male catkins differed in demethylation of particular transposable element categories, in level of gene body methylation, and in expression range of genes with methylated transcribed regions. Tissue-specific gene expression patterns were correlated with both gene body and promoter methylation. Conclusions We found striking differences among tissues in methylation, which were apparent at the chromosomal scale and when genes and transposable elements were examined. In contrast to other studies in plants, gene body methylation had a more repressive effect on transcription than promoter methylation. PMID:22251412

  11. Genetic analysis of Fasciola isolates from cattle in Korea based on second internal transcribed spacer (ITS-2) sequence of nuclear ribosomal DNA.

    PubMed

    Choe, Se-Eun; Nguyen, Thuy Thi-Dieu; Kang, Tae-Gyu; Kweon, Chang-Hee; Kang, Seung-Won

    2011-09-01

    Nuclear ribosomal DNA sequence of the second internal transcribed spacer (ITS-2) has been used efficiently to identify the liver fluke species collected from different hosts and various geographic regions. ITS-2 sequences of 19 Fasciola samples collected from Korean native cattle were determined and compared. Sequence comparison including ITS-2 sequences of isolates from this study and reference sequences from Fasciola hepatica and Fasciola gigantica and intermediate Fasciola in Genbank revealed seven identical variable sites of investigated isolates. Among 19 samples, 12 individuals had ITS-2 sequences completely identical to that of pure F. hepatica, five possessed the sequences identical to F. gigantica type, whereas two shared the sequence of both F. hepatica and F. gigantica. No variations in length and nucleotide composition of ITS-2 sequence were observed within isolates that belonged to F. hepatica or F. gigantica. At the position of 218, five Fasciola containing a single-base substitution (C>T) formed a distinct branch inside the F. gigantica-type group which was similar to those of Asian-origin isolates. The phylogenetic tree of the Fasciola spp. based on complete ITS-2 sequences from this study and other representative isolates in different locations clearly showed that pure F. hepatica, F. gigantica type and intermediate Fasciola were observed. The result also provided additional genetic evidence for the existence of three forms of Fasciola isolated from native cattle in Korea by genetic approach using ITS-2 sequence.

  12. Identification of species based on DNA barcode using k-mer feature vector and Random forest classifier.

    PubMed

    Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, A R

    2016-11-05

    DNA barcoding is a molecular diagnostic method that allows automated and accurate identification of species based on a short and standardized fragment of DNA. To this end, an attempt has been made in this study to develop a computational approach for identifying the species by comparing its barcode with the barcode sequence of known species present in the reference library. Each barcode sequence was first mapped onto a numeric feature vector based on k-mer frequencies and then Random forest methodology was employed on the transformed dataset for species identification. The proposed approach outperformed similarity-based, tree-based, diagnostic-based approaches and found comparable with existing supervised learning based approaches in terms of species identification success rate, while compared using real and simulated datasets. Based on the proposed approach, an online web interface SPIDBAR has also been developed and made freely available at http://cabgrid.res.in:8080/spidbar/ for species identification by the taxonomists. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Genome-Wide Structural Variation Detection by Genome Mapping on Nanochannel Arrays.

    PubMed

    Mak, Angel C Y; Lai, Yvonne Y Y; Lam, Ernest T; Kwok, Tsz-Piu; Leung, Alden K Y; Poon, Annie; Mostovoy, Yulia; Hastie, Alex R; Stedman, William; Anantharaman, Thomas; Andrews, Warren; Zhou, Xiang; Pang, Andy W C; Dai, Heng; Chu, Catherine; Lin, Chin; Wu, Jacob J K; Li, Catherine M L; Li, Jing-Woei; Yim, Aldrin K Y; Chan, Saki; Sibert, Justin; Džakula, Željko; Cao, Han; Yiu, Siu-Ming; Chan, Ting-Fung; Yip, Kevin Y; Xiao, Ming; Kwok, Pui-Yan

    2016-01-01

    Comprehensive whole-genome structural variation detection is challenging with current approaches. With diploid cells as DNA source and the presence of numerous repetitive elements, short-read DNA sequencing cannot be used to detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against that derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation. Copyright © 2016 by the Genetics Society of America.

  14. Current status and future perspectives on molecular and serological methods in diagnostic mycology.

    PubMed

    Lau, Anna; Chen, Sharon; Sleiman, Sue; Sorrell, Tania

    2009-11-01

    Invasive fungal infections are an important cause of infectious morbidity. Nonculture-based methods are increasingly used for rapid, accurate diagnosis to improve patient outcomes. New and existing DNA amplification platforms have high sensitivity and specificity for direct detection and identification of fungi in clinical specimens. Since laboratories are increasingly reliant on DNA sequencing for fungal identification, measures to improve sequence interpretation should support validation of reference isolates and quality control in public gene repositories. Novel technologies (e.g., isothermal and PNA FISH methods), platforms enabling high-throughput analyses (e.g., DNA microarrays and Luminex xMAP) and/or commercial PCR assays warrant further evaluation for routine diagnostic use. Notwithstanding the advantages of molecular tests, serological assays remain clinically useful for patient management. The serum Aspergillus galactomannan test has been incorporated into diagnostic algorithms of invasive aspergillosis. Both the galactomannan and the serum beta-D-glucan test have value for diagnosing infection and monitoring therapeutic response.

  15. DNA Microarray for Detection of Macrolide Resistance Genes

    PubMed Central

    Cassone, Marco; D'Andrea, Marco M.; Iannelli, Francesco; Oggioni, Marco R.; Rossolini, Gian Maria; Pozzi, Gianni

    2006-01-01

    A DNA microarray was developed to detect bacterial genes conferring resistance to macrolides and related antibiotics. A database containing 65 nonredundant genes selected from publicly available DNA sequences was constructed and used to design 100 oligonucleotide probes that could specifically detect and discriminate all 65 genes. Probes were spotted on a glass slide, and the array was reacted with DNA templates extracted from 20 reference strains of eight different bacterial species (Streptococcus pneumoniae, Streptococcus pyogenes, Enterococcus faecalis, Enterococcus faecium, Staphylococcus aureus, Staphylococcus haemolyticus, Escherichia coli, and Bacteroides fragilis) known to harbor 29 different macrolide resistance genes. Hybridization results showed that probes reacted with, and only with, the expected DNA templates and allowed discovery of three unexpected genes, including msr(SA) in B. fragilis, an efflux gene that has not yet been described for gram-negative bacteria. PMID:16723563

  16. A reference human genome dataset of the BGISEQ-500 sequencer.

    PubMed

    Huang, Jie; Liang, Xinming; Xuan, Yuankai; Geng, Chunyu; Li, Yuxiang; Lu, Haorong; Qu, Shoufang; Mei, Xianglin; Chen, Hongbo; Yu, Ting; Sun, Nan; Rao, Junhua; Wang, Jiahao; Zhang, Wenwei; Chen, Ying; Liao, Sha; Jiang, Hui; Liu, Xin; Yang, Zhaopeng; Mu, Feng; Gao, Shangxian

    2017-05-01

    BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinational probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated the accuracy of the variations, and compared to that of the variations identified from similar amounts of publicly available HiSeq2500 data. We found similar single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) compared to the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%) better SNP detection accuracy than the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). But for insertions and deletions (indels), we found lower accuracy for BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50 respectively, sensitivity = 88.52% and 70.93%) than the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform. © The Authors 2017. Published by Oxford University Press.

  17. Bridging two scholarly islands enriches both: COI DNA barcodes for species identification versus human mitochondrial variation for the study of migrations and pathologies.

    PubMed

    Thaler, David S; Stoeckle, Mark Y

    2016-10-01

    DNA barcodes for species identification and the analysis of human mitochondrial variation have developed as independent fields even though both are based on sequences from animal mitochondria. This study finds questions within each field that can be addressed by reference to the other. DNA barcodes are based on a 648-bp segment of the mitochondrially encoded cytochrome oxidase I. From most species, this segment is the only sequence available. It is impossible to know whether it fairly represents overall mitochondrial variation. For modern humans, the entire mitochondrial genome is available from thousands of healthy individuals. SNPs in the human mitochondrial genome are evenly distributed across all protein-encoding regions arguing that COI DNA barcode is representative. Barcode variation among related species is largely based on synonymous codons. Data on human mitochondrial variation support the interpretation that most - possibly all - synonymous substitutions in mitochondria are selectively neutral. DNA barcodes confirm reports of a low variance in modern humans compared to nonhuman primates. In addition, DNA barcodes allow the comparison of modern human variance to many other extant animal species. Birds are a well-curated group in which DNA barcodes are coupled with census and geographic data. Putting modern human variation in the context of intraspecies variation among birds shows humans to be a single breeding population of average variance.

  18. Mining metadata from unidentified ITS sequences in GenBank: A case study in Inocybe (Basidiomycota)

    PubMed Central

    2008-01-01

    Background The lack of reference sequences from well-identified mycorrhizal fungi often poses a challenge to the inference of taxonomic affiliation of sequences from environmental samples, and many environmental sequences are thus left unidentified. Such unidentified sequences belonging to the widely distributed ectomycorrhizal fungal genus Inocybe (Basidiomycota) were retrieved from GenBank and divided into species that were identified in a phylogenetic context using a reference dataset from an ongoing study of the genus. The sequence metadata of the unidentified Inocybe sequences stored in GenBank, as well as data from the corresponding original papers, were compiled and used to explore the ecology and distribution of the genus. In addition, the relative occurrence of Inocybe was contrasted to that of other mycorrhizal genera. Results Most species of Inocybe were found to have less than 3% intraspecific variability in the ITS2 region of the nuclear ribosomal DNA. This cut-off value was used jointly with phylogenetic analysis to delimit and identify unidentified Inocybe sequences to species level. A total of 177 unidentified Inocybe ITS sequences corresponding to 98 species were recovered, 32% of which were successfully identified to species level in this study. These sequences account for an unexpectedly large proportion of the publicly available unidentified fungal ITS sequences when compared with other mycorrhizal genera. Eight Inocybe species were reported from multiple hosts and some even from hosts forming arbutoid or orchid mycorrhizae. Furthermore, Inocybe sequences have been reported from four continents and in climate zones ranging from cold temperate to equatorial climate. Out of the 19 species found in more than one study, six were found in both Europe and North America and one was found in both Europe and Japan, indicating that at least many north temperate species have a wide distribution. Conclusion Although DNA-based species identification and circumscription are associated with practical and conceptual difficulties, they also offer new possibilities and avenues for research. Metadata assembly holds great potential to synthesize valuable information from community studies for use in a species and taxonomy-oriented framework. PMID:18282272

  19. Targeted genomic enrichment and sequencing of CyHV-3 from carp tissues confirms low nucleotide diversity and mixed genotype infections.

    PubMed

    Hammoumi, Saliha; Vallaeys, Tatiana; Santika, Ayi; Leleux, Philippe; Borzym, Ewa; Klopp, Christophe; Avarre, Jean-Christophe

    2016-01-01

    Koi herpesvirus disease (KHVD) is an emerging disease that causes mass mortality in koi and common carp, Cyprinus carpio L. Its causative agent is Cyprinid herpesvirus 3 (CyHV-3), also known as koi herpesvirus (KHV). Although data on the pathogenesis of this deadly virus is relatively abundant in the literature, still little is known about its genomic diversity and about the molecular mechanisms that lead to such a high virulence. In this context, we developed a new strategy for sequencing full-length CyHV-3 genomes directly from infected fish tissues. Total genomic DNA extracted from carp gill tissue was specifically enriched with CyHV-3 sequences through hybridization to a set of nearly 2 million overlapping probes designed to cover the entire genome length, using KHV-J sequence (GenBank accession number AP008984) as reference. Applied to 7 CyHV-3 specimens from Poland and Indonesia, this targeted genomic enrichment enabled recovery of the full genomes with >99.9% reference coverage. The enrichment rate was directly correlated to the estimated number of viral copies contained in the DNA extracts used for library preparation, which varied between ∼5000 and ∼2×10 7 . The average sequencing depth was >200 for all samples, thus allowing the search for variants with high confidence. Sequence analyses highlighted a significant proportion of intra-specimen sequence heterogeneity, suggesting the presence of mixed infections in all investigated fish. They also showed that inter-specimen genetic diversity at the genome scale was very low (>99.95% of sequence identity). By enabling full genome comparisons directly from infected fish tissues, this new method will be valuable to trace outbreaks rapidly and at a reasonable cost, and in turn to understand the transmission routes of CyHV-3.

  20. Targeted genomic enrichment and sequencing of CyHV-3 from carp tissues confirms low nucleotide diversity and mixed genotype infections

    PubMed Central

    Hammoumi, Saliha; Vallaeys, Tatiana; Santika, Ayi; Leleux, Philippe; Borzym, Ewa; Klopp, Christophe

    2016-01-01

    Koi herpesvirus disease (KHVD) is an emerging disease that causes mass mortality in koi and common carp, Cyprinus carpio L. Its causative agent is Cyprinid herpesvirus 3 (CyHV-3), also known as koi herpesvirus (KHV). Although data on the pathogenesis of this deadly virus is relatively abundant in the literature, still little is known about its genomic diversity and about the molecular mechanisms that lead to such a high virulence. In this context, we developed a new strategy for sequencing full-length CyHV-3 genomes directly from infected fish tissues. Total genomic DNA extracted from carp gill tissue was specifically enriched with CyHV-3 sequences through hybridization to a set of nearly 2 million overlapping probes designed to cover the entire genome length, using KHV-J sequence (GenBank accession number AP008984) as reference. Applied to 7 CyHV-3 specimens from Poland and Indonesia, this targeted genomic enrichment enabled recovery of the full genomes with >99.9% reference coverage. The enrichment rate was directly correlated to the estimated number of viral copies contained in the DNA extracts used for library preparation, which varied between ∼5000 and ∼2×107. The average sequencing depth was >200 for all samples, thus allowing the search for variants with high confidence. Sequence analyses highlighted a significant proportion of intra-specimen sequence heterogeneity, suggesting the presence of mixed infections in all investigated fish. They also showed that inter-specimen genetic diversity at the genome scale was very low (>99.95% of sequence identity). By enabling full genome comparisons directly from infected fish tissues, this new method will be valuable to trace outbreaks rapidly and at a reasonable cost, and in turn to understand the transmission routes of CyHV-3. PMID:27703859

  1. Test of synthetic DNA tracers in a periodic hydrodynamic system for time-variable transit time distribution assessment

    NASA Astrophysics Data System (ADS)

    Dahlke, H. E.; Wang, C.; McNew, C.; McLaughlin, S.; Lyon, S. W.

    2016-12-01

    Recent research on time-varying transport through hydrologic systems proposed using decomposed over-printed tracer breakthrough curves to directly observe transport through complex flow systems. This method, also known as the PERTH (Periodic Tracer Hierarchy) method requires periodic flow and multiple tracer injections to reveal changes in flow pathways and transport behavior. Time-variable transit time distributions (TTD) estimated from tracer breakthrough curves often vary with the storage state of the system, which in turn is influenced by internal and external variabilities, such as the arrangement of flow pathways and fluctuations in system inputs. Deciphering internal from external variabilities in TTDs might help to advance the use of TTDs for estimating the physical state of a system; however, thus far the finite number of unique conservative tracers available for tracing has limited deeper insights. Synthetic DNA tracers consisting of short strands of synthetic DNA encapsulated by polylactic acid (PLA) microspheres could potentially provide multiple unique tracers with identical transport properties needed to explore time varying transport through hydrologic systems in more detail. An experiment was conducted on the miniLeo hillslope, a 1 m3 sloping lysimeter, within the Biosphere 2 Landscape Evolution Observatory near Tucson, AZ to investigate transit time variability. The goal of the experiment was to 1) test the suitability of using synthetic DNA tracers for estimating TTDs in a hydrologic system and 2) to determine the TTDs of individual tracer pulses under periodic steady-state conditions. Five DNA tracers, consisting of four unique, encapsulated DNA sequences and one free/non-encapsulated DNA sequence, were applied as reference and probe tracers together with deuterium, using the PERTH method. The lysimeter received three 2-hour pulses of rainfall at a rate of 30 mm/hr for 10 days. Initial results show that both the encapsulated and free DNA tracers were successfully transported in a pulsed manner through the system, but had overall longer breakthrough times than the reference deuterium tracer. Comparison of the DNA probe tracers indicate differences in transit times, likely related to differences in tracer mobilization in response to the time-variant rainfall input.

  2. Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions

    DOEpatents

    Gardner, Shea N; Mariella, Jr., Raymond P; Christian, Allen T; Young, Jennifer A; Clague, David S

    2013-06-25

    A method of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths.

  3. Salmo salar and Esox lucius full-length cDNA sequences reveal changes in evolutionary pressures on a post-tetraploidization genome

    PubMed Central

    2010-01-01

    Background Salmonids are one of the most intensely studied fish, in part due to their economic and environmental importance, and in part due to a recent whole genome duplication in the common ancestor of salmonids. This duplication greatly impacts species diversification, functional specialization, and adaptation. Extensive new genomic resources have recently become available for Atlantic salmon (Salmo salar), but documentation of allelic versus duplicate reference genes remains a major uncertainty in the complete characterization of its genome and its evolution. Results From existing expressed sequence tag (EST) resources and three new full-length cDNA libraries, 9,057 reference quality full-length gene insert clones were identified for Atlantic salmon. A further 1,365 reference full-length clones were annotated from 29,221 northern pike (Esox lucius) ESTs. Pairwise dN/dS comparisons within each of 408 sets of duplicated salmon genes using northern pike as a diploid out-group show asymmetric relaxation of selection on salmon duplicates. Conclusions 9,057 full-length reference genes were characterized in S. salar and can be used to identify alleles and gene family members. Comparisons of duplicated genes show that while purifying selection is the predominant force acting on both duplicates, consistent with retention of functionality in both copies, some relaxation of pressure on gene duplicates can be identified. In addition, there is evidence that evolution has acted asymmetrically on paralogs, allowing one of the pair to diverge at a faster rate. PMID:20433749

  4. Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions

    DOEpatents

    Gardner, Shea N [San Leandro, CA; Mariella, Jr., Raymond P.; Christian, Allen T [Tracy, CA; Young, Jennifer A [Berkeley, CA; Clague, David S [Livermore, CA

    2011-01-18

    A method of fabricating a DNA molecule of user-defined sequence. The method comprises the steps of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an even or odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths. In one embodiment starting sequence fragments are of different lengths, n, n+1, n+2, etc.

  5. DNA Barcode Analysis of Thrips (Thysanoptera) Diversity in Pakistan Reveals Cryptic Species Complexes.

    PubMed

    Iftikhar, Romana; Ashfaq, Muhammad; Rasool, Akhtar; Hebert, Paul D N

    2016-01-01

    Although thrips are globally important crop pests and vectors of viral disease, species identifications are difficult because of their small size and inconspicuous morphological differences. Sequence variation in the mitochondrial COI-5' (DNA barcode) region has proven effective for the identification of species in many groups of insect pests. We analyzed barcode sequence variation among 471 thrips from various plant hosts in north-central Pakistan. The Barcode Index Number (BIN) system assigned these sequences to 55 BINs, while the Automatic Barcode Gap Discovery detected 56 partitions, a count that coincided with the number of monophyletic lineages recognized by Neighbor-Joining analysis and Bayesian inference. Congeneric species showed an average of 19% sequence divergence (range = 5.6% - 27%) at COI, while intraspecific distances averaged 0.6% (range = 0.0% - 7.6%). BIN analysis suggested that all intraspecific divergence >3.0% actually involved a species complex. In fact, sequences for three major pest species (Haplothrips reuteri, Thrips palmi, Thrips tabaci), and one predatory thrips (Aeolothrips intermedius) showed deep intraspecific divergences, providing evidence that each is a cryptic species complex. The study compiles the first barcode reference library for the thrips of Pakistan, and examines global haplotype diversity in four important pest thrips.

  6. A rare variant of the mtDNA HVS1 sequence in the hairs of Napoléon's family.

    PubMed

    Lucotte, Gérard

    2010-10-04

    This paper describes the finding of a rare variant in the sequence of the hypervariable segment (HVS1) of mitochondrial (mtDNA) extracted from two preserved hairs, authenticated as belonging to the French Emperor Napoléon I (Napoléon Bonaparte). This rare variant is a mutation that changes the base C to T at position 16,184 (16184C→T), and it constitutes the only mutation found in this HVS1 sequence. This mutation is rare, because it was not found in a reference database (P < 0.05). In a personal database (M. Pala) comprising 37,000 different sequences, the 16184C→T mutation was found in only three samples, thus in this database the mutation frequency was 0.00008%. This mutation 16184C→T was also the only variant found subsequently in the HVS1 sequences of mtDNAs extracted from Napoléon's mother (Letizia) and from his youngest sister (Caroline), confirming that this mutation is maternally inherited. This 16184C→T variant could be used for genetic verification to authenticate any doubtful material and determine whether it should indeed be attributed to Napoléon.

  7. A rare variant of the mtDNA HVS1 sequence in the hairs of Napoléon's family

    PubMed Central

    2010-01-01

    This paper describes the finding of a rare variant in the sequence of the hypervariable segment (HVS1) of mitochondrial (mtDNA) extracted from two preserved hairs, authenticated as belonging to the French Emperor Napoléon I (Napoléon Bonaparte). This rare variant is a mutation that changes the base C to T at position 16,184 (16184C→T), and it constitutes the only mutation found in this HVS1 sequence. This mutation is rare, because it was not found in a reference database (P < 0.05). In a personal database (M. Pala) comprising 37,000 different sequences, the 16184C→T mutation was found in only three samples, thus in this database the mutation frequency was 0.00008%. This mutation 16184C→T was also the only variant found subsequently in the HVS1 sequences of mtDNAs extracted from Napoléon's mother (Letizia) and from his youngest sister (Caroline), confirming that this mutation is maternally inherited. This 16184C→T variant could be used for genetic verification to authenticate any doubtful material and determine whether it should indeed be attributed to Napoléon. PMID:21092341

  8. Iterative dictionary construction for compression of large DNA data sets.

    PubMed

    Kuruppu, Shanika; Beresford-Smith, Bryan; Conway, Thomas; Zobel, Justin

    2012-01-01

    Genomic repositories increasingly include individual as well as reference sequences, which tend to share long identical and near-identical strings of nucleotides. However, the sequential processing used by most compression algorithms, and the volumes of data involved, mean that these long-range repetitions are not detected. An order-insensitive, disk-based dictionary construction method can detect this repeated content and use it to compress collections of sequences. We explore a dictionary construction method that improves repeat identification in large DNA data sets. Our adaptation, COMRAD, of an existing disk-based method identifies exact repeated content in collections of sequences with similarities within and across the set of input sequences. COMRAD compresses the data over multiple passes, which is an expensive process, but allows COMRAD to compress large data sets within reasonable time and space. COMRAD allows for random access to individual sequences and subsequences without decompressing the whole data set. COMRAD has no competitor in terms of the size of data sets that it can compress (extending to many hundreds of gigabytes) and, even for smaller data sets, the results are competitive compared to alternatives; as an example, 39 S. cerevisiae genomes compressed to 0.25 bits per base.

  9. Extensive gene conversion at the PMS2 DNA mismatch repair locus.

    PubMed

    Hayward, Bruce E; De Vos, Michel; Valleley, Elizabeth M A; Charlton, Ruth S; Taylor, Graham R; Sheridan, Eamonn; Bonthron, David T

    2007-05-01

    Mutations of the PMS2 DNA repair gene predispose to a characteristic range of malignancies, with either childhood onset (when both alleles are mutated) or a partially penetrant adult onset (if heterozygous). These mutations have been difficult to detect, due to interference from a family of pseudogenes located on chromosome 7. One of these, the PMS2CL pseudogene, lies within a 100-kb inverted duplication (inv dup), 700 kb centromeric to PMS2 itself on 7p22. Here, we show that the reference genomic sequences cannot be relied upon to distinguish PMS2 from PMS2CL, because of sequence transfer between the two loci. The 7p22 inv dup occurred prior to the divergence of modern ape species (15 million years ago [Mya]), but has undergone extensive sequence homogenization. This process appears to be ongoing, since there is considerable allelic diversity within the duplicated region, much of it derived from sequence exchange between PMS2 and PMS2CL. This sequence diversity can result in both false-positive and false-negative mutation analysis at this locus. Great caution is still needed in the design and interpretation of PMS2 mutation screens. 2007 Wiley-Liss, Inc.

  10. Preliminary Identification and Typing of Pathogenic and Toxigenic Fusarium Species Using Restriction Digestion of ITS1-5.8S rDNA-ITS2 Region.

    PubMed

    Mirhendi, H; Ghiasian, A; Vismer, Hf; Asgary, Mr; Jalalizand, N; Arendrup, Mc; Makimura, K

    2010-01-01

    Fusarium species are capable of causing a wide range of crop plants infections as well as uncommon human infections. Many species of the genus produce mycotoxins, which are responsible for acute or chronic diseases in animals and humans. Identification of Fusaria to the species level is necessary for biological, epidemiological, pathological, and toxicological purposes. In this study, we undertook a computer-based analysis of ITS1-5.8SrDNA-ITS2 in 192 GenBank sequences from 36 Fusarium species to achieve data for establishing a molecular method for specie-specific identification. Sequence data and 610 restriction enzymes were analyzed for choosing RFLP profiles, and subsequently designed and validated a PCR-restriction enzyme system for identification and typing of species. DNA extracted from 32 reference strains of 16 species were amplified using ITS1 and ITS4 universal primers followed by sequencing and restriction enzyme digestion of PCR products. The following 3 restriction enzymes TasI, ItaI and CfoI provide the best discriminatory power. Using ITS1 and ITS4 primers a product of approximately 550bp was observed for all Fusarium strains, as expected regarding the sequence analyses. After RFLP of the PCR products, some species were definitely identified by the method and some strains had different patterns in same species. Our profile has potential not only for identification of species, but also for genotyping of strains. On the other hand, some Fusarium species were 100% identical in their ITS-5.8SrDNA-ITS2 sequences, therefore differentiation of these species is impossible regarding this target alone. ITS-PCR-RFLP method might be useful for preliminary differentiation and typing of most common Fusarium species.

  11. A comprehensive DNA barcode database for Central European beetles with a focus on Germany: adding more than 3500 identified species to BOLD.

    PubMed

    Hendrich, Lars; Morinière, Jérôme; Haszprunar, Gerhard; Hebert, Paul D N; Hausmann, Axel; Köhler, Frank; Balke, Michael

    2015-07-01

    Beetles are the most diverse group of animals and are crucial for ecosystem functioning. In many countries, they are well established for environmental impact assessment, but even in the well-studied Central European fauna, species identification can be very difficult. A comprehensive and taxonomically well-curated DNA barcode library could remedy this deficit and could also link hundreds of years of traditional knowledge with next generation sequencing technology. However, such a beetle library is missing to date. This study provides the globally largest DNA barcode reference library for Coleoptera for 15 948 individuals belonging to 3514 well-identified species (53% of the German fauna) with representatives from 97 of 103 families (94%). This study is the first comprehensive regional test of the efficiency of DNA barcoding for beetles with a focus on Germany. Sequences ≥500 bp were recovered from 63% of the specimens analysed (15 948 of 25 294) with short sequences from another 997 specimens. Whereas most specimens (92.2%) could be unambiguously assigned to a single known species by sequence diversity at CO1, 1089 specimens (6.8%) were assigned to more than one Barcode Index Number (BIN), creating 395 BINs which need further study to ascertain if they represent cryptic species, mitochondrial introgression, or simply regional variation in widespread species. We found 409 specimens (2.6%) that shared a BIN assignment with another species, most involving a pair of closely allied species as 43 BINs were involved. Most of these taxa were separated by barcodes although sequence divergences were low. Only 155 specimens (0.97%) show identical or overlapping clusters. © 2014 John Wiley & Sons Ltd.

  12. Beyond the Colours: Discovering Hidden Diversity in the Nymphalidae of the Yucatan Peninsula in Mexico through DNA Barcoding

    PubMed Central

    Prado, Blanca R.; Pozo, Carmen; Valdez-Moreno, Martha; Hebert, Paul D. N.

    2011-01-01

    Background Recent studies have demonstrated the utility of DNA barcoding in the discovery of overlooked species and in the connection of immature and adult stages. In this study, we use DNA barcoding to examine diversity patterns in 121 species of Nymphalidae from the Yucatan Peninsula in Mexico. Our results suggest the presence of cryptic species in 8 of these 121 taxa. As well, the reference database derived from the analysis of adult specimens allowed the identification of nymphalid caterpillars providing new details on host plant use. Methodology/Principal Findings We gathered DNA barcode sequences from 857 adult Nymphalidae representing 121 different species. This total includes four species (Adelpha iphiclus, Adelpha malea, Hamadryas iphtime and Taygetis laches) that were initially overlooked because of their close morphological similarity to other species. The barcode results showed that each of the 121 species possessed a diagnostic array of barcode sequences. In addition, there was evidence of cryptic taxa; seven species included two barcode clusters showing more than 2% sequence divergence while one species included three clusters. All 71 nymphalid caterpillars were identified to a species level by their sequence congruence to adult sequences. These caterpillars represented 16 species, and included Hamadryas julitta, an endemic species from the Yucatan Peninsula whose larval stages and host plant (Dalechampia schottii, also endemic to the Yucatan Peninsula) were previously unknown. Conclusions/Significance This investigation has revealed overlooked species in a well-studied museum collection of nymphalid butterflies and suggests that there is a substantial incidence of cryptic species that await full characterization. The utility of barcoding in the rapid identification of caterpillars also promises to accelerate the assembly of information on life histories, a particularly important advance for hyperdiverse tropical insect assemblages. PMID:22132140

  13. Bartonella vinsonii subsp. berkhoffii endocarditis in a dog from Saskatchewan

    PubMed Central

    Cockwill, Ken R.; Taylor, Susan M.; Philibert, Helene M.; Breitschwerdt, Edward B.; Maggi, Ricardo G.

    2007-01-01

    A dog referred for lameness was diagnosed with culture-negative endocarditis. Antibodies to Bartonella spp. were detected. Antibiotic treatment resulted in transient clinical improvement, but the dog developed cardiac failure and was euthanized. Bartonella vinsonii subsp. berkhoffii genotype IV was identified within the aortic heart valve lesions by PCR amplification and DNA sequencing. PMID:17824328

  14. Comments on papers by Reichert and Wong. [information theory for molecular taxonomy

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Papers of Reichert and Wong (1971) and Reichert (1972) are referred to as erroneous in stating that in their work with cytochrome c sequences they demonstrated that vertebrates have achieved a higher organized genetic message with a higher fidelity in the DNA-to-protein information processing channel. Arguments are given to contest that statement.

  15. Metagenomic ventures into outer sequence space.

    PubMed

    Dutilh, Bas E

    Sequencing DNA or RNA directly from the environment often results in many sequencing reads that have no homologs in the database. These are referred to as "unknowns," and reflect the vast unexplored microbial sequence space of our biosphere, also known as "biological dark matter." However, unknowns also exist because metagenomic datasets are not optimally mined. There is a pressure on researchers to publish and move on, and the unknown sequences are often left for what they are, and conclusions drawn based on reads with annotated homologs. This can cause abundant and widespread genomes to be overlooked, such as the recently discovered human gut bacteriophage crAssphage. The unknowns may be enriched for bacteriophage sequences, the most abundant and genetically diverse component of the biosphere and of sequence space. However, it remains an open question, what is the actual size of biological sequence space? The de novo assembly of shotgun metagenomes is the most powerful tool to address this question.

  16. XS: a FASTQ read simulator.

    PubMed

    Pratas, Diogo; Pinho, Armando J; Rodrigues, João M O S

    2014-01-16

    The emerging next-generation sequencing (NGS) is bringing, besides the natural huge amounts of data, an avalanche of new specialized tools (for analysis, compression, alignment, among others) and large public and private network infrastructures. Therefore, a direct necessity of specific simulation tools for testing and benchmarking is rising, such as a flexible and portable FASTQ read simulator, without the need of a reference sequence, yet correctly prepared for producing approximately the same characteristics as real data. We present XS, a skilled FASTQ read simulation tool, flexible, portable (does not need a reference sequence) and tunable in terms of sequence complexity. It has several running modes, depending on the time and memory available, and is aimed at testing computing infrastructures, namely cloud computing of large-scale projects, and testing FASTQ compression algorithms. Moreover, XS offers the possibility of simulating the three main FASTQ components individually (headers, DNA sequences and quality-scores). XS provides an efficient and convenient method for fast simulation of FASTQ files, such as those from Ion Torrent (currently uncovered by other simulators), Roche-454, Illumina and ABI-SOLiD sequencing machines. This tool is publicly available at http://bioinformatics.ua.pt/software/xs/.

  17. Unlabeled probes for the detection and typing of herpes simplex virus.

    PubMed

    Dames, Shale; Pattison, David C; Bromley, L Kathryn; Wittwer, Carl T; Voelkerding, Karl V

    2007-10-01

    Unlabeled probe detection with a double-stranded DNA (dsDNA) binding dye is one method to detect and confirm target amplification after PCR. Unlabeled probes and amplicon melting have been used to detect small deletions and single-nucleotide polymorphisms in assays where template is in abundance. Unlabeled probes have not been applied to low-level target detection, however. Herpes simplex virus (HSV) was chosen as a model to compare the unlabeled probe method to an in-house reference assay using dual-labeled, minor groove binding probes. A saturating dsDNA dye (LCGreen Plus) was used for real-time PCR. HSV-1, HSV-2, and an internal control were differentiated by PCR amplicon and unlabeled probe melting analysis after PCR. The unlabeled probe technique displayed 98% concordance with the reference assay for the detection of HSV from a variety of archived clinical samples (n = 182). HSV typing using unlabeled probes was 99% concordant (n = 104) to sequenced clinical samples and allowed for the detection of sequence polymorphisms in the amplicon and under the probe. Unlabeled probes and amplicon melting can be used to detect and genotype as few as 10 copies of target per reaction, restricted only by stochastic limitations. The use of unlabeled probes provides an attractive alternative to conventional fluorescence-labeled, probe-based assays for genotyping and detection of HSV and might be useful for other low-copy targets where typing is informative.

  18. Beta-haemolytic group A streptococci emm75 carrying altered pyrogenic exotoxin A linked to scarlet fever in adults.

    PubMed

    Dong, Hongjun; Xu, Guozhang; Li, Shuhua; Song, Qifa; Liu, Shijian; Lin, Hui; Chai, Yibiao; Zhou, Aimin; Fang, Ting; Zhang, Hongwei; Jin, Chunguang; Lu, Wei; Cao, Guangwen

    2008-04-01

    To determine the etiological cause of a food-borne outbreak of scarlet fever in adults. Swabs from the throats of the patients and asymptomatic control were cultured on blood agar plates individually. Biochemical identification of all isolates was performed with a VITEX automated system. Antibiotic susceptibility was examined by using the Kirby-Bauer disc diffusion method. emm gene and extracellular pyrogenic exotoxins of each isolate were amplified by using polymerase chain reaction and subjected to DNA sequencing. Sequence differences between the isolated and the highly similar reference sequences were compared on BLAST. Bioinformatics was used to predict protein structures. Beta-haemolytic group A streptococci (GAS) emm75 were identified from 10 of 13 available patients. The isolates were susceptible to penicillin, ampicillin, vancomycin, cefatriaxone, ofloxacin, linezolid and quinupristin. All of the isolates carried pyrogenic exotoxin A (speA) and cysteine protease (speB). Isolated speA was phylogenetically different from 30 highly similar references on BLAST. Differences in the primary sequence of the deduced protein were 14.37-20.12% between the speA and each of 11 references. Secondary protein structure of the speA was different from the references at the N-terminal. GAS emm75 encoding altered speA was responsible for the food-borne outbreak of scarlet fever in adults.

  19. Gentle Masking of Low-Complexity Sequences Improves Homology Search

    PubMed Central

    Frith, Martin C.

    2011-01-01

    Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with “gentle” masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is , where is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to “harsh” masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search. PMID:22205972

  20. Analysis of protein-coding genetic variation in 60,706 humans.

    PubMed

    Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G

    2016-08-18

    Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

  1. Compression of next-generation sequencing quality scores using memetic algorithm

    PubMed Central

    2014-01-01

    Background The exponential growth of next-generation sequencing (NGS) derived DNA data poses great challenges to data storage and transmission. Although many compression algorithms have been proposed for DNA reads in NGS data, few methods are designed specifically to handle the quality scores. Results In this paper we present a memetic algorithm (MA) based NGS quality score data compressor, namely MMQSC. The algorithm extracts raw quality score sequences from FASTQ formatted files, and designs compression codebook using MA based multimodal optimization. The input data is then compressed in a substitutional manner. Experimental results on five representative NGS data sets show that MMQSC obtains higher compression ratio than the other state-of-the-art methods. Particularly, MMQSC is a lossless reference-free compression algorithm, yet obtains an average compression ratio of 22.82% on the experimental data sets. Conclusions The proposed MMQSC compresses NGS quality score data effectively. It can be utilized to improve the overall compression ratio on FASTQ formatted files. PMID:25474747

  2. First report of human parvovirus 4 detection in Iran.

    PubMed

    Asiyabi, Sanaz; Nejati, Ahmad; Shoja, Zabihollah; Shahmahmoodi, Shohreh; Jalilvand, Somayeh; Farahmand, Mohammad; Gorzin, Ali-Akbar; Najafi, Alireza; Haji Mollahoseini, Mostafa; Marashi, Sayed Mahdi

    2016-08-01

    Parvovirus 4 (PARV4) is an emerging and intriguing virus that currently received many attentions. High prevalence of PARV4 infection in high-risk groups such as HIV infected patients highlights the potential clinical outcomes that this virus might have. Molecular techniques were used to determine both the presence and the genotype of circulating PARV4 on previously collected serum samples from 133 HIV infected patients and 120 healthy blood donors. Nested PCR was applied to assess the presence of PARV4 DNA genome in both groups. PARV4 DNA was detected in 35.3% of HIV infected patients compared to 16.6% healthy donors. To genetically characterize the PARV4 genotype in these groups, positive samples were randomly selected and subjected for sequencing and phylogenetic analysis. All PARV4 sequences were found to be genotype 1 and clustered with the reference sequences of PARV4 genotype 1. J. Med. Virol. 88:1314-1318, 2016. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  3. Characteristic component profiling and identification of different Uncaria species based on high-performance liquid chromatography-photodiode array detection tandem ion trap and time of flight mass spectrometry coupled with rDNA ITS sequence.

    PubMed

    Zhao, Bingqiang; Huang, Yanjun; Chen, Qiulan; Chen, Qizhao; Miao, Hui; Zhu, Shuang; Zeng, Changqing

    2018-03-01

    Uncaria is a multi-source herb and its species identification has become a bottleneck in quality control. To study the identification method of different Uncaria species herbs through HPLC-MS coupled with rDNA Internal Transcribed Spacer (rDNA ITS) sequence, both plant morphological traits and molecular identification were used to determine the species of every collected Uncaria herb. The genetic analysis of different Uncaria species was performed using their rDNA ITS sequence as a molecular marker. Meanwhile, the phylogenetic relationships of 22 samples from six Uncaria species were divided and classified clearly. By optimizing the chromatographic conditions, a practical HPLC method to differentiate various varieties of Uncaria herbs was set up based on a set of characteristic components across each species. A high-performance liquid chromatography-photodiode array detector tandem ion trap and time of flight mass spectrometry technique combined with reference substances was utilized to derive 21 characteristic compounds containing six groups of six Uncaria species in China. Thus, this study provides a feasible method to solve the current problem of confusion in Uncaria species, and makes a significant step forward in the appropriate clinical use, in-depth research and further utilization of different Uncaria species. Copyright © 2017 John Wiley & Sons, Ltd.

  4. COBRA-Seq: Sensitive and Quantitative Methylome Profiling

    PubMed Central

    Varinli, Hilal; Statham, Aaron L.; Clark, Susan J.; Molloy, Peter L.; Ross, Jason P.

    2015-01-01

    Combined Bisulfite Restriction Analysis (COBRA) quantifies DNA methylation at a specific locus. It does so via digestion of PCR amplicons produced from bisulfite-treated DNA, using a restriction enzyme that contains a cytosine within its recognition sequence, such as TaqI. Here, we introduce COBRA-seq, a genome wide reduced methylome method that requires minimal DNA input (0.1–1.0 μg) and can either use PCR or linear amplification to amplify the sequencing library. Variants of COBRA-seq can be used to explore CpG-depleted as well as CpG-rich regions in vertebrate DNA. The choice of enzyme influences enrichment for specific genomic features, such as CpG-rich promoters and CpG islands, or enrichment for less CpG dense regions such as enhancers. COBRA-seq coupled with linear amplification has the additional advantage of reduced PCR bias by producing full length fragments at high abundance. Unlike other reduced representative methylome methods, COBRA-seq has great flexibility in the choice of enzyme and can be multiplexed and tuned, to reduce sequencing costs and to interrogate different numbers of sites. Moreover, COBRA-seq is applicable to non-model organisms without the reference genome and compatible with the investigation of non-CpG methylation by using restriction enzymes containing CpA, CpT, and CpC in their recognition site. PMID:26512698

  5. The genome of Eimeria spp., with special reference to Eimeria tenella--a coccidium from the chicken.

    PubMed

    Shirley, M W

    2000-04-10

    Eimeria spp. contain at least four genomes. The nuclear genome is best studied in the avian species Eimeria tenella and comprises about 60 Mbp DNA contained within ca. 14 chromosomes; other avian and lupine species appear to possess a nuclear genome of similar size. In addition, sequence data and hybridisation studies have provided direct evidence for extrachromosomal mitochondrial and plastid DNA genomes, and double-stranded RNA segments have also been described. The unique phenotype of "precocious" development that characterises some selected lines of Eimeria spp. not only provides the basis for the first generation of live attenuated vaccines, but offers a significant entrée into studies on the regulation of an apicomplexan life-cycle. With a view to identifying loci implicated in the trait of precocious development, a genetic linkage map of the genome of E. tenella is being constructed in this laboratory from analyses of the inheritance of over 400 polymorphic DNA markers in the progeny of a cross between complementary drug-resistant and precocious parents. Other projects that impinge directly or indirectly on the genome and/or genetics of Eimeria spp. are currently in progress in several laboratories, and include the derivation of expressed sequence tag data and the development of ancillary technologies such as transfection techniques. No large-scale genomic DNA sequencing projects have been reported.

  6. Defining operational taxonomic units using DNA barcode data.

    PubMed

    Blaxter, Mark; Mann, Jenna; Chapman, Tom; Thomas, Fran; Whitton, Claire; Floyd, Robin; Abebe, Eyualem

    2005-10-29

    The scale of diversity of life on this planet is a significant challenge for any scientific programme hoping to produce a complete catalogue, whatever means is used. For DNA barcoding studies, this difficulty is compounded by the realization that any chosen barcode sequence is not the gene 'for' speciation and that taxa have evolutionary histories. How are we to disentangle the confounding effects of reticulate population genetic processes? Using the DNA barcode data from meiofaunal surveys, here we discuss the benefits of treating the taxa defined by barcodes without reference to their correspondence to 'species', and suggest that using this non-idealist approach facilitates access to taxon groups that are not accessible to other methods of enumeration and classification. Major issues remain, in particular the methodologies for taxon discrimination in DNA barcode data.

  7. DNA and histone methylation in gastric carcinogenesis

    PubMed Central

    Calcagno, Danielle Queiroz; Gigek, Carolina Oliveira; Chen, Elizabeth Suchi; Burbano, Rommel Rodriguez; Smith, Marília de Arruda Cardoso

    2013-01-01

    Epigenetic alterations contribute significantly to the development and progression of gastric cancer, one of the leading causes of cancer death worldwide. Epigenetics refers to the number of modifications of the chromatin structure that affect gene expression without altering the primary sequence of DNA, and these changes lead to transcriptional activation or silencing of the gene. Over the years, the study of epigenetic processes has increased, and novel therapeutic approaches that target DNA methylation and histone modifications have emerged. A greater understanding of epigenetics and the therapeutic potential of manipulating these processes is necessary for gastric cancer treatment. Here, we review recent research on the effects of aberrant DNA and histone methylation on the onset and progression of gastric tumors and the development of compounds that target enzymes that regulate the epigenome. PMID:23482412

  8. A comparison of honey bee-collected pollen from working agricultural lands using light microscopy and ITS metabarcoding

    USGS Publications Warehouse

    Smart, Matthew; Cornman, Robert S.; Iwanowicz, Deborah; McDermott-Kubeczko, Margaret; Pettis, Jeff S; Spivak, Marla S; Otto, Clint R.

    2017-01-01

    Taxonomic identification of pollen has historically been accomplished via light microscopy but requires specialized knowledge and reference collections, particularly when identification to lower taxonomic levels is necessary. Recently, next-generation sequencing technology has been used as a cost-effective alternative for identifying bee-collected pollen; however, this novel approach has not been tested on a spatially or temporally robust number of pollen samples. Here, we compare pollen identification results derived from light microscopy and DNA sequencing techniques with samples collected from honey bee colonies embedded within a gradient of intensive agricultural landscapes in the Northern Great Plains throughout the 2010–2011 growing seasons. We demonstrate that at all taxonomic levels, DNA sequencing was able to discern a greater number of taxa, and was particularly useful for the identification of infrequently detected species. Importantly, substantial phenological overlap did occur for commonly detected taxa using either technique, suggesting that DNA sequencing is an appropriate, and enhancing, substitutive technique for accurately capturing the breadth of bee-collected species of pollen present across agricultural landscapes. We also show that honey bees located in high and low intensity agricultural settings forage on dissimilar plants, though with overlap of the most abundantly collected pollen taxa. We highlight practical applications of utilizing sequencing technology, including addressing ecological issues surrounding land use, climate change, importance of taxa relative to abundance, and evaluating the impact of conservation program habitat enhancement efforts.

  9. Diversity of Basidiomycetes in Michigan Agricultural Soils▿

    PubMed Central

    Lynch, Michael D. J.; Thorn, R. Greg

    2006-01-01

    We analyzed the communities of soil basidiomycetes in agroecosystems that differ in tillage history at the Kellogg Biological Station Long-Term Ecological Research site near Battle Creek, Michigan. The approach combined soil DNA extraction through a bead-beating method modified to increase recovery of fungal DNA, PCR amplification with basidiomycete-specific primers, cloning and restriction fragment length polymorphism screening of mixed PCR products, and sequencing of unique clones. Much greater diversity was detected than was anticipated in this habitat on the basis of culture-based methods or surveys of fruiting bodies. With “species” defined as organisms yielding PCR products with ≥99% identity in the 5′ 650 bases of the nuclear large-subunit ribosomal DNA, 241 “species” were detected among 409 unique basidiomycete sequences recovered. Almost all major clades of basidiomycetes from basidiomycetous yeasts and other heterobasidiomycetes through polypores and euagarics (gilled mushrooms and relatives) were represented, with a majority from the latter clade. Only 24 of 241 “species” had 99% or greater sequence similarity to named reference sequences in GenBank, and several clades with multiple “species” could not be identified at the genus level by phylogenetic comparisons with named sequences. The total estimated “species” richness for this 11.2-ha site was 367 “species” of basidiomycetes. Since >99% of the study area has not been sampled, the accuracy of our diversity estimate is uncertain. Replication in time and space is required to detect additional diversity and the underlying community structure. PMID:16950900

  10. Relative quantification in seed GMO analysis: state of art and bottlenecks.

    PubMed

    Chaouachi, Maher; Bérard, Aurélie; Saïd, Khaled

    2013-06-01

    Reliable quantitative methods are needed to comply with current EU regulations on the mandatory labeling of genetically modified organisms (GMOs) and GMO-derived food and feed products with a minimum GMO content of 0.9 %. The implementation of EU Commission Recommendation 2004/787/EC on technical guidance for sampling and detection which meant as a helpful tool for the practical implementation of EC Regulation 1830/2003, which states that "the results of quantitative analysis should be expressed as the number of target DNA sequences per target taxon specific sequences calculated in terms of haploid genomes". This has led to an intense debate on the type of calibrator best suitable for GMO quantification. The main question addressed in this review is whether reference materials and calibrators should be matrix based or whether pure DNA analytes should be used for relative quantification in GMO analysis. The state of the art, including the advantages and drawbacks, of using DNA plasmid (compared to genomic DNA reference materials) as calibrators, is widely described. In addition, the influence of the genetic structure of seeds on real-time PCR quantitative results obtained for seed lots is discussed. The specific composition of a seed kernel, the mode of inheritance, and the ploidy level ensure that there is discordance between a GMO % expressed as a haploid genome equivalent and a GMO % based on numbers of seeds. This means that a threshold fixed as a percentage of seeds cannot be used as such for RT-PCR. All critical points that affect the expression of the GMO content in seeds are discussed in this paper.

  11. Training alignment parameters for arbitrary sequencers with LAST-TRAIN.

    PubMed

    Hamada, Michiaki; Ono, Yukiteru; Asai, Kiyoshi; Frith, Martin C

    2017-03-15

    LAST-TRAIN improves sequence alignment accuracy by inferring substitution and gap scores that fit the frequencies of substitutions, insertions, and deletions in a given dataset. We have applied it to mapping DNA reads from IonTorrent and PacBio RS, and we show that it reduces reference bias for Oxford Nanopore reads. the source code is freely available at http://last.cbrc.jp/. mhamada@waseda.jp or mcfrith@edu.k.u-tokyo.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  12. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses.

    PubMed

    Paez-Espino, David; Chen, I-Min A; Palaniappan, Krishna; Ratner, Anna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Huang, Jinghua; Markowitz, Victor M; Nielsen, Torben; Huntemann, Marcel; K Reddy, T B; Pavlopoulos, Georgios A; Sullivan, Matthew B; Campbell, Barbara J; Chen, Feng; McMahon, Katherine; Hallam, Steve J; Denef, Vincent; Cavicchioli, Ricardo; Caffrey, Sean M; Streit, Wolfgang R; Webster, John; Handley, Kim M; Salekdeh, Ghasem H; Tsesmetzis, Nicolas; Setubal, Joao C; Pope, Phillip B; Liu, Wen-Tso; Rivers, Adam R; Ivanova, Natalia N; Kyrpides, Nikos C

    2017-01-04

    Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from >6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Accuracy of the high-throughput amplicon sequencing to identify species within the genus Aspergillus.

    PubMed

    Lee, Seungeun; Yamamoto, Naomichi

    2015-12-01

    This study characterized the accuracy of high-throughput amplicon sequencing to identify species within the genus Aspergillus. To this end, we sequenced the internal transcribed spacer 1 (ITS1), β-tubulin (BenA), and calmodulin (CaM) gene encoding sequences as DNA markers from eight reference Aspergillus strains with known identities using 300-bp sequencing on the Illumina MiSeq platform, and compared them with the BLASTn outputs. The identifications with the sequences longer than 250 bp were accurate at the section rank, with some ambiguities observed at the species rank due to mostly cross detection of sibling species. Additionally, in silico analysis was performed to predict the identification accuracy for all species in the genus Aspergillus, where 107, 210, and 187 species were predicted to be identifiable down to the species rank based on ITS1, BenA, and CaM, respectively. Finally, air filter samples were analysed to quantify the relative abundances of Aspergillus species in outdoor air. The results were reproducible across biological duplicates both at the species and section ranks, but not strongly correlated between ITS1 and BenA, suggesting the Aspergillus detection can be taxonomically biased depending on the selection of the DNA markers and/or primers. Copyright © 2015 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  14. Development of Genetic Markers in Eucalyptus Species by Target Enrichment and Exome Sequencing

    PubMed Central

    Dasgupta, Modhumita Ghosh; Dharanishanthi, Veeramuthu; Agarwal, Ishangi; Krutovsky, Konstantin V.

    2015-01-01

    The advent of next-generation sequencing has facilitated large-scale discovery, validation and assessment of genetic markers for high density genotyping. The present study was undertaken to identify markers in genes supposedly related to wood property traits in three Eucalyptus species. Ninety four genes involved in xylogenesis were selected for hybridization probe based nuclear genomic DNA target enrichment and exome sequencing. Genomic DNA was isolated from the leaf tissues and used for on-array probe hybridization followed by Illumina sequencing. The raw sequence reads were trimmed and high-quality reads were mapped to the E. grandis reference sequence and the presence of single nucleotide variants (SNVs) and insertions/ deletions (InDels) were identified across the three species. The average read coverage was 216X and a total of 2294 SNVs and 479 InDels were discovered in E. camaldulensis, 2383 SNVs and 518 InDels in E. tereticornis, and 1228 SNVs and 409 InDels in E. grandis. Additionally, SNV calling and InDel detection were conducted in pair-wise comparisons of E. tereticornis vs. E. grandis, E. camaldulensis vs. E. tereticornis and E. camaldulensis vs. E. grandis. This study presents an efficient and high throughput method on development of genetic markers for family– based QTL and association analysis in Eucalyptus. PMID:25602379

  15. Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

    NASA Astrophysics Data System (ADS)

    Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

    2017-07-01

    DNA sequence can be defined as a succession of letters, representing the order of nucleotides within DNA, using a permutation of four DNA base codes including adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of the sequences is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and currently become highly developed, advanced and highly throughput sequencing technologies. So far, DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing could produce any ambiguous and not clear enough sequencing results that make them quite difficult to be determined whether these codes are A, T, G, or C. To solve these problems, in this study we can introduce other representation of DNA codes namely Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probability of A, T, G, C bases that could appear in Q and PA + PT + PG + PC = 1. Furthermore, using Quaternion representations we are able to construct the improved scoring matrix for global sequence alignment processes, by applying a dot product method. Moreover, this scoring matrix produces better and higher quality of the match and mismatch score between two DNA base codes. In implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave, to analyze our target sequence which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from the Genebank, meanwhile the target DNA sequence are received from our collaborator database. As the results we found the Quaternion representations improve the quality of the sequence alignment score and we can conclude that DNA sequence target has maximum similarity with Streptococcus pneumoniae.

  16. Sequences of 95 human MHC haplotypes reveal extreme coding variation in genes other than highly polymorphic HLA class I and II

    PubMed Central

    Norman, Paul J.; Norberg, Steven J.; Guethlein, Lisbeth A.; Nemat-Gorgani, Neda; Royce, Thomas; Wroblewski, Emily E.; Dunn, Tamsen; Mann, Tobias; Alicata, Claudia; Hollenbach, Jill A.; Chang, Weihua; Shults Won, Melissa; Gunderson, Kevin L.; Abi-Rached, Laurent; Ronaghi, Mostafa; Parham, Peter

    2017-01-01

    The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC region has been intractable to high-throughput analysis at complete sequence resolution, and current reference haplotypes are inadequate for large-scale studies. To address these challenges, we developed a method that specifically captures and sequences the 4.8-Mbp MHC region from genomic DNA. For 95 MHC homozygous cell lines we assembled, de novo, a set of high-fidelity contigs and a sequence scaffold, representing a mean 98% of the target region. Included are six alternative MHC reference sequences of the human genome that we completed and refined. Characterization of the sequence and structural diversity of the MHC region shows the approach accurately determines the sequences of the highly polymorphic HLA class I and HLA class II genes and the complex structural diversity of complement factor C4A/C4B. It has also uncovered extensive and unexpected diversity in other MHC genes; an example is MUC22, which encodes a lung mucin and exhibits more coding sequence alleles than any HLA class I or II gene studied here. More than 60% of the coding sequence alleles analyzed were previously uncharacterized. We have created a substantial database of robust reference MHC haplotype sequences that will enable future population scale studies of this complicated and clinically important region of the human genome. PMID:28360230

  17. A DNA Barcode Library for North American Pyraustinae (Lepidoptera: Pyraloidea: Crambidae).

    PubMed

    Yang, Zhaofu; Landry, Jean-François; Hebert, Paul D N

    2016-01-01

    Although members of the crambid subfamily Pyraustinae are frequently important crop pests, their identification is often difficult because many species lack conspicuous diagnostic morphological characters. DNA barcoding employs sequence diversity in a short standardized gene region to facilitate specimen identifications and species discovery. This study provides a DNA barcode reference library for North American pyraustines based upon the analysis of 1589 sequences recovered from 137 nominal species, 87% of the fauna. Data from 125 species were barcode compliant (>500bp, <1% n), and 99 of these taxa formed a distinct cluster that was assigned to a single BIN. The other 26 species were assigned to 56 BINs, reflecting frequent cases of deep intraspecific sequence divergence and a few instances of barcode sharing, creating a total of 155 BINs. Two systems for OTU designation, ABGD and BIN, were examined to check the correspondence between current taxonomy and sequence clusters. The BIN system performed better than ABGD in delimiting closely related species, while OTU counts with ABGD were influenced by the value employed for relative gap width. Different species with low or no interspecific divergence may represent cases of unrecognized synonymy, whereas those with high intraspecific divergence require further taxonomic scrutiny as they may involve cryptic diversity. The barcode library developed in this study will also help to advance understanding of relationships among species of Pyraustinae.

  18. The First Chameleon Transcriptome: Comparative Genomic Analysis of the OXPHOS System Reveals Loss of COX8 in Iguanian Lizards

    PubMed Central

    Bar-Yaacov, Dan; Bouskila, Amos; Mishmar, Dan

    2013-01-01

    Recently, we found dramatic mitochondrial DNA divergence of Israeli Chamaeleo chamaeleon populations into two geographically distinct groups. We aimed to examine whether the same pattern of divergence could be found in nuclear genes. However, no genomic resource is available for any chameleon species. Here we present the first chameleon transcriptome, obtained using deep sequencing (SOLiD). Our analysis identified 164,000 sequence contigs of which 19,000 yielded unique BlastX hits. To test the efficacy of our sequencing effort, we examined whether the chameleon and other available reptilian transcriptomes harbored complete sets of genes comprising known biochemical pathways, focusing on the nDNA-encoded oxidative phosphorylation (OXPHOS) genes as a model. As a reference for the screen, we used the human 86 (including isoforms) known structural nDNA-encoded OXPHOS subunits. Analysis of 34 publicly available vertebrate transcriptomes revealed orthologs for most human OXPHOS genes. However, OXPHOS subunit COX8 (Cytochrome C oxidase subunit 8), including all its known isoforms, was consistently absent in transcriptomes of iguanian lizards, implying loss of this subunit during the radiation of this suborder. The lack of COX8 in the suborder Iguania is intriguing, since it is important for cellular respiration and ATP production. Our sequencing effort added a new resource for comparative genomic studies, and shed new light on the evolutionary dynamics of the OXPHOS system. PMID:24009133

  19. The first Chameleon transcriptome: comparative genomic analysis of the OXPHOS system reveals loss of COX8 in Iguanian lizards.

    PubMed

    Bar-Yaacov, Dan; Bouskila, Amos; Mishmar, Dan

    2013-01-01

    Recently, we found dramatic mitochondrial DNA divergence of Israeli Chamaeleo chamaeleon populations into two geographically distinct groups. We aimed to examine whether the same pattern of divergence could be found in nuclear genes. However, no genomic resource is available for any chameleon species. Here we present the first chameleon transcriptome, obtained using deep sequencing (SOLiD). Our analysis identified 164,000 sequence contigs of which 19,000 yielded unique BlastX hits. To test the efficacy of our sequencing effort, we examined whether the chameleon and other available reptilian transcriptomes harbored complete sets of genes comprising known biochemical pathways, focusing on the nDNA-encoded oxidative phosphorylation (OXPHOS) genes as a model. As a reference for the screen, we used the human 86 (including isoforms) known structural nDNA-encoded OXPHOS subunits. Analysis of 34 publicly available vertebrate transcriptomes revealed orthologs for most human OXPHOS genes. However, OXPHOS subunit COX8 (Cytochrome C oxidase subunit 8), including all its known isoforms, was consistently absent in transcriptomes of iguanian lizards, implying loss of this subunit during the radiation of this suborder. The lack of COX8 in the suborder Iguania is intriguing, since it is important for cellular respiration and ATP production. Our sequencing effort added a new resource for comparative genomic studies, and shed new light on the evolutionary dynamics of the OXPHOS system.

  20. Building a dictionary for genomes: Identification of presumptive regulatory sites by statistical analysis

    PubMed Central

    Bussemaker, Harmen J.; Li, Hao; Siggia, Eric D.

    2000-01-01

    The availability of complete genome sequences and mRNA expression data for all genes creates new opportunities and challenges for identifying DNA sequence motifs that control gene expression. An algorithm, “MobyDick,” is presented that decomposes a set of DNA sequences into the most probable dictionary of motifs or words. This method is applicable to any set of DNA sequences: for example, all upstream regions in a genome or all genes expressed under certain conditions. Identification of words is based on a probabilistic segmentation model in which the significance of longer words is deduced from the frequency of shorter ones of various lengths, eliminating the need for a separate set of reference data to define probabilities. We have built a dictionary with 1,200 words for the 6,000 upstream regulatory regions in the yeast genome; the 500 most significant words (some with as few as 10 copies in all of the upstream regions) match 114 of 443 experimentally determined sites (a significance level of 18 standard deviations). When analyzing all of the genes up-regulated during sporulation as a group, we find many motifs in addition to the few previously identified by analyzing the subclusters individually to the expression subclusters. Applying MobyDick to the genes derepressed when the general repressor Tup1 is deleted, we find known as well as putative binding sites for its regulatory partners. PMID:10944202

  1. Asexual-sexual morph connection in the type species of Berkleasmium.

    PubMed

    Tanney, Joey; Miller, Andrew N

    2017-06-01

    Berkleasmium is a polyphyletic genus comprising 37 dematiaceous hyphomycetous species. In this study, independent collections of the type species, B. concinnum , were made from Eastern North America. Nuclear internal transcribed spacer rDNA (ITS) and partial nuc 28S large subunit rDNA (LSU) sequences obtained from collections and subsequent cultures showed that Berkleasmium concinnum is the asexual morph of Neoacanthostigma septoconstrictum ( Tubeufiaceae , Tubeufiales ). Phylogenies inferred from Bayesian inference and maximum likelihood analyses of ITS-LSU sequence data confirmed this asexual-sexual morph connection and a re-examination of fungarium reference specimens also revealed the co-occurrence of N. septoconstrictum ascomata and B. concinnum sporodochia. Neoacanthostigma septoconstrictum is therefore synonymized under B. concinnum on the basis of priority. A specimen identified as N. septoconstrictum from Thailand is described as N. thailandicum sp. nov., based on morphological and genetic distinctiveness.

  2. The reference strain Aeromonas hydrophicla CIP 57.50 should be reclassified as Aeromonas salmonicida CIP 57.50.

    PubMed

    Miñana-Galbis, David; Farfàn, Maribel; Lorén, J Gaspar; Fusté, M Carmen

    2010-03-01

    The use of reference strains is a critical element for the quality control of different assays, from the development of molecular methods to the evaluation of antimicrobial activities. Most of the strains used in these assays are not type strains and some of them are cited erroneously because of subsequent reclassifications and descriptions of novel species. In this study, we propose that the reference strain Aeromonas hydrophila CIP 57.50 be reclassified as Aeromonas salmonicida CIP 57.50 based on phenotypic characterization and sequence analyses of the cpn60, dnaJ, gyrB and rpoD genes.

  3. Rapid identification of fungal pathogens in BacT/ALERT, BACTEC, and BBL MGIT media using polymerase chain reaction and DNA sequencing of the internal transcribed spacer regions.

    PubMed

    Pryce, Todd M; Palladino, Silvano; Price, Diane M; Gardam, Dianne J; Campbell, Peter B; Christiansen, Keryn J; Murray, Ronan J

    2006-04-01

    We report a direct polymerase chain reaction/sequence (d-PCRS)-based method for the rapid identification of clinically significant fungi from 5 different types of commercial broth enrichment media inoculated with clinical specimens. Media including BacT/ALERT FA (BioMérieux, Marcy l'Etoile, France) (n = 87), BACTEC Plus Aerobic/F (Becton Dickinson, Microbiology Systems, Sparks, MD) (n = 16), BACTEC Peds Plus/F (Becton Dickinson) (n = 15), BACTEC Lytic/10 Anaerobic/F (Becton Dickinson) (n = 11) bottles, and BBL MGIT (Becton Dickinson) (n = 11) were inoculated with specimens from 138 patients. A universal DNA extraction method was used combining a novel pretreatment step to remove PCR inhibitors with a column-based DNA extraction kit. Target sequences in the noncoding internal transcribed spacer regions of the rRNA gene were amplified by PCR and sequenced using a rapid (24 h) automated capillary electrophoresis system. Using sequence alignment software, fungi were identified by sequence similarity with sequences derived from isolates identified by upper-level reference laboratories or isolates defined as ex-type strains. We identified Candida albicans (n = 14), Candida parapsilosis (n = 8), Candida glabrata (n = 7), Candida krusei (n = 2), Scedosporium prolificans (n = 4), and 1 each of Candida orthopsilosis, Candida dubliniensis, Candida kefyr, Candida tropicalis, Candida guilliermondii, Saccharomyces cerevisiae, Cryptococcus neoformans, Aspergillus fumigatus, Histoplasma capsulatum, and Malassezia pachydermatis by d-PCRS analysis. All d-PCRS identifications from positive broths were in agreement with the final species identification of the isolates grown from subculture. Earlier identification of fungi using d-PCRS may facilitate prompt and more appropriate antifungal therapy.

  4. Haplotype Detection from Next-Generation Sequencing in High-Ploidy-Level Species: 45S rDNA Gene Copies in the Hexaploid Spartina maritima

    PubMed Central

    Boutte, Julien; Aliaga, Benoît; Lima, Oscar; Ferreira de Carvalho, Julie; Ainouche, Abdelkader; Macas, Jiri; Rousseau-Gueutin, Mathieu; Coriton, Olivier; Ainouche, Malika; Salmon, Armel

    2015-01-01

    Gene and whole-genome duplications are widespread in plant nuclear genomes, resulting in sequence heterogeneity. Identification of duplicated genes may be particularly challenging in highly redundant genomes, especially when there are no diploid parents as a reference. Here, we developed a pipeline to detect the different copies in the ribosomal RNA gene family in the hexaploid grass Spartina maritima from next-generation sequencing (Roche-454) reads. The heterogeneity of the different domains of the highly repeated 45S unit was explored by identifying single nucleotide polymorphisms (SNPs) and assembling reads based on shared polymorphisms. SNPs were validated using comparisons with Illumina sequence data sets and by cloning and Sanger (re)sequencing. Using this approach, 29 validated polymorphisms and 11 validated haplotypes were reported (out of 34 and 20, respectively, that were initially predicted by our program). The rDNA domains of S. maritima have similar lengths as those found in other Poaceae, apart from the 5′-ETS, which is approximately two-times longer in S. maritima. Sequence homogeneity was encountered in coding regions and both internal transcribed spacers (ITS), whereas high intragenomic variability was detected in the intergenic spacer (IGS) and the external transcribed spacer (ETS). Molecular cytogenetic analysis by fluorescent in situ hybridization (FISH) revealed the presence of one pair of 45S rDNA signals on the chromosomes of S. maritima instead of three expected pairs for a hexaploid genome, indicating loss of duplicated homeologous loci through the diploidization process. The procedure developed here may be used at any ploidy level and using different sequencing technologies. PMID:26530424

  5. A massively parallel strategy for STR marker development, capture, and genotyping.

    PubMed

    Kistler, Logan; Johnson, Stephen M; Irwin, Mitchell T; Louis, Edward E; Ratan, Aakrosh; Perry, George H

    2017-09-06

    Short tandem repeat (STR) variants are highly polymorphic markers that facilitate powerful population genetic analyses. STRs are especially valuable in conservation and ecological genetic research, yielding detailed information on population structure and short-term demographic fluctuations. Massively parallel sequencing has not previously been leveraged for scalable, efficient STR recovery. Here, we present a pipeline for developing STR markers directly from high-throughput shotgun sequencing data without a reference genome, and an approach for highly parallel target STR recovery. We employed our approach to capture a panel of 5000 STRs from a test group of diademed sifakas (Propithecus diadema, n = 3), endangered Malagasy rainforest lemurs, and we report extremely efficient recovery of targeted loci-97.3-99.6% of STRs characterized with ≥10x non-redundant sequence coverage. We then tested our STR capture strategy on P. diadema fecal DNA, and report robust initial results and suggestions for future implementations. In addition to STR targets, this approach also generates large, genome-wide single nucleotide polymorphism (SNP) panels from flanking regions. Our method provides a cost-effective and scalable solution for rapid recovery of large STR and SNP datasets in any species without needing a reference genome, and can be used even with suboptimal DNA more easily acquired in conservation and ecological studies. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.

  6. A reliable DNA barcode reference library for the identification of the North European shelf fish fauna.

    PubMed

    Knebelsberger, Thomas; Landi, Monica; Neumann, Hermann; Kloppmann, Matthias; Sell, Anne F; Campbell, Patrick D; Laakmann, Silke; Raupach, Michael J; Carvalho, Gary R; Costa, Filipe O

    2014-09-01

    Valid fish species identification is an essential step both for fundamental science and fisheries management. The traditional identification is mainly based on external morphological diagnostic characters, leading to inconsistent results in many cases. Here, we provide a sequence reference library based on mitochondrial cytochrome c oxidase subunit I (COI) for a valid identification of 93 North Atlantic fish species originating from the North Sea and adjacent waters, including many commercially exploited species. Neighbour-joining analysis based on K2P genetic distances formed nonoverlapping clusters for all species with a ≥99% bootstrap support each. Identification was successful for 100% of the species as the minimum genetic distance to the nearest neighbour always exceeded the maximum intraspecific distance. A barcoding gap was apparent for the whole data set. Within-species distances ranged from 0 to 2.35%, while interspecific distances varied between 3.15 and 28.09%. Distances between congeners were on average 51-fold higher than those within species. The validation of the sequence library by applying BOLDs barcode index number (BIN) analysis tool and a ranking system demonstrated high taxonomic reliability of the DNA barcodes for 85% of the investigated fish species. Thus, the sequence library presented here can be confidently used as a benchmark for identification of at least two-thirds of the typical fish species recorded for the North Sea. © 2014 John Wiley & Sons Ltd.

  7. Evaluation of the stability of reference genes in bone mesenchymal stem cells from patients with avascular necrosis of the femoral head.

    PubMed

    Wang, X N; Yang, Q W; Du, Z W; Yu, T; Qin, Y G; Song, Y; Xu, M; Wang, J C

    2016-05-25

    This study aimed to evaluate 12 genes (18S, GAPDH, B2M, ACTB, ALAS1, GUSB, HPRT1, PBGD, PPIA, PUM1, RPL29, and TBP) for their reliability and stability as reference sequences for real-time quantitative PCR (RT-qPCR) in bone marrow-derived mesenchymal stem cells (BMSCs) isolated from patients with avascular necrosis of the femoral head (ANFH). BMSCs were isolated from 20 ANFH patients divided into four groups according to etiology, and four donors with femoral neck fractures. Total RNA was isolated from BMSCs and reverse transcribed into complementary DNA, which served as a template for RT-qPCR. Three commonly used programs were then used to analyze the results. Reference gene expression varied within each group, between specific groups, and among all five groups. Based on comparisons of all five groups, two of the programs used suggested that HPRT1 was the most stable reference gene, while 18S and ACTB were the most variable. Among the 12 candidate reference genes, HPRT1 exhibited the greatest reliability, followed by PPIA. Thus, these sequences could be used as references for the normalization of RT-qPCR results.

  8. DNA Barcoding Identifies Argentine Fishes from Marine and Brackish Waters

    PubMed Central

    Mabragaña, Ezequiel; Díaz de Astarloa, Juan Martín; Hanner, Robert; Zhang, Junbin; González Castro, Mariano

    2011-01-01

    Background DNA barcoding has been advanced as a promising tool to aid species identification and discovery through the use of short, standardized gene targets. Despite extensive taxonomic studies, for a variety of reasons the identification of fishes can be problematic, even for experts. DNA barcoding is proving to be a useful tool in this context. However, its broad application is impeded by the need to construct a comprehensive reference sequence library for all fish species. Here, we make a regional contribution to this grand challenge by calibrating the species discrimination efficiency of barcoding among 125 Argentine fish species, representing nearly one third of the known fauna, and examine the utility of these data to address several key taxonomic uncertainties pertaining to species in this region. Methodology/Principal Findings Specimens were collected and morphologically identified during crusies conducted between 2005 and 2008. The standard BARCODE fragment of COI was amplified and bi-directionally sequenced from 577 specimens (mean of 5 specimens/species), and all specimens and sequence data were archived and interrogated using analytical tools available on the Barcode of Life Data System (BOLD; www.barcodinglife.org). Nearly all species exhibited discrete clusters of closely related haplogroups which permitted the discrimination of 95% of the species (i.e. 119/125) examined while cases of shared haplotypes were detected among just three species-pairs. Notably, barcoding aided the identification of a new species of skate, Dipturus argentinensis, permitted the recognition of Genypterus brasiliensis as a valid species and questions the generic assignment of Paralichthys isosceles. Conclusions/Significance This study constitutes a significant contribution to the global barcode reference sequence library for fishes and demonstrates the utility of barcoding for regional species identification. As an independent assessment of alpha taxonomy, barcodes provide robust support for most morphologically based taxon concepts and also highlight key areas of taxonomic uncertainty worthy of reappraisal. PMID:22174860

  9. Large-Scale Concatenation cDNA Sequencing

    PubMed Central

    Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

    1997-01-01

    A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174

  10. Synthesis of DNA

    DOEpatents

    Mariella, Jr., Raymond P.

    2008-11-18

    A method of synthesizing a desired double-stranded DNA of a predetermined length and of a predetermined sequence. Preselected sequence segments that will complete the desired double-stranded DNA are determined. Preselected segment sequences of DNA that will be used to complete the desired double-stranded DNA are provided. The preselected segment sequences of DNA are assembled to produce the desired double-stranded DNA.

  11. Nanopore Technology: A Simple, Inexpensive, Futuristic Technology for DNA Sequencing.

    PubMed

    Gupta, P D

    2016-10-01

    In health care, importance of DNA sequencing has been fully established. Sanger's Capillary Electrophoresis DNA sequencing methodology is time consuming, cumbersome, hence become more expensive. Lately, because of its versatility DNA sequencing became house hold name, and therefore, there is an urgent need of simple, fast, inexpensive, DNA sequencing technology. In the beginning of this century efforts were made, and Nanopore DNA sequencing technology was developed; still it is infancy, nevertheless, it is the futuristic technology.

  12. The genome-wide DNA sequence specificity of the anti-tumour drug bleomycin in human cells.

    PubMed

    Murray, Vincent; Chen, Jon K; Tanaka, Mark M

    2016-07-01

    The cancer chemotherapeutic agent, bleomycin, cleaves DNA at specific sites. For the first time, the genome-wide DNA sequence specificity of bleomycin breakage was determined in human cells. Utilising Illumina next-generation DNA sequencing techniques, over 200 million bleomycin cleavage sites were examined to elucidate the bleomycin genome-wide DNA selectivity. The genome-wide bleomycin cleavage data were analysed by four different methods to determine the cellular DNA sequence specificity of bleomycin strand breakage. For the most highly cleaved DNA sequences, the preferred site of bleomycin breakage was at 5'-GT* dinucleotide sequences (where the asterisk indicates the bleomycin cleavage site), with lesser cleavage at 5'-GC* dinucleotides. This investigation also determined longer bleomycin cleavage sequences, with preferred cleavage at 5'-GT*A and 5'- TGT* trinucleotide sequences, and 5'-TGT*A tetranucleotides. For cellular DNA, the hexanucleotide DNA sequence 5'-RTGT*AY (where R is a purine and Y is a pyrimidine) was the most highly cleaved DNA sequence. It was striking that alternating purine-pyrimidine sequences were highly cleaved by bleomycin. The highest intensity cleavage sites in cellular and purified DNA were very similar although there were some minor differences. Statistical nucleotide frequency analysis indicated a G nucleotide was present at the -3 position (relative to the cleavage site) in cellular DNA but was absent in purified DNA.

  13. Large-scale transcriptome characterization and mass discovery of SNPs in globe artichoke and its related taxa.

    PubMed

    Scaglione, Davide; Lanteri, Sergio; Acquadro, Alberto; Lai, Zhao; Knapp, Steven J; Rieseberg, Loren; Portis, Ezio

    2012-10-01

    Cynara cardunculus (2n = 2× = 34) is a member of the Asteraceae family that contributes significantly to the agricultural economy of the Mediterranean basin. The species includes two cultivated varieties, globe artichoke and cardoon, which are grown mainly for food. Cynara cardunculus is an orphan crop species whose genome/transcriptome has been relatively unexplored, especially in comparison to other Asteraceae crops. Hence, there is a significant need to improve its genomic resources through the identification of novel genes and sequence-based markers, to design new breeding schemes aimed at increasing quality and crop productivity. We report the outcome of cDNA sequencing and assembly for eleven accessions of C. cardunculus. Sequencing of three mapping parental genotypes using Roche 454-Titanium technology generated 1.7 × 10⁶ reads, which were assembled into 38,726 reference transcripts covering 32 Mbp. Putative enzyme-encoding genes were annotated using the KEGG-database. Transcription factors and candidate resistance genes were surveyed as well. Paired-end sequencing was done for cDNA libraries of eight other representative C. cardunculus accessions on an Illumina Genome Analyzer IIx, generating 46 × 10⁶ reads. Alignment of the IGA and 454 reads to reference transcripts led to the identification of 195,400 SNPs with a Bayesian probability exceeding 95%; a validation rate of 90% was obtained by Sanger-sequencing of a subset of contigs. These results demonstrate that the integration of data from different NGS platforms enables large-scale transcriptome characterization, along with massive SNP discovery. This information will contribute to the dissection of key agricultural traits in C. cardunculus and facilitate the implementation of marker-assisted selection programs. © 2012 The Authors. Plant Biotechnology Journal © 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd.

  14. Differential gene expression in dentate granule cells in mesial temporal lobe epilepsy with and without hippocampal sclerosis.

    PubMed

    Griffin, Nicole G; Wang, Yu; Hulette, Christine M; Halvorsen, Matt; Cronin, Kenneth D; Walley, Nicole M; Haglund, Michael M; Radtke, Rodney A; Skene, J H Pate; Sinha, Saurabh R; Heinzen, Erin L

    2016-03-01

    Hippocampal sclerosis is the most common neuropathologic finding in cases of medically intractable mesial temporal lobe epilepsy. In this study, we analyzed the gene expression profiles of dentate granule cells of patients with mesial temporal lobe epilepsy with and without hippocampal sclerosis to show that next-generation sequencing methods can produce interpretable genomic data from RNA collected from small homogenous cell populations, and to shed light on the transcriptional changes associated with hippocampal sclerosis. RNA was extracted, and complementary DNA (cDNA) was prepared and amplified from dentate granule cells that had been harvested by laser capture microdissection from surgically resected hippocampi from patients with mesial temporal lobe epilepsy with and without hippocampal sclerosis. Sequencing libraries were sequenced, and the resulting sequencing reads were aligned to the reference genome. Differential expression analysis was used to ascertain expression differences between patients with and without hippocampal sclerosis. Greater than 90% of the RNA-Seq reads aligned to the reference. There was high concordance between transcriptional profiles obtained for duplicate samples. Principal component analysis revealed that the presence or absence of hippocampal sclerosis was the main determinant of the variance within the data. Among the genes up-regulated in the hippocampal sclerosis samples, there was significant enrichment for genes involved in oxidative phosphorylation. By analyzing the gene expression profiles of dentate granule cells from surgically resected hippocampal specimens from patients with mesial temporal lobe epilepsy with and without hippocampal sclerosis, we have demonstrated the utility of next-generation sequencing methods for producing biologically relevant results from small populations of homogeneous cells, and have provided insight on the transcriptional changes associated with this pathology. Wiley Periodicals, Inc. © 2016 International League Against Epilepsy.

  15. Polymorphic design of DNA origami structures through mechanical control of modular components.

    PubMed

    Lee, Chanseok; Lee, Jae Young; Kim, Do-Nyun

    2017-12-12

    Scaffolded DNA origami enables the bottom-up fabrication of diverse DNA nanostructures by designing hundreds of staple strands, comprised of complementary sequences to the specific binding locations of a scaffold strand. Despite its exceptionally high design flexibility, poor reusability of staples has been one of the major hurdles to fabricate assorted DNA constructs in an effective way. Here we provide a rational module-based design approach to create distinct bent shapes with controllable geometries and flexibilities from a single, reference set of staples. By revising the staple connectivity within the desired module, we can control the location, stiffness, and included angle of hinges precisely, enabling the construction of dozens of single- or multiple-hinge structures with the replacement of staple strands up to 12.8% only. Our design approach, combined with computational shape prediction and analysis, can provide a versatile and cost-effective procedure in the design of DNA origami shapes with stiffness-tunable units.

  16. Non-invasive prenatal diagnosis.

    PubMed

    Meaney, Cathy; Norbury, Gail

    2011-01-01

    The discovery of cell-free fetal DNA in the maternal plasma of pregnant women has facilitated the development of non-invasive prenatal diagnosis (NIPD). This has been successfully implemented in diagnostic laboratories for Rhesus typing and fetal sex determination for X-linked disorders and congenital adrenal hyperplasia (CAH) from 7 weeks gestation. Using real-time PCR, fluorescently labelled target gene specific probes can identify and quantify low copy number fetal-specific sequences in a high background of maternal DNA in the cell-free DNA extracted from maternal plasma.NIPD to detect specific fetal mutations in single gene disorders, currently by standard PCR techniques, can only be undertaken for paternally derived or de novo mutations because of the background maternal DNA. For routine use, this testing is limited by the large amounts of cell-free maternal DNA in the sample, the lack of universal fetal markers, and appropriate reference materials.

  17. Evaluation of four endogenous reference genes and their real-time PCR assays for common wheat quantification in GMOs detection.

    PubMed

    Huang, Huali; Cheng, Fang; Wang, Ruoan; Zhang, Dabing; Yang, Litao

    2013-01-01

    Proper selection of endogenous reference genes and their real-time PCR assays is quite important in genetically modified organisms (GMOs) detection. To find a suitable endogenous reference gene and its real-time PCR assay for common wheat (Triticum aestivum L.) DNA content or copy number quantification, four previously reported wheat endogenous reference genes and their real-time PCR assays were comprehensively evaluated for the target gene sequence variation and their real-time PCR performance among 37 common wheat lines. Three SNPs were observed in the PKABA1 and ALMT1 genes, and these SNPs significantly decreased the efficiency of real-time PCR amplification. GeNorm analysis of the real-time PCR performance of each gene among common wheat lines showed that the Waxy-D1 assay had the lowest M values with the best stability among all tested lines. All results indicated that the Waxy-D1 gene and its real-time PCR assay were most suitable to be used as an endogenous reference gene for common wheat DNA content quantification. The validated Waxy-D1 gene assay will be useful in establishing accurate and creditable qualitative and quantitative PCR analysis of GM wheat.

  18. Evaluation of Four Endogenous Reference Genes and Their Real-Time PCR Assays for Common Wheat Quantification in GMOs Detection

    PubMed Central

    Huang, Huali; Cheng, Fang; Wang, Ruoan; Zhang, Dabing; Yang, Litao

    2013-01-01

    Proper selection of endogenous reference genes and their real-time PCR assays is quite important in genetically modified organisms (GMOs) detection. To find a suitable endogenous reference gene and its real-time PCR assay for common wheat (Triticum aestivum L.) DNA content or copy number quantification, four previously reported wheat endogenous reference genes and their real-time PCR assays were comprehensively evaluated for the target gene sequence variation and their real-time PCR performance among 37 common wheat lines. Three SNPs were observed in the PKABA1 and ALMT1 genes, and these SNPs significantly decreased the efficiency of real-time PCR amplification. GeNorm analysis of the real-time PCR performance of each gene among common wheat lines showed that the Waxy-D1 assay had the lowest M values with the best stability among all tested lines. All results indicated that the Waxy-D1 gene and its real-time PCR assay were most suitable to be used as an endogenous reference gene for common wheat DNA content quantification. The validated Waxy-D1 gene assay will be useful in establishing accurate and creditable qualitative and quantitative PCR analysis of GM wheat. PMID:24098735

  19. Identification of Clinical Isolates of Actinomyces Species by Amplified 16S Ribosomal DNA Restriction Analysis

    PubMed Central

    Hall, Val; Talbot, P. R.; Stubbs, S. L.; Duerden, B. I.

    2001-01-01

    Amplified 16S ribosomal DNA (rDNA) restriction analysis (ARDRA), using enzymes HaeIII and HpaII, was applied to 176 fresh and 299 stored clinical isolates of putative Actinomyces spp. referred to the Anaerobe Reference Unit of the Public Health Laboratory Service for confirmation of identity. Results were compared with ARDRA results obtained previously for reference strains and with conventional phenotypic reactions. Identities of some strains were confirmed by analysis of partial 16S rDNA sequences. Of the 475 isolates, 331 (70%) were clearly assigned to recognized Actinomyces species, including 94 isolates assigned to six recently described species. A further 52 isolates in 12 ARDRA profiles were designated as apparently resembling recognized species, and 44 isolates, in 18 novel profiles, were confirmed as members of genera other than Actinomyces. The identities of 48 isolates in nine profiles remain uncertain, and they may represent novel species of Actinomyces. For the majority of species, phenotypic results, published reactions for the species, and ARDRA profiles concurred. However, of 113 stored isolates originally identified as A. meyeri or resembling A. meyeri by phenotypic tests, only 21 were confirmed as A. meyeri by ARDRA; 63 were reassigned as A. turicensis, 7 as other recognized species, and 22 as unidentified actinomycetes. Analyses of incidence and clinical associations of Actinomyces spp. add to the currently sparse knowledge of some recently described species. PMID:11574572

  20. Genomic resources and draft assemblies of the human and porcine varieties of scabies mites, Sarcoptes scabiei var. hominis and var. suis.

    PubMed

    Mofiz, Ehtesham; Holt, Deborah C; Seemann, Torsten; Currie, Bart J; Fischer, Katja; Papenfuss, Anthony T

    2016-06-02

    The scabies mite, Sarcoptes scabiei, is a parasitic arachnid and cause of the infectious skin disease scabies in humans and mange in other animal species. Scabies infections are a major health problem, particularly in remote Indigenous communities in Australia, where secondary group A streptococcal and Staphylococcus aureus infections of scabies sores are thought to drive the high rate of rheumatic heart disease and chronic kidney disease. We sequenced the genome of two samples of Sarcoptes scabiei var. hominis obtained from unrelated patients with crusted scabies located in different parts of northern Australia using the Illumina HiSeq. We also sequenced samples of Sarcoptes scabiei var. suis from a pig model. Because of the small size of the scabies mite, these data are derived from pools of thousands of mites and are metagenomic, including host and microbiome DNA. We performed cleaning and de novo assembly and present Sarcoptes scabiei var. hominis and var. suis draft reference genomes. We have constructed a preliminary annotation of this reference comprising 13,226 putative coding sequences based on sequence similarity to known proteins. We have developed extensive genomic resources for the scabies mite, including reference genomes and a preliminary annotation.

  1. Deep Investigation of Arabidopsis thaliana Junk DNA Reveals a Continuum between Repetitive Elements and Genomic Dark Matter

    PubMed Central

    Maumus, Florian; Quesneville, Hadi

    2014-01-01

    Eukaryotic genomes contain highly variable amounts of DNA with no apparent function. This so-called junk DNA is composed of two components: repeated and repeat-derived sequences (together referred to as the repeatome), and non-annotated sequences also known as genomic dark matter. Because of their high duplication rates as compared to other genomic features, transposable elements are predominant contributors to the repeatome and the products of their decay is thought to be a major source of genomic dark matter. Determining the origin and composition of junk DNA is thus important to help understanding genome evolution as well as host biology. In this study, we have used a combination of tools enabling to show that the repeatome from the small and reducing A. thaliana genome is significantly larger than previously thought. Furthermore, we present the concepts and results from a series of innovative approaches suggesting that a significant amount of the A. thaliana dark matter is of repetitive origin. As a tentative standard for the community, we propose a deep compendium annotation of the A. thaliana repeatome that may help addressing farther genome evolution as well as transcriptional and epigenetic regulation in this model plant. PMID:24709859

  2. Epigenetic regulatory mechanisms in vertebrate eye development and disease

    PubMed Central

    Cvekl, A; Mitton, KP

    2014-01-01

    Eukaryotic DNA is organized as a nucleoprotein polymer termed chromatin with nucleosomes serving as its repetitive architectural units. Cellular differentiation is a dynamic process driven by activation and repression of specific sets of genes, partitioning the genome into transcriptionally active and inactive chromatin domains. Chromatin architecture at individual genes/loci may remain stable through cell divisions, from a single mother cell to its progeny during mitosis, and represents an example of epigenetic phenomena. Epigenetics refers to heritable changes caused by mechanisms distinct from the primary DNA sequence. Recent studies have shown a number of links between chromatin structure, gene expression, extracellular signaling, and cellular differentiation during eye development. This review summarizes recent advances in this field, and the relationship between sequence-specific DNA-binding transcription factors and their roles in recruitment of chromatin remodeling enzymes. In addition, lens and retinal differentiation is accompanied by specific changes in the nucleolar organization, expression of non-coding RNAs, and DNA methylation. Epigenetic regulatory mechanisms in ocular tissues represent exciting areas of research that have opened new avenues for understanding normal eye development, inherited eye diseases and eye diseases related to aging and the environment. PMID:20179734

  3. VIP Barcoding: composition vector-based software for rapid species identification based on DNA barcoding.

    PubMed

    Fan, Long; Hui, Jerome H L; Yu, Zu Guo; Chu, Ka Hou

    2014-07-01

    Species identification based on short sequences of DNA markers, that is, DNA barcoding, has emerged as an integral part of modern taxonomy. However, software for the analysis of large and multilocus barcoding data sets is scarce. The Basic Local Alignment Search Tool (BLAST) is currently the fastest tool capable of handling large databases (e.g. >5000 sequences), but its accuracy is a concern and has been criticized for its local optimization. However, current more accurate software requires sequence alignment or complex calculations, which are time-consuming when dealing with large data sets during data preprocessing or during the search stage. Therefore, it is imperative to develop a practical program for both accurate and scalable species identification for DNA barcoding. In this context, we present VIP Barcoding: a user-friendly software in graphical user interface for rapid DNA barcoding. It adopts a hybrid, two-stage algorithm. First, an alignment-free composition vector (CV) method is utilized to reduce searching space by screening a reference database. The alignment-based K2P distance nearest-neighbour method is then employed to analyse the smaller data set generated in the first stage. In comparison with other software, we demonstrate that VIP Barcoding has (i) higher accuracy than Blastn and several alignment-free methods and (ii) higher scalability than alignment-based distance methods and character-based methods. These results suggest that this platform is able to deal with both large-scale and multilocus barcoding data with accuracy and can contribute to DNA barcoding for modern taxonomy. VIP Barcoding is free and available at http://msl.sls.cuhk.edu.hk/vipbarcoding/. © 2014 John Wiley & Sons Ltd.

  4. Sequence and Structure Dependent DNA-DNA Interactions

    NASA Astrophysics Data System (ADS)

    Kopchick, Benjamin; Qiu, Xiangyun

    Molecular forces between dsDNA strands are largely dominated by electrostatics and have been extensively studied. Quantitative knowledge has been accumulated on how DNA-DNA interactions are modulated by varied biological constituents such as ions, cationic ligands, and proteins. Despite its central role in biology, the sequence of DNA has not received substantial attention and ``random'' DNA sequences are typically used in biophysical studies. However, ~50% of human genome is composed of non-random-sequence DNAs, particularly repetitive sequences. Furthermore, covalent modifications of DNA such as methylation play key roles in gene functions. Such DNAs with specific sequences or modifications often take on structures other than the canonical B-form. Here we present series of quantitative measurements of the DNA-DNA forces with the osmotic stress method on different DNA sequences, from short repeats to the most frequent sequences in genome, and to modifications such as bromination and methylation. We observe peculiar behaviors that appear to be strongly correlated with the incurred structural changes. We speculate the causalities in terms of the differences in hydration shell and DNA surface structures.

  5. 7 CFR 340.2 - Groups of organisms which are or contain plant pests and exemptions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... within the group listed are included as organisms that may be or may contain plant pests, and are... deemed a plant pest for purposes of § 340.2, if the scientific literature refers to the organism as a... DNA or RNA sequences, organelles, plasmids, parts, copies, and/or analogs, of or from any of the...

  6. 7 CFR 340.2 - Groups of organisms which are or contain plant pests and exemptions.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... within the group listed are included as organisms that may be or may contain plant pests, and are... deemed a plant pest for purposes of § 340.2, if the scientific literature refers to the organism as a... DNA or RNA sequences, organelles, plasmids, parts, copies, and/or analogs, of or from any of the...

  7. 7 CFR 340.2 - Groups of organisms which are or contain plant pests and exemptions.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... within the group listed are included as organisms that may be or may contain plant pests, and are... deemed a plant pest for purposes of § 340.2, if the scientific literature refers to the organism as a... DNA or RNA sequences, organelles, plasmids, parts, copies, and/or analogs, of or from any of the...

  8. 7 CFR 340.2 - Groups of organisms which are or contain plant pests and exemptions.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... within the group listed are included as organisms that may be or may contain plant pests, and are... deemed a plant pest for purposes of § 340.2, if the scientific literature refers to the organism as a... DNA or RNA sequences, organelles, plasmids, parts, copies, and/or analogs, of or from any of the...

  9. 7 CFR 340.2 - Groups of organisms which are or contain plant pests and exemptions.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... within the group listed are included as organisms that may be or may contain plant pests, and are... deemed a plant pest for purposes of § 340.2, if the scientific literature refers to the organism as a... DNA or RNA sequences, organelles, plasmids, parts, copies, and/or analogs, of or from any of the...

  10. Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver

    PubMed Central

    Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Cornelissen, Marion; Kellam, Paul; Reiss, Peter

    2018-01-01

    Abstract Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user’s choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver’s constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver. PMID:29876136

  11. GENE-07. MOLECULAR NEUROPATHOLOGY 2.0 - INCREASING DIAGNOSTIC ACCURACY IN PEDIATRIC NEUROONCOLOGY

    PubMed Central

    Sturm, Dominik; Jones, David T.W.; Capper, David; Sahm, Felix; von Deimling, Andreas; Rutkoswki, Stefan; Warmuth-Metz, Monika; Bison, Brigitte; Gessi, Marco; Pietsch, Torsten; Pfister, Stefan M.

    2017-01-01

    Abstract The classification of central nervous system (CNS) tumors into clinically and biologically distinct entities and subgroups is challenging. Children and adolescents can be affected by >100 histological variants with very variable outcomes, some of which are exceedingly rare. The current WHO classification has introduced a number of novel molecular markers to aid routine neuropathological diagnostics, and DNA methylation profiling is emerging as a powerful tool to distinguish CNS tumor classes. The Molecular Neuropathology 2.0 study aims to integrate genome wide (epi-)genetic diagnostics with reference neuropathological assessment for all newly-diagnosed pediatric brain tumors in Germany. To date, >350 patients have been enrolled. A molecular diagnosis is established by epigenetic tumor classification through DNA methylation profiling and targeted panel sequencing of >130 genes to detect diagnostically and/or therapeutically useful DNA mutations, structural alterations, and fusion events. Results are aligned with the reference neuropathological diagnosis, and discrepant findings are discussed in a multi-disciplinary tumor board including reference neuroradiological evaluation. Ten FFPE sections as input material are sufficient to establish a molecular diagnosis in >95% of tumors. Alignment with reference pathology results in four broad categories: a) concordant classification (~77%), b) discrepant classification resolvable by tumor board discussion and/or additional data (~5%), c) discrepant classification without currently available options to resolve (~8%), and d) cases currently unclassifiable by molecular diagnostics (~10%). Discrepancies are enriched in certain histopathological entities, such as histological high grade gliomas with a molecularly low grade profile. Gene panel sequencing reveals predisposing germline events in ~10% of patients. Genome wide (epi-)genetic analyses add a valuable layer of information to routine neuropathological diagnostics. Our study provides insight into CNS tumors with divergent histopathological and molecular classification, opening new avenues for research discoveries and facilitating optimization of clinical management for affected patients in the future.

  12. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data

    PubMed Central

    Degner, Jacob F.; Marioni, John C.; Pai, Athma A.; Pickrell, Joseph K.; Nkadori, Everlyne; Gilad, Yoav; Pritchard, Jonathan K.

    2009-01-01

    Motivation: Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here, we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE). Results: We generated 16 million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias toward higher mapping rates of the allele in the reference sequence, compared with the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, ∼5–10% of SNPs still have an inherent bias toward more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data. Availability: Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome and analyzing the simulation output are available upon request from JFD. Raw short read data were deposited in GEO (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE18156. Contact: jdegner@uchicago.edu; marioni@uchicago.edu; gilad@uchicago.edu; pritch@uchicago.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19808877

  13. Automated Gene Ontology annotation for anonymous sequence data.

    PubMed

    Hennig, Steffen; Groth, Detlef; Lehrach, Hans

    2003-07-01

    Gene Ontology (GO) is the most widely accepted attempt to construct a unified and structured vocabulary for the description of genes and their products in any organism. Annotation by GO terms is performed in most of the current genome projects, which besides generality has the advantage of being very convenient for computer based classification methods. However, direct use of GO in small sequencing projects is not easy, especially for species not commonly represented in public databases. We present a software package (GOblet), which performs annotation based on GO terms for anonymous cDNA or protein sequences. It uses the species independent GO structure and vocabulary together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches. The sensitivity and the reference protein sets can be selected by the user. GOblet runs automatically and is available as a public service on our web server. The paper also addresses the reliability of automated GO annotations by using a reference set of more than 6000 human proteins. The GOblet server is accessible at http://goblet.molgen.mpg.de.

  14. DeF-GPU: Efficient and effective deletions finding in hepatitis B viral genomic DNA using a GPU architecture.

    PubMed

    Cheng, Chun-Pei; Lan, Kuo-Lun; Liu, Wen-Chun; Chang, Ting-Tsung; Tseng, Vincent S

    2016-12-01

    Hepatitis B viral (HBV) infection is strongly associated with an increased risk of liver diseases like cirrhosis or hepatocellular carcinoma (HCC). Many lines of evidence suggest that deletions occurring in HBV genomic DNA are highly associated with the activity of HBV via the interplay between aberrant viral proteins release and human immune system. Deletions finding on the HBV whole genome sequences is thus a very important issue though there exist underlying the challenges in mining such big and complex biological data. Although some next generation sequencing (NGS) tools are recently designed for identifying structural variations such as insertions or deletions, their validity is generally committed to human sequences study. This design may not be suitable for viruses due to different species. We propose a graphics processing unit (GPU)-based data mining method called DeF-GPU to efficiently and precisely identify HBV deletions from large NGS data, which generally contain millions of reads. To fit the single instruction multiple data instructions, sequencing reads are referred to as multiple data and the deletion finding procedure is referred to as a single instruction. We use Compute Unified Device Architecture (CUDA) to parallelize the procedures, and further validate DeF-GPU on 5 synthetic and 1 real datasets. Our results suggest that DeF-GPU outperforms the existing commonly-used method Pindel and is able to exactly identify the deletions of our ground truth in few seconds. The source code and other related materials are available at https://sourceforge.net/projects/defgpu/. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. High Quality Maize Centromere 10 Sequence Reveals Evidence of Frequent Recombination Events

    PubMed Central

    Wolfgruber, Thomas K.; Nakashima, Megan M.; Schneider, Kevin L.; Sharma, Anupma; Xie, Zidian; Albert, Patrice S.; Xu, Ronghui; Bilinski, Paul; Dawe, R. Kelly; Ross-Ibarra, Jeffrey; Birchler, James A.; Presting, Gernot G.

    2016-01-01

    The ancestral centromeres of maize contain long stretches of the tandemly arranged CentC repeat. The abundance of tandem DNA repeats and centromeric retrotransposons (CR) has presented a significant challenge to completely assembling centromeres using traditional sequencing methods. Here, we report a nearly complete assembly of the 1.85 Mb maize centromere 10 from inbred B73 using PacBio technology and BACs from the reference genome project. The error rates estimated from overlapping BAC sequences are 7 × 10−6 and 5 × 10−5 for mismatches and indels, respectively. The number of gaps in the region covered by the reassembly was reduced from 140 in the reference genome to three. Three expressed genes are located between 92 and 477 kb from the inferred ancestral CentC cluster, which lies within the region of highest centromeric repeat density. The improved assembly increased the count of full-length CR from 5 to 55 and revealed a 22.7 kb segmental duplication that occurred approximately 121,000 years ago. Our analysis provides evidence of frequent recombination events in the form of partial retrotransposons, deletions within retrotransposons, chimeric retrotransposons, segmental duplications including higher order CentC repeats, a deleted CentC monomer, centromere-proximal inversions, and insertion of mitochondrial sequences. Double-strand DNA break (DSB) repair is the most plausible mechanism for these events and may be the major driver of centromere repeat evolution and diversity. In many cases examined here, DSB repair appears to be mediated by microhomology, suggesting that tandem repeats may have evolved to efficiently repair frequent DSBs in centromeres. PMID:27047500

  16. Escherichia coli O-Antigen Gene Clusters of Serogroups O62, O68, O131, O140, O142, and O163: DNA Sequences and Similarity between O62 and O68, and PCR-Based Serogrouping

    PubMed Central

    Liu, Yanhong; Yan, Xianghe; DebRoy, Chitrita; Fratamico, Pina M.; Needleman, David S.; Li, Robert W.; Wang, Wei; Losada, Liliana; Brinkac, Lauren; Radune, Diana; Toro, Magaly; Hegde, Narasimha; Meng, Jianghong

    2015-01-01

    The DNA sequence of the O-antigen gene clusters of Escherichia coli serogroups O62, O68, O131, O140, O142, and O163 was determined, and primers based on the wzx (O-antigen flippase) and/or wzy (O-antigen polymerase) genes within the O-antigen gene clusters were designed and used in PCR assays to identify each serogroup. Specificity was tested with E. coli reference strains, field isolates belonging to the target serogroups, and non-E. coli bacteria. The PCR assays were highly specific for the respective serogroups; however, the PCR assay targeting the O62 wzx gene reacted positively with strains belonging to E. coli O68, which was determined by serotyping. Analysis of the O-antigen gene cluster sequences of serogroups O62 and O68 reference strains showed that they were 94% identical at the nucleotide level, although O62 contained an insertion sequence (IS) element located between the rmlA and rmlC genes within the O-antigen gene cluster. A PCR assay targeting the rmlA and rmlC genes flanking the IS element was used to differentiate O62 and O68 serogroups. The PCR assays developed in this study can be used for the detection and identification of E. coli O62/O68, O131, O140, O142, and O163 strains isolated from different sources. PMID:25664526

  17. Genome-Wide Profiling of RNA–Protein Interactions Using CLIP-Seq

    PubMed Central

    Stork, Cheryl; Zheng, Sika

    2017-01-01

    UV crosslinking immunoprecipitation (CLIP) is an increasingly popular technique to study protein–RNA interactions in tissues and cells. Whole cells or tissues are ultraviolet irradiated to generate a covalent bond between RNA and proteins that are in close contact. After partial RNase digestion, antibodies specific to an RNA binding protein (RBP) or a protein–epitope tag is then used to immunoprecipitate the protein–RNA complexes. After stringent washing and gel separation the RBP–RNA complex is excised. The RBP is protease digested to allow purification of the bound RNA. Reverse transcription of the RNA followed by high-throughput sequencing of the cDNA library is now often used to identify protein bound RNA on a genome-wide scale. UV irradiation can result in cDNA truncations and/or mutations at the crosslink sites, which complicates the alignment of the sequencing library to the reference genome and the identification of the crosslinking sites. Meanwhile, one or more amino acids of a crosslinked RBP can remain attached to its bound RNA due to incomplete digestion of the protein. As a result, reverse transcriptase may not read through the crosslink sites, and produce cDNA ending at the crosslinked nucleotide. This is harnessed by one variant of CLIP methods to identify crosslinking sites at a nucleotide resolution. This method, individual nucleotide resolution CLIP (iCLIP) circularizes cDNA to capture the truncated cDNA and also increases the efficiency of ligating sequencing adapters to the library. Here, we describe the detailed procedure of iCLIP. PMID:26965263

  18. Co-located hAT transposable element and 5S rDNA in an interstitial telomeric sequence suggest the formation of Robertsonian fusion in armored catfish.

    PubMed

    Glugoski, Larissa; Giuliano-Caetano, Lucia; Moreira-Filho, Orlando; Vicari, Marcelo R; Nogaroto, Viviane

    2018-04-15

    Co-located 5S rDNA genes and interstitial telomeric sites (ITS) revealed the involvement of multiple 5S rDNA clusters in chromosome rearrangements of Loricariidae. Interstitial (TTAGGG)n vestiges, in addition to telomeric sites, can coincide with locations of chromosomal rearrangements, and they are considered to be hotspots for chromosome breaks. This study aimed the molecular characterization of 5S rDNA in two Rineloricaria latirostris populations and examination of roles of 5S rDNA in breakpoint sites and its in situ localization. Rineloricaria latirostris from Brazil's Das Pedras river (2n = 46 chromosomes) presented five pairs identified using a 5S rDNA probe, in addition to a pair bearing a co-located ITS/5S rDNA. Rineloricaria latirostris from the Piumhi river (2n = 48 chromosomes) revealed two pairs containing 5S rDNA, without ITS. A 702-bp amplified sequence, using 5S rDNA primers, revealed an insertion of the hAT transposable element (TE), referred to as a degenerate 5S rDNA. Double-FISH (fluorescence in situ hybridization) demonstrated co-localization of 5S rDNA/degenerate 5S rDNA, 5S rDNA/hAT and ITS/5S rDNA from the Das Pedras river population. Piumhi river isolates possessed only 5S rDNA sites. We suggest that the degenerate 5S rDNA was generated by unequal crossing over, which was driven by invasion of hAT, establishing a breakpoint region susceptible to chromosome breakage, non-homologous recombination and Robertsonian (Rb) fusion. Furthermore, the presence of clusters of 5S rDNA at fusion points in other armored catfish species suggests its re-use and that these regions represent hotspots for evolutionary rearrangements within Loricariidae genomes. Copyright © 2018 Elsevier B.V. All rights reserved.

  19. A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences

    PubMed Central

    Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L.

    2017-01-01

    An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5′-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5′-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed near quantitatively. Shorter than full-length DNA sequences are first washed away from the capture support; the solid-phase purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. PMID:28628204

  20. A statistical method for the detection of variants from next-generation resequencing of DNA pools.

    PubMed

    Bansal, Vikas

    2010-06-15

    Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.

  1. An improved model for whole genome phylogenetic analysis by Fourier transform.

    PubMed

    Yin, Changchuan; Yau, Stephen S-T

    2015-10-07

    DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences undergone rearrangements as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor and the 2D mapping reduces the nucleotide composition bias in distance measure, and thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra are in the same dimensionality of the Fourier frequency space, then the Euclidean distances of full Fourier power spectra of the DNA sequences are used as the dissimilarity metrics. The improved DFT method, with increased computational performance by 2D numerical representation, can be applicable to any DNA sequences of different length ranges. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets. The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied on phylogenetic analysis for individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. Elucidating the diet of the island flying fox (Pteropus hypomelanus) in Peninsular Malaysia through Illumina Next-Generation Sequencing.

    PubMed

    Aziz, Sheema Abdul; Clements, Gopalasamy Reuben; Peng, Lee Yin; Campos-Arceiz, Ahimsa; McConkey, Kim R; Forget, Pierre-Michel; Gan, Han Ming

    2017-01-01

    There is an urgent need to identify and understand the ecosystem services of pollination and seed dispersal provided by threatened mammals such as flying foxes. The first step towards this is to obtain comprehensive data on their diet. However, the volant and nocturnal nature of bats presents a particularly challenging situation, and conventional microhistological approaches to studying their diet can be laborious and time-consuming, and provide incomplete information. We used Illumina Next-Generation Sequencing (NGS) as a novel, non-invasive method for analysing the diet of the island flying fox ( Pteropus hypomelanus ) on Tioman Island, Peninsular Malaysia. Through DNA metabarcoding of plants in flying fox droppings, using primers targeting the rbcL gene, we identified at least 29 Operationally Taxonomic Units (OTUs) comprising the diet of this giant pteropodid. OTU sequences matched at least four genera and 14 plant families from online reference databases based on a conservative Least Common Ancestor approach, and eight species from our site-specific plant reference collection. NGS was just as successful as conventional microhistological analysis in detecting plant taxa from droppings, but also uncovered six additional plant taxa. The island flying fox's diet appeared to be dominated by figs ( Ficus sp.), which was the most abundant plant taxon detected in the droppings every single month. Our study has shown that NGS can add value to the conventional microhistological approach in identifying food plant species from flying fox droppings. At this point in time, more accurate genus- and species-level identification of OTUs not only requires support from databases with more representative sequences of relevant plant DNA, but probably necessitates in situ collection of plant specimens to create a reference collection. Although this method cannot be used to quantify true abundance or proportion of plant species, nor plant parts consumed, it ultimately provides a very important first step towards identifying plant taxa and spatio-temporal patterns in flying fox diets.

  3. Elucidating the diet of the island flying fox (Pteropus hypomelanus) in Peninsular Malaysia through Illumina Next-Generation Sequencing

    PubMed Central

    Clements, Gopalasamy Reuben; Peng, Lee Yin; Campos-Arceiz, Ahimsa; McConkey, Kim R.; Forget, Pierre-Michel; Gan, Han Ming

    2017-01-01

    There is an urgent need to identify and understand the ecosystem services of pollination and seed dispersal provided by threatened mammals such as flying foxes. The first step towards this is to obtain comprehensive data on their diet. However, the volant and nocturnal nature of bats presents a particularly challenging situation, and conventional microhistological approaches to studying their diet can be laborious and time-consuming, and provide incomplete information. We used Illumina Next-Generation Sequencing (NGS) as a novel, non-invasive method for analysing the diet of the island flying fox (Pteropus hypomelanus) on Tioman Island, Peninsular Malaysia. Through DNA metabarcoding of plants in flying fox droppings, using primers targeting the rbcL gene, we identified at least 29 Operationally Taxonomic Units (OTUs) comprising the diet of this giant pteropodid. OTU sequences matched at least four genera and 14 plant families from online reference databases based on a conservative Least Common Ancestor approach, and eight species from our site-specific plant reference collection. NGS was just as successful as conventional microhistological analysis in detecting plant taxa from droppings, but also uncovered six additional plant taxa. The island flying fox’s diet appeared to be dominated by figs (Ficus sp.), which was the most abundant plant taxon detected in the droppings every single month. Our study has shown that NGS can add value to the conventional microhistological approach in identifying food plant species from flying fox droppings. At this point in time, more accurate genus- and species-level identification of OTUs not only requires support from databases with more representative sequences of relevant plant DNA, but probably necessitates in situ collection of plant specimens to create a reference collection. Although this method cannot be used to quantify true abundance or proportion of plant species, nor plant parts consumed, it ultimately provides a very important first step towards identifying plant taxa and spatio-temporal patterns in flying fox diets. PMID:28413729

  4. Defining operational taxonomic units using DNA barcode data

    PubMed Central

    Blaxter, Mark; Mann, Jenna; Chapman, Tom; Thomas, Fran; Whitton, Claire; Floyd, Robin; Abebe, Eyualem

    2005-01-01

    Abstract The scale of diversity of life on this planet is a significant challenge for any scientific programme hoping to produce a complete catalogue, whatever means is used. For DNA barcoding studies, this difficulty is compounded by the realization that any chosen barcode sequence is not the gene ‘for’ speciation and that taxa have evolutionary histories. How are we to disentangle the confounding effects of reticulate population genetic processes? Using the DNA barcode data from meiofaunal surveys, here we discuss the benefits of treating the taxa defined by barcodes without reference to their correspondence to ‘species’, and suggest that using this non-idealist approach facilitates access to taxon groups that are not accessible to other methods of enumeration and classification. Major issues remain, in particular the methodologies for taxon discrimination in DNA barcode data. PMID:16214751

  5. Ribosomal RNA Genes Contribute to the Formation of Pseudogenes and Junk DNA in the Human Genome.

    PubMed

    Robicheau, Brent M; Susko, Edward; Harrigan, Amye M; Snyder, Marlene

    2017-02-01

    Approximately 35% of the human genome can be identified as sequence devoid of a selected-effect function, and not derived from transposable elements or repeated sequences. We provide evidence supporting a known origin for a fraction of this sequence. We show that: 1) highly degraded, but near full length, ribosomal DNA (rDNA) units, including both 45S and Intergenic Spacer (IGS), can be found at multiple sites in the human genome on chromosomes without rDNA arrays, 2) that these rDNA sequences have a propensity for being centromere proximal, and 3) that sequence at all human functional rDNA array ends is divergent from canonical rDNA to the point that it is pseudogenic. We also show that small sequence strings of rDNA (from 45S + IGS) can be found distributed throughout the genome and are identifiable as an "rDNA-like signal", representing 0.26% of the q-arm of HSA21 and ∼2% of the total sequence of other regions tested. The size of sequence strings found in the rDNA-like signal intergrade into the size of sequence strings that make up the full-length degrading rDNA units found scattered throughout the genome. We conclude that the displaced and degrading rDNA sequences are likely of a similar origin but represent different stages in their evolution towards random sequence. Collectively, our data suggests that over vast evolutionary time, rDNA arrays contribute to the production of junk DNA. The concept that the production of rDNA pseudogenes is a by-product of concerted evolution represents a previously under-appreciated process; we demonstrate here its importance. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  6. Sexual and asexual states of some endophytic Phialocephala species of Picea.

    PubMed

    Tanney, Joey B; Douglas, Brian; Seifert, Keith A

    2016-01-01

    Unidentified DNA sequences in isolation-based or culture-free studies of conifer endophytes are a persistent problem that requires a field approach to resolve. An investigation of foliar endophytes of Picea glauca, P. mariana, P. rubens and Pinus strobus in eastern Canada, using a combined field, morphological, cultural and DNA sequencing approach, resulted in the frequent isolation of Phialocephala spp. and the first verified discovery of their mollisia-like sexual states in the field. Phialocephala scopiformis and Ph. piceae were the most frequent species isolated as endophytes from healthy conifer needles. Corresponding Mollisia or mollisioid sexual states for Ph. scopiformis, Ph. piceae and several undescribed species in a clade containing Ph. dimorphospora were collected in the sampling area and characterized by analysis of the nuc internal transcribed spacer rDNA (ITS) and gene for the largest subunit of RNA polymerase II (RPB1) loci. Four novel species and one new combination in a clade containing Ph. dimorphospora, the type of Phialocephala, are presented, accompanied by descriptions of apothecia and previously undocumented synanamorphs. An epitype culture and corresponding reference sequences for Phialocephala dimorphospora are proposed. The resulting ITS barcodes linked with robust taxonomic species concepts are an important resource for future research on forest ecosystems and endophytes. © 2016 by The Mycological Society of America.

  7. Technical Report on Modeling for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McLoughlin, K.

    2016-01-11

    The overall aim of this project is to develop a software package, called MetaQuant, that can determine the constituents of a complex microbial sample and estimate their relative abundances by analysis of metagenomic sequencing data. The goal for Task 1 is to create a generative model describing the stochastic process underlying the creation of sequence read pairs in the data set. The stages in this generative process include the selection of a source genome sequence for each read pair, with probability dependent on its abundance in the sample. The other stages describe the evolution of the source genome from itsmore » nearest common ancestor with a reference genome, breakage of the source DNA into short fragments, and the errors in sequencing the ends of the fragments to produce read pairs.« less

  8. Representation of DNA sequences in genetic codon context with applications in exon and intron prediction.

    PubMed

    Yin, Changchuan

    2015-04-01

    To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulation annealing to maximize the 3-periodicity signal to noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction by Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.

  9. Single-cell genomic sequencing using Multiple Displacement Amplification.

    PubMed

    Lasken, Roger S

    2007-10-01

    Single microbial cells can now be sequenced using DNA amplified by the Multiple Displacement Amplification (MDA) reaction. The few femtograms of DNA in a bacterium are amplified into micrograms of high molecular weight DNA suitable for DNA library construction and Sanger sequencing. The MDA-generated DNA also performs well when used directly as template for pyrosequencing by the 454 Life Sciences method. While MDA from single cells loses some of the genomic sequence, this approach will greatly accelerate the pace of sequencing from uncultured microbes. The genetically linked sequences from single cells are also a powerful tool to be used in guiding genomic assembly of shotgun sequences of multiple organisms from environmental DNA extracts (metagenomic sequences).

  10. One Novel Multiple-Target Plasmid Reference Molecule Targeting Eight Genetically Modified Canola Events for Genetically Modified Canola Detection.

    PubMed

    Li, Zhuqing; Li, Xiang; Wang, Canhua; Song, Guiwen; Pi, Liqun; Zheng, Lan; Zhang, Dabing; Yang, Litao

    2017-09-27

    Multiple-target plasmid DNA reference materials have been generated and utilized as good substitutes of matrix-based reference materials in the analysis of genetically modified organisms (GMOs). Herein, we report the construction of one multiple-target plasmid reference molecule, pCAN, which harbors eight GM canola event-specific sequences (RF1, RF2, MS1, MS8, Topas 19/2, Oxy235, RT73, and T45) and a partial sequence of the canola endogenous reference gene PEP. The applicability of this plasmid reference material in qualitative and quantitative PCR assays of the eight GM canola events was evaluated, including the analysis of specificity, limit of detection (LOD), limit of quantification (LOQ), and performance of pCAN in the analysis of various canola samples, etc. The LODs are 15 copies for RF2, MS1, and RT73 assays using pCAN as the calibrator and 10 genome copies for the other events. The LOQ in each event-specific real-time PCR assay is 20 copies. In quantitative real-time PCR analysis, the PCR efficiencies of all event-specific and PEP assays are between 91% and 97%, and the squared regression coefficients (R 2 ) are all higher than 0.99. The quantification bias values varied from 0.47% to 20.68% with relative standard deviation (RSD) from 1.06% to 24.61% in the quantification of simulated samples. Furthermore, 10 practical canola samples sampled from imported shipments in the port of Shanghai, China, were analyzed employing pCAN as the calibrator, and the results were comparable with those assays using commercial certified materials as the calibrator. Concluding from these results, we believe that this newly developed pCAN plasmid is one good candidate for being a plasmid DNA reference material in the detection and quantification of the eight GM canola events in routine analysis.

  11. First complete genome sequence of infectious laryngotracheitis virus

    PubMed Central

    2011-01-01

    Background Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes acute respiratory disease in chickens worldwide. To date, only one complete genomic sequence of ILTV has been reported. This sequence was generated by concatenating partial sequences from six different ILTV strains. Thus, the full genomic sequence of a single (individual) strain of ILTV has not been determined previously. This study aimed to use high throughput sequencing technology to determine the complete genomic sequence of a live attenuated vaccine strain of ILTV. Results The complete genomic sequence of the Serva vaccine strain of ILTV was determined, annotated and compared to the concatenated ILTV reference sequence. The genome size of the Serva strain was 152,628 bp, with a G + C content of 48%. A total of 80 predicted open reading frames were identified. The Serva strain had 96.5% DNA sequence identity with the concatenated ILTV sequence. Notably, the concatenated ILTV sequence was found to lack four large regions of sequence, including 528 bp and 594 bp of sequence in the UL29 and UL36 genes, respectively, and two copies of a 1,563 bp sequence in the repeat regions. Considerable differences in the size of the predicted translation products of 4 other genes (UL54, UL30, UL37 and UL38) were also identified. More than 530 single-nucleotide polymorphisms (SNPs) were identified. Most SNPs were located within three genomic regions, corresponding to sequence from the SA-2 ILTV vaccine strain in the concatenated ILTV sequence. Conclusions This is the first complete genomic sequence of an individual ILTV strain. This sequence will facilitate future comparative genomic studies of ILTV by providing an appropriate reference sequence for the sequence analysis of other ILTV strains. PMID:21501528

  12. A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation.

    PubMed

    Howe, Glenn T; Yu, Jianbin; Knaus, Brian; Cronn, Richard; Kolpak, Scott; Dolan, Peter; Lorenz, W Walter; Dean, Jeffrey F D

    2013-02-28

    Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array-more SNPs than are needed to use genomic selection in tree breeding programs. Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to climate change.

  13. A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation

    PubMed Central

    2013-01-01

    Background Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. Results We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Conclusions Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array—more SNPs than are needed to use genomic selection in tree breeding programs. Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to climate change. PMID:23445355

  14. Sequence homology and expression profile of genes associated with DNA repair pathways in Mycobacterium leprae.

    PubMed

    Sharma, Mukul; Vedithi, Sundeep Chaitanya; Das, Madhusmita; Roy, Anindya; Ebenezer, Mannam

    2017-01-01

    Survival of Mycobacterium leprae, the causative bacteria for leprosy, in the human host is dependent to an extent on the ways in which its genome integrity is retained. DNA repair mechanisms protect bacterial DNA from damage induced by various stress factors. The current study is aimed at understanding the sequence and functional annotation of DNA repair genes in M. leprae. T he genome of M. leprae was annotated using sequence alignment tools to identify DNA repair genes that have homologs in Mycobacterium tuberculosis and Escherichia coli. A set of 96 genes known to be involved in DNA repair mechanisms in E. coli and Mycobacteriaceae were chosen as a reference. Among these, 61 were identified in M. leprae based on sequence similarity and domain architecture. The 61 were classified into 36 characterized gene products (59%), 11 hypothetical proteins (18%), and 14 pseudogenes (23%). All these genes have homologs in M. tuberculosis and 49 (80.32%) in E. coli. A set of 12 genes which are absent in E. coli were present in M. leprae and in Mycobacteriaceae. These 61 genes were further investigated for their expression profiles in the whole transcriptome microarray data of M. leprae which was obtained from the signal intensities of 60bp probes, tiling the entire genome with 10bp overlaps. It was noted that transcripts corresponding to all the 61 genes were identified in the transcriptome data with varying expression levels ranging from 0.18 to 2.47 fold (normalized with 16SrRNA). The mRNA expression levels of a representative set of seven genes ( four annotated and three hypothetical protein coding genes) were analyzed using quantitative Polymerase Chain Reaction (qPCR) assays with RNA extracted from skin biopsies of 10 newly diagnosed, untreated leprosy cases. It was noted that RNA expression levels were higher for genes involved in homologous recombination whereas the genes with a low level of expression are involved in the direct repair pathway. This study provided preliminary information on the potential DNA repair pathways that are extant in M. leprae and the associated genes.

  15. BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes.

    PubMed

    Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana

    2016-07-01

    The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  16. Genetic characterisation of Taenia multiceps cysts from ruminants in Greece.

    PubMed

    Al-Riyami, Shumoos; Ioannidou, Evi; Koehler, Anson V; Hussain, Muhammad H; Al-Rawahi, Abdulmajeed H; Giadinis, Nektarios D; Lafi, Shawkat Q; Papadopoulos, Elias; Jabbar, Abdul

    2016-03-01

    This study was designed to genetically characterise the larval stage (coenurus) of Taenia multiceps from ruminants in Greece, utilising DNA regions within the cytochrome c oxidase subunit 1 (partial cox1) and NADH dehydrogenase 1 (pnad1) mitochondrial (mt) genes, respectively. A molecular-phylogenetic approach was used to analyse the pcox1 and pnad1 amplicons derived from genomic DNA samples from individual cysts (n=105) from cattle (n=3), goats (n=5) and sheep (n=97). Results revealed five and six distinct electrophoretic profiles for pcox1 and pnad1, respectively, using single-strand conformation polymorphism. Direct sequencing of selected amplicons representing each of these profiles defined five haplotypes each for pcox1 and pnad1, among all 105 isolates. Phylogenetic analysis of individual sequence data for each locus, including a range of well-defined reference sequences, inferred that all isolates of T. multiceps cysts from ruminants in Greece clustered with previously published sequences from different continents. The present study provides a foundation for future large-scale studies on the epidemiology of T. multiceps in ruminants as well as dogs in Greece. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. A Comparison of Honey Bee-Collected Pollen From Working Agricultural Lands Using Light Microscopy and ITS Metabarcoding.

    PubMed

    Smart, M D; Cornman, R S; Iwanowicz, D D; McDermott-Kubeczko, M; Pettis, J S; Spivak, M S; Otto, C R V

    2017-02-01

    Taxonomic identification of pollen has historically been accomplished via light microscopy but requires specialized knowledge and reference collections, particularly when identification to lower taxonomic levels is necessary. Recently, next-generation sequencing technology has been used as a cost-effective alternative for identifying bee-collected pollen; however, this novel approach has not been tested on a spatially or temporally robust number of pollen samples. Here, we compare pollen identification results derived from light microscopy and DNA sequencing techniques with samples collected from honey bee colonies embedded within a gradient of intensive agricultural landscapes in the Northern Great Plains throughout the 2010-2011 growing seasons. We demonstrate that at all taxonomic levels, DNA sequencing was able to discern a greater number of taxa, and was particularly useful for the identification of infrequently detected species. Importantly, substantial phenological overlap did occur for commonly detected taxa using either technique, suggesting that DNA sequencing is an appropriate, and enhancing, substitutive technique for accurately capturing the breadth of bee-collected species of pollen present across agricultural landscapes. We also show that honey bees located in high and low intensity agricultural settings forage on dissimilar plants, though with overlap of the most abundantly collected pollen taxa. We highlight practical applications of utilizing sequencing technology, including addressing ecological issues surrounding land use, climate change, importance of taxa relative to abundance, and evaluating the impact of conservation program habitat enhancement efforts. Published by Oxford University Press on behalf of Entomological Society of America 2016. This work is written by US Government employees and is in the public domain in the US.

  18. Vesicular monoamine transporter-1 (VMAT-1) mRNA and immunoreactive proteins in mouse brain.

    PubMed

    Ashe, Karen M; Chiu, Wan-Ling; Khalifa, Ahmed M; Nicolas, Antoine N; Brown, Bonnie L; De Martino, Randall R; Alexander, Clayton P; Waggener, Christopher T; Fischer-Stenger, Krista; Stewart, Jennifer K

    2011-01-01

    Vesicular monoamine transporter 1 (VMAT-1) mRNA and protein were examined (1) to determine whether adult mouse brain expresses full-length VMAT-1 mRNA that can be translated to functional transporter protein and (2) to compare immunoreactive VMAT-1 proteins in brain and adrenal. VMAT-1 mRNA was detected in mouse brain with RT-PCR. The cDNA was sequenced, cloned into an expression vector, transfected into COS-1 cells, and cell protein was assayed for VMAT-1 activity. Immunoreactive proteins were examined on western blots probed with four different antibodies to VMAT-1. Sequencing confirmed identity of the entire coding sequences of VMAT-1 cDNA from mouse medulla oblongata/pons and adrenal to a Gen-Bank reference sequence. Transfection of the brain cDNA into COS-1 cells resulted in transporter activity that was blocked by the VMAT inhibitor reserpine and a proton ionophore, but not by tetrabenazine, which has a high affinity for VMAT-2. Antibodies to either the C- or N- terminus of VMAT-1 detected two proteins (73 and 55 kD) in transfected COS-1 cells. The C-terminal antibodies detected both proteins in extracts of mouse medulla/pons, cortex, hypothalamus, and cerebellum but only the 73 kD protein and higher molecular weight immunoreactive proteins in mouse adrenal and rat PC12 cells, which are positive controls for rodent VMAT-1. These findings demonstrate that a functional VMAT-1 mRNA coding sequence is expressed in mouse brain and suggest processing of VMAT-1 protein differs in mouse adrenal and brain.

  19. Geoseq: a tool for dissecting deep-sequencing datasets.

    PubMed

    Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi

    2010-10-12

    Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (ddbj). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to, a) identify differential isoform expression in mRNA-seq datasets, b) identify miRNAs (microRNAs) in libraries, and identify mature and star sequences in miRNAS and c) to identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.

  20. Novel primers for complete mitochondrial cytochrome b genesequencing in mammals

    USGS Publications Warehouse

    Naidu, Ashwin; Fitak, Robert R.; Munguia-Vega, Adrian; Culver, Melanie

    2011-01-01

    Sequence-based species identification relies on the extent and integrity of sequence data available in online databases such as GenBank. When identifying species from a sample of unknown origin, partial DNA sequences obtained from the sample are aligned against existing sequences in databases. When the sequence from the matching species is not present in the database, high-scoring alignments with closely related sequences might produce unreliable results on species identity. For species identification in mammals, the cytochrome b (cyt b) gene has been identified to be highly informative; thus, large amounts of reference sequence data from the cyt b gene are much needed. To enhance availability of cyt b gene sequence data on a large number of mammalian species in GenBank and other such publicly accessible online databases, we identified a primer pair for complete cyt b gene sequencing in mammals. Using this primer pair, we successfully PCR amplified and sequenced the complete cyt b gene from 40 of 44 mammalian species representing 10 orders of mammals. We submitted 40 complete, correctly annotated, cyt b protein coding sequences to GenBank. To our knowledge, this is the first single primer pair to amplify the complete cyt b gene in a broad range of mammalian species. This primer pair can be used for the addition of new cyt b gene sequences and to enhance data available on species represented in GenBank. The availability of novel and complete gene sequences as high-quality reference data can improve the reliability of sequence-based species identification.

  1. SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

    PubMed Central

    Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

    2001-01-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202

  2. Acquisition of New DNA Sequences After Infection of Chicken Cells with Avian Myeloblastosis Virus

    PubMed Central

    Shoyab, M.; Baluda, M. A.; Evans, R.

    1974-01-01

    DNA-RNA hybridization studies between 70S RNA from avian myeloblastosis virus (AMV) and an excess of DNA from (i) AMV-induced leukemic chicken myeloblasts or (ii) a mixture of normal and of congenitally infected K-137 chicken embryos producing avian leukosis viruses revealed the presence of fast- and slow-hybridizing virus-specific DNA sequences. However, the leukemic cells contained twice the level of AMV-specific DNA sequences observed in normal chicken embryonic cells. The fast-reacting sequences were two to three times more numerous in leukemic DNA than in DNA from the mixed embryos. The slow-reacting sequences had a reiteration frequency of approximately 9 and 6, in the two respective systems. Both the fast- and the slow-reacting DNA sequences in leukemic cells exhibited a higher Tm (2 C) than the respective DNA sequences in normal cells. In normal and leukemic cells the slow hybrid sequences appeared to have a Tm which was 2 C higher than that of the fast hybrid sequences. Individual non-virus-producing chicken embryos, either group-specific antigen positive or negative, contained 40 to 100 copies of the fast sequences and 2 to 6 copies of the slowly hybridizing sequences per cell genome. Normal rat cells did not contain DNA that hybridized with AMV RNA, whereas non-virus-producing rat cells transformed by B-77 avian sarcoma virus contained only the slowly reacting sequences. The results demonstrate that leukemic cells transformed by AMV contain new AMV-specific DNA sequences which were not present before infection. PMID:16789139

  3. Cytochrome b sequences in black-crowned night-herons (Nycticorax nycticorax) from heronries exposed to genotoxic contaminants

    USGS Publications Warehouse

    Dahl, Christopher R.; Bickham, John W.; Wickliffe, Jeffery K.; Custer, Thomas W.

    2001-01-01

    DNA sequence analysis of a 215 base-pair region of the mitochondrial cytochrome b gene was used to examine genetic variation and search for evidence of an increased mutation rate in black-crowned night-herons. We examined five populations exposed to environmental contamination (primarily PAHs and PCBs) and one reference population from the eastern U.S. There was no evidence of a high mutation rate even within populations previously shown to exhibit increased variation in DNA content among somatic cells as a result of petroleum exposure. Three haplotypes were observed among 99 individuals. The low level of variability could be evidence for a genetic bottleneck, or that cytochrome b is too conservative for use in population genetic studies of this species. With the exception of one population from Louisiana, pair-wise Phist estimates were very low, indicative of little population structure and potentially high rates of effective migration among populations.

  4. Forensics and mitochondrial DNA: applications, debates, and foundations.

    PubMed

    Budowle, Bruce; Allard, Marc W; Wilson, Mark R; Chakraborty, Ranajit

    2003-01-01

    Debate on the validity and reliability of scientific methods often arises in the courtroom. When the government (i.e., the prosecution) is the proponent of evidence, the defense is obliged to challenge its admissibility. Regardless, those who seek to use DNA typing methodologies to analyze forensic biological evidence have a responsibility to understand the technology and its applications so a proper foundation(s) for its use can be laid. Mitochondrial DNA (mtDNA), an extranuclear genome, has certain features that make it desirable for forensics, namely, high copy number, lack of recombination, and matrilineal inheritance. mtDNA typing has become routine in forensic biology and is used to analyze old bones, teeth, hair shafts, and other biological samples where nuclear DNA content is low. To evaluate results obtained by sequencing the two hypervariable regions of the control region of the human mtDNA genome, one must consider the genetically related issues of nomenclature, reference population databases, heteroplasmy, paternal leakage, recombination, and, of course, interpretation of results. We describe the approaches, the impact some issues may have on interpretation of mtDNA analyses, and some issues raised in the courtroom.

  5. Harnessing mtDNA variation to resolve ambiguity in ‘Redfish’ sold in Europe

    PubMed Central

    Moore, Lauren; Pampoulie, Christophe; Di Muri, Cristina; Vandamme, Sara; Mariani, Stefano

    2017-01-01

    Morphology-based identification of North Atlantic Sebastes has long been controversial and misidentification may produce misleading data, with cascading consequences that negatively affect fisheries management and seafood labelling. North Atlantic Sebastes comprises of four species, commonly known as ‘redfish’, but little is known about the number, identity and labelling accuracy of redfish species sold across Europe. We used a molecular approach to identify redfish species from ‘blind’ specimens to evaluate the performance of the Barcode of Life (BOLD) and Genbank databases, as well as carrying out a market product accuracy survey from retailers across Europe. The conventional BOLD approach proved ambiguous, and phylogenetic analysis based on mtDNA control region sequences provided a higher resolution for species identification. By sampling market products from four countries, we found the presence of two species of redfish (S. norvegicus and S. mentella) and one unidentified Pacific rockfish marketed in Europe. Furthermore, public databases revealed the existence of inaccurate reference sequences, likely stemming from species misidentification from previous studies, which currently hinders the efficacy of DNA methods for the identification of Sebastes market samples. PMID:29018597

  6. Genomic resources for gene discovery, functional genome annotation, and evolutionary studies of maize and its close relatives.

    PubMed

    Wang, Chao; Shi, Xue; Liu, Lin; Li, Haiyan; Ammiraju, Jetty S S; Kudrna, David A; Xiong, Wentao; Wang, Hao; Dai, Zhaozhao; Zheng, Yonglian; Lai, Jinsheng; Jin, Weiwei; Messing, Joachim; Bennetzen, Jeffrey L; Wing, Rod A; Luo, Meizhong

    2013-11-01

    Maize is one of the most important food crops and a key model for genetics and developmental biology. A genetically anchored and high-quality draft genome sequence of maize inbred B73 has been obtained to serve as a reference sequence. To facilitate evolutionary studies in maize and its close relatives, much like the Oryza Map Alignment Project (OMAP) (www.OMAP.org) bacterial artificial chromosome (BAC) resource did for the rice community, we constructed BAC libraries for maize inbred lines Zheng58, Chang7-2, and Mo17 and maize wild relatives Zea mays ssp. parviglumis and Tripsacum dactyloides. Furthermore, to extend functional genomic studies to maize and sorghum, we also constructed binary BAC (BIBAC) libraries for the maize inbred B73 and the sorghum landrace Nengsi-1. The BAC/BIBAC vectors facilitate transfer of large intact DNA inserts from BAC clones to the BIBAC vector and functional complementation of large DNA fragments. These seven Zea Map Alignment Project (ZMAP) BAC/BIBAC libraries have average insert sizes ranging from 92 to 148 kb, organellar DNA from 0.17 to 2.3%, empty vector rates between 0.35 and 5.56%, and genome equivalents of 4.7- to 8.4-fold. The usefulness of the Parviglumis and Tripsacum BAC libraries was demonstrated by mapping clones to the reference genome. Novel genes and alleles present in these ZMAP libraries can now be used for functional complementation studies and positional or homology-based cloning of genes for translational genomics.

  7. DNA Barcoding Green Microalgae Isolated from Neotropical Inland Waters

    PubMed Central

    Hadi, Sámed I. I. A.; Santana, Hugo; Brunale, Patrícia P. M.; Gomes, Taísa G.; Oliveira, Márcia D.; Matthiensen, Alexandre; Oliveira, Marcos E. C.; Silva, Flávia C. P.; Brasil, Bruno S. A. F.

    2016-01-01

    This study evaluated the feasibility of using the Ribulose Bisphosphate Carboxylase Large subunit gene (rbcL) and the Internal Transcribed Spacers 1 and 2 of the nuclear rDNA (nuITS1 and nuITS2) markers for identifying a very diverse, albeit poorly known group, of green microalgae from neotropical inland waters. Fifty-one freshwater green microalgae strains isolated from Brazil, the largest biodiversity reservoir in the neotropics, were submitted to DNA barcoding. Currently available universal primers for ITS1-5.8S-ITS2 region amplification were sufficient to successfully amplify and sequence 47 (92%) of the samples. On the other hand, new sets of primers had to be designed for rbcL, which allowed 96% of the samples to be sequenced. Thirty-five percent of the strains could be unambiguously identified to the species level based either on nuITS1 or nuITS2 sequences’ using barcode gap calculations. nuITS2 Compensatory Base Change (CBC) and ITS1-5.8S-ITS2 region phylogenetic analysis, together with morphological inspection, confirmed the identification accuracy. In contrast, only 6% of the strains could be assigned to the correct species based solely on rbcL sequences. In conclusion, the data presented here indicates that either nuITS1 or nuITS2 are useful markers for DNA barcoding of freshwater green microalgae, with advantage for nuITS2 due to the larger availability of analytical tools and reference barcodes deposited at databases for this marker. PMID:26900844

  8. DNA barcoding discriminates freshwater fishes from southeastern Nigeria and provides river system-level phylogeographic resolution within some species.

    PubMed

    Nwani, Christopher D; Becker, Sven; Braid, Heather E; Ude, Emmanuel F; Okogwu, Okechukwu I; Hanner, Robert

    2011-10-01

    Fishes are the main animal protein source for human beings and play a vital role in aquatic ecosystems and food webs. Fish identification can be challenging, especially in the tropics (due to high diversity), and this is particularly true for larval forms or fragmentary remains. DNA barcoding, which uses the 5' region of the mitochondrial cytochrome c oxidase subunit I (COI) as a target gene, is an efficient method for standardized species-level identification for biodiversity assessment and conservation, pending the establishment of reference sequence libraries. In this study, fishes were collected from three rivers in southeastern Nigeria, identified morphologically, and imaged digitally. DNA was extracted, PCR-amplified, and the standard barcode region was bidirectionally sequenced for 363 individuals belonging to 70 species in 38 genera. All specimen provenance data and associated sequence information were recorded in the barcode of life data systems (BOLD; www.barcodinglife.org ). Analytical tools on BOLD were used to assess the performance of barcoding to identify species. Using neighbor-joining distance comparison, the average genetic distance was 60-fold higher between species than within species, as pairwise genetic distance estimates averaged 10.29% among congeners and only 0.17% among conspecifics. Despite low levels of divergence within species, we observed river system-specific haplotype partitioning within eight species (11.4% of all species). Our preliminary results suggest that DNA barcoding is very effective for species identification of Nigerian freshwater fishes.

  9. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dips sticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such at TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera which gives a decrease in signal.

  10. Distortion of genetically modified organism quantification in processed foods: influence of particle size compositions and heat-induced DNA degradation.

    PubMed

    Moreano, Francisco; Busch, Ulrich; Engel, Karl-Heinz

    2005-12-28

    Milling fractions from conventional and transgenic corn were prepared at laboratory scale and used to study the influence of sample composition and heat-induced DNA degradation on the relative quantification of genetically modified organisms (GMO) in food products. Particle size distributions of the obtained fractions (coarse grits, regular grits, meal, and flour) were characterized using a laser diffraction system. The application of two DNA isolation protocols revealed a strong correlation between the degree of comminution of the milling fractions and the DNA yield in the extracts. Mixtures of milling fractions from conventional and transgenic material (1%) were prepared and analyzed via real-time polymerase chain reaction. Accurate quantification of the adjusted GMO content was only possible in mixtures containing conventional and transgenic material in the form of analogous milling fractions, whereas mixtures of fractions exhibiting different particle size distributions delivered significantly over- and underestimated GMO contents depending on their compositions. The process of heat-induced nucleic acid degradation was followed by applying two established quantitative assays showing differences between the lengths of the recombinant and reference target sequences (A, deltal(A) = -25 bp; B, deltal(B) = +16 bp; values related to the amplicon length of the reference gene). Data obtained by the application of method A resulted in underestimated recoveries of GMO contents in the samples of heat-treated products, reflecting the favored degradation of the longer target sequence used for the detection of the transgene. In contrast, data yielded by the application of method B resulted in increasingly overestimated recoveries of GMO contents. The results show how commonly used food technological processes may lead to distortions in the results of quantitative GMO analyses.

  11. An improved SELEX technique for selection of DNA aptamers binding to M-type 11 of Streptococcus pyogenes.

    PubMed

    Hamula, Camille L A; Peng, Hanyong; Wang, Zhixin; Tyrrell, Gregory J; Li, Xing-Fang; Le, X Chris

    2016-03-15

    Streptococcus pyogenes is a clinically important pathogen consisting of various serotypes determined by different M proteins expressed on the cell surface. The M type is therefore a useful marker to monitor the spread of invasive S. pyogenes in a population. Serotyping and nucleic acid amplification/sequencing methods for the identification of M types are laborious, inconsistent, and usually confined to reference laboratories. The primary objective of this work is to develop a technique that enables generation of aptamers binding to specific M-types of S. pyogenes. We describe here an in vitro technique that directly used live bacterial cells and the Systematic Evolution of Ligands by Exponential Enrichment (SELEX) strategy. Live S. pyogenes cells were incubated with DNA libraries consisting of 40-nucleotides randomized sequences. Those sequences that bound to the cells were separated, amplified using polymerase chain reaction (PCR), purified using gel electrophoresis, and served as the input DNA pool for the next round of SELEX selection. A specially designed forward primer containing extended polyA20/5Sp9 facilitated gel electrophoresis purification of ssDNA after PCR amplification. A counter-selection step using non-target cells was introduced to improve selectivity. DNA libraries of different starting sequence diversity (10(16) and 10(14)) were compared. Aptamer pools from each round of selection were tested for their binding to the target and non-target cells using flow cytometry. Selected aptamer pools were then cloned and sequenced. Individual aptamer sequences were screened on the basis of their binding to the 10 M-types that were used as targets. Aptamer pools obtained from SELEX rounds 5-8 showed high affinity to the target S. pyogenes cells. Tests against non-target Streptococcus bovis, Streptococcus pneumoniae, and Enterococcus species demonstrated selectivity of these aptamers for binding to S. pyogenes. Several aptamer sequences were found to bind preferentially to the M11 M-type of S. pyogenes. Estimated binding dissociation constants (Kd) were in the low nanomolar range for the M11 specific sequences; for example, sequence E-CA20 had a Kd of 7±1 nM. These affinities are comparable to those of a monoclonal antibody. The improved bacterial cell-SELEX technique is successful in generating aptamers selective for S. pyogenes and some of its M-types. These aptamers are potentially useful for detecting S. pyogenes, achieving binding profiles of the various M-types, and developing new M-typing technologies for non-specialized laboratories or point-of-care testing. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Rapid Bacterial Whole-Genome Sequencing to Enhance Diagnostic and Public Health Microbiology

    PubMed Central

    Reuter, Sandra; Ellington, Matthew J.; Cartwright, Edward J. P.; Köser, Claudio U.; Török, M. Estée; Gouliouris, Theodore; Harris, Simon R.; Brown, Nicholas M.; Holden, Matthew T. G.; Quail, Mike; Parkhill, Julian; Smith, Geoffrey P.; Bentley, Stephen D.; Peacock, Sharon J.

    2014-01-01

    IMPORTANCE The latest generation of benchtop DNA sequencing platforms can provide an accurate whole-genome sequence (WGS) for a broad range of bacteria in less than a day. These could be used to more effectively contain the spread of multidrug-resistant pathogens. OBJECTIVE To compare WGS with standard clinical microbiology practice for the investigation of nosocomial outbreaks caused by multidrug-resistant bacteria, the identification of genetic determinants of antimicrobial resistance, and typing of other clinically important pathogens. DESIGN, SETTING, AND PARTICIPANTS A laboratory-based study of hospital inpatients with a range of bacterial infections at Cambridge University Hospitals NHS Foundation Trust, a secondary and tertiary referral center in England, comparing WGS with standard diagnostic microbiology using stored bacterial isolates and clinical information. MAIN OUTCOMES AND MEASURES Specimens were taken and processed as part of routine clinical care, and cultured isolates stored and referred for additional reference laboratory testing as necessary. Isolates underwent DNA extraction and library preparation prior to sequencing on the Illumina MiSeq platform. Bioinformatic analyses were performed by persons blinded to the clinical, epidemiologic, and antimicrobial susceptibility data. RESULTS We investigated 2 putative nosocomial outbreaks, one caused by vancomycin-resistant Enterococcus faecium and the other by carbapenem-resistant Enterobacter cloacae; WGS accurately discriminated between outbreak and nonoutbreak isolates and was superior to conventional typing methods. We compared WGS with standard methods for the identification of the mechanism of carbapenem resistance in a range of gram-negative bacteria (Acinetobacter baumannii, E cloacae, Escherichia coli, and Klebsiella pneumoniae). This demonstrated concordance between phenotypic and genotypic results, and the ability to determine whether resistance was attributable to the presence of carbapenemases or other resistance mechanisms. Whole-genome sequencing was used to recapitulate reference laboratory typing of clinical isolates of Neisseria meningitidis and to provide extended phylogenetic analyses of these. CONCLUSIONS AND RELEVANCE The speed, accuracy, and depth of information provided by WGS platforms to confirm or refute outbreaks in hospitals and the community, and to accurately define transmission of multidrug-resistant and other organisms, represents an important advance. PMID:23857503

  13. Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq.

    PubMed

    Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G

    2014-11-29

    Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.

  14. Genomic characterization of Indian isolates of egg drop syndrome 1976 virus.

    PubMed

    Raj, G D; Sivakumar, S; Sudharsan, S; Mohan, A C; Nachimuthu, K

    2001-02-01

    Five Indian isolates of egg drop syndrome (EDS) 1976 virus and the reference strain 127 were compared by restriction enzyme analysis of viral DNA, and the hexon gene amplified by polymerase chain reaction. Using these techniques, no differences were seen among these viruses. However, partial sequencing of the hexon gene revealed major differences (4.6%) in one of the isolates sequenced, EDS Kerala. Phylogenetic analysis also placed this isolate in a different lineage compared with the other isolates. The need for constant monitoring of the genetic nature of the field isolates of EDS viruses is emphasized.

  15. Chromosome-encoded narrow-spectrum Ambler class A beta-lactamase GIL-1 from Citrobacter gillenii.

    PubMed

    Naas, Thierry; Aubert, Daniel; Ozcan, Ayla; Nordmann, Patrice

    2007-04-01

    A novel beta-lactamase gene was cloned from the whole-cell DNA of an enterobacterial Citrobacter gillenii reference strain that displayed a weak narrow-spectrum beta-lactam-resistant phenotype and was expressed in Escherichia coli. It encoded a clavulanic acid-inhibited Ambler class A beta-lactamase, GIL-1, with a pI value of 7.5 and a molecular mass of ca. 29 kDa. GIL-1 had the highest percent amino acid sequence identity with TEM-1 and SHV-1, 77%, and 67%, respectively, and only 46%, 31%, and 32% amino acid sequence identity with CKO-1 (C. koseri), CdiA1 (C. diversus), and SED-1 (C. sedlaki), respectively. The substrate profile of the purified GIL-1 was similar to that of beta-lactamases TEM-1 and SHV-1. The blaGIL-1 gene was chromosomally located, as revealed by I-CeuI experiments, and was constitutively expressed at a low level in C. gillenii. No gene homologous to the regulatory ampR genes of chromosomal class C beta-lactamases was found upstream of the blaGIL-1 gene, which fits the noninducibility of beta-lactamase expression in C. gillenii. Rapid amplification of DNA 5' ends analysis of the promoter region revealed putative promoter sequences that diverge from what has been identified as the consensus sequence in E. coli. The blaGIL-1 gene was part of a 5.5-kb DNA fragment bracketed by a 9-bp duplication and inserted between the d-lactate dehydrogenase gene and the ydbH genes; this DNA fragment was absent in other Citrobacter species. This work further illustrates the heterogeneity of beta-lactamases in Citrobacter spp., which may indicate that the variability of Citrobacter species is greater than expected.

  16. Verrucosispora sonchi sp. nov., a novel endophytic actinobacterium isolated from the leaves of common sowthistle (Sonchus oleraceus L.).

    PubMed

    Ma, Zhaoxu; Zhao, Shanshan; Cao, Tingting; Liu, Chongxi; Huang, Ying; Gao, Yuhang; Yan, Kai; Xiang, Wensheng; Wang, Xiangjing

    2016-12-01

    A novel actinobacterium, designated strain NEAU-QY3T, was isolated from the leaves of Sonchus oleraceus L. and examined using a polyphasic taxonomic approach. The organism formed single spores with smooth surface on substrate mycelia. Phylogenetic analysis based on the 16S rRNA gene sequence indicated that the strain had a close association with the genus Verrucosispora and shared the highest sequence similarity with Verrucosispora qiuiae RtIII47T (99.17 %), an association that was supported by a bootstrap value of 94 % in the neighbour-joining tree and also recovered with the maximum-likelihood algorithm. The strain also showed high 16S rRNA gene sequence similarities to Xiangella phaseoli NEAU-J5T (98.78 %), Jishengella endophytica 202201T (98.51 %), Micromonospora eburnea LK2-10T (98.28 %), Verrucosispora lutea YIM 013T (98.23 %) and Salinispora pacifica CNR-114T (98.23 %). Furthermore, phylogenetic analysis based on the gyrB gene sequences supported the conclusion that strain NEAU-QY3T should be assigned to the genus Verrucosispora. However, the DNA-DNA hybridization relatedness values between strain NEAU-QY3T and V. qiuiae RtIII47T and V. lutea YIM 013T were below 70 %. With reference to phenotypic characteristics, phylogenetic data and DNA-DNA hybridization results, strain NEAU-QY3T was readily distinguished from its most closely related strains and classified as a new species, for which the name Verrucosispora sonchi sp. nov. is proposed. The type strain is NEAU-QY3T (=CGMCC 4.7312T=DSM 101530T).

  17. Structural and thermodynamic insight into E. coli UvrABC mediated incision of cluster di-acetylaminofluorene adducts on the NarI sequence

    PubMed Central

    Jain, Vipin; Hilton, Benjamin; Lin, Bin; Jain, Anshu; MacKerell, Alexander D.; Zou, Yue; Cho, Bongsup P.

    2014-01-01

    Cluster DNA damage refers to two or more lesions in a single turn of the DNA helix. Such clustering may occur with bulky DNA lesions, which may be responsible for their sequence dependent repair and mutational outcomes. Here we prepared three 16-mer cluster duplexes in which two fluoroacetylaminofluorene adducts (dG-FAAF) are separated by none, one and two nucleotides in the E. coli NarI mutational hot spot (5'-CTCTCG1G2CG3CCATCAC-3'): i.e. 5'-- CG1*G2*CG3CC--3', 5'--CG1G2*CG3*CC--3', and 5'--CG1*G2CG3*CC--3' [G*=dG-FAAF], respectively. We conducted spectroscopic, thermodynamic, and molecular dynamics studies of these di-FAAF duplexes and the results were compared with those of the corresponding mono- FAAF adducts in the same NarI sequence (Nucleic Acids Res. 2012, 3939–3951). Our nucleotide excision repair results showed greater reparability of the di-adducts in comparison to the corresponding mono-adducts. Moreover, we observed dramatic flanking base sequence effects on their repair efficiency in the order of NarI-G2G3 > -G1G3 > -G1G2. The NMR/CD/UV-melting and MD-simulation results revealed that in contrast to the mono-adducts, di-adducts produced synergistic effect on duplex destabilization. In addition, dG-FAAF at G2G3 and G1G3 destack the neighboring bases with greater destabilization occurring with the former. Overall, the results indicate the importance of base stacking and related thermal/thermodynamic destabilization in the repair of bulky cluster arylamine DNA adducts. PMID:23841451

  18. Optically Mapping Multiple Bacterial Genomes Simultaneously in a Single Run

    DTIC Science & Technology

    2011-11-21

    sequence orientation. We have demonstrated mapping of Shigella dysenteriae and Escherichia coli simultaneously, despite their very close phylogenetic...relationship ( Shigella and Escherichia coli are generally considered to be within a single species, but are segregated at the genus level for historical...reasons [4]); two clones of Shigella would likely not map together successfully using the mixed DNA method. Similarly, based on reference maps being

  19. Methylation patterns of repetitive DNA sequences in germ cells of Mus musculus.

    PubMed

    Sanford, J; Forrester, L; Chapman, V; Chandley, A; Hastie, N

    1984-03-26

    The major and the minor satellite sequences of Mus musculus were undermethylated in both sperm and oocyte DNAs relative to the amount of undermethylation observed in adult somatic tissue DNA. This hypomethylation was specific for satellite sequences in sperm DNA. Dispersed repetitive and low copy sequences show a high degree of methylation in sperm DNA; however, a dispersed repetitive sequence was undermethylated in oocyte DNA. This finding suggests a difference in the amount of total genomic DNA methylation between sperm and oocyte DNA. The methylation levels of the minor satellite sequences did not change during spermiogenesis, and were not associated with the onset of meiosis or a specific stage in sperm development.

  20. Process of labeling specific chromosomes using recombinant repetitive DNA

    DOEpatents

    Moyzis, R.K.; Meyne, J.

    1988-02-12

    Chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members and consensus sequences of the repetitive DNA families for the chromosome preferential sequences. The selected low homology regions are then hybridized with chromosomes to determine those low homology regions hybridized with a specific chromosome under normal stringency conditions.

  1. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    PubMed

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As many structures of protein-DNA complexes have been known in the past years, several computational methods have been developed to predict DNA-binding sites in proteins. However, its inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure 66.3% and a MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and a MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and a MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  2. Systematic study of the genus Acetobacter with descriptions of Acetobacter indonesiensis sp. nov., Acetobacter tropicalis sp. nov., Acetobacter orleanensis (Henneberg 1906) comb. nov., Acetobacter lovaniensis (Frateur 1950) comb. nov., and Acetobacter estunensis (Carr 1958) comb. nov.

    PubMed

    Lisdiyanti, Puspita; Kawasaki, Hiroko; Seki, Tatsuji; Yamada, Yuzo; Uchimura, Tai; Komagata, Kazuo

    2000-06-01

    Thirty-one Acetobacter strains obtained from culture collections and 45 Acetobacter strains isolated from Indonesian sources were investigated for their phenotypic characteristics, ubiquinone systems, DNA base compositions, and levels of DNA-DNA relatedness. Of 31 reference strains, six showed the presence of ubiquinone 10 (Q-10). These strains were eliminated from the genus Acetobacter. The other 25 reference strains and 45 Indonesian isolates were subjected to a systematic study and separated into 8 distinct groups on the basis of DNA-DNA relatedness. The known species, Acetobacter aceti, A. pasteurianus, and A. peroxydans are retained for three of these groups. New combinations, A. orleanensis (Henneberg 1906) comb. nov., A. lovaniensis (Frateur 1950) comb. nov., and A. estunensis (Carr 1958) comb. nov. are proposed for three other groups. Two new species, A. indonesiensis sp. nov. and A. tropicalis sp. nov. are proposed for the remaining two. No Indonesian isolates were identified as A. aceti, A. estunensis, and A. peroxydans. Phylogenetic analysis on the basis of 16S rDNA sequences was carried out for representative strains from each of the groups. This supported that the eight species belonged to the genus Acetobacter. Several strains previously assigned to the species of A. aceti and A. pasteurianus were scattered over the different species. It is evident that the value of DNA-DNA relatedness between strains comprising a new species should be determined for the establishment of the species. Thus current bacterial species without data of DNA-DNA relatedness should be reexamined for the stability of bacterial nomenclature.

  3. Identification of cephalopod species from the North and Baltic Seas using morphology, COI and 18S rDNA sequences

    NASA Astrophysics Data System (ADS)

    Gebhardt, Katharina; Knebelsberger, Thomas

    2015-09-01

    We morphologically analyzed 79 cephalopod specimens from the North and Baltic Seas belonging to 13 separate species. Another 29 specimens showed morphological features of either Alloteuthis mediaor Alloteuthis subulata or were found to be in between. Reliable identification features to distinguish between A. media and A. subulata are currently not available. The analysis of the DNA barcoding region of the COI gene revealed intraspecific distances (uncorrected p) ranging from 0 to 2.13 % (average 0.1 %) and interspecific distances between 3.31 and 22 % (average 15.52 %). All species formed monophyletic clusters in a neighbor-joining analysis and were supported by bootstrap values of ≥99 %. All COI haplotypes belonging to the 29 Alloteuthis specimens were grouped in one cluster. Neither COI nor 18S rDNA sequences helped to distinguish between the different Alloteuthis morphotypes. For species identification purposes, we recommend the use of COI, as it showed higher bootstrap support of species clusters and less amplification and sequencing failure compared to 18S. Our data strongly support the assumption that the genus Alloteuthis is only represented by a single species, at least in the North Sea. It remained unclear whether this species is A. subulata or A. media. All COI sequences including important metadata were uploaded to the Barcode of Life Data Systems and can be used as reference library for the molecular identification of more than 50 % of the cephalopod fauna known from the North and Baltic Seas.

  4. Enlightenment of Yeast Mitochondrial Homoplasmy: Diversified Roles of Gene Conversion

    PubMed Central

    Ling, Feng; Mikawa, Tsutomu; Shibata, Takehiko

    2011-01-01

    Mitochondria have their own genomic DNA. Unlike the nuclear genome, each cell contains hundreds to thousands of copies of mitochondrial DNA (mtDNA). The copies of mtDNA tend to have heterogeneous sequences, due to the high frequency of mutagenesis, but are quickly homogenized within a cell (“homoplasmy”) during vegetative cell growth or through a few sexual generations. Heteroplasmy is strongly associated with mitochondrial diseases, diabetes and aging. Recent studies revealed that the yeast cell has the machinery to homogenize mtDNA, using a common DNA processing pathway with gene conversion; i.e., both genetic events are initiated by a double-stranded break, which is processed into 3′ single-stranded tails. One of the tails is base-paired with the complementary sequence of the recipient double-stranded DNA to form a D-loop (homologous pairing), in which repair DNA synthesis is initiated to restore the sequence lost by the breakage. Gene conversion generates sequence diversity, depending on the divergence between the donor and recipient sequences, especially when it occurs among a number of copies of a DNA sequence family with some sequence variations, such as in immunoglobulin diversification in chicken. MtDNA can be regarded as a sequence family, in which the members tend to be diversified by a high frequency of spontaneous mutagenesis. Thus, it would be interesting to determine why and how double-stranded breakage and D-loop formation induce sequence homogenization in mitochondria and sequence diversification in nuclear DNA. We will review the mechanisms and roles of mtDNA homoplasmy, in contrast to nuclear gene conversion, which diversifies gene and genome sequences, to provide clues toward understanding how the common DNA processing pathway results in such divergent outcomes. PMID:24710143

  5. "First generation" automated DNA sequencing technology.

    PubMed

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  6. Influence of DNA sequence on the structure of minicircles under torsional stress

    PubMed Central

    Wang, Qian; Irobalieva, Rossitza N.; Chiu, Wah; Schmid, Michael F.; Fogg, Jonathan M.; Zechiedrich, Lynn

    2017-01-01

    Abstract The sequence dependence of the conformational distribution of DNA under various levels of torsional stress is an important unsolved problem. Combining theory and coarse-grained simulations shows that the DNA sequence and a structural correlation due to topology constraints of a circle are the main factors that dictate the 3D structure of a 336 bp DNA minicircle under torsional stress. We found that DNA minicircle topoisomers can have multiple bend locations under high torsional stress and that the positions of these sharp bends are determined by the sequence, and by a positive mechanical correlation along the sequence. We showed that simulations and theory are able to provide sequence-specific information about individual DNA minicircles observed by cryo-electron tomography (cryo-ET). We provided a sequence-specific cryo-ET tomogram fitting of DNA minicircles, registering the sequence within the geometric features. Our results indicate that the conformational distribution of minicircles under torsional stress can be designed, which has important implications for using minicircle DNA for gene therapy. PMID:28609782

  7. Analysis of DNA Sequences by an Optical Time-Integrating Correlator: Proof-of-Concept Experiments.

    DTIC Science & Technology

    1992-05-01

    DNA ANALYSIS STRATEGY 4 2.1 Representation of DNA Bases 4 2.2 DNA Analysis Strategy 6 3.0 CUSTOM GENERATORS FOR DNA SEQUENCES 10 3.1 Hardware Design 10...of the DNA bases where each base is represented by a 7-bits long pseudorandom sequence. 5 Figure 4: Coarse analysis of a DNA sequence. 7 Figure 5: Fine...a 20-bases long database. 32 xiii LIST OF TABLES PAGE Table 1: Short representations of the DNA bases where each base is represented by 7-bits long

  8. Comparative analysis of Edwardsiella isolates from fish in the eastern United States identifies two distinct genetic taxa amongst organisms phenotypically classified as E. tarda

    USGS Publications Warehouse

    Griffin, Matt J.; Quiniou, Sylvie M.; Cody, Theresa; Tabuchi, Maki; Ware, Cynthia; Cipriano, Rocco C.; Mauel, Michael J.; Soto, Esteban

    2013-01-01

    Edwardsiella tarda, a Gram-negative member of the family Enterobacteriaceae, has been implicated in significant losses in aquaculture facilities worldwide. Here, we assessed the intra-specific variability of E. tarda isolates from 4 different fish species in the eastern United States. Repetitive sequence mediated PCR (rep-PCR) using 4 different primer sets (ERIC I & II, ERIC II, BOX, and GTG5) and multi-locus sequence analysis of 16S SSU rDNA, groEl, gyrA, gyrB, pho, pgi, pgm, and rpoA gene fragments identified two distinct genotypes of E. tarda (DNA group I; DNA group II). Isolates that fell into DNA group II demonstrated more similarity to E. ictaluri than DNA group I, which contained the reference E. tarda strain (ATCC #15947). Conventional PCR analysis using published E. tarda-specific primer sets yielded variable results, with several primer sets producing no observable amplification of target DNA from some isolates. Fluorometric determination of G + C content demonstrated 56.4% G + C content for DNA group I, 60.2% for DNA group II, and 58.4% for E. ictaluri. Surprisingly, these isolates were indistinguishable using conventional biochemical techniques, with all isolates demonstrating phenotypic characteristics consistent with E. tarda. Analysis using two commercial test kits identified multiple phenotypes, although no single metabolic characteristic could reliably discriminate between genetic groups. Additionally, anti-microbial susceptibility and fatty acid profiles did not demonstrate remarkable differences between groups. The significant genetic variation (<90% similarity at gyrA, gyrB, pho, phi and pgm; <40% similarity by rep-PCR) between these groups suggests organisms from DNA group II may represent an unrecognized, genetically distinct taxa of Edwardsiella that is phenotypically indistinguishable from E. tarda.

  9. Laser mass spectrometry for DNA sequencing, disease diagnosis, and fingerprinting

    NASA Astrophysics Data System (ADS)

    Chen, C. H. Winston; Taranenko, N. I.; Zhu, Y. F.; Chung, C. N.; Allman, S. L.

    1997-05-01

    Since laser mass spectrometry has the potential for achieving very fast DNA analysis, we recently applied it to DNA sequencing, DNA typing for fingerprinting, and DNA screening for disease diagnosis. Two different approaches for sequencing DNA have been successfully demonstrated. One is to sequence DNA with DNA ladders produced from Sanger's enzymatic method. The other is to do direct sequencing without DNA ladders. The need for quick DNA typing for identification purposes is critical for forensic application. Our preliminary results indicate laser mass spectrometry can possible be used for rapid DNA fingerprinting applications at a much lower cost than gel electrophoresis. Population screening for certain genetic disease can be a very efficient step to reducing medical costs through prevention. Since laser mass spectrometry can provide very fast DNA analysis, we applied laser mass spectrometry to disease diagnosis. Clinical samples with both base deletion and point mutation have been tested with complete success.

  10. Analysis of Natural and Induced Variation in Tomato Glandular Trichome Flavonoids Identifies a Gene Not Present in the Reference Genome[W][OPEN

    PubMed Central

    Kim, Jeongwoon; Matsuba, Yuki; Ning, Jing; Schilmiller, Anthony L.; Hammar, Dagan; Jones, A. Daniel; Pichersky, Eran; Last, Robert L.

    2014-01-01

    Flavonoids are ubiquitous plant aromatic specialized metabolites found in a variety of cell types and organs. Methylated flavonoids are detected in secreting glandular trichomes of various Solanum species, including the cultivated tomato (Solanum lycopersicum). Inspection of the sequenced S. lycopersicum Heinz 1706 reference genome revealed a close homolog of Solanum habrochaites MOMT1 3′/5′ myricetin O-methyltransferase gene, but this gene (Solyc06g083450) is missing the first exon, raising the question of whether cultivated tomato has a distinct 3′ or 3′/5′ O-methyltransferase. A combination of mining genome and cDNA sequences from wild tomato species and S. lycopersicum cultivar M82 led to the identification of Sl-MOMT4 as a 3′ O-methyltransferase. In parallel, three independent ethyl methanesulfonate mutants in the S. lycopersicum cultivar M82 background were identified as having reduced amounts of di- and trimethylated myricetins and increased monomethylated myricetin. Consistent with the hypothesis that Sl-MOMT4 is a 3′ O-methyltransferase gene, all three myricetin methylation defective mutants were found to have defects in MOMT4 sequence, transcript accumulation, or 3′-O-methyltransferase enzyme activity. Surprisingly, no MOMT4 sequence is found in the Heinz 1706 reference genome sequence, and this cultivar accumulates 3-methyl myricetin and is deficient in 3′-methyl myricetins, demonstrating variation in this gene among cultivated tomato varieties. PMID:25128240

  11. My-Forensic-Loci-queries (MyFLq) framework for analysis of forensic STR data generated by massive parallel sequencing.

    PubMed

    Van Neste, Christophe; Vandewoestyne, Mado; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

    2014-03-01

    Forensic scientists are currently investigating how to transition from capillary electrophoresis (CE) to massive parallel sequencing (MPS) for analysis of forensic DNA profiles. MPS offers several advantages over CE such as virtually unlimited multiplexy of loci, combining both short tandem repeat (STR) and single nucleotide polymorphism (SNP) loci, small amplicons without constraints of size separation, more discrimination power, deep mixture resolution and sample multiplexing. We present our bioinformatic framework My-Forensic-Loci-queries (MyFLq) for analysis of MPS forensic data. For allele calling, the framework uses a MySQL reference allele database with automatically determined regions of interest (ROIs) by a generic maximal flanking algorithm which makes it possible to use any STR or SNP forensic locus. Python scripts were designed to automatically make allele calls starting from raw MPS data. We also present a method to assess the usefulness and overall performance of a forensic locus with respect to MPS, as well as methods to estimate whether an unknown allele, which sequence is not present in the MySQL database, is in fact a new allele or a sequencing error. The MyFLq framework was applied to an Illumina MiSeq dataset of a forensic Illumina amplicon library, generated from multilocus STR polymerase chain reaction (PCR) on both single contributor samples and multiple person DNA mixtures. Although the multilocus PCR was not yet optimized for MPS in terms of amplicon length or locus selection, the results show excellent results for most loci. The results show a high signal-to-noise ratio, correct allele calls, and a low limit of detection for minor DNA contributors in mixed DNA samples. Technically, forensic MPS affords great promise for routine implementation in forensic genomics. The method is also applicable to adjacent disciplines such as molecular autopsy in legal medicine and in mitochondrial DNA research. Copyright © 2013 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.

  12. Micronuclear DNA of Oxytricha nova contains sequences with autonomously replicating activity in Saccharomyces cerevisiae.

    PubMed Central

    Colombo, M M; Swanton, M T; Donini, P; Prescott, D M

    1984-01-01

    Oxytricha nova is a hypotrichous ciliate with micronuclei and macronuclei. Micronuclei, which contain large, chromosomal-sized DNA, are genetically inert but undergo meiosis and exchange during cell mating. Macronuclei, which contain only small, gene-sized DNA molecules, provide all of the nuclear RNA needed to run the cell. After cell mating the macronucleus is derived from a micronucleus, a derivation that includes excision of the genes from chromosomes and elimination of the remaining DNA. The eliminated DNA includes all of the repetitious sequences and approximately 95% of the unique sequences. We cloned large restriction fragments from the micronucleus that confer replication ability on a replication-deficient plasmid in Saccharomyces cerevisiae. Sequences that confer replication ability are called autonomously replicating sequences. The frequency and effectiveness of autonomously replicating sequences in micronuclear DNA are similar to those reported for DNAs of other organisms introduced into yeast cells. Of the 12 micronuclear fragments with autonomously replicating sequence activity, 9 also showed homology to macronuclear DNA, indicating that they contain a macronuclear gene sequence. We conclude from this that autonomously replicating sequence activity is nonrandomly distributed throughout micronuclear DNA and is preferentially associated with those regions of micronuclear DNA that contain genes. Images PMID:6092934

  13. DNA sequence-dependent mechanics and protein-assisted bending in repressor-mediated loop formation

    PubMed Central

    Boedicker, James Q.; Garcia, Hernan G.; Johnson, Stephanie; Phillips, Rob

    2014-01-01

    As the chief informational molecule of life, DNA is subject to extensive physical manipulations. The energy required to deform double-helical DNA depends on sequence, and this mechanical code of DNA influences gene regulation, such as through nucleosome positioning. Here we examine the sequence-dependent flexibility of DNA in bacterial transcription factor-mediated looping, a context for which the role of sequence remains poorly understood. Using a suite of synthetic constructs repressed by the Lac repressor and two well-known sequences that show large flexibility differences in vitro, we make precise statistical mechanical predictions as to how DNA sequence influences loop formation and test these predictions using in vivo transcription and in vitro single-molecule assays. Surprisingly, sequence-dependent flexibility does not affect in vivo gene regulation. By theoretically and experimentally quantifying the relative contributions of sequence and the DNA-bending protein HU to DNA mechanical properties, we reveal that bending by HU dominates DNA mechanics and masks intrinsic sequence-dependent flexibility. Such a quantitative understanding of how mechanical regulatory information is encoded in the genome will be a key step towards a predictive understanding of gene regulation at single-base pair resolution. PMID:24231252

  14. Reconstructing a herbivore’s diet using a novel rbcL DNA mini-barcode for plants

    USGS Publications Warehouse

    Erickson, David L.; Reed, Elizabeth; Ramachandran, Padmini; Bourg, Norman; McShea, William J.; Ottesen, Andrea

    2017-01-01

    Next Generation Sequencing and the application of metagenomic analyses can be used to answer questions about animal diet choice and study the consequences of selective foraging by herbivores. The quantification of herbivore diet choice with respect to native versus exotic plant species is particularly relevant given concerns of invasive species establishment and their effects on ecosystems. While increased abundance of white-tailed deer (Odocoileus virginianus) appears to correlate with increased incidence of invasive plant species, data supporting a causal link is scarce. We used a metabarcoding approach (PCR amplicons of the plant rbcL gene) to survey the diet of white-tailed deer (fecal samples), from a forested site in Warren County, Virginia with a comprehensive plant species inventory and corresponding reference collection of plant barcode and chloroplast sequences. We sampled fecal pellet piles and extracted DNA from 12 individual deer in October 2014. These samples were compared to a reference DNA library of plant species collected within the study area. For 72 % of the amplicons, we were able to assign taxonomy at the species level, which provides for the first time—sufficient taxonomic resolution to quantify the relative frequency at which native and exotic plant species are being consumed by white-tailed deer. For each of the 12 individual deer we collected three subsamples from the same fecal sample, resulting in sequencing 36 total samples. Using Qiime, we quantified the plant DNA found in all 36 samples, and found that variance within samples was less than variance between samples (F = 1.73, P = 0.004), indicating additional subsamples may not be necessary. Species level diversity ranged from 60 to 93 OTUs per individual and nearly 70 % of all plant sequences recovered were from native plant species. The number of species detected did reduce significantly (range 4–12) when we excluded species whose OTU composed <1 % of each sample’s total. When compared to the abundance of native and non-natives plants inventoried in the local community, our results support the observation that white-tailed deer have strong foraging preferences, but these preferences were not consistent for species in either class. Deer forage behaviour may favour some exotic species, but not all.

  15. Reconstructing a herbivore's diet using a novel rbcL DNA mini-barcode for plants.

    PubMed

    Erickson, David L; Reed, Elizabeth; Ramachandran, Padmini; Bourg, Norman A; McShea, William J; Ottesen, Andrea

    2017-05-01

    Next Generation Sequencing and the application of metagenomic analyses can be used to answer questions about animal diet choice and study the consequences of selective foraging by herbivores. The quantification of herbivore diet choice with respect to native versus exotic plant species is particularly relevant given concerns of invasive species establishment and their effects on ecosystems. While increased abundance of white-tailed deer ( Odocoileus virginianus ) appears to correlate with increased incidence of invasive plant species, data supporting a causal link is scarce. We used a metabarcoding approach (PCR amplicons of the plant rbc L gene) to survey the diet of white-tailed deer (fecal samples), from a forested site in Warren County, Virginia with a comprehensive plant species inventory and corresponding reference collection of plant barcode and chloroplast sequences. We sampled fecal pellet piles and extracted DNA from 12 individual deer in October 2014. These samples were compared to a reference DNA library of plant species collected within the study area. For 72 % of the amplicons, we were able to assign taxonomy at the species level, which provides for the first time-sufficient taxonomic resolution to quantify the relative frequency at which native and exotic plant species are being consumed by white-tailed deer. For each of the 12 individual deer we collected three subsamples from the same fecal sample, resulting in sequencing 36 total samples. Using Qiime, we quantified the plant DNA found in all 36 samples, and found that variance within samples was less than variance between samples ( F  = 1.73, P  = 0.004), indicating additional subsamples may not be necessary. Species level diversity ranged from 60 to 93 OTUs per individual and nearly 70 % of all plant sequences recovered were from native plant species. The number of species detected did reduce significantly (range 4-12) when we excluded species whose OTU composed <1 % of each sample's total. When compared to the abundance of native and non-natives plants inventoried in the local community, our results support the observation that white-tailed deer have strong foraging preferences, but these preferences were not consistent for species in either class. Deer forage behaviour may favour some exotic species, but not all.

  16. Reconstructing a herbivore’s diet using a novel rbcL DNA mini-barcode for plants

    PubMed Central

    Erickson, David L.; Reed, Elizabeth; Ramachandran, Padmini; Bourg, Norman A.; Ottesen, Andrea

    2017-01-01

    Abstract Next Generation Sequencing and the application of metagenomic analyses can be used to answer questions about animal diet choice and study the consequences of selective foraging by herbivores. The quantification of herbivore diet choice with respect to native versus exotic plant species is particularly relevant given concerns of invasive species establishment and their effects on ecosystems. While increased abundance of white-tailed deer (Odocoileus virginianus) appears to correlate with increased incidence of invasive plant species, data supporting a causal link is scarce. We used a metabarcoding approach (PCR amplicons of the plant rbcL gene) to survey the diet of white-tailed deer (fecal samples), from a forested site in Warren County, Virginia with a comprehensive plant species inventory and corresponding reference collection of plant barcode and chloroplast sequences. We sampled fecal pellet piles and extracted DNA from 12 individual deer in October 2014. These samples were compared to a reference DNA library of plant species collected within the study area. For 72 % of the amplicons, we were able to assign taxonomy at the species level, which provides for the first time—sufficient taxonomic resolution to quantify the relative frequency at which native and exotic plant species are being consumed by white-tailed deer. For each of the 12 individual deer we collected three subsamples from the same fecal sample, resulting in sequencing 36 total samples. Using Qiime, we quantified the plant DNA found in all 36 samples, and found that variance within samples was less than variance between samples (F = 1.73, P = 0.004), indicating additional subsamples may not be necessary. Species level diversity ranged from 60 to 93 OTUs per individual and nearly 70 % of all plant sequences recovered were from native plant species. The number of species detected did reduce significantly (range 4–12) when we excluded species whose OTU composed <1 % of each sample’s total. When compared to the abundance of native and non-natives plants inventoried in the local community, our results support the observation that white-tailed deer have strong foraging preferences, but these preferences were not consistent for species in either class. Deer forage behaviour may favour some exotic species, but not all. PMID:28533898

  17. Divergent nuclear 18S rDNA paralogs in a turkey coccidium, Eimeria meleagrimitis, complicate molecular systematics and identification.

    PubMed

    El-Sherry, Shiem; Ogedengbe, Mosun E; Hafeez, Mian A; Barta, John R

    2013-07-01

    Multiple 18S rDNA sequences were obtained from two single-oocyst-derived lines of each of Eimeria meleagrimitis and Eimeria adenoeides. After analysing the 15 new 18S rDNA sequences from two lines of E. meleagrimitis and 17 new sequences from two lines of E. adenoeides, there were clear indications that divergent, paralogous 18S rDNA copies existed within the nuclear genome of E. meleagrimitis. In contrast, mitochondrial cytochrome c oxidase subunit I (COI) partial sequences from all lines of a particular Eimeria sp. were identical and, in phylogenetic analyses, COI sequences clustered unambiguously in monophyletic and highly-supported clades specific to individual Eimeria sp. Phylogenetic analysis of the new 18S rDNA sequences from E. meleagrimitis showed that they formed two distinct clades: Type A with four new sequences; and Type B with nine new sequences; both Types A and B sequences were obtained from each of the single-oocyst-derived lines of E. meleagrimitis. Together these rDNA types formed a well-supported E. meleagrimitis clade. Types A and B 18S rDNA sequences from E. meleagrimitis had a mean sequence identity of only 97.4% whereas mean sequence identity within types was 99.1-99.3%. The observed intraspecific sequence divergence among E. meleagrimitis 18S rDNA sequence types was even higher (approximately 2.6%) than the interspecific sequence divergence present between some well-recognized species such as Eimeria tenella and Eimeria necatrix (1.1%). Our observations suggest that, unlike COI sequences, 18S rDNA sequences are not reliable molecular markers to be used alone for species identification with coccidia, although 18S rDNA sequences have clear utility for phylogenetic reconstruction of apicomplexan parasites at the genus and higher taxonomic ranks. Copyright © 2013. Published by Elsevier Ltd.

  18. Identification of Rays through DNA Barcoding: An Application for Ecologists

    PubMed Central

    Cerutti-Pereyra, Florencia; Meekan, Mark G.; Wei, Nu-Wei V.; O'Shea, Owen; Bradshaw, Corey J. A.; Austin, Chris M.

    2012-01-01

    DNA barcoding potentially offers scientists who are not expert taxonomists a powerful tool to support the accuracy of field studies involving taxa that are diverse and difficult to identify. The taxonomy of rays has received reasonable attention in Australia, although the fauna in remote locations such as Ningaloo Reef, Western Australia is poorly studied and the identification of some species in the field is problematic. Here, we report an application of DNA-barcoding to the identification of 16 species (from 10 genera) of tropical rays as part of an ecological study. Analysis of the dataset combined across all samples grouped sequences into clearly defined operational taxonomic units, with two conspicuous exceptions: the Neotrygon kuhlii species complex and the Aetobatus species complex. In the field, the group that presented the most difficulties for identification was the spotted whiptail rays, referred to as the ‘uarnak’ complex. Two sets of problems limited the successful application of DNA barcoding: (1) the presence of cryptic species, species complexes with unresolved taxonomic status and intra-specific geographical variation, and (2) insufficient numbers of entries in online databases that have been verified taxonomically, and the presence of lodged sequences in databases with inconsistent names. Nevertheless, we demonstrate the potential of the DNA barcoding approach to confirm field identifications and to highlight species complexes where taxonomic uncertainty might confound ecological data. PMID:22701556

  19. Complementary DNA cloning of the pear 1-aminocyclopropane-1-carboxylic acid oxidase gene and agrobacterium-mediated anti-sense genetic transformation.

    PubMed

    Qi, Jing; Dong, Zhen; Zhang, Yu-Xing

    2015-12-01

    The aim of the present study was to genetically modify plantlets of the Chinese yali pear to reduce their expression of ripening-associated 1-aminocyclopropane-1-carboxylic acid oxidase (ACO) and therefore increase the shelf-life of the fruit. Primers were designed with selectivity for the conserved regions of published ACO gene sequences, and yali complementary DNA (cDNA) cloning was performed by reverse transcription quantitative polymerase chain reaction (PCR). The obtained cDNA fragment contained 831 base pairs, encoding 276 amino acid residues, and shared no less than 94% nucleotide sequence identity with other published ACO genes. The cDNA fragment was inversely inserted into a pBI121 expression vector, between the cauliflower mosaic virus 35S promoter and the nopaline synthase terminator, in order to construct the anti‑sense expression vector of the ACO gene; it was transfected into cultured yali plants using Agrobacterium LBA4404. Four independent transgenic lines of pear plantlets were obtained and validated by PCR analysis. A Southern blot assay revealed that there were three transgenic lines containing a single copy of exogenous gene and one line with double copies. The present study provided germplasm resources for the cultivation of novel storage varieties of pears, therefore providing a reference for further applications of anti‑sense RNA technology in the genetic improvement of pears and other fruit.

  20. Acidovorax anthurii sp. nov., a new phytopathogenic bacterium which causes bacterial leaf-spot of anthurium.

    PubMed

    Gardan, L; Dauga, C; Prior, P; Gillis, M; Saddler, G S

    2000-01-01

    The bacterial leaf-spot of anthurium emerged during the 1980s, in the French West Indies and Trinidad. This new bacterial disease is presently wide spread and constitutes a serious limiting factor for commercial anthurium production. Twenty-nine strains isolated from leaf-spots of naturally infected anthurium were characterized and compared with reference strains belonging to the Comamonadaceae family, the genera Ralstonia and Burkholderia, and representative fluorescent pseudomonads. From artificial inoculations 25 out of 29 strains were pathogenic on anthurium. Biochemical and physiological tests, fatty acid analysis, DNA-DNA hybridization, 16S rRNA gene sequence analysis, DNA-16S RNA hybridization were performed. The 25 pathogenic strains on anthurium were clustered in one phenon closely related to phytopathogenic strains of the genus Acidovorax. Anthurium strains were 79-99% (deltaTm range 0.2-1.6) related to the strain CFBP 3232 and constituted a discrete DNA homology group indicating that they belong to the same species. DNA-rRNA hybridization, 16S rRNA sequence and fatty acid analysis confirmed that this new species belongs to the beta-subclass of Proteobacteria and to rRNA superfamily III, to the family of Comamonadaceae and to the genus Acidovorax. The name Acidovorax anthurii is proposed for this new phytopathogenic bacterium. The type strain has been deposited in the Collection Française des Bactéries Phytopathogènes as CFBP 3232T.

Top