Sample records for comparative dna sequence

  1. CRITICA: coding region identification tool invoking comparative analysis

    NASA Technical Reports Server (NTRS)

    Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

    1999-01-01

    Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
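    The dicodon-bias idea in the abstract above can be illustrated with a minimal sketch. This is not CRITICA's code: the function names, the pseudocount, and the log-odds form are assumptions chosen for illustration, and the iterative re-estimation the abstract mentions is omitted.

```python
from collections import Counter
import math


def hexamer_counts(seq, step):
    """Count hexanucleotides that start every `step` bases along `seq`."""
    seq = seq.upper()
    return Counter(seq[i:i + 6] for i in range(0, len(seq) - 5, step))


def dicodon_log_odds(coding_seqs, background_seqs, pseudocount=1.0):
    """Return a scoring function: log(P(hexamer | in frame) / P(hexamer | background)).

    `coding_seqs` are assumed to start in frame; `background_seqs` are
    scanned at every offset. Both inputs are hypothetical training sets.
    """
    in_frame = Counter()
    for s in coding_seqs:
        in_frame.update(hexamer_counts(s, 3))
    background = Counter()
    for s in background_seqs:
        background.update(hexamer_counts(s, 1))

    n_types = 4 ** 6  # number of possible hexanucleotides
    total_in = sum(in_frame.values()) + pseudocount * n_types
    total_bg = sum(background.values()) + pseudocount * n_types

    def score(hexamer):
        p_in = (in_frame[hexamer] + pseudocount) / total_in
        p_bg = (background[hexamer] + pseudocount) / total_bg
        return math.log(p_in / p_bg)

    return score


def frame_score(seq, score):
    """Sum the log-odds of all in-frame hexamers of a candidate region."""
    return sum(score(h) * c for h, c in hexamer_counts(seq, 3).items())
```

    A positive `frame_score` would suggest that a region's in-frame hexamers look more like the coding training set than the background; any threshold is left to the user.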

  2. Molecular design of sequence specific DNA alkylating agents.

    PubMed

    Minoshima, Masafumi; Bando, Toshikazu; Shinohara, Ken-ichi; Sugiyama, Hiroshi

    2009-01-01

    Sequence-specific DNA alkylating agents are of great interest as a novel approach to cancer chemotherapy. We designed conjugates between pyrrole (Py)-imidazole (Im) polyamides and a DNA-alkylating chlorambucil moiety attached at different positions. Sequence-specific DNA alkylation by the conjugates was investigated using high-resolution denaturing polyacrylamide gel electrophoresis (PAGE). The results showed that polyamide-chlorambucil conjugates alkylate DNA at adenines flanking the recognition sequences of the Py-Im polyamides; however, the reactivities and alkylation sites were influenced by the position of conjugation. In addition, we synthesized a conjugate between a Py-Im polyamide and another alkylating agent, 1-(chloromethyl)-5-hydroxy-1,2-dihydro-3H-benz[e]indole (seco-CBI). The DNA alkylation reactivities of both alkylating polyamides were almost comparable. In contrast, their cytotoxicities against cell lines differed greatly. These comparative studies should promote the development of appropriate sequence-specific DNA alkylating polyamides against specific cancer cells.

  3. An improved model for whole genome phylogenetic analysis by Fourier transform.

    PubMed

    Yin, Changchuan; Yau, Stephen S-T

    2015-10-07

    DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor, and the 2D mapping reduces the nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra have the same dimensionality in the Fourier frequency space, and the Euclidean distances between the full Fourier power spectra of the DNA sequences are used as the dissimilarity metrics. The improved DFT method, with increased computational performance from the 2D numerical representation, is applicable to DNA sequences of any length. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences, including simulated and real datasets. The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied to phylogenetic analysis of individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
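    The pipeline described in the abstract above (2D mapping, DFT power spectrum, scaling to a common length, Euclidean distance) can be sketched roughly as follows. The particular 2D mapping, encoded here as complex numbers, and the linear-interpolation `stretch` function are assumptions standing in for the authors' mapping and even scaling algorithm, which the abstract does not specify. The DFT is a plain O(n^2) loop to keep the sketch dependency-free.

```python
import cmath
import math

# Hypothetical 2D mapping: each nucleotide becomes a point in the plane,
# stored as a complex number (real part = first axis, imaginary = second).
MAPPING = {"A": complex(0, -1), "T": complex(0, 1),
           "C": complex(-1, 0), "G": complex(1, 0)}


def power_spectrum(seq):
    """Return |X[k]|^2 for k = 0..n-1, where X is the DFT of the mapped sequence."""
    x = [MAPPING[b] for b in seq.upper()]
    n = len(x)
    spectrum = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2)
    return spectrum


def stretch(spectrum, m):
    """Extend a spectrum of length n <= m to length m by linear interpolation.

    This is a simple stand-in for the paper's even scaling step.
    """
    n = len(spectrum)
    if n == m:
        return list(spectrum)
    out = []
    for k in range(m):
        q = k * (n - 1) / (m - 1)  # fractional position in the original spectrum
        r = int(q)
        if r >= n - 1:
            out.append(spectrum[-1])
        else:
            out.append(spectrum[r] + (q - r) * (spectrum[r + 1] - spectrum[r]))
    return out


def dft_distance(a, b):
    """Euclidean distance between the length-matched power spectra of two sequences."""
    m = max(len(a), len(b))
    pa = stretch(power_spectrum(a), m)
    pb = stretch(power_spectrum(b), m)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(pa, pb)))
```

    A matrix of such pairwise distances could then be passed to any hierarchical clustering routine to build a tree, which is the use the abstract describes.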

  4. Comparative performance of the BGISEQ-500 vs Illumina HiSeq2500 sequencing platforms for palaeogenomic sequencing

    PubMed Central

    Mak, Sarah Siu Tze; Gopalakrishnan, Shyam; Carøe, Christian; Geng, Chunyu; Liu, Shanlin; Sinding, Mikkel-Holger S; Kuderna, Lukas F K; Zhang, Wenwei; Fu, Shujin; Vieira, Filipe G; Germonpré, Mietje; Bocherens, Hervé; Fedorov, Sergey; Petersen, Bent; Sicheritz-Pontén, Thomas; Marques-Bonet, Tomas; Zhang, Guojie; Jiang, Hui; Gilbert, M Thomas P

    2017-01-01

    Ancient DNA research has been revolutionized following development of next-generation sequencing platforms. Although a number of such platforms have been applied to ancient DNA samples, the Illumina series are the dominant choice today, mainly because of high production capacities and short read production. Recently a potentially attractive alternative platform for palaeogenomic data generation has been developed, the BGISEQ-500, whose sequence output is comparable with that of the Illumina series. In this study, we modified the standard BGISEQ-500 library preparation specifically for use on degraded DNA, then directly compared the sequencing performance and data quality of the BGISEQ-500 to the Illumina HiSeq2500 platform on DNA extracted from 8 historic and ancient dog and wolf samples. The data generated were largely comparable between sequencing platforms, with no statistically significant difference observed for parameters including level (P = 0.371) and average sequence length (P = 0.718) of endogenous nuclear DNA, sequence GC content (P = 0.311), double-stranded DNA damage rate (v. 0.309), and sequence clonality (P = 0.093). Small significant differences were found in single-strand DNA damage rate (δS; slightly lower for the BGISEQ-500, P = 0.011) and the background rate of difference from the reference genome (θ; slightly higher for BGISEQ-500, P = 0.012). This may result from the differences in amplification cycles used to polymerase chain reaction–amplify the libraries. A significant difference was also observed in the mitochondrial DNA percentages recovered (P = 0.018), although we believe this is likely a stochastic effect relating to the extremely low levels of mitochondria that were sequenced from 3 of the samples with overall very low levels of endogenous DNA. Although we acknowledge that our analyses were limited to animal material, our observations suggest that the BGISEQ-500 holds the potential to represent a valid and potentially valuable alternative platform for palaeogenomic data generation that is worthy of future exploration by those interested in the sequencing and analysis of degraded DNA. PMID:28854615

  5. Whole exome sequencing for determination of tumor mutation load in liquid biopsy from advanced cancer patients.

    PubMed

    Koeppel, Florence; Blanchard, Steven; Jovelet, Cécile; Genin, Bérengère; Marcaillou, Charles; Martin, Emmanuel; Rouleau, Etienne; Solary, Eric; Soria, Jean-Charles; André, Fabrice; Lacroix, Ludovic

    2017-01-01

    Tumor mutation load (TML) has been proposed as a biomarker of patient response to immunotherapy in several studies. TML is usually determined by whole exome sequencing (WES) of tumor biopsy DNA (tDNA); therefore, TML evaluation is limited by the availability of informative biopsies. Circulating cell-free DNA (cfDNA) provided by liquid biopsy is a surrogate specimen to biopsy for molecular profiling. Nevertheless, performing WES on DNA from plasma is technically challenging, and the ability to determine tumor mutation load from liquid biopsies remains to be demonstrated. In the current study, WES was performed on cfDNA from 32 metastatic patients with various cancer types enrolled in the MOSCATO 01 (NCT01566019) and/or MATCHR (NCT02517892) molecular triage trials. Results from targeted gene sequencing (TGS) and WES performed on cfDNA were compared to results from tumor tissue biopsy. In cfDNA samples, WES mutation detection sensitivity was 92% compared to TGS. When comparing cfDNA-WES to tDNA-WES, mutation detection sensitivity was 53%, consistent with a previously published prospective study comparing cfDNA-TGS to tDNA-TGS. For samples in which the presence of tumor DNA was confirmed in cfDNA, tumor mutation load from liquid biopsy was correlated with that from tumor biopsy. Taken together, this study demonstrated that liquid biopsy may be applied to determine tumor mutation load. Qualifying a liquid biopsy for interpretation is a crucial step in using cfDNA for mutational load estimation.

  6. Whole exome sequencing for determination of tumor mutation load in liquid biopsy from advanced cancer patients

    PubMed Central

    Blanchard, Steven; Jovelet, Cécile; Genin, Bérengère; Marcaillou, Charles; Martin, Emmanuel; Rouleau, Etienne; Solary, Eric; Soria, Jean-Charles; André, Fabrice; Lacroix, Ludovic

    2017-01-01

    Tumor mutation load (TML) has been proposed as a biomarker of patient response to immunotherapy in several studies. TML is usually determined by whole exome sequencing (WES) of tumor biopsy DNA (tDNA); therefore, TML evaluation is limited by the availability of informative biopsies. Circulating cell-free DNA (cfDNA) provided by liquid biopsy is a surrogate specimen to biopsy for molecular profiling. Nevertheless, performing WES on DNA from plasma is technically challenging, and the ability to determine tumor mutation load from liquid biopsies remains to be demonstrated. In the current study, WES was performed on cfDNA from 32 metastatic patients with various cancer types enrolled in the MOSCATO 01 (NCT01566019) and/or MATCHR (NCT02517892) molecular triage trials. Results from targeted gene sequencing (TGS) and WES performed on cfDNA were compared to results from tumor tissue biopsy. In cfDNA samples, WES mutation detection sensitivity was 92% compared to TGS. When comparing cfDNA-WES to tDNA-WES, mutation detection sensitivity was 53%, consistent with a previously published prospective study comparing cfDNA-TGS to tDNA-TGS. For samples in which the presence of tumor DNA was confirmed in cfDNA, tumor mutation load from liquid biopsy was correlated with that from tumor biopsy. Taken together, this study demonstrated that liquid biopsy may be applied to determine tumor mutation load. Qualifying a liquid biopsy for interpretation is a crucial step in using cfDNA for mutational load estimation. PMID:29161279

  7. Entire plastid phylogeny of the carrot genus (Daucus, Apiaceae): Concordance with nuclear data and mitochondrial and nuclear DNA insertions to the plastid

    USDA-ARS's Scientific Manuscript database

    We explored the phylogenetic utility of entire plastid DNA sequences in Daucus and compared the results to prior phylogenetic results using plastid, nuclear, and mitochondrial DNA sequences. We obtained, using Illumina sequencing, full plastid sequences of 37 accessions of 20 Daucus taxa and outgrou...

  8. Sequencing of cDNA Clones from the Genetic Map of Tomato (Lycopersicon esculentum)

    PubMed Central

    Ganal, Martin W.; Czihal, Rosemarie; Hannappel, Ulrich; Kloos, Dorothee-U.; Polley, Andreas; Ling, Hong-Qing

    1998-01-01

    The dense RFLP linkage map of tomato (Lycopersicon esculentum) contains >300 anonymous cDNA clones. Of those clones, 272 were partially or completely sequenced. The sequences were compared at the DNA and protein level to known genes in databases. For 57% of the clones, a significant match to previously described genes was found. The information will permit the conversion of those markers to STS markers and allow their use in PCR-based mapping experiments. Furthermore, it will facilitate the comparative mapping of genes across distantly related plant species by direct comparison of DNA sequences and map positions. [cDNA sequence data reported in this paper have been submitted to the EMBL database under accession nos. AA824695–AA825005 and the dbEST_Id database under accession nos. 1546519–1546862.] PMID:9724330

  9. Large-Scale Concatenation cDNA Sequencing

    PubMed Central

    Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

    1997-01-01

    A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174

  10. Representation of DNA sequences in genetic codon context with applications in exon and intron prediction.

    PubMed

    Yin, Changchuan

    2015-04-01

    To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC), in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal-to-noise ratio (SNR). The optimized GCC representation is then applied to exon and intron prediction by the Short-Time Fourier Transform (STFT) approach. The results show that the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein-coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
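    The 3-periodicity SNR that the abstract above optimizes can be illustrated on the baseline 4D binary representation it mentions (one 0/1 indicator sequence per nucleotide). The sketch below is not the GCC method. It assumes one common definition of the SNR: total power at frequency N/3 divided by the mean total power over the non-zero frequencies. The function names are hypothetical.

```python
import cmath
import math


def indicator_power(seq, k):
    """Total DFT power at frequency index k, summed over the four
    binary indicator sequences (one per nucleotide)."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        s = sum(cmath.exp(-2j * math.pi * k * t / n)
                for t, b in enumerate(seq) if b == base)
        total += abs(s) ** 2
    return total


def periodicity3_snr(seq):
    """Power at N/3 divided by the mean power over k = 1..N-1.

    The sequence is trimmed so that its length N is a multiple of 3.
    """
    seq = seq.upper()
    n = len(seq) - len(seq) % 3
    seq = seq[:n]
    if n < 3:
        raise ValueError("sequence must contain at least 3 bases")
    peak = indicator_power(seq, n // 3)
    mean_power = sum(indicator_power(seq, k) for k in range(1, n)) / (n - 1)
    return peak / mean_power
```

    Applying `periodicity3_snr` to successive windows of a genomic sequence gives a crude windowed variant of the approach; a proper STFT would add a window function and overlapping frames.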

  11. Improved multiple displacement amplification (iMDA) and ultraclean reagents.

    PubMed

    Motley, S Timothy; Picuri, John M; Crowder, Chris D; Minich, Jeremiah J; Hofstadler, Steven A; Eshoo, Mark W

    2014-06-06

    Next-generation sequencing sample preparation requires nanogram to microgram quantities of DNA; however, many relevant samples are comprised of only a few cells. Genomic analysis of these samples requires a whole genome amplification method that is unbiased and free of exogenous DNA contamination. To address these challenges we have developed protocols for the production of DNA-free consumables including reagents and have improved upon multiple displacement amplification (iMDA). A specialized ethylene oxide treatment was developed that renders free DNA and DNA present within Gram positive bacterial cells undetectable by qPCR. To reduce DNA contamination in amplification reagents, a combination of ion exchange chromatography, filtration, and lot testing protocols was developed. Our multiple displacement amplification protocol employs a second strand-displacing DNA polymerase, improved buffers, improved reaction conditions and DNA-free reagents. The iMDA protocol, when used in combination with DNA-free laboratory consumables and reagents, significantly improved efficiency and accuracy of amplification and sequencing of specimens with moderate to low levels of DNA. The sensitivity and specificity of sequencing of amplified DNA prepared using iMDA were compared to those of DNA obtained with two commercial whole genome amplification kits using 10 fg (~1-2 bacterial cells worth) of bacterial genomic DNA as a template. Analysis showed >99% of the iMDA reads mapped to the template organism whereas only 0.02% of the reads from the commercial kits mapped to the template. To assess the ability of iMDA to achieve balanced genomic coverage, a non-stochastic amount of bacterial genomic DNA (1 pg) was amplified and sequenced, and data obtained were compared to sequencing data obtained directly from genomic DNA. The iMDA DNA and genomic DNA sequencing had comparable coverage: 99.98% of the reference genome at ≥1X coverage and 99.9% at ≥5X coverage, while maintaining both balance and representation of the genome. The iMDA protocol, in combination with DNA-free laboratory consumables, significantly improved the ability to sequence specimens with low levels of DNA. iMDA has broad utility in metagenomics, diagnostics, ancient DNA analysis, pre-implantation embryo screening, single-cell genomics, whole genome sequencing of unculturable organisms, and forensic applications for both human and microbial targets.

  12. First Complete Squash leaf curl China virus Genomic Segment DNA-A Sequence from East Timor

    PubMed Central

    Maina, Solomon; Edwards, Owain R.; de Almeida, Luis; Ximenes, Abel

    2017-01-01

    We present here the first complete Squash leaf curl China virus (SLCCV) genomic segment DNA-A sequence from East Timor. It was isolated from a pumpkin plant. When compared with 15 complete SLCCV DNA-A genome sequences from other world regions, it most resembled the Malaysian isolate MC1 sequence. PMID:28619789

  13. Ubiquitous and gene-specific regulatory 5' sequences in a sea urchin histone DNA clone coding for histone protein variants.

    PubMed Central

    Busslinger, M; Portmann, R; Irminger, J C; Birnstiel, M L

    1980-01-01

    The DNA sequences of the entire structural H4, H3, H2A and H2B genes and of their 5' flanking regions have been determined in the histone DNA clone h19 of the sea urchin Psammechinus miliaris. In clone h19 the polarity of transcription and the relative arrangement of the histone genes are identical to those in clone h22 of the same species. The histone proteins encoded by h19 DNA differ in their primary structure from those encoded by clone h22 and have been compared to histone protein sequences of other sea urchin species as well as other eukaryotes. A comparative analysis of the 5' flanking DNA sequences of the structural histone genes in both clones revealed four ubiquitous sequence motifs: a pentameric element GATCC, followed at short distance by the Hogness box GTATAAATAG, a conserved sequence PyCATTCPu, in or near which the 5' ends of the mRNAs map in h22 DNA, and lastly a sequence A, containing the initiation codon. These sequences are also found, sometimes in modified version, in front of other eukaryotic genes transcribed by polymerase II. When prelude sequences of isocoding histone genes in clones h19 and h22 are compared, areas of homology are seen to extend beyond the ubiquitous sequence motifs towards the divergent AT-rich spacer and terminate between approximately 140 and 240 nucleotides away from the structural gene. These prelude regions contain quite large conservative sequence blocks which are specific for each type of histone gene. PMID:7443547

  14. Utility of 16S rDNA Sequencing for Identification of Rare Pathogenic Bacteria.

    PubMed

    Loong, Shih Keng; Khor, Chee Sieng; Jafar, Faizatul Lela; AbuBakar, Sazaly

    2016-11-01

    Phenotypic identification systems are established methods for laboratory identification of bacteria causing human infections. Here, the utility of phenotypic identification systems was compared against 16S rDNA identification method on clinical isolates obtained during a 5-year study period, with special emphasis on isolates that gave unsatisfactory identification. One hundred and eighty-seven clinical bacteria isolates were tested with commercial phenotypic identification systems and 16S rDNA sequencing. Isolate identities determined using phenotypic identification systems and 16S rDNA sequencing were compared for similarity at genus and species level, with 16S rDNA sequencing as the reference method. Phenotypic identification systems identified ~46% (86/187) of the isolates with identity similar to that identified using 16S rDNA sequencing. Approximately 39% (73/187) and ~15% (28/187) of the isolates showed different genus identity and could not be identified using the phenotypic identification systems, respectively. Both methods succeeded in determining the species identities of 55 isolates; however, only ~69% (38/55) of the isolates matched at species level. 16S rDNA sequencing could not determine the species of ~20% (37/187) of the isolates. The 16S rDNA sequencing is a useful method over the phenotypic identification systems for the identification of rare and difficult to identify bacteria species. The 16S rDNA sequencing method, however, does have limitation for species-level identification of some bacteria highlighting the need for better bacterial pathogen identification tools. © 2016 Wiley Periodicals, Inc.

  15. A computer aided thermodynamic approach for predicting the formation of Z-DNA in naturally occurring sequences

    NASA Technical Reports Server (NTRS)

    Ho, P. S.; Ellison, M. J.; Quigley, G. J.; Rich, A.

    1986-01-01

    The ease with which a particular DNA segment adopts the left-handed Z-conformation depends largely on the sequence and on the degree of negative supercoiling to which it is subjected. We describe a computer program (Z-hunt) that is designed to search long sequences of naturally occurring DNA and retrieve those nucleotide combinations of up to 24 bp in length which show a strong propensity for Z-DNA formation. Incorporated into Z-hunt is a statistical mechanical model based on empirically determined energetic parameters for the B to Z transition accumulated to date. The Z-forming potential of a sequence is assessed by ranking its behavior as a function of negative superhelicity relative to the behavior of similar sized randomly generated nucleotide sequences assembled from over 80,000 combinations. The program makes it possible to compare directly the Z-forming potential of sequences with different base compositions and different sequence lengths. Using Z-hunt, we have analyzed the DNA sequences of the bacteriophage phi X174, plasmid pBR322, the animal virus SV40 and the replicative form of the eukaryotic adenovirus-2. The results are compared with those previously obtained by others from experiments designed to locate Z-DNA forming regions in these sequences using probes which show specificity for the left-handed DNA conformation.

  16. Homogeneity of the 16S rDNA sequence among geographically disparate isolates of Taylorella equigenitalis

    PubMed Central

    Matsuda, M; Tazumi, A; Kagawa, S; Sekizuka, T; Murayama, O; Moore, JE; Millar, BC

    2006-01-01

    Background: At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis) are available, whose sequence differences occur at a few nucleotide positions. Thus it is important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the nearly full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results: Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. Cloning, sequencing and comparison of the nearly full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France demonstrated nucleotide sequence differences at six loci in the 1,469-nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion: High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501, where substitutions and deletions were noted. PMID:16398935

  17. Complementary DNA cloning and molecular evolution of opine dehydrogenases in some marine invertebrates.

    PubMed

    Kimura, Tomohiro; Nakano, Toshiki; Yamaguchi, Toshiyasu; Sato, Minoru; Ogawa, Tomohisa; Muramoto, Koji; Yokoyama, Takehiko; Kan-No, Nobuhiro; Nagahisa, Eizou; Janssen, Frank; Grieshaber, Manfred K

    2004-01-01

    The complete complementary DNA sequences of genes presumably coding for opine dehydrogenases from Arabella iricolor (sandworm), Haliotis discus hannai (abalone), and Patinopecten yessoensis (scallop) were determined, and partial cDNA sequences were derived for Meretrix lusoria (Japanese hard clam) and Spisula sachalinensis (Sakhalin surf clam). The primers ODH-9F and ODH-11R proved useful for amplifying the sequences for opine dehydrogenases from the 4 mollusk species investigated in this study. The sequence of the sandworm was obtained using primers constructed from the amino acid sequence of tauropine dehydrogenase, the main opine dehydrogenase in A. iricolor. The complete cDNA sequences of A. iricolor, H. discus hannai, and P. yessoensis encode 397, 400, and 405 amino acids, respectively. All sequences were aligned and compared with published databank sequences of Loligo opalescens, Loligo vulgaris (squid), Sepia officinalis (cuttlefish), and Pecten maximus (scallop). As expected, a high level of homology was observed for the cDNA from closely related species, such as cephalopods or scallops, whereas cDNA from the other species showed lower levels of homology. A similar trend was observed when the deduced amino acid sequences were compared. Furthermore, alignment of these sequences revealed some structural motifs that are possibly related to the binding sites of the substrates. The phylogenetic trees derived from the nucleotide and amino acid sequences were consistent with the classification of species resulting from classical taxonomic analyses.

  18. Constructing DNA Barcode Sets Based on Particle Swarm Optimization.

    PubMed

    Wang, Bin; Zheng, Xuedong; Zhou, Shihua; Zhou, Changjun; Wei, Xiaopeng; Zhang, Qiang; Wei, Ziqi

    2018-01-01

    Following the completion of the human genome project, a large amount of high-throughput bio-data was generated. To analyze these data, massively parallel sequencing, namely next-generation sequencing, was rapidly developed. DNA barcodes are used to identify the ownership between sequences and samples when they are attached at the beginning or end of sequencing reads. Constructing DNA barcode sets provides the candidate DNA barcodes for this application. To increase the accuracy of DNA barcode sets, a particle swarm optimization (PSO) algorithm has been modified and used to construct the DNA barcode sets in this paper. Compared with the extant results, some lower bounds of DNA barcode sets are improved. The results show that the proposed algorithm is effective in constructing DNA barcode sets.

  19. Sequencing and comparing whole mitochondrial genomes of animals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  20. Recent patents of nanopore DNA sequencing technology: progress and challenges.

    PubMed

    Zhou, Jianfeng; Xu, Bingqian

    2010-11-01

    DNA sequencing techniques have developed rapidly in recent decades, primarily driven by the Human Genome Project. Among the proposed new techniques, the nanopore has been considered a suitable candidate for single-molecule DNA sequencing at ultrahigh speed and very low cost. Several fabrication and modification techniques have been developed to produce robust and well-defined nanopore devices. Much effort has also gone into applying nanopores to analyze the properties of DNA molecules. Compared with traditional sequencing techniques, the nanopore has demonstrated distinctive advantages in key practical respects, such as sample preparation, sequencing speed, cost-effectiveness, and read length. Although challenges remain, recent research on improving the capabilities of nanopores has shed light on how to achieve the ultimate goal: sequencing an individual DNA strand at the single-nucleotide level. This patent review briefly highlights recent developments and technological achievements for DNA analysis and sequencing at the single-molecule level, focusing on nanopore-based methods.

  1. DNA polymerase having modified nucleotide binding site for DNA sequencing

    DOEpatents

    Tabor, Stanley; Richardson, Charles

    1997-01-01

    A modified gene encoding a modified DNA polymerase, wherein the modified polymerase incorporates dideoxynucleotides, relative to the corresponding deoxynucleotides, at least 20-fold better than the corresponding naturally occurring DNA polymerase.

  2. Mapping the Space of Genomic Signatures

    PubMed Central

    Kari, Lila; Hill, Kathleen A.; Sayem, Abu S.; Karamichalis, Rallis; Bryans, Nathaniel; Davis, Katelyn; Dattani, Nikesh S.

    2015-01-01

    We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. 
    This method also correctly finds the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan, and the chimp), and that the sequence most different from it in this dataset belongs to a cucumber. PMID:26000734
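
    A minimal sketch of the Chaos Game Representation step described above, written as a k-mer count grid (often called a frequency CGR). The corner assignment (A, C, G, T) and the function name fcgr are assumptions for illustration; the DSSIM image distance and the MDS step from the abstract are not implemented here.

```python
def fcgr(seq, k):
    """Return a 2^k x 2^k grid of k-mer counts arranged by Chaos Game Representation.

    Assumed corner convention: A=(0,0), C=(0,1), G=(1,1), T=(1,0) as (x, y).
    """
    corners = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous symbols such as N.
        if any(base not in corners for base in kmer):
            continue
        x = y = 0
        # In CGR each new base halves the distance to its corner, so the
        # last base of the k-mer selects the coarsest quadrant (highest bit).
        for j, base in enumerate(kmer):
            cx, cy = corners[base]
            x += cx << j
            y += cy << j
        grid[y][x] += 1
    return grid


if __name__ == "__main__":
    for row in fcgr("ACGTACGTTTGA", 2):
        print(row)
```

    Two such grids of equal k can then be compared cell by cell with any image or vector distance; which distance to use is a separate design choice.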

  3. Affordable hands-on DNA sequencing and genotyping: an exercise for teaching DNA analysis to undergraduates.

    PubMed

    Shah, Kushani; Thomas, Shelby; Stein, Arnold

    2013-01-01

    In this report, we describe a 5-week laboratory exercise for undergraduate biology and biochemistry students in which students learn to sequence DNA and to genotype their DNA for selected single nucleotide polymorphisms (SNPs). Students use miniaturized DNA sequencing gels that require approximately 8 min to run. The students perform G, A, T, C Sanger sequencing reactions. They prepare and run the gels, perform Southern blots (which require only 10 min), and detect sequencing ladders using a colorimetric detection system. Students enlarge their sequencing ladders from digital images of their small nylon membranes, and read the sequence manually. They compare their reads with the actual DNA sequence using BLAST2. After mastering the DNA sequencing system, students prepare their own DNA from a cheek swab, polymerase chain reaction-amplify a region of their DNA that encompasses a SNP of interest, and perform sequencing to determine their genotype at the SNP position. A family pedigree can also be constructed. The SNP chosen by the instructor was rs17822931, which is in the ABCC11 gene and is the determinant of human earwax type. Genotypes at the rs17822931 site vary in different ethnic populations. © 2013 by The International Union of Biochemistry and Molecular Biology.

  4. BLAST and FASTA similarity searching for multiple sequence alignment.

    PubMed

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry, that is, homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
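
    As a rough illustration of why expectation values depend on database size, the sketch below uses the standard relation between a normalized bit score S' and the expectation value, E = m * n * 2^(-S'), where m is the query length and n the total database length. The function name and the example lengths are assumptions, and the effective-length corrections applied by real search programs are ignored.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E-value for a hit with the given bit score.

    Uses E = m * n * 2**(-S'), ignoring edge-effect (effective length)
    corrections, so values will differ somewhat from real program output.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Same alignment score, searched against a small and a large database
    # (both database sizes are made-up illustrative numbers).
    small = expect_value(40.0, 300, 5_000_000)
    large = expect_value(40.0, 300, 50_000_000_000)
    print(f"small database: E = {small:.3g}")
    print(f"large database: E = {large:.3g}")
```

    The same bit score gives a proportionally smaller E-value against the smaller database, which is consistent with the abstract's point about searching smaller comprehensive databases.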

  5. Microsatellite DNA in genomic survey sequences and UniGenes of loblolly pine

    Treesearch

    Craig S Echt; Surya Saha; Dennis L Deemer; C Dana Nelson

    2011-01-01

    Genomic DNA sequence databases are a potential and growing resource for simple sequence repeat (SSR) marker development in loblolly pine (Pinus taeda L.). Loblolly pine also has many expressed sequence tags (ESTs) available for microsatellite (SSR) marker development. We compared loblolly pine SSR densities in genome survey sequences (GSSs) to those in non-redundant...

  6. DNA Base-Calling from a Nanopore Using a Viterbi Algorithm

    PubMed Central

    Timp, Winston; Comer, Jeffrey; Aksimentiev, Aleksei

    2012-01-01

    Nanopore-based DNA sequencing is the most promising third-generation sequencing method. It has superior read length, speed, and sample requirements compared with state-of-the-art second-generation methods. However, base-calling still presents substantial difficulty because the resolution of the technique is limited compared with the measured signal/noise ratio. Here we demonstrate a method to decode 3-bp-resolution nanopore electrical measurements into a DNA sequence using a Hidden Markov model. This method shows tremendous potential for accuracy (∼98%), even with a poor signal/noise ratio. PMID:22677395
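
    The decoding idea can be sketched with a generic Viterbi routine for a discrete hidden Markov model. This is a toy under stated assumptions, not the authors' model: the two hidden states, the "lo"/"hi" current levels, and all probabilities below are invented for illustration, whereas the paper decodes 3-bp-resolution signals.

```python
import math


def viterbi(obs, states, start, trans, emit):
    """Most likely hidden-state path for a discrete HMM (log-space Viterbi).

    All probabilities are assumed to be strictly positive.
    """
    # best[s]: log-probability of the best path so far that ends in state s.
    best = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    backpointers = []
    for o in obs[1:]:
        prev, best, ptr = best, {}, {}
        for s in states:
            p = max(states, key=lambda q: prev[q] + math.log(trans[q][s]))
            best[s] = prev[p] + math.log(trans[p][s]) + math.log(emit[s][o])
            ptr[s] = p
        backpointers.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical two-state model: each state is a base, each observation a
# discretized current level.
STATES = ["A", "T"]
START = {"A": 0.5, "T": 0.5}
TRANS = {"A": {"A": 0.9, "T": 0.1}, "T": {"A": 0.1, "T": 0.9}}
EMIT = {"A": {"lo": 0.8, "hi": 0.2}, "T": {"lo": 0.2, "hi": 0.8}}

if __name__ == "__main__":
    print(viterbi(["lo", "lo", "hi", "hi"], STATES, START, TRANS, EMIT))
```

    For a 3-bp-resolution signal, the hidden states would more plausibly be base triplets with transitions restricted to overlapping triplets; that extension is left out here.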

  7. Hiding message into DNA sequence through DNA coding and chaotic maps.

    PubMed

    Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

    2014-09-01

    The paper proposes an improved reversible substitution method to hide data into deoxyribonucleic acid (DNA) sequence, and four measures have been taken to enhance the robustness and enlarge the hiding capacity, such as encode the secret message by DNA coding, encrypt it by pseudo-random sequence, generate the relative hiding locations by piecewise linear chaotic map, and embed the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method has a better performance compared with the competing methods with respect to robustness and capacity.
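
    Only the first of the four measures, DNA coding of the secret message, is sketched below. The 2-bit rule (00 to A, 01 to C, 10 to G, 11 to T) is one common convention and is an assumption here, as are the function names; the pseudo-random encryption, the chaotic-map location selection, and the complementary-rule embedding are not shown.

```python
BASES = "ACGT"  # assumed coding rule: 00->A, 01->C, 10->G, 11->T


def encode(message: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in message:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(dna: str) -> bytes:
    """Invert encode(); the length of dna must be a multiple of four."""
    if len(dna) % 4:
        raise ValueError("DNA length must be a multiple of 4")
    vals = [BASES.index(base) for base in dna]
    return bytes(
        (vals[i] << 6) | (vals[i + 1] << 4) | (vals[i + 2] << 2) | vals[i + 3]
        for i in range(0, len(vals), 4)
    )


if __name__ == "__main__":
    coded = encode(b"hi")
    print(coded, decode(coded))
```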

  8. SeqCompress: an algorithm for biological sequence compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz; Bajwa, Hassan

    2014-10-01

    The growth of Next Generation Sequencing technologies presents significant research challenges, specifically the design of bioinformatics tools that handle massive amounts of data efficiently. The cost of storing biological sequence data has become a noticeable proportion of the total cost of generation and analysis. In particular, the rate of increase in DNA sequencing is significantly outstripping the rate of increase in disk storage capacity, so data volumes may exceed available storage. It is essential to develop algorithms that handle large data sets via better memory management. This article presents a DNA sequence compression algorithm, SeqCompress, that copes with the space complexity of biological sequences. The algorithm is based on lossless data compression and uses a statistical model as well as arithmetic coding to compress DNA sequences. The proposed algorithm is compared with recent specialized compression tools for biological sequences. Experimental results show that the proposed algorithm has better compression gain than other existing algorithms. Copyright © 2014 Elsevier Inc. All rights reserved.

  9. GATA: A graphic alignment tool for comparative sequence analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nix, David A.; Eisen, Michael B.

    2005-01-01

    Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.

  10. DNA polymerase having modified nucleotide binding site for DNA sequencing

    DOEpatents

    Tabor, S.; Richardson, C.

    1997-03-25

    A modified gene encoding a modified DNA polymerase is disclosed. The modified polymerase incorporates dideoxynucleotides at least 20-fold better compared to the corresponding deoxynucleotides as compared with the corresponding naturally-occurring DNA polymerase. 6 figs.

  11. Effects of sequence on DNA wrapping around histones

    NASA Astrophysics Data System (ADS)

    Ortiz, Vanessa

    2011-03-01

    A central question in biophysics is whether the sequence of a DNA strand affects its mechanical properties. In epigenetics, these are thought to influence nucleosome positioning and gene expression. Theoretical and experimental attempts to answer this question have been hindered by an inability to directly resolve DNA structure and dynamics at the base-pair level. In our previous studies we used a detailed model of DNA to measure the effects of sequence on the stability of naked DNA under bending. Sequence was shown to influence DNA's ability to form kinks, which arise when certain motifs slide past others to form non-native contacts. Here, we have now included histone-DNA interactions to see if the results obtained for naked DNA are transferable to the problem of nucleosome positioning. Different DNA sequences interacting with the histone protein complex are studied, and their equilibrium and mechanical properties are compared among themselves and with the naked case. NLM training grant to the Computation and Informatics in Biology and Medicine Training Program (NLM T15LM007359).

  12. Characterization of the repetitive DNA elements in the genome of fish lymphocystis disease viruses.

    PubMed

    Schnitzler, P; Darai, G

    1989-09-01

    The complete DNA nucleotide sequence of the repetitive DNA elements in the genome of fish lymphocystis disease virus (FLDV) isolated from two different species (flounder and dab) was determined. The size of these repetitive DNA elements was found to be 1413 bp which corresponds to the DNA sequences of the 5' terminus of the EcoRI DNA fragment B (0.034 to 0.052 m.u.) and to the EcoRI DNA fragment M (0.718 to 0.736 m.u.) of the FLDV genome causing lymphocystis disease in flounder and plaice. The degree of DNA nucleotide homology between both regions was found to be 99%. The repetitive DNA element in the genome of FLDV isolated from other fish species (dab) was identified and is located within the EcoRI DNA fragment B and J of the viral genome. The DNA nucleotide sequence of one duplicate of this repetition (EcoRI DNA fragment J) was determined (1410 bp) and compared to the DNA nucleotide sequences of the repetitive DNA elements of the genome of FLDV isolated from flounder. It was found that the repetitive DNA elements of the genome of FLDV derived from two different fish species are highly conserved and possess a degree of DNA sequence homology of 94%. The DNA sequences of each strand of the individual repetitive element possess one open reading frame.

  13. Sequencing historical specimens: successful preparation of small specimens with low amounts of degraded DNA.

    PubMed

    Sproul, John S; Maddison, David R

    2017-11-01

    Despite advances that allow DNA sequencing of old museum specimens, sequencing small-bodied, historical specimens can be challenging and unreliable as many contain only small amounts of fragmented DNA. Dependable methods to sequence such specimens are especially critical if the specimens are unique. We attempt to sequence small-bodied (3-6 mm) historical specimens (including nomenclatural types) of beetles that have been housed, dried, in museums for 58-159 years, and for which few or no suitable replacement specimens exist. To better understand ideal approaches of sample preparation and produce preparation guidelines, we compared different library preparation protocols using low amounts of input DNA (1-10 ng). We also explored low-cost optimizations designed to improve library preparation efficiency and sequencing success of historical specimens with minimal DNA, such as enzymatic repair of DNA. We report successful sample preparation and sequencing for all historical specimens despite our low-input DNA approach. We provide a list of guidelines related to DNA repair, bead handling, reducing adapter dimers and library amplification. We present these guidelines to facilitate more economical use of valuable DNA and enable more consistent results in projects that aim to sequence challenging, irreplaceable historical specimens. © 2017 John Wiley & Sons Ltd.

  14. Determination of Trichuris skrjabini by sequencing of the ITS1-5.8S-ITS2 segment of the ribosomal DNA: comparative molecular study of different species of trichurids.

    PubMed

    Cutillas, C; Oliveros, R; de Rojas, M; Guevara, D C

    2004-06-01

    Adults of Trichuris skrjabini have been isolated from the cecum of caprine hosts (Capra hircus), Trichuris ovis and Trichuris globulosa from Ovis aries (sheep) and C. hircus (goats), and Trichuris leporis from Lepus europaeus (rabbits) in Spain. Genomic DNA was isolated and the ITS1-5.8S-ITS2 segment from the ribosomal DNA (rDNA) was amplified and sequenced by polymerase chain reaction (PCR) techniques. The ITS1 of T. skrjabini, T. ovis, T. globulosa, and T. leporis was 495, 757, 757, and 536 nucleotides in length, respectively, and had G + C contents of 59.6, 58.7, 58.7, and 60.8%, respectively. Intraindividual variation was detected in the ITS1 sequences of the 4 species. Furthermore, the 5.8S sequences of T. skrjabini, T. ovis, T. globulosa, and T. leporis were compared. A total of 157, 152, 153, and 157 nucleotides in length was observed in the 5.8S sequences of these 4 species, respectively. There were no sequence differences of ITS1 and 5.8S products between T. ovis and T. globulosa. Nevertheless, clear differences were detected between the ITS1 sequences of T. skrjabini, T. ovis, T. leporis, Trichuris muris, and T. arvicolae. The ITS2 fragment from the rDNA of T. skrjabini was sequenced. A comparative study of the ITS2 sequence of T. skrjabini with the previously published ITS2 sequence data of T. ovis, T. leporis, T. muris, and T. arvicolae suggested that the combined use of sequence data from both spacers would be useful in the molecular characterization of trichurid parasites.
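
    The length and G + C figures reported above are simple to compute from a sequence. A minimal sketch follows; the function name and the choice to ignore ambiguous symbols are assumptions, and the example sequence is invented rather than taken from the study.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T) in seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


if __name__ == "__main__":
    spacer = "GCGCATGCCGTAGGCTAACG"  # invented example, not real ITS1 data
    print(len(spacer), round(gc_content(spacer), 1))
```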

  15. Advances in high throughput DNA sequence data compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz

    2016-06-01

    Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.

  16. DNA base-calling from a nanopore using a Viterbi algorithm.

    PubMed

    Timp, Winston; Comer, Jeffrey; Aksimentiev, Aleksei

    2012-05-16

    Nanopore-based DNA sequencing is the most promising third-generation sequencing method. It has superior read length, speed, and sample requirements compared with state-of-the-art second-generation methods. However, base-calling still presents substantial difficulty because the resolution of the technique is limited compared with the measured signal/noise ratio. Here we demonstrate a method to decode 3-bp-resolution nanopore electrical measurements into a DNA sequence using a Hidden Markov model. This method shows tremendous potential for accuracy (~98%), even with a poor signal/noise ratio. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  17. Comparative analysis of mitochondrial genomes between a wheat K-type cytoplasmic male sterility (CMS) line and its maintainer line.

    PubMed

    Liu, Huitao; Cui, Peng; Zhan, Kehui; Lin, Qiang; Zhuo, Guoyin; Guo, Xiaoli; Ding, Feng; Yang, Wenlong; Liu, Dongcheng; Hu, Songnian; Yu, Jun; Zhang, Aimin

    2011-03-29

    Plant mitochondria, semiautonomous organelles that function as manufacturers of cellular ATP, have their own genome that has a slow rate of evolution and rapid rearrangement. Cytoplasmic male sterility (CMS), a common phenotype in higher plants, is closely associated with rearrangements in mitochondrial DNA (mtDNA), and is widely used to produce F1 hybrid seeds in a variety of valuable crop species. Novel chimeric genes deduced from mtDNA rearrangements causing CMS have been identified in several plants, such as rice, sunflower, pepper, and rapeseed, but there are very few reports about mtDNA rearrangements in wheat. In the present work, we describe the mitochondrial genome of a wheat K-type CMS line and compare it with its maintainer line. The complete mtDNA sequence of a wheat K-type (with cytoplasm of Aegilops kotschyi) CMS line, Ks3, was assembled into a master circle (MC) molecule of 647,559 bp and found to harbor 34 known protein-coding genes, three rRNAs (18 S, 26 S, and 5 S rRNAs), and 16 different tRNAs. Compared to our previously published sequence of a K-type maintainer line, Km3, we detected Ks3-specific mtDNA (> 100 bp, 11.38%) and repeats (> 100 bp, 29 units) as well as genes that are unique to each line: rpl5 was missing in Ks3 and trnH was absent from Km3. We also defined 32 single nucleotide polymorphisms (SNPs) in 13 protein-coding, albeit functionally irrelevant, genes, and predicted 22 unique ORFs in Ks3, representing potential candidates for K-type CMS. All these sequence variations are candidates for involvement in CMS. A comparative analysis of the mtDNA of several angiosperms, including those from Ks3, Km3, rice, maize, Arabidopsis thaliana, and rapeseed, showed that non-coding sequences of higher plants had mostly divergent multiple reorganizations during the mtDNA evolution of higher plants. 
    The complete mitochondrial genome of the wheat K-type CMS line Ks3 is very different from that of its maintainer line Km3, especially in non-coding sequences. Sequence rearrangement has produced novel chimeric ORFs, which may be candidate genes for CMS. Comparative analysis of several angiosperm mtDNAs indicated that non-coding sequences are the most frequently reorganized during mtDNA evolution in higher plants.

  18. DNA barcode goes two-dimensions: DNA QR code web server.

    PubMed

    Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.

  19. TaxI: a software tool for DNA barcoding using distance methods

    PubMed Central

    Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel

    2005-01-01

    DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid, but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments, this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
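
    The query-versus-reference comparison can be sketched as follows, assuming each pairwise alignment has already been produced by some aligner. The uncorrected p-distance used here is only one possible divergence measure, and the function names and input layout are hypothetical rather than TaxI's own interface.

```python
def p_distance(aligned_a, aligned_b):
    """Uncorrected divergence between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = diffs = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


def closest_reference(pairwise_alignments):
    """Return (best_name, distances) for a query against many references.

    pairwise_alignments maps a reference name to the tuple
    (aligned_query, aligned_reference) from a separate pairwise alignment.
    """
    distances = {
        name: p_distance(query, ref)
        for name, (query, ref) in pairwise_alignments.items()
    }
    return min(distances, key=distances.get), distances


if __name__ == "__main__":
    alignments = {  # invented toy alignments
        "species_1": ("ACGTTGCA", "ACGATGCA"),
        "species_2": ("ACG-TTGCA", "ACGGTTGCA"),
    }
    print(closest_reference(alignments))
```

    Because every reference is aligned to the query separately, indel-rich markers do not require a single multiple alignment, which is the property the abstract emphasizes.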

  20. Colorimetric and dynamic light scattering detection of DNA sequences by using positively charged gold nanospheres: a comparative study with gold nanorods

    NASA Astrophysics Data System (ADS)

    Pylaev, T. E.; Khanadeev, V. A.; Khlebtsov, B. N.; Dykman, L. A.; Bogatyrev, V. A.; Khlebtsov, N. G.

    2011-07-01

    We introduce a new genosensing approach employing CTAB (cetyltrimethylammonium bromide)-coated positively charged colloidal gold nanoparticles (GNPs) to detect target DNA sequences by using absorption spectroscopy and dynamic light scattering. The approach is compared with a previously reported method employing unmodified CTAB-coated gold nanorods (GNRs). Both approaches are based on the observation that whereas the addition of probe and target ssDNA to CTAB-coated particles results in particle aggregation, no aggregation is observed after addition of probe and nontarget DNA sequences. Our goal was to compare the feasibility and sensitivity of both methods. A 21-mer ssDNA from the human immunodeficiency virus type 1 HIV-1 U5 long terminal repeat (LTR) sequence and a 23-mer ssDNA from the Bacillus anthracis cryptic protein and protective antigen precursor (pagA) genes were used as ssDNA models. In the case of GNRs, unexpectedly, the colorimetric test failed with perfect cigar-like particles but could be performed with dumbbell and dog-bone rods. By contrast, our approach with cationic CTAB-coated GNPs is easy to implement and possesses excellent feasibility with retention of comparable sensitivity—a 0.1 nM concentration of target cDNA can be detected with the naked eye and 10 pM by dynamic light scattering (DLS) measurements. The specificity of our method is illustrated by successful DLS detection of one-three base mismatches in cDNA sequences for both DNA models. These results suggest that the cationic GNPs and DLS can be used for genosensing under optimal DNA hybridization conditions without any chemical modifications of the particle surface with ssDNA molecules and signal amplification. 
    Finally, we discuss a difference of more than two to three orders of magnitude in the reported estimations of the detection sensitivity of colorimetric methods (0.1 to 10-100 pM) to show that the existing aggregation models are inconsistent with the detection limits of about 0.1-1 pM DNA and that other explanations should be developed.

  1. A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network.

    PubMed

    Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso

    2015-07-01

    In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification task is performed with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach an accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches an accuracy of 20.9%. Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments. Copyright © 2015 Elsevier B.V. All rights reserved.

  2. Assessing the Fidelity of Ancient DNA Sequences Amplified From Nuclear Genes

    PubMed Central

    Binladen, Jonas; Wiuf, Carsten; Gilbert, M. Thomas P.; Bunce, Michael; Barnett, Ross; Larson, Greger; Greenwood, Alex D.; Haile, James; Ho, Simon Y. W.; Hansen, Anders J.; Willerslev, Eske

    2006-01-01

    To date, the field of ancient DNA has relied almost exclusively on mitochondrial DNA (mtDNA) sequences. However, a number of recent studies have reported the successful recovery of ancient nuclear DNA (nuDNA) sequences, thereby allowing the characterization of genetic loci directly involved in phenotypic traits of extinct taxa. It is well documented that postmortem damage in ancient mtDNA can lead to the generation of artifactual sequences. However, as yet no one has thoroughly investigated the damage spectrum in ancient nuDNA. By comparing clone sequences from 23 fossil specimens, recovered from environments ranging from permafrost to desert, we demonstrate the presence of miscoding lesion damage in both the mtDNA and nuDNA, resulting in insertion of erroneous bases during amplification. Interestingly, no significant differences in the frequency of miscoding lesion damage are recorded between mtDNA and nuDNA despite great differences in cellular copy numbers. For both mtDNA and nuDNA, we find significant positive correlations between total sequence heterogeneity and the rates of type 1 transitions (adenine → guanine and thymine → cytosine) and type 2 transitions (cytosine → thymine and guanine → adenine), respectively. Type 2 transitions are by far the most dominant and increase relative to those of type 1 with damage load. The results suggest that the deamination of cytosine (and 5-methyl cytosine) to uracil (and thymine) is the main cause of miscoding lesions in both ancient mtDNA and nuDNA sequences. We argue that the problems presented by postmortem damage, as well as problems with contamination from exogenous sources of conserved nuclear genes, allelic variation, and the reliance on single nucleotide polymorphisms, call for great caution in studies relying on ancient nuDNA sequences. PMID:16299392
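
    The type 1 and type 2 transition classes defined above can be tallied from a clone sequence compared with a reference. The sketch assumes the two sequences are already aligned and of equal length, and that the reference (for example a consensus) represents the undamaged state; the function name and the "other" bucket are assumptions for illustration.

```python
TYPE1 = {("A", "G"), ("T", "C")}  # adenine -> guanine, thymine -> cytosine
TYPE2 = {("C", "T"), ("G", "A")}  # cytosine -> thymine, guanine -> adenine


def count_transitions(reference, clone):
    """Count type 1, type 2, and other substitutions (reference -> clone)."""
    counts = {"type1": 0, "type2": 0, "other": 0}
    for ref_base, clone_base in zip(reference.upper(), clone.upper()):
        # Ignore matches, gaps, and ambiguous symbols.
        if ref_base == clone_base or ref_base not in "ACGT" or clone_base not in "ACGT":
            continue
        if (ref_base, clone_base) in TYPE1:
            counts["type1"] += 1
        elif (ref_base, clone_base) in TYPE2:
            counts["type2"] += 1
        else:
            counts["other"] += 1  # transversions
    return counts


if __name__ == "__main__":
    print(count_transitions("ACGTCCGA", "ATGTTCAA"))
```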

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sobottka, Marcelo, E-mail: sobottka@mtm.ufsc.br; Hart, Andrew G., E-mail: ahart@dim.uchile.cl

    Highlights: • We propose a simple stochastic model to construct primitive DNA sequences. • The model provides an explanation for Chargaff's second parity rule in primitive DNA sequences. • The model is also used to predict a novel type of strand symmetry in primitive DNA sequences. • We extend the results to bacterial DNA sequences and compare distributional properties intrinsic to the model to statistical estimates from 1049 bacterial genomes. • We find statistical evidence that the novel type of strand symmetry holds for bacterial DNA sequences. Abstract: Chargaff's second parity rule for short oligonucleotides states that the frequency of any short nucleotide sequence on a strand is approximately equal to the frequency of its reverse complement on the same strand. Recent studies have shown that, with the exception of organellar DNA, this parity rule generally holds for double-stranded DNA genomes and fails to hold for single-stranded genomes. While Chargaff's first parity rule is fully explained by the Watson-Crick pairing in the DNA double helix, a definitive explanation for the second parity rule has not yet been determined. In this work, we propose a model based on a hidden Markov process for approximating the distributional structure of primitive DNA sequences. Then, we use the model to provide another possible theoretical explanation for Chargaff's second parity rule, and to predict novel distributional aspects of bacterial DNA sequences.
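
    Chargaff's second parity rule as stated above can be checked empirically on a single strand by comparing each k-mer count with the count of its reverse complement. The sketch below is not the authors' hidden Markov model; the deviation score, its normalization to the range 0 to 1, and the function names are assumptions chosen for illustration.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parity_deviation(seq, k):
    """Deviation from the second parity rule for k-mers on one strand.

    Returns 0.0 when every k-mer is exactly as frequent as its reverse
    complement and 1.0 when no reverse complement of any k-mer occurs.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Include reverse complements that never occur, so each pair is seen twice.
    kmers = set(counts) | {reverse_complement(w) for w in counts}
    diff = sum(abs(counts[w] - counts[reverse_complement(w)]) for w in kmers)
    return diff / (2 * total)


if __name__ == "__main__":
    print(parity_deviation("AACGTTAGCTAGCTTAACG", 2))
```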

  4. The role of DNA repair in herpesvirus pathogenesis.

    PubMed

    Brown, Jay C

    2014-10-01

    In cells latently infected with a herpesvirus, the viral DNA is present in the cell nucleus, but it is not extensively replicated or transcribed. In this suppressed state the virus DNA is vulnerable to mutagenic events that affect the host cell and have the potential to destroy the virus' genetic integrity. Despite the potential for genetic damage, however, herpesvirus sequences are well conserved after reactivation from latency. To account for this apparent paradox, I have tested the idea that host cell-encoded mechanisms of DNA repair are able to control genetic damage to latent herpesviruses. Studies were focused on homologous recombination-dependent DNA repair (HR). Methods of DNA sequence analysis were employed to scan herpesvirus genomes for DNA features able to activate HR. Analyses were carried out with a total of 39 herpesvirus DNA sequences, a group that included viruses from the alpha-, beta- and gamma-subfamilies. The results showed that all 39 genome sequences were enriched in two or more of the eight recombination-initiating features examined. The results were interpreted to indicate that HR can stabilize latent herpesvirus genomes. The results also showed, unexpectedly, that repair-initiating DNA features differed in alpha- compared to gamma-herpesviruses. Whereas inverted and tandem repeats predominated in alpha-herpesviruses, gamma-herpesviruses were enriched in short, GC-rich initiation sequences such as CCCAG and depleted in repeats. In alpha-herpesviruses, repair-initiating repeat sequences were found to be concentrated in a specific region (the S segment) of the genome while repair-initiating short sequences were distributed more uniformly in gamma-herpesviruses. The results suggest that repair pathways are activated differently in alpha- compared to gamma-herpesviruses. Copyright © 2014. Published by Elsevier Inc.

  5. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing

    PubMed Central

    Green, Richard E.; Malaspinas, Anna-Sapfo; Krause, Johannes; Briggs, Adrian W.; Johnson, Philip L. F.; Uhler, Caroline; Meyer, Matthias; Good, Jeffrey M.; Maricic, Tomislav; Stenzel, Udo; Prüfer, Kay; Siebauer, Michael; Burbano, Hernán A.; Ronan, Michael; Rothberg, Jonathan M.; Egholm, Michael; Rudan, Pavao; Brajković, Dejana; Kućan, Željko; Gušić, Ivan; Wikström, Mårten; Laakkonen, Liisa; Kelso, Janet; Slatkin, Montgomery; Pääbo, Svante

    2008-01-01

    Summary A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual using 8,341 mtDNA sequences identified among 4.8 Gb of DNA generated from ~0.3 grams of bone. Analysis of the assembled sequence unequivocally establishes that the Neandertal mtDNA falls outside the variation of extant human mtDNAs and allows an estimate of the divergence date between the two mtDNA lineages of 660,000±140,000 years. Of the 13 proteins encoded in the mtDNA, subunit 2 of cytochrome c oxidase of the mitochondrial electron transport chain has experienced the largest number of amino acid substitutions in human ancestors since the separation from Neandertals. There is evidence that purifying selection in the Neandertal mtDNA was reduced compared to other primate lineages suggesting that the effective population size of Neandertals was small. PMID:18692465

  6. Comparative study of IDH1 mutations in gliomas by immunohistochemistry and DNA sequencing.

    PubMed

    Agarwal, Shipra; Sharma, Mehar Chand; Jha, Prerana; Pathak, Pankaj; Suri, Vaishali; Sarkar, Chitra; Chosdol, Kunzang; Suri, Ashish; Kale, Shashank Sharad; Mahapatra, Ashok Kumar; Jha, Pankaj

    2013-06-01

    Mutations involving isocitrate dehydrogenase 1 (IDH1) occur in a high proportion of diffuse gliomas, with implications for diagnosis and prognosis. About 90% involve exon 4 at codon 132, replacing amino acid arginine with histidine (R132H). Rarer ones include R132C, R132S, R132G, R132L, R132V, and R132P. Most authors have used DNA-based methods to assess IDH1 status. Preliminary studies comparing immunohistochemistry (IHC) with IDH1-R132H mutation-specific antibodies have shown concordance with DNA sequencing and no cross-reactivity with wild-type IDH1 or other mutant proteins. The present study compares results of IHC with DNA sequencing in diffuse gliomas. Fifty diffuse gliomas with frozen tissue samples for DNA sequencing and adequate tissue in paraffin blocks for IHC using IDH1-R132H specific antibody were assessed for IDH1 mutations. Concordance of findings between IHC and DNA sequencing was noted in 88% (44/50) of cases. All 6 cases with discrepancy were immunopositive with DIA-H09 antibody. While in 3 of these 6 cases DNA sequencing failed to reveal any mutations, an R132L (arginine replaced by leucine) mutation was found in the remaining 3 cases. Interestingly, of the immunopositive cases, 46.6% (14/30) showed immunostaining in only a fraction of tumor cells. IHC is an easy and quick method of detecting IDH1-R132H mutations, but there may be some discrepancies between IHC and DNA sequencing. Although there were no false-negative cases, cross-reactivity with IDH1-R132L was seen in 3, a finding not reported thus far. Because of more universal availability of IHC over genetic testing, cross-reactivity and staining heterogeneity may have bearing over its use in detecting IDH1-R132H mutation in gliomas.

  7. Comparative study of IDH1 mutations in gliomas by immunohistochemistry and DNA sequencing

    PubMed Central

    Agarwal, Shipra; Sharma, Mehar Chand; Jha, Prerana; Pathak, Pankaj; Suri, Vaishali; Sarkar, Chitra; Chosdol, Kunzang; Suri, Ashish; Kale, Shashank Sharad; Mahapatra, Ashok Kumar; Jha, Pankaj

    2013-01-01

    Background Mutations involving isocitrate dehydrogenase 1 (IDH1) occur in a high proportion of diffuse gliomas, with implications for diagnosis and prognosis. About 90% involve exon 4 at codon 132, replacing amino acid arginine with histidine (R132H). Rarer ones include R132C, R132S, R132G, R132L, R132V, and R132P. Most authors have used DNA-based methods to assess IDH1 status. Preliminary studies comparing immunohistochemistry (IHC) with IDH1-R132H mutation-specific antibodies have shown concordance with DNA sequencing and no cross-reactivity with wild-type IDH1 or other mutant proteins. The present study compares results of IHC with DNA sequencing in diffuse gliomas. Materials and methods Fifty diffuse gliomas with frozen tissue samples for DNA sequencing and adequate tissue in paraffin blocks for IHC using IDH1-R132H specific antibody were assessed for IDH1 mutations. Results Concordance of findings between IHC and DNA sequencing was noted in 88% (44/50) of cases. All 6 cases with discrepancy were immunopositive with DIA-H09 antibody. While in 3 of these 6 cases DNA sequencing failed to reveal any mutations, an R132L (arginine replaced by leucine) mutation was found in the remaining 3 cases. Interestingly, of the immunopositive cases, 46.6% (14/30) showed immunostaining in only a fraction of tumor cells. Conclusions IHC is an easy and quick method of detecting IDH1-R132H mutations, but there may be some discrepancies between IHC and DNA sequencing. Although there were no false-negative cases, cross-reactivity with IDH1-R132L was seen in 3, a finding not reported thus far. Because of more universal availability of IHC over genetic testing, cross-reactivity and staining heterogeneity may have bearing over its use in detecting IDH1-R132H mutation in gliomas. PMID:23486690

  8. Comparative analysis of Campylobacter isolates from wild birds and chickens using MALDI-TOF MS, biochemical testing, and DNA sequencing.

    PubMed

    Lawton, Samantha J; Weis, Allison M; Byrne, Barbara A; Fritz, Heather; Taff, Conor C; Townsend, Andrea K; Weimer, Bart C; Mete, Aslı; Wheeler, Sarah; Boyce, Walter M

    2018-05-01

    Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) was compared to conventional biochemical testing methods and nucleic acid analyses (16S rDNA sequencing, hippurate hydrolysis gene testing, whole genome sequencing [WGS]) for species identification of Campylobacter isolates obtained from chickens (Gallus gallus domesticus, n = 8), American crows (Corvus brachyrhynchos, n = 17), a mallard duck (Anas platyrhynchos, n = 1), and a western scrub-jay (Aphelocoma californica, n = 1). The test results for all 27 isolates were in 100% agreement between MALDI-TOF MS, the combined results of 16S rDNA sequencing, and the hippurate hydrolysis gene PCR (p = 0.0027, kappa = 1). Likewise, the identifications derived from WGS from a subset of 14 isolates were in 100% agreement with the MALDI-TOF MS identification. In contrast, biochemical testing misclassified 5 isolates of C. jejuni as C. coli, and 16S rDNA sequencing alone was not able to differentiate between C. coli and C. jejuni for 11 sequences (p = 0.1573, kappa = 0.0857) when compared to MALDI-TOF MS and WGS. No agreement was observed between MALDI-TOF MS dendrograms and the phylogenetic relationships revealed by rDNA sequencing or WGS. Our results confirm that MALDI-TOF MS is a fast and reliable method for identifying Campylobacter isolates to the species level from wild birds and chickens, but not for elucidating phylogenetic relationships among Campylobacter isolates.

  9. Ultrafast DNA sequencing on a microchip by a hybrid separation mechanism that gives 600 bases in 6.5 minutes.

    PubMed

    Fredlake, Christopher P; Hert, Daniel G; Kan, Cheuk-Wai; Chiesl, Thomas N; Root, Brian E; Forster, Ryan E; Barron, Annelise E

    2008-01-15

    To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require approximately 70 min to deliver approximately 650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered "hybrid" mechanism of DNA electromigration, in which DNA molecules alternate rapidly between reptating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs.

  10. Ultrafast DNA sequencing on a microchip by a hybrid separation mechanism that gives 600 bases in 6.5 minutes

    PubMed Central

    Fredlake, Christopher P.; Hert, Daniel G.; Kan, Cheuk-Wai; Chiesl, Thomas N.; Root, Brian E.; Forster, Ryan E.; Barron, Annelise E.

    2008-01-01

    To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require ≈70 min to deliver ≈650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered “hybrid” mechanism of DNA electromigration, in which DNA molecules alternate rapidly between reptating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs. PMID:18184818

  11. A new family of satellite DNA sequences as a major component of centromeric heterochromatin in owls (Strigiformes).

    PubMed

    Yamada, Kazuhiko; Nishida-Umehara, Chizuko; Matsuda, Yoichi

    2004-03-01

    We isolated a new family of satellite DNA sequences from HaeIII- and EcoRI-digested genomic DNA of the Blakiston's fish owl (Ketupa blakistoni). The repetitive sequences were organized in tandem arrays of the 174 bp element, and localized to the centromeric regions of all macrochromosomes, including the Z and W chromosomes, and microchromosomes. This hybridization pattern was consistent with the distribution of C-band-positive centromeric heterochromatin, and the satellite DNA sequences occupied 10% of the total genome as a major component of centromeric heterochromatin. The sequences were homogenized between macro- and microchromosomes in this species, and therefore intraspecific divergence of the nucleotide sequences was low. The 174 bp element cross-hybridized to the genomic DNA of six other Strigidae species, but not to that of the Tytonidae, suggesting that the satellite DNA sequences are conserved in the same family but fairly divergent between the different families in the Strigiformes. Secondly, the centromeric satellite DNAs were cloned from eight Strigidae species, and the nucleotide sequences of 41 monomer fragments were compared within and between species. Molecular phylogenetic relationships of the nucleotide sequences were highly correlated with both the taxonomy based on morphological traits and the phylogenetic tree constructed by DNA-DNA hybridization. These results suggest that the satellite DNA sequence has evolved by concerted evolution in the Strigidae and that it is a good taxonomic and phylogenetic marker to examine genetic diversity between Strigiformes species.

  12. DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats

    PubMed Central

    de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

    2015-01-01

    Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. PMID:26481363

  13. Osmylated DNA, a novel concept for sequencing DNA using nanopores

    NASA Astrophysics Data System (ADS)

    Kanavarioti, Anastassia

    2015-03-01

    Sanger sequencing has led the advances in molecular biology, while faster and cheaper next generation technologies are urgently needed. A newer approach exploits nanopores, natural or solid-state, set in an electrical field, and obtains base sequence information from current variations due to the passage of a ssDNA molecule through the pore. A hurdle in this approach is the fact that the four bases are chemically comparable to each other, which leads to small differences in current obstruction. ‘Base calling’ becomes even more challenging because most nanopores sense a short sequence and not individual bases. Perhaps sequencing DNA via nanopores would be more manageable if there were only two bases, chemically very different from each other; a sequence of 1s and 0s comes to mind. Osmylated DNA comes close to such a sequence of 1s and 0s. Osmylation is the addition of osmium tetroxide bipyridine across the C5-C6 double bond of the pyrimidines. Osmylation adds almost 400% mass to the reactive base and creates a sterically and electronically notably different molecule, labeled 1, compared to the unreactive purines, labeled 0. If osmylated DNA were successfully sequenced, the result would be a sequence of osmylated pyrimidines (1) and purines (0), and not of the actual nucleobases. To solve this problem we studied the osmylation reaction with short oligos and with M13mp18, a long ssDNA, developed a UV-vis assay to measure extent of osmylation, and designed two protocols. Protocol A uses mild conditions and yields osmylated thymidines (1), while leaving the other three bases (0) practically intact. Protocol B uses harsher conditions and effectively osmylates both pyrimidines, but not the purines. Applying these two protocols also to the complement of the target polynucleotide yields a total of four osmylated strands that collectively could define the actual base sequence of the target DNA.
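
    The way four binary readouts could jointly define the base sequence can be illustrated with a short sketch. It assumes idealized, error-free labeling (Protocol A marks only T, Protocol B marks C and T) and, for simplicity, that the complement-strand readouts have already been mapped back onto target coordinates. The function names are hypothetical and not from the abstract.

```python
# Idealized model of the two osmylation protocols described in the abstract.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def protocol_a(strand: str) -> str:
    """Mild conditions: only thymine is labeled 1."""
    return "".join("1" if base == "T" else "0" for base in strand)

def protocol_b(strand: str) -> str:
    """Harsher conditions: both pyrimidines (C, T) are labeled 1."""
    return "".join("1" if base in "CT" else "0" for base in strand)

def complement(strand: str) -> str:
    """Position-wise complement, kept in target coordinates (assumption)."""
    return "".join(COMPLEMENT[base] for base in strand)

# Each target base produces a unique 4-bit signature:
# (target A, target B, complement A, complement B)
SIGNATURE_TO_BASE = {
    "1100": "T",  # T is osmylated under both protocols on the target strand
    "0100": "C",  # C reacts only under protocol B on the target strand
    "0011": "A",  # complement is T, which reacts under both protocols
    "0001": "G",  # complement is C, which reacts only under protocol B
}

def decode(target_a: str, target_b: str, comp_a: str, comp_b: str) -> str:
    """Recover the base sequence; inconsistent signatures become 'N'."""
    columns = zip(target_a, target_b, comp_a, comp_b)
    return "".join(SIGNATURE_TO_BASE.get("".join(col), "N") for col in columns)
```

    With `target = "GATTACA"`, passing `protocol_a(target)`, `protocol_b(target)`, `protocol_a(complement(target))`, and `protocol_b(complement(target))` to `decode` returns the original sequence. Under this idealized model three readouts would already suffice, so the fourth acts as a consistency check.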

  14. Satellite DNA Sequences in Canidae and Their Chromosome Distribution in Dog and Red Fox.

    PubMed

    Vozdova, Miluse; Kubickova, Svatava; Cernohorska, Halina; Fröhlich, Jan; Rubes, Jiri

    2016-01-01

    Satellite DNA is a characteristic component of mammalian centromeric heterochromatin, and a comparative analysis of its evolutionary dynamics can be used for phylogenetic studies. We analysed satellite and satellite-like DNA sequences available in NCBI for 4 species of the family Canidae (red fox, Vulpes vulpes, VVU; domestic dog, Canis familiaris, CFA; arctic fox, Vulpes lagopus, VLA; raccoon dog, Nyctereutes procyonoides procyonoides, NPR) by comparative sequence analysis, which revealed 86-90% intraspecies and 76-79% interspecies similarity. Comparative fluorescence in situ hybridisation in the red fox and dog showed signals of the red fox satellite probe in canine and vulpine autosomal centromeres, on VVUY, B chromosomes, and in the distal parts of VVU9q and VVU10p which were shown to contain nucleolus organiser regions. The CFA satellite probe stained autosomal centromeres only in the dog. The CFA satellite-like DNA did not show any significant sequence similarity with the satellite DNA of any species analysed and was localised to the centromeres of 9 canine chromosome pairs. No significant heterochromatin block was detected on the B chromosomes of the red fox. Our results show extensive heterogeneity of satellite sequences among Canidae and prove close evolutionary relationships between the red and arctic fox. © 2017 S. Karger AG, Basel.

  15. High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs.

    PubMed

    Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus; Morling, Niels

    2016-01-01

    Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates automation of DNA sequencing.

  16. Cloning and expression of cDNA coding for bouganin.

    PubMed

    den Hartog, Marcel T; Lubelli, Chiara; Boon, Louis; Heerkens, Sijmie; Ortiz Buijsse, Antonio P; de Boer, Mark; Stirpe, Fiorenzo

    2002-03-01

    Bouganin is a ribosome-inactivating protein that recently was isolated from Bougainvillea spectabilis Willd. In this work, the cloning and expression of the cDNA encoding for bouganin is described. From the cDNA, the amino-acid sequence was deduced, which correlated with the primary sequence data obtained by amino-acid sequencing on the native protein. Bouganin is synthesized as a pro-peptide consisting of 305 amino acids, the first 26 of which act as a leader signal while the 29 C-terminal amino acids are cleaved during processing of the molecule. The mature protein consists of 250 amino acids. Using the cDNA sequence encoding the mature protein of 250 amino acids, a recombinant protein was expressed, purified and characterized. The recombinant molecule had similar activity in a cell-free protein synthesis assay and had comparable toxicity on living cells as compared to the isolated native bouganin.

  17. First Molecular Identification and Phylogeny of Moroccan Anopheles sergentii (Diptera: Culicidae) Based on Second Internal Transcribed Spacer (ITS2) and Cytochrome c Oxidase I (COI) Sequences.

    PubMed

    Benabdelkrim Filali, Oumama; Kabine, Mostafa; El Hamouchi, Adil; Lemrani, Meryem; Debboun, Mustapha; Sarih, M'hammed

    2018-06-05

    Anopheles sergentii, known as the "oasis vector" or the "desert malaria vector", is considered the main vector of malaria in the southern parts of Morocco. Its presence in Morocco is confirmed for the first time through sequencing of mitochondrial DNA (mtDNA) cytochrome c oxidase subunit I (COI) barcodes and nuclear ribosomal DNA (rDNA) second internal transcribed spacer (ITS2) sequences and direct comparison with specimens of A. sergentii from other countries. The DNA barcodes (n = 39) obtained from A. sergentii collected in 2015 and 2016 showed more diversity, with 10 haplotypes, compared with 3 haplotypes obtained from ITS2 sequences (n = 59). Moreover, the comparison using the ITS2 sequences showed a closer evolutionary relationship between the Moroccan and Egyptian strains than with the Iranian strain. Nevertheless, genetic differences due to geographical segregation were also observed. This study provides the first report on the sequence of rDNA-ITS2 and mtDNA COI, which could be used to better understand the biodiversity of A. sergentii.

  18. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    PubMed

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As many structures of protein-DNA complexes have become known in recent years, several computational methods have been developed to predict DNA-binding sites in proteins. However, the inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3%, and an MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5%, and an MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
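
    The three scores reported in the abstract above (accuracy, F-measure, and the Matthews correlation coefficient) are all functions of a binary confusion matrix. The sketch below uses their standard definitions; the function name is an assumption, and the counts in the usage note are invented for illustration and are not the study's data.

```python
import math

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, F-measure and MCC from confusion-matrix counts.

    tp/fn: binding nucleotides predicted as binding / non-binding
    tn/fp: non-binding nucleotides predicted as non-binding / binding
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # MCC ranges from -1 to 1; 0 corresponds to chance-level prediction.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f_measure": f_measure, "mcc": mcc}
```

    With the made-up counts `tp=50, fp=10, tn=30, fn=10`, this gives an accuracy of 0.8, an F-measure of about 0.833, and an MCC of about 0.583. MCC is informative here because it stays near 0 for a classifier that merely predicts the majority class, even when accuracy looks respectable.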

  19. Alignment of high-throughput sequencing data inside in-memory databases.

    PubMed

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer-supported DNA analysis is still an intensive, time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was also run in memory, utilizing its integrated memory engine for database table creation. We implemented stored procedures containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which means that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future.
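
    The exact search that the abstract above compares across databases can be illustrated with the backward-search idea on which Burrows-Wheeler aligners are built. This is a minimal in-memory Python sketch, not the paper's stored procedures and not BWA's implementation: the suffix array is built naively and occurrence counts are recomputed on each call, which is only reasonable for toy inputs. All function names are assumptions.

```python
def build_index(reference: str):
    """Build a suffix array, BWT string, and first-column offsets (toy scale)."""
    text = reference + "$"                       # "$" sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    # first[c] = number of characters in text that are smaller than c
    first, running = {}, 0
    for char in sorted(set(text)):
        first[char] = running
        running += text.count(char)
    return suffix_array, bwt, first

def occurrences(bwt: str, char: str, end: int) -> int:
    """Number of times char appears in bwt[:end] (recomputed, not tabulated)."""
    return bwt.count(char, 0, end)

def exact_search(read: str, index) -> list:
    """Return sorted 0-based start positions of read in the reference."""
    suffix_array, bwt, first = index
    if not read:
        return []
    low, high = 0, len(bwt)                      # half-open range of sorted suffixes
    for char in reversed(read):                  # extend the match to the left
        if char not in first:
            return []
        low = first[char] + occurrences(bwt, char, low)
        high = first[char] + occurrences(bwt, char, high)
        if low >= high:
            return []
    return sorted(suffix_array[low:high])
```

    For instance, searching for "TTA" in an index built from "GATTACATTA" returns the start positions 2 and 7. Inexact matching extends this scheme by branching over mismatches at each step, which is typically written recursively; that is consistent with the recursion restriction the abstract mentions.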

  20. Single nucleotide polymorphisms in common bean: their discovery and genotyping using a multiplex detection system

    USDA-ARS's Scientific Manuscript database

    Single-nucleotide Polymorphism (SNP) markers are by far the most common form of DNA polymorphism in a genome. The objectives of this study were to discover SNPs in common bean comparing sequences from coding and non-coding regions obtained from Genbank and genomic DNA and to compare sequencing resu...

  1. DNA Barcode Goes Two-Dimensions: DNA QR Code Web Server

    PubMed Central

    Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, “DNA barcode” actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113

  2. Mapping DNA polymerase errors by single-molecule sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, David F.; Lu, Jenny; Chang, Seungwoo

    Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. Here, this allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.

  3. Mapping DNA polymerase errors by single-molecule sequencing

    DOE PAGES

    Lee, David F.; Lu, Jenny; Chang, Seungwoo; ...

    2016-05-16

    Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. Here, this allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.
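
    The barcoding idea described above, in which reads sharing a barcode are compared so that sequencing errors can be voted out, can be sketched as a per-barcode majority consensus followed by a comparison with the template. This is a simplified illustration, not the authors' pipeline: it assumes reads within a barcode group are already aligned and of equal length, and the function names and toy reads are hypothetical.

```python
from collections import Counter, defaultdict

def consensus_by_barcode(reads):
    """reads: iterable of (barcode, sequence) pairs.

    Reads that share a barcode are assumed to be equal-length copies of the
    same replication product; a column-wise majority vote removes isolated
    sequencing errors.
    """
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    consensus = {}
    for barcode, sequences in groups.items():
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*sequences)
        )
    return consensus

def polymerase_errors(template: str, product: str) -> list:
    """Positions where a consensus product differs from the template,
    as (position, template_base, product_base) tuples."""
    return [
        (i, t, p) for i, (t, p) in enumerate(zip(template, product)) if t != p
    ]
```

    In a toy run with template "ACGT", three reads for barcode BC1 ("ACGT", "ACGT", "ACTT") collapse to "ACGT", so the lone G to T difference is treated as a sequencing error. Three identical "AGGT" reads for BC2 keep the change at position 1, which is then reported as a candidate polymerase error.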

  4. Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity.

    PubMed

    King, Brian R; Aburdene, Maurice; Thompson, Alex; Warres, Zach

    2014-01-01

    Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. More recently, these methods have been used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence; this signature is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.

  5. Nucleotide Sequence Database Comparison for Routine Dermatophyte Identification by Internal Transcribed Spacer 2 Genetic Region DNA Barcoding.

    PubMed

    Normand, A C; Packeu, A; Cassagne, C; Hendrickx, M; Ranque, S; Piarroux, R

    2018-05-01

    Conventional dermatophyte identification is based on morphological features. However, recent studies have proposed to use the nucleotide sequences of the rRNA internal transcribed spacer (ITS) region as an identification barcode of all fungi, including dermatophytes. Several nucleotide databases are available to compare sequences and thus identify isolates; however, these databases often contain mislabeled sequences that impair sequence-based identification. We evaluated five of these databases on a clinical isolate panel. We selected 292 clinical dermatophyte strains that were prospectively subjected to an ITS2 nucleotide sequence analysis. Sequences were analyzed against the databases, and the results were compared to clusters obtained via DNA alignment of sequence segments. The DNA tree served as the identification standard throughout the study. According to the ITS2 sequence identification, the majority of strains (255/292) belonged to the genus Trichophyton, mainly T. rubrum complex (n = 184), T. interdigitale (n = 40), T. tonsurans (n = 26), and T. benhamiae (n = 5). Other genera included Microsporum (e.g., M. canis [n = 21], M. audouinii [n = 10], Nannizzia gypsea [n = 3], and Epidermophyton [n = 3]). Species-level identification of T. rubrum complex isolates was an issue. Overall, ITS DNA sequencing is a reliable tool to identify dermatophyte species given that a comprehensive and correctly labeled database is consulted. Since many inaccurate identification results exist in the DNA databases used for this study, reference databases must be verified frequently and amended in line with the current revisions of fungal taxonomy. Before describing a new species or adding a new DNA reference to the available databases, its position in the phylogenetic tree must be verified. Copyright © 2018 American Society for Microbiology.

  6. Molecular Identification and Databases in Fusarium

    USDA-ARS's Scientific Manuscript database

    DNA sequence-based methods for identifying pathogenic and mycotoxigenic Fusarium isolates have become the gold standard worldwide. Moreover, fusarial DNA sequence data are increasing rapidly in several web-accessible databases for comparative purposes. Unfortunately, the use of Basic Alignment Sea...

  7. An accurate algorithm for the detection of DNA fragments from dilution pool sequencing experiments.

    PubMed

    Bansal, Vikas

    2018-01-01

    The short read lengths of current high-throughput sequencing technologies limit the ability to recover long-range haplotype information. Dilution pool methods for preparing DNA sequencing libraries from high molecular weight DNA fragments enable the recovery of long DNA fragments from short sequence reads. These approaches require computational methods for identifying the DNA fragments using aligned sequence reads and assembling the fragments into long haplotypes. Although a number of computational methods have been developed for haplotype assembly, the problem of identifying DNA fragments from dilution pool sequence data has not received much attention. We formulate the problem of detecting DNA fragments from dilution pool sequencing experiments as a genome segmentation problem and develop an algorithm that uses dynamic programming to optimize a likelihood function derived from a generative model for the sequence reads. This algorithm uses an iterative approach to automatically infer the mean background read depth and the number of fragments in each pool. Using simulated data, we demonstrate that our method, FragmentCut, has 25-30% greater sensitivity compared with an HMM-based method for fragment detection and can also detect overlapping fragments. On a whole-genome human fosmid pool dataset, the haplotypes assembled using the fragments identified by FragmentCut had greater N50 length, 16.2% lower switch error rate and 35.8% lower mismatch error rate compared with two existing methods. We further demonstrate the greater accuracy of our method using two additional dilution pool datasets. FragmentCut is available from https://bansal-lab.github.io/software/FragmentCut. Contact: vibansal@ucsd.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.

  8. Mitochondrial DNA mutations in single human blood cells.

    PubMed

    Yao, Yong-Gang; Kajigaya, Sachiko; Young, Neal S

    2015-09-01

    Determination of mitochondrial DNA (mtDNA) sequences from extremely small amounts of DNA extracted from limited tissue and/or degraded samples is frequently employed in medical, forensic, and anthropologic studies. Polymerase chain reaction (PCR) amplification followed by DNA cloning is a routine method, especially to examine heteroplasmy of mtDNA mutations. In this review, we compare the mtDNA mutation patterns detected by three different sequencing strategies. Cloning and sequencing methods that are based on PCR amplification of DNA extracted from either single cells or pooled cells yield a high frequency of mutations, partly due to the artifacts introduced by PCR and/or the DNA cloning process. Direct sequencing of PCR product that has been amplified from DNA in individual cells is able to detect the low levels of mtDNA mutations present within a cell. We further summarize the findings in our recent studies that utilized this single cell method to assay mtDNA mutation patterns in different human blood cells. Our data show that many somatic mutations observed in the end-stage differentiated cells are found in hematopoietic stem cells (HSCs) and progenitors within the CD34(+) cell compartment. Accumulation of mtDNA variations in the individual CD34(+) cells is affected by both aging and family genetic background. Granulocytes harbor higher numbers of mutations compared with the other cells, such as CD34(+) cells and lymphocytes. Serial assessment of mtDNA mutations in a population of single CD34(+) cells obtained from the same donor over time suggests stability of some somatic mutations. CD34(+) cell clones from a donor marked by specific mtDNA somatic mutations can be found in the recipient after transplantation. The significance of these findings is discussed in terms of the lineage tracing of HSCs, aging effect on accumulation of mtDNA mutations and the usage of mtDNA sequence in forensic identification. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Organization and evolution of highly repeated satellite DNA sequences in plant chromosomes.

    PubMed

    Sharma, S; Raina, S N

    2005-01-01

    A major component of the plant nuclear genome is constituted by different classes of repetitive DNA sequences. The structural, functional and evolutionary aspects of the satellite repetitive DNA families, and their organization in the chromosomes is reviewed. The tandem satellite DNA sequences exhibit characteristic chromosomal locations, usually at subtelomeric and centromeric regions. The repetitive DNA family(ies) may be widely distributed in a taxonomic family or a genus, or may be specific for a species, genome or even a chromosome. They may acquire large-scale variations in their sequence and copy number over an evolutionary time-scale. These features have formed the basis of extensive utilization of repetitive sequences for taxonomic and phylogenetic studies. Hybrid polyploids have especially proven to be excellent models for studying the evolution of repetitive DNA sequences. Recent studies explicitly show that some repetitive DNA families localized at the telomeres and centromeres have acquired important structural and functional significance. The repetitive elements are under different evolutionary constraints as compared to the genes. Satellite DNA families are thought to arise de novo as a consequence of molecular mechanisms such as unequal crossing over, rolling circle amplification, replication slippage and mutation that constitute "molecular drive". Copyright 2005 S. Karger AG, Basel.

  10. Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

    PubMed Central

    2012-01-01

    Background: The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results: We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from, other mammalian genomes. Conclusions: The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742

  11. Escaping introns in COI through cDNA barcoding of mushrooms: Pleurotus as a test case.

    PubMed

    Avin, Farhat A; Subha, Bhassu; Tan, Yee-Shin; Braukmann, Thomas W A; Vikineswary, Sabaratnam; Hebert, Paul D N

    2017-09-01

    DNA barcoding involves the use of one or more short, standardized DNA fragments for the rapid identification of species. A 648-bp segment near the 5' terminus of the mitochondrial cytochrome c oxidase subunit I (COI) gene has been adopted as the universal DNA barcode for members of the animal kingdom, but its utility in mushrooms is complicated by the frequent occurrence of large introns. As a consequence, ITS has been adopted as the standard DNA barcode marker for mushrooms despite several shortcomings. This study employed newly designed primers coupled with cDNA analysis to examine COI sequence diversity in six species of Pleurotus and compared these results with those for ITS. The ability of the COI gene to discriminate six species of Pleurotus, the commonly cultivated oyster mushroom, was examined by analysis of cDNA. The amplification success, sequence variation within and among species, and the ability to design effective primers were tested. We compared ITS sequences to their COI cDNA counterparts for all isolates. ITS discriminated between all six species, but some sequence results were uninterpretable because of length variation among ITS copies. By comparison, complete COI sequences were recovered from all but three individuals of Pleurotus giganteus, where only the 5' region was obtained. The COI sequences permitted the resolution of all species when partial data were excluded for P. giganteus. Our results suggest that COI can be a useful barcode marker for mushrooms when cDNA analysis is adopted, permitting identifications in cases where ITS cannot be recovered or where it offers higher resolution when fresh tissue is available. The suitability of this approach remains to be confirmed for other mushrooms.

  12. Pulling out the 1%: Whole-Genome Capture for the Targeted Enrichment of Ancient DNA Sequencing Libraries

    PubMed Central

    Carpenter, Meredith L.; Buenrostro, Jason D.; Valdiosera, Cristina; Schroeder, Hannes; Allentoft, Morten E.; Sikora, Martin; Rasmussen, Morten; Gravel, Simon; Guillén, Sonia; Nekhrizov, Georgi; Leshtakov, Krasimir; Dimitrova, Diana; Theodossiev, Nikola; Pettener, Davide; Luiselli, Donata; Sandoval, Karla; Moreno-Estrada, Andrés; Li, Yingrui; Wang, Jun; Gilbert, M. Thomas P.; Willerslev, Eske; Greenleaf, William J.; Bustamante, Carlos D.

    2013-01-01

    Most ancient specimens contain very low levels of endogenous DNA, precluding the shotgun sequencing of many interesting samples because of cost. Ancient DNA (aDNA) libraries often contain <1% endogenous DNA, with the majority of sequencing capacity taken up by environmental DNA. Here we present a capture-based method for enriching the endogenous component of aDNA sequencing libraries. By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold. Furthermore, we maintained coverage of the majority of regions sequenced in the precapture library. Intersection with the 1000 Genomes Project reference panel yielded an average of 50,723 SNPs (range 3,062–147,243) for the postcapture libraries sequenced with 1 million reads, compared with 13,280 SNPs (range 217–73,266) for the precapture libraries, increasing resolution in population genetic analyses. Our whole-genome capture approach makes it less costly to sequence aDNA from specimens containing very low levels of endogenous DNA, enabling the analysis of larger numbers of samples. PMID:24568772

  13. A simple algorithm for quantifying DNA methylation levels on multiple independent CpG sites in bisulfite genomic sequencing electropherograms.

    PubMed

    Leakey, Tatiana I; Zielinski, Jerzy; Siegfried, Rachel N; Siegel, Eric R; Fan, Chun-Yang; Cooney, Craig A

    2008-06-01

    DNA methylation at cytosines is a widely studied epigenetic modification. Methylation is commonly detected using bisulfite modification of DNA followed by PCR and additional techniques such as restriction digestion or sequencing. These additional techniques are either laborious, require specialized equipment, or are not quantitative. Here we describe a simple algorithm that yields quantitative results from analysis of conventional four-dye-trace sequencing. We call this method Mquant and we compare it with the established laboratory method of combined bisulfite restriction assay (COBRA). This analysis of sequencing electropherograms provides a simple, easily applied method to quantify DNA methylation at specific CpG sites.

  14. Detection of Bacterial Pathogens from Broncho-Alveolar Lavage by Next-Generation Sequencing.

    PubMed

    Leo, Stefano; Gaïa, Nadia; Ruppé, Etienne; Emonet, Stephane; Girard, Myriam; Lazarevic, Vladimir; Schrenzel, Jacques

    2017-09-20

    The applications of whole-metagenome shotgun sequencing (WMGS) in routine clinical analysis are still limited. A combination of a DNA extraction procedure, sequencing, and bioinformatics tools is essential for the removal of human DNA and for improving bacterial species identification in a timely manner. We tackled these issues with a broncho-alveolar lavage (BAL) sample from an immunocompromised patient who had developed severe chronic pneumonia. We extracted DNA from the BAL sample with protocols based either on sequential lysis of human and bacterial cells or on the mechanical disruption of all cells. Metagenomic libraries were sequenced on Illumina HiSeq platforms. Microbial community composition was determined by k-mer analysis or by mapping to taxonomic markers. Results were compared to those obtained by conventional clinical culture and molecular methods. Compared to mechanical cell disruption, a sequential lysis protocol resulted in a significantly increased proportion of bacterial DNA over human DNA and higher sequence coverage of Mycobacterium abscessus, Corynebacterium jeikeium and Rothia dentocariosa, the bacteria reported by clinical microbiology tests. In addition, we identified anaerobic bacteria not searched for by the clinical laboratory. Our results further support the implementation of WMGS in clinical routine diagnosis for bacterial identification.

  15. Rényi continuous entropy of DNA sequences.

    PubMed

    Vinga, Susana; Almeida, Jonas S

    2004-12-07

    Entropy measures of DNA sequences estimate their randomness or, inversely, their repeatability. L-block Shannon discrete entropy accounts for the empirical distribution of all length-L words and has convergence problems for finite sequences. A new entropy measure that extends Shannon's formalism is proposed. Rényi's quadratic entropy, calculated with the Parzen window density estimation method applied to CGR/USM continuous maps of DNA sequences, constitutes a novel technique to evaluate global sequence randomness without some of the drawbacks of the former method. The asymptotic behaviour of this new measure was analytically deduced, and entropies were calculated for several synthetic and experimental biological sequences. The results obtained were compared with the distributions of the null model of randomness obtained by simulation. The biological sequences have shown a different p-value according to the kernel resolution of Parzen's method, which might indicate an unknown level of organization of their patterns. This new technique can be very useful in the study of DNA sequence complexity and provides additional tools for DNA entropy estimation. The main MATLAB applications developed and additional material are available at the webpage . Specialized functions can be obtained from the authors.
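
    As a rough illustration of the quantity described above, the following Python sketch (not the authors' MATLAB code) maps a sequence to chaos game representation (CGR) points and evaluates Rényi's quadratic entropy with a Gaussian Parzen window. The corner assignment for A, C, G, T, the starting point (0.5, 0.5), the default kernel width `sigma`, and the function names are assumptions made for this example; the estimator uses the standard identity that the integral of a squared Gaussian-kernel density estimate equals the mean of pairwise Gaussians with doubled variance.

```python
import math

# Assumed corner convention for the unit square; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game representation: move halfway toward each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]  # raises KeyError on non-ACGT symbols
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def renyi_quadratic_entropy(points, sigma=0.1):
    """H2 = -ln( (1/N^2) * sum_ij G(x_i - x_j; 2*sigma^2) ) in two dimensions."""
    n = len(points)
    var = 2.0 * sigma ** 2                # variance of the convolved kernel
    norm = 1.0 / (2.0 * math.pi * var)    # 2-D isotropic Gaussian normaliser
    total = 0.0
    for xi, yi in points:
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += norm * math.exp(-d2 / (2.0 * var))
    return -math.log(total / (n * n))


if __name__ == "__main__":
    repetitive = "A" * 40
    mixed = "ACGTTGCATCAGGACTCGATGCTAGTCAGATCGTACGGTA"
    print(renyi_quadratic_entropy(cgr_points(repetitive)))
    print(renyi_quadratic_entropy(cgr_points(mixed)))
```

    A homopolymer collapses toward one corner of the CGR square and therefore yields a lower entropy than a mixed sequence whose points spread across the square; varying `sigma` corresponds to the kernel resolution mentioned in the abstract.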

  16. Application and comparison of large-scale solution-based DNA capture-enrichment methods on ancient DNA

    PubMed Central

    Ávila-Arcos, María C.; Cappellini, Enrico; Romero-Navarro, J. Alberto; Wales, Nathan; Moreno-Mayar, J. Víctor; Rasmussen, Morten; Fordyce, Sarah L.; Montiel, Rafael; Vielle-Calzada, Jean-Philippe; Willerslev, Eske; Gilbert, M. Thomas P.

    2011-01-01

    The development of second-generation sequencing technologies has greatly benefitted the field of ancient DNA (aDNA). Its application can be further exploited by the use of targeted capture-enrichment methods to overcome restrictions posed by low endogenous and contaminating DNA in ancient samples. We tested the performance of Agilent's SureSelect and Mycroarray's MySelect in-solution capture systems on Illumina sequencing libraries built from ancient maize to identify key factors influencing aDNA capture experiments. High levels of clonality as well as the presence of multiple-copy sequences in the capture targets led to biases in the data regardless of the capture method. Neither method consistently outperformed the other in terms of average target enrichment, and no obvious difference was observed either when two tiling designs were compared. In addition to demonstrating the plausibility of capturing aDNA from ancient plant material, our results also enable us to provide useful recommendations for those planning targeted-sequencing on aDNA. PMID:22355593

  17. DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats.

    PubMed

    de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

    2015-11-16

    Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Identification of Forensic Samples via Mitochondrial DNA in the Undergraduate Biochemistry Laboratory

    NASA Astrophysics Data System (ADS)

    Millard, Julie T.; Pilon, André M.

    2003-04-01

    A recent forensic approach for identification of unknown biological samples is mitochondrial DNA (mtDNA) sequencing. We describe a laboratory exercise suitable for an undergraduate biochemistry course in which the polymerase chain reaction is used to amplify a 440 base pair hypervariable region of human mtDNA from a variety of "crime scene" samples (e.g., teeth, hair, nails, cigarettes, envelope flaps, toothbrushes, and chewing gum). Amplification is verified via agarose gel electrophoresis and then samples are subjected to cycle sequencing. Sequence alignments are made via the program CLUSTAL W, allowing students to compare samples and solve the "crime."

  19. Whole-comparative genomic hybridization in domestic sheep (Ovis aries) breeds.

    PubMed

    Dávila-Rodríguez, M I; Cortés-Gutiérrez, E I; López-Fernández, C; Pita, M; Mezzanotte, R; Gosálvez, J

    2009-01-01

    Whole-comparative genomic hybridization (W-CGH) allows identification of chromosomal polymorphisms related to highly repetitive DNA sequences localized in constitutive heterochromatin. Such polymorphisms are detected establishing competition between genomic DNAs in an in situ hybridization environment without subtraction of highly repetitive DNA sequences, when comparing two species from closely related taxa (same species, sub-species, or breeds) or somewhat related taxa. This experimental approach was applied to investigating differences in highly repetitive sequences of three sheep breeds (Castellana, Ojalada, and Assaf). To this end, W-CGH was carried out using mouflon (sheep ancestor) chromosomes as a common target to co-hybridize equimolar quantities of two genomic DNAs obtained from either Castellana, Ojalada or Assaf sheep breeds. The results showed that the amount of constitutive heterochromatin is greater in all pericentromeric heterochromatin regions of acrocentric chromosomes than in metacentric or sex chromosomes. Additionally, when W-CGH was performed using DNAs from the Iberian breeds Castellana and Ojalada, chromosomal pericentromeric regions revealed quantitatively and qualitatively a presence of DNA families similar to that obtained from any of the above-cited breeds. On the contrary, when the DNA used in W-CGH experiments was obtained from Assaf, as compared to either Castellana or Ojalada, two different pericentromeric DNA families of highly repetitive sequences could be detected. Lastly, sex chromosomes were shown to be homogeneous among all breeds and thus revealed no detectable constitutive heterochromatin. W-CGH results were confirmed using DNA breakage detection-FISH experiments (DBD-FISH) carried out on lymphocytes. As a whole, the results showed that two different repetitive DNA families are present in the pericentromeric heterochromatin of the sheep breeds studied here. Additionally, they suggest a differential presence of these distinct repetitive DNA families in Castellana and Ojalada breeds as compared to the Assaf breed. Finally, the results of W-CGH after using mouflon as the targeted chromosomes also show that the two DNA families are present in the ancestor. Copyright 2009 S. Karger AG, Basel.

  20. Toward a Better Compression for DNA Sequences Using Huffman Encoding

    PubMed Central

    Almarri, Badar; Al Yami, Sultan; Huang, Chun-Hsi

    2017-01-01

    Due to the significant amount of DNA data being generated by next-generation sequencing machines for genomes of lengths ranging from megabases to gigabases, there is an increasing need to compress such data so that they occupy less space and can be transmitted faster. Different implementations of Huffman encoding incorporating the characteristics of DNA sequences prove to better compress DNA data. These implementations center on the concepts of selecting frequent repeats so as to force a skewed Huffman tree, as well as the construction of multiple Huffman trees when encoding. The implementations demonstrate improvements on the compression ratios for five genomes with lengths ranging from 5 to 50 Mbp, compared with the standard Huffman tree algorithm. The research hence suggests an improvement on all such DNA sequence compression algorithms that use the conventional Huffman encoding. Accompanying software is publicly available (Al-Okaily, 2016). PMID:27960065
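
    For orientation, the baseline that this abstract compares against, the standard Huffman tree, can be sketched in a few lines of Python. This is the textbook algorithm applied to single nucleotides, not the repeat-selecting or multi-tree variants the paper proposes; the function names and the toy input are assumptions for illustration only.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text):
    """Build a prefix code from symbol frequencies (standard Huffman)."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(n, next(tiebreak), sym) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):   # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                         # leaf: a symbol
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    return "".join(codes[s] for s in text)


def decode(bits, codes):
    inverse = {c: s for s, c in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:            # prefix-free, so the first match is right
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


if __name__ == "__main__":
    dna = "A" * 8 + "C" * 4 + "G" * 2 + "T" * 2
    codes = huffman_code(dna)
    bits = encode(dna, codes)
    print(codes, len(bits), "bits versus", 2 * len(dna), "with a fixed 2-bit code")
```

    With a skewed base composition the most frequent symbol receives the shortest code, which is the effect the paper tries to strengthen by choosing frequent repeats as symbols; on near-uniform single-base frequencies a plain Huffman code gains little over two bits per base.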

  1. Toward a Better Compression for DNA Sequences Using Huffman Encoding.

    PubMed

    Al-Okaily, Anas; Almarri, Badar; Al Yami, Sultan; Huang, Chun-Hsi

    2017-04-01

    Due to the significant amount of DNA data being generated by next-generation sequencing machines for genomes of lengths ranging from megabases to gigabases, there is an increasing need to compress such data so that they occupy less space and can be transmitted faster. Different implementations of Huffman encoding incorporating the characteristics of DNA sequences prove to better compress DNA data. These implementations center on the concepts of selecting frequent repeats so as to force a skewed Huffman tree, as well as the construction of multiple Huffman trees when encoding. The implementations demonstrate improvements on the compression ratios for five genomes with lengths ranging from 5 to 50 Mbp, compared with the standard Huffman tree algorithm. The research hence suggests an improvement on all such DNA sequence compression algorithms that use the conventional Huffman encoding. Accompanying software is publicly available (Al-Okaily, 2016).

  2. Mesoscopic modeling of DNA denaturation rates: Sequence dependence and experimental comparison

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dahlen, Oda, E-mail: oda.dahlen@ntnu.no; Erp, Titus S. van, E-mail: titus.van.erp@ntnu.no

    Using rare event simulation techniques, we calculated DNA denaturation rate constants for a range of sequences and temperatures for the Peyrard-Bishop-Dauxois (PBD) model with two different parameter sets. We studied a larger variety of sequences compared to previous studies that only consider DNA homopolymers and DNA sequences containing an equal amount of weak AT- and strong GC-base pairs. Our results show that, contrary to previous findings, an even distribution of the strong GC-base pairs does not always result in the fastest possible denaturation. In addition, we applied an adaptation of the PBD model to study hairpin denaturation for which experimental data are available. This is the first quantitative study in which dynamical results from the mesoscopic PBD model have been compared with experiments. Our results show that present parameterized models, although giving good results regarding thermodynamic properties, overestimate denaturation rates by orders of magnitude. We believe that our dynamical approach is, therefore, an important tool for verifying DNA models and for developing next generation models that have higher predictive power than present ones.

  3. Plant DNA sequences from feces: potential means for assessing diets of wild primates.

    PubMed

    Bradley, Brenda J; Stiller, Mathias; Doran-Sheehy, Diane M; Harris, Tara; Chapman, Colin A; Vigilant, Linda; Poinar, Hendrik

    2007-06-01

    Analysis of plant DNA in feces provides a promising, yet largely unexplored, means of documenting the diets of elusive primates. Here we demonstrate the promise and pitfalls of this approach using DNA extracted from fecal samples of wild western gorillas (Gorilla gorilla) and black and white colobus monkeys (Colobus guereza). From these DNA extracts we amplified, cloned, and sequenced small segments of chloroplast DNA (part of the rbcL gene) and plant nuclear DNA (ITS-2). The obtained sequences were compared to sequences generated from known plant samples and to those in GenBank to identify plant taxa in the feces. With further optimization, this method could provide a basic evaluation of minimum primate dietary diversity even when knowledge of local flora is limited. This approach may find application in studies characterizing the diets of poorly-known, unhabituated primate species or assaying consumer-resource relationships in an ecosystem. (c) 2007 Wiley-Liss, Inc.

  4. Conserved Sequences at the Origin of Adenovirus DNA Replication

    PubMed Central

    Stillman, Bruce W.; Topp, William C.; Engler, Jeffrey A.

    1982-01-01

    The origin of adenovirus DNA replication lies within an inverted sequence repetition at either end of the linear, double-stranded viral DNA. Initiation of DNA replication is primed by a deoxynucleoside that is covalently linked to a protein, which remains bound to the newly synthesized DNA. We demonstrate that virion-derived DNA-protein complexes from five human adenovirus serological subgroups (A to E) can act as a template for both the initiation and the elongation of DNA replication in vitro, using nuclear extracts from adenovirus type 2 (Ad2)-infected HeLa cells. The heterologous template DNA-protein complexes were not as active as the homologous Ad2 DNA, most probably due to inefficient initiation by Ad2 replication factors. In an attempt to identify common features which may permit this replication, we have also sequenced the inverted terminal repeated DNA from human adenovirus serotypes Ad4 (group E), Ad9 and Ad10 (group D), and Ad31 (group A), and we have compared these to previously determined sequences from Ad2 and Ad5 (group C), Ad7 (group B), and Ad12 and Ad18 (group A) DNA. In all cases, the sequence around the origin of DNA replication can be divided into two structural domains: a proximal A · T-rich region which is partially conserved among these serotypes, and a distal G · C-rich region which is less well conserved. The G · C-rich region contains sequences similar to sequences present in papovavirus replication origins. The two domains may reflect a dual mechanism for initiation of DNA replication: adenovirus-specific protein priming of replication, and subsequent utilization of this primer by host replication factors for completion of DNA synthesis. PMID:7143575

  5. Non-B-DNA structures on the interferon-beta promoter?

    PubMed

    Robbe, K; Bonnefoy, E

    1998-01-01

    The high mobility group (HMG) I protein intervenes as an essential factor during the virus-induced expression of the interferon-beta (IFN-beta) gene. It is a non-histone chromatin-associated protein that has the dual capacity of binding to a non-B-DNA structure such as cruciform-DNA as well as to AT-rich B-DNA sequences. In this work we compare the binding affinity of HMGI for a synthetic cruciform-DNA to its binding affinity for the HMGI-binding-site present in the positive regulatory domain II (PRDII) of the IFN-beta promoter. Using gel retardation experiments, we show that HMGI protein binds with at least ten times more affinity to the synthetic cruciform-DNA structure than to the PRDII B-DNA sequence. DNA hairpin sequences are present in both the human and the murine PRDII-DNAs. We discuss in this work the presence of as yet putative non-B-DNA structures in the IFN-beta promoter.

  6. Three 3D graphical representations of DNA primary sequences based on the classifications of DNA bases and their applications.

    PubMed

    Xie, Guosen; Mo, Zhongxi

    2011-01-21

    In this article, we introduce three 3D graphical representations of DNA primary sequences, which we call RY-curve, MK-curve and SW-curve, based on three classifications of the DNA bases. The advantages of our representations are that (i) these 3D curves are strictly non-degenerate and there is no loss of information when transferring a DNA sequence to its mathematical representation and (ii) the coordinates of every node on these 3D curves have clear biological implication. Two applications of these 3D curves are presented: (a) a simple formula is derived to calculate the content of the four bases (A, G, C and T) from the coordinates of nodes on the curves; and (b) a 12-component characteristic vector is constructed to compare similarity among DNA sequences from different species based on the geometrical centers of the 3D curves. As examples, we examine similarity among the coding sequences of the first exon of beta-globin gene from eleven species and validate similarity of cDNA sequences of beta-globin gene from eight species. Copyright © 2010 Elsevier Ltd. All rights reserved.

  7. Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences.

    PubMed

    Bergman, C M; Kreitman, M

    2001-08-01

    Comparative genomic approaches to gene and cis-regulatory prediction are based on the principle that differential DNA sequence conservation reflects variation in functional constraint. Using this principle, we analyze noncoding sequence conservation in Drosophila for 40 loci with known or suspected cis-regulatory function encompassing >100 kb of DNA. We estimate the fraction of noncoding DNA conserved in both intergenic and intronic regions and describe the length distribution of ungapped conserved noncoding blocks. On average, 22%-26% of noncoding sequences surveyed are conserved in Drosophila, with median block length approximately 19 bp. We show that point substitution in conserved noncoding blocks exhibits transition bias as well as lineage effects in base composition, and occurs more than an order of magnitude more frequently than insertion/deletion (indel) substitution. Overall, patterns of noncoding DNA structure and evolution differ remarkably little between intergenic and intronic conserved blocks, suggesting that the effects of transcription per se contribute minimally to the constraints operating on these sequences. The results of this study have implications for the development of alignment and prediction algorithms specific to noncoding DNA, as well as for models of cis-regulatory DNA sequence evolution.

  8. Basic quantitative polymerase chain reaction using real-time fluorescence measurements.

    PubMed

    Ares, Manuel

    2014-10-01

    This protocol uses quantitative polymerase chain reaction (qPCR) to measure the number of DNA molecules containing a specific contiguous sequence in a sample of interest (e.g., genomic DNA or cDNA generated by reverse transcription). The sample is subjected to fluorescence-based PCR amplification and, theoretically, during each cycle, two new duplex DNA molecules are produced for each duplex DNA molecule present in the sample. The progress of the reaction during PCR is evaluated by measuring the fluorescence of dsDNA-dye complexes in real time. In the early cycles, DNA duplication is not detected because inadequate amounts of DNA are made. At a certain threshold cycle, DNA-dye complexes double each cycle for 8-10 cycles, until the DNA concentration becomes so high and the primer concentration so low that the reassociation of the product strands blocks efficient synthesis of new DNA and the reaction plateaus. There are two types of measurements: (1) the relative change of the target sequence compared to a reference sequence and (2) the determination of molecule number in the starting sample. The first requires a reference sequence, and the second requires a sample of the target sequence with known numbers of the molecules of sequence to generate a standard curve. By identifying the threshold cycle at which a sample first begins to accumulate DNA-dye complexes exponentially, an estimation of the numbers of starting molecules in the sample can be extrapolated. © 2014 Cold Spring Harbor Laboratory Press.
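    The two measurement types described above can be illustrated with a short calculation. The sketch below is not taken from the protocol: it assumes the standard curve is a straight-line fit of threshold cycle (Ct) against log10 of known copy number, and it uses the widely used 2^-ddCt calculation for relative change, which assumes perfect doubling per cycle. All function names and the numbers in the usage lines are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate starting molecule number."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the curve slope; 1.0 means exact doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def relative_change(ct_target_sample, ct_ref_sample,
                    ct_target_control, ct_ref_control):
    """Fold change of target vs. reference (2^-ddCt, assumes perfect doubling)."""
    ddct = ((ct_target_sample - ct_ref_sample)
            - (ct_target_control - ct_ref_control))
    return 2.0 ** (-ddct)


# Hypothetical dilution series of a standard with known copy numbers.
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
measured_ct = [31.4, 28.1, 24.7, 21.4, 18.1]
slope, intercept = fit_standard_curve(standards, measured_ct)
estimate = copies_from_ct(26.0, slope, intercept)  # unknown sample at Ct 26.0
```

    A slope near -3.32 corresponds to an efficiency near 1.0; a fitted slope that deviates strongly from this suggests the doubling assumption behind the relative calculation does not hold for that assay.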

  9. Cloning and sequencing of a laccase gene from the lignin-degrading basidiomycete Pleurotus ostreatus.

    PubMed Central

    Giardina, P; Cannio, R; Martirani, L; Marzullo, L; Palmieri, G; Sannia, G

    1995-01-01

    The gene (pox1) encoding a phenol oxidase from Pleurotus ostreatus, a lignin-degrading basidiomycete, was cloned and sequenced, and the corresponding pox1 cDNA was also synthesized and sequenced. The isolated gene consists of 2,592 bp, with the coding sequence being interrupted by 19 introns and flanked by an upstream region in which putative CAAT and TATA consensus sequences could be identified at positions -174 and -84, respectively. The isolation of a second cDNA (pox2 cDNA), showing 84% similarity, and of the corresponding truncated genomic clones demonstrated the existence of a multigene family coding for isoforms of laccase in P. ostreatus. PCR amplifications of specific regions on the DNA of isolated monokaryons proved that the two genes are not allelic forms. The POX1 amino acid sequence deduced was compared with those of other known laccases from different fungi. PMID:7793961

  10. Comparative molecular cytogenetic analyses of a major tandemly repeated DNA family and retrotransposon sequences in cultivated jute Corchorus species (Malvaceae).

    PubMed

    Begum, Rabeya; Zakrzewski, Falk; Menzel, Gerhard; Weber, Beatrice; Alam, Sheikh Shamimul; Schmidt, Thomas

    2013-07-01

    The cultivated jute species Corchorus olitorius and Corchorus capsularis are important fibre crops. The analysis of repetitive DNA sequences, comprising a major part of plant genomes, has not been carried out in jute but is useful to investigate the long-range organization of chromosomes. The aim of this study was the identification of repetitive DNA sequences to facilitate comparative molecular and cytogenetic studies of two jute cultivars and to develop a fluorescent in situ hybridization (FISH) karyotype for chromosome identification. A plasmid library was generated from C. olitorius and C. capsularis with genomic restriction fragments of 100-500 bp, which was complemented by targeted cloning of satellite DNA by PCR. The diversity of the repetitive DNA families was analysed comparatively. The genomic abundance and chromosomal localization of different repeat classes were investigated by Southern analysis and FISH, respectively. The cytosine methylation of satellite arrays was studied by immunolabelling. Major satellite repeats and retrotransposons have been identified from C. olitorius and C. capsularis. The satellite family CoSat I forms two undermethylated species-specific subfamilies, while the long terminal repeat (LTR) retrotransposons CoRetro I and CoRetro II show similarity to the Metaviridae of plant retroelements. FISH karyotypes were developed by multicolour FISH using these repetitive DNA sequences in combination with 5S and 18S-5·8S-25S rRNA genes which enable the unequivocal chromosome discrimination in both jute species. The analysis of the structure and diversity of the repeated DNA is crucial for genome sequence annotation. The reference karyotypes will be useful for breeding of jute and provide the basis for karyotyping homeologous chromosomes of wild jute species to reveal the genetic and evolutionary relationship between cultivated and wild Corchorus species.

  11. Comparative genomics and repetitive sequence divergence in the species of diploid Nicotiana section Alatae.

    PubMed

    Lim, K Yoong; Kovarik, Ales; Matyasek, Roman; Chase, Mark W; Knapp, Sandra; McCarthy, Elizabeth; Clarkson, James J; Leitch, Andrew R

    2006-12-01

    Combining phylogenetic reconstructions of species relationships with comparative genomic approaches is a powerful way to decipher evolutionary events associated with genome divergence. Here, we reconstruct the history of karyotype and tandem repeat evolution in species of diploid Nicotiana section Alatae. By analysis of plastid DNA, we resolved two clades with high bootstrap support, one containing N. alata, N. langsdorffii, N. forgetiana and N. bonariensis (called the n = 9 group) and another containing N. plumbaginifolia and N. longiflora (called the n = 10 group). Despite little plastid DNA sequence divergence, we observed, via fluorescent in situ hybridization, substantial chromosomal repatterning, including altered chromosome numbers, structure and distribution of repeats. Effort was focussed on 35S and 5S nuclear ribosomal DNA (rDNA) and the HRS60 satellite family of tandem repeats comprising the elements HRS60, NP3R and NP4R. We compared divergence of these repeats in diploids and polyploids of Nicotiana. There are dramatic shifts in the distribution of the satellite repeats and complete replacement of intergenic spacers (IGSs) of 35S rDNA associated with divergence of the species in section Alatae. We suggest that sequence homogenization has replaced HRS60 family repeats at sub-telomeric regions, but that this process may not occur, or occurs more slowly, when the repeats are found at intercalary locations. Sequence homogenization acts more rapidly (at least two orders of magnitude) on 35S rDNA than 5S rDNA and sub-telomeric satellite sequences. This rapid rate of divergence is analogous to that found in polyploid species, and is therefore, in plants, not only associated with polyploidy.

  12. USE OF COMPETITIVE DNA HYBRIDIZATION TO IDENTIFY DIFFERENCES IN THE GENOMES OF TWO CLOSELY RELATED FECAL INDICATOR BACTERIA

    EPA Science Inventory

    Although recent technological advances in DNA sequencing and computational biology now allow scientists to compare entire microbial genomes, comparisons of closely related bacterial species and individual isolates by whole-genome sequencing approaches remains prohibitively expens...

  13. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    PubMed

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  14. [Structural organization of 5S ribosomal DNA of Rosa rugosa].

    PubMed

    Tynkevych, Iu O; Volkov, R A

    2014-01-01

    In order to clarify molecular organization of the genomic region encoding 5S rRNA in diploid species Rosa rugosa several 5S rDNA repeated units were cloned and sequenced. Analysis of the obtained sequences revealed that only one length variant of 5S rDNA repeated units, which contains intact promoter elements in the intergenic spacer region (IGS) and appears to be transcriptionally active is present in the genome. Additionally, a limited number of 5S rDNA pseudogenes lacking a portion of coding sequence and the complete IGS was detected. A high level of sequence similarity (from 93.7 to 97.5%) between the IGS of major 5S rDNA variants of East Asian R. rugosa and North American R. nitida was found indicating comparatively recent divergence of these species.

  15. Plasmodium falciparum Nucleosomes Exhibit Reduced Stability and Lost Sequence Dependent Nucleosome Positioning

    PubMed Central

    Silberhorn, Elisabeth; Schwartz, Uwe; Symelka, Anne; de Koning-Ward, Tania; Längst, Gernot

    2016-01-01

    The packaging and organization of genomic DNA into chromatin represents an additional regulatory layer of gene expression, with specific nucleosome positions that restrict the accessibility of regulatory DNA elements. The mechanisms that position nucleosomes in vivo are thought to depend on the biophysical properties of the histones, sequence patterns, like phased di-nucleotide repeats and the architecture of the histone octamer that folds DNA in 1.65 tight turns. Comparative studies of human and P. falciparum histones reveal that the latter have a strongly reduced ability to recognize internal sequence dependent nucleosome positioning signals. In contrast, the nucleosomes are positioned by AT-repeat sequences flanking nucleosomes in vivo and in vitro. Further, the strong sequence variations in the plasmodium histones, compared to other mammalian histones, do not present adaptations to its AT-rich genome. Human and parasite histones bind with higher affinity to GC-rich DNA and with lower affinity to AT-rich DNA. However, the plasmodium nucleosomes are overall less stable, with increased temperature-induced mobility, decreased salt stability of the histones H2A and H2B and considerably reduced binding affinity to GC-rich DNA, as compared with the human nucleosomes. In addition, we show that plasmodium histone octamers form the shortest known nucleosome repeat length (155bp) in vitro and in vivo. Our data suggest that the biochemical properties of the parasite histones are distinct from the typical characteristics of other eukaryotic histones and these properties reflect the increased accessibility of the P. falciparum genome. PMID:28033404

  16. Identification of Bacterial Species in Kuwaiti Waters Through DNA Sequencing

    NASA Astrophysics Data System (ADS)

    Chen, K.

    2017-01-01

    To characterize the bacterial diversity associated with the ecosystems of various Kuwaiti seas, bacteria were cultured and isolated from 3 water samples. Because of difficulties in culturing and isolating fecal coliforms on the selective agar plates, bacterial isolates from marine agar plates were selected for molecular identification. 16S rRNA genes were successfully amplified from the genomes of the selected isolates using universal eubacterial 16S rRNA primers. The resulting amplification products were subjected to automated DNA sequencing. The partial 16S rDNA sequences obtained were compared directly with sequences in the NCBI database using BLAST, as well as with the sequences available from the Ribosomal Database Project (RDP).

  17. High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA.

    PubMed

    Chandrananda, Dineika; Thorne, Natalie P; Bahlo, Melanie

    2015-06-17

    High-throughput sequencing of cell-free DNA fragments found in human plasma has been used to non-invasively detect fetal aneuploidy, monitor organ transplants and investigate tumor DNA. However, many biological properties of this extracellular genetic material remain unknown. Research that further characterizes circulating DNA could substantially increase its diagnostic value by allowing the application of more sophisticated bioinformatics tools that lead to an improved signal to noise ratio in the sequencing data. In this study, we investigate various features of cell-free DNA in plasma using deep-sequencing data from two pregnant women (>70X, >50X) and compare them with matched cellular DNA. We utilize a descriptive approach to examine how the biological cleavage of cell-free DNA affects different sequence signatures such as fragment lengths, sequence motifs at fragment ends and the distribution of cleavage sites along the genome. We show that the size distributions of these cell-free DNA molecules are dependent on their autosomal and mitochondrial origin as well as the genomic location within chromosomes. DNA mapping to particular microsatellites and alpha repeat elements displays unique size signatures. We show how cell-free fragments occur in clusters along the genome, localizing to nucleosomal arrays and are preferentially cleaved at linker regions by correlating the mapping locations of these fragments with ENCODE annotation of chromatin organization. Our work further demonstrates that cell-free autosomal DNA cleavage is sequence dependent. The region spanning up to 10 positions on either side of the DNA cleavage site shows a consistent pattern of preference for specific nucleotides. This sequence motif is present in cleavage sites localized to nucleosomal cores and linker regions but is absent in nucleosome-free mitochondrial DNA. These background signals in cell-free DNA sequencing data stem from the non-random biological cleavage of these fragments. This sequence structure can be harnessed to improve bioinformatics algorithms, in particular for CNV and structural variant detection. Descriptive measures for cell-free DNA features developed here could also be used in biomarker analysis to monitor the changes that occur during different pathological conditions.

  18. Signatures of DNA Methylation across Insects Suggest Reduced DNA Methylation Levels in Holometabola

    PubMed Central

    Provataris, Panagiotis; Meusemann, Karen; Niehuis, Oliver; Grath, Sonja; Misof, Bernhard

    2018-01-01

    It has been experimentally shown that DNA methylation is involved in the regulation of gene expression and the silencing of transposable element activity in eukaryotes. The variable levels of DNA methylation among different insect species indicate an evolutionarily flexible role of DNA methylation in insects, which due to a lack of comparative data is not yet well-substantiated. Here, we use computational methods to trace signatures of DNA methylation across insects by analyzing transcriptomic and genomic sequence data from all currently recognized insect orders. We conclude that: 1) a functional methylation system relying exclusively on DNA methyltransferase 1 is widespread across insects; 2) DNA methylation has potentially been lost or extremely reduced in species belonging to springtails (Collembola), flies and relatives (Diptera), and twisted-winged parasites (Strepsiptera); 3) holometabolous insects display signs of reduced DNA methylation levels in protein-coding sequences compared with hemimetabolous insects; and 4) evolutionarily conserved insect genes associated with housekeeping functions tend to display signs of heavier DNA methylation in comparison to the genomic/transcriptomic background. With this comparative study, we provide the much needed basis for experimental and detailed comparative analyses required to gain a deeper understanding of the evolution and function of DNA methylation in insects. PMID:29697817

  19. Brief Guide to Genomics: DNA, Genes and Genomes

    MedlinePlus

    ... Sheets A Brief Guide to Genomics About NHGRI Research About the International HapMap Project Biological Pathways Chromosome Abnormalities Chromosomes Cloning Comparative Genomics DNA Microarray Technology DNA Sequencing Deoxyribonucleic Acid ( ...

  20. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    PubMed

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

    2007-02-14

    The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (mis-assignment rate <0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
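    The tag-based source assignment described in this abstract can be sketched in a few lines. This is an illustrative reconstruction rather than the authors' pipeline: it assumes every read begins with a fixed-length tag (dinucleotide tags are mentioned in the abstract) followed immediately by the primer, and it requires exact matches. The tag table, primer sequence and reads below are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_specimen, primer):
    """Assign each read to a specimen using the 5' tag that precedes the primer.

    Returns (assigned, unassigned): assigned maps specimen -> list of inserts
    with tag and primer removed; unassigned holds reads whose tag is unknown
    or whose primer does not follow the tag exactly.
    """
    if not tag_to_specimen:
        return {}, list(reads)
    tag_len = len(next(iter(tag_to_specimen)))  # assumes equal-length tags
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, rest = read[:tag_len], read[tag_len:]
        specimen = tag_to_specimen.get(tag)
        if specimen is not None and rest.startswith(primer):
            assigned[specimen].append(rest[len(primer):])
        else:
            unassigned.append(read)
    return dict(assigned), unassigned


# Hypothetical tags, primer and reads for illustration only.
tags = {"AC": "specimen_1", "GT": "specimen_2"}
primer = "GGATTC"
reads = ["ACGGATTCAAAT", "GTGGATTCCCCG", "TTGGATTCAAAA", "ACGGTTTCAAAT"]
assigned, unassigned = demultiplex(reads, tags, primer)
```

    Counting the reads per tag in `assigned` is also the natural starting point for checking the kind of tag-dependent representation bias the abstract reports.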

  1. High Resolution Size Analysis of Fetal DNA in the Urine of Pregnant Women by Paired-End Massively Parallel Sequencing

    PubMed Central

    Tsui, Nancy B. Y.; Jiang, Peiyong; Chow, Katherine C. K.; Su, Xiaoxi; Leung, Tak Y.; Sun, Hao; Chan, K. C. Allen; Chiu, Rossa W. K.; Lo, Y. M. Dennis

    2012-01-01

    Background Fetal DNA in maternal urine, if present, would be a valuable source of fetal genetic material for noninvasive prenatal diagnosis. However, the existence of fetal DNA in maternal urine has remained controversial. The issue is due to the lack of appropriate technology to robustly detect the potentially highly degraded fetal DNA in maternal urine. Methodology We have used massively parallel paired-end sequencing to investigate cell-free DNA molecules in maternal urine. Catheterized urine samples were collected from seven pregnant women during the third trimester of pregnancy. We detected fetal DNA by identifying sequenced reads that contained fetal-specific alleles of the single nucleotide polymorphisms. The sizes of individual urinary DNA fragments were deduced from the alignment positions of the paired reads. We measured the fractional fetal DNA concentration as well as the size distributions of fetal and maternal DNA in maternal urine. Principal Findings Cell-free fetal DNA was detected in five of the seven maternal urine samples, with the fractional fetal DNA concentrations ranging from 1.92% to 4.73%. Fetal DNA became undetectable in maternal urine after delivery. The total urinary cell-free DNA molecules were less intact when compared with plasma DNA. Urinary fetal DNA fragments were very short, and the most dominant fetal sequences were between 29 bp and 45 bp in length. Conclusions With the use of massively parallel sequencing, we have confirmed the existence of transrenal fetal DNA in maternal urine, and have shown that urinary fetal DNA was heavily degraded. PMID:23118982

  2. Local alignment of two-base encoded DNA sequence

    PubMed Central

    Homer, Nils; Merriman, Barry; Nelson, Stanley F

    2009-01-01

    Background DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732
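    A small example shows why decoding and alignment need to be handled together for this kind of data. The abstract does not give the code table, so the sketch assumes the commonly described dibase "colour" code in which each colour is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3), with the first base known. It covers only encoding and deterministic decoding, not the authors' alignment algorithm; the names are hypothetical.

```python
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}  # assumed 2-bit base codes


def encode(seq):
    """Return (first_base, colours); each colour encodes one adjacent base pair."""
    colours = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colours


def decode(first_base, colours):
    """Deterministically rebuild the base sequence from the first base and colours."""
    out = [first_base]
    for colour in colours:
        out.append(BASES[CODE[out[-1]] ^ colour])
    return "".join(out)


first, colours = encode("ACGGT")
roundtrip = decode(first, colours)

# Simulate a single measurement error in the second colour.
corrupted = list(colours)
corrupted[1] ^= 1
misdecoded = decode(first, corrupted)
```

    Here `roundtrip` reproduces the input, while the single corrupted colour changes every base from that point onward in `misdecoded`. Under this encoding a true single-base variant changes two adjacent colours, whereas an isolated colour change is more likely a measurement error, which is the distinction a combined decode-and-align method can exploit.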

  3. Comparison of dkgB-linked intergenic sequence ribotyping to DNA microarray hybridization for assigning serotype to Salmonella enterica

    PubMed Central

    Guard, Jean; Sanchez-Ingunza, Roxana; Morales, Cesar; Stewart, Tod; Liljebjelke, Karen; Kessel, JoAnn; Ingram, Kim; Jones, Deana; Jackson, Charlene; Fedorka-Cray, Paula; Frye, Jonathan; Gast, Richard; Hinton, Arthur

    2012-01-01

    Two DNA-based methods were compared for the ability to assign serotype to 139 isolates of Salmonella enterica ssp. I. Intergenic sequence ribotyping (ISR) evaluated single nucleotide polymorphisms occurring in a 5S ribosomal gene region and flanking sequences bordering the gene dkgB. A DNA microarray hybridization method that assessed the presence and the absence of sets of genes was the second method. Serotype was assigned for 128 (92.1%) of submissions by the two DNA methods. ISR detected mixtures of serotypes within single colonies and it cost substantially less than Kauffmann–White serotyping and DNA microarray hybridization. Decreasing the cost of serotyping S. enterica while maintaining reliability may encourage routine testing and research. PMID:22998607

  4. Genome organization and DNA methylation patterns of B chromosomes in the red fox and Chinese raccoon dogs.

    PubMed

    Bugno-Poniewierska, Monika; Solek, Przemysław; Wronski, Mariusz; Potocki, Leszek; Jezewska-Witkowska, Grażyna; Wnuk, Maciej

    2014-12-01

    The molecular structure of B chromosomes (Bs) is relatively well studied. Previous research demonstrates that Bs of various species usually contain two types of repetitive DNA sequences, satellite DNA and ribosomal DNA, but Bs also contain genes encoding histone proteins and many others. However, many questions remain regarding the origin and function of these chromosomes. Here, we focused on the comparative cytogenetic characteristics of the red fox and Chinese raccoon dog B chromosomes with particular attention to the distribution of repetitive DNA sequences and their methylation status. We confirmed that the small Bs of the red fox show a typical fluorescent telomeric distal signal, whereas medium-sized Bs of the Chinese raccoon dog were characterized by clusters of telomeric sequences along their length. We also found different DNA methylation patterns for the B chromosomes of both species. Therefore, we concluded that DNA methylation may maintain the transcriptional inactivation of DNA sequences localized to B chromosomes and may prevent genetic unbalancing and several negative phenotypic effects. © 2014 The Authors.

  5. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing

    PubMed Central

    Hykin, Sarah M.; Bi, Ke; McGuire, Jimmy A.

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens, particularly for use in phylogenetic analyses, have been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for genetic analysis. PMID:26505622

  6. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    PubMed

    Hykin, Sarah M; Bi, Ke; McGuire, Jimmy A

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens, particularly for use in phylogenetic analyses, have been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for genetic analysis.

  7. Early detection of non-native fishes using next-generation DNA sequencing of fish larvae

    EPA Science Inventory

    Our objective was to evaluate the use of fish larvae for early detection of non-native fishes, comparing traditional and molecular taxonomy based on next-generation DNA sequencing to investigate potential efficiencies. Our approach was to intensively sample a Great Lakes non-nati...

  8. A comparative study of ancient environmental DNA to pollen and macrofossils from lake sediments reveals taxonomic overlap and additional plant taxa

    NASA Astrophysics Data System (ADS)

    Pedersen, Mikkel Winther; Ginolhac, Aurélien; Orlando, Ludovic; Olsen, Jesper; Andersen, Kenneth; Holm, Jakob; Funder, Svend; Willerslev, Eske; Kjær, Kurt H.

    2013-09-01

    We use 2nd generation sequencing technology on sedimentary ancient DNA (sedaDNA) from a lake in South Greenland to reconstruct the local floristic history around a low-arctic lake and compare the results with those previously obtained from pollen and macrofossils in the same lake. Thirty-eight of thirty-nine samples from the core yielded putative DNA sequences. Using a multiple assignment strategy on the trnL g-h DNA barcode, consisting of two different phylogenetic and one sequence similarity assignment approaches, thirteen families of plants were identified, of which two (Scrophulariaceae and Asparagaceae) are absent from the pollen and macrofossil records. An age model for the sediment based on twelve radiocarbon dates establishes a chronology and shows that the lake record dates back to 10,650 cal yr BP. Our results suggest that sedaDNA analysis from lake sediments, although taxonomically less detailed than pollen and macrofossil analyses can be a complementary tool for establishing the composition of both terrestrial and aquatic local plant communities and a method for identifying additional taxa.

  9. DNA interactions with a Methylene Blue redox indicator depend on the DNA length and are sequence specific.

    PubMed

    Farjami, Elaheh; Clima, Lilia; Gothelf, Kurt V; Ferapontova, Elena E

    2010-06-01

    A DNA molecular beacon approach was used for the analysis of interactions between DNA and Methylene Blue (MB) as a redox indicator of a hybridization event. DNA hairpin structures of different length and guanine (G) content were immobilized onto gold electrodes in their folded states through the alkanethiol linker at the 5'-end. Binding of MB to the folded hairpin DNA was electrochemically studied and compared with binding to the duplex structure formed by hybridization of the hairpin DNA to a complementary DNA strand. Variation of the electrochemical signal from the DNA-MB complex was shown to depend primarily on the DNA length and sequence used: the G-C base pairs were the preferential sites of MB binding in the duplex. For short 20 nts long DNA sequences, the increased electrochemical response from MB bound to the duplex structure was consistent with the increased amount of bound and electrochemically readable MB molecules (i.e. MB molecules that are available for the electron transfer (ET) reaction with the electrode). With longer DNA sequences, the balance between the amounts of the electrochemically readable MB molecules bound to the hairpin DNA and to the hybrid was opposite: a part of the MB molecules bound to the long-sequence DNA duplex seem to be electrochemically mute due to long ET distance. The increasing electrochemical response from MB bound to the short-length DNA hybrid contrasts with the decreasing signal from MB bound to the long-length DNA hybrid and allows an "off"-"on" genosensor development.

  10. Phylogenetic Analysis of Shewanella Strains by DNA Relatedness Derived from Whole Genome Microarray DNA-DNA Hybridization and Comparison with Other Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Liyou; Yi, T. Y.; Van Nostrand, Joy

    Phylogenetic analyses were done for the Shewanella strains isolated from the Baltic Sea (38 strains), the US DOE Hanford Uranium bioremediation site [Hanford Reach of the Columbia River (HRCR), 11 strains], Pacific Ocean and Hawaiian sediments (8 strains), and strains from other resources (16 strains) with three outgroup strains, Rhodopseudomonas palustris, Clostridium cellulolyticum, and Thermoanaerobacter ethanolicus X514, using DNA relatedness derived from WCGA-based DNA-DNA hybridizations, sequence similarities of the 16S rRNA gene and gyrB gene, and sequence similarities of 6 loci of the Shewanella genome selected from a shared gene list of the Shewanella strains with whole genomes sequenced, based on their average nucleotide identity (ANI). The phylogenetic trees based on 16S rRNA and gyrB gene sequences, and on DNA relatedness derived from WCGA hybridizations of the tested Shewanella strains, share exactly the same sub-clusters with very few exceptions, in which the strains were basically grouped by species. However, the phylogenetic analysis based on DNA relatedness derived from WCGA hybridizations dramatically increased the differentiation resolution at the species and strain level within the Shewanella genus. When the tree based on DNA relatedness derived from WCGA hybridizations was compared to the tree based on the combined sequences of the selected functional genes (6 loci), we found that the resolutions of both methods are similar, but the clustering of the tree based on DNA relatedness derived from WCGA hybridizations was clearer. These results indicate that WCGA-based DNA-DNA hybridization is an ideal alternative to conventional DNA-DNA hybridization methods and that it is superior to phylogenetic methods based on sequence similarities of single genes. Detailed analysis is being performed for the re-classification of the strains examined.

  11. The minimal amount of starting DNA for Agilent’s hybrid capture-based targeted massively parallel sequencing

    PubMed Central

    Chung, Jongsuk; Son, Dae-Soon; Jeon, Hyo-Jeong; Kim, Kyoung-Mee; Park, Gahee; Ryu, Gyu Ha; Park, Woong-Yang; Park, Donghyun

    2016-01-01

    Targeted capture massively parallel sequencing is increasingly being used in clinical settings, and as costs continue to decline, use of this technology may become routine in health care. However, a limited amount of tissue has often been a challenge in meeting quality requirements. To offer a practical guideline for the minimum amount of input DNA for targeted sequencing, we optimized and evaluated the performance of targeted sequencing depending on the input DNA amount. First, using various amounts of input DNA, we compared commercially available library construction kits and selected Agilent’s SureSelect-XT and KAPA Biosystems’ Hyper Prep kits as the kits most compatible with targeted deep sequencing using Agilent’s SureSelect custom capture. Then, we optimized the adapter ligation conditions of the Hyper Prep kit to improve library construction efficiency and adapted multiplexed hybrid selection to reduce the cost of sequencing. In this study, we systematically evaluated the performance of the optimized protocol depending on the amount of input DNA, ranging from 6.25 to 200 ng, suggesting the minimal input DNA amounts based on coverage depths required for specific applications. PMID:27220682

  12. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    PubMed

    Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

    2010-05-07

    Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  13. To Clone or Not To Clone: Method Analysis for Retrieving Consensus Sequences In Ancient DNA Samples

    PubMed Central

    Winters, Misa; Barta, Jodi Lynn; Monroe, Cara; Kemp, Brian M.

    2011-01-01

    The challenges associated with the retrieval and authentication of ancient DNA (aDNA) evidence are principally due to post-mortem damage which makes ancient samples particularly prone to contamination from “modern” DNA sources. The necessity for authentication of results has led many aDNA researchers to adopt methods considered to be “gold standards” in the field, including cloning aDNA amplicons as opposed to directly sequencing them. However, no standardized protocol has emerged regarding the necessary number of clones to sequence, how a consensus sequence is most appropriately derived, or how results should be reported in the literature. In addition, there has been no systematic demonstration of the degree to which direct sequences are affected by damage or whether direct sequencing would provide disparate results from a consensus of clones. To address this issue, a comparative study was designed to examine both cloned and direct sequences amplified from ∼3,500 year-old ancient northern fur seal DNA extracts. Majority rules and the Consensus Confidence Program were used to generate consensus sequences for each individual from the cloned sequences, which exhibited damage at 31 of 139 base pairs across all clones. In no instance did the consensus of clones differ from the direct sequence. This study demonstrates that, when appropriate, cloning need not be the default method, but instead, should be used as a measure of authentication on a case-by-case basis, especially when this practice adds time and cost to studies where it may be superfluous. PMID:21738625
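
    The majority-rules step described in this abstract can be sketched in a few lines. The following is a minimal illustration only, assuming pre-aligned clone sequences of equal length; the function names, the strict-majority threshold, and the toy sequences are hypothetical, and the Consensus Confidence Program mentioned in the abstract is not reproduced here.

```python
from collections import Counter


def majority_consensus(clones, min_fraction=0.5):
    """Majority-rule consensus of equal-length, pre-aligned clone sequences.

    A base is called only if it occurs in more than `min_fraction` of the
    clones at that position; otherwise the position is reported as 'N'.
    """
    if not clones:
        raise ValueError("need at least one clone sequence")
    length = len(clones[0])
    if any(len(c) != length for c in clones):
        raise ValueError("clone sequences must be aligned to equal length")

    consensus = []
    for column in zip(*clones):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(clones) > min_fraction else "N")
    return "".join(consensus)


def differing_positions(a, b):
    """0-based positions at which two equal-length sequences disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Toy data (hypothetical): three of five clones carry one damage-like change.
clones = ["ACGTTAGC", "ACGTTAGC", "ATGTTAGC", "ACGTTAGT", "ACGTCAGC"]
direct = "ACGTTAGC"  # hypothetical direct-sequencing read

consensus = majority_consensus(clones)
print(consensus, differing_positions(consensus, direct))
```

    In this toy case the scattered single-clone changes are outvoted at every position, so the consensus agrees with the direct read, which is the kind of comparison the study reports.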

  14. Phylogeny and genetic diversity of Bridgeoporus nobilissimus inferred using mitochondrial and nuclear rDNA sequences

    USGS Publications Warehouse

    Redberg, G.L.; Hibbett, D.S.; Ammirati, J.F.; Rodriguez, R.J.

    2003-01-01

    The genetic diversity and phylogeny of Bridgeoporus nobilissimus have been analyzed. DNA was extracted from spores collected from individual fruiting bodies representing six geographically distinct populations in Oregon and Washington. Spore samples collected contained low levels of bacteria, yeast and a filamentous fungal species. Using taxon-specific PCR primers, it was possible to discriminate among rDNA from bacteria, yeast, a filamentous associate and B. nobilissimus. Nuclear rDNA internal transcribed spacer (ITS) region sequences of B. nobilissimus were compared among individuals representing six populations and were found to have less than 2% variation. These sequences also were used to design dual and nested PCR primers for B. nobilissimus-specific amplification. Mitochondrial small-subunit rDNA sequences were used in a phylogenetic analysis that placed B. nobilissimus in the hymenochaetoid clade, where it was associated with Oxyporus and Schizopora.

  15. Nuclear 28S rDNA phylogeny supports the basal placement of Noctiluca scintillans (Dinophyceae; Noctilucales) in dinoflagellates.

    PubMed

    Ki, Jang-Seu

    2010-05-01

    Noctiluca scintillans (Macartney) Kofoid et Swezy, 1921 is an unarmoured heterotrophic dinoflagellate with a global distribution, and has been considered one of the ancestral taxa among dinoflagellates. Recently, 18S rDNA, actin, alpha-, beta-tubulin, and Hsp90-based phylogenies have shown the basal position of the noctilucids. However, the relationships of dinoflagellates in the basal lineages are still controversial. Although the nuclear rDNA (e.g. 18S, ITS-5.8S, and 28S) contains much genetic information, DNA sequences of N. scintillans rDNA molecules have not yet been sufficiently characterized. Here the author sequenced a long-range nuclear rDNA, spanning from the 18S to the D5 region of the 28S rDNA, of N. scintillans. The present N. scintillans had a nearly identical genotype (>99.0% similarity) compared to other Noctiluca sequences from different geographic origins. Nucleotide divergence in the partial 28S rDNA was significantly higher (p<0.05) than in the 18S rDNA, demonstrating that the information from 28S rDNA is more variable. The 28S rDNA phylogeny of 17 selected dinoflagellates, two perkinsids, and two apicomplexans as outgroups showed that N. scintillans and Oxyrrhis marina formed a clade that diverged separately from core dinoflagellates. Copyright (c) 2009 Elsevier GmbH. All rights reserved.

  16. TFBSshape: a motif database for DNA shape features of transcription factor binding sites.

    PubMed

    Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W; Gordân, Raluca; Rohs, Remo

    2014-01-01

    Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone.

  17. TFBSshape: a motif database for DNA shape features of transcription factor binding sites

    PubMed Central

    Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W.; Gordân, Raluca; Rohs, Remo

    2014-01-01

    Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein–DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone. PMID:24214955

  18. Predicting DNA binding proteins using support vector machine with hybrid fractal features.

    PubMed

    Niu, Xiao-Hui; Hu, Xue-Hai; Shi, Feng; Xia, Jing-Bo

    2014-02-21

    DNA-binding proteins play a vitally important role in many biological processes. Prediction of DNA-binding proteins from amino acid sequence is a significant but not yet fully resolved scientific problem. Chaos game representation (CGR) investigates the patterns hidden in protein sequences, and visually reveals previously unknown structure. Fractal dimensions (FD) are good tools to measure sizes of complex, highly irregular geometric objects. In order to extract the intrinsic correlation with DNA-binding property from protein sequences, the CGR algorithm, fractal dimension and amino acid composition are applied to formulate the numerical features of protein samples in this paper. Seven groups of features are extracted, which can be computed directly from the primary sequence, and each group is evaluated by the 10-fold cross-validation test and the jackknife test. Comparing the results of numerical experiments, the group of amino acid composition and fractal dimension (21-dimension vector) gets the best result: the average accuracy is 81.82% and the average Matthews correlation coefficient (MCC) is 0.6017. The resulting predictor is also compared with the existing method DNA-Prot and shows better performance. © 2013 The Authors. Published by Elsevier Ltd. All rights reserved.
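
    The two figures of merit quoted in this abstract, accuracy and the Matthews correlation coefficient, follow directly from a binary confusion matrix. The sketch below uses the standard definitions; the function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) to +1 (perfect prediction);
    returns 0.0 when any marginal total is zero and the ratio is undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for one cross-validation fold (not from the paper).
tp, tn, fp, fn = 80, 85, 15, 20
print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")
```

    Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the binding and non-binding classes differ in size.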

  19. Analytical and Clinical Validation of a Digital Sequencing Panel for Quantitative, Highly Accurate Evaluation of Cell-Free Circulating Tumor DNA

    PubMed Central

    Zill, Oliver A.; Sebisanovic, Dragan; Lopez, Rene; Blau, Sibel; Collisson, Eric A.; Divers, Stephen G.; Hoon, Dave S. B.; Kopetz, E. Scott; Lee, Jeeyun; Nikolinakos, Petros G.; Baca, Arthur M.; Kermani, Bahram G.; Eltoukhy, Helmy; Talasaz, AmirAli

    2015-01-01

    Next-generation sequencing of cell-free circulating solid tumor DNA addresses two challenges in contemporary cancer care. First, this method of massively parallel and deep sequencing enables assessment of a comprehensive panel of genomic targets from a single sample, and second, it obviates the need for repeat invasive tissue biopsies. Digital Sequencing™ is a novel method for high-quality sequencing of circulating tumor DNA simultaneously across a comprehensive panel of over 50 cancer-related genes with a simple blood test. Here we report the analytic and clinical validation of the gene panel. Analytic sensitivity down to 0.1% mutant allele fraction is demonstrated via serial dilution studies of known samples. Near-perfect analytic specificity (> 99.9999%) enables complete coverage of many genes without the false positives typically seen with traditional sequencing assays at mutant allele frequencies or fractions below 5%. We compared digital sequencing of plasma-derived cell-free DNA to tissue-based sequencing on 165 consecutive matched samples from five outside centers in patients with stage III-IV solid tumor cancers. Clinical sensitivity of plasma-derived NGS was 85.0%, comparable to 80.7% sensitivity for tissue. The assay success rate on 1,000 consecutive samples in clinical practice was 99.8%. Digital sequencing of plasma-derived DNA is indicated in advanced cancer patients to prevent repeated invasive biopsies when the initial biopsy is inadequate, unobtainable for genomic testing, or uninformative, or when the patient’s cancer has progressed despite treatment. Its clinical utility is derived from reduction in the costs, complications and delays associated with invasive tissue biopsies for genomic testing. PMID:26474073

  20. Substrate sequence selectivity of APOBEC3A implicates intra-DNA interactions.

    PubMed

    Silvas, Tania V; Hou, Shurong; Myint, Wazo; Nalivaika, Ellen; Somasundaran, Mohan; Kelch, Brian A; Matsuo, Hiroshi; Kurt Yilmaz, Nese; Schiffer, Celia A

    2018-05-14

    The APOBEC3 (A3) family of human cytidine deaminases is renowned for providing a first line of defense against many exogenous and endogenous retroviruses. However, the ability of these proteins to deaminate deoxycytidines in ssDNA makes A3s a double-edged sword. When overexpressed, A3s can mutate endogenous genomic DNA resulting in a variety of cancers. Although the sequence context for mutating DNA varies among A3s, the mechanism for substrate sequence specificity is not well understood. To characterize substrate specificity of A3A, a systematic approach was used to quantify the affinity for substrate as a function of sequence context, length, secondary structure, and solution pH. We identified the A3A ssDNA binding motif as (T/C)TC(A/G), which correlated with enzymatic activity. We also validated that A3A binds RNA in a sequence specific manner. A3A bound tighter to substrate binding motif within a hairpin loop compared to linear oligonucleotide, suggesting A3A affinity is modulated by substrate structure. Based on these findings and previously published A3A-ssDNA co-crystal structures, we propose a new model with intra-DNA interactions for the molecular mechanism underlying A3A sequence preference. Overall, the sequence and structural preferences identified for A3A leads to a new paradigm for identifying A3A's involvement in mutation of endogenous or exogenous DNA.
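
    The binding motif reported in this abstract, (T/C)TC(A/G), maps directly onto a four-character regular expression. The sketch below scans a single DNA strand for occurrences of that motif; the function name and the example sequence are hypothetical, and the scan says nothing about affinity or secondary structure, which the abstract identifies as additional factors.

```python
import re

# (T/C)TC(A/G) written as a character-class pattern.
A3A_MOTIF = re.compile(r"[TC]TC[AG]")


def find_motif_sites(seq):
    """Return (0-based start, matched 4-mer) for each motif hit on one strand.

    Two hits cannot overlap: the fourth motif base (A/G) can never serve as
    the first (T/C), and the inner bases T, C fix the positions after them.
    A plain left-to-right scan therefore finds every occurrence.
    """
    return [(m.start(), m.group()) for m in A3A_MOTIF.finditer(seq.upper())]


# Hypothetical single-stranded test sequence.
example = "AATTCAGGCTCGTT"
for start, site in find_motif_sites(example):
    print(start, site)
```

    Because the motif is defined on ssDNA, only the given strand is searched; scanning the reverse complement as well would be a separate, deliberate choice.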

  1. Alteration of gene expression in human hepatocellular carcinoma with integrated hepatitis B virus DNA.

    PubMed

    Tamori, Akihiro; Yamanishi, Yoshihiro; Kawashima, Shuichi; Kanehisa, Minoru; Enomoto, Masaru; Tanaka, Hiromu; Kubo, Shoji; Shiomi, Susumu; Nishiguchi, Shuhei

    2005-08-15

    Integration of hepatitis B virus (HBV) DNA into the human genome is one of the most important steps in HBV-related carcinogenesis. This study attempted to find the link between HBV DNA, the adjoining cellular sequence, and altered gene expression in hepatocellular carcinoma (HCC) with integrated HBV DNA. We examined 15 cases of HCC infected with HBV by cassette ligation-mediated PCR. The human DNA adjacent to the integrated HBV DNA was sequenced. Protein coding sequences were searched for in the human sequence. In five cases with HBV DNA integration, from which good quality RNA was extracted, gene expression was examined by cDNA microarray analysis. The human DNA sequence successive to integrated HBV DNA was determined in the 15 HCCs. Eight protein-coding regions were involved: ras-responsive element binding protein 1, calmodulin 1, mixed lineage leukemia 2 (MLL2), FLJ333655, LOC220272, LOC255345, LOC220220, and LOC168991. The MLL2 gene was expressed in three cases with HBV DNA integrated into exon 3 of MLL2 and in one case with HBV DNA integrated into intron 3 of MLL2. Gene expression analysis suggested that two HCCs with HBV integrated into MLL2 had similar patterns of gene expression compared with three HCCs with HBV integrated into other loci of human chromosomes. HBV DNA was integrated at random sites of human DNA, and the MLL2 gene was one of the targets for integration. Our results suggest that HBV DNA might modulate human genes near integration sites, followed by integration site-specific expression of such genes during hepatocarcinogenesis.

  2. mtDNA sequence diversity of Hazara ethnic group from Pakistan.

    PubMed

    Rakha, Allah; Fatima; Peng, Min-Sheng; Adan, Atif; Bi, Rui; Yasmin, Memona; Yao, Yong-Gang

    2017-09-01

    The present study was undertaken to investigate mitochondrial DNA (mtDNA) control region sequences of Hazaras from Pakistan, so as to generate an mtDNA reference database for forensic casework in Pakistan and to analyze the phylogenetic relationship of this particular ethnic group with geographically proximal populations. Complete mtDNA control region (nt 16024-576) sequences were generated through Sanger sequencing for 319 Hazara individuals from Quetta, Baluchistan. The population sample set showed a total of 189 distinct haplotypes, belonging mainly to West Eurasian (51.72%), East & Southeast Asian (29.78%) and South Asian (18.50%) haplogroups. Compared with other populations from Pakistan, the Hazara population had a relatively high haplotype diversity (0.9945) and a lower random match probability (0.0085). The dataset has been incorporated into the EMPOP database under accession number EMP00680. The data herein comprise the largest, and likely most thoroughly examined, control region mtDNA dataset from Hazaras of Pakistan. Copyright © 2017 Elsevier B.V. All rights reserved.
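
    Haplotype diversity and random match probability are both simple functions of haplotype frequencies. The abstract does not state which estimators were used, so the sketch below assumes the conventional ones: random match probability as the sum of squared haplotype frequencies, and Nei's unbiased haplotype diversity, n/(n-1) times one minus that sum. The function names and the toy sample are hypothetical.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("sample is empty")
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def haplotype_diversity(haplotypes):
    """Nei's unbiased estimator: n / (n - 1) * (1 - sum of p_i squared)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two individuals")
    return n / (n - 1) * (1.0 - random_match_probability(haplotypes))


# Toy sample (hypothetical): four individuals, three distinct haplotypes.
sample = ["H1", "H1", "H2", "H3"]
print(f"RMP = {random_match_probability(sample):.4f}")
print(f"h   = {haplotype_diversity(sample):.4f}")
```

    Under these assumed formulas, a sample of 319 individuals with a random match probability of 0.0085 gives a diversity of about 0.9946, which agrees with the reported 0.9945 to within the rounding of the published figures.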

  3. cgDNA: a software package for the prediction of sequence-dependent coarse-grain free energies of B-form DNA.

    PubMed

    Petkevičiūtė, D; Pasi, M; Gonzalez, O; Maddocks, J H

    2014-11-10

    cgDNA is a package for the prediction of sequence-dependent configuration-space free energies for B-form DNA at the coarse-grain level of rigid bases. For a fragment of any given length and sequence, cgDNA calculates the configuration of the associated free energy minimizer, i.e. the relative positions and orientations of each base, along with a stiffness matrix, which together govern differences in free energies. The model predicts non-local (i.e. beyond base-pair step) sequence dependence of the free energy minimizer. Configurations can be input or output in either the Curves+ definition of the usual helical DNA structural variables, or as a PDB file of coordinates of base atoms. We illustrate the cgDNA package by comparing predictions of free energy minimizers from (a) the cgDNA model, (b) time-averaged atomistic molecular dynamics (or MD) simulations, and (c) NMR or X-ray experimental observation, for (i) the Dickerson-Drew dodecamer and (ii) three oligomers containing A-tracts. The cgDNA predictions are rather close to those of the MD simulations, but many orders of magnitude faster to compute. Both the cgDNA and MD predictions are in reasonable agreement with the available experimental data. Our conclusion is that cgDNA can serve as a highly efficient tool for studying structural variations in B-form DNA over a wide range of sequences. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Comparison of DNA Microarray, Loop-Mediated Isothermal Amplification (LAMP) and Real-Time PCR with DNA Sequencing for Identification of Fusarium spp. Obtained from Patients with Hematologic Malignancies.

    PubMed

    de Souza, Marcela; Matsuzawa, Tetsuhiro; Sakai, Kanae; Muraosa, Yasunori; Lyra, Luzia; Busso-Lopes, Ariane Fidelis; Levin, Anna Sara Shafferman; Schreiber, Angélica Zaninelli; Mikami, Yuzuru; Gonoi, Tohoru; Kamei, Katsuhiko; Moretti, Maria Luiza; Trabasso, Plínio

    2017-08-01

    The performance of three molecular biology techniques, i.e., DNA microarray, loop-mediated isothermal amplification (LAMP), and real-time PCR, was compared with DNA sequencing for the proper identification of 20 isolates of Fusarium spp. obtained from the bloodstream as etiologic agents of invasive infections in patients with hematologic malignancies. DNA microarray, LAMP and real-time PCR identified 16 (80%) out of 20 samples as Fusarium solani species complex (FSSC) and four (20%) as Fusarium spp. The agreement among the techniques was 100%. LAMP exhibited 100% specificity, while DNA microarray, LAMP and real-time PCR showed 100% sensitivity. The three techniques had 100% agreement with DNA sequencing. Sixteen isolates were identified as FSSC by sequencing: five Fusarium keratoplasticum, nine Fusarium petroliphilum and two Fusarium solani. Sequencing identified the remaining four isolates as Fusarium non-solani species complex (FNSSC): three Fusarium napiforme and one Fusarium oxysporum. Finally, LAMP proved to be faster and more accessible than DNA microarray and real-time PCR, since it does not require a thermocycler. Therefore, LAMP emerges as a promising methodology for routine identification of Fusarium spp. among cases of invasive fungal infections.

  5. Performing SELEX experiments in silico

    NASA Astrophysics Data System (ADS)

    Wondergem, J. A. J.; Schiessel, H.; Tompitak, M.

    2017-11-01

    Due to the sequence-dependent nature of the elasticity of DNA, many protein-DNA complexes and other systems in which DNA molecules must be deformed have preferences for the type of DNA sequence they interact with. SELEX (Systematic Evolution of Ligands by EXponential enrichment) experiments and similar sequence selection experiments have been used extensively to examine the (indirect readout) sequence preferences of, e.g., nucleosomes (protein spools around which DNA is wound for compactification) and DNA rings. We show how recently developed computational and theoretical tools can be used to emulate such experiments in silico. Opening up this possibility comes with several benefits. First, it allows us a better understanding of our models and systems, specifically about the roles played by the simulation temperature and the selection pressure on the sequences. Second, it allows us to compare the predictions made by the model of choice with experimental results. We find agreement on important features between predictions of the rigid base-pair model and experimental results for DNA rings and interesting differences that point out open questions in the field. Finally, our simulations allow application of the SELEX methodology to systems that are experimentally difficult to realize because they come with high energetic costs and are therefore unlikely to form spontaneously, such as very short or overwound DNA rings.

  6. Median network analysis of defectively sequenced entire mitochondrial genomes from early and contemporary disease studies.

    PubMed

    Bandelt, Hans-Jürgen; Yao, Yong-Gang; Bravi, Claudio M; Salas, Antonio; Kivisild, Toomas

    2009-03-01

    Sequence analysis of the mitochondrial genome has become a routine method in the study of mitochondrial diseases. Quite often, the sequencing efforts in the search of pathogenic or disease-associated mutations are affected by technical and interpretive problems, caused by sample mix-up, contamination, biochemical problems, incomplete sequencing, misdocumentation and insufficient reference to previously published data. To assess data quality in case studies of mitochondrial diseases, it is recommended to compare any mtDNA sequence under consideration to their phylogenetically closest lineages available in the Web. The median network method has proven useful for visualizing potential problems with the data. We contrast some early reports of complete mtDNA sequences to more recent total mtDNA sequencing efforts in studies of various mitochondrial diseases. We conclude that the quality of complete mtDNA sequences generated in the medical field in the past few years is somewhat unsatisfactory and may even fall behind that of pioneer manual sequencing in the early nineties. Our study provides a paradigm for an a posteriori evaluation of sequence quality and for detection of potential problems with inferring a pathogenic status of a particular mutation.

  7. Recognition of platinum-DNA adducts by HMGB1a.

    PubMed

    Ramachandran, Srinivas; Temple, Brenda; Alexandrova, Anastassia N; Chaney, Stephen G; Dokholyan, Nikolay V

    2012-09-25

    Cisplatin (CP) and oxaliplatin (OX), platinum-based drugs used widely in chemotherapy, form adducts on intrastrand guanines (5'GG) in genomic DNA. DNA damage recognition proteins, transcription factors, mismatch repair proteins, and DNA polymerases discriminate between CP- and OX-GG DNA adducts, which could partly account for differences in the efficacy, toxicity, and mutagenicity of CP and OX. In addition, differential recognition of CP- and OX-GG adducts is highly dependent on the sequence context of the Pt-GG adduct. In particular, DNA binding protein domain HMGB1a binds to CP-GG DNA adducts with up to 53-fold greater affinity than to OX-GG adducts in the TGGA sequence context but shows much smaller differences in binding in the AGGC or TGGT sequence contexts. Here, simulations of the HMGB1a-Pt-DNA complex in the three sequence contexts revealed a higher number of interface contacts for the CP-DNA complex in the TGGA sequence context than in the OX-DNA complex. However, the number of interface contacts was similar in the TGGT and AGGC sequence contexts. The higher number of interface contacts in the CP-TGGA sequence context corresponded to a larger roll of the Pt-GG base pair step. Furthermore, geometric analysis of stacking of phenylalanine 37 in HMGB1a (Phe37) with the platinated guanines revealed more favorable stacking modes correlated with a larger roll of the Pt-GG base pair step in the TGGA sequence context. These data are consistent with our previous molecular dynamics simulations showing that the CP-TGGA complex was able to sample larger roll angles than the OX-TGGA complex or either CP- or OX-DNA complexes in the AGGC or TGGT sequences. We infer that the high binding affinity of HMGB1a for CP-TGGA is due to the greater flexibility of CP-TGGA compared to OX-TGGA and other Pt-DNA adducts. 
    This increased flexibility is reflected in the ability of CP-TGGA to sample larger roll angles, which allows for a higher number of interface contacts between the Pt-DNA adduct and HMGB1a.

  8. BIOCHEMICAL AND PHYLOGENETIC CHARACTERIZATION OF TWO NOVEL DEEP-SEA THERMOCOCCUS ISOLATES WITH POTENTIALLY BIOTECHNOLOGICAL APPLICATIONS

    EPA Science Inventory

    The partial 16S rDNA gene sequences of two thermophilic archaeal strains, TY and TYS, previously isolated from the Guaymas Basin hydrothermal vent site were determined. Lipid analyses and a comparative analysis performed with 16S rDNA sequences of similar thermophilic species sho...

  9. Testing the Use of Implicit Solvent in the Molecular Dynamics Modelling of DNA Flexibility

    NASA Astrophysics Data System (ADS)

    Mitchell, J.; Harris, S.

    DNA flexibility controls packaging, looping and in some cases sequence specific protein binding. Molecular dynamics simulations carried out with a computationally efficient implicit solvent model are potentially a powerful tool for studying larger DNA molecules than can be currently simulated when water and counterions are represented explicitly. In this work we compare DNA flexibility at the base pair step level modelled using an implicit solvent model to that previously determined from explicit solvent simulations and database analysis. Although much of the sequence dependent behaviour is preserved in implicit solvent, the DNA is considerably more flexible when the approximate model is used. In addition we test the ability of the implicit solvent to model stress induced DNA disruptions by simulating a series of DNA minicircle topoisomers which vary in size and superhelical density. When compared with previously run explicit solvent simulations, we find that while the levels of DNA denaturation are similar using both computational methodologies, the specific structural form of the disruptions is different.

  10. Illuminating choices for library prep: a comparison of library preparation methods for whole genome sequencing of Cryptococcus neoformans using Illumina HiSeq.

    PubMed

    Rhodes, Johanna; Beale, Mathew A; Fisher, Matthew C

    2014-01-01

    The industry of next-generation sequencing is constantly evolving, with novel library preparation methods and new sequencing machines being released by the major sequencing technology companies annually. The Illumina TruSeq v2 library preparation method was the most widely used kit and the market leader; however, it has now been discontinued, and in 2013 was replaced by the TruSeq Nano and TruSeq PCR-free methods, leaving a gap in knowledge regarding which is the most appropriate library preparation method to use. Here, we used isolates from the pathogenic fungus Cryptococcus neoformans var. grubii and sequenced them using the existing TruSeq DNA v2 kit (Illumina), along with two new kits: the TruSeq Nano DNA kit (Illumina) and the NEBNext Ultra DNA kit (New England Biolabs) to provide a comparison. Compared to the original TruSeq DNA v2 kit, both newer kits gave equivalent or better sequencing data, with increased coverage. When comparing the two newer kits, we found little difference in cost and workflow, with the NEBNext Ultra both slightly cheaper and faster than the TruSeq Nano. However, the quality of data generated using the TruSeq Nano DNA kit was superior due to higher coverage at regions of low GC content, and more SNPs identified. Researchers should therefore evaluate their resources and the type of application (and hence data quality) being considered when ultimately deciding on which library prep method to use.

  11. Epigenomics

    MedlinePlus

    ... Sheets A Brief Guide to Genomics About NHGRI Research About the International HapMap Project Biological Pathways Chromosome Abnormalities Chromosomes Cloning Comparative Genomics DNA Microarray Technology DNA Sequencing Deoxyribonucleic Acid ( ...

  12. Cloning

    MedlinePlus

    ... Sheets A Brief Guide to Genomics About NHGRI Research About the International HapMap Project Biological Pathways Chromosome Abnormalities Chromosomes Cloning Comparative Genomics DNA Microarray Technology DNA Sequencing Deoxyribonucleic Acid ( ...

  13. Chromosomes

    MedlinePlus

    ... Sheets A Brief Guide to Genomics About NHGRI Research About the International HapMap Project Biological Pathways Chromosome Abnormalities Chromosomes Cloning Comparative Genomics DNA Microarray Technology DNA Sequencing Deoxyribonucleic Acid ( ...

  14. Transcriptome

    MedlinePlus

    ... Sheets A Brief Guide to Genomics About NHGRI Research About the International HapMap Project Biological Pathways Chromosome Abnormalities Chromosomes Cloning Comparative Genomics DNA Microarray Technology DNA Sequencing Deoxyribonucleic Acid ( ...

  15. Guidelines for whole genome bisulphite sequencing of intact and FFPET DNA on the Illumina HiSeq X Ten.

    PubMed

    Nair, Shalima S; Luu, Phuc-Loi; Qu, Wenjia; Maddugoda, Madhavi; Huschtscha, Lily; Reddel, Roger; Chenevix-Trench, Georgia; Toso, Martina; Kench, James G; Horvath, Lisa G; Hayes, Vanessa M; Stricker, Phillip D; Hughes, Timothy P; White, Deborah L; Rasko, John E J; Wong, Justin J-L; Clark, Susan J

    2018-05-28

    Comprehensive genome-wide DNA methylation profiling is critical to gain insights into epigenetic reprogramming during development and disease processes. Among the different genome-wide DNA methylation technologies, whole genome bisulphite sequencing (WGBS) is considered the gold standard for assaying genome-wide DNA methylation at single base resolution. However, the high sequencing cost to achieve the optimal depth of coverage limits its application in both basic and clinical research. To achieve 15× coverage of the human methylome, using WGBS, requires approximately three lanes of 100-bp paired-end Illumina HiSeq 2500 sequencing. It is important, therefore, for advances in sequencing technologies to be developed to enable cost-effective high-coverage sequencing. In this study, we provide an optimised WGBS methodology, from library preparation to sequencing and data processing, to enable 16-20× genome-wide coverage per single lane of HiSeq X Ten, HCS 3.3.76. To process and analyse the data, we developed a WGBS pipeline (METH10X) that is fast and can call SNPs. We performed WGBS on both high-quality intact DNA and degraded DNA from formalin-fixed paraffin-embedded tissue. First, we compared different library preparation methods on the HiSeq 2500 platform to identify the best method for sequencing on the HiSeq X Ten. Second, we optimised the PhiX and genome spike-ins to achieve higher quality and coverage of WGBS data on the HiSeq X Ten. Third, we performed integrated whole genome sequencing (WGS) and WGBS of the same DNA sample in a single lane of HiSeq X Ten to improve data output. Finally, we compared methylation data from the HiSeq 2500 and HiSeq X Ten and found high concordance (Pearson r > 0.9). Together we provide a systematic, efficient and complete approach to perform and analyse WGBS on the HiSeq X Ten. Our protocol allows for large-scale WGBS studies at reasonable processing time and cost on the HiSeq X Ten platform.
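    The platform concordance reported above is a Pearson correlation between methylation levels measured at the same sites on two instruments. As a minimal sketch, assuming per-site methylation fractions are already available as two equal-length lists, the coefficient can be computed as follows. The function name and the example values are hypothetical and are not output from the METH10X pipeline.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical methylation fractions at the same five CpG sites on two platforms.
platform_a = [0.05, 0.20, 0.55, 0.80, 0.95]
platform_b = [0.07, 0.18, 0.60, 0.78, 0.97]
print(round(pearson_r(platform_a, platform_b), 3))
```

    In practice such comparisons are usually restricted to sites covered above some minimum depth on both platforms, because low-coverage sites give noisy methylation fractions; that threshold is a design choice not specified here.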

  16. Visual ModuleOrganizer: a graphical interface for the detection and comparative analysis of repeat DNA modules

    PubMed Central

    2014-01-01

    Background: DNA repeats, such as transposable elements, minisatellites and palindromic sequences, are abundant in sequences and have been shown to have significant and functional roles in the evolution of the host genomes. In a previous study, we introduced the concept of a repeat DNA module, a flexible motif present in at least two occurrences in the sequences. This concept was embedded into ModuleOrganizer, a tool allowing the detection of repeat modules in a set of sequences. However, its implementation remains difficult for larger sequences. Results: Here we present Visual ModuleOrganizer, a Java graphical interface that enables a new and optimized version of the ModuleOrganizer tool. To implement this version, it was recoded in C++ with compressed suffix tree data structures. This leads to less memory usage (at least a 120-fold decrease on average) and reduces the computation time at least fourfold during the module detection process in large sequences. Visual ModuleOrganizer interface allows users to easily choose ModuleOrganizer parameters and to graphically display the results. Moreover, Visual ModuleOrganizer dynamically handles graphical results through four main parameters: gene annotations, overlapping modules with known annotations, location of the module in a minimal number of sequences, and the minimal length of the modules. As a case study, the analysis of FoldBack4 sequences clearly demonstrated that our tools can be extended to comparative and evolutionary analyses of any repeat sequence elements in a set of genomic sequences. With the increasing number of sequences available in public databases, it is now possible to perform comparative analyses of repeated DNA modules in a graphic and friendly manner within a reasonable time period. Availability: Visual ModuleOrganizer interface and the new version of the ModuleOrganizer tool are freely available at: http://lcb.cnrs-mrs.fr/spip.php?rubrique313. PMID:24678954
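    The idea of a motif "present in at least two occurrences" in a set of sequences can be illustrated with a much simpler technique than the compressed suffix trees mentioned in the abstract. The sketch below indexes exact k-mers in a dictionary. It is not the ModuleOrganizer algorithm (which handles flexible motifs), and the function name and parameters are hypothetical.

```python
from collections import defaultdict


def repeated_kmers(sequences, k, min_occurrences=2):
    """Return exact k-mers seen at least min_occurrences times.

    sequences: dict mapping a sequence name to its DNA string.
    Result: dict mapping each repeated k-mer to a list of
    (sequence name, 0-based start) locations.
    """
    index = defaultdict(list)
    for name, seq in sequences.items():
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            index[seq[start:start + k]].append((name, start))
    return {kmer: locs for kmer, locs in index.items()
            if len(locs) >= min_occurrences}


example = {"s1": "ACGTACGT", "s2": "TTACGA"}
for kmer, locations in sorted(repeated_kmers(example, k=4).items()):
    print(kmer, locations)
```

    A dictionary index uses memory proportional to the total sequence length times k, which is one reason suffix-based structures are preferred for large inputs.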

  17. ANN modeling of DNA sequences: new strategies using DNA shape code.

    PubMed

    Parbhane, R V; Tambe, S S; Kulkarni, B D

    2000-09-01

    Two new encoding strategies, namely, wedge and twist codes, which are based on the DNA helical parameters, are introduced to represent DNA sequences in artificial neural network (ANN)-based modeling of biological systems. The performance of the new coding strategies has been evaluated by conducting three case studies involving mapping (modeling) and classification applications of ANNs. The proposed coding schemes have been compared rigorously and shown to outperform the existing coding strategies especially in situations wherein limited data are available for building the ANN models.

  18. [The use of 16S rDNA sequencing in species diversity analysis for sputum of patients with ventilator-associated pneumonia].

    PubMed

    Yang, Xiaojun; Wang, Xiaohong; Liang, Zhijuan; Zhang, Xiaoya; Wang, Yanbo; Wang, Zhenhai

    2014-05-01

    To study the species and amount of bacteria in sputum of patients with ventilator-associated pneumonia (VAP) by using 16S rDNA sequencing analysis, and to explore a new method for etiologic diagnosis of VAP. Bronchoalveolar lavage sputum samples were collected from 31 patients with VAP. Bacterial DNA of the samples was extracted and identified by polymerase chain reaction (PCR). At the same time, sputum specimens were processed for routine bacterial culture. High-throughput sequencing was conducted on PCR-positive samples with 16S rDNA metagenomic sequencing technology, the sequencing results were analyzed using bioinformatics, and the results of sequencing and bacterial culture were then compared. (1) 550 bp of specific DNA sequences were amplified in sputum specimens from 27 cases of the 31 patients with VAP, and they were used for sequencing analysis. 103 856 sequences were obtained from those sputum specimens using 16S rDNA sequencing, yielding approximately 39 Mb of raw data. Tag sequencing was able to inform genus level in all 27 samples. (2) Alpha-diversity analysis showed that sputum samples of patients with VAP had significantly higher variability and richness in bacterial species (Shannon index values 1.20, Simpson index values 0.48). Rarefaction curve analysis showed that there were more species that were not detected by sequencing from some VAP sputum samples. (3) Analysis of 27 sputum samples with VAP by using 16S rDNA sequences yielded four phyla: namely Actinobacteria, Bacteroidetes, Firmicutes, Proteobacteria. With genus as a classification, it was found that the dominant genera included Streptococcus 88.9% (24/27), Limnohabitans 77.8% (21/27), Acinetobacter 70.4% (19/27), Sphingomonas 63.0% (17/27), Prevotella 63.0% (17/27), Klebsiella 55.6% (15/27), Pseudomonas 55.6% (15/27), Aquabacterium 55.6% (15/27), and Corynebacterium 55.6% (15/27). (4) Pyrosequencing discovered that Prevotella, Limnohabitans, Aquabacterium, Sphingomonas might not be detected by routine bacterial culture. Among seven genera which were identified by both methods, pyrosequencing yielded a higher positive rate than ordinary bacterial culture [Streptococcus: 88.9% (24/27) vs. 18.5% (5/27), Klebsiella: 55.6% (15/27) vs. 18.5% (5/27), Acinetobacter: 70.4% (19/27) vs. 37.0% (10/27), Corynebacterium: 55.6% (15/27) vs. 7.4% (2/27), P<0.05 or P<0.01]. The positive rate for Pseudomonas was also higher by sequencing than by culture, at borderline significance [55.6% (15/27) vs. 25.9% (7/27), P=0.050]. No significant differences were observed between sequencing and ordinary bacterial culture for detection of Staphylococcus [7.4% (2/27) vs. 11.1% (3/27)] and Neisseria [18.5% (5/27) vs. 3.7% (1/27), both P>0.05]. 16S rDNA sequencing analysis confirmed that the pathogenic bacteria in sputum of VAP patients were complex and included multiple drug-resistant strains. Compared with routine bacterial culture, pyrosequencing had a higher positive rate in detecting pathogens. 16S rDNA gene sequencing technology may become a new method for etiological diagnosis of VAP.
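    The alpha-diversity figures quoted in this abstract are Shannon and Simpson indices computed from per-taxon read counts. A minimal sketch of both follows, with the caveat that the abstract does not say which Simpson variant was used: the code returns the dominance form D = Σ p_i², and 1 − D is the other common convention. The function names and the example counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson dominance D = sum(p_i ** 2); 1 - D is the diversity form."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return sum((c / total) ** 2 for c in counts)


# Hypothetical genus-level read counts for one sputum sample.
genus_counts = [520, 310, 95, 50, 25]
print(round(shannon_index(genus_counts), 3), round(simpson_index(genus_counts), 3))
```

    For two equally abundant taxa, H equals ln 2 and D equals 0.5, which is a convenient sanity check on any implementation.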

  19. Identification of Genomic Insertion and Flanking Sequence of G2-EPSPS and GAT Transgenes in Soybean Using Whole Genome Sequencing Method.

    PubMed

    Guo, Bingfu; Guo, Yong; Hong, Huilong; Qiu, Li-Juan

    2016-01-01

    Molecular characterization of sequences flanking exogenous fragment insertions is essential for safety assessment and labeling of genetically modified organisms (GMOs). In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans, GE-J16 and ZH10-6, based on the whole genome sequencing (WGS) method. More than 22.4 Gb of sequence data (∼21× coverage) for each line was generated on the Illumina HiSeq 2500 platform. The junction reads mapped to boundaries of T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with the soybean reference genome and the sequence of the transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported that exogenous T-DNA fragments were integrated in positions of Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of genomic insertion sites of G2-EPSPS and GAT transgenes will facilitate the utilization of their glyphosate-tolerant traits in soybean breeding programs. These results also demonstrated that WGS was a cost-effective and rapid method for identifying sites of T-DNA insertions and flanking sequences in soybean.
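    The data volume and depth quoted in this abstract are linked by the usual relation: mean coverage = total sequenced bases / genome size. The short sketch below applies that relation to the two reported numbers. The function names are hypothetical, and since the abstract gives "more than 22.4 Gb" and "∼21×", the implied genome size is only an approximation.

```python
def mean_coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome length."""
    return total_bases / genome_size


def implied_genome_size(total_bases, coverage):
    """Genome length implied by a data volume and a reported depth."""
    return total_bases / coverage


reported_bases = 22.4e9   # 22.4 Gb per line, as stated in the abstract
reported_depth = 21       # approximate depth, as stated in the abstract

size = implied_genome_size(reported_bases, reported_depth)
print(f"implied genome size: {size / 1e9:.2f} Gb")
print(f"depth at that size: {mean_coverage(reported_bases, size):.1f}x")
```

    The same relation can be inverted to estimate how much data a planned experiment needs for a target depth.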

  20. Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly

    PubMed Central

    Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka

    2010-01-01

    Background: Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. Methodology: We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that it also performs well for non-human species. Conclusions: The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877

  1. Genetic Mapping

    MedlinePlus

    ... Sheets A Brief Guide to Genomics About NHGRI Research About the International HapMap Project Biological Pathways Chromosome Abnormalities Chromosomes Cloning Comparative Genomics DNA Microarray Technology DNA Sequencing Deoxyribonucleic Acid ( ...

  2. Biological Pathways

    MedlinePlus

    ... Sheets A Brief Guide to Genomics About NHGRI Research About the International HapMap Project Biological Pathways Chromosome Abnormalities Chromosomes Cloning Comparative Genomics DNA Microarray Technology DNA Sequencing Deoxyribonucleic Acid ( ...

  3. gyrB as a phylogenetic discriminator for members of the Bacillus anthracis-cereus-thuringiensis group

    NASA Technical Reports Server (NTRS)

    La Duc, Myron T.; Satomi, Masataka; Agata, Norio; Venkateswaran, Kasthuri

    2004-01-01

    Bacillus anthracis, the causative agent of the human disease anthrax, Bacillus cereus, a food-borne pathogen capable of causing human illness, and Bacillus thuringiensis, a well-characterized insecticidal toxin producer, all cluster together within a very tight clade (B. cereus group) phylogenetically and are indistinguishable from one another via 16S rDNA sequence analysis. As new pathogens are continually emerging, it is imperative to devise a system capable of rapidly and accurately differentiating closely related, yet phenotypically distinct species. Although the gyrB gene has proven useful in discriminating closely related species, its sequence analysis has not yet been validated by DNA:DNA hybridization, the taxonomically accepted "gold standard". We phylogenetically characterized the gyrB sequences of various species and serotypes encompassed in the "B. cereus group," including lab strains and environmental isolates. Results were compared to those obtained from analyses of phenotypic characteristics, 16S rDNA sequence, DNA:DNA hybridization, and virulence factors. The gyrB gene proved more highly differential than 16S, while, at the same time, as analytical as costly and laborious DNA:DNA hybridization techniques in differentiating species within the B. cereus group.

  4. Methods for decoding Cas9 protospacer adjacent motif (PAM) sequences: A brief overview.

    PubMed

    Karvelis, Tautvydas; Gasiunas, Giedrius; Siksnys, Virginijus

    2017-05-15

    Recently the Cas9, an RNA guided DNA endonuclease, emerged as a powerful tool for targeted genome manipulations. Cas9 protein can be reprogrammed to cleave, bind or nick any DNA target by simply changing crRNA sequence, however a short nucleotide sequence, termed PAM, is required to initiate crRNA hybridization to the DNA target. PAM sequence is recognized by Cas9 protein and must be determined experimentally for each Cas9 variant. Exploration of Cas9 orthologs could offer a diversity of PAM sequences and novel biochemical properties that may be beneficial for genome editing applications. Here we briefly review and compare Cas9 PAM identification assays that can be adopted for other PAM-dependent CRISPR-Cas systems.
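    Once a PAM has been determined for a given Cas9 variant, it constrains which sites in a target sequence can be addressed. As an illustration only, the sketch below scans one strand for candidate protospacers followed by a PAM, using the widely used Streptococcus pyogenes Cas9 motif 5'-NGG-3' and a 20-nt protospacer as defaults. The function names are hypothetical, only a few IUPAC codes are supported, and the reverse strand is not searched.

```python
# Minimal IUPAC table; extend it if the PAM of interest uses other codes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "R": "AG", "Y": "CT"}


def matches_pattern(pattern, site):
    """True if each base of site is allowed by the IUPAC code at that position."""
    return len(pattern) == len(site) and all(
        base in IUPAC[code] for code, base in zip(pattern, site))


def find_pam_sites(seq, pam="NGG", protospacer_len=20):
    """List (protospacer start, protospacer, PAM) for PAMs on the given strand.

    The PAM is expected immediately 3' of the protospacer; starts are 0-based.
    """
    seq = seq.upper()
    hits = []
    for pam_start in range(protospacer_len, len(seq) - len(pam) + 1):
        site = seq[pam_start:pam_start + len(pam)]
        if matches_pattern(pam, site):
            start = pam_start - protospacer_len
            hits.append((start, seq[start:pam_start], site))
    return hits


target = "ATGCGTACCGTTAGCTAGGCTAAGG" + "TTTT"
for start, protospacer, pam in find_pam_sites(target):
    print(start, protospacer, pam)
```

    Swapping the `pam` argument for another motif is all that is needed to reuse the scan for a different Cas9 ortholog, provided its PAM also lies 3' of the protospacer.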

  5. Diagnostics of Neisseriaceae and Moraxellaceae by Ribosomal DNA Sequencing: Ribosomal Differentiation of Medical Microorganisms

    PubMed Central

    Harmsen, Dag; Singer, Christian; Rothgänger, Jörg; Tønjum, Tone; Sybren de Hoog, Gerrit; Shah, Haroun; Albert, Jürgen; Frosch, Matthias

    2001-01-01

    Fast and reliable identification of microbial isolates is a fundamental goal of clinical microbiology. However, in the case of some fastidious gram-negative bacterial species, classical phenotype identification based on either metabolic, enzymatic, or serological methods is difficult, time-consuming, and/or inadequate. 16S or 23S ribosomal DNA (rDNA) bacterial sequencing will most often result in accurate speciation of isolates. Therefore, the objective of this study was to find a hypervariable rDNA stretch, flanked by strongly conserved regions, which is suitable for molecular species identification of members of the Neisseriaceae and Moraxellaceae. The inter- and intrageneric relationships were investigated using comparative sequence analysis of PCR-amplified partial 16S and 23S rDNAs from a total of 94 strains. When compared to the type species of the genera Acinetobacter, Moraxella, and Neisseria, an average of 30 polymorphic positions was observed within the partial 16S rDNA investigated (corresponding to Escherichia coli positions 54 to 510) for each species and an average of 11 polymorphic positions was observed within the 202 nucleotides of the 23S rDNA gene (positions 1400 to 1600). Neisseria macacae and Neisseria mucosa subsp. mucosa (ATCC 19696) had identical 16S and 23S rDNA sequences. Species clusters were heterogeneous in both genes in the case of Acinetobacter lwoffii, Moraxella lacunata, and N. mucosa. Neisseria meningitidis isolates failed to cluster only in the 23S rDNA subset. Our data showed that the 16S rDNA region is more suitable than the partial 23S rDNA for the molecular diagnosis of Neisseriaceae and Moraxellaceae and that a reference database should include more than one strain of each species. All sequence chromatograms and taxonomic and disease-related information are available as part of our ribosomal differentiation of medical microorganisms (RIDOM) web-based service (http://www.ridom.hygiene.uni-wuerzburg.de/). 
    Users can submit a sequence and conduct a similarity search against the RIDOM reference database for microbial identification purposes. PMID:11230407
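    The counts of polymorphic positions reported in this abstract come from comparing aligned rDNA stretches position by position. A minimal sketch of that comparison is shown below, assuming the two sequences have already been aligned to the same length with "-" marking gaps. The function name and the example sequences are hypothetical, and this is not the RIDOM similarity search.

```python
def polymorphic_positions(aligned_a, aligned_b, ignore_gaps=True):
    """Return 1-based alignment columns where two aligned sequences differ.

    With ignore_gaps=True, columns containing a gap in either sequence
    are skipped rather than counted as differences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for column, (a, b) in enumerate(zip(aligned_a.upper(), aligned_b.upper()),
                                    start=1):
        if ignore_gaps and "-" in (a, b):
            continue
        if a != b:
            positions.append(column)
    return positions


type_strain = "ACGT-ACGGA"
isolate = "ACCTTACGAA"
print(polymorphic_positions(type_strain, isolate))
print(polymorphic_positions(type_strain, isolate, ignore_gaps=False))
```

    Whether gaps count as differences changes the totals, so the choice should be stated whenever such counts are compared across studies.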

  6. Re-sequencing transgenic plants revealed rearrangements at T-DNA inserts, and integration of a short T-DNA fragment, but no increase of small mutations elsewhere.

    PubMed

    Schouten, Henk J; Vande Geest, Henri; Papadimitriou, Sofia; Bemer, Marian; Schaart, Jan G; Smulders, Marinus J M; Perez, Gabino Sanchez; Schijlen, Elio

    2017-03-01

    Transformation resulted in deletions and translocations at T-DNA inserts, but not in genome-wide small mutations. A tiny T-DNA splinter was detected that probably would remain undetected by conventional techniques. We investigated to which extent Agrobacterium tumefaciens-mediated transformation is mutagenic, on top of inserting T-DNA. To prevent mutations due to in vitro propagation, we applied floral dip transformation of Arabidopsis thaliana. We re-sequenced the genomes of five primary transformants, and compared these to genomic sequences derived from a pool of four wild-type plants. By genome-wide comparisons, we identified ten small mutations in the genomes of the five transgenic plants, not correlated to the positions or number of T-DNA inserts. This mutation frequency is within the range of spontaneous mutations occurring during seed propagation in A. thaliana, as determined earlier. In addition, we detected small as well as large deletions specifically at the T-DNA insert sites. Furthermore, we detected partial T-DNA inserts, one of these a tiny 50-bp fragment originating from a central part of the T-DNA construct used, inserted into the plant genome without flanking other T-DNA. Because of its small size, we named this fragment a T-DNA splinter. As far as we know this is the first report of such a small T-DNA fragment insert in absence of any T-DNA border sequence. Finally, we found evidence for translocations from other chromosomes, flanking T-DNA inserts. In this study, we showed that next-generation sequencing (NGS) is a highly sensitive approach to detect T-DNA inserts in transgenic plants.

  7. Comparative molecular cytogenetic analyses of a major tandemly repeated DNA family and retrotransposon sequences in cultivated jute Corchorus species (Malvaceae)

    PubMed Central

    Begum, Rabeya; Zakrzewski, Falk; Menzel, Gerhard; Weber, Beatrice; Alam, Sheikh Shamimul; Schmidt, Thomas

    2013-01-01

    Background and Aims: The cultivated jute species Corchorus olitorius and Corchorus capsularis are important fibre crops. The analysis of repetitive DNA sequences, comprising a major part of plant genomes, has not been carried out in jute but is useful to investigate the long-range organization of chromosomes. The aim of this study was the identification of repetitive DNA sequences to facilitate comparative molecular and cytogenetic studies of two jute cultivars and to develop a fluorescent in situ hybridization (FISH) karyotype for chromosome identification. Methods: A plasmid library was generated from C. olitorius and C. capsularis with genomic restriction fragments of 100–500 bp, which was complemented by targeted cloning of satellite DNA by PCR. The diversity of the repetitive DNA families was analysed comparatively. The genomic abundance and chromosomal localization of different repeat classes were investigated by Southern analysis and FISH, respectively. The cytosine methylation of satellite arrays was studied by immunolabelling. Key Results: Major satellite repeats and retrotransposons have been identified from C. olitorius and C. capsularis. The satellite family CoSat I forms two undermethylated species-specific subfamilies, while the long terminal repeat (LTR) retrotransposons CoRetro I and CoRetro II show similarity to the Metaviridae of plant retroelements. FISH karyotypes were developed by multicolour FISH using these repetitive DNA sequences in combination with 5S and 18S–5.8S–25S rRNA genes which enable the unequivocal chromosome discrimination in both jute species. Conclusions: The analysis of the structure and diversity of the repeated DNA is crucial for genome sequence annotation. The reference karyotypes will be useful for breeding of jute and provide the basis for karyotyping homeologous chromosomes of wild jute species to reveal the genetic and evolutionary relationship between cultivated and wild Corchorus species. PMID:23666888

  8. Inhibition of hepatitis B virus replication with linear DNA sequences expressing antiviral micro-RNA shuttles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chattopadhyay, Saket; Ely, Abdullah; Bloom, Kristie

    2009-11-20

    RNA interference (RNAi) may be harnessed to inhibit viral gene expression and this approach is being developed to counter chronic infection with hepatitis B virus (HBV). Compared to synthetic RNAi activators, DNA expression cassettes that generate silencing sequences have advantages of sustained efficacy and ease of propagation in plasmid DNA (pDNA). However, the large size of pDNAs and inclusion of sequences conferring antibiotic resistance and immunostimulation limit delivery efficiency and safety. To develop use of alternative DNA templates that may be applied for therapeutic gene silencing, we assessed the usefulness of PCR-generated linear expression cassettes that produce anti-HBV micro-RNA (miR) shuttles. We found that silencing of HBV markers of replication was efficient (>75%) in cell culture and in vivo. miR shuttles were processed to form anti-HBV guide strands and there was no evidence of induction of the interferon response. Modification of terminal sequences to include flanking human adenoviral type-5 inverted terminal repeats was easily achieved and did not compromise silencing efficacy. These linear DNA sequences should have utility in the development of gene silencing applications where modifications of terminal elements with elimination of potentially harmful and non-essential sequences are required.

  9. Dielectrophoretic isolation and detection of cancer-related circulating cell-free DNA biomarkers from blood and plasma

    PubMed Central

    Sonnenberg, Avery; Marciniak, Jennifer Y.; Skowronski, Elaine A.; Manouchehri, Sareh; Rassenti, Laura; Ghia, Emanuela M.; Widhopf, George F.; Kipps, Thomas J.; Heller, Michael J.

    2014-01-01

    Conventional methods for the isolation of cancer-related circulating cell-free (ccf) DNA from patient blood (plasma) are time consuming and laborious. A DEP approach utilizing a microarray device now allows rapid isolation of ccf-DNA directly from a small volume of unprocessed blood. In this study, the DEP device is used to compare the ccf-DNA isolated directly from whole blood and plasma from 11 chronic lymphocytic leukemia (CLL) patients and one normal individual. Ccf-DNA from both blood and plasma samples was separated into DEP high-field regions, after which cells (blood), proteins, and other biomolecules were removed by a fluidic wash. The concentrated ccf-DNA was detected on-chip by fluorescence, and then eluted for PCR and DNA sequencing. The complete process from blood to PCR required less than 10 min; an additional 15 min was required to obtain plasma from whole blood. Ccf-DNA from the equivalent of 5 µL of CLL blood and 5 µL of plasma was amplified by PCR using Ig heavy-chain variable (IGHV) specific primers to identify the unique IGHV gene expressed by the leukemic B-cell clone. The PCR and DNA sequencing results obtained by DEP from all 11 CLL blood samples and from 8 of the 11 CLL plasma samples were exactly comparable to the DNA sequencing results obtained from genomic DNA isolated from CLL patient leukemic B cells (gold standard). PMID:24723219

  10. Dielectrophoretic isolation and detection of cancer-related circulating cell-free DNA biomarkers from blood and plasma.

    PubMed

    Sonnenberg, Avery; Marciniak, Jennifer Y; Skowronski, Elaine A; Manouchehri, Sareh; Rassenti, Laura; Ghia, Emanuela M; Widhopf, George F; Kipps, Thomas J; Heller, Michael J

    2014-07-01

    Conventional methods for the isolation of cancer-related circulating cell-free (ccf) DNA from patient blood (plasma) are time consuming and laborious. A DEP approach utilizing a microarray device now allows rapid isolation of ccf-DNA directly from a small volume of unprocessed blood. In this study, the DEP device is used to compare the ccf-DNA isolated directly from whole blood and plasma from 11 chronic lymphocytic leukemia (CLL) patients and one normal individual. Ccf-DNA from both blood and plasma samples was separated into DEP high-field regions, after which cells (blood), proteins, and other biomolecules were removed by a fluidic wash. The concentrated ccf-DNA was detected on-chip by fluorescence, and then eluted for PCR and DNA sequencing. The complete process from blood to PCR required less than 10 min; an additional 15 min was required to obtain plasma from whole blood. Ccf-DNA from the equivalent of 5 μL of CLL blood and 5 μL of plasma was amplified by PCR using Ig heavy-chain variable (IGHV) specific primers to identify the unique IGHV gene expressed by the leukemic B-cell clone. The PCR and DNA sequencing results obtained by DEP from all 11 CLL blood samples and from 8 of the 11 CLL plasma samples were exactly comparable to the DNA sequencing results obtained from genomic DNA isolated from CLL patient leukemic B cells (gold standard). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Detection of sequence variation in parasite ribosomal DNA by electrophoresis in agarose gels supplemented with a DNA-intercalating agent.

    PubMed

    Zhu, X Q; Chilton, N B; Gasser, R B

    1998-05-01

    This study evaluated the use of a commercially available DNA intercalating agent (Resolver Gold) in agarose gels for the direct detection of sequence variation in ribosomal DNA (rDNA). This agent binds preferentially to AT sequence motifs in DNA. Regions of nuclear rDNA, known to provide genetic markers for the identification of species of parasitic ascarid nematodes (order Ascaridida), were amplified by polymerase chain reaction (PCR) and subjected to electrophoresis in standard agarose gels versus gels supplemented with Resolver Gold. Individual taxa examined could not be distinguished reliably based on the size of their amplicons in standard agarose gels, whereas they could be readily delineated based on mobility using Resolver Gold-supplemented gels. The latter was achieved because of differences (approximately 0.1-8.2%) in the AT content of the fragments among different taxa, which were associated with significant interspecific differences (approximately 11-39%) in the rDNA sequences employed. There was a tendency for fragments with higher AT content to migrate slower in supplemented agarose gels compared with those of lower AT content. The results indicate the usefulness of this electrophoretic approach to rapidly screen for sequence variability within or among PCR-amplified rDNA fragments of similar sizes but differing AT contents. Although evaluated on rDNA of parasites, the approach has potential to be applied to a range of genes of different groups of infectious organisms.
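
    The AT-content differences that the abstract relates to gel mobility can be computed directly from amplicon sequences. The following is a minimal sketch only: the function name and the two example fragments are hypothetical and are not taken from the study.

```python
def at_content(seq: str) -> float:
    """Return the fraction of A and T among the unambiguous bases (A, C, G, T)."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "AT" for b in counted) / len(counted)


# Hypothetical amplicons of equal length but different base composition.
frag_a = "ATATTAGCAT"
frag_b = "GCGCATGCGC"

# Difference in AT content, expressed in percentage points.
diff_pct = 100 * (at_content(frag_a) - at_content(frag_b))
print(f"AT content: {at_content(frag_a):.2f} vs {at_content(frag_b):.2f}")
```

    Ambiguity codes such as N are excluded from the denominator here; that is a design choice of this sketch, not something stated in the abstract.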

  12. Mitochondrial DNA control region sequences from Nairobi (Kenya): inferring phylogenetic parameters for the establishment of a forensic database.

    PubMed

    Brandstätter, Anita; Peterson, Christine T; Irwin, Jodi A; Mpoke, Solomon; Koech, Davy K; Parson, Walther; Parsons, Thomas J

    2004-10-01

    Large forensic mtDNA databases that adhere to strict guidelines for generation and maintenance are not available for many populations outside of the United States and western Europe. We have established a high quality mtDNA control region sequence database for urban Nairobi, both as a reference database for forensic investigations and as a tool to examine the genetic variation of Kenyan sequences in the context of known African variation. The Nairobi sequences exhibited high variation and a low random match probability, indicating utility for forensic testing. Haplogroup identification and frequencies were compared with those reported from other published studies on African or African-origin populations from Mozambique, Sierra Leone, and the United States, and suggest significant differences in the mtDNA compositions of the various populations. The quality of the sequence data in our study was investigated and supported using phylogenetic measures. Our data demonstrate the diversity and distinctiveness of African populations, and underline the importance of establishing additional forensic mtDNA databases of indigenous African populations.
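
    The random match probability mentioned above is commonly estimated as the sum of squared haplotype frequencies in the database. The sketch below uses that standard definition; the abstract does not say which estimator the authors applied, and the haplotype labels are hypothetical.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies: the chance that two
    sequences drawn at random (with replacement) are identical."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


# Hypothetical database of four control-region haplotypes.
database = ["H1", "H1", "H2", "H3"]
rmp = random_match_probability(database)
print(rmp)
```

    A database in which every sequence is distinct gives the minimum value of 1/n, which is why high haplotype variation translates into a low random match probability.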

  13. Partial characterization of normal and Haemophilus influenzae-infected mucosal complementary DNA libraries in chinchilla middle ear mucosa.

    PubMed

    Kerschner, Joseph E; Erdos, Geza; Hu, Fen Ze; Burrows, Amy; Cioffi, Joseph; Khampang, Pawjai; Dahlgren, Margaret; Hayes, Jay; Keefe, Randy; Janto, Benjamin; Post, J Christopher; Ehrlich, Garth D

    2010-04-01

    We sought to construct and partially characterize complementary DNA (cDNA) libraries prepared from the middle ear mucosa (MEM) of chinchillas to better understand pathogenic aspects of infection and inflammation, particularly with respect to leukotriene biogenesis and response. Chinchilla MEM was harvested from controls and after middle ear inoculation with nontypeable Haemophilus influenzae. RNA was extracted to generate cDNA libraries. Randomly selected clones were subjected to sequence analysis to characterize the libraries and to provide DNA sequence for phylogenetic analyses. Reverse transcription-polymerase chain reaction of the RNA pools was used to generate cDNA sequences corresponding to genes associated with leukotriene biosynthesis and metabolism. Sequence analysis of 921 randomly selected clones from the uninfected MEM cDNA library produced approximately 250,000 nucleotides of almost entirely novel sequence data. Searches of the GenBank database with the Basic Local Alignment Search Tool provided for identification of 515 unique genes expressed in the MEM and not previously described in chinchillas. In almost all cases, the chinchilla cDNA sequences displayed much greater homology to human or other primate genes than with rodent species. Genes associated with leukotriene metabolism were present in both normal and infected MEM. Based on both phylogenetic comparisons and gene expression similarities with humans, chinchilla MEM appears to be an excellent model for the study of middle ear inflammation and infection. The higher degree of sequence similarity between chinchillas and humans compared to chinchillas and rodents was unexpected. The cDNA libraries from normal and infected chinchilla MEM will serve as useful molecular tools in the study of otitis media and should yield important information with respect to middle ear pathogenesis.

  14. Partial Characterization of Normal and Haemophilus influenzae–Infected Mucosal Complementary DNA Libraries in Chinchilla Middle Ear Mucosa

    PubMed Central

    Kerschner, Joseph E.; Erdos, Geza; Hu, Fen Ze; Burrows, Amy; Cioffi, Joseph; Khampang, Pawjai; Dahlgren, Margaret; Hayes, Jay; Keefe, Randy; Janto, Benjamin; Post, J. Christopher; Ehrlich, Garth D.

    2010-01-01

    Objectives We sought to construct and partially characterize complementary DNA (cDNA) libraries prepared from the middle ear mucosa (MEM) of chinchillas to better understand pathogenic aspects of infection and inflammation, particularly with respect to leukotriene biogenesis and response. Methods Chinchilla MEM was harvested from controls and after middle ear inoculation with nontypeable Haemophilus influenzae. RNA was extracted to generate cDNA libraries. Randomly selected clones were subjected to sequence analysis to characterize the libraries and to provide DNA sequence for phylogenetic analyses. Reverse transcription–polymerase chain reaction of the RNA pools was used to generate cDNA sequences corresponding to genes associated with leukotriene biosynthesis and metabolism. Results Sequence analysis of 921 randomly selected clones from the uninfected MEM cDNA library produced approximately 250,000 nucleotides of almost entirely novel sequence data. Searches of the GenBank database with the Basic Local Alignment Search Tool provided for identification of 515 unique genes expressed in the MEM and not previously described in chinchillas. In almost all cases, the chinchilla cDNA sequences displayed much greater homology to human or other primate genes than with rodent species. Genes associated with leukotriene metabolism were present in both normal and infected MEM. Conclusions Based on both phylogenetic comparisons and gene expression similarities with humans, chinchilla MEM appears to be an excellent model for the study of middle ear inflammation and infection. The higher degree of sequence similarity between chinchillas and humans compared to chinchillas and rodents was unexpected. The cDNA libraries from normal and infected chinchilla MEM will serve as useful molecular tools in the study of otitis media and should yield important information with respect to middle ear pathogenesis. PMID:20433028

  15. A comparative study of ChIP-seq sequencing library preparation methods.

    PubMed

    Sundaram, Arvind Y M; Hughes, Timothy; Biondi, Shea; Bolduc, Nathalie; Bowman, Sarah K; Camilli, Andrew; Chew, Yap C; Couture, Catherine; Farmer, Andrew; Jerome, John P; Lazinski, David W; McUsic, Andrew; Peng, Xu; Shazand, Kamran; Xu, Feng; Lyle, Robert; Gilfillan, Gregor D

    2016-10-21

    ChIP-seq is the primary technique used to investigate genome-wide protein-DNA interactions. As part of this procedure, immunoprecipitated DNA must undergo "library preparation" to enable subsequent high-throughput sequencing. To facilitate the analysis of biopsy samples and rare cell populations, there has been a recent proliferation of methods allowing sequencing library preparation from low-input DNA amounts. However, little information exists on the relative merits, performance, comparability and biases inherent to these procedures. Notably, recently developed single-cell ChIP procedures employing microfluidics must also employ library preparation reagents to allow downstream sequencing. In this study, seven methods designed for low-input DNA/ChIP-seq sample preparation (Accel-NGS® 2S, Bowman-method, HTML-PCR, SeqPlex™, DNA SMART™, TELP and ThruPLEX®) were performed on five replicates of 1 ng and 0.1 ng input H3K4me3 ChIP material, and compared to a "gold standard" reference PCR-free dataset. The performance of each method was examined for the prevalence of unmappable reads, amplification-derived duplicate reads, reproducibility, and for the sensitivity and specificity of peak calling. We identified consistent high performance in a subset of the tested reagents, which should aid researchers in choosing the most appropriate reagents for their studies. Furthermore, we expect this work to drive future advances by identifying and encouraging use of the most promising methods and reagents. The results may also aid judgements on how comparable are existing datasets that have been prepared with different sample library preparation reagents.

  16. Biorecognition by DNA oligonucleotides after Exposure to Photoresists and Resist Removers

    PubMed Central

    Dean, Stacey L.; Morrow, Thomas J.; Patrick, Sue; Li, Mingwei; Clawson, Gary; Mayer, Theresa S.; Keating, Christine D.

    2013-01-01

    Combining biological molecules with integrated circuit technology is of considerable interest for next generation sensors and biomedical devices. Current lithographic microfabrication methods, however, were developed for compatibility with silicon technology rather than bioorganic molecules and consequently it cannot be assumed that biomolecules will remain attached and intact during on-chip processing. Here, we evaluate the effects of three common photoresists (Microposit S1800 series, PMGI SF6, and Megaposit SPR 3012) and two photoresist removers (acetone and 1165 remover) on the ability of surface-immobilized DNA oligonucleotides to selectively recognize their reverse-complementary sequence. Two common DNA immobilization methods were compared: adsorption of 5′-thiolated sequences directly to gold nanowires and covalent attachment of 5′-thiolated sequences to surface amines on silica coated nanowires. We found that acetone had deleterious effects on selective hybridization as compared to 1165 remover, presumably due to incomplete resist removal. Use of the PMGI photoresist, which involves a high temperature bake step, was detrimental to the later performance of nanowire-bound DNA in hybridization assays, especially for DNA attached via thiol adsorption. The other three photoresists did not substantially degrade DNA binding capacity or selectivity for complementary DNA sequences. To determine if the lithographic steps caused more subtle damage, we also tested oligonucleotides containing a single base mismatch. Finally, a two-step photolithographic process was developed and used in combination with dielectrophoretic nanowire assembly to produce an array of doubly-contacted, electrically isolated individual nanowire components on a chip. Post-fabrication fluorescence imaging indicated that nanowire-bound DNA was present and able to selectively bind complementary strands. PMID:23952639
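
    The hybridization assays described above depend on a probe recognizing its reverse-complementary sequence, with a single base mismatch as the more stringent test. As an illustrative sketch only (the sequences and function names are hypothetical and not from the study), the reverse complement and a mismatch count can be computed as follows:

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(probe: str, target: str) -> int:
    """Count positions where target differs from the perfect
    reverse complement of probe (both written 5' to 3')."""
    expected = reverse_complement(probe)
    if len(expected) != len(target):
        raise ValueError("probe and target must have equal length")
    return sum(a != b for a, b in zip(expected, target))


probe = "ATGCAAC"                     # hypothetical immobilized probe
perfect = reverse_complement(probe)   # fully complementary target
single_mismatch = "GTTGAAT"           # one base changed relative to perfect
print(perfect, mismatches(probe, perfect), mismatches(probe, single_mismatch))
```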

  17. Bacterial community composition in different sediments from the Eastern Mediterranean Sea: a comparison of four 16S ribosomal DNA clone libraries.

    PubMed

    Polymenakou, Paraskevi N; Bertilsson, Stefan; Tselepides, Anastasios; Stephanou, Euripides G

    2005-10-01

    The regional variability of sediment bacterial community composition and diversity was studied by comparative analysis of four large 16S ribosomal DNA (rDNA) clone libraries from sediments in different regions of the Eastern Mediterranean Sea (Thermaikos Gulf, Cretan Sea, and South Ionian Sea). Amplified rDNA restriction analysis of 664 clones from the libraries indicate that the rDNA richness and evenness was high: for example, a near-1:1 relationship among screened clones and number of unique restriction patterns when up to 190 clones were screened for each library. Phylogenetic analysis of 207 bacterial 16S rDNA sequences from the sediment libraries demonstrated that Gamma-, Delta-, and Alphaproteobacteria, Holophaga/Acidobacteria, Planctomycetales, Actinobacteria, Bacteroidetes, and Verrucomicrobia were represented in all four libraries. A few clones also grouped with the Betaproteobacteria, Nitrospirae, Spirochaetales, Chlamydiae, Firmicutes, and candidate division OP11. The abundance of sequences affiliated with Gammaproteobacteria was higher in libraries from shallow sediments in the Thermaikos Gulf (30 m) and the Cretan Sea (100 m) compared to the deeper South Ionian station (2790 m). Most sequences in the four sediment libraries clustered with uncultured 16S rDNA phylotypes from marine habitats, and many of the closest matches were clones from hydrocarbon seeps, benzene-mineralizing consortia, sulfate reducers, sulfur oxidizers, and ammonia oxidizers. LIBSHUFF statistics of 16S rDNA gene sequences from the four libraries revealed major differences, indicating either a very high richness in the sediment bacterial communities or considerable variability in bacterial community composition among regions, or both.

  18. Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics

    PubMed Central

    2012-01-01

    Background Detecting the borders between coding and non-coding regions is an essential step in genome annotation, and information entropy measures are useful for describing the signals in genome sequences. However, the accuracy of previous entropy-segmentation methods for finding borders still needs to be improved. Methods In this study, we first applied a new recursive entropic segmentation method to DNA sequences to get preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. Results Compared with previous segmentation methods, the experimental results on three bacterial genomes, Rickettsia prowazekii, Borrelia burgdorferi and E. coli, show that our approach improves the accuracy of finding the borders between coding and non-coding regions in DNA sequences. Conclusions This paper presents a new segmentation method for prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacterial genomes, compared with the A12_JR method, our method raised the accuracy of finding the borders between protein coding and non-coding regions in DNA sequences. PMID:23282225
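
    The general idea of recursive entropic segmentation can be sketched in a few lines. The sketch below is a deliberate simplification and not the authors' method: it uses Jensen-Shannon divergence on the plain four-letter alphabet instead of Jensen-Rényi divergence on the 22-symbol alphabet, and a fixed divergence threshold (a hypothetical parameter) in place of a significance test.

```python
import math
from collections import Counter


def shannon_entropy(counts, total):
    """Shannon entropy (bits) of a symbol-count table."""
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def best_cut(seq, min_len=1):
    """Return (position, divergence) of the cut that maximizes the
    Jensen-Shannon divergence between the left and right segments."""
    n = len(seq)
    total_counts = Counter(seq)
    h_total = shannon_entropy(total_counts, n)
    left = Counter()
    best_pos, best_jsd = None, -1.0
    for i in range(1, n):
        left[seq[i - 1]] += 1          # left segment is seq[:i]
        if i < min_len or n - i < min_len:
            continue
        right = total_counts - left    # right segment is seq[i:]
        jsd = (h_total
               - (i / n) * shannon_entropy(left, i)
               - ((n - i) / n) * shannon_entropy(right, n - i))
        if jsd > best_jsd:
            best_pos, best_jsd = i, jsd
    return best_pos, best_jsd


def segment(seq, threshold=0.5, min_len=5, offset=0):
    """Recursively cut seq while the best divergence reaches threshold.
    Returns the sorted cut positions in the original coordinates."""
    pos, jsd = best_cut(seq, min_len)
    if pos is None or jsd < threshold:
        return []
    return (segment(seq[:pos], threshold, min_len, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, min_len, offset + pos))


# Toy sequence with three compositionally distinct blocks.
toy = "A" * 10 + "G" * 10 + "C" * 10
print(segment(toy))
```

    On real genomes the stopping rule matters more than the cut search itself, which is why published methods replace the fixed threshold used here with a statistical significance criterion.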

  19. Ligation-mediated PCR with a back-to-back adapter reduces amplification bias resulting from variations in GC content.

    PubMed

    Ishihara, Satoru; Kotomura, Naoe; Yamamoto, Naoki; Ochiai, Hiroshi

    2017-08-15

    Ligation-mediated polymerase chain reaction (LM-PCR) is a common technique for amplification of a pool of DNA fragments. Here, a double-stranded oligonucleotide consisting of two primer sequences in back-to-back orientation was designed as an adapter for LM-PCR. When DNA fragments were ligated with this adapter, the fragments were sandwiched between two adapters in random orientations. In the ensuing PCR, ligation products linked at each end to an opposite side of the adapter, i.e. to a distinct primer sequence, were preferentially amplified compared with products linked at each end to an identical primer sequence. The use of this adapter in LM-PCR reduced the impairment of PCR by substrate DNA with a high GC content, compared with the use of traditional LM-PCR adapters. This result suggested that our method has the potential to contribute to reduction of the amplification bias that is caused by an intrinsic property of the sequence context in substrate DNA. A DNA preparation obtained from a chromatin immunoprecipitation assay using pulldown of a specific form of histone H3 was successfully amplified using the modified LM-PCR, and the amplified products could be used as probes in a fluorescence in situ hybridization analysis. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. Development and Validation of an Ultradeep Next-Generation Sequencing Assay for Testing of Plasma Cell-Free DNA from Patients with Advanced Cancer.

    PubMed

    Janku, Filip; Zhang, Shile; Waters, Jill; Liu, Li; Huang, Helen J; Subbiah, Vivek; Hong, David S; Karp, Daniel D; Fu, Siqing; Cai, Xuyu; Ramzanali, Nishma M; Madwani, Kiran; Cabrilo, Goran; Andrews, Debra L; Zhao, Yue; Javle, Milind; Kopetz, E Scott; Luthra, Rajyalakshmi; Kim, Hyunsung J; Gnerre, Sante; Satya, Ravi Vijaya; Chuang, Han-Yu; Kruglyak, Kristina M; Toung, Jonathan; Zhao, Chen; Shen, Richard; Heymach, John V; Meric-Bernstam, Funda; Mills, Gordon B; Fan, Jian-Bing; Salathia, Neeraj S

    2017-09-15

    Purpose: Tumor-derived cell-free DNA (cfDNA) in plasma can be used for molecular testing and provides an attractive alternative to tumor tissue. Commonly used PCR-based technologies can test for only a limited number of alterations at a time. Therefore, novel ultrasensitive technologies capable of testing for a broad spectrum of molecular alterations are needed to further personalized cancer therapy. Experimental Design: We developed a highly sensitive ultradeep next-generation sequencing (NGS) assay using reagents from TruSeqNano library preparation and NexteraRapid Capture target enrichment kits to generate plasma cfDNA sequencing libraries for mutational analysis in 61 cancer-related genes using common bioinformatics tools. The results were retrospectively compared with molecular testing of archival primary or metastatic tumor tissue obtained at different points of clinical care. Results: In a study of 55 patients with advanced cancer, the ultradeep NGS assay detected 82% (complete detection) to 87% (complete and partial detection) of the aberrations identified in discordantly collected corresponding archival tumor tissue. Patients with a low variant allele frequency (VAF) of mutant cfDNA survived longer than those with a high VAF did (P = 0.018). In patients undergoing systemic therapy, radiological response was positively associated with changes in cfDNA VAF (P = 0.02), and compared with unchanged/increased mutant cfDNA VAF, decreased cfDNA VAF was associated with longer time to treatment failure (TTF; P = 0.03). Conclusions: Ultradeep NGS assay has good sensitivity compared with conventional clinical mutation testing of archival specimens. A high VAF in mutant cfDNA corresponded with shorter survival. Changes in VAF of mutated cfDNA were associated with TTF. Clin Cancer Res; 23(18); 5648-56. ©2017 American Association for Cancer Research.
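
    Variant allele frequency is the fraction of sequencing reads at a position that carry the variant allele. A minimal sketch follows; the read counts are hypothetical and the function is not part of the published assay's pipeline.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a position that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


# Hypothetical ultradeep coverage: 25 variant reads out of 5000 at one site.
baseline = variant_allele_frequency(25, 5000)
follow_up = variant_allele_frequency(5, 5000)
decreased = follow_up < baseline
print(f"VAF {baseline:.3%} -> {follow_up:.3%}, decreased: {decreased}")
```

    The example also shows why depth matters: at 5000 reads a 0.5% variant is supported by 25 reads, whereas at 100 reads the same variant would be expected in less than one read.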

  1. Phylogenetic analysis of Sicilian goats reveals a new mtDNA lineage.

    PubMed

    Sardina, M T; Ballester, M; Marmi, J; Finocchiaro, R; van Kaam, J B C H M; Portolano, B; Folch, J M

    2006-08-01

    The mitochondrial hypervariable region 1 (HVR1) sequence of 67 goats belonging to the Girgentana, Maltese and Derivata di Siria breeds was partially sequenced in order to present the first phylogenetic characterization of Sicilian goat breeds. These sequences were compared with published sequences of Indian and Pakistani domestic goats and wild goats. Mitochondrial lineage A was observed in most of the Sicilian goats. However, three Girgentana haplotypes were highly divergent from the Capra hircus clade, indicating that a new mtDNA lineage in domestic goats was found.

  2. DNA sequence-based comparative studies between non-extremophile and extremophile organisms with implications in exobiology

    NASA Astrophysics Data System (ADS)

    Holden, Todd; Marchese, P.; Tremberger, G., Jr.; Cheung, E.; Subramaniam, R.; Sullivan, R.; Schneider, P.; Flamholz, A.; Lieberman, D.; Cheung, T.

    2008-08-01

    We have characterized function related DNA sequences of various organisms using informatics techniques, including fractal dimension calculation, nucleotide and multi-nucleotide statistics, and sequence fluctuation analysis. Our analysis shows trends which differentiate extremophile from non-extremophile organisms, which could be reproduced in extraterrestrial life. Among the systems studied are radiation repair genes, genes involved in thermal shocks, and genes involved in drug resistance. We also evaluate sequence level changes that have occurred during short term evolution (several thousand generations) under extreme conditions.

  3. A DNA barcode for land plants.

    PubMed

    2009-08-04

    DNA barcoding involves sequencing a standard region of DNA as a tool for species identification. However, there has been no agreement on which region(s) should be used for barcoding land plants. To provide a community recommendation on a standard plant barcode, we have compared the performance of 7 leading candidate plastid DNA regions (atpF-atpH spacer, matK gene, rbcL gene, rpoB gene, rpoC1 gene, psbK-psbI spacer, and trnH-psbA spacer). Based on assessments of recoverability, sequence quality, and levels of species discrimination, we recommend the 2-locus combination of rbcL+matK as the plant barcode. This core 2-locus barcode will provide a universal framework for the routine use of DNA sequence data to identify specimens and contribute toward the discovery of overlooked species of land plants.

  4. A DNA barcode for land plants

    PubMed Central

    Hollingsworth, Peter M.; Forrest, Laura L.; Spouge, John L.; Hajibabaei, Mehrdad; Ratnasingham, Sujeevan; van der Bank, Michelle; Chase, Mark W.; Cowan, Robyn S.; Erickson, David L.; Fazekas, Aron J.; Graham, Sean W.; James, Karen E.; Kim, Ki-Joong; Kress, W. John; Schneider, Harald; van AlphenStahl, Jonathan; Barrett, Spencer C.H.; van den Berg, Cassio; Bogarin, Diego; Burgess, Kevin S.; Cameron, Kenneth M.; Carine, Mark; Chacón, Juliana; Clark, Alexandra; Clarkson, James J.; Conrad, Ferozah; Devey, Dion S.; Ford, Caroline S.; Hedderson, Terry A.J.; Hollingsworth, Michelle L.; Husband, Brian C.; Kelly, Laura J.; Kesanakurti, Prasad R.; Kim, Jung Sung; Kim, Young-Dong; Lahaye, Renaud; Lee, Hae-Lim; Long, David G.; Madriñán, Santiago; Maurin, Olivier; Meusnier, Isabelle; Newmaster, Steven G.; Park, Chong-Wook; Percy, Diana M.; Petersen, Gitte; Richardson, James E.; Salazar, Gerardo A.; Savolainen, Vincent; Seberg, Ole; Wilkinson, Michael J.; Yi, Dong-Keun; Little, Damon P.

    2009-01-01

    DNA barcoding involves sequencing a standard region of DNA as a tool for species identification. However, there has been no agreement on which region(s) should be used for barcoding land plants. To provide a community recommendation on a standard plant barcode, we have compared the performance of 7 leading candidate plastid DNA regions (atpF–atpH spacer, matK gene, rbcL gene, rpoB gene, rpoC1 gene, psbK–psbI spacer, and trnH–psbA spacer). Based on assessments of recoverability, sequence quality, and levels of species discrimination, we recommend the 2-locus combination of rbcL+matK as the plant barcode. This core 2-locus barcode will provide a universal framework for the routine use of DNA sequence data to identify specimens and contribute toward the discovery of overlooked species of land plants. PMID:19666622

  5. Environmental Barcoding: A Next-Generation Sequencing Approach for Biomonitoring Applications Using River Benthos

    PubMed Central

    Hajibabaei, Mehrdad; Shokralla, Shadi; Zhou, Xin; Singer, Gregory A. C.; Baird, Donald J.

    2011-01-01

    Timely and accurate biodiversity analysis poses an ongoing challenge for the success of biomonitoring programs. Morphology-based identification of bioindicator taxa is time consuming, and rarely supports species-level resolution especially for immature life stages. Much work has been done in the past decade to develop alternative approaches for biodiversity analysis using DNA sequence-based approaches such as molecular phylogenetics and DNA barcoding. On-going assembly of DNA barcode reference libraries will provide the basis for a DNA-based identification system. The use of recently introduced next-generation sequencing (NGS) approaches in biodiversity science has the potential to further extend the application of DNA information for routine biomonitoring applications to an unprecedented scale. Here we demonstrate the feasibility of using 454 massively parallel pyrosequencing for species-level analysis of freshwater benthic macroinvertebrate taxa commonly used for biomonitoring. We designed our experiments in order to directly compare morphology-based, Sanger sequencing DNA barcoding, and next-generation environmental barcoding approaches. Our results show the ability of 454 pyrosequencing of mini-barcodes to accurately identify all species with more than 1% abundance in the pooled mixture. Although the approach failed to identify 6 rare species in the mixture, the presence of sequences from 9 species that were not represented by individuals in the mixture provides evidence that DNA based analysis may yet provide a valuable approach in finding rare species in bulk environmental samples. We further demonstrate the application of the environmental barcoding approach by comparing benthic macroinvertebrates from an urban region to those obtained from a conservation area. 
    Although considerable effort will be required to robustly optimize NGS tools to identify species from bulk environmental samples, our results indicate the potential of an environmental barcoding approach for biomonitoring programs. PMID:21533287

  6. Sequencing of whole plastid genomes and nuclear ribosomal DNA of Diospyros species (Ebenaceae) endemic to New Caledonia: many species, little divergence

    PubMed Central

    Turner, Barbara; Paun, Ovidiu; Munzinger, Jérôme; Chase, Mark W.; Samuel, Rosabelle

    2016-01-01

    Background and Aims Some plant groups, especially on islands, have been shaped by strong ancestral bottlenecks and rapid, recent radiation of phenotypic characters. Single molecular markers are often not informative enough for phylogenetic reconstruction in such plant groups. Whole plastid genomes and nuclear ribosomal DNA (nrDNA) are viewed by many researchers as sources of information for phylogenetic reconstruction of groups in which expected levels of divergence in standard markers are low. Here we evaluate the usefulness of these data types to resolve phylogenetic relationships among closely related Diospyros species. Methods Twenty-two closely related Diospyros species from New Caledonia were investigated using whole plastid genomes and nrDNA data from low-coverage next-generation sequencing (NGS). Phylogenetic trees were inferred using maximum parsimony, maximum likelihood and Bayesian inference on separate plastid and nrDNA and combined matrices. Key Results The plastid and nrDNA sequences were, singly and together, unable to provide well supported phylogenetic relationships among the closely related New Caledonian Diospyros species. In the nrDNA, a 6-fold greater percentage of parsimony-informative characters compared with plastid DNA was found, but the total number of informative sites was greater for the much larger plastid DNA genomes. Combining the plastid and nuclear data improved resolution. Plastid results showed a trend towards geographical clustering of accessions rather than following taxonomic species. Conclusions In plant groups in which multiple plastid markers are not sufficiently informative, an investigation at the level of the entire plastid genome may also not be sufficient for detailed phylogenetic reconstruction. 
    Sequencing of complete plastid genomes and nrDNA repeats seems to clarify some relationships among the New Caledonian Diospyros species, but the higher percentage of parsimony-informative characters in nrDNA compared with plastid DNA did not help to resolve the phylogenetic tree because the total number of variable sites was much lower than in the entire plastid genome. The geographical clustering of the individuals against a background of overall low sequence divergence could indicate transfer of plastid genomes due to hybridization and introgression following secondary contact. PMID:27098088
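
    The distinction drawn above between the percentage and the total number of parsimony-informative characters is easier to see with the definition in hand: an alignment column is parsimony-informative when at least two different character states each occur in at least two sequences. The sketch below applies that standard definition to a tiny hypothetical alignment; gaps and ambiguity codes are simply ignored, which is a simplifying assumption.

```python
from collections import Counter


def is_parsimony_informative(column: str) -> bool:
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(b for b in column.upper() if b in "ACGT")
    return sum(1 for c in counts.values() if c >= 2) >= 2


def informative_sites(alignment):
    """Return the 0-based indices of parsimony-informative columns."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    return [i for i, col in enumerate(zip(*alignment))
            if is_parsimony_informative("".join(col))]


# Hypothetical four-taxon alignment, five columns long.
alignment = ["ACGTA",
             "ACGTA",
             "ATGCA",
             "ATGCT"]
sites = informative_sites(alignment)
percent = 100 * len(sites) / len(alignment[0])
print(sites, percent)
```

    The last column is variable but not informative, because the state T occurs in only one sequence. A short marker can therefore have a high percentage of informative columns while still contributing few informative sites in total.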

  7. Kinetics and thermodynamics of exonuclease-deficient DNA polymerases

    NASA Astrophysics Data System (ADS)

    Gaspard, Pierre

    2016-04-01

    A kinetic theory is developed for exonuclease-deficient DNA polymerases, based on the experimental observation that the rates depend not only on the newly incorporated nucleotide, but also on the previous one, leading to the growth of Markovian DNA sequences from a Bernoullian template. The dependencies on nucleotide concentrations and template sequence are explicitly taken into account. In this framework, the kinetic and thermodynamic properties of DNA replication, in particular, the mean growth velocity, the error probability, and the entropy production are calculated analytically in terms of the rate constants and the concentrations. Theory is compared with numerical simulations for the DNA polymerases of T7 viruses and human mitochondria.

  8. Selection and Screening of DNA Aptamers for Inorganic Nanomaterials.

    PubMed

    Zhou, Yibo; Huang, Zhicheng; Yang, Ronghua; Liu, Juewen

    2018-02-21

    Searching for DNA sequences that can strongly and selectively bind to inorganic surfaces is a long-standing topic in bionanotechnology, analytical chemistry and biointerface research. This can be achieved either by aptamer selection starting with a very large library of ≈10^14 random DNA sequences, or by careful screening of a much smaller library (usually from a few to a few hundred) with rationally designed sequences. Unlike typical molecular targets, inorganic surfaces often have quite strong DNA adsorption affinities due to polyvalent binding and even chemical interactions. This leads to a very high background binding making aptamer selection difficult. Screening, on the other hand, can be designed to compare relative binding affinities of different DNA sequences and could be more appropriate for inorganic surfaces. The resulting sequences have been used for DNA-directed assembly, sorting of carbon nanotubes, and DNA-controlled growth of inorganic nanomaterials. It was recently discovered that poly-cytosine (C) DNA can strongly bind to a diverse range of nanomaterials including nanocarbons (graphene oxide and carbon nanotubes), various metal oxides and transition-metal dichalcogenides. In this Concept article, we articulate the need for screening and potential artifacts associated with traditional aptamer selection methods for inorganic surfaces. Representative examples of application are discussed, and a few future research opportunities are proposed towards the end of this article. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. DNA polymerase preference determines PCR priming efficiency.

    PubMed

    Pan, Wenjing; Byrne-Steele, Miranda; Wang, Chunlin; Lu, Stanley; Clemmons, Scott; Zahorchak, Robert J; Han, Jian

    2014-01-30

    Polymerase chain reaction (PCR) is one of the most important developments in modern biotechnology. However, PCR is known to introduce biases, especially during multiplex reactions. Recent studies have implicated the DNA polymerase as the primary source of bias, particularly initiation of polymerization on the template strand. In our study, amplification from a synthetic library containing a 12 nucleotide random portion was used to provide an in-depth characterization of DNA polymerase priming bias. The synthetic library was amplified with three commercially available DNA polymerases using an anchored primer with a random 3' hexamer end. After normalization, the next generation sequencing (NGS) results of the amplified libraries were directly compared to the unamplified synthetic library. Here, high throughput sequencing was used to systematically demonstrate and characterize DNA polymerase priming bias. We demonstrate that certain sequence motifs are preferred over others as primers where the six nucleotide sequences at the 3' end of the primer, as well as the sequences four base pairs downstream of the priming site, may influence priming efficiencies. DNA polymerases in the same family from two different commercial vendors prefer similar motifs, while another commercially available enzyme from a different DNA polymerase family prefers different motifs. Furthermore, the preferred priming motifs are GC-rich. The DNA polymerase preference for certain sequence motifs was verified by amplification from single-primer templates. We incorporated the observed DNA polymerase preference into a primer-design program that guides the placement of the primer to an optimal location on the template. DNA polymerase priming bias was characterized using a synthetic library amplification system and NGS. The characterization of DNA polymerase priming bias was then utilized to guide the primer-design process and demonstrate varying amplification efficiencies among three commercially available DNA polymerases. The results suggest that the interaction of the DNA polymerase with the primer:template junction during the initiation of DNA polymerization is very important in terms of overall amplification bias and has broader implications for both the primer design process and multiplex PCR.
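
    The comparison of amplified and unamplified libraries after normalization can be sketched as a per-motif enrichment score. This is only an illustration under stated assumptions: the abstract does not give the authors' normalization, so the log2 ratio, the pseudocount, the function name and the toy hexamers below are all hypothetical.

```python
from collections import Counter
from math import log2

def log2_enrichment(amplified, unamplified, pseudocount=1.0):
    """Log2 ratio of normalized motif frequencies (amplified / unamplified).

    Both inputs are lists of motif strings, for example the 3' hexamers
    observed in each library. The pseudocount avoids division by zero for
    motifs that are missing from one library (an assumption of this sketch).
    """
    amp, ref = Counter(amplified), Counter(unamplified)
    motifs = set(amp) | set(ref)
    amp_total = sum(amp.values()) + pseudocount * len(motifs)
    ref_total = sum(ref.values()) + pseudocount * len(motifs)
    return {
        m: log2(((amp[m] + pseudocount) / amp_total)
                / ((ref[m] + pseudocount) / ref_total))
        for m in motifs
    }

# Hypothetical toy libraries, not data from the study.
unamplified = ["GCGCGC", "GCGCGC", "ATATAT", "ATATAT"]
amplified = ["GCGCGC", "GCGCGC", "GCGCGC", "ATATAT"]
scores = log2_enrichment(amplified, unamplified)
for motif, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(motif, round(score, 3))
```

    Positive scores mark motifs that are over-represented after amplification, and negative scores mark motifs that are depleted.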

  10. Biosensing of BCR/ABL fusion gene using an intensity-interrogation surface plasmon resonance imaging system

    NASA Astrophysics Data System (ADS)

    Wu, Jiangling; Huang, Yu; Bian, Xintong; Li, DanDan; Cheng, Quan; Ding, Shijia

    2016-10-01

    In this work, a custom-made intensity-interrogation surface plasmon resonance imaging (SPRi) system has been developed to directly detect a specific sequence of the BCR/ABL fusion gene in chronic myelogenous leukemia (CML). The variation in the reflected light intensity detected from the sensor chip, which is composed of a gold island array, is proportional to the change of refractive index due to the selective hybridization of surface-bound DNA probes with target ssDNA. SPRi measurements were performed with different concentrations of a synthetic target DNA sequence. The calibration curve of the synthetic target sequence shows a good relationship between the concentration of synthetic target and the change of reflected light intensity. The detection limit of this SPRi measurement could approach 10.29 nM. By comparing SPRi images, the target ssDNA and a non-complementary DNA sequence can be distinguished. This SPRi system has been applied to the assay of the BCR/ABL fusion gene extracted from real samples. This nucleic acid-based SPRi biosensor therefore offers an alternative, highly effective, high-throughput, label-free tool for DNA detection in biomedical research and molecular diagnosis.

  11. DNA extraction for streamlined metagenomics of diverse environmental samples.

    PubMed

    Marotz, Clarisse; Amir, Amnon; Humphrey, Greg; Gaffney, James; Gogul, Grant; Knight, Rob

    2017-06-01

    A major bottleneck for metagenomic sequencing is rapid and efficient DNA extraction. Here, we compare the extraction efficiencies of three magnetic bead-based platforms (KingFisher, epMotion, and Tecan) to a standardized column-based extraction platform across a variety of sample types, including feces, oral, skin, soil, and water. Replicate sample plates were extracted and prepared for 16S rRNA gene amplicon sequencing in parallel to assess extraction bias and DNA quality. The data demonstrate that any effect of extraction method on sequencing results was small compared with the variability across samples; however, the KingFisher platform produced the largest number of high-quality reads in the shortest amount of time. Based on these results, we have identified an extraction pipeline that dramatically reduces sample processing time without sacrificing bacterial taxonomic or abundance information.

  12. TRX-LOGOS - a graphical tool to demonstrate DNA information content dependent upon backbone dynamics in addition to base sequence.

    PubMed

    Fortin, Connor H; Schulze, Katharina V; Babbitt, Gregory A

    2015-01-01

    It is now widely accepted that DNA sequences defining DNA-protein interactions functionally depend upon local biophysical features of the DNA backbone that are important in defining sites of binding interaction in the genome (e.g. DNA shape, charge and intrinsic dynamics). However, these physical features of the DNA polymer are not directly apparent when analyzing and viewing Shannon information content calculated at single nucleobases in a traditional sequence logo plot. Thus, sequence logo plots are severely limited in that they convey no explicit information regarding the structural dynamics of the DNA backbone, a feature often critical to binding specificity. We present TRX-LOGOS, an R software package and Perl wrapper code that interfaces the JASPAR database for computational regulatory genomics. TRX-LOGOS extends the traditional sequence logo plot to include Shannon information content calculated with regard to the dinucleotide-based BI-BII conformation shifts in phosphate linkages on the DNA backbone, thereby adding a visual measure of intrinsic DNA flexibility that can be critical for many DNA-protein interactions. TRX-LOGOS is available as an R graphics module offered at both SourceForge and as a download supplement at this journal. To demonstrate the general utility of TRX logo plots, we first calculated the information content for 416 Saccharomyces cerevisiae transcription factor binding sites functionally confirmed in the Yeastract database and matched to previously published yeast genomic alignments. We discovered that flanking regions contain significantly higher information content at phosphate linkages than can be observed at nucleobases. We also examined broader transcription factor classifications defined by the JASPAR database, and discovered that many general signatures of transcription factor binding are locally more information rich at the level of DNA backbone dynamics than nucleobase sequence. We used TRX-LOGOS in combination with MEGA 6.0 software for molecular evolutionary genetics analysis to visually compare the human Forkhead box/FOX protein evolution to its binding site evolution. We also compared the DNA binding signatures of human TP53 tumor suppressor determined by two different laboratory methods (SELEX and ChIP-seq). Further analysis of the entire yeast genome, center aligned at the start codon, also revealed a distinct sequence-independent 3 bp periodic pattern in information content, present only in coding regions, and perhaps indicative of the non-random organization of the genetic code. TRX-LOGOS is useful in any situation in which important information content in DNA can be better visualized at the positions of phosphate linkages (i.e. dinucleotides) where the dynamic properties of the DNA backbone function to facilitate DNA-protein interaction.
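
    For orientation, the conventional per-nucleobase quantity that a traditional sequence logo displays can be sketched as follows. This is the textbook calculation (maximum entropy of 2 bits for DNA minus the observed Shannon entropy of each column) without small-sample correction; it is not the TRX-LOGOS dinucleotide method, and the function name and example sites are hypothetical.

```python
from collections import Counter
from math import log2

def information_content(sites, alphabet="ACGT"):
    """Per-position information content in bits for aligned binding sites.

    Assumes equal-length, upper-case sequences in which every column
    contains at least one symbol from `alphabet`.
    """
    if len({len(s) for s in sites}) != 1:
        raise ValueError("all sites must have the same length")
    max_bits = log2(len(alphabet))  # 2 bits for the four DNA bases
    bits = []
    for column in zip(*sites):
        counts = Counter(column)
        total = sum(counts[b] for b in alphabet)
        # Shannon entropy of the observed base frequencies in this column.
        entropy = -sum(
            (counts[b] / total) * log2(counts[b] / total)
            for b in alphabet if counts[b]
        )
        bits.append(max_bits - entropy)
    return bits

# Hypothetical sites: position 0 is invariant, position 1 is uniform.
example_sites = ["AC", "AG", "AT", "AA"]
print(information_content(example_sites))
```

    An invariant column reaches the 2-bit maximum and a uniformly mixed column carries no information; a dinucleotide-level variant would apply the same idea to a property of adjacent base pairs instead of single bases.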

  13. Gene discovery in Eimeria tenella by immunoscreening cDNA expression libraries of sporozoites and schizonts with chicken intestinal antibodies.

    PubMed

    Réfega, Susana; Girard-Misguich, Fabienne; Bourdieu, Christiane; Péry, Pierre; Labbé, Marie

    2003-04-02

    Specific antibodies were produced ex vivo from intestinal culture of Eimeria tenella infected chickens. The specificity of these intestinal antibodies was tested against different parasite stages. These antibodies were used to immunoscreen first generation schizont and sporozoite cDNA libraries permitting the identification of new E. tenella antigens. We obtained a total of 119 cDNA clones which were subjected to sequence analysis. The sequences coding for the proteins inducing local immune responses were compared with nucleotide or protein databases and with expressed sequence tags (ESTs) databases. We identified new Eimeria genes coding for heat shock proteins, a ribosomal protein, a pyruvate kinase and a pyridoxine kinase. Specific features of other sequences are discussed.

  14. The kinetoplast DNA of the Australian trypanosome, Trypanosoma copemani, shares features with Trypanosoma cruzi and Trypanosoma lewisi.

    PubMed

    Botero, Adriana; Kapeller, Irit; Cooper, Crystal; Clode, Peta L; Shlomai, Joseph; Thompson, R C Andrew

    2018-05-17

    Kinetoplast DNA (kDNA) is the mitochondrial genome of trypanosomatids. It consists of a few dozen maxicircles and several thousand minicircles, all catenated topologically to form a two-dimensional DNA network. Minicircles are heterogeneous in size and sequence among species. They present one or several conserved regions that contain three highly conserved sequence blocks. CSB-1 (10 bp sequence) and CSB-2 (8 bp sequence) present lower interspecies homology, while CSB-3 (12 bp sequence) or the Universal Minicircle Sequence is conserved within most trypanosomatids. The Universal Minicircle Sequence is located at the replication origin of the minicircles, and is the binding site for the UMS binding protein, a protein involved in trypanosomatid survival and virulence. Here, we describe the structure and organisation of the kDNA of Trypanosoma copemani, a parasite that has been shown to infect mammalian cells and has been associated with the drastic decline of the endangered Australian marsupial, the woylie (Bettongia penicillata). Deep genomic sequencing showed that T. copemani presents two classes of minicircles that share sequence identity and organisation in the conserved sequence blocks with those of Trypanosoma cruzi and Trypanosoma lewisi. A 19,257 bp partial region of the maxicircle of T. copemani that contained the entire coding region was obtained. Comparative analysis of the T. copemani entire maxicircle coding region with the coding regions of T. cruzi and T. lewisi showed they share 71.05% and 71.28% identity, respectively. The shared features in the maxicircle/minicircle organisation and sequence between T. copemani and T. cruzi/T. lewisi suggest similarities in their process of kDNA replication, and are of significance in understanding the evolution of Australian trypanosomes. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.

  15. Genomic Heat Shock Element Sequences Drive Cooperative Human Heat Shock Factor 1 DNA Binding and Selectivity

    PubMed Central

    Jaeger, Alex M.; Makley, Leah N.; Gestwicki, Jason E.; Thiele, Dennis J.

    2014-01-01

    The heat shock transcription factor 1 (HSF1) activates expression of a variety of genes involved in cell survival, including protein chaperones, the protein degradation machinery, anti-apoptotic proteins, and transcription factors. Although HSF1 activation has been linked to amelioration of neurodegenerative disease, cancer cells exhibit a dependence on HSF1 for survival. Indeed, HSF1 drives a program of gene expression in cancer cells that is distinct from that activated in response to proteotoxic stress, and HSF1 DNA binding activity is elevated in cycling cells as compared with arrested cells. Active HSF1 homotrimerizes and binds to a DNA sequence consisting of inverted repeats of the pentameric sequence nGAAn, known as heat shock elements (HSEs). Recent comprehensive ChIP-seq experiments demonstrated that the architecture of HSEs is very diverse in the human genome, with deviations from the consensus sequence in the spacing, orientation, and extent of HSE repeats that could influence HSF1 DNA binding efficacy and the kinetics and magnitude of target gene expression. To understand the mechanisms that dictate binding specificity, HSF1 was purified as either a monomer or trimer and used to evaluate DNA-binding site preferences in vitro using fluorescence polarization and thermal denaturation profiling. These results were compared with quantitative chromatin immunoprecipitation assays in vivo. We demonstrate a role for specific orientations of extended HSE sequences in driving preferential HSF1 DNA binding to target loci in vivo. These studies provide a biochemical basis for understanding differential HSF1 target gene recognition and transcription in neurodegenerative disease and in cancer. PMID:25204655
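
    The description of HSEs as inverted repeats of the pentamer nGAAn can be turned into a simple scan for two adjacent inverted units. This is a minimal sketch, assuming that a minimal site is either nGAAnnTTCn or nTTCnnGAAn with no gap between units; the pattern, function name and test sequence are hypothetical, and the study itself shows that genomic HSEs deviate from the consensus in spacing, orientation and length.

```python
import re

# Two adjacent inverted pentamers: nGAAn + nTTCn, or nTTCn + nGAAn.
# The lookahead lets overlapping sites be reported.
HSE_DIMER = re.compile(
    r"(?=([ACGT]GAA[ACGT]{2}TTC[ACGT]|[ACGT]TTC[ACGT]{2}GAA[ACGT]))"
)

def find_hse_dimers(sequence):
    """Return (start, matched_site) for each candidate two-unit HSE."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in HSE_DIMER.finditer(sequence)]

# Hypothetical sequence containing three alternating units (GAA, TTC, GAA).
hits = find_hse_dimers("CCAGAACGTTCTAGAACCC")
for start, site in hits:
    print(start, site)
```

    Sites with three or more units show up as overlapping two-unit hits, so counting consecutive hits five bases apart gives a rough measure of HSE length.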

  16. Identification of tissue-embedded ascarid larvae by ribosomal DNA sequencing.

    PubMed

    Ishiwata, Kenji; Shinohara, Akio; Yagi, Kinpei; Horii, Yoichiro; Tsuchiya, Kimiyuki; Nawa, Yukifumi

    2004-01-01

    Polymerase chain reaction (PCR) was applied to identify tissue-embedded ascarid nematode larvae. Two sequences of the internal transcribed spacer (ITS) regions of ribosomal DNA (rDNA), ITS1 and ITS2, of the ascarid parasites were amplified and compared with those of ascarid-nematodes registered in a DNA database (GenBank). The ITS sequences of the PCR products obtained from the ascarid parasite specimen in our laboratory were compatible with those of registered adult Ascaris and Toxocara parasites. PCR amplification of the ITS regions was sensitive enough to detect a single larva of Ascaris suum mixed with porcine liver tissue. Using this method, ascarid larvae embedded in the liver of a naturally infected turkey were identified as Toxocara canis. These results suggest that even a single larva embedded in tissues from patients with larva migrans could be identified by sequencing the ITS regions.

  17. Phylogenetic position of the North American isolate of Pasteuria that parasitizes the soybean cyst nematode, Heterodera glycines, as inferred from 16S rDNA sequence analysis.

    PubMed

    Atibalentja, N; Noel, G R; Domier, L L

    2000-03-01

    A 1341 bp sequence of the 16S rDNA of an undescribed species of Pasteuria that parasitizes the soybean cyst nematode, Heterodera glycines, was determined and then compared with a homologous sequence of Pasteuria ramosa, a parasite of cladoceran water fleas of the family Daphnidae. The two Pasteuria sequences, which diverged from each other by a dissimilarity index of 7%, also were compared with the 16S rDNA sequences of 30 other bacterial species to determine the phylogenetic position of the genus Pasteuria among the Gram-positive eubacteria. Phylogenetic analyses using maximum-likelihood, maximum-parsimony and neighbour-joining methods showed that the Heterodera glycines-infecting Pasteuria and its sister species, P. ramosa, form a distinct line of descent within the Alicyclobacillus group of the Bacillaceae. These results are consistent with the view that the genus Pasteuria is a deeply rooted member of the Clostridium-Bacillus-Streptococcus branch of the Gram-positive eubacteria, neither related to the actinomycetes nor closely related to true endospore-forming bacteria.
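
    The abstract does not define its dissimilarity index. One common choice, shown here purely as an assumption, is the uncorrected p-distance: the fraction of aligned, gap-free positions at which two sequences differ. The function name and the toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of compared positions that differ between two aligned sequences.

    Positions where either sequence has a gap are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return differing / compared

# Hypothetical 10-base alignment with one mismatch.
print(p_distance("ACGTACGTAC", "ACGTACGTAA"))
```

    Under this definition, a 7% dissimilarity over a 1341 bp alignment would correspond to roughly 94 differing positions. Percent identity is simply 100 × (1 − p-distance).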

  18. Flow cytometry sorting of nuclei enables the first global characterization of Paramecium germline DNA and transposable elements.

    PubMed

    Guérin, Frédéric; Arnaiz, Olivier; Boggetto, Nicole; Denby Wilkes, Cyril; Meyer, Eric; Sperling, Linda; Duharcourt, Sandra

    2017-04-26

    DNA elimination is developmentally programmed in a wide variety of eukaryotes, including unicellular ciliates, and leads to the generation of distinct germline and somatic genomes. The ciliate Paramecium tetraurelia harbors two types of nuclei with different functions and genome structures. The transcriptionally inactive micronucleus contains the complete germline genome, while the somatic macronucleus contains a reduced genome streamlined for gene expression. During development of the somatic macronucleus, the germline genome undergoes massive and reproducible DNA elimination events. Availability of both the somatic and germline genomes is essential to examine the genome changes that occur during programmed DNA elimination and ultimately decipher the mechanisms underlying the specific removal of germline-limited sequences. We developed a novel experimental approach that uses flow cell imaging and flow cytometry to sort subpopulations of nuclei to high purity. We sorted vegetative micronuclei and macronuclei during development of P. tetraurelia. We validated the method by flow cell imaging and by high throughput DNA sequencing. Our work establishes the proof of principle that developing somatic macronuclei can be sorted from a complex biological sample to high purity based on their size, shape and DNA content. This method enabled us to sequence, for the first time, the germline DNA from pure micronuclei and to identify novel transposable elements. Sequencing the germline DNA confirms that the Pgm domesticated transposase is required for the excision of all ~45,000 Internal Eliminated Sequences. Comparison of the germline DNA and unrearranged DNA obtained from PGM-silenced cells reveals that the latter does not provide a faithful representation of the germline genome. We developed a flow cytometry-based method to purify P. tetraurelia nuclei to high purity and provided quality control with flow cell imaging and high throughput DNA sequencing. We identified 61 germline transposable elements including the first Paramecium retrotransposons. This approach paves the way to sequence the germline genomes of P. aurelia sibling species for future comparative genomic studies.

  19. Evaluation of next generation mtGenome sequencing using the Ion Torrent Personal Genome Machine (PGM)

    PubMed Central

    Parson, Walther; Strobl, Christina; Huber, Gabriela; Zimmermann, Bettina; Gomes, Sibylle M.; Souto, Luis; Fendt, Liane; Delport, Rhena; Langit, Reina; Wootton, Sharon; Lagacé, Robert; Irwin, Jodi

    2013-01-01

    Insights into the human mitochondrial phylogeny have been primarily achieved by sequencing full mitochondrial genomes (mtGenomes). In forensic genetics (partial) mtGenome information can be used to assign haplotypes to their phylogenetic backgrounds, which may, in turn, have characteristic geographic distributions that would offer useful information in a forensic case. In addition and perhaps even more relevant in the forensic context, haplogroup-specific patterns of mutations form the basis for quality control of mtDNA sequences. The current method for establishing (partial) mtDNA haplotypes is Sanger-type sequencing (STS), which is laborious, time-consuming, and expensive. With the emergence of Next Generation Sequencing (NGS) technologies, the body of available mtDNA data can potentially be extended much more quickly and cost-efficiently. Customized chemistries, laboratory workflows and data analysis packages could support the community and increase the utility of mtDNA analysis in forensics. We have evaluated the performance of mtGenome sequencing using the Personal Genome Machine (PGM) and compared the resulting haplotypes directly with conventional Sanger-type sequencing. A total of 64 mtGenomes (>1 million bases) were established that yielded high concordance with the corresponding STS haplotypes (<0.02% differences). About two-thirds of the differences were observed in or around homopolymeric sequence stretches. In addition, the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of alignment software would be desirable to facilitate the application of NGS in mtDNA forensic genetics. PMID:23948325

  20. Highly sensitive detection of mutations in CHO cell recombinant DNA using multi-parallel single molecule real-time DNA sequencing.

    PubMed

    Cartwright, Joseph F; Anderson, Karin; Longworth, Joseph; Lobb, Philip; James, David C

    2018-06-01

    High-fidelity replication of biologic-encoding recombinant DNA sequences by engineered mammalian cell cultures is an essential pre-requisite for the development of stable cell lines for the production of biotherapeutics. However, immortalized mammalian cells characteristically exhibit an increased point mutation frequency compared to mammalian cells in vivo, both across their genomes and at specific loci (hotspots). Thus unforeseen mutations in recombinant DNA sequences can arise and be maintained within producer cell populations. These may both affect the stability of recombinant gene expression and give rise to protein sequence variants with variable bioactivity and immunogenicity. Rigorous quantitative assessment of recombinant DNA integrity should therefore form part of the cell line development process and be an essential quality assurance metric for instances where synthetic/multi-component assemblies are utilized to engineer mammalian cells, such as the assessment of recombinant DNA fidelity or the mutability of single-site integration target loci. Based on Pacific Biosciences (Menlo Park, CA) single molecule real-time (SMRT™) circular consensus sequencing (CCS) technology, we developed an rDNA sequence analysis tool to process the multi-parallel sequencing of ∼40,000 single recombinant DNA molecules. After statistical filtering of raw sequencing data, we show that this analytical method is capable of detecting single point mutations in rDNA to a minimum single mutation frequency of 0.0042% (<1/24,000 bases). Using a stable CHO transfectant pool harboring a randomly integrated 5 kB plasmid construct encoding GFP, we found that 28% of recombinant plasmid copies contained at least one low frequency (<0.3%) point mutation. These mutations were predominantly found in GC base pairs (85%), and there was no positional bias in mutation across the plasmid sequence. There was no discernible difference between the mutation frequencies of coding and non-coding DNA. The putative ratio of non-synonymous and synonymous changes within the open reading frames (ORFs) in the plasmid sequence indicates that natural selection does not impact upon the prevalence of these mutations. Here we have demonstrated the abundance of mutations that fall outside of the reported range of detection of next generation sequencing (NGS) and second generation sequencing (SGS) platforms, providing a methodology capable of being utilized in cell line development platforms to identify the fidelity of recombinant genes throughout the production process. © 2018 Wiley Periodicals, Inc.
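
    The two ways the abstract states the detection floor (0.0042% and about 1 in 24,000 bases) can be related with a short conversion. The sketch below uses only the figure quoted in the abstract; the variable names are illustrative.

```python
# Detection floor quoted in the abstract, as a percentage of bases.
min_mutation_frequency_percent = 0.0042

# Convert the percentage to a fraction, then invert it to get "one in N bases".
fraction = min_mutation_frequency_percent / 100
bases_per_mutation = 1 / fraction

print(f"about 1 mutation per {bases_per_mutation:,.0f} bases")
```

    The result is close to 23,800 bases, which matches the rounded 1/24,000 figure given in the abstract.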

  1. RAD tag sequencing as a source of SNP markers in Cynara cardunculus L

    PubMed Central

    2012-01-01

    Background The globe artichoke (Cynara cardunculus L. var. scolymus) genome is relatively poorly explored, especially compared to those of the other major Asteraceae crops sunflower and lettuce. No SNP markers are in the public domain. We have combined the recently developed restriction-site associated DNA (RAD) approach with the Illumina DNA sequencing platform to effect the rapid and mass discovery of SNP markers for C. cardunculus. Results RAD tags were sequenced from the genomic DNA of three C. cardunculus mapping population parents, generating 9.7 million reads, corresponding to ~1 Gbp of sequence. An assembly based on paired ends produced ~6.0 Mbp of genomic sequence, separated into ~19,000 contigs (mean length 312 bp), of which ~21% were fragments of putative coding sequence. The shared sequences allowed for the discovery of ~34,000 SNPs and nearly 800 indels, equivalent to a SNP frequency of 5.6 per 1,000 nt, and an indel frequency of 0.2 per 1,000 nt. A sample of heterozygous SNP loci was mapped by CAPS assays and this exercise provided validation of our mining criteria. The repetitive fraction of the genome had a high representation of retrotransposon sequence, followed by simple repeats, AT-low complexity regions and mobile DNA elements. The genomic k-mer distribution and CpG rate of C. cardunculus, compared with data derived from three whole-genome-sequenced dicot species, provided further evidence of the random representation of the C. cardunculus genome generated by RAD sampling. Conclusion The RAD tag sequencing approach is a cost-effective and rapid method to develop SNP markers in a highly heterozygous species. Our approach made it possible to generate a large and robust SNP dataset through the adoption of optimized filtering criteria. PMID:22214349
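
    The assembly and SNP figures in the abstract can be cross-checked with simple arithmetic. The sketch below uses only the rounded values quoted above (all marked "~" in the abstract), so small differences from the reported rates are expected; the variable names are illustrative.

```python
# Rounded figures quoted in the abstract.
contigs = 19_000
mean_contig_length_bp = 312
snps = 34_000
assembly_bp = 6.0e6  # ~6.0 Mbp of assembled genomic sequence

# Total assembled length implied by contig count and mean contig length.
assembled_bp = contigs * mean_contig_length_bp

# SNP density expressed per 1,000 nucleotides.
snps_per_kb = snps / assembly_bp * 1_000

print(f"{assembled_bp / 1e6:.2f} Mbp assembled, {snps_per_kb:.1f} SNPs per 1,000 nt")
```

    The contig figures imply about 5.9 Mbp, in line with the stated ~6.0 Mbp, and the SNP density comes out near the reported 5.6 per 1,000 nt given the rounding of the inputs.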

  2. Characterisation and In Silico Analysis of Interleukin-4 cDNA of Nilgai (Boselaphus tragocamelus) and Indian Buffalo (Bubalus bubalis)

    PubMed Central

    Saini, M.; Palai, T. K.; Das, D. K.; Hatle, K. M.; Gupta, P. K.

    2013-01-01

    Interleukin-4 (IL-4) produced from Th2 cells modulates both innate and adaptive immune responses. It is a common belief that wild animals possess better immunity against diseases than domestic and laboratory animals; however, the immune system of wild animals is not fully explored yet. Therefore, a comparative study was designed to explore the wildlife immunity through characterisation of IL-4 cDNA of nilgai, a wild ruminant, and Indian buffalo, a domestic ruminant. Total RNA was extracted from peripheral blood mononuclear cells of nilgai and Indian buffalo and reverse transcribed into cDNA. Respective cDNA was further cloned and sequenced. Sequences were analysed in silico and compared with their homologues available at GenBank. The deduced 135 amino acid protein of nilgai IL-4 is 95.6% similar to that of Indian buffalo. N-linked glycosylation sequence, leader sequence, Cysteine residues in the signal peptide region, and 3′ UTR of IL-4 were found to be conserved across species. Six nonsynonymous nucleotide substitutions were found in Indian buffalo compared to nilgai amino acid sequence. Tertiary structure of this protein in both species was modeled, and it was found that this protein falls under 4-helical cytokines superfamily and short chain cytokine family. Phylogenetic analysis revealed a single cluster of ruminants including both nilgai and Indian buffalo that was placed distinct from other nonruminant mammals. PMID:24348167

  3. The full mitochondrial genome sequence of Raillietina tetragona from chicken (Cestoda: Davaineidae).

    PubMed

    Liang, Jian-Ying; Lin, Rui-Qing

    2016-11-01

    In the present study, the complete mitochondrial DNA (mtDNA) sequence of Raillietina tetragona was sequenced and its gene content and genome organization were compared with those of other tapeworms. The complete mt genome sequence of R. tetragona is 14,444 bp in length. It contains 12 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and two non-coding regions. All genes are transcribed in the same direction and have a nucleotide composition high in A and T. The A + T content of the complete mt genome is 71.4% for R. tetragona. The R. tetragona mt genome sequence provides a novel mtDNA marker for studying the molecular epidemiology and population genetics of Raillietina and has implications for the molecular diagnosis of chicken cestodosis caused by Raillietina.
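
    An A + T content such as the 71.4% reported above is the share of A and T among the unambiguous bases of a sequence. A minimal sketch follows; the function name and the toy sequence are hypothetical, and ambiguity codes are simply excluded from the denominator, which is an assumption of this sketch.

```python
def at_content_percent(sequence):
    """Percentage of A and T among the unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    at = sum(sequence.count(base) for base in "AT")
    gc = sum(sequence.count(base) for base in "GC")
    if at + gc == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * at / (at + gc)

# Hypothetical toy sequence, not the R. tetragona genome.
print(round(at_content_percent("AATTAATGCA"), 1))
```

    Applied to a full mitochondrial genome read from a FASTA file, the same function would return the genome-wide A + T percentage.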

  4. Analysis of sequence variability in the macronuclear DNA of Paramecium tetraurelia: A somatic view of the germline

    PubMed Central

    Duret, Laurent; Cohen, Jean; Jubin, Claire; Dessen, Philippe; Goût, Jean-François; Mousset, Sylvain; Aury, Jean-Marc; Jaillon, Olivier; Noël, Benjamin; Arnaiz, Olivier; Bétermier, Mireille; Wincker, Patrick; Meyer, Eric; Sperling, Linda

    2008-01-01

    Ciliates are the only unicellular eukaryotes known to separate germinal and somatic functions. Diploid but silent micronuclei transmit the genetic information to the next sexual generation. Polyploid macronuclei express the genetic information from a streamlined version of the genome but are replaced at each sexual generation. The macronuclear genome of Paramecium tetraurelia was recently sequenced by a shotgun approach, providing access to the gene repertoire. The 72-Mb assembly represents a consensus sequence for the somatic DNA, which is produced after sexual events by reproducible rearrangements of the zygotic genome involving elimination of repeated sequences, precise excision of unique-copy internal eliminated sequences (IES), and amplification of the cellular genes to high copy number. We report use of the shotgun sequencing data (>10^6 reads representing 13× coverage of a completely homozygous clone) to evaluate variability in the somatic DNA produced by these developmental genome rearrangements. Although DNA amplification appears uniform, both of the DNA elimination processes produce sequence heterogeneity. The variability that arises from IES excision allowed identification of hundreds of putative new IESs, compared to 42 that were previously known, and revealed cases of erroneous excision of segments of coding sequences. We demonstrate that IESs in coding regions are under selective pressure to introduce premature termination of translation in case of excision failure. PMID:18256234

  5. An ancient trans-kingdom horizontal transfer of Penelope -like retroelements from arthropods to conifers

    Treesearch

    Xuan Lin; Nurul Faridi; Claudio Casola

    2016-01-01

    Comparative genomics analyses empowered by the wealth of sequenced genomes have revealed numerous instances of horizontal DNA transfers between distantly related species. In eukaryotes, repetitive DNA sequences known as transposable elements (TEs) are especially prone to move across species boundaries. Such horizontal transposon transfers, or HTTs, are relatively ...

  6. Comparing COI and ITS as DNA barcode markers for mushrooms and allies (Agaricomycotina).

    PubMed

    Dentinger, Bryn T M; Didukh, Maryna Y; Moncalvo, Jean-Marc

    2011-01-01

    DNA barcoding is an approach to rapidly identify species using short, standard genetic markers. The mitochondrial cytochrome oxidase I gene (COI) has been proposed as the universal barcode locus, but its utility for barcoding in mushrooms (ca. 20,000 species) has not been established. We succeeded in generating 167 partial COI sequences (~450 bp) representing ~100 morphospecies from ~650 collections of Agaricomycotina using several sets of new primers. Large introns (~1500 bp) at variable locations were detected in ~5% of the sequences we obtained. We suspect that widespread presence of large introns is responsible for our low PCR success (~30%) with this locus. We also sequenced the nuclear internal transcribed spacer rDNA regions (ITS) to compare with COI. Among the small proportion of taxa for which COI could be sequenced, COI and ITS perform similarly as a barcode. However, in a densely sampled set of closely related taxa, COI was less divergent than ITS and failed to distinguish all terminal clades. Given our results and the wealth of ITS data already available in public databases, we recommend that COI be abandoned in favor of ITS as the primary DNA barcode locus in mushrooms.

  7. Comparing COI and ITS as DNA Barcode Markers for Mushrooms and Allies (Agaricomycotina)

    PubMed Central

    Dentinger, Bryn T. M.; Didukh, Maryna Y.; Moncalvo, Jean-Marc

    2011-01-01

    DNA barcoding is an approach to rapidly identify species using short, standard genetic markers. The mitochondrial cytochrome oxidase I gene (COI) has been proposed as the universal barcode locus, but its utility for barcoding in mushrooms (ca. 20,000 species) has not been established. We succeeded in generating 167 partial COI sequences (∼450 bp) representing ∼100 morphospecies from ∼650 collections of Agaricomycotina using several sets of new primers. Large introns (∼1500 bp) at variable locations were detected in ∼5% of the sequences we obtained. We suspect that widespread presence of large introns is responsible for our low PCR success (∼30%) with this locus. We also sequenced the nuclear internal transcribed spacer rDNA regions (ITS) to compare with COI. Among the small proportion of taxa for which COI could be sequenced, COI and ITS perform similarly as a barcode. However, in a densely sampled set of closely related taxa, COI was less divergent than ITS and failed to distinguish all terminal clades. Given our results and the wealth of ITS data already available in public databases, we recommend that COI be abandoned in favor of ITS as the primary DNA barcode locus in mushrooms. PMID:21966418

  8. Efficient isolation method for high-quality genomic DNA from cicada exuviae.

    PubMed

    Nguyen, Hoa Quynh; Kim, Ye Inn; Borzée, Amaël; Jang, Yikweon

    2017-10-01

    In recent years, animal ethics issues have led researchers to explore nondestructive methods to access materials for genetic studies. Cicada exuviae are among those materials because they are cast skins that individuals leave behind after molting and are easily collected. In this study, we aim to identify the most efficient extraction method to obtain DNA of high quantity and quality from cicada exuviae. We compared relative DNA yield and purity of six extraction protocols, including both manual protocols and available commercial kits, extracting from four different exoskeleton parts. Furthermore, amplification and sequencing of genomic DNA were evaluated in terms of whether sequences were obtained at the expected genomic size. Both the choice of protocol and exuvia part significantly affected DNA yield and purity. Only samples that were extracted using the PowerSoil DNA Isolation kit generated gel bands of expected size as well as successful sequencing results. The failed attempts to extract DNA using other protocols could be explained partly by a low DNA yield from cicada exuviae and partly by contamination with humic acids that exist in the soil where cicada nymphs reside before emergence, as shown by spectroscopic measurements. Genomic DNA extracted from cicada exuviae could provide valuable information for species identification, allowing the investigation of genetic diversity across consecutive broods, or spatiotemporal variation among various populations. Consequently, we hope to provide a simple method to acquire pure genomic DNA applicable for multiple research purposes.

  9. A simple, rapid, high-fidelity and cost-effective PCR-based two-step DNA synthesis method for long gene sequences.

    PubMed

    Xiong, Ai-Sheng; Yao, Quan-Hong; Peng, Ri-He; Li, Xian; Fan, Hui-Qin; Cheng, Zong-Ming; Li, Yi

    2004-07-07

    Chemical synthesis of DNA sequences provides a powerful tool for modifying genes and for studying gene function, structure and expression. Here, we report a simple, high-fidelity and cost-effective PCR-based two-step DNA synthesis (PTDS) method for synthesis of long segments of DNA. The method involves two steps. (i) Synthesis of individual fragments of the DNA of interest: ten to twelve 60mer oligonucleotides with 20 bp overlap are mixed and a PCR reaction is carried out with high-fidelity DNA polymerase Pfu to produce DNA fragments that are approximately 500 bp in length. (ii) Synthesis of the entire sequence of the DNA of interest: five to ten PCR products from the first step are combined and used as the template for a second PCR reaction using high-fidelity DNA polymerase pyrobest, with the two outermost oligonucleotides as primers. Compared with the previously published methods, the PTDS method is rapid (5-7 days) and suitable for synthesizing long segments of DNA (5-6 kb) with high G + C contents, repetitive sequences or complex secondary structures. Thus, the PTDS method provides an alternative tool for synthesizing and assembling long genes with complex structures. Using the newly developed PTDS method, we have successfully obtained several genes of interest with sizes ranging from 1.0 to 5.4 kb.
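
    Step (i) of the method above relies on overlapping oligonucleotides: twelve 60mers that each overlap their neighbour by 20 bp span 60 + 11 × 40 = 500 bp, which matches the fragment size quoted in the abstract. The sketch below illustrates that tiling arithmetic only. The function names are hypothetical, and the alternation of sense and antisense oligos is an assumption borrowed from common PCR-assembly practice; the abstract does not describe the authors' design rules.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tile_oligos(target: str, oligo_len: int = 60, overlap: int = 20) -> list[str]:
    """Split target into oligos of oligo_len that overlap neighbours by overlap bp.

    Even-numbered oligos are taken from the sense strand and odd-numbered
    oligos from the antisense strand (an assumption, see the lead-in).
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap  # new bases contributed by each extra oligo
    oligos = []
    for index, start in enumerate(range(0, len(target) - overlap, step)):
        piece = target[start:start + oligo_len]  # the last piece may be shorter
        oligos.append(piece if index % 2 == 0 else reverse_complement(piece))
    return oligos


# Made-up 500 bp target for illustration only.
toy_target = "ACGTTGCATG" * 50
toy_oligos = tile_oligos(toy_target)
print(len(toy_oligos), "oligos of length", len(toy_oligos[0]))
```

    Real designs would also balance the melting temperatures of the overlaps, which this sketch ignores.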

  10. Quantitation of next generation sequencing library preparation protocol efficiencies using droplet digital PCR assays - a systematic comparison of DNA library preparation kits for Illumina sequencing.

    PubMed

    Aigrain, Louise; Gu, Yong; Quail, Michael A

    2016-06-13

    The emergence of next-generation sequencing (NGS) technologies in the past decade has allowed the democratization of DNA sequencing both in terms of price per sequenced base and ease of producing DNA libraries. When it comes to preparing DNA sequencing libraries for Illumina, the current market leader, a plethora of kits are available and it can be difficult for users to determine which kit is the most appropriate and efficient for their applications; the main concerns being not only cost but also minimal bias, yield and time efficiency. We compared 9 commercially available library preparation kits in a systematic manner using the same DNA sample by probing the amount of DNA remaining after each protocol step using a new droplet digital PCR (ddPCR) assay. This method allows the precise quantification of fragments bearing either adaptors or P5/P7 sequences on both ends just after ligation or PCR enrichment. We also investigated the potential influence of DNA input and DNA fragment size on the final library preparation efficiency. The overall library preparation efficiencies show important variations between the different kits, with those combining several steps into a single one exhibiting final yields 4 to 7 times higher than the other kits. Detailed ddPCR data also reveal that the adaptor ligation yield itself varies by more than a factor of 10 between kits, certain ligation efficiencies being so low that they could impair the original library complexity and impoverish the sequencing results. When a PCR enrichment step is necessary, lower adaptor-ligated DNA inputs lead to greater amplification yields, hiding the latent disparity between kits. We describe a ddPCR assay that allows us to probe the efficiency of the most critical step in the library preparation, ligation, and to draw conclusions on which kits are more likely to preserve the sample heterogeneity and reduce the need for amplification.

  11. HYBRIDIZATION PROPERTIES OF DNA SEQUENCES DIRECTING THE SYNTHESIS OF MESSENGER RNA AND HETEROGENEOUS NUCLEAR RNA

    PubMed Central

    Greenberg, Jay R.; Perry, Robert P.

    1971-01-01

    The relationship of the DNA sequences from which polyribosomal messenger RNA (mRNA) and heterogeneous nuclear RNA (NRNA) of mouse L cells are transcribed was investigated by means of hybridization kinetics and thermal denaturation of the hybrids. Hybridization was performed in formamide solutions at DNA excess. Under these conditions most of the hybridizing mRNA and NRNA react at values of D0t (DNA concentration multiplied by time) expected for RNA transcribed from the nonrepeated or rarely repeated fraction of the genome. However, a fraction of both mRNA and NRNA hybridize at values of D0t about 10,000 times lower, and therefore must be transcribed from highly redundant DNA sequences. The fraction of NRNA hybridizing to highly repeated sequences is about 1.7 times greater than the corresponding fraction of mRNA. The hybrids formed by the rapidly reacting fractions of both NRNA and mRNA melt over a narrow temperature range with a midpoint about 11°C below that of native L cell DNA. This indicates that these hybrids consist of partially complementary sequences with approximately 11% mismatching of bases. Hybrids formed by the slowly reacting fraction of NRNA melt within 4°–6°C of native DNA, indicating very little, if any, mismatching of bases. Hybrids of the slowly reacting components of mRNA, formed under conditions of sufficiently low RNA input, have a high thermal stability, similar to that observed for hybrids of the slowly reacting NRNA component. However, when higher inputs of mRNA are used, hybrids are formed which have a strikingly lower thermal stability. This observation can be explained by assuming that there is sufficient similarity among the relatively rare DNA sequences coding for mRNA so that under hybridization conditions, in which these DNA sequences are not truly in excess, reversible hybrids exhibiting a considerable amount of mispairing are formed. The fact that a comparable phenomenon has not been observed for NRNA may mean that there is less similarity among the relatively rare DNA sequences coding for NRNA than there is among the rare sequences coding for mRNA. PMID:4999767

  12. Global DNA methylation analysis using methyl-sensitive amplification polymorphism (MSAP).

    PubMed

    Yaish, Mahmoud W; Peng, Mingsheng; Rothstein, Steven J

    2014-01-01

    DNA methylation is a crucial epigenetic process which helps control gene transcription activity in eukaryotes. Information regarding the methylation status of a regulatory sequence of a particular gene provides important knowledge of this transcriptional control. DNA methylation can be detected using several methods, including sodium bisulfite sequencing and restriction digestion using methylation-sensitive endonucleases. Methyl-Sensitive Amplification Polymorphism (MSAP) is a technique used to study the global DNA methylation status of an organism and hence to distinguish between two individuals based on the DNA methylation status determined by the differential digestion pattern. Therefore, this technique is a useful method for DNA methylation mapping and positional cloning of differentially methylated genes. In this technique, genomic DNA is first digested with a methylation-sensitive restriction enzyme such as HpaII, and then the DNA fragments are ligated to adaptors in order to facilitate their amplification. Digestion using a methylation-insensitive isoschizomer of HpaII, MspI, is used in a parallel digestion reaction as a loading control in the experiment. Subsequently, these fragments are selectively amplified by fluorescently labeled primers. PCR products from different individuals are compared, and once an interesting polymorphic locus is recognized, the desired DNA fragment can be isolated from a denaturing polyacrylamide gel, sequenced and identified based on DNA sequence similarity to other sequences available in the database. We will use analysis of met1, ddm1, and atmbd9 mutants and wild-type plants treated with a cytidine analogue, 5-azaC, or zebularine to demonstrate how to assess the genetic modulation of DNA methylation in Arabidopsis. It should be noted that despite the fact that MSAP is a reliable technique used to fish for polymorphic methylated loci, its power is limited to the restriction recognition sites of the enzymes used in the genomic DNA digestion.

  13. Molecular structure and chromosome distribution of three repetitive DNA families in Anemone hortensis L. (Ranunculaceae).

    PubMed

    Mlinarec, Jelena; Chester, Mike; Siljak-Yakovlev, Sonja; Papes, Drazena; Leitch, Andrew R; Besendorfer, Visnja

    2009-01-01

    The structure, abundance and location of repetitive DNA sequences on chromosomes can characterize the nature of higher plant genomes. Here we report on three new repeat DNA families isolated from Anemone hortensis L.; (i) AhTR1, a family of satellite DNA (stDNA) composed of a 554-561 bp long EcoRV monomer; (ii) AhTR2, a stDNA family composed of a 743 bp long HindIII monomer and; (iii) AhDR, a repeat family composed of a 945 bp long HindIII fragment that exhibits some sequence similarity to Ty3/gypsy-like retroelements. Fluorescence in-situ hybridization (FISH) to metaphase chromosomes of A. hortensis (2n = 16) revealed that both AhTR1 and AhTR2 sequences co-localized with DAPI-positive AT-rich heterochromatic regions. AhTR1 sequences occur at intercalary DAPI bands while AhTR2 sequences occur at 8-10 terminally located heterochromatic blocks. In contrast AhDR sequences are dispersed over all chromosomes as expected of a Ty3/gypsy-like element. AhTR2 and AhTR1 repeat families include polyA- and polyT-tracks, AT/TA-motifs and a pentanucleotide sequence (CAAAA) that may have consequences for chromatin packing and sequence homogeneity. AhTR2 repeats also contain TTTAGGG motifs and degenerate variants. We suggest that they arose by interspersion of telomeric repeats with subtelomeric repeats, before hybrid unit(s) amplified through the heterochromatic domain. The three repetitive DNA families together occupy approximately 10% of the A. hortensis genome. Comparative analyses of eight Anemone species revealed that the divergence of the A. hortensis genome was accompanied by considerable modification and/or amplification of repeats.

  14. Classification of European Mtdnas from an Analysis of Three European Populations

    PubMed Central

    Torroni, A.; Huoponen, K.; Francalacci, P.; Petrozzi, M.; Morelli, L.; Scozzari, R.; Obinu, D.; Savontaus, M. L.; Wallace, D. C.

    1996-01-01

    Mitochondrial DNA (mtDNA) sequence variation was examined in Finns, Swedes and Tuscans by PCR amplification and restriction analysis. About 99% of the mtDNAs were subsumed within 10 mtDNA haplogroups (H, I, J, K, M, T, U, V, W, and X) suggesting that the identified haplogroups could encompass virtually all European mtDNAs. Because both hypervariable segments of the mtDNA control region were previously sequenced in the Tuscan samples, the mtDNA haplogroups and control region sequences could be compared. Using a combination of haplogroup-specific restriction site changes and control region nucleotide substitutions, the distribution of the haplogroups was surveyed through the published restriction site polymorphism and control region sequence data of Caucasoids. This supported the conclusion that most haplogroups observed in Europe are Caucasoid-specific, and that at least some of them occur at varying frequencies in different Caucasoid populations. The classification of almost all European mtDNA variation in a number of well defined haplogroups could provide additional insights about the origin and relationships of Caucasoid populations and the process of human colonization of Europe, and is valuable for the definition of the role played by mtDNA backgrounds in the expression of pathological mtDNA mutations. PMID:8978068

  15. Bypassing bacterial infection in phage display by sequencing DNA released from phage particles.

    PubMed

    Villequey, Camille; Kong, Xu-Dong; Heinis, Christian

    2017-11-01

    Phage display relies on a bacterial infection step in which the phage particles are replicated to perform multiple affinity selection rounds and to enable the identification of isolated clones by DNA sequencing. While this process is efficient for wild-type phage, the bacterial infection rate of phage with mutant or chemically modified coat proteins can be low. For example, a phage mutant with a disulfide-free p3 coat protein, used for the selection of bicyclic peptides, has a more than 100-fold reduced infection rate compared to the wild-type. A potential strategy for bypassing the bacterial infection step is to directly sequence DNA extracted from phage particles after a single round of phage panning using high-throughput sequencing. In this work, we have quantified the fraction of phage clones that can be identified by directly sequencing DNA from phage particles. The results show that the DNA of essentially all of the phage particles can be 'decoded', and that the sequence coverage for mutants equals that of amplified DNA extracted from cells infected with wild-type phage. This procedure is particularly attractive for selections with phage that have a compromised infection capacity, and it may allow phage display to be performed with particles that are not infective at all. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. Analysis on the DNA Fingerprinting of Aspergillus Oryzae Mutant Induced by High Hydrostatic Pressure

    NASA Astrophysics Data System (ADS)

    Wang, Hua; Zhang, Jian; Yang, Fan; Wang, Kai; Shen, Si-Le; Liu, Bing-Bing; Zou, Bo; Zou, Guang-Tian

    2011-01-01

    The mutant strains of Aspergillus oryzae (HP300a) are screened under 300 MPa for 20 min. Compared with the control strains, the screened mutant strains have unique properties such as genetic stability, rapid growth, abundant spores, and high protease activity. Random amplified polymorphic DNA (RAPD) and inter simple sequence repeats (ISSR) are used to analyze the DNA fingerprinting of HP300a and the control strains. There are 67.9% and 51.3% polymorphic bands obtained by these two markers, respectively, indicating significant genetic variations between HP300a and the control strains. In addition, the genetic distances between HP300a and the control strains based on the random-sequence (RAPD) and simple-sequence-repeat (ISSR) markers are 0.51 and 0.34, respectively.

  17. A novel method of genomic DNA extraction for Cactaceae

    PubMed Central

    Fehlberg, Shannon D.; Allen, Jessica M.; Church, Kathleen

    2013-01-01

    • Premise of the study: Genetic studies of Cactaceae can at times be impeded by difficult sampling logistics and/or high mucilage content in tissues. Simplifying sampling and DNA isolation through the use of cactus spines has not previously been investigated. • Methods and Results: Several protocols for extracting DNA from spines were tested and modified to maximize yield, amplification, and sequencing. Sampling of and extraction from spines resulted in a simplified protocol overall and complete avoidance of mucilage as compared to typical tissue extractions. Sequences from one nuclear and three plastid regions were obtained across eight genera and 20 species of cacti using DNA extracted from spines. • Conclusions: Genomic DNA useful for amplification and sequencing can be obtained from cactus spines. The protocols described here are valuable for any cactus species, but are particularly useful for investigators interested in sampling living collections, extensive field sampling, and/or conservation genetic studies. PMID:25202521

  18. A Bayesian hierarchical model to detect differentially methylated loci from single nucleotide resolution sequencing data

    PubMed Central

    Feng, Hao; Conneely, Karen N.; Wu, Hao

    2014-01-01

    DNA methylation is an important epigenetic modification that has essential roles in cellular processes including gene regulation, development and disease and is widely dysregulated in most types of cancer. Recent advances in sequencing technology have enabled the measurement of DNA methylation at single nucleotide resolution through methods such as whole-genome bisulfite sequencing and reduced representation bisulfite sequencing. In DNA methylation studies, a key task is to identify differences under distinct biological contexts, for example, between tumor and normal tissue. A challenge in sequencing studies is that the number of biological replicates is often limited by the costs of sequencing. The small number of replicates leads to unstable variance estimation, which can reduce accuracy to detect differentially methylated loci (DML). Here we propose a novel statistical method to detect DML when comparing two treatment groups. The sequencing counts are described by a lognormal-beta-binomial hierarchical model, which provides a basis for information sharing across different CpG sites. A Wald test is developed for hypothesis testing at each CpG site. Simulation results show that the proposed method yields improved DML detection compared to existing methods, particularly when the number of replicates is low. The proposed method is implemented in the Bioconductor package DSS. PMID:24561809

  19. Next generation sequencing of DNA-launched Chikungunya vaccine virus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hidajat, Rachmat; Nickols, Brian; Forrester, Naomi

    Chikungunya virus (CHIKV) represents a pandemic threat with no approved vaccine available. Recently, we described a novel vaccination strategy based on iDNA® infectious clone designed to launch a live-attenuated CHIKV vaccine from plasmid DNA in vitro or in vivo. As a proof of concept, we prepared iDNA plasmid pCHIKV-7 encoding the full-length cDNA of the 181/25 vaccine. The DNA-launched CHIKV-7 virus was prepared and compared to the 181/25 virus. Illumina HiSeq2000 sequencing revealed that with the exception of the 3′ untranslated region, CHIKV-7 viral RNA consistently showed a lower frequency of single-nucleotide polymorphisms than the 181/25 RNA including at the E2-12 and E2-82 residues previously identified as attenuating mutations. In the CHIKV-7, frequencies of reversions at E2-12 and E2-82 were 0.064% and 0.086%, while in the 181/25, frequencies were 0.179% and 0.133%, respectively. We conclude that the DNA-launched virus has a reduced probability of reversion mutations, thereby enhancing vaccine safety. - Highlights: • Chikungunya virus (CHIKV) is an emerging pandemic threat. • In vivo DNA-launched attenuated CHIKV is a novel vaccine technology. • DNA-launched virus was sequenced using HiSeq2000 and compared to the 181/25 virus. • DNA-launched virus has lower frequency of SNPs at E2-12 and E2-82 attenuation loci.

  20. The genetic diversity of Epstein-Barr virus in the setting of transplantation relative to non-transplant settings: A feasibility study.

    PubMed

    Allen, Upton D; Hu, Pingzhao; Pereira, Sergio L; Robinson, Joan L; Paton, Tara A; Beyene, Joseph; Khodai-Booran, Nasser; Dipchand, Anne; Hébert, Diane; Ng, Vicky; Nalpathamkalam, Thomas; Read, Stanley

    2016-02-01

    This study examines EBV strains from transplant patients and patients with IM by sequencing major EBV genes. We also used NGS to detect EBV DNA within total genomic DNA, and to evaluate its genetic variation. Sanger sequencing of major EBV genes was used to compare SNVs from samples taken from transplant patients vs. patients with IM. We sequenced EBV DNA from a healthy EBV-seropositive individual on a HiSeq 2000 instrument. Data were mapped to the EBV reference genomes (AG876 and B95-8). The number of EBNA2 SNVs was higher than for EBNA1 and the other genes sequenced within comparable reference coordinates. For EBNA2, there was a median of 15 SNVs among transplant samples compared with 10 among IM samples (p = 0.036). EBNA1 showed little variation between samples. For NGS, we identified 640 and 892 variants at an unadjusted p value of 5 × 10^-8 for AG876 and B95-8 genomes, respectively. We used complementary sequence strategies to examine EBV genetic diversity and its application to transplantation. The results provide the framework for further characterization of EBV strains and related outcomes after organ transplantation. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  1. 16S rDNA-based metagenomic analysis of dental plaque and lung bacteria in patients with severe acute exacerbations of chronic obstructive pulmonary disease.

    PubMed

    Tan, L; Wang, H; Li, C; Pan, Y

    2014-12-01

    Acute exacerbations of chronic obstructive pulmonary disease (AE-COPD) are leading causes of mortality in hospital intensive care units. We sought to determine whether dental plaque biofilms might harbor pathogenic bacteria that can eventually cause lung infections in patients with severe AE-COPD. Paired samples of subgingival plaque biofilm and tracheal aspirate were collected from 53 patients with severe AE-COPD. Total bacterial DNA was extracted from each sample individually for polymerase chain reaction amplification and/or generation of bacterial 16S rDNA sequences and cDNA libraries. We used a metagenomic approach, based on bacterial 16S rDNA sequences, to compare the distribution of species present in dental plaque and lung. Analysis of 1060 sequences (20 clones per patient) revealed a wide range of aerobic, anaerobic, pathogenic, opportunistic, novel and uncultivable bacterial species. Species indistinguishable between the paired subgingival plaque and tracheal aspirate samples (97-100% similarity in 16S rDNA sequence) were dental plaque pathogens (Aggregatibacter actinomycetemcomitans, Capnocytophaga sputigena, Porphyromonas gingivalis, Tannerella forsythia and Treponema denticola) and lung pathogens (Acinetobacter baumannii, Klebsiella pneumoniae, Pseudomonas aeruginosa and Streptococcus pneumoniae). Real-time polymerase chain reaction of 16S rDNA indicated lower levels of Pseudomonas aeruginosa and Porphyromonas gingivalis colonizing the dental plaques compared with the paired tracheal aspirate samples. These results support the hypothesis that dental bacteria may contribute to the pathology of severe AE-COPD. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
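
    The 97-100% similarity criterion above is a pairwise identity threshold on 16S rDNA sequences. The sketch below shows one way such a percentage might be computed for two sequences that have already been aligned. The function names and the gap convention (columns gapped in both sequences are skipped, a gap against a base counts as a mismatch) are assumptions; the abstract does not state how similarity was calculated.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column gapped in both sequences carries no information
        compared += 1
        if x == y:
            matches += 1  # a gap opposite a base never reaches here as a match
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 97.0) -> bool:
    """True when identity reaches the (assumed) 97% lower bound from the text."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Made-up aligned fragments for illustration only.
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

    Other gap conventions would shift the percentage slightly, so the threshold should be read together with the convention used.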

  2. Identifying active foraminifera in the Sea of Japan using metatranscriptomic approach

    NASA Astrophysics Data System (ADS)

    Lejzerowicz, Franck; Voltsky, Ivan; Pawlowski, Jan

    2013-02-01

    Metagenetics represents an efficient and rapid tool to describe environmental diversity patterns of microbial eukaryotes based on ribosomal DNA sequences. However, the results of metagenetic studies are often biased by the presence of extracellular DNA molecules that are persistent in the environment, especially in deep-sea sediment. As an alternative, short-lived RNA molecules constitute a good proxy for the detection of active species. Here, we used a metatranscriptomic approach based on RNA-derived (cDNA) sequences to study the diversity of the deep-sea benthic foraminifera and compared it to the metagenetic approach. We analyzed 257 ribosomal DNA and cDNA sequences obtained from seven sediment samples collected in the Sea of Japan at depths ranging from 486 to 3665 m. The DNA and RNA-based approaches gave a similar view of the taxonomic composition of the foraminiferal assemblage, but differed in some important points. First, the cDNA dataset was dominated by sequences of rotaliids and robertiniids, suggesting that these calcareous species, some of which have been observed in Rose Bengal stained samples, are the most active component of the foraminiferal community. Second, the richness of monothalamous (single-chambered) foraminifera was particularly high in DNA extracts from the deepest samples, confirming that this group of foraminifera is abundant but not necessarily very active in the deep-sea sediments. Finally, the high divergence of undetermined sequences in the cDNA dataset indicates the limits of our database and lack of knowledge about some active but possibly rare species. Our study demonstrates the capability of the metatranscriptomic approach to detect active foraminiferal species and prompts its use in future high-throughput sequencing-based environmental surveys.

  3. Applications of statistical physics and information theory to the analysis of DNA sequences

    NASA Astrophysics Data System (ADS)

    Grosse, Ivo

    2000-10-01

    DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
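
    The mutual information function mentioned above measures, for each distance k, how much knowing the base at one position tells about the base k positions downstream: I(k) = sum over base pairs (a, b) of p_ab(k) log2[p_ab(k) / (p_a p_b)]. The sketch below estimates I(k) from observed frequencies and applies it to a synthetic sequence. The toy generator, which biases only the third codon position, is an illustrative assumption and is not the pseudo-exon model of the thesis; all function names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(seq: str, k: int) -> float:
    """Plug-in estimate of I(k) in bits between bases separated by k positions."""
    n = len(seq) - k  # number of position pairs at distance k
    if k < 1 or n < 1:
        raise ValueError("k must satisfy 1 <= k < len(seq)")
    left, right = seq[:n], seq[k:]
    pair_counts = Counter(zip(left, right))
    left_counts = Counter(left)
    right_counts = Counter(right)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with counts to limit rounding error
        mi += p_ab * math.log2(p_ab * n * n / (left_counts[a] * right_counts[b]))
    return mi


def toy_coding_sequence(n_codons: int, seed: int = 0) -> str:
    """Synthetic 'coding' DNA: uniform bases except an A-rich third codon position."""
    rng = random.Random(seed)
    bases = "ACGT"
    out = []
    for _ in range(n_codons):
        out.append(rng.choice(bases))
        out.append(rng.choice(bases))
        out.append(rng.choices(bases, weights=[0.7, 0.1, 0.1, 0.1])[0])
    return "".join(out)


toy_seq = toy_coding_sequence(10_000)
profile = {k: mutual_information(toy_seq, k) for k in range(1, 10)}
for k, value in profile.items():
    print(k, f"{value:.4f}")
```

    In this toy setting I(k) is larger when k is a multiple of three than at neighbouring distances, which is the kind of period-three pattern the abstract describes for genomic sequences.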

  4. Genetic variation among the Mapuche Indians from the Patagonian region of Argentina: mitochondrial DNA sequence variation and allele frequencies of several nuclear genes.

    PubMed

    Ginther, C; Corach, D; Penacino, G A; Rey, J A; Carnese, F R; Hutz, M H; Anderson, A; Just, J; Salzano, F M; King, M C

    1993-01-01

    DNA samples from 60 Mapuche Indians, representing 39 maternal lineages, were genetically characterized for (1) nucleotide sequences of the mtDNA control region; (2) presence or absence of a nine base duplication in mtDNA region V; (3) HLA loci DRB1 and DQA1; (4) variation at three nuclear genes with short tandem repeats; and (5) variation at the polymorphic marker D2S44. The genetic profile of the Mapuche population was compared to other Amerinds and to worldwide populations. Two highly polymorphic portions of the mtDNA control region, comprising 650 nucleotides, were amplified by the polymerase chain reaction (PCR) and directly sequenced. The 39 maternal lineages were defined by two or three generation families identified by the Mapuches. These 39 lineages included 19 different mtDNA sequences that could be grouped into four classes. The same classes of sequences appear in other Amerinds from North, Central, and South American populations separated by thousands of miles, suggesting that the origin of the mtDNA patterns predates the migration to the Americas. The mtDNA sequence similarity between Amerind populations suggests that the migration throughout the Americas occurred rapidly relative to the mtDNA mutation rate. HLA DRB1 alleles 1602 and 1402 were frequent among the Mapuches. These alleles also occur at high frequency among other Amerinds in North and South America, but not among Spanish, Chinese or African-American populations. The high frequency of these alleles throughout the Americas, and their specificity to the Americas, supports the hypothesis that Mapuches and other Amerind groups are closely related.(ABSTRACT TRUNCATED AT 250 WORDS)

  5. Quantification of Functionalised Gold Nanoparticle-Targeted Knockdown of Gene Expression in HeLa Cells

    PubMed Central

    Jiwaji, Meesbah; Sandison, Mairi E.; Reboud, Julien; Stevenson, Ross; Daly, Rónán; Barkess, Gráinne; Faulds, Karen; Kolch, Walter; Graham, Duncan; Girolami, Mark A.; Cooper, Jonathan M.; Pitt, Andrew R.

    2014-01-01

    Introduction Gene therapy continues to grow as an important area of research, primarily because of its potential in the treatment of disease. One significant area where there is a need for better understanding is in improving the efficiency of oligonucleotide delivery to the cell and indeed, following delivery, the characterization of the effects on the cell. Methods In this report, we compare different transfection reagents as delivery vehicles for gold nanoparticles functionalized with DNA oligonucleotides, and quantify their relative transfection efficiencies. The inhibitory properties of small interfering RNA (siRNA), single-stranded RNA (ssRNA) and single-stranded DNA (ssDNA) sequences targeted to human metallothionein hMT-IIa are also quantified in HeLa cells. Techniques used in this study include fluorescence and confocal microscopy, qPCR and Western analysis. Findings We show that the use of transfection reagents does significantly increase nanoparticle transfection efficiencies. Furthermore, siRNA, ssRNA and ssDNA sequences all have comparable inhibitory properties to ssDNA sequences immobilized onto gold nanoparticles. We also show that functionalized gold nanoparticles can co-localize with autophagosomes and illustrate other factors that can affect data collection and interpretation when performing studies with functionalized nanoparticles. Conclusions The desired outcome for biological knockdown studies is the efficient reduction of a specific target; which we demonstrate by using ssDNA inhibitory sequences targeted to human metallothionein IIa gene transcripts that result in the knockdown of both the mRNA transcript and the target protein. PMID:24926959

  6. Diversity of halophilic archaea from six hypersaline environments in Turkey.

    PubMed

    Ozcan, Birgul; Ozcengiz, Gulay; Coleri, Arzu; Cokmus, Cumhur

    2007-06-01

    The diversity of archaeal strains from six hypersaline environments in Turkey was analyzed by comparing their phenotypic characteristics and 16S rDNA sequences. Thirty-three isolates were characterized in terms of their phenotypic properties including morphological and biochemical characteristics, susceptibility to different antibiotics, and total lipid and plasmid contents, and finally compared by 16S rDNA gene sequences. The results showed that all isolates belong to the family Halobacteriaceae. Phylogenetic analyses using approximately 1,388 bp comparisons of 16S rDNA sequences demonstrated that all isolates clustered closely to species belonging to 9 genera, namely Halorubrum (8 isolates), Natrinema (5 isolates), Haloarcula (4 isolates), Natronococcus (4 isolates), Natrialba (4 isolates), Haloferax (3 isolates), Haloterrigena (3 isolates), Halalkalicoccus (1 isolate), and Halomicrobium (1 isolate). The results revealed a high diversity among the isolated halophilic strains and indicated that some of these strains constitute new taxa of extremely halophilic archaea.

  7. Detection of Hepatozoon felis in Ticks Collected from Free-Ranging Amur Tigers (Panthera tigris altaica), Russian Far East, 2002-12.

    PubMed

    Thomas, Lindsay H; Seryodkin, Ivan V; Goodrich, John M; Miquelle, Dale G; Birtles, Richard J; Lewis, John C M

    2016-07-01

    We collected 69 ticks from nine free-ranging Amur tigers (Panthera tigris altaica) between 2002 and 2011 and investigated them for tick-borne pathogens. DNA was extracted using alkaline digestion and PCR was performed to detect apicomplexan organisms. Partial 18S rDNA amplification products were obtained from 14 ticks from four tigers, of which 13 yielded unambiguous nucleotide sequence data. Comparative sequence analysis revealed all 13 partial 18S rDNA sequences were most similar to those belonging to strains of Hepatozoon felis (>564/572 base-pair identity, >99% sequence similarity). Although this tick-borne protozoon pathogen has been detected in wild felids from many parts of the world, this is the first record from the Russian Far East.

  8. HLA genotyping by next-generation sequencing of complementary DNA.

    PubMed

    Segawa, Hidenobu; Kukita, Yoji; Kato, Kikuya

    2017-11-28

    Genotyping of the human leucocyte antigen (HLA) is indispensable for various medical treatments. However, unambiguous genotyping is technically challenging due to high polymorphism of the corresponding genomic region. Next-generation sequencing is changing the landscape of genotyping. In addition to high throughput of data, its additional advantage is that DNA templates are derived from single molecules, which is a strong merit for the phasing problem. Although most currently developed technologies use genomic DNA, use of cDNA could enable genotyping with reduced costs in data production and analysis. We thus developed an HLA genotyping system based on next-generation sequencing of cDNA. Each HLA gene was divided into 3 or 4 target regions subjected to PCR amplification and subsequent sequencing with Ion Torrent PGM. The sequence data were then subjected to an automated analysis. The principle of the analysis was to construct candidate sequences generated from all possible combinations of variable bases and arrange them in decreasing order of the number of reads. Upon collecting candidate sequences from all target regions, 2 haplotypes were usually assigned. Cases not assigned 2 haplotypes were forwarded to 4 additional processes: selection of candidate sequences applying more stringent criteria, removal of artificial haplotypes, selection of candidate sequences with a relaxed threshold for sequence matching, and countermeasure for incomplete sequences in the HLA database. The genotyping system was evaluated using 30 samples; the overall accuracy was 97.0% at the field 3 level and 98.3% at the G group level. With one sample, genotyping of DPB1 was not completed due to short read size. We then developed a method for complete sequencing of individual molecules of the DPB1 gene, using the molecular barcode technology. The performance of the automatic genotyping system was comparable to that of systems developed in previous studies. Thus, next-generation sequencing of cDNA is a viable option for HLA genotyping.
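
    The analysis principle stated in this abstract (build candidate sequences from all combinations of variable bases, then order them by read count) might be sketched in Python as follows. This is a simplified illustration, not the authors' pipeline: it assumes reads for one target region are already aligned, equal-length strings, and the function name and the `variable_positions` argument are hypothetical.

```python
from collections import Counter
from itertools import product


def rank_candidates(reads, variable_positions):
    """Rank candidate haplotypes by the number of reads supporting each one.

    Assumption: every read covers all variable positions of the region.
    """
    # Bases observed at each variable position across all reads.
    observed = [sorted({r[p] for r in reads}) for p in variable_positions]
    # Count the combination of bases that each read actually carries.
    support = Counter(tuple(r[p] for p in variable_positions) for r in reads)
    # Enumerate every combination, including those with no supporting reads.
    candidates = [(combo, support.get(combo, 0)) for combo in product(*observed)]
    # Arrange candidates in decreasing order of read count.
    return sorted(candidates, key=lambda item: item[1], reverse=True)
```

    In this simplified view, the two top-ranked candidates would correspond to the two haplotypes of a heterozygous sample, while low-count combinations are the kind of artificial haplotypes that the additional filtering steps described above would need to remove.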

  9. Comparative sequence analysis revealed altered chromosomal organization and a novel insertion sequence encoding DNA modification and potentially stress-related functions in an Escherichia coli O157:H7 foodborne isolate

    USDA-ARS's Scientific Manuscript database

    We recently described the complete genome of enterohemorrhagic Escherichia coli (EHEC) O157:H7 strain NADC 6564, an isolate of strain 86-24 linked to the 1986 disease outbreak. In the current study, we compared the chromosomal sequence of NADC 6564 to the well-characterized chromosomal sequences of ...

  10. DYZ1 arrays show sequence variation between the monozygotic males

    PubMed Central

    2014-01-01

    Background Monozygotic twins (MZT) are an important resource for genetical studies in the context of normal and diseased genomes. In the present study we used DYZ1, a satellite fraction present in the form of tandem arrays on the long arm of the human Y chromosome, as a tool to uncover sequence variations between the monozygotic males. Results We detected copy number variation, frequent insertions and deletions within the sequences of DYZ1 arrays amongst all the three sets of twins used in the present study. MZT1b showed loss of 35 bp compared to that in 1a, whereas 2a showed loss of 31 bp compared to that in 2b. Similarly, 3b showed 10 bp insertion compared to that in 3a. MZT1a germline DNA showed loss of 5 bp and 1b blood DNA showed loss of 26 bp compared to that of 1a blood and 1b germline DNA, respectively. Of the 69 restriction sites detected in DYZ1 arrays, MboII, BsrI, TspEI and TaqI enzymes showed frequent loss and/or gain amongst all the 3 pairs studied. MZT1 pair showed loss/gain of VspI, BsrDI, AgsI, PleI, TspDTI, TspEI, TfiI and TaqI restriction sites in both blood and germline DNA. All the three sets of MZT showed differences in the number of DYZ1 copies. FISH signals reflected somatic mosaicism of the DYZ1 copies across the cells. Conclusions DYZ1 showed both sequence and copy number variation between the MZT males. Sequence variation was also noticed between germline and blood DNA samples of the same individual, as we observed in at least one set of samples. The result suggests that DYZ1 faithfully records all the genetical changes occurring after the twinning which may be ascribed to the environmental factors. PMID:24495361

  11. Nature and distribution of feline sarcoma virus nucleotide sequences.

    PubMed Central

    Frankel, A E; Gilbert, J H; Porzig, K J; Scolnick, E M; Aaronson, S A

    1979-01-01

    The genomes of three independent isolates of feline sarcoma virus (FeSV) were compared by molecular hybridization techniques. Using complementary DNAs prepared from two strains, SM- and ST-FeSV, common complementary DNAs were selected by sequential hybridization to FeSV and feline leukemia virus RNAs. These DNAs were shown to be highly related among the three independent sarcoma virus isolates. FeSV-specific complementary DNAs were prepared by selection for hybridization by the homologous FeSV RNA and against hybridization by feline leukemia virus RNA. Sarcoma virus-specific sequences of SM-FeSV were shown to differ from those of either ST- or GA-FeSV strains, whereas ST-FeSV-specific DNA shared extensive sequence homology with GA-FeSV. By molecular hybridization, each set of FeSV-specific sequences was demonstrated to be present in normal cat cellular DNA in approximately one copy per haploid genome and was conserved throughout Felidae. In contrast, FeSV-common sequences were present in multiple DNA copies and were found only in Mediterranean cats. The present results are consistent with the concept that each FeSV strain has arisen by a mechanism involving recombination between feline leukemia virus and cat cellular DNA sequences, the latter represented within the cat genome in a manner analogous to that of a cellular gene. PMID:225544

  12. Comparison of variable region 3 sequences of human immunodeficiency virus type 1 from infected children with the RNA and DNA sequences of the virus populations of their mothers.

    PubMed Central

    Scarlatti, G; Leitner, T; Halapi, E; Wahlberg, J; Marchisio, P; Clerici-Schoeller, M A; Wigzell, H; Fenyö, E M; Albert, J; Uhlén, M

    1993-01-01

    We have compared the variable region 3 sequences from 10 human immunodeficiency virus type 1 (HIV-1)-infected infants to virus sequences from the corresponding mothers. The sequences were derived from DNA of uncultured peripheral blood mononuclear cells (PBMC), DNA of cultured PBMC, and RNA from serum collected at or shortly after delivery. The infected infants, in contrast to the mothers, harbored homogeneous virus populations. Comparison of sequences from the children and clones derived from DNA of the corresponding mothers showed that the transmitted virus represented either a minor or a major virus population of the mother. In contrast to an earlier study, we found no evidence of selection of minor virus variants during transmission. Furthermore, the transmitted virus variant did not show any characteristic molecular features. In some cases the transmitted virus was more related to the virus RNA population of the mother and in other cases it was more related to the virus DNA population. This suggests that either cell-free or cell-associated virus may be transmitted. These data will help AIDS researchers to understand the mechanism of transmission and to plan strategies for prevention of transmission. PMID:8446584

  13. Comprehensive Survey of Genetic Diversity in Chloroplast Genomes and 45S nrDNAs within Panax ginseng Species

    PubMed Central

    Kim, Kyunghee; Lee, Sang-Choon; Lee, Junki; Lee, Hyun Oh; Joh, Ho Jun; Kim, Nam-Hoon; Park, Hyun-Seung; Yang, Tae-Jin

    2015-01-01

    We report complete sequences of chloroplast (cp) genome and 45S nuclear ribosomal DNA (45S nrDNA) for 11 Panax ginseng cultivars. We have obtained complete sequences of cp and 45S nrDNA, the representative barcoding target sequences for cytoplasm and nuclear genome, respectively, based on low coverage NGS sequence of each cultivar. The cp genome sizes ranged from 156,241 to 156,425 bp and the major size variation was derived from differences in copy number of tandem repeats in the ycf1 gene and in the intergenic regions of rps16-trnUUG and rpl32-trnUAG. The complete 45S nrDNA unit sequences were 11,091 bp, representing a consensus single transcriptional unit with an intergenic spacer region. Comparative analysis of these sequences as well as those previously reported for three Chinese accessions identified very rare but unique polymorphism in the cp genome within P. ginseng cultivars. There were 12 intra-species polymorphisms (six SNPs and six InDels) among 14 cultivars. We also identified five SNPs from 45S nrDNA of 11 Korean ginseng cultivars. From the 17 unique informative polymorphic sites, we developed six reliable markers for analysis of ginseng diversity and cultivar authentication. PMID:26061692

  14. Comparative Analyses of DNA Methylation and Sequence Evolution Using Nasonia Genomes

    PubMed Central

    Park, Jungsun; Peng, Zuogang; Zeng, Jia; Elango, Navin; Park, Taesung; Wheeler, Dave; Werren, John H.; Yi, Soojin V.

    2011-01-01

    The functional and evolutionary significance of DNA methylation in insect genomes remains to be resolved. Nasonia is well situated for comparative analyses of DNA methylation and genome evolution, since the genomes of a moderately distant outgroup species as well as closely related sibling species are available. Using direct sequencing of bisulfite-converted DNA, we uncovered a substantial level of DNA methylation in 17 of 18 Nasonia vitripennis genes and a strong correlation between methylation level and CpG depletion. Notably, in the sex-determining locus transformer, the exon that is alternatively spliced between the sexes is heavily methylated in both males and females, whereas other exons are only sparsely methylated. Orthologous genes of the honeybee and Nasonia show highly similar relative levels of CpG depletion, despite ∼190 My divergence. Densely and sparsely methylated genes in these species also exhibit similar functional enrichments. We found that the degree of CpG depletion is negatively correlated with substitution rates between closely related Nasonia species for synonymous, nonsynonymous, and intron sites. This suggests that mutation rates increase with decreasing levels of germ line methylation. Thus, DNA methylation is prevalent in the Nasonia genome, may participate in regulatory processes such as sex determination and alternative splicing, and is correlated with several aspects of genome and sequence evolution. PMID:21693438
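
    CpG depletion, which this abstract correlates with methylation level, is commonly quantified as an observed/expected CpG ratio. The Python sketch below uses that common definition as an assumption; the abstract does not state the exact measure the authors used, and the function name is hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio of a DNA sequence.

    Assumption: the common definition
        (number of CG dinucleotides * length) / (number of C * number of G).
    Values well below 1 indicate CpG depletion.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        return 0.0  # ratio is undefined without both bases; report 0 here
    return cpg * n / (c * g)
```

    Under this definition, a gene whose ratio is far below 1 has lost many CpG sites, which is the pattern expected for sequences that are heavily methylated in the germ line.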

  15. NGS-based likelihood ratio for identifying contributors in two- and three-person DNA mixtures.

    PubMed

    Chan Mun Wei, Joshua; Zhao, Zicheng; Li, Shuai Cheng; Ng, Yen Kaow

    2018-06-01

    DNA fingerprinting, also known as DNA profiling, serves as a standard procedure in forensics to identify a person by the short tandem repeat (STR) loci in their DNA. By comparing the STR loci between DNA samples, practitioners can calculate a probability of match to identify the contributors of a DNA mixture. Most existing methods are based on 13 core STR loci which were identified by the Federal Bureau of Investigation (FBI). Analyses based on these loci of DNA mixture for forensic purposes are highly variable in procedures, and suffer from subjectivity as well as bias in complex mixture interpretation. With the emergence of next-generation sequencing (NGS) technologies, the sequencing of billions of DNA molecules can be parallelized, thus greatly increasing throughput and reducing the associated costs. This allows the creation of new techniques that incorporate more loci to enable complex mixture interpretation. In this paper, we propose a computation for likelihood ratio that uses NGS (next generation sequencing) data for DNA testing on mixed samples. We have applied the method to 4480 simulated DNA mixtures, which consist of various mixture proportions of 8 unrelated whole-genome sequencing data. The results confirm the feasibility of utilizing NGS data in DNA mixture interpretations. We observed an average likelihood ratio as high as 285,978 for two-person mixtures. Using our method, all 224 identity tests for two-person mixtures and three-person mixtures were correctly identified. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Evaluation of Two Highly-Multiplexed Custom Panels for Massively Parallel Semiconductor Sequencing on Paraffin DNA

    PubMed Central

    Kotoula, Vassiliki; Lyberopoulou, Aggeliki; Papadopoulou, Kyriaki; Charalambous, Elpida; Alexopoulou, Zoi; Gakou, Chryssa; Lakis, Sotiris; Tsolaki, Eleftheria; Lilakos, Konstantinos; Fountzilas, George

    2015-01-01

    Background and Aim Massively parallel sequencing (MPS) holds promise for expanding cancer translational research and diagnostics. As yet, it has been applied on paraffin DNA (FFPE) with commercially available highly multiplexed gene panels (100s of DNA targets), while custom panels of low multiplexing are used for re-sequencing. Here, we evaluated the performance of two highly multiplexed custom panels on FFPE DNA. Methods Two custom multiplex amplification panels (B, 373 amplicons; T, 286 amplicons) were coupled with semiconductor sequencing on DNA samples from FFPE breast tumors and matched peripheral blood samples (n samples: 316; n libraries: 332). The two panels shared 37% DNA targets (common or shifted amplicons). Panel performance was evaluated in paired sample groups and quartets of libraries, where possible. Results Amplicon read ratios yielded similar patterns per gene with the same panel in FFPE and blood samples; however, performance of common amplicons differed between panels (p<0.001). FFPE genotypes were compared for 1267 coding and non-coding variant replicates, 999 out of which (78.8%) were concordant in different paired sample combinations. Variant frequency was highly reproducible (Spearman’s rho 0.959). Repeatedly discordant variants were of high coverage / low frequency (p<0.001). Genotype concordance was (a) high, for intra-run duplicates with the same panel (mean±SD: 97.2±4.7, 95%CI: 94.8–99.7, p<0.001); (b) modest, when the same DNA was analyzed with different panels (mean±SD: 81.1±20.3, 95%CI: 66.1–95.1, p = 0.004); and (c) low, when different DNA samples from the same tumor were compared with the same panel (mean±SD: 59.9±24.0; 95%CI: 43.3–76.5; p = 0.282). Low coverage / low frequency variants were validated with Sanger sequencing even in samples with unfavourable DNA quality. Conclusions Custom MPS may yield novel information on genomic alterations, provided that data evaluation is adjusted to tumor tissue FFPE DNA. To this scope, eligibility of all amplicons along with variant coverage and frequency need to be assessed. PMID:26039550

  17. Effects of nucleoside analog incorporation on DNA binding to the DNA binding domain of the GATA-1 erythroid transcription factor.

    PubMed

    Foti, M; Omichinski, J G; Stahl, S; Maloney, D; West, J; Schweitzer, B I

    1999-02-05

    We investigate here the effects of the incorporation of the nucleoside analogs araC (1-beta-D-arabinofuranosylcytosine) and ganciclovir (9-[(1,3-dihydroxy-2-propoxy)methyl] guanine) into the DNA binding recognition sequence for the GATA-1 erythroid transcription factor. A 10-fold decrease in binding affinity was observed for the ganciclovir-substituted DNA complex in comparison to an unmodified DNA of the same sequence composition. AraC substitution did not result in any changes in binding affinity. 1H-15N HSQC and NOESY NMR experiments revealed a number of chemical shift changes in both DNA and protein in the ganciclovir-modified DNA-protein complex when compared to the unmodified DNA-protein complex. These changes in chemical shift and binding affinity suggest a change in the binding mode of the complex when ganciclovir is incorporated into the GATA DNA binding site.

  18. Adeno-associated virus inverted terminal repeats stimulate gene editing.

    PubMed

    Hirsch, M L

    2015-02-01

    Advancements in genome editing have relied on technologies to specifically damage DNA which, in turn, stimulates DNA repair including homologous recombination (HR). As off-target concerns complicate the therapeutic translation of site-specific DNA endonucleases, an alternative strategy to stimulate gene editing based on fragile DNA was investigated. To do this, an episomal gene-editing reporter was generated by a disruptive insertion of the adeno-associated virus (AAV) inverted terminal repeat (ITR) into the egfp gene. Compared with a non-structured DNA control sequence, the ITR induced DNA damage as evidenced by increased gamma-H2AX and Mre11 foci formation. As local DNA damage stimulates HR, ITR-mediated gene editing was investigated using DNA oligonucleotides as repair substrates. The AAV ITR stimulated gene editing >1000-fold in a replication-independent manner and was not biased by the polarity of the repair oligonucleotide. Analysis of additional human DNA sequences demonstrated stimulation of gene editing to varying degrees. In particular, inverted yet not direct, Alu repeats induced gene editing, suggesting a role for DNA structure in the repair event. Collectively, the results demonstrate that inverted DNA repeats stimulate gene editing via double-strand break repair in an episomal context and allude to efficient gene editing of the human chromosome using fragile DNA sequences.

  19. Use of wavelet-packet transforms to develop an engineering model for multifractal characterization of mutation dynamics in pathological and nonpathological gene sequences

    NASA Astrophysics Data System (ADS)

    Walker, David Lee

    1999-12-01

    This study uses dynamical analysis to examine in a quantitative fashion the information coding mechanism in DNA sequences. This exceeds the simple dichotomy of either modeling the mechanism by comparing DNA sequence walks as Fractal Brownian Motion (fbm) processes. The 2-D mappings of the DNA sequences for this research are from Iterated Function System (IFS) (also known as the "Chaos Game Representation" (CGR)) mappings of the DNA sequences. This technique converts a 1-D sequence into a 2-D representation that preserves subsequence structure and provides a visual representation. The second step of this analysis involves the application of Wavelet Packet Transforms, a recently developed technique from the field of signal processing. A multi-fractal model is built by using wavelet transforms to estimate the Hurst exponent, H. The Hurst exponent is a non-parametric measurement of the dynamism of a system. This procedure is used to evaluate gene-coding events in the DNA sequence of cystic fibrosis mutations. The H exponent is calculated for various mutation sites in this gene. The results of this study indicate the presence of anti-persistent, random walks and persistent "sub-periods" in the sequence. This indicates the hypothesis of a multi-fractal model of DNA information encoding warrants further consideration. This work examines the model's behavior in both pathological (mutations) and non-pathological (healthy) base pair sequences of the cystic fibrosis gene. These mutations both natural and synthetic were introduced by computer manipulation of the original base pair text files. The results show that disease severity and system "information dynamics" correlate. These results have implications for genetic engineering as well as in mathematical biology. They suggest that there is scope for more multi-fractal models to be developed.
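
    The Chaos Game Representation named in this abstract maps a 1-D sequence to points in the unit square. A minimal Python sketch of the usual construction follows; the corner assignment (A, C, G, T) and the starting point at the centre are conventional choices assumed here, not details taken from the study, and the wavelet-packet and Hurst-exponent steps are not reproduced.

```python
# Assumed corner layout of the unit square; other layouts are also in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game(seq):
    """Return the list of CGR points for a DNA sequence."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

    Because each step halves the distance to a corner, the position of a point encodes the most recent bases, which is why the representation preserves subsequence structure.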

  20. Optimized mtDNA Control Region Primer Extension Capture Analysis for Forensically Relevant Samples and Highly Compromised mtDNA of Different Age and Origin

    PubMed Central

    Eduardoff, Mayra; Xavier, Catarina; Strobl, Christina; Casas-Vargas, Andrea; Parson, Walther

    2017-01-01

    The analysis of mitochondrial DNA (mtDNA) has proven useful in forensic genetics and ancient DNA (aDNA) studies, where specimens are often highly compromised and DNA quality and quantity are low. In forensic genetics, the mtDNA control region (CR) is commonly sequenced using established Sanger-type Sequencing (STS) protocols involving fragment sizes down to approximately 150 base pairs (bp). Recent developments include Massively Parallel Sequencing (MPS) of (multiplex) PCR-generated libraries using the same amplicon sizes. Molecular genetic studies on archaeological remains that harbor more degraded aDNA have pioneered alternative approaches to target mtDNA, such as capture hybridization and primer extension capture (PEC) methods followed by MPS. These assays target smaller mtDNA fragment sizes (down to 50 bp or less), and have proven to be substantially more successful in obtaining useful mtDNA sequences from these samples compared to electrophoretic methods. Here, we present the modification and optimization of a PEC method, earlier developed for sequencing the Neanderthal mitochondrial genome, with forensic applications in mind. Our approach was designed for a more sensitive enrichment of the mtDNA CR in a single tube assay and short laboratory turnaround times, thus complying with forensic practices. We characterized the method using sheared, high quantity mtDNA (six samples), and tested challenging forensic samples (n = 2) as well as compromised solid tissue samples (n = 15) up to 8 kyrs of age. The PEC MPS method produced reliable and plausible mtDNA haplotypes that were useful in the forensic context. It yielded plausible data in samples that did not provide results with STS and other MPS techniques. We addressed the issue of contamination by including four generations of negative controls, and discuss the results in the forensic context. We finally offer perspectives for future research to enable the validation and accreditation of the PEC MPS method for final implementation in forensic genetic laboratories. PMID:28934125

  1. The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding.

    PubMed

    Shirasawa, Kenta; Isuzugawa, Kanji; Ikenaga, Mitsunobu; Saito, Yutaro; Yamamoto, Toshiya; Hirakawa, Hideki; Isobe, Sachiko

    2017-10-01

    We determined the genome sequence of sweet cherry (Prunus avium) using next-generation sequencing technology. The total length of the assembled sequences was 272.4 Mb, consisting of 10,148 scaffold sequences with an N50 length of 219.6 kb. The sequences covered 77.8% of the 352.9 Mb sweet cherry genome, as estimated by k-mer analysis, and included >96.0% of the core eukaryotic genes. We predicted 43,349 complete and partial protein-encoding genes. A high-density consensus map with 2,382 loci was constructed using double-digest restriction site-associated DNA sequencing. Comparing the genetic maps of sweet cherry and peach revealed high synteny between the two genomes; thus the scaffolds were integrated into pseudomolecules using map- and synteny-based strategies. Whole-genome resequencing of six modern cultivars found 1,016,866 SNPs and 162,402 insertions/deletions, out of which 0.7% were deleterious. The sequence variants, as well as simple sequence repeats, can be used as DNA markers. The genomic information helps us to identify agronomically important genes and will accelerate genetic studies and breeding programs for sweet cherries. Further information on the genomic sequences and DNA markers is available in DBcherry (http://cherry.kazusa.or.jp (8 May 2017, date last accessed)). © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
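
    The N50 length quoted in this abstract is a standard assembly statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal Python sketch, with a hypothetical function name and toy input, is:

```python
def n50(lengths):
    """N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


print(n50([2, 3, 4, 5, 6]))  # prints 5: scaffolds 6 + 5 cover 11 of 20 bases
```

    Applied to the lengths of the scaffolds of an assembly, the same function would return the kind of value reported above.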

  2. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

    PubMed

    Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T

    2012-01-01

    Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities, but whether different NGS platforms recover the same diversity from a sample and whether their assembled sequences are of comparable quality remains unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R(2)>0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.

  3. A Coalescent-Based Estimator of Admixture From DNA Sequences

    PubMed Central

    Wang, Jinliang

    2006-01-01

    A variety of estimators have been developed to use genetic marker information in inferring the admixture proportions (parental contributions) of a hybrid population. The majority of these estimators used allele frequency data, ignored molecular information that is available in markers such as microsatellites and DNA sequences, and assumed that mutations are absent since the admixture event. As a result, these estimators may fail to deliver an estimate or give rather poor estimates when admixture is ancient and thus mutations are not negligible. A previous molecular estimator based its inference of admixture proportions on the average coalescent times between pairs of genes taken from within and between populations. In this article I propose an estimator that considers the entire genealogy of all of the sampled genes and infers admixture proportions from the numbers of segregating sites in DNA sequence samples. By considering the genealogy of all sequences rather than pairs of sequences, this new estimator also allows the joint estimation of other interesting parameters in the admixture model, such as admixture time, divergence time, population size, and mutation rate. Comparative analyses of simulated data indicate that the new coalescent estimator generally yields better estimates of admixture proportions than the previous molecular estimator, especially when the parental populations are not highly differentiated. It also gives reasonably accurate estimates of other admixture parameters. A human mtDNA sequence data set was analyzed to demonstrate the method, and the analysis results are discussed and compared with those from previous studies. PMID:16624918

  4. p53 Specifically Binds Triplex DNA In Vitro and in Cells

    PubMed Central

    Brázdová, Marie; Tichý, Vlastimil; Helma, Robert; Bažantová, Pavla; Polášková, Alena; Krejčí, Aneta; Petr, Marek; Navrátilová, Lucie; Tichá, Olga; Nejedlý, Karel; Bennink, Martin L.; Subramaniam, Vinod; Bábková, Zuzana; Martínek, Tomáš; Lexa, Matej; Adámik, Matej

    2016-01-01

    Triplex DNA is implicated in a wide range of biological activities, including regulation of gene expression and genomic instability leading to cancer. The tumor suppressor p53 is a central regulator of cell fate in response to different types of insults. Sequence and structure specific modes of DNA recognition are core attributes of the p53 protein. The focus of this work is the structure-specific binding of p53 to DNA containing triplex-forming sequences in vitro and in cells and the effect on p53-driven transcription. This is the first DNA binding study of full-length p53 and its deletion variants to both intermolecular and intramolecular T.A.T triplexes. We demonstrate that the interaction of p53 with intermolecular T.A.T triplex is comparable to the recognition of CTG-hairpin non-B DNA structure. Using deletion mutants we determined the C-terminal DNA binding domain of p53 to be crucial for triplex recognition. Furthermore, strong p53 recognition of intramolecular T.A.T triplexes (H-DNA), stabilized by negative superhelicity in plasmid DNA, was detected by competition and immunoprecipitation experiments, and visualized by AFM. Moreover, chromatin immunoprecipitation revealed p53 binding T.A.T forming sequence in vivo. Enhanced reporter transactivation by p53 on insertion of triplex forming sequence into plasmid with p53 consensus sequence was observed by luciferase reporter assays. In-silico scan of human regulatory regions for the simultaneous presence of both consensus sequence and T.A.T motifs identified a set of candidate p53 target genes and p53-dependent activation of several of them (ABCG5, ENOX1, INSR, MCC, NFAT5) was confirmed by RT-qPCR. Our results show that T.A.T triplex comprises a new class of p53 binding sites targeted by p53 in a DNA structure-dependent mode in vitro and in cells. The contribution of p53 DNA structure-dependent binding to the regulation of transcription is discussed. PMID:27907175

  5. Intrinsic flexibility of B-DNA: the experimental TRX scale.

    PubMed

    Heddi, Brahim; Oguey, Christophe; Lavelle, Christophe; Foloppe, Nicolas; Hartmann, Brigitte

    2010-01-01

    B-DNA flexibility, crucial for DNA-protein recognition, is sequence dependent. Free DNA in solution would in principle be the best reference state to uncover the relation between base sequences and their intrinsic flexibility; however, this has long been hampered by a lack of suitable experimental data. We investigated this relationship by compiling and analyzing a large dataset of NMR ³¹P chemical shifts in solution. These measurements reflect the BI ↔ BII equilibrium in DNA, intimately correlated to helicoidal descriptors of the curvature, winding and groove dimensions. Comparing the ten complementary DNA dinucleotide steps indicates that some steps are much more flexible than others. This malleability is primarily controlled at the dinucleotide level, modulated by the tetranucleotide environment. Our analyses provide an experimental scale called TRX that quantifies the intrinsic flexibility of the ten dinucleotide steps in terms of Twist, Roll, and X-disp (base pair displacement). Applying the TRX scale to DNA sequences optimized for nucleosome formation reveals a 10 base-pair periodic alternation of stiff and flexible regions. Thus, DNA flexibility captured by the TRX scale is relevant to nucleosome formation, suggesting that this scale may be of general interest to better understand protein-DNA recognition.

  6. Homology between DNA polymerases of poxviruses, herpesviruses, and adenoviruses: nucleotide sequence of the vaccinia virus DNA polymerase gene.

    PubMed Central

    Earl, P L; Jones, E V; Moss, B

    1986-01-01

    A 5400-base-pair segment of the vaccinia virus genome was sequenced and an open reading frame of 938 codons was found precisely where the DNA polymerase had been mapped by transfer of a phosphonoacetate-resistance marker. A single nucleotide substitution changing glycine at position 347 to aspartic acid accounts for the drug resistance of the mutant vaccinia virus. The 5' end of the DNA polymerase mRNA was located 80 base pairs before the methionine codon initiating the open reading frame. Correspondence between the predicted Mr 108,577 polypeptide and the 110,000 purified enzyme indicates that little or no proteolytic processing occurs. Extensive homology, extending over 435 amino acids, was found upon comparing the DNA polymerase of vaccinia virus and DNA polymerase of Epstein-Barr virus. A highly conserved sequence of 14 amino acids in the carboxyl-terminal regions of the above DNA polymerases is also present at a similar location in adenovirus DNA polymerase. This structure, which is predicted to form a turn flanked by beta-pleated sheets, may form part of an essential binding or catalytic site that accounts for its presence in DNA polymerases of poxviruses, herpesviruses, and adenoviruses. PMID:3012524

  7. Effect of Noise on DNA Sequencing via Transverse Electronic Transport

    PubMed Central

    Krems, Matt; Zwolak, Michael; Pershin, Yuriy V.; Di Ventra, Massimiliano

    2009-01-01

    Previous theoretical studies have shown that measuring the transverse current across DNA strands while they translocate through a nanopore or channel may provide a statistically distinguishable signature of the DNA bases, and may thus allow for rapid DNA sequencing. However, fluctuations of the environment, such as ionic and DNA motion, introduce important scattering processes that may affect the viability of this approach to sequencing. To understand this issue, we have analyzed a simple model that captures the role of this complex environment in electronic dephasing and its ability to remove charge carriers from current-carrying states. We find that these effects do not strongly influence the current distributions due to the off-resonant nature of tunneling through the nucleotides, a result we expect to be a common feature of transport in molecular junctions. In particular, only large scattering strengths, as compared to the energetic gap between the molecular states and the Fermi level, significantly alter the form of the current distributions. Since this gap itself is quite large, the current distributions remain protected from this type of noise, further supporting the possibility of using transverse electronic transport measurements for DNA sequencing. PMID:19804730

  8. Nonneutral mitochondrial DNA variation in humans and chimpanzees

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nachman, M.W.; Aquadro, C.F.; Brown, W.M.

    1996-03-01

    We sequenced the NADH dehydrogenase subunit 3 (ND3) gene from a sample of 61 humans, five common chimpanzees, and one gorilla to test whether patterns of mitochondrial DNA (mtDNA) variation are consistent with a neutral model of molecular evolution. Within humans and within chimpanzees, the ratio of replacement to silent nucleotide substitutions was higher than observed in comparisons between species, contrary to neutral expectations. To test the generality of this result, we reanalyzed published human RFLP data from the entire mitochondrial genome. Gains of restriction sites relative to a known human mtDNA sequence were used to infer unambiguous nucleotide substitutions. We also compared the complete mtDNA sequences of three humans. Both the RFLP data and the sequence data reveal a higher ratio of replacement to silent nucleotide substitutions within humans than is seen between species. This pattern is observed at most or all human mitochondrial genes and is inconsistent with a strictly neutral model. These data suggest that many mitochondrial protein polymorphisms are slightly deleterious, consistent with studies of human mitochondrial diseases. 59 refs., 2 figs., 8 tabs.
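
    The comparison described above contrasts the replacement-to-silent ratio within species against the same ratio between species. A minimal sketch of that contrast is given below as a "neutrality index" (the within-species ratio divided by the between-species ratio); values above 1 indicate an excess of replacement polymorphism. The function name and all counts are hypothetical placeholders, not figures from the study.

```python
def neutrality_index(poly_repl, poly_silent, fixed_repl, fixed_silent):
    """Ratio of (replacement/silent) within species to the same ratio between species.

    poly_*  : counts of polymorphic sites within a species
    fixed_* : counts of fixed differences between species
    """
    if min(poly_silent, fixed_repl, fixed_silent) <= 0:
        raise ValueError("silent counts and fixed replacement count must be positive")
    within = poly_repl / poly_silent      # replacement:silent ratio, polymorphism
    between = fixed_repl / fixed_silent   # replacement:silent ratio, divergence
    return within / between


# Hypothetical counts for illustration only.
ni = neutrality_index(poly_repl=6, poly_silent=12, fixed_repl=2, fixed_silent=20)
print(round(ni, 2))
```

    Under strict neutrality the two ratios are expected to be similar, so an index well above 1 matches the pattern the abstract reports; a formal test would also need a significance calculation on the 2 × 2 table.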

  9. Molecular coevolution of mammalian ribosomal gene terminator sequences and the transcription termination factor TTF-I.

    PubMed Central

    Evers, R; Grummt, I

    1995-01-01

    Both the DNA elements and the nuclear factors that direct termination of ribosomal gene transcription exhibit species-specific differences. Even between mammals--e.g., human and mouse--the termination signals are not identical and the respective transcription termination factors (TTFs) which bind to the terminator sequence are not fully interchangeable. To elucidate the molecular basis for this species-specificity, we have cloned TTF-I from human and mouse cells and compared their structural and functional properties. Recombinant TTF-I exhibits species-specific DNA binding and terminates transcription both in cell-free transcription assays and in transfection experiments. Chimeric constructs of mouse TTF-I and human TTF-I reveal that the major determinant for species-specific DNA binding resides within the C terminus of TTF-I. Replacing 31 C-terminal amino acids of mouse TTF-I with the homologous human sequences relaxes the DNA-binding specificity and, as a consequence, allows the chimeric factor to bind the human terminator sequence and to specifically stop rDNA transcription. PMID:7597036

  10. Functionalized gold nanoparticles as additive to form polymer/metal composite matrix for improved DNA sequencing by capillary electrophoresis.

    PubMed

    Zhou, Dan; Yang, Liping; Yang, Runmiao; Song, Weihua; Peng, Shuhua; Wang, Yanmei

    2009-11-15

    A new matrix additive, poly(N,N-dimethylacrylamide)-functionalized gold nanoparticle (GNP-PDMA), was prepared by a "grafting-to" approach and then incorporated into a quasi-interpenetrating network (quasi-IPN) composed of linear polyacrylamide (LPA, 3.3 MDa) and PDMA to form a novel polymer/metal composite sieving matrix (quasi-IPN/GNP-PDMA) for DNA sequencing by capillary electrophoresis. Without complete optimization, quasi-IPN/GNP-PDMA yielded a read length of 801 bases at 98% accuracy in about 64 min using the ABI 310 Genetic Analyzer at 50 °C and 150 V/cm. Compared with previous quasi-IPN/GNPs, quasi-IPN/GNP-PDMA further improves DNA sequencing performance. This is because the presence of GNP-PDMA improves the compatibility of GNPs with the whole sequencing system, enhances the degree of network entanglement, and increases the GNP concentration in the system, which consequently leads to higher restriction and stability, higher apparent molecular weight (MW), and smaller pore size of the total sieving network. Furthermore, the composite matrix was also compared with quasi-IPN containing higher-MW LPA and with commercial POP-6. The results indicate that the composite matrix is promising for fully automated DNA sequencing because, in the presence of GNP-PDMA, it provides high-resolution separation, speed, excellent reproducibility, and easy loading.

  11. A Simple Method for the Extraction, PCR-amplification, Cloning, and Sequencing of Pasteuria 16S rDNA from Small Numbers of Endospores.

    PubMed

    Atibalentja, N; Noel, G R; Ciancio, A

    2004-03-01

    For many years the taxonomy of the genus Pasteuria has been marred with confusion because the bacterium could not be cultured in vitro and, therefore, descriptions were based solely on morphological, developmental, and pathological characteristics. The current study sought to devise a simple method for PCR-amplification, cloning, and sequencing of Pasteuria 16S rDNA from small numbers of endospores, with no need for prior DNA purification. Results show that DNA extracts from plain glass bead-beating of crude suspensions containing 10,000 endospores at 0.2 x 10 endospores ml(-1) were sufficient for PCR-amplification of Pasteuria 16S rDNA, when used in conjunction with specific primers. These results imply that for P. penetrans and P. nishizawae only one parasitized female of Meloidogyne spp. and Heterodera glycines, respectively, should be sufficient, and as few as eight cadavers of Belonolaimus longicaudatus with an average number of 1,250 endospores of "Candidatus Pasteuria usgae" are needed for PCR-amplification of Pasteuria 16S rDNA. The method described in this paper should facilitate the sequencing of the 16S rDNA of the many Pasteuria isolates that have been reported on nematodes and, consequently, expedite the classification of those isolates through comparative sequence analysis.

  12. Engineering of a DNA Polymerase for Direct m6A Sequencing.

    PubMed

    Aschenbrenner, Joos; Werner, Stephan; Marchand, Virginie; Adam, Martina; Motorin, Yuri; Helm, Mark; Marx, Andreas

    2018-01-08

    Methods for the detection of RNA modifications are of fundamental importance for advancing epitranscriptomics. N6-methyladenosine (m6A) is the most abundant RNA modification in mammalian mRNA and is involved in the regulation of gene expression. Current detection techniques are laborious and rely on antibody-based enrichment of m6A-containing RNA prior to sequencing, since m6A modifications are generally "erased" during reverse transcription (RT). To overcome the drawbacks associated with indirect detection, we aimed to generate novel DNA polymerase variants for direct m6A sequencing. Therefore, we developed a screen to evolve an RT-active KlenTaq DNA polymerase variant that sets a mark for N6-methylation. We identified a mutant that exhibits increased misincorporation opposite m6A compared to unmodified A. Application of the generated DNA polymerase in next-generation sequencing allowed the identification of m6A sites directly from the sequencing data of untreated RNA samples. © 2017 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.

  13. High-quality mtDNA control region sequences from 680 individuals sampled across the Netherlands to establish a national forensic mtDNA reference database.

    PubMed

    Chaitanya, Lakshmi; van Oven, Mannis; Brauer, Silke; Zimmermann, Bettina; Huber, Gabriela; Xavier, Catarina; Parson, Walther; de Knijff, Peter; Kayser, Manfred

    2016-03-01

    The use of mitochondrial DNA (mtDNA) for maternal lineage identification often marks the last resort when investigating forensic and missing-person cases involving highly degraded biological materials. As with all comparative DNA testing, a match between evidence and reference sample requires a statistical interpretation, for which high-quality mtDNA population frequency data are crucial. Here, we determined, under high quality standards, the complete mtDNA control-region sequences of 680 individuals from across the Netherlands sampled at 54 sites, covering the entire country with 10 geographic sub-regions. The complete mtDNA control region (nucleotide positions 16,024-16,569 and 1-576) was amplified with two PCR primers and sequenced with ten different sequencing primers using the EMPOP protocol. Haplotype diversity of the entire sample set was very high at 99.63% and, accordingly, the random-match probability was 0.37%. No population substructure within the Netherlands was detected with our dataset. Phylogenetic analyses were performed to determine mtDNA haplogroups. Inclusion of these high-quality data in the EMPOP database (accession number: EMP00666) will improve its overall data content and geographic coverage in the interest of all EMPOP users worldwide. Moreover, this dataset will serve as (the start of) a national reference database for mtDNA applications in forensic and missing person casework in the Netherlands. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
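
    The two summary statistics reported above can be computed from a list of observed haplotypes. The sketch below takes the random-match probability as the sum of squared haplotype frequencies and offers haplotype diversity with or without the common n/(n-1) small-sample correction. The reported 99.63% and 0.37% sum to 100%, which is consistent with the uncorrected form, but the abstract does not state which form was used, so both are shown. Function names and the toy haplotype labels are assumptions.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def haplotype_diversity(haplotypes, corrected=True):
    """1 - sum(p_i^2), optionally scaled by n/(n-1) for small samples."""
    n = len(haplotypes)
    diversity = 1.0 - random_match_probability(haplotypes)
    if corrected:
        if n < 2:
            raise ValueError("the corrected estimator needs n >= 2")
        diversity *= n / (n - 1)
    return diversity


# Hypothetical haplotype labels, not EMPOP data.
sample = ["H1", "H1", "T2", "U5"]
print(random_match_probability(sample))              # -> 0.375
print(haplotype_diversity(sample, corrected=False))  # -> 0.625
```

    With many distinct haplotypes in a large sample, as in a national reference set, the two forms differ only slightly.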

  14. Profiling DNA methylome landscapes of mammalian cells with single-cell reduced-representation bisulfite sequencing.

    PubMed

    Guo, Hongshan; Zhu, Ping; Guo, Fan; Li, Xianlong; Wu, Xinglong; Fan, Xiaoying; Wen, Lu; Tang, Fuchou

    2015-05-01

    The heterogeneity of DNA methylation within a population of cells necessitates DNA methylome profiling at single-cell resolution. Recently, we developed a single-cell reduced-representation bisulfite sequencing (scRRBS) technique in which we modified the original RRBS method by integrating all the experimental steps before PCR amplification into a single-tube reaction. These modifications enable scRRBS to provide digitized methylation information on ∼1 million CpG sites within an individual diploid mouse or human cell at single-base resolution. Compared with the single-cell bisulfite sequencing (scBS) technique, scRRBS covers fewer CpG sites, but it provides better coverage for CpG islands (CGIs), which are likely to be the most informative elements for DNA methylation. The entire procedure takes ∼3 weeks, and it requires strong molecular biology skills.

  15. Scar-less multi-part DNA assembly design automation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hillson, Nathan J.

    The present invention provides a method of designing an implementation of a DNA assembly. In an exemplary embodiment, the method includes (1) receiving a list of DNA sequence fragments to be assembled together and an order in which to assemble the DNA sequence fragments, (2) designing DNA oligonucleotides (oligos) for each of the DNA sequence fragments, and (3) creating a plan for adding flanking homology sequences to each of the DNA oligos. In another exemplary embodiment, the method includes (1) receiving a list of DNA sequence fragments to be assembled together and an order in which to assemble the DNA sequence fragments, (2) designing DNA oligonucleotides (oligos) for each of the DNA sequence fragments, and (3) creating a plan for adding optimized overhang sequences to each of the DNA oligos.
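
    One way to picture the flanking-homology step is the sketch below, which is an illustration under stated assumptions and not the patented method: for a linear assembly in a given order, each fragment is extended with the last few bases of its left neighbour and the first few bases of its right neighbour, so that adjacent products share an overlap. The function name, the `overlap` parameter, and the toy sequences are all hypothetical; real designs use much longer overlaps and account for primer melting temperatures.

```python
def add_flanking_homology(fragments, overlap=4):
    """Extend each fragment with bases borrowed from its neighbours.

    fragments : DNA strings in assembly order (linear assembly assumed)
    overlap   : number of neighbour bases to add on each internal side
    """
    if overlap < 1:
        raise ValueError("overlap must be at least 1")
    planned = []
    last = len(fragments) - 1
    for i, frag in enumerate(fragments):
        # Tail of the previous fragment, if there is one.
        left = fragments[i - 1][-overlap:] if i > 0 else ""
        # Head of the next fragment, if there is one.
        right = fragments[i + 1][:overlap] if i < last else ""
        planned.append(left + frag + right)
    return planned


# Toy fragments for illustration only.
parts = ["AAAACCCC", "GGGGTTTT", "ACGTACGT"]
for seq in add_flanking_homology(parts):
    print(seq)
```

    For the toy input the middle fragment becomes CCCC + GGGGTTTT + ACGT, so it overlaps both neighbours by four bases.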

  16. The impact of different DNA extraction kits and laboratories upon the assessment of human gut microbiota composition by 16S rRNA gene sequencing.

    PubMed

    Kennedy, Nicholas A; Walker, Alan W; Berry, Susan H; Duncan, Sylvia H; Farquarson, Freda M; Louis, Petra; Thomson, John M; Satsangi, Jack; Flint, Harry J; Parkhill, Julian; Lees, Charlie W; Hold, Georgina L

    2014-01-01

    Determining bacterial community structure in fecal samples through DNA sequencing is an important facet of intestinal health research. The impact of different commercially available DNA extraction kits upon bacterial community structures has received relatively little attention. The aim of this study was to analyze bacterial communities in volunteer and inflammatory bowel disease (IBD) patient fecal samples extracted using widely used DNA extraction kits in established gastrointestinal research laboratories. Fecal samples from two healthy volunteers (H3 and H4) and two relapsing IBD patients (I1 and I2) were investigated. DNA extraction was undertaken using MoBio Powersoil and MP Biomedicals FastDNA SPIN Kit for Soil DNA extraction kits. PCR amplification for pyrosequencing of bacterial 16S rRNA genes was performed in both laboratories on all samples. Hierarchical clustering of sequencing data was done using the Yue and Clayton similarity coefficient. DNA extracted using the FastDNA kit and the MoBio kit gave median DNA concentrations of 475 (interquartile range 228-561) and 22 (IQR 9-36) ng/µL respectively (p<0.0001). Hierarchical clustering of sequence data by Yue and Clayton coefficient revealed four clusters. Samples from individuals H3 and I2 clustered by patient; however, samples from patient I1 extracted with the MoBio kit clustered with samples from patient H4 rather than the other I1 samples. Linear modelling on relative abundance of common bacterial families revealed significant differences between kits; samples extracted with MoBio Powersoil showed significantly increased Bacteroidaceae, Ruminococcaceae and Porphyromonadaceae, and lower Enterobacteriaceae, Lachnospiraceae, Clostridiaceae, and Erysipelotrichaceae (p<0.05). This study demonstrates significant differences in DNA yield and bacterial DNA composition when comparing DNA extracted from the same fecal sample with different extraction kits. This highlights the importance of ensuring that samples in a study are prepared with the same method, and the need for caution when cross-comparing studies that use different methods.

  17. Performance evaluation of a mitogenome capture and Illumina sequencing protocol using non-probative, case-type skeletal samples: Implications for the use of a positive control in a next-generation sequencing procedure.

    PubMed

    Marshall, Charla; Sturk-Andreaggi, Kimberly; Daniels-Higginbotham, Jennifer; Oliver, Robert Sean; Barritt-Ross, Suzanne; McMahon, Timothy P

    2017-11-01

    Next-generation ancient DNA technologies have the potential to assist in the analysis of degraded DNA extracted from forensic specimens. Mitochondrial genome (mitogenome) sequencing, specifically, may be of benefit to samples that fail to yield forensically relevant genetic information using conventional PCR-based techniques. This report summarizes the Armed Forces Medical Examiner System's Armed Forces DNA Identification Laboratory's (AFMES-AFDIL) performance evaluation of a Next-Generation Sequencing protocol for degraded and chemically treated past accounting samples. The procedure involves hybridization capture for targeted enrichment of mitochondrial DNA, massively parallel sequencing using Illumina chemistry, and an automated bioinformatic pipeline for forensic mtDNA profile generation. A total of 22 non-probative samples and associated controls were processed in the present study, spanning a range of DNA quantity and quality. Data were generated from over 100 DNA libraries by ten DNA analysts over the course of five months. The results show that the mitogenome sequencing procedure is reliable and robust, sensitive to low template (one ng control DNA) as well as degraded DNA, and specific to the analysis of the human mitogenome. Haplotypes were overall concordant between NGS replicates and with previously generated Sanger control region data. Due to the inherent risk for contamination when working with low-template, degraded DNA, a contamination assessment was performed. The consumables were shown to be void of human DNA contaminants and suitable for forensic use. Reagent blanks and negative controls were analyzed to determine the background signal of the procedure. This background signal was then used to set analytical and reporting thresholds, which were designated at 4.0X (limit of detection) and 10.0X (limit of quantitation) average coverage across the mitogenome, respectively. Nearly all human samples exceeded the reporting threshold, although coverage was reduced in chemically treated samples resulting in a ∼58% passing rate for these poor-quality samples. A concordance assessment demonstrated the reliability of the NGS data when compared to known Sanger profiles. One case sample was shown to be mixed with a co-processed sample and two reagent blanks indicated the presence of DNA above the analytical threshold. This contamination was attributed to sequencing crosstalk from simultaneously sequenced high-quality samples, including the positive control. Overall this study demonstrated that hybridization capture and Illumina sequencing provide a viable method for mitogenome sequencing of degraded and chemically treated skeletal DNA samples, yet may require alternative measures of quality control. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  18. Biofilm-Growing Bacteria Involved in the Corrosion of Concrete Wastewater Pipes: Protocols for Comparative Metagenomic Analyses

    EPA Science Inventory

    Advances in high-throughput next-generation sequencing (NGS) technology for direct sequencing of environmental DNA (i.e. shotgun metagenomics) is transforming the field of microbiology. NGS technologies are now regularly being applied in comparative metagenomic studies, which pr...

  19. Entire plastid phylogeny of the carrot genus (Daucus, Apiaceae): Concordance with nuclear data and mitochondrial and nuclear DNA insertions to the plastid.

    PubMed

    Spooner, David M; Ruess, Holly; Iorizzo, Massimo; Senalik, Douglas; Simon, Philipp

    2017-02-01

    We explored the phylogenetic utility of entire plastid DNA sequences in Daucus and compared the results with prior phylogenetic results using plastid and nuclear DNA sequences. We used Illumina sequencing to obtain full plastid sequences of 37 accessions of 20 Daucus taxa and outgroups, analyzed the data with phylogenetic methods, and examined evidence for mitochondrial DNA transfer to the plastid (DcMP). Our phylogenetic trees of the entire data set were highly resolved, with 100% bootstrap support for most of the external and many of the internal clades, except for the clade of D. carota and its most closely related species D. syrticus. Subsets of the data, including regions traditionally used as phylogenetically informative regions, provide various degrees of soft congruence with the entire data set. There are areas of hard incongruence, however, with phylogenies using nuclear data. We extended knowledge of a mitochondrial to plastid DNA insertion sequence previously named DcMP and identified the first instance in flowering plants of a sequence of potential nuclear genome origin inserted into the plastid genome. There is a relationship of inverted repeat junction classes and repeat DNA to phylogeny, but no such relationship with nonsynonymous mutations. Our data have allowed us to (1) produce a well-resolved plastid phylogeny of Daucus, (2) evaluate subsets of the entire plastid data for phylogeny, (3) examine evidence for plastid and nuclear DNA phylogenetic incongruence, and (4) examine mitochondrial and nuclear DNA insertion into the plastid. © 2017 Spooner et al. Published by the Botanical Society of America. This work is licensed under a Creative Commons public domain license (CC0 1.0).

  20. DNA sequence database as a tool to identify decapod crustaceans on the São Paulo coastline.

    PubMed

    Mantelatto, Fernando L; Terossi, Mariana; Negri, Mariana; Buranelli, Raquel C; Robles, Rafael; Magalhães, Tatiana; Tamburus, Ana Francisca; Rossi, Natália; Miyazaki, Mayara J

    2017-09-05

    DNA barcoding has emerged as an efficient tool for taxonomy and other biodiversity fields. The vast and speciose group of decapod crustaceans is no exception, and comparing short DNA fragments has enabled researchers to overcome some taxonomic impediments and to broaden knowledge of the diversity of this group of crustaceans. Brazil is considered an important area in terms of global marine biodiversity, and some regions stand out in terms of decapod fauna, such as the São Paulo coastline. Thus, the aim of this study is to obtain sequences of the mitochondrial markers (COI and 16S) for decapod crustaceans distributed along the São Paulo coastline and to test the accuracy of these markers for species identification from this region by comparing our sequences to those already present in the GenBank database. We sampled along almost all of the 300 km of the São Paulo coastline, from estuaries to offshore islands, during a multidisciplinary research project that ran for 5 years. All the species were processed to obtain the DNA sequences. The diversity of the decapod fauna on the São Paulo coastline comprises at least 404 species. We were able to collect 256 of those species and to sequence at least one of the target genes from 221. By testing the accuracy of these two DNA markers as a tool for identification, we were able to check our own identifications, including new records in GenBank, spot potential mistakes in GenBank, and detect potential new species.

  1. Cell transformation mediated by chromosomal deoxyribonucleic acid of polyoma virus-transformed cells.

    PubMed Central

    Della Valle, G; Fenton, R G; Basilico, C

    1981-01-01

    To study the mechanism of deoxyribonucleic acid (DNA)-mediated gene transfer, normal rat cells were transfected with total cellular DNA extracted from polyoma virus-transformed cells. This resulted in the appearance of the transformed phenotype in 1 × 10⁻⁶ to 3 × 10⁻⁶ of the transfected cells. Transformation was invariably associated with the acquisition of integrated viral DNA sequences characteristic of the donor DNA. This was caused not by the integration of free DNA molecules, but by the transfer of large DNA fragments (10 to 20 kilobases) containing linked cellular and viral sequences. Although Southern blot analysis showed that integration did not appear to occur in a homologous region of the recipient chromosome, the frequency of transformation was rather high when compared with that of purified polyoma DNA, perhaps due to "position" effects or to the high efficiency of recombination of large DNA fragments. PMID:6100965

  2. Characterization of the Complete Mitochondrial Genome Sequence of Spirometra erinaceieuropaei (Cestoda: Diphyllobothriidae) from China

    PubMed Central

    Liu, Guo-Hua; Li, Chun; Li, Jia-Yuan; Zhou, Dong-Hui; Xiong, Rong-Chuan; Lin, Rui-Qing; Zou, Feng-Cai; Zhu, Xing-Quan

    2012-01-01

    Sparganosis, caused by the plerocercoid larvae of members of the genus Spirometra, can cause significant public health problems and considerable economic losses. In the present study, the complete mitochondrial DNA (mtDNA) sequence of Spirometra erinaceieuropaei from China was determined, characterized and compared with that of S. erinaceieuropaei from Japan. The gene arrangement in the mt genome sequences of S. erinaceieuropaei from China and Japan is identical. The identity of the mt genomes was 99.1% between S. erinaceieuropaei from China and Japan, and the complete mtDNA sequence of S. erinaceieuropaei from China is slightly shorter (2 bp) than that from Japan. Phylogenetic analysis of S. erinaceieuropaei with other representative cestodes using two different computational algorithms [Bayesian inference (BI) and maximum likelihood (ML)] based on concatenated amino acid sequences of 12 protein-coding genes revealed that S. erinaceieuropaei is closely related to Diphyllobothrium spp., supporting classification based on morphological features. The present study determined the complete mtDNA sequence of S. erinaceieuropaei from China, which provides novel genetic markers for studying the population genetics and molecular epidemiology of S. erinaceieuropaei in humans and animals. PMID:22553464

  3. High quality methylome-wide investigations through next-generation sequencing of DNA from a single archived dry blood spot

    PubMed Central

    Aberg, Karolina A.; Xie, Lin Y.; Nerella, Srilaxmi; Copeland, William E.; Costello, E. Jane; van den Oord, Edwin J.C.G.

    2013-01-01

    The potential importance of DNA methylation in the etiology of complex diseases has led to interest in the development of methylome-wide association studies (MWAS) aimed at interrogating all methylation sites in the human genome. When using blood as biomaterial for an MWAS, the DNA is typically extracted directly from fresh or frozen whole blood that was collected via venous puncture. However, DNA extracted from dry blood spots may also be an alternative starting material. In the present study, we apply a methyl-CpG binding domain (MBD) protein enrichment-based technique in combination with next generation sequencing (MBD-seq) to assess the methylation status of the ~27 million CpGs in the human autosomal reference genome. We investigate eight methylomes using DNA from blood spots. These data are compared with 1,500 methylomes previously assayed with the same MBD-seq approach using DNA from whole blood. When investigating the sequence quality and the enrichment profile across biological features, we find that DNA extracted from blood spots gives comparable results with DNA extracted from whole blood. Only when the amount of starting material is ≤ 0.5 µg DNA do we observe a slight decrease in the assay performance. In conclusion, we show that high quality methylome-wide investigations using MBD-seq can be conducted in DNA extracted from archived dry blood spots without sacrificing quality and without bias in enrichment profile as long as the amount of starting material is sufficient. In general, the amount of DNA extracted from a single blood spot is sufficient for methylome-wide investigations with the MBD-seq approach. PMID:23644822

  4. High quality methylome-wide investigations through next-generation sequencing of DNA from a single archived dry blood spot.

    PubMed

    Aberg, Karolina A; Xie, Lin Y; Nerella, Srilaxmi; Copeland, William E; Costello, E Jane; van den Oord, Edwin J C G

    2013-05-01

    The potential importance of DNA methylation in the etiology of complex diseases has led to interest in the development of methylome-wide association studies (MWAS) aimed at interrogating all methylation sites in the human genome. When using blood as biomaterial for an MWAS, the DNA is typically extracted directly from fresh or frozen whole blood that was collected via venous puncture. However, DNA extracted from dry blood spots may also be an alternative starting material. In the present study, we apply a methyl-CpG binding domain (MBD) protein enrichment-based technique in combination with next generation sequencing (MBD-seq) to assess the methylation status of the ~27 million CpGs in the human autosomal reference genome. We investigate eight methylomes using DNA from blood spots. These data are compared with 1,500 methylomes previously assayed with the same MBD-seq approach using DNA from whole blood. When investigating the sequence quality and the enrichment profile across biological features, we find that DNA extracted from blood spots gives comparable results with DNA extracted from whole blood. Only when the amount of starting material is ≤ 0.5 µg DNA do we observe a slight decrease in the assay performance. In conclusion, we show that high quality methylome-wide investigations using MBD-seq can be conducted in DNA extracted from archived dry blood spots without sacrificing quality and without bias in enrichment profile as long as the amount of starting material is sufficient. In general, the amount of DNA extracted from a single blood spot is sufficient for methylome-wide investigations with the MBD-seq approach.

  5. Genome Sequences of Populus tremula Chloroplast and Mitochondrion: Implications for Holistic Poplar Breeding

    PubMed Central

    Mader, Malte; Le Paslier, Marie-Christine; Bounon, Rémi; Berard, Aurélie; Vettori, Cristina; Schroeder, Hilke; Leplé, Jean-Charles; Fladung, Matthias

    2016-01-01

    Complete Populus genome sequences are available for the nucleus (P. trichocarpa; section Tacamahaca) and for chloroplasts (seven species), but not for mitochondria. Here, we provide the complete genome sequences of the chloroplast and the mitochondrion for the clones P. tremula W52 and P. tremula x P. alba 717-1B4 (section Populus). The organization of the chloroplast genomes of both Populus clones is described. A phylogenetic tree constructed from all available complete chloroplast DNA sequences of Populus was not congruent with the assignment of the related species to different Populus sections. In total, 3,024 variable nucleotide positions were identified among all compared Populus chloroplast DNA sequences. The 5-prime part of the LSC from trnH to atpA showed the highest frequency of variations. The variable positions included 163 positions with SNPs allowing for differentiating the two clones with P. tremula chloroplast genomes (W52, 717-1B4) from the other seven Populus individuals. These potential P. tremula-specific SNPs were displayed as a whole-plastome barcode on the P. tremula W52 chloroplast DNA sequence. Three of these SNPs and one InDel in the trnH-psbA linker were successfully validated by Sanger sequencing in an extended set of Populus individuals. The complete mitochondrial genome sequence of P. tremula is the first in the family of Salicaceae. The mitochondrial genomes of the two clones are 783,442 bp (W52) and 783,513 bp (717-1B4) in size, structurally very similar and organized as single circles. DNA sequence regions with high similarity to the W52 chloroplast sequence account for about 2% of the W52 mitochondrial genome. The mean SNP frequency was found to be nearly six fold higher in the chloroplast than in the mitochondrial genome when comparing 717-1B4 with W52. The availability of the genomic information of all three DNA-containing cell organelles will allow a holistic approach in poplar molecular breeding in the future. PMID:26800039

  6. Widespread Transient Hoogsteen Base-Pairs in Canonical Duplex DNA with Variable Energetics

    PubMed Central

    Alvey, Heidi S.; Gottardo, Federico L.; Nikolova, Evgenia N.; Al-Hashimi, Hashim M.

    2015-01-01

    Hoogsteen base-pairing involves a 180 degree rotation of the purine base relative to Watson-Crick base-pairing within DNA duplexes, creating alternative DNA conformations that can play roles in recognition, damage induction, and replication. Here, using Nuclear Magnetic Resonance R1ρ relaxation dispersion, we show that transient Hoogsteen base-pairs occur across more diverse sequence and positional contexts than previously anticipated. We observe sequence-specific variations in Hoogsteen base-pair energetic stabilities that are comparable to variations in Watson-Crick base-pair stability, with Hoogsteen base-pairs being more abundant for energetically less favorable Watson-Crick base-pairs. Our results suggest that the variations in Hoogsteen stabilities and rates of formation are dominated by variations in Watson-Crick base-pair stability, suggesting a late transition state for the Watson-Crick to Hoogsteen conformational switch. The occurrence of sequence- and position-dependent Hoogsteen base-pairs provides a new potential mechanism for achieving sequence-dependent DNA transactions. PMID:25185517

  7. Correcting for Sample Contamination in Genotype Calling of DNA Sequence Data

    PubMed Central

    Flickinger, Matthew; Jun, Goo; Abecasis, Gonçalo R.; Boehnke, Michael; Kang, Hyun Min

    2015-01-01

    DNA sample contamination is a frequent problem in DNA sequencing studies and can result in genotyping errors and reduced power for association testing. We recently described methods to identify within-species DNA sample contamination based on sequencing read data, showed that our methods can reliably detect and estimate contamination levels as low as 1%, and suggested strategies to identify and remove contaminated samples from sequencing studies. Here we propose methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses. We compare our contamination-adjusted calls to calls that ignore contamination and to calls based on uncontaminated data. We demonstrate that, for moderate contamination levels (5%–20%), contamination-adjusted calls eliminate 48%–77% of the genotyping errors. For lower levels of contamination, our contamination correction methods produce genotypes nearly as accurate as those based on uncontaminated data. Our contamination correction methods are useful generally, but are particularly helpful for sample contamination levels from 2% to 20%. PMID:26235984
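    The idea of modelling contamination during genotype calling can be illustrated with a minimal sketch. It assumes a simple biallelic mixture model that the abstract does not spell out: a fraction alpha of reads comes from contaminating DNA whose alternate-allele content is approximated by the population allele frequency, and reads are miscalled with probability err. All function names, parameters and the binomial likelihood are illustrative assumptions, not the authors' implementation.

```python
from math import comb

def alt_read_prob(g, alpha, freq, err):
    """Probability that a single read shows the alternate allele.

    g     -- number of alternate alleles in the sample genotype (0, 1 or 2)
    alpha -- assumed contamination fraction (0 = clean sample)
    freq  -- assumed population alternate-allele frequency of the contaminant
    err   -- assumed per-base sequencing error rate
    """
    # Alternate-allele fraction in the mixed DNA pool.
    mix = (1.0 - alpha) * g / 2.0 + alpha * freq
    # A read shows alt if it is alt and read correctly, or ref and misread.
    return mix * (1.0 - err) + (1.0 - mix) * err

def genotype_likelihoods(n_alt, n_total, alpha, freq, err=0.01):
    """Binomial likelihood of the observed read counts for g = 0, 1, 2."""
    likelihoods = []
    for g in (0, 1, 2):
        p = alt_read_prob(g, alpha, freq, err)
        likelihoods.append(
            comb(n_total, n_alt) * p ** n_alt * (1.0 - p) ** (n_total - n_alt)
        )
    return likelihoods

def call_genotype(n_alt, n_total, alpha, freq, err=0.01):
    """Return the genotype (0, 1 or 2) with the highest likelihood."""
    lik = genotype_likelihoods(n_alt, n_total, alpha, freq, err)
    return max(range(3), key=lambda g: lik[g])
```

    Under these assumptions, 4 alternate reads out of 20 at a site with allele frequency 0.5 are called heterozygous when contamination is ignored (alpha = 0) but homozygous reference when alpha = 0.2, which is the kind of error a contamination-aware caller is meant to avoid.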

  8. In silico characterization and analysis of RTBP1 and NgTRF1 protein through MD simulation and molecular docking - A comparative study.

    PubMed

    Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

    2015-02-06

    Gaining access to sequence and structure information of telomere binding proteins helps in understanding the essential biological processes involved in conserved sequence-specific interactions between DNA and proteins. Rice telomere binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif proteins that play a role in telomeric DNA protection and length regulation. Both proteins share the same type of domain, but so far there have been very few reports of in silico studies of the complete proteins. Here we carry out a comparative study of the two proteins through modeling of the complete proteins, physicochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC Protein Workbench were used to predict the protein 3D structures and to derive the parameters used to characterize the proteins. MD simulation was run for 10 ns with the GROMOS force field in GROMACS. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of the TTTAGGG conserved sequence motif using the HADDOCK web server. The analysis revealed that around 120 amino acids in the tail part show good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate similarity between the proteins' structures and sequences. The MD simulation results (RMSD, RMSF, Rg, PCA and energy plots) also convey similar motional behavior in the two proteins. The best docked complex for both proteins also points to the first interaction site, which is mainly the helix3 region of the DNA binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 display a good degree of similarity in their physicochemical properties, structure, dynamics and binding mode.

  9. In Silico Characterization and Analysis of RTBP1 and NgTRF1 Protein Through MD Simulation and Molecular Docking: A Comparative Study.

    PubMed

    Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

    2015-09-01

    Gaining access to sequence and structure information of telomere-binding proteins helps in understanding the essential biological processes involved in conserved sequence-specific interactions between DNA and proteins. Rice telomere-binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif proteins that play a role in telomeric DNA protection and length regulation. Both proteins share the same type of domain, but so far there have been very few reports of in silico studies of the complete proteins. Here we carry out a comparative study of the two proteins through modeling of the complete proteins, physicochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC Protein Workbench were used to predict the protein 3D structures and to derive the parameters used to characterize the proteins. MD simulation was run for 10 ns with the GROMOS force field in GROMACS. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of the TTTAGGG conserved sequence motif using the HADDOCK web server. The analysis revealed that around 120 amino acids in the tail part show good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate similarity between the proteins' structures and sequences. The MD simulation results (RMSD, RMSF, Rg, PCA and energy plots) also convey similar motional behavior in the two proteins. The best docked complex for both proteins also points to the first interaction site, which is mainly the helix3 region of the DNA-binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 display a good degree of similarity in their physicochemical properties, structure, dynamics and binding mode.

  10. Synthesis and DNA interaction of a mixed proflavine-phenanthroline Tröger base.

    PubMed

    Baldeyrou, Brigitte; Tardy, Christelle; Bailly, Christian; Colson, Pierre; Houssier, Claude; Charmantray, Franck; Demeunynck, Martine

    2002-04-01

    We report the synthesis of an asymmetric Tröger base containing the two well characterised DNA binding chromophores, proflavine and phenanthroline. The mode of interaction of the hybrid molecule was investigated by circular and linear dichroism experiments and a biochemical assay using DNA topoisomerase I. The data are compatible with a model in which the proflavine moiety intercalates between DNA base pairs and the phenanthroline ring occupies the DNA groove. DNase I cleavage experiments were carried out to investigate the sequence preference of the hybrid ligand and a well resolved footprint was detected at a site encompassing two adjacent 5'-GTC·5'-GAC triplets. The sequence preference of the asymmetric molecule is compared to that of the symmetric analogues.

  11. High-throughput sequencing of three Lemnoideae (duckweeds) chloroplast genomes from total DNA.

    PubMed

    Wang, Wenqin; Messing, Joachim

    2011-01-01

    Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. More than 1,000-fold coverage of the chloroplast genome from total DNA was reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.

  12. High-Throughput Sequencing of Three Lemnoideae (Duckweeds) Chloroplast Genomes from Total DNA

    PubMed Central

    Wang, Wenqin; Messing, Joachim

    2011-01-01

    Background: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. Methods: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. Conclusions: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. More than 1,000-fold coverage of the chloroplast genome from total DNA was reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power. PMID:21931804

  13. In vitro gene expression by cationized derivatives of an artificial protein with repeated RGD sequences, Pronectin.

    PubMed

    Hosseinkhani, Hossein; Tabata, Yasuhiko

    2003-01-09

    The objective of this study is to investigate the efficiency of a non-viral gene carrier with RGD sequences, Pronectin F(+), for gene transfection. The Pronectin F(+) was cationized by introducing ethylenediamine (Ed), spermidine (Sd), and spermine (Sm) to the hydroxyl groups while the corresponding gelatin derivative was prepared similarly because gelatin also has one RGD sequence per molecule. The zeta potential and molecular size of Pronectin F(+) and gelatin derivatives were examined before and after polyion complexation with a plasmid DNA of luciferase. When complexed with the plasmid DNA at the Pronectin F(+)/plasmid DNA mixing ratio of 50, the complex exhibited a zeta potential of about 10 mV, which is similar to that of the gelatin derivative-plasmid DNA complex. Irrespective of the type of Pronectin F(+) and gelatin derivatives, their complexation reduced the apparent molecular size of plasmid DNA to about 200 nm, the size decreasing with the increased derivative/plasmid DNA weight mixing ratio. The rat gastric mucosal (RGM)-1 cells treated with both complexes exhibited significantly stronger luciferase activities than free plasmid DNA although the enhanced extent was significant for the Sm derivative compared with the corresponding Ed and Sd derivatives. Cell attachment was enhanced by the Pronectin F(+) derivative to a significantly higher extent than by the gelatin derivative. The amount of plasmid DNA internalized into the cells was enhanced by the complexation with every Pronectin F(+) derivative compared with the gelatin derivative. For both Pronectin F(+) and gelatin carriers, the buffering capacity of Sm derivatives was higher than that of Ed and Sd derivatives and comparable to that of polyethyleneimine. It is likely that the high efficiency of gene transfection for the Sm derivative is due to the superior buffering effect. We conclude that the Sm derivative of Pronectin F(+) is promising as a non-viral vector of gene transfection.

  14. A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties.

    PubMed

    Pan, Gaofeng; Jiang, Limin; Tang, Jijun; Guo, Fei

    2018-02-08

    DNA methylation is an important biochemical process, and it has a close connection with many types of cancer. Research about DNA methylation can help us to understand the regulation mechanism and epigenetic reprogramming. Therefore, it becomes very important to recognize the methylation sites in the DNA sequence. In the past several decades, many computational methods, especially machine learning methods, have been developed since high-throughput sequencing technology became widely used in research and industry. In order to accurately identify whether or not a nucleotide residue is methylated under the specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to do DNA methylation prediction. Five criteria, namely area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity, are used to evaluate the prediction results of our method. On the benchmark dataset, we could reach 0.8632 on AUC, 0.8017 on ACC, 0.5558 on MCC, and 0.7268 on SN. Additionally, the best results on two scBS-seq profiled mouse embryonic stem cells datasets were 0.8896 and 0.9511 by AUC, respectively. When compared with other outstanding methods, our method surpassed them on the accuracy of prediction. The improvement of AUC by our method compared to other methods was at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service, and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.
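    Of the feature types listed in this abstract, the k-gram component is the simplest to illustrate. The sketch below assumes that "k-gram" means the normalized frequency of each overlapping length-k substring over the alphabet ACGT; the function name, the default k = 2 and the handling of ambiguous bases are assumptions made for illustration and are not taken from the authors' released code.

```python
from itertools import product

def kgram_features(seq, k=2, alphabet="ACGT"):
    """Return normalized k-gram frequencies in a fixed lexicographic order."""
    seq = seq.upper()
    # Every possible k-gram over the alphabet, e.g. AA, AC, ..., TT for k = 2.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    # Slide a window of width k along the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    # Frequencies sum to 1 when at least one valid window was counted.
    return [counts[m] / total if total else 0.0 for m in kmers]
```

    The resulting fixed-length vector (4^k entries) could then be concatenated with the other feature types before being passed to a classifier.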

  15. The impact of targeting repetitive BamHI-W sequences on the sensitivity and precision of EBV DNA quantification.

    PubMed

    Sanosyan, Armen; Fayd'herbe de Maudave, Alexis; Bollore, Karine; Zimmermann, Valérie; Foulongne, Vincent; Van de Perre, Philippe; Tuaillon, Edouard

    2017-01-01

    Viral load monitoring and early Epstein-Barr virus (EBV) DNA detection are essential in routine laboratory testing, especially in preemptive management of Post-transplant Lymphoproliferative Disorder. Targeting the repetitive BamHI-W sequence was shown to increase the sensitivity of EBV DNA quantification, but the variability of BamHI-W reiterations was suggested to be a source of quantification bias. We aimed to assess the extent of variability associated with BamHI-W PCR and its impact on the sensitivity of EBV DNA quantification using the 1st WHO international standard, EBV strains and clinical samples. Repetitive BamHI-W and single LMP2 sequences were amplified by in-house qPCRs and the BXLF-1 sequence by a commercial assay (EBV R-gene™, BioMerieux). Linearity and limits of detection of in-house methods were assessed. The impact of repeated versus single target sequences on EBV DNA quantification precision was tested on B95.8 and Raji cell lines, possessing 11 and 7 copies of the BamHI-W sequence, respectively, and on clinical samples. BamHI-W qPCR demonstrated a lower limit of detection compared to LMP2 qPCR (2.33 log10 versus 3.08 log10 IU/mL; P = 0.0002). BamHI-W qPCR underestimated the EBV DNA load on the Raji strain, which contained fewer BamHI-W copies than the WHO standard derived from the B95.8 EBV strain (mean bias: -0.21 log10; 95% CI, -0.54 to 0.12). Comparison of BamHI-W qPCR versus LMP2 and BXLF-1 qPCR showed an acceptable variability between EBV DNA levels in clinical samples with the mean bias being within 0.5 log10 IU/mL EBV DNA, whereas a better quantitative concordance was observed between LMP2 and BXLF-1 assays. Targeting BamHI-W resulted in higher sensitivity compared to LMP2, but the variable reiterations of the BamHI-W segment are associated with higher quantification variability. BamHI-W can be considered for clinical and therapeutic monitoring to detect early EBV DNA and a dynamic change in viral load.
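    The reported underestimation on the Raji strain can be given a rough plausibility check. The sketch below assumes, which the abstract does not state, that the BamHI-W qPCR signal scales linearly with the number of BamHI-W repeats per genome, so that a sample with fewer repeats than the calibrator is underestimated by the log10 ratio of the copy numbers. The function name is hypothetical.

```python
import math

def expected_log10_bias(sample_copies, calibrator_copies):
    """Expected log10 quantification bias if signal is proportional to repeat count."""
    return math.log10(sample_copies / calibrator_copies)

# Raji (7 BamHI-W copies) quantified against a B95.8-derived standard (11 copies).
bias = expected_log10_bias(7, 11)
print(f"expected bias: {bias:.2f} log10")
```

    Under that assumption the expected bias is about -0.20 log10, which is close to the reported mean bias of -0.21 log10.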

  16. The impact of targeting repetitive BamHI-W sequences on the sensitivity and precision of EBV DNA quantification

    PubMed Central

    Fayd’herbe de Maudave, Alexis; Bollore, Karine; Zimmermann, Valérie; Foulongne, Vincent; Van de Perre, Philippe; Tuaillon, Edouard

    2017-01-01

    Background: Viral load monitoring and early Epstein-Barr virus (EBV) DNA detection are essential in routine laboratory testing, especially in preemptive management of Post-transplant Lymphoproliferative Disorder. Targeting the repetitive BamHI-W sequence was shown to increase the sensitivity of EBV DNA quantification, but the variability of BamHI-W reiterations was suggested to be a source of quantification bias. We aimed to assess the extent of variability associated with BamHI-W PCR and its impact on the sensitivity of EBV DNA quantification using the 1st WHO international standard, EBV strains and clinical samples. Methods: Repetitive BamHI-W and single LMP2 sequences were amplified by in-house qPCRs and the BXLF-1 sequence by a commercial assay (EBV R-gene™, BioMerieux). Linearity and limits of detection of in-house methods were assessed. The impact of repeated versus single target sequences on EBV DNA quantification precision was tested on B95.8 and Raji cell lines, possessing 11 and 7 copies of the BamHI-W sequence, respectively, and on clinical samples. Results: BamHI-W qPCR demonstrated a lower limit of detection compared to LMP2 qPCR (2.33 log10 versus 3.08 log10 IU/mL; P = 0.0002). BamHI-W qPCR underestimated the EBV DNA load on the Raji strain, which contained fewer BamHI-W copies than the WHO standard derived from the B95.8 EBV strain (mean bias: -0.21 log10; 95% CI, -0.54 to 0.12). Comparison of BamHI-W qPCR versus LMP2 and BXLF-1 qPCR showed an acceptable variability between EBV DNA levels in clinical samples with the mean bias being within 0.5 log10 IU/mL EBV DNA, whereas a better quantitative concordance was observed between LMP2 and BXLF-1 assays. Conclusions: Targeting BamHI-W resulted in higher sensitivity compared to LMP2, but the variable reiterations of the BamHI-W segment are associated with higher quantification variability. BamHI-W can be considered for clinical and therapeutic monitoring to detect early EBV DNA and a dynamic change in viral load. PMID:28850597

  17. Measurement of fetal fraction in cell-free DNA from maternal plasma using a panel of insertion/deletion polymorphisms.

    PubMed

    Barrett, Angela N; Xiong, Li; Tan, Tuan Z; Advani, Henna V; Hua, Rui; Laureano-Asibal, Cecille; Soong, Richie; Biswas, Arijit; Nagarajan, Niranjan; Choolani, Mahesh

    2017-01-01

    Cell-free DNA from maternal plasma can be used for non-invasive prenatal testing for aneuploidies and single gene disorders, and also has applications as a biomarker for monitoring high-risk pregnancies, such as those at risk of pre-eclampsia. On average, the fractional cell-free fetal DNA concentration in plasma is approximately 15%, but can vary from less than 4% to greater than 30%. Although quantification of cell-free fetal DNA is straightforward in the case of a male fetus, there is no universal fetal marker; in a female fetus measurement is more challenging. We have developed a panel of multiplexed insertion/deletion polymorphisms that can measure fetal fraction in all pregnancies in a simple, targeted sequencing reaction. A multiplex panel of primers was designed for 35 indels plus a ZFX/ZFY amplicon. cfDNA was extracted from plasma from 157 pregnant women, and maternal genomic DNA was extracted for 20 of these samples for panel validation. Sixty-one samples from pregnancies with a male fetus were subjected to whole genome sequencing on the Ion Proton sequencing platform, and fetal fraction derived from Y chromosome counts was compared to fetal fraction measured using the indel panel. A total of 157 cell-free DNA samples were sequenced using the indel panel, and informativity was assessed, along with the proportion of fetal DNA. Using gDNA we optimised the indel panel, removing amplicons giving rise to PCR bias. Good correlation was found between fetal fraction using indels and using whole genome sequencing of the Y chromosome (Spearman's r = 0.69). A median of 12 indels were informative per sample. The indel panel was informative in 157/157 cases (mean fetal fraction 14.4% (±0.58%)). Using our targeted next generation sequencing panel we can readily assess the fetal DNA percentage in male and female pregnancies.
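    A minimal sketch of how read counts at informative indels might be turned into a fetal fraction follows. It assumes the most common informative configuration, where the mother is homozygous and the fetus heterozygous, so that the fetal-specific allele makes up about half of the fetal fraction of the reads. The function name, the thresholds used to decide whether a marker is informative, and the use of the median are illustrative assumptions; the abstract does not describe the authors' estimator.

```python
from statistics import median

def fetal_fraction(markers, min_frac=0.01, max_frac=0.20):
    """Estimate fetal fraction from per-indel read counts.

    markers  -- list of (minor_allele_reads, total_reads) tuples, one per indel
    min_frac -- assumed lower bound separating real signal from sequencing noise
    max_frac -- assumed upper bound excluding maternal heterozygous sites
    """
    estimates = []
    for minor, total in markers:
        if total == 0:
            continue  # no coverage at this amplicon
        frac = minor / total
        # Informative marker: minor allele present, but well below 50%.
        if min_frac <= frac <= max_frac:
            # Fetus carries one copy of the allele, so fraction is ff / 2.
            estimates.append(2.0 * frac)
    # The median damps the effect of amplicons with PCR bias.
    return median(estimates) if estimates else None
```

    For example, counts of (70, 1000), (80, 1000) and (75, 1000) at three informative indels, alongside an uninformative (0, 1000) and a maternal heterozygous (500, 1000) site, would give an estimate of about 15% under these assumptions.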

  18. Measurement of fetal fraction in cell-free DNA from maternal plasma using a panel of insertion/deletion polymorphisms

    PubMed Central

    Xiong, Li; Tan, Tuan Z.; Advani, Henna V.; Hua, Rui; Laureano-Asibal, Cecille; Soong, Richie; Biswas, Arijit; Nagarajan, Niranjan; Choolani, Mahesh

    2017-01-01

    Objective: Cell-free DNA from maternal plasma can be used for non-invasive prenatal testing for aneuploidies and single gene disorders, and also has applications as a biomarker for monitoring high-risk pregnancies, such as those at risk of pre-eclampsia. On average, the fractional cell-free fetal DNA concentration in plasma is approximately 15%, but can vary from less than 4% to greater than 30%. Although quantification of cell-free fetal DNA is straightforward in the case of a male fetus, there is no universal fetal marker; in a female fetus measurement is more challenging. We have developed a panel of multiplexed insertion/deletion polymorphisms that can measure fetal fraction in all pregnancies in a simple, targeted sequencing reaction. Methods: A multiplex panel of primers was designed for 35 indels plus a ZFX/ZFY amplicon. cfDNA was extracted from plasma from 157 pregnant women, and maternal genomic DNA was extracted for 20 of these samples for panel validation. Sixty-one samples from pregnancies with a male fetus were subjected to whole genome sequencing on the Ion Proton sequencing platform, and fetal fraction derived from Y chromosome counts was compared to fetal fraction measured using the indel panel. A total of 157 cell-free DNA samples were sequenced using the indel panel, and informativity was assessed, along with the proportion of fetal DNA. Results: Using gDNA we optimised the indel panel, removing amplicons giving rise to PCR bias. Good correlation was found between fetal fraction using indels and using whole genome sequencing of the Y chromosome (Spearman's r = 0.69). A median of 12 indels were informative per sample. The indel panel was informative in 157/157 cases (mean fetal fraction 14.4% (±0.58%)). Conclusions: Using our targeted next generation sequencing panel we can readily assess the fetal DNA percentage in male and female pregnancies. PMID:29084245

  19. A DNA Mini-Barcoding System for Authentication of Processed Fish Products.

    PubMed

    Shokralla, Shadi; Hellberg, Rosalee S; Handy, Sara M; King, Ian; Hajibabaei, Mehrdad

    2015-10-30

    Species substitution is a form of seafood fraud for the purpose of economic gain. DNA barcoding utilizes species-specific DNA sequence information for specimen identification. Previous work has established the usability of short DNA sequences (mini-barcodes) for identification of specimens harboring degraded DNA. This study aims at establishing a DNA mini-barcoding system for all fish species commonly used in processed fish products in North America. Six mini-barcode primer pairs targeting short (127-314 bp) fragments of the cytochrome c oxidase I (CO1) DNA barcode region were developed by examining over 8,000 DNA barcodes from species in the U.S. Food and Drug Administration (FDA) Seafood List. The mini-barcode primer pairs were then tested against 44 processed fish products representing a range of species and product types. Of the 44 products, 41 (93.2%) could be identified at the species or genus level. The greatest mini-barcoding success rate found with an individual primer pair was 88.6% compared to 20.5% success rate achieved by the full-length DNA barcode primers. Overall, this study presents a mini-barcoding system that can be used to identify a wide range of fish species in commercial products and may be utilized in high throughput DNA sequencing for authentication of heavily processed fish products.

  20. A single-molecule sequencing assay for the comprehensive profiling of T4 DNA ligase fidelity and bias during DNA end-joining.

    PubMed

    Potapov, Vladimir; Ong, Jennifer L; Langhorst, Bradley W; Bilotti, Katharina; Cahoon, Dan; Canton, Barry; Knight, Thomas F; Evans, Thomas C; Lohman, Gregory Js

    2018-05-08

    DNA ligases are key enzymes in molecular and synthetic biology that catalyze the joining of breaks in duplex DNA and the end-joining of DNA fragments. Ligation fidelity (discrimination against the ligation of substrates containing mismatched base pairs) and bias (preferential ligation of particular sequences over others) have been well-studied in the context of nick ligation. However, almost no data exist for fidelity and bias in end-joining ligation contexts. In this study, we applied Pacific Biosciences Single-Molecule Real-Time sequencing technology to directly sequence the products of a highly multiplexed ligation reaction. This method has been used to profile the ligation of all three-base 5'-overhangs by T4 DNA ligase under typical ligation conditions in a single experiment. We report the relative frequency of all ligation products with or without mismatches, the position-dependent frequency of each mismatch, and the surprising observation that 5'-TNA overhangs ligate extremely inefficiently compared to all other Watson-Crick pairings. The method can easily be extended to profile other ligases, end-types (e.g. blunt ends and overhangs of different lengths), and the effect of adjacent sequence on the ligation results. Further, the method has the potential to provide new insights into the thermodynamics of annealing and the kinetics of end-joining reactions.

  1. Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions

    DOEpatents

    Gardner, Shea N; Mariella, Jr., Raymond P; Christian, Allen T; Young, Jennifer A; Clague, David S

    2013-06-25

    A method of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths.

  2. Direct PCR Offers a Fast and Reliable Alternative to Conventional DNA Isolation Methods for Gut Microbiomes.

    PubMed

    Videvall, Elin; Strandh, Maria; Engelbrecht, Anel; Cloete, Schalk; Cornwallis, Charlie K

    2017-01-01

    The gut microbiome of animals is emerging as an important factor influencing ecological and evolutionary processes. A major bottleneck in obtaining microbiome data from large numbers of samples is the time-consuming laboratory procedures required, specifically the isolation of DNA and generation of amplicon libraries. Recently, direct PCR kits have been developed that circumvent conventional DNA extraction steps, thereby streamlining the laboratory process by reducing preparation time and costs. However, the reliability and efficacy of direct PCR for measuring host microbiomes have not yet been investigated other than in humans with 454 sequencing. Here, we conduct a comprehensive evaluation of the microbial communities obtained with direct PCR and the widely used Mo Bio PowerSoil DNA extraction kit in five distinct gut sample types (ileum, cecum, colon, feces, and cloaca) from 20 juvenile ostriches, using 16S rRNA Illumina MiSeq sequencing. We found that direct PCR was highly comparable over a range of measures to the DNA extraction method in cecal, colon, and fecal samples. However, the two methods significantly differed in samples with comparably low bacterial biomass: cloacal and especially ileal samples. We also sequenced 100 replicate sample pairs to evaluate repeatability during both extraction and PCR stages and found that both methods were highly consistent for cecal, colon, and fecal samples (r_s > 0.7) but had low repeatability for cloacal (r_s = 0.39) and ileal (r_s = -0.24) samples. This study indicates that direct PCR provides a fast, cheap, and reliable alternative to conventional DNA extraction methods for retrieving 16S rRNA data, which can aid future gut microbiome studies. IMPORTANCE The microbial communities of animals can have large impacts on their hosts, and the number of studies using high-throughput sequencing to measure gut microbiomes is rapidly increasing. However, the library preparation procedure in microbiome research is both costly and time-consuming, especially for large numbers of samples. We investigated a cheaper and faster direct PCR method designed to bypass the DNA isolation steps during 16S rRNA library preparation and compared it with a standard DNA extraction method. We used both techniques on five different gut sample types collected from 20 juvenile ostriches and sequenced samples with Illumina MiSeq. The methods were highly comparable and highly repeatable in three sample types with high microbial biomass (cecum, colon, and feces), but larger differences and low repeatability were found in the microbiomes obtained from the ileum and cloaca. These results will help microbiome researchers assess library preparation procedures and plan their studies accordingly.

  3. Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions

    DOEpatents

    Gardner, Shea N [San Leandro, CA]; Mariella, Jr., Raymond P.; Christian, Allen T [Tracy, CA]; Young, Jennifer A [Berkeley, CA]; Clague, David S [Livermore, CA]

    2011-01-18

    A method of fabricating a DNA molecule of user-defined sequence. The method comprises the steps of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an even or odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths. In one embodiment starting sequence fragments are of different lengths, n, n+1, n+2, etc.

  4. Role of DNA conformation & energetic insights in Msx-1-DNA recognition as revealed by molecular dynamics studies on specific and nonspecific complexes.

    PubMed

    Kachhap, Sangita; Singh, Balvinder

    2015-01-01

    In most homeodomain-DNA complexes, glutamine or lysine is present at the 50th position and interacts with the 5th and 6th nucleotides of the core recognition region. Molecular dynamics simulations of the Msx-1-DNA complex (Q50-TG) and its variant complexes, namely the specific complex (Q50K-CC) and the nonspecific complexes with a mutation in the DNA (Q50-CC) or in the protein (Q50K-TG), have been carried out. Analysis of protein-DNA interactions and DNA structure in the specific and nonspecific complexes shows that amino acid residues use the sequence-dependent shape of DNA to interact. The binding free energies of all four complexes were analysed to define the role of the amino acid residue at the 50th position in binding strength, considering the effect of variation in the DNA on the stability of the protein-DNA complexes. The order of stability of the protein-DNA complexes shows that specific complexes are more stable than nonspecific ones. Decomposition analysis shows that N-terminal amino acid residues contribute most to the binding free energy of the protein-DNA complexes. Among the specific protein-DNA complexes, K50 contributes more than Q50 to the binding free energy of its respective complex. The sequence dependence of the local conformation of DNA enables Q50/Q50K to form hydrogen bonds with nucleotide(s) of the DNA. Changes in the amino acid sequence of the protein are accommodated and stabilized around the TAAT core region of DNA that has variation in its nucleotides.

  5. Construction of a small Mus musculus repetitive DNA library: identification of a new satellite sequence in Mus musculus.

    PubMed Central

    Pietras, D F; Bennett, K L; Siracusa, L D; Woodworth-Gutai, M; Chapman, V M; Gross, K W; Kane-Haas, C; Hastie, N D

    1983-01-01

    We report the construction of a small library of recombinant plasmids containing Mus musculus repetitive DNA inserts. The repetitive cloned fraction was derived from denatured genomic DNA by reassociation to a Cot value at which repetitive, but not unique, sequences have reannealed followed by exhaustive S1 nuclease treatment to degrade single stranded DNA. Initial characterizations of this library by colony filter hybridizations have led to the identification of a previously undetected M. musculus minor satellite as well as to clones containing M. musculus major satellite sequences. This new satellite is repeated 10-20 times less than the major satellite in the M. musculus genome. It has a repeat length of 130 nucleotides compared with the M. musculus major satellite with a repeat length of 234 nucleotides. Sequence analysis of the minor satellite has shown that it has a 29 base pair region with extensive homology to one of the major satellite repeating subunits. We also show by in situ hybridization that this minor satellite sequence is located at the centromeres and possibly the arms of at least half the M. musculus chromosomes. Sequences related to the minor satellite have been found in the DNA of a related Mus species, Mus spretus, and may represent the major satellite of that species. PMID:6314268

  6. DNA Barcodes for Species Identification in the Hyperdiverse Ant Genus Pheidole (Formicidae: Myrmicinae)

    PubMed Central

    Ng'endo, R.N.; Osiemo, Z.B.; Brandl, R.

    2013-01-01

    DNA sequencing is increasingly being used to assist in species identification in order to overcome taxonomic impediment. However, few studies attempt to compare the results of these molecular studies with a more traditional species delineation approach based on morphological characters. Mitochondrial DNA Cytochrome oxidase subunit 1 (CO1) gene was sequenced, measuring 636 base pairs, from 47 ants of the genus Pheidole (Formicidae: Myrmicinae) collected in the Brazilian Atlantic Forest to test whether the morphology-based assignment of individuals into species is supported by DNA-based species delimitation. Twenty morphospecies were identified, whereas the barcoding analysis identified 19 Molecular Operational Taxonomic Units (MOTUs). Fifteen out of the 19 DNA-based clusters allocated, using sequence divergence thresholds of 2% and 3%, matched with morphospecies. Both thresholds yielded the same number of MOTUs. Only one MOTU was successfully identified to species level using the CO1 sequences of Pheidole species already in GenBank. The average pairwise sequence divergence for all 47 sequences was 19%, ranging from 0 to 25%. In some cases, however, morphology and molecular based methods differed in their assignment of individuals to morphospecies or MOTUs. The occurrence of distinct mitochondrial lineages within morphological species highlights groups for further detailed genetic and morphological studies, and therefore a pluralistic approach using several methods to understand the taxonomy of difficult lineages is advocated. PMID:23902257
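
    To make the idea of a sequence divergence threshold concrete, here is a minimal sketch of how aligned barcodes could be grouped into MOTUs. The abstract does not say which distance measure or clustering algorithm was used; the uncorrected p-distance and single-linkage grouping below are assumptions, and the function names are illustrative.

```python
def p_distance(a, b):
    """Uncorrected pairwise divergence between two aligned sequences.

    Positions where either sequence has a gap or ambiguity code are
    skipped (an illustrative choice).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x in "ACGT" and y in "ACGT"]
    if not compared:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in compared) / len(compared)


def cluster_motus(seqs, threshold):
    """Single-linkage grouping: two sequences share a MOTU if they are
    connected by a chain of pairs with divergence <= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

    Running `cluster_motus` once with `threshold=0.02` and once with `threshold=0.03` and comparing the number of groups mirrors the comparison of the 2% and 3% thresholds described above.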

  7. Phylogenetic Position of a Copper Age Sheep (Ovis aries) Mitochondrial DNA

    PubMed Central

    Olivieri, Cristina; Ermini, Luca; Rizzi, Ermanno; Corti, Giorgio; Luciani, Stefania; Marota, Isolina; De Bellis, Gianluca; Rollo, Franco

    2012-01-01

    Background Sheep (Ovis aries) were domesticated in the Fertile Crescent region about 9,000-8,000 years ago. Currently, few mitochondrial (mt) DNA studies are available on archaeological sheep. In particular, no data on archaeological European sheep are available. Methodology/Principal Findings Here we describe the first portion of mtDNA sequence of a Copper Age European sheep. DNA was extracted from hair shafts which were part of the clothes of the so-called Tyrolean Iceman or Ötzi (5,350 - 5,100 years before present). Mitochondrial DNA (a total of 2,429 base pairs, encompassing a portion of the control region, tRNAPhe, a portion of the 12S rRNA gene, and the whole cytochrome B gene) was sequenced using a mixed sequencing procedure based on PCR amplification and 454 sequencing of pooled amplification products. We have compared the sequence with the corresponding sequence of 334 extant lineages. Conclusions/Significance A phylogenetic network based on a new cladistic notation for the mitochondrial diversity of domestic sheep shows that the Ötzi's sheep falls within haplogroup B, thus demonstrating that sheep belonging to this haplogroup were already present in the Alps more than 5,000 years ago. On the other hand, the lineage of the Ötzi's sheep is defined by two transitions (16147, and 16440) which, assembled together, define a motif that has not yet been identified in modern sheep populations. PMID:22457789

  8. Strongylus asini (Nematoda, Strongyloidea): genetic relationships with other Strongylus species determined by ribosomal DNA.

    PubMed

    Hung, G C; Jacobs, D E; Krecek, R C; Gasser, R B; Chilton, N B

    1996-12-01

    Genomic DNA was isolated from adult Strongylus asini collected from zebra. The second ribosomal transcribed spacer (ITS-2) was amplified and sequenced using polymerase chain reaction (PCR) based techniques. The DNA sequence was compared with previously published data for 3 related Strongylus species. A PCR-linked restriction fragment length polymorphism method allowed the 4 species to be differentiated unequivocally. The ITS-2 sequence of S. asini was found to be more similar to those of S. edentatus (87.1%) and S. equinus (95.3%) than to that of S. vulgaris (73.9%). This result confirms that S. asini and S. vulgaris represent separate species and supports the retention of the 4 species within 1 genus.

  9. Highly conserved D-loop-like nuclear mitochondrial sequences (Numts) in tiger (Panthera tigris).

    PubMed

    Zhang, Wenping; Zhang, Zhihe; Shen, Fujun; Hou, Rong; Lv, Xiaoping; Yue, Bisong

    2006-08-01

    Using oligonucleotide primers designed to match hypervariable segment I (HVS-1) of Panthera tigris mitochondrial DNA (mtDNA), we amplified two different PCR products (500 bp and 287 bp) in the tiger (Panthera tigris), but obtained only one PCR product (287 bp) in the leopard (Panthera pardus). Sequence analyses indicated that the sequence of 287 bp was a D-loop-like nuclear mitochondrial sequence (Numts), indicating a nuclear transfer that occurred approximately 4.8-17 million years ago in the tiger and 4.6-16 million years ago in the leopard. Although the mtDNA D-loop sequence has a rapid rate of evolution, the 287-bp Numts are highly conserved; they are nearly identical in tiger subspecies and only 1.742% different between tiger and leopard. Thus, such sequences represent molecular 'fossils' that can shed light on evolution of the mitochondrial genome and may be the most appropriate outgroup for phylogenetic analysis. This is also supported by comparing the phylogenetic trees reconstructed using the D-loop sequence of snow leopard and the 287-bp Numts as outgroup.

  10. Composition and immuno-stimulatory properties of extracellular DNA from mouse gut flora.

    PubMed

    Qi, Ce; Li, Ya; Yu, Ren-Qiang; Zhou, Sheng-Li; Wang, Xing-Guo; Le, Guo-Wei; Jin, Qing-Zhe; Xiao, Hang; Sun, Jin

    2017-11-28

    To demonstrate that specific bacteria might release bacterial extracellular DNA (eDNA) to exert immunomodulatory functions in the mouse small intestine. Extracellular DNA was extracted using phosphate buffered saline with 0.5 mmol/L dithiothreitol combined with two phenol extractions. TOTO-1 iodide, a cell-impermeant and high-affinity nucleic acid stain, was used to confirm the existence of eDNA in the mucus layers of the small intestine and colon in healthy male C57BL/6 mice. Composition difference of eDNA and intracellular DNA (iDNA) of the small intestinal mucus was studied by Illumina sequencing and terminal restriction fragment length polymorphism (T-RFLP). Stimulation of cytokine production by eDNA was studied in RAW264.7 cells in vitro. TOTO-1 iodide staining confirmed existence of eDNA in loose mucus layer of the mouse colon and thin surface mucus layer of the small intestine. Illumina sequencing analysis and T-RFLP revealed that the composition of the eDNA in the small intestinal mucus was significantly different from that of the iDNA of the small intestinal mucus bacteria. Illumina MiSeq sequencing showed that the eDNA sequences came mainly from Gram-negative bacteria of Bacteroidales S24-7. By contrast, predominant bacteria of the small intestinal flora comprised Gram-positive bacteria. Both eDNA and iDNA were added to native or lipopolysaccharide-stimulated RAW264.7 macrophages, respectively. The eDNA induced significantly lower tumor necrosis factor-α/interleukin-10 (IL-10) and IL-6/IL-10 ratios than iDNA, suggesting the predominance for maintaining immune homeostasis of the gut. Our results indicated that degraded bacterial genomic DNA was mainly released by Gram-negative bacteria, especially Bacteroidales-S24-7 and Stenotrophomonas genus in gut mucus of mice. They decreased pro-inflammatory activity compared to total gut flora genomic DNA.

  11. Identification of forensic samples by using an infrared-based automatic DNA sequencer.

    PubMed

    Ricci, Ugo; Sani, Ilaria; Klintschar, Michael; Cerri, Nicoletta; De Ferrari, Francesco; Giovannucci Uzielli, Maria Luisa

    2003-06-01

    We have recently introduced a new protocol for analyzing all core loci of the Federal Bureau of Investigation's (FBI) Combined DNA Index System (CODIS) with an infrared (IR) automatic DNA sequencer (LI-COR 4200). The amplicons were labeled with forward oligonucleotide primers, covalently linked to a new infrared fluorescent molecule (IRDye 800). The alleles were displayed as familiar autoradiogram-like images with real-time detection. This protocol was employed for paternity testing, population studies, and identification of degraded forensic samples. We extensively analyzed some simulated forensic samples and mixed stains (blood, semen, saliva, bones, and fixed archival embedded tissues), comparing the results with donor samples. Sensitivity studies were also performed for the four multiplex systems. Our results show the efficiency, reliability, and accuracy of the IR system for the analysis of forensic samples. We also compared the efficiency of the multiplex protocol with ultraviolet (UV) technology. Paternity tests, undegraded DNA samples, and real forensic samples were analyzed with this approach based on IR technology and with UV-based automatic sequencers in combination with commercially-available kits. The comparability of the results with the widespread UV methods suggests that it is possible to exchange data between laboratories using the same core group of markers but different primer sets and detection methods.

  12. Effect of DNA Extraction Methods on the Apparent Structure of Yak Rumen Microbial Communities as Revealed by 16S rDNA Sequencing.

    PubMed

    Chen, Ya-Bing; Lan, Dao-Liang; Tang, Cheng; Yang, Xiao-Nong; Li, Jian

    2015-01-01

    To more efficiently identify the microbial community of the yak rumen, the standardization of DNA extraction is key to ensure fidelity while studying environmental microbial communities. In this study, we systematically compared the efficiency of several extraction methods based on DNA yield, purity, and 16S rDNA sequencing to determine the optimal DNA extraction methods whose DNA products reflect complete bacterial communities. The results indicate that method 6 (hexadecyltrimethylammonium bromide-lysozyme-physical lysis by bead beating) is recommended for the DNA isolation of the rumen microbial community due to its high yield, operational taxonomic unit, bacterial diversity, and excellent cell-breaking capability. The results also indicate that the bead-beating step is necessary to effectively break down the cell walls of all of the microbes, especially Gram-positive bacteria. Another aim of this study was to preliminarily analyze the bacterial community via 16S rDNA sequencing. The microbial community spanned approximately 21 phyla, 35 classes, 75 families, and 112 genera. A comparative analysis showed some variations in the microbial community between yaks and cattle that may be attributed to diet and environmental differences. Interestingly, numerous uncultured or unclassified bacteria were found in yak rumen, suggesting that further research is required to determine the specific functional and ecological roles of these bacteria in yak rumen. In summary, the investigation of the optimal DNA extraction methods and the preliminary evaluation of the bacterial community composition of yak rumen support further identification of the specificity of the rumen microbial community in yak and the discovery of distinct gene resources.

  13. Tissue-specific DNA methylation is conserved across human, mouse, and rat, and driven by primary sequence conservation.

    PubMed

    Zhou, Jia; Sears, Renee L; Xing, Xiaoyun; Zhang, Bo; Li, Daofeng; Rockweiler, Nicole B; Jang, Hyo Sik; Choudhary, Mayank N K; Lee, Hyung Joo; Lowdon, Rebecca F; Arand, Jason; Tabers, Brianne; Gu, C Charles; Cicero, Theodore J; Wang, Ting

    2017-09-12

    Uncovering mechanisms of epigenome evolution is an essential step towards understanding the evolution of different cellular phenotypes. While studies have confirmed DNA methylation as a conserved epigenetic mechanism in mammalian development, little is known about the conservation of tissue-specific genome-wide DNA methylation patterns. Using a comparative epigenomics approach, we identified and compared the tissue-specific DNA methylation patterns of rat against those of mouse and human across three shared tissue types. We confirmed that tissue-specific differentially methylated regions are strongly associated with tissue-specific regulatory elements. Comparisons between species revealed that at a minimum 11-37% of tissue-specific DNA methylation patterns are conserved, a phenomenon that we define as epigenetic conservation. Conserved DNA methylation is accompanied by conservation of other epigenetic marks including histone modifications. Although a significant amount of locus-specific methylation is epigenetically conserved, the majority of tissue-specific DNA methylation is not conserved across the species and tissue types that we investigated. Examination of the genetic underpinning of epigenetic conservation suggests that primary sequence conservation is a driving force behind epigenetic conservation. In contrast, evolutionary dynamics of tissue-specific DNA methylation are best explained by the maintenance or turnover of binding sites for important transcription factors. Our study extends the limited literature of comparative epigenomics and suggests a new paradigm for epigenetic conservation without genetic conservation through analysis of transcription factor binding sites.

  14. Interpreting the biological relevance of bioinformatic analyses with T-DNA sequence for protein allergenicity.

    PubMed

    Harper, B; McClain, S; Ganko, E W

    2012-08-01

    Global regulatory agencies require bioinformatic sequence analysis as part of their safety evaluation for transgenic crops. Analysis typically focuses on encoded proteins and adjacent endogenous flanking sequences. Recently, regulatory expectations have expanded to include all reading frames of the inserted DNA. The intent is to provide biologically relevant results that can be used in the overall assessment of safety. This paper evaluates the relevance of assessing the allergenic potential of all DNA reading frames found in common food genes using methods considered for the analysis of T-DNA sequences used in transgenic crops. FASTA and BLASTX algorithms were used to compare genes from maize, rice, soybean, cucumber, melon, watermelon, and tomato using international regulatory guidance. Results show that BLASTX for maize yielded 7254 alignments that exceeded allergen similarity thresholds and 210,772 alignments that matched eight or more consecutive amino acids with an allergen; other crops produced similar results. This analysis suggests that each nontransgenic crop has a much greater potential for allergenic risk than what has been observed clinically. We demonstrate that a meaningful safety assessment is unlikely to be provided by using methods with inherently high frequencies of false positive alignments when broadly applied to all reading frames of DNA sequence. Copyright © 2012 Elsevier Inc. All rights reserved.
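
    The "eight or more consecutive amino acids" criterion mentioned above amounts to an exact shared-substring search. The sketch below illustrates that check for one translated reading frame against one allergen sequence; it is not the FASTA or BLASTX procedure used in the paper, and the function name and example sequences are hypothetical.

```python
def shared_kmers(query, allergen, k=8):
    """Return the sorted set of length-k peptides that occur in both
    the query protein and the allergen (exact matches only)."""
    allergen_kmers = {allergen[i:i + k]
                      for i in range(len(allergen) - k + 1)}
    query_kmers = {query[i:i + k]
                   for i in range(len(query) - k + 1)}
    return sorted(query_kmers & allergen_kmers)


# Hypothetical sequences: one shared 8-residue stretch.
hits = shared_kmers("MKTAYIAKQRQISFVK", "GGTAYIAKQRGG")
print(hits)  # ['TAYIAKQR']
```

    Because every additional reading frame adds more query peptides to compare, applying such a short exact-match rule to all six frames increases the number of chance hits, which is consistent with the high false positive rate the authors report.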

  15. Molecular identification and phylogenetic analysis of important medicinal plant species in genus Paeonia based on rDNA-ITS, matK, and rbcL DNA barcode sequences.

    PubMed

    Kim, W J; Ji, Y; Choi, G; Kang, Y M; Yang, S; Moon, B C

    2016-08-05

    This study was performed to identify and analyze the phylogenetic relationship among four herbaceous species of the genus Paeonia, P. lactiflora, P. japonica, P. veitchii, and P. suffruticosa, using DNA barcodes. These four species, which are commonly used in traditional medicine as Paeoniae Radix and Moutan Radicis Cortex, are pharmaceutically defined in different ways in the national pharmacopoeias in Korea, Japan, and China. To authenticate the different species used in these medicines, we evaluated rDNA-internal transcribed spacers (ITS), matK and rbcL regions, which provide information capable of effectively distinguishing each species from one another. Seventeen samples were collected from different geographic regions in Korea and China, and DNA barcode regions were amplified using universal primers. Comparative analyses of these DNA barcode sequences revealed species-specific nucleotide sequences capable of discriminating the four Paeonia species. Among the entire sequences of three barcodes, marker nucleotides were identified at three positions in P. lactiflora, eleven in P. japonica, five in P. veitchii, and 25 in P. suffruticosa. Phylogenetic analyses also revealed four distinct clusters showing homogeneous clades with high resolution at the species level. The results demonstrate that the analysis of these three DNA barcode sequences is a reliable method for identifying the four Paeonia species and can be used to authenticate Paeoniae Radix and Moutan Radicis Cortex at the species level. Furthermore, based on the assessment of amplicon sizes, inter/intra-specific distances, marker nucleotides, and phylogenetic analysis, rDNA-ITS was the most suitable DNA barcode for identification of these species.

  16. Genotyping of 25 leukemia-associated genes in a single work flow by next-generation sequencing technology with low amounts of input template DNA.

    PubMed

    Rinke, Jenny; Schäfer, Vivien; Schmidt, Mathias; Ziermann, Janine; Kohlmann, Alexander; Hochhaus, Andreas; Ernst, Thomas

    2013-08-01

    We sought to establish a convenient, sensitive next-generation sequencing (NGS) method for genotyping the 26 most commonly mutated leukemia-associated genes in a single work flow and to optimize this method for low amounts of input template DNA. We designed 184 PCR amplicons that cover all of the candidate genes. NGS was performed with genomic DNA (gDNA) from a cohort of 10 individuals with chronic myelomonocytic leukemia. The results were compared with NGS data obtained from sequencing of DNA generated by whole-genome amplification (WGA) of 20 ng template gDNA. Differences between gDNA and WGA samples in variant frequencies were determined for 2 different WGA kits. For gDNA samples, 25 of 26 genes were successfully sequenced with a sensitivity of 5%, which was achieved by a median coverage of 492 reads (range, 308-636 reads) per amplicon. We identified 24 distinct mutations in 11 genes. With WGA samples, we reliably detected all mutations above 5% sensitivity with a median coverage of 506 reads (range, 256-653 reads) per amplicon. With all variants included in the analysis, WGA amplification by the 2 kits tested yielded differences in variant frequencies that ranged from -28.19% to +9.94% [mean (SD) difference, -0.2% (4.08%)] and from -35.03% to +18.67% [mean difference, -0.75% (5.12%)]. Our method permits simultaneous analysis of a wide range of leukemia-associated target genes in a single sequencing run. NGS can be performed after WGA of template DNA for reliable detection of variants without introducing appreciable bias.

  17. Mitochondrial DNA variant at HVI region as a candidate of genetic markers of type 2 diabetes

    NASA Astrophysics Data System (ADS)

    Gumilar, Gun Gun; Purnamasari, Yunita; Setiadi, Rahmat

    2016-02-01

    Mitochondrial DNA (mtDNA) is maternally inherited, and mtDNA mutations can contribute to the excess of maternal inheritance of type 2 diabetes. Owing to its high mutation rate, one of the areas of the mtDNA that is often associated with the disease is the hypervariable region I (HVI). Therefore, this study was conducted to determine the genetic variants of human mtDNA HVI that are related to type 2 diabetes in four samples taken from four generations in one lineage. The steps taken include the lysis of hair follicles, amplification of the mtDNA HVI fragment using the polymerase chain reaction (PCR), detection of PCR products by agarose gel electrophoresis, measurement of the mtDNA concentration using a UV-Vis spectrophotometer, determination of the nucleotide sequence by direct sequencing, and analysis of the sequencing results using the SeqMan DNASTAR program. Comparison of the nucleotide sequences of the samples with the revised Cambridge Reference Sequence (rCRS) revealed six mutations shared by the samples: C16147T, T16189C, C16193del, T16127C, A16235G, and A16293C. After comparing the data obtained with secondary data from Mitomap and NCBI, it was found that two mutations, T16189C and T16217C, are candidate genetic markers of type 2 diabetes, even though the mutations were also found in generations not diagnosed with type 2 diabetes. The results of this study are expected to contribute to the database of human mtDNA genetic variants associated with metabolic diseases, so that in the future they can be utilized in various fields, especially medicine.

  18. Mitochondrial cytochrome c oxidase subunit 1 gene and nuclear rDNA regions of Enterobius vermicularis parasitic in captive chimpanzees with special reference to its relationship with pinworms in humans.

    PubMed

    Nakano, Tadao; Okamoto, Munehiro; Ikeda, Yatsukaho; Hasegawa, Hideo

    2006-12-01

    Sequences of mitochondrial cytochrome c oxidase subunit 1 (CO1) gene, nuclear internal transcribed spacer 2 (ITS2) region of ribosomal DNA (rDNA), and 5S rDNA of Enterobius vermicularis from captive chimpanzees in five zoos/institutions in Japan were analyzed and compared with those of pinworm eggs from humans in Japan. Three major types of variants appearing in both CO1 and ITS2 sequences, but showing no apparent connection, were observed among materials collected from the chimpanzees. Each one of them was also observed in pinworms in humans. Sequences of 5S rDNA were identical in the materials from chimpanzees and humans. Phylogenetic analysis of CO1 gene revealed three clusters with high bootstrap value, suggesting considerable divergence, presumably correlated with human evolution, has occurred in the human pinworms. The synonymy of E. gregorii with E. vermicularis is supported by the molecular evidence.

  19. DNA methylation and targeted sequencing of methyltransferases family genes in canine acute myeloid leukaemia, modelling human myeloid leukaemia.

    PubMed

    Bronzini, I; Aresu, L; Paganin, M; Marchioretto, L; Comazzi, S; Cian, F; Riondato, F; Marconato, L; Martini, V; Te Kronnie, G

    2017-09-01

    Tumours show aberrant DNA methylation patterns, being hypermethylated or hypomethylated compared with normal tissues. In human acute myeloid leukaemia (hAML), mutations in DNA methyltransferase (DNMT3A) are associated with more aggressive tumour behaviour. As AML is lethal in dogs, we defined global DNA methylation content and screened the C-terminal domain of the DNMT3 family of genes for sequence variants in 39 canine acute myeloid leukaemia (cAML) cases. A heterogeneous pattern of DNA methylation was found among cAML samples, with subsets of cases being hypermethylated or hypomethylated compared with healthy controls; four recurrent single nucleotide variations (SNVs) were found in the DNMT3L gene. Although SNVs were not directly correlated with whole genome DNA methylation levels, all hypomethylated cAML cases were homozygous for the deleterious mutation at p.Arg222Trp. This study contributes to understanding the genetic modifications of cAML, leading up to studies that will elucidate the role of methylome alterations in the pathogenesis of AML in dogs. © 2016 John Wiley & Sons Ltd.

  20. mtDNA and the Origin of the Icelanders: Deciphering Signals of Recent Population History

    PubMed Central

    Helgason, Agnar; Sigurðardóttir, Sigrún; Gulcher, Jeffrey R.; Ward, Ryk; Stefánsson, Kári

    2000-01-01

    Previous attempts to investigate the origin of the Icelanders have provided estimates of ancestry ranging from a 98% British Isles contribution to an 86% Scandinavian contribution. We generated mitochondrial sequence data for 401 Icelandic individuals and compared these data with >2,500 other European sequences from published sources, to determine the probable origins of women who contributed to Iceland’s settlement. Although the mean number of base-pair differences is high in the Icelandic sequences and they are widely distributed in the overall European mtDNA phylogeny, we find a smaller number of distinct mitochondrial lineages, compared with most other European populations. The frequencies of a number of mtDNA lineages in the Icelanders deviate noticeably from those in neighboring populations, suggesting that founder effects and genetic drift may have had a considerable influence on the Icelandic gene pool. This is in accordance with available demographic evidence about Icelandic population history. A comparison with published mtDNA lineages from European populations indicates that, whereas most founding females probably originated from Scandinavia and the British Isles, lesser contributions from other populations may also have taken place. We present a highly resolved phylogenetic network for the Icelandic data, identifying a number of previously unreported mtDNA lineage clusters and providing a detailed depiction of the evolutionary relationships between European mtDNA clusters. Our findings indicate that European populations contain a large number of closely related mitochondrial lineages, many of which have not yet been sampled in the current comparative data set. Consequently, substantial increases in sample sizes that use mtDNA data will be needed to obtain valid estimates of the diverse ancestral mixtures that ultimately gave rise to contemporary populations. PMID:10712214

  1. Next generation sequencing analysis reveals a relationship between rDNA unit diversity and locus number in Nicotiana diploids

    PubMed Central

    2012-01-01

    Background: Tandemly arranged nuclear ribosomal DNA (rDNA), encoding 18S, 5.8S and 26S ribosomal RNA (rRNA), exhibits concerted evolution, a pattern thought to result from the homogenisation of rDNA arrays. However, rDNA homogeneity at the single nucleotide polymorphism (SNP) level has not been detailed in organisms with more than a few hundred copies of the rDNA unit. Here we study rDNA complexity in species with arrays consisting of thousands of units. Methods: We examined homogeneity of genic (18S) and non-coding internally transcribed spacer (ITS1) regions of rDNA using Roche 454 and/or Illumina platforms in four angiosperm species, Nicotiana sylvestris, N. tomentosiformis, N. otophora and N. kawakamii. We compared the data with Southern blot hybridisation revealing the structure of intergenic spacer (IGS) sequences and with the number and distribution of rDNA loci. Results and Conclusions: In all four species the intragenomic homogeneity of the 18S gene was high; a single ribotype makes up over 90% of the genes. However, greater variation was observed in the ITS1 region, particularly in species with two or more rDNA loci, where >55% of rDNA units were a single ribotype, with the second most abundant variant accounting for >18% of units. IGS heterogeneity was high in all species. The increased number of ribotypes in ITS1 compared with 18S sequences may reflect rounds of incomplete homogenisation with strong selection for functional genic regions and relaxed selection on ITS1 variants. The relationship between the number of ITS1 ribotypes and the number of rDNA loci leads us to propose that rDNA evolution and complexity are influenced by locus number and/or amplification of orphaned rDNA units at new chromosomal locations. PMID:23259460

  2. DNA Barcoding Identifies Illegal Parrot Trade.

    PubMed

    Gonçalves, Priscila F M; Oliveira-Marques, Adriana R; Matsumoto, Tania E; Miyaki, Cristina Y

    2015-01-01

    Illegal trade threatens the survival of many wild species, and molecular forensics can shed light on various questions raised during the investigation of cases of illegal trade. Among these questions is the identity of the species involved. Here we report a case of a man who was caught in a Brazilian airport trying to travel with 58 avian eggs. He claimed they were quail eggs, but authorities suspected they were from parrots. The embryos never hatched and it was not possible to identify them based on morphology. As 29% of parrot species are endangered, the identity of the species involved was important to establish a stronger criminal case. Thus, we identified the embryos' species based on the analyses of mitochondrial DNA sequences (cytochrome c oxidase subunit I gene [COI] and 16S ribosomal DNA). Embryonic COI sequences were compared with those deposited in BOLD (The Barcode of Life Data System) while their 16S sequences were compared with GenBank sequences. Clustering analysis based on neighbor-joining was also performed using parrot COI and 16S sequences deposited in BOLD and GenBank. The results, based on both genes, indicated that 57 embryos were parrots (Alipiopsitta xanthops, Ara ararauna, and the [Amazona aestiva/A. ochrocephala] complex), and 1 was an owl. This kind of data can help criminal investigations and the design of species-specific anti-poaching strategies, and demonstrates how DNA sequence analysis in the identification of bird species is a powerful conservation tool. © The American Genetic Association 2015. All rights reserved.

  3. [DNA marker-assisted selection of medicinal plants (Ⅰ) .Breeding research of disease-resistant cultivars of Panax notoginseng].

    PubMed

    Dong, Lin-Lin; Chen, Zhong-Jian; Wang, Yong; Wei, Fu-Gang; Zhang, Lian-Juan; Xu, Jiang; Wei, Guang-Fei; Wang, Rui; Yang, Juan; Liu, Wei-Lin; Li, Xi-Wen; Yu, Yu-Qi; Chen, Shi-Lin

    2017-01-01

    DNA marker-assisted selection of medicinal plants is based on DNA polymorphism: it selects DNA sequences related to phenotypes such as high yield, superior quality and stress resistance by means of molecular hybridization, polymerase chain reaction and high-throughput sequencing, and so assists the breeding of new cultivars. This study bred the first disease-resistant cultivar of notoginseng, "Miaoxiang Kangqi 1", using DNA marker-assisted selection of medicinal plants together with systematic breeding. The disease-resistant cultivar contained 12 specific SNPs based on Restriction-site Associated DNA Sequencing (RAD-Seq) analysis. Among them, one SNP (record_519688) was related to root rot resistance, which indicates that this SNP could serve as a genetic marker of disease-resistant cultivars and assist systematic breeding. Compared with conventionally cultivated cultivars, the incidence of root rot and rust rot in notoginseng seedlings decreased by 83.6% and 71.8%, respectively. The incidence of root rot declined by 43.6% and 62.9% in notoginseng cultivated for 2 and 3 years, respectively, compared with the conventionally cultivated cultivars. Additionally, potential disease-resistant groups were screened based on the related SNP, and this model enlarged the target groups and improved breeding efficiency. DNA marker-assisted selection of medicinal plants accelerates the breeding and promotion of new cultivars and supports the healthy development of the Chinese medicinal materials industry. Copyright© by the Chinese Pharmaceutical Association.

  4. Functional DNA quantification guides accurate next-generation sequencing mutation detection in formalin-fixed, paraffin-embedded tumor biopsies

    PubMed Central

    2013-01-01

    The formalin-fixed, paraffin-embedded (FFPE) biopsy is a challenging sample for molecular assays such as targeted next-generation sequencing (NGS). We compared three methods for FFPE DNA quantification, including a novel PCR assay (‘QFI-PCR’) that measures the absolute copy number of amplifiable DNA, across 165 residual clinical specimens. The results reveal the limitations of commonly used approaches, and demonstrate the value of an integrated workflow using QFI-PCR to improve the accuracy of NGS mutation detection and guide changes in input that can rescue low quality FFPE DNA. These findings address a growing need for improved quality measures in NGS-based patient testing. PMID:24001039

  5. Barcoding of fresh water fishes from Pakistan.

    PubMed

    Karim, Asma; Iqbal, Asad; Akhtar, Rehan; Rizwan, Muhammad; Amar, Ali; Qamar, Usman; Jahan, Shah

    2016-07-01

    DNA barcoding is a taxonomic method that uses short genetic markers in an organism's mitochondrial DNA (mtDNA) to identify particular species. It uses sequence diversity in a 658-base pair fragment near the 5' end of the mitochondrial cytochrome c oxidase subunit 1 (CO1) gene as a tool for species identification. DNA barcoding is a more accurate and reliable method than morphological identification, and it is equally useful in juvenile and adult stages of fishes. The present study was conducted to genetically identify three farmed fish species of Pakistan (Cyprinus carpio, Cirrhinus mrigala, and Ctenopharyngodon idella), all of which belong to the family Cyprinidae. The CO1 gene was amplified, and PCR products were sequenced and analyzed with bioinformatic software. Conspecific, congeneric, and confamilial Kimura 2-parameter (K2P) nucleotide divergence was estimated. From these findings, it was concluded that the CO1 gene sequence may serve as a milestone for the identification of related species at the molecular level.
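
    The K2P divergence mentioned in this record can be illustrated with a short sketch. It applies the standard Kimura 2-parameter formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The function name and the choice to skip gaps and ambiguity codes are assumptions made for illustration; the record does not say which software or settings the authors used.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration, not the authors' pipeline.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with gaps or ambiguity codes are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

    Averaging this distance over pairs within a species, within a genus, and within a family gives the conspecific, congeneric, and confamilial values that barcoding studies typically report.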

  6. EMPOP-quality mtDNA control region sequences from Kashmiri of Azad Jammu & Kashmir, Pakistan.

    PubMed

    Rakha, Allah; Peng, Min-Sheng; Bi, Rui; Song, Jiao-Jiao; Salahudin, Zeenat; Adan, Atif; Israr, Muhammad; Yao, Yong-Gang

    2016-11-01

    Mitochondrial DNA (mtDNA) control region (nucleotide positions 16024-576) sequences were generated by Sanger sequencing for 317 self-identified Kashmiris from all districts of Azad Jammu & Kashmir, Pakistan. The population sample set showed a total of 251 haplotypes, with a relatively high haplotype diversity (0.9977) and a low random match probability (0.54%). The matrilineal lineages belong to three different phylogeographic origins: Western Eurasian (48.9%), South Asian (47.0%) and East Asian (4.1%). The data were compared with previous data from Pakistan and other worldwide populations (Central Asia, Western Asia, and East & Southeast Asia). The dataset is made available through EMPOP under accession number EMP00679 and will serve as an mtDNA reference database in forensic casework in Pakistan. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
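
    The two summary statistics quoted in this record can be computed from a list of haplotype labels as sketched below. The sketch assumes the commonly used definitions: random match probability as the sum of squared haplotype frequencies, and haplotype diversity as n/(n - 1) times one minus that sum. The record does not state which formulas the authors used, and the function name is hypothetical. Under these definitions the reported figures agree to rounding: 317/316 x (1 - 0.0054) is approximately 0.9977.

```python
from collections import Counter


def haplotype_stats(haplotypes):
    """Return (distinct haplotypes, haplotype diversity, random match probability).

    Hypothetical helper; assumes one haplotype label per sampled individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two individuals")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    random_match_probability = sum_sq
    # Unbiased estimator with the n / (n - 1) small-sample correction.
    diversity = n / (n - 1) * (1 - sum_sq)
    return len(counts), diversity, random_match_probability
```

    For example, four individuals carrying haplotypes H1, H1, H2 and H3 give three distinct haplotypes, a random match probability of 0.375, and a diversity of about 0.833.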

  7. Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

    NASA Astrophysics Data System (ADS)

    Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

    2017-07-01

    A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, using a permutation of the four DNA base codes: adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of a sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have now become advanced, high-throughput sequencing technologies. DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing produces ambiguous results, making it difficult to determine whether a given code is A, T, G, or C. To solve this problem, in this study we introduce another representation of DNA codes, namely the Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probabilities that the bases A, T, G, C appear in Q, and PA + PT + PG + PC = 1. Using Quaternion representations, we are able to construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix produces better-quality match and mismatch scores between two DNA base codes. In the implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are DNA sequences of the Streptococcus pneumoniae family obtained from GenBank, while the target DNA sequence was received from our collaborator's database. We found that the Quaternion representations improve the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
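
    A minimal sketch of the idea described in this record follows, written in Python rather than the Octave the authors used. Each base is represented as a probability vector (PA, PT, PG, PC), and the substitution score is derived from the dot product of two vectors. The exact scoring matrix of the paper is not given in the abstract, so the scoring rule here (expected match/mismatch score), the gap penalty, and all names are assumptions made for illustration.

```python
# Probability vectors in the order (PA, PT, PG, PC).
# Assumption: "N" stands for a fully ambiguous call with equal probabilities.
BASE_VECTORS = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "T": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "C": (0.0, 0.0, 0.0, 1.0),
    "N": (0.25, 0.25, 0.25, 0.25),
}


def dot_score(u, v, match=1.0, mismatch=-1.0):
    """Expected substitution score: the dot product is P(both bases agree)."""
    p_same = sum(x * y for x, y in zip(u, v))
    return match * p_same + mismatch * (1.0 - p_same)


def needleman_wunsch_score(seq_a, seq_b, gap=-2.0):
    """Global alignment score with a linear gap penalty (score only)."""
    a = [BASE_VECTORS[s] for s in seq_a.upper()]
    b = [BASE_VECTORS[s] for s in seq_b.upper()]
    # prev holds the dynamic-programming row for the previous prefix of a.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            curr.append(max(
                prev[j - 1] + dot_score(a[i - 1], b[j - 1]),  # align two bases
                prev[j] + gap,                                # gap in seq_b
                curr[j - 1] + gap,                            # gap in seq_a
            ))
        prev = curr
    return prev[-1]
```

    With these settings an ambiguous position contributes a score between a full match and a full mismatch instead of being forced to one or the other, which is the intuition behind using probability vectors for unclear base calls.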

  8. Chromosome evolution in the Thermotogales: large-scale inversions and strain diversification of CRISPR sequences.

    PubMed

    DeBoy, Robert T; Mongodin, Emmanuel F; Emerson, Joanne B; Nelson, Karen E

    2006-04-01

    In the present study, the chromosomes of two members of the Thermotogales were compared. A whole-genome alignment of Thermotoga maritima MSB8 and Thermotoga neapolitana NS-E has revealed numerous large-scale DNA rearrangements, most of which are associated with CRISPR DNA repeats and/or tRNA genes. These DNA rearrangements do not include the putative origin of DNA replication but move within the same replichore, i.e., the same replicating half of the chromosome (delimited by the replication origin and terminus). Based on cumulative GC skew analysis, both the T. maritima and T. neapolitana lineages contain one or two major inverted DNA segments. Also, based on PCR amplification and sequence analysis of the DNA joints that are associated with the major rearrangements, the overall chromosome architecture was found to be conserved at most DNA joints for other strains of T. neapolitana. Taken together, the results from this analysis suggest that the observed chromosomal rearrangements in the Thermotogales likely occurred by successive inversions after their divergence from a common ancestor and before strain diversification. Finally, sequence analysis shows that size polymorphisms in the DNA joints associated with CRISPRs can be explained by expansion and possibly contraction of the DNA repeat and spacer unit, providing a tool for discerning the relatedness of strains from different geographic locations.
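
    The cumulative GC skew analysis mentioned in this record can be sketched as follows. GC skew in a window is (G - C)/(G + C), and the cumulative curve is the running sum over consecutive windows. In many bacterial chromosomes the extremes of this curve tend to lie near the replication origin and terminus, so a large inverted segment can show up as a stretch where the slope runs against its surroundings. The window size and function name are assumptions; the abstract does not give the authors' parameters.

```python
def cumulative_gc_skew(sequence, window=1000):
    """Running sum of (G - C) / (G + C) over non-overlapping windows.

    Hypothetical helper for illustration; returns one value per window.
    """
    seq = sequence.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without any G or C contribute zero skew.
        skew = (g - c) / (g + c) if g + c else 0.0
        total += skew
        curve.append(total)
    return curve
```

    Plotting the returned list against window start position gives the cumulative skew curve for a chromosome.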

  9. Sources of PCR-induced distortions in high-throughput sequencing data sets

    PubMed Central

    Kebschull, Justus M.; Zador, Anthony M.

    2015-01-01

    PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991
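
    The effect of PCR stochasticity on low-copy templates can be illustrated with a simple branching-process simulation. This is a generic textbook-style sketch, not the quantitative model developed in the paper: it assumes every molecule is copied independently in each cycle with a fixed probability (`efficiency`), and all names and parameter values are hypothetical.

```python
import random


def simulate_pcr(initial_copies, cycles, efficiency, rng):
    """Simulate amplification where each molecule duplicates with
    probability `efficiency` in every cycle (assumed model)."""
    copies = initial_copies
    for _ in range(cycles):
        # Count how many of the current molecules were successfully copied.
        duplicated = sum(1 for _ in range(copies) if rng.random() < efficiency)
        copies += duplicated
    return copies


def replicate_yields(replicates, cycles=10, efficiency=0.8, seed=0):
    """Final copy numbers for several single-molecule starting templates."""
    rng = random.Random(seed)
    return [simulate_pcr(1, cycles, efficiency, rng) for _ in range(replicates)]
```

    Calling `replicate_yields(20)` returns twenty final copy numbers for templates that each started from a single molecule. Because a failed duplication in an early cycle is propagated through all later cycles, the yields differ between replicates even though every template obeys the same rule, which is the kind of skew in sequence representation the record attributes to stochasticity.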

  10. Supervised DNA Barcodes species classification: analysis, comparisons and results

    PubMed Central

    2014-01-01

    Background: Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcodes and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcodes: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. Methods: In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with ad-hoc and well-established DNA Barcode classification methods. Results: Software that converts DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human-interpretable classification model. Rule-based methods have slightly inferior classification performance, but deliver the species-specific positions and nucleotide assignments. On synthetic data the supervised machine learning methods obtain superior classification performance with respect to the traditional DNA Barcode classification methods. On empirical data their classification performance is at a comparable level to the other methods. Conclusions: The classification analysis shows that supervised machine learning methods are promising candidates for successfully handling the DNA Barcoding species classification problem, obtaining excellent performance. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community. PMID:24721333
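
    The FASTA-to-Weka conversion step mentioned in this record can be sketched as below. This is not the released converter: it is a minimal illustration that assumes aligned sequences of equal length, a hypothetical header layout of `>specimen_id|Species name`, one nominal attribute per alignment column, and gaps or ambiguity codes written as `?` (the ARFF missing-value marker).

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fasta_to_arff(text, relation="barcodes"):
    """Convert aligned FASTA barcodes to ARFF text (illustrative sketch)."""
    records = parse_fasta(text)
    if not records:
        raise ValueError("no sequences found")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences must be aligned to equal length")
    # Assumed header layout: the species label follows the last "|".
    labels = [h.split("|")[-1].strip().replace(" ", "_") for h, _ in records]
    lines = [f"@relation {relation}"]
    lines += [f"@attribute pos{i + 1} {{A,C,G,T}}" for i in range(length)]
    lines.append("@attribute species {" + ",".join(sorted(set(labels))) + "}")
    lines.append("@data")
    for label, (_, seq) in zip(labels, records):
        # Anything other than A, C, G, T becomes a missing value.
        cells = [ch if ch in "ACGT" else "?" for ch in seq]
        lines.append(",".join(cells + [label]))
    return "\n".join(lines) + "\n"
```

    The resulting text can be saved with an `.arff` extension and loaded into Weka, where the last attribute (`species`) serves as the class to predict.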

  11. Genetic alterations of hepatocellular carcinoma by random amplified polymorphic DNA analysis and cloning sequencing of tumor differential DNA fragment

    PubMed Central

    Xian, Zhi-Hong; Cong, Wen-Ming; Zhang, Shu-Hui; Wu, Meng-Chao

    2005-01-01

    AIM: To study the genetic alterations and their association with clinicopathological characteristics of hepatocellular carcinoma (HCC), and to find the tumor related DNA fragments. METHODS: DNA isolated from tumors and corresponding noncancerous liver tissues of 56 HCC patients was amplified by random amplified polymorphic DNA (RAPD) with 10 random 10-mer arbitrary primers. The RAPD bands showing obvious differences in tumor tissue DNA relative to that of normal tissue were separated, purified, cloned and sequenced. DNA sequences were analyzed and compared with GenBank data. RESULTS: A total of 56 cases of HCC were demonstrated to have genetic alterations, which were detected by at least one primer. The detectability of genetic alterations ranged from 20% to 70% in each case, and 17.9% to 50% in each primer. Serum HBV infection, tumor size, histological grade, tumor capsule, as well as tumor intrahepatic metastasis, might be correlated with genetic alterations on certain primers. A band with a higher intensity of 480 bp or so amplified fragments in tumor DNA relative to normal DNA could be seen in 27 of 56 tumor samples using primer 4. Sequence analysis of these fragments showed 91% homology with Homo sapiens double homeobox protein DUX10 gene. CONCLUSION: Genetic alterations are a frequent event in HCC, and tumor related DNA fragments have been found in this study, which may be associated with hepatocarcinogenesis. RAPD is an effective method for the identification and analysis of genetic alterations in HCC, and may provide new information for further evaluating the molecular mechanism of hepatocarcinogenesis. PMID:15996039

  12. Caught in the act: the lifetime of synaptic intermediates during the search for homology on DNA

    PubMed Central

    Mani, Adam; Braslavsky, Ido; Arbel-Goren, Rinat; Stavans, Joel

    2010-01-01

    Homologous recombination plays pivotal roles in DNA repair and in the generation of genetic diversity. To locate homologous target sequences at which strand exchange can occur within a timescale that a cell’s biology demands, a single-stranded DNA-recombinase complex must search among a large number of sequences on a genome by forming synapses with chromosomal segments of DNA. A key element in the search is the time it takes for the two sequences of DNA to be compared, i.e. the synapse lifetime. Here, we visualize for the first time fluorescently tagged individual synapses formed by RecA, a prokaryotic recombinase, and measure their lifetime as a function of synapse length and differences in sequence between the participating DNAs. Surprisingly, lifetimes can be ∼10 s long when the DNAs are fully heterologous, and much longer for partial homology, consistent with ensemble FRET measurements. Synapse lifetime increases rapidly as the length of a region of full homology at either the 3′- or 5′-ends of the invading single-stranded DNA increases above 30 bases. A few mismatches can dramatically reduce the lifetime of synapses formed with nearly homologous DNAs. These results suggest the need for facilitated homology search mechanisms to locate homology successfully within the timescales observed in vivo. PMID:20044347

  13. Exploring the roles of DNA methylation in the metal-reducing bacterium Shewanella oneidensis MR-1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bendall, Matthew L.; Luong, Khai; Wetmore, Kelly M.

    2013-08-30

    We performed whole genome analyses of DNA methylation in Shewanella oneidensis MR-1 to examine its possible role in regulating gene expression and other cellular processes. Single-Molecule Real Time (SMRT) sequencing revealed extensive methylation of adenine (N6mA) throughout the genome. These methylated bases were located in five sequence motifs, including three novel targets for Type I restriction/modification enzymes. The sequence motifs targeted by putative methyltransferases were determined via SMRT sequencing of gene knockout mutants. In addition, we found S. oneidensis MR-1 cultures grown under various culture conditions displayed different DNA methylation patterns. However, the small number of differentially methylated sites could not be directly linked to the much larger number of differentially expressed genes in these conditions, suggesting DNA methylation is not a major regulator of gene expression in S. oneidensis MR-1. The enrichment of methylated GATC motifs in the origin of replication indicates DNA methylation may regulate genome replication in a manner similar to that seen in Escherichia coli. Furthermore, comparative analyses suggest that many Gammaproteobacteria, including all members of the Shewanellaceae family, may also utilize DNA methylation to regulate genome replication.

  14. Molecular Cloning and Characterization of cDNA Encoding a Putative Stress-Induced Heat-Shock Protein from Camelus dromedarius

    PubMed Central

    Elrobh, Mohamed S.; Alanazi, Mohammad S.; Khan, Wajahatullah; Abduljaleel, Zainularifeen; Al-Amri, Abdullah; Bazzi, Mohammad D.

    2011-01-01

    Heat shock proteins are ubiquitous, induced under a number of environmental and metabolic stresses, with highly conserved DNA sequences among mammalian species. Camelus dromedarius (the Arabian camel), domesticated under semi-desert environments, is well adapted to tolerate and survive severe drought and high temperatures for extended periods. This is the first report of molecular cloning and characterization of full-length cDNA encoding a putative stress-induced heat shock HSPA6 protein (also called HSP70B′) from the Arabian camel. A full-length cDNA (2417 bp) was obtained by rapid amplification of cDNA ends (RACE) and cloned in pET-b expression vector. Sequence analysis of the HSPA6 gene showed a 1932 bp-long open reading frame encoding 643 amino acids. The complete cDNA sequence of the Arabian camel HSPA6 gene was submitted to NCBI GenBank (accession number HQ214118.1). BLAST analysis indicated that the C. dromedarius HSPA6 gene nucleotide sequence shared high similarity (77–91%) with heat shock gene nucleotide sequences of other mammals. The deduced 643 amino acid sequence (accession number ADO12067.1) showed that the predicted protein has an estimated molecular weight of 70.5 kDa with a predicted isoelectric point (pI) of 6.0. Comparative analyses of camel HSPA6 protein sequences with other mammalian heat shock proteins (HSPs) showed high identity (80–94%). The camel HSPA6 protein structure predicted by protein 3D structural analysis showed high similarities with human and mouse HSPs. Taken together, this study indicates that the cDNA sequence of the HSPA6 gene and its amino acid sequence and protein structure from the Arabian camel are highly conserved and have similarities with other mammalian species. PMID:21845074

  15. Accurate Prediction of Inducible Transcription Factor Binding Intensities In Vivo

    PubMed Central

    Siepel, Adam; Lis, John T.

    2012-01-01

    DNA sequence and local chromatin landscape act jointly to determine transcription factor (TF) binding intensity profiles. To disentangle these influences, we developed an experimental approach, called protein/DNA binding followed by high-throughput sequencing (PB–seq), that allows the binding energy landscape to be characterized genome-wide in the absence of chromatin. We applied our methods to the Drosophila Heat Shock Factor (HSF), which inducibly binds a target DNA sequence element (HSE) following heat shock stress. PB–seq involves incubating sheared naked genomic DNA with recombinant HSF, partitioning the HSF–bound and HSF–free DNA, and then detecting HSF–bound DNA by high-throughput sequencing. We compared PB–seq binding profiles with ones observed in vivo by ChIP–seq and developed statistical models to predict the observed departures from idealized binding patterns based on covariates describing the local chromatin environment. We found that DNase I hypersensitivity and tetra-acetylation of H4 were the most influential covariates in predicting changes in HSF binding affinity. We also investigated the extent to which DNA accessibility, as measured by digital DNase I footprinting data, could be predicted from MNase–seq data and the ChIP–chip profiles for many histone modifications and TFs, and found GAGA element associated factor (GAF), tetra-acetylation of H4, and H4K16 acetylation to be the most predictive covariates. Lastly, we generated an unbiased model of HSF binding sequences, which revealed distinct biophysical properties of the HSF/HSE interaction and a previously unrecognized substructure within the HSE. These findings provide new insights into the interplay between the genomic sequence and the chromatin landscape in determining transcription factor binding intensity. PMID:22479205

  16. Phylogenetic position of parabasalid symbionts from the termite Calotermes flavicollis based on small subunit rRNA sequences.

    PubMed

    Gerbod, D; Edgcomb, V P; Noël, C; Delgado-Viscogliosi, P; Viscogliosi, E

    2000-09-01

    Small subunit rDNA genes were amplified by polymerase chain reaction using specific primers from mixed-population DNA obtained from the whole hindgut of the termite Calotermes flavicollis. Comparative sequence analysis of the clones revealed two kinds of sequences that were both from parabasalid symbionts. In a molecular tree inferred by distance, parsimony and likelihood methods, and including 27 parabasalid sequences retrieved from the databases, the group II sequences (clones Cf5 and Cf6) were closely related to the Devescovinidae/Calonymphidae species and thus were assigned to the Devescovinidae Foaina. The group I sequence (clone Cf1) emerged within the Trichomonadinae and strongly clustered with Tetratrichomonas gallinarum. On the basis of morphological data, the Monocercomonadidae Hexamastix termitis might be the most likely origin of this sequence.

  17. snpAD: An ancient DNA genotype caller.

    PubMed

    Prüfer, Kay

    2018-06-21

    The study of ancient genomes can elucidate the evolutionary past. However, analyses are complicated by base-modifications in ancient DNA molecules that result in errors in DNA sequences. These errors are particularly common near the ends of sequences and pose a challenge for genotype calling. I describe an iterative method that estimates genotype frequencies and errors along sequences to allow for accurate genotype calling from ancient sequences. The implementation of this method, called snpAD, performs well on high-coverage ancient data, as shown by simulations and by subsampling the data of a high-coverage Neandertal genome. Although estimates for low-coverage genomes are less accurate, I am able to derive approximate estimates of heterozygosity from several low-coverage Neandertals. These estimates show that low heterozygosity, compared to modern humans, was common among Neandertals. The C++ code of snpAD is freely available at http://bioinf.eva.mpg.de/snpAD/. Supplementary data are available at Bioinformatics online.

  18. Molecular detection of fungal pathogens in clinical specimens by 18S rDNA high-throughput screening in comparison to ITS PCR and culture.

    PubMed

    Wagner, K; Springer, B; Pires, V P; Keller, P M

    2018-05-03

    The rising incidence of invasive fungal infections and the expanding spectrum of fungal pathogens make early and accurate identification of the causative pathogen a daunting task. Diagnostics using molecular markers enable rapid identification of fungi, offer new insights into infectious disease dynamics, and open new possibilities for infectious disease control and prevention. We performed a retrospective study using clinical specimens (N = 233) from patients with suspected fungal infection previously subjected to culture and/or internal transcribed spacer (ITS) PCR. We used these specimens to evaluate a high-throughput screening method for fungal detection using automated DNA extraction (QIASymphony), fungal ribosomal small subunit (18S) rDNA RT-PCR and amplicon sequencing. Fungal sequences were compared with sequences from the curated, commercially available SmartGene IDNS database for pathogen identification. Concordance between 18S rDNA RT-PCR and culture results was 91%, and congruence between 18S rDNA RT-PCR and ITS PCR results was 94%. In addition, 18S rDNA RT-PCR and Sanger sequencing detected fungal pathogens in culture negative (N = 13) and ITS PCR negative specimens (N = 12) from patients with a clinically confirmed fungal infection. Our results support the use of the 18S rDNA RT-PCR diagnostic workflow for rapid and accurate identification of fungal pathogens in clinical specimens.

  19. Methylobacterium phyllosphaerae sp. nov., a pink-pigmented, facultative methylotroph from the phyllosphere of rice.

    PubMed

    Madhaiyan, Munusamy; Poonguzhali, Selvaraj; Kwon, Soon-Wo; Sa, Tong-Min

    2009-01-01

    A pink-pigmented, aerobic, facultatively methylotrophic bacterial strain, CBMB27T, isolated from leaf tissues of rice (Oryza sativa L. 'Dong-Jin'), was analysed using a polyphasic taxonomic approach. Comparative 16S rRNA gene sequence-based phylogenetic analysis placed the strain in a clade with the species Methylobacterium oryzae, Methylobacterium fujisawaense and Methylobacterium mesophilicum; strain CBMB27T showed sequence similarities of 98.3, 98.5 and 97.3 %, respectively, to the type strains of these three species. DNA-DNA hybridization experiments revealed low levels (<38 %) of DNA-DNA relatedness between strain CBMB27T and its closest relatives. The sequence of the 1-aminocyclopropane-1-carboxylate deaminase gene (acdS) in strain CBMB27T differed from those of close relatives. The major fatty acid of the isolate was C18:1ω7c and the G+C content of the genomic DNA was 66.8 mol%. Based on the results of 16S rRNA gene sequence analysis, DNA-DNA hybridization, and physiological and biochemical characterization, which enabled the isolate to be differentiated from all recognized species of the genus Methylobacterium, it was concluded that strain CBMB27T represents a novel species in the genus Methylobacterium for which the name Methylobacterium phyllosphaerae sp. nov. is proposed (type strain CBMB27T = LMG 24361T = KACC 11716T = DSM 19779T).

  20. Evidence for recombination of mitochondrial DNA in triploid crucian carp.

    PubMed

    Guo, Xinhong; Liu, Shaojun; Liu, Yun

    2006-03-01

    In this study, we report the complete mitochondrial DNA (mtDNA) sequences of the allotetraploid and triploid crucian carp and compare the complete mtDNA sequences between the triploid crucian carp and its female parent Japanese crucian carp and between the triploid crucian carp and its male parent allotetraploid. Our results indicate that the complete mtDNA nucleotide identity (98%) between the triploid crucian carp and its male parent allotetraploid was higher than that (93%) between the triploid crucian carp and its female parent Japanese crucian carp. Moreover, the presence of a pattern of identity and difference at synonymous sites of mitochondrial genomes between the triploid crucian carp and its parents provides direct evidence that triploid crucian carp possessed the recombinant mtDNA fragment (12,759 bp) derived from the paternal fish. These results suggest that mtDNA recombination was derived from the fusion of the maternal and paternal mtDNAs. Compared with the haploid egg with one set of genome from the Japanese crucian carp, the diploid sperm with two sets of genomes from the allotetraploid could more easily make its mtDNA fuse with the mtDNA of the haploid egg. In addition, the triple hybrid nature of the triploid crucian carp probably allowed its better mtDNA recombination. In summary, our results provide the first evidence of mtDNA recombination in polyploid fish.
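
    The identity figures quoted above (98% and 93%) are percentages of matching positions between aligned sequences. A minimal sketch of such a calculation follows; the function name and the choice to skip gapped columns are assumptions made for illustration, since the abstract does not state how identity was computed.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical positions between two pre-aligned sequences.

    Assumption: columns where either sequence has a gap are skipped;
    other conventions (e.g. counting gaps as mismatches) also exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, not real carp mtDNA
identity = percent_identity("ACGTACGT", "ACGTACGA")
```

    Whole mitochondrial genomes would first need to be aligned with a dedicated alignment tool; the sketch covers only the counting step.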

  1. Programmable DNA-binding proteins from Burkholderia provide a fresh perspective on the TALE-like repeat domain

    PubMed Central

    de Lange, Orlando; Wolf, Christina; Dietze, Jörn; Elsaesser, Janett; Morbitzer, Robert; Lahaye, Thomas

    2014-01-01

    The tandem repeats of transcription activator like effectors (TALEs) mediate sequence-specific DNA binding using a simple code. Naturally, TALEs are injected by Xanthomonas bacteria into plant cells to manipulate the host transcriptome. In the laboratory TALE DNA binding domains are reprogrammed and used to target a fused functional domain to a genomic locus of choice. Research into the natural diversity of TALE-like proteins may provide resources for the further improvement of current TALE technology. Here we describe TALE-like proteins from the endosymbiotic bacterium Burkholderia rhizoxinica, termed Bat proteins. Bat repeat domains mediate sequence-specific DNA binding with the same code as TALEs, despite less than 40% sequence identity. We show that Bat proteins can be adapted for use as transcription factors and nucleases and that sequence preferences can be reprogrammed. Unlike TALEs, the core repeats of each Bat protein are highly polymorphic. This feature allowed us to explore alternative strategies for the design of custom Bat repeat arrays, providing novel insights into the functional relevance of non-RVD residues. The Bat proteins offer fertile grounds for research into the creation of improved programmable DNA-binding proteins and comparative insights into TALE-like evolution. PMID:24792163

  2. Synthesis of DNA

    DOEpatents

    Mariella, Jr., Raymond P.

    2008-11-18

    A method of synthesizing a desired double-stranded DNA of a predetermined length and of a predetermined sequence. Preselected sequence segments that will complete the desired double-stranded DNA are determined. Preselected segment sequences of DNA that will be used to complete the desired double-stranded DNA are provided. The preselected segment sequences of DNA are assembled to produce the desired double-stranded DNA.

  3. Validation of Pooled Whole-Genome Re-Sequencing in Arabidopsis lyrata.

    PubMed

    Fracassetti, Marco; Griffin, Philippa C; Willi, Yvonne

    2015-01-01

    Sequencing pooled DNA of multiple individuals from a population instead of sequencing individuals separately has become popular due to its cost-effectiveness and simple wet-lab protocol, although some criticism of this approach remains. Here we validated a protocol for pooled whole-genome re-sequencing (Pool-seq) of Arabidopsis lyrata libraries prepared with low amounts of DNA (1.6 ng per individual). The validation was based on comparing single nucleotide polymorphism (SNP) frequencies obtained by pooling with those obtained by individual-based Genotyping By Sequencing (GBS). Furthermore, we investigated the effect of sample number, sequencing depth per individual and variant caller on population SNP frequency estimates. For Pool-seq data, we compared frequency estimates from two SNP callers, VarScan and Snape; the former employs a frequentist SNP calling approach while the latter uses a Bayesian approach. Results revealed concordance correlation coefficients well above 0.8, confirming that Pool-seq is a valid method for acquiring population-level SNP frequency data. Higher accuracy was achieved by pooling more samples (25 compared to 14) and working with higher sequencing depth (4.1× per individual compared to 1.4× per individual), which increased the concordance correlation coefficient to 0.955. The Bayesian-based SNP caller produced somewhat higher concordance correlation coefficients, particularly at low sequencing depth. We recommend pooling at least 25 individuals combined with sequencing at a depth of 100× to produce satisfactory frequency estimates for common SNPs (minor allele frequency above 0.05).
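
    The agreement measure used above, the concordance correlation coefficient, can be sketched as follows. This assumes Lin's standard definition with population (divide-by-n) variances; the frequency values are invented for illustration and are not data from the study.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (biased) variances and covariance
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("coefficient is undefined for constant, equal inputs")
    return 2 * cov_xy / denom


# Hypothetical SNP allele frequencies from pooled and individual sequencing
pool_freqs = [0.10, 0.25, 0.40, 0.55, 0.80]
gbs_freqs = [0.12, 0.22, 0.43, 0.50, 0.78]
ccc = concordance_correlation(pool_freqs, gbs_freqs)
```

    Unlike the Pearson correlation, this coefficient is reduced by any systematic shift between the two sets of estimates, which makes it a natural choice when the question is whether two methods return the same frequencies.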

  4. Museum genomics: low-cost and high-accuracy genetic data from historical specimens.

    PubMed

    Rowe, Kevin C; Singhal, Sonal; Macmanes, Matthew D; Ayroles, Julien F; Morelli, Toni Lyn; Rubidge, Emily M; Bi, Ke; Moritz, Craig C

    2011-11-01

    Natural history collections are unparalleled repositories of geographical and temporal variation in faunal conditions. Molecular studies offer an opportunity to uncover much of this variation; however, genetic studies of historical museum specimens typically rely on extracting highly degraded and chemically modified DNA samples from skins, skulls or other dried samples. Despite this limitation, obtaining short fragments of DNA sequences using traditional PCR amplification of DNA has been the primary method for genetic study of historical specimens. Few laboratories have succeeded in obtaining genome-scale sequences from historical specimens and then only with considerable effort and cost. Here, we describe a low-cost approach using high-throughput next-generation sequencing to obtain reliable genome-scale sequence data from a traditionally preserved mammal skin and skull using a simple extraction protocol. We show that single-nucleotide polymorphisms (SNPs) from the genome sequences obtained independently from the skin and from the skull are highly repeatable compared to a reference genome. © 2011 Blackwell Publishing Ltd.

  5. Nanoliter reactors improve multiple displacement amplification of genomes from single cells.

    PubMed

    Marcy, Yann; Ishoey, Thomas; Lasken, Roger S; Stockwell, Timothy B; Walenz, Brian P; Halpern, Aaron L; Beeson, Karen Y; Goldberg, Susanne M D; Quake, Stephen R

    2007-09-01

    Since only a small fraction of environmental bacteria are amenable to laboratory culture, there is great interest in genomic sequencing directly from single cells. Sufficient DNA for sequencing can be obtained from one cell by the Multiple Displacement Amplification (MDA) method, thereby eliminating the need to develop culture methods. Here we used a microfluidic device to isolate individual Escherichia coli and amplify genomic DNA by MDA in 60-nl reactions. Our results confirm a report that reduced MDA reaction volume lowers nonspecific synthesis that can result from contaminant DNA templates and unfavourable interaction between primers. The quality of the genome amplification was assessed by qPCR and compared favourably to single-cell amplifications performed in standard 50-µl volumes. Amplification bias was greatly reduced in nanoliter volumes, thereby providing a more even representation of all sequences. Single-cell amplicons from both microliter and nanoliter volumes provided high-quality sequence data by high-throughput pyrosequencing, thereby demonstrating a straightforward route to sequencing genomes from single cells.

  6. CDSbank: taxonomy-aware extraction, selection, renaming and formatting of protein-coding DNA or amino acid sequences.

    PubMed

    Hazes, Bart

    2014-02-28

    Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.

  7. Purification of nanogram-range immunoprecipitated DNA in ChIP-seq application.

    PubMed

    Zhong, Jian; Ye, Zhenqing; Lenz, Samuel W; Clark, Chad R; Bharucha, Adil; Farrugia, Gianrico; Robertson, Keith D; Zhang, Zhiguo; Ordog, Tamas; Lee, Jeong-Heon

    2017-12-21

    Chromatin immunoprecipitation-sequencing (ChIP-seq) is a widely used epigenetic approach for investigating genome-wide protein-DNA interactions in cells and tissues. The approach has been relatively well established but several key steps still require further improvement. As a part of the procedure, immunoprecipitated DNA must undergo purification and library preparation for subsequent high-throughput sequencing. Current ChIP protocols typically yield nanogram quantities of immunoprecipitated DNA mainly depending on the target of interest and starting chromatin input amount. However, little information exists on the performance of reagents used for the purification of such minute amounts of immunoprecipitated DNA in ChIP elution buffer and their effects on ChIP-seq data. Here, we compared DNA recovery, library preparation efficiency, and ChIP-seq results obtained with several commercial DNA purification reagents applied to 1 ng ChIP DNA and also investigated the impact of conditions under which ChIP DNA is stored. We compared DNA recovery of ten commercial DNA purification reagents and phenol/chloroform extraction from 1 to 50 ng of immunoprecipitated DNA in ChIP elution buffer. The recovery yield was significantly different with 1 ng of DNA while similar in higher DNA amounts. We also observed that the low nanogram range of purified DNA is prone to loss during storage depending on the type of polypropylene tube used. The immunoprecipitated DNA equivalent to 1 ng of purified DNA was subject to DNA purification and library preparation to evaluate the performance of four better performing purification reagents in ChIP-seq applications. Quantification of library DNAs indicated the selected purification kits have a negligible impact on the efficiency of library preparation. The resulting ChIP-seq data were comparable with the dataset generated by ENCODE consortium and were highly correlated between the data from different purification reagents. This study provides comparative data on commercial DNA purification reagents applied to nanogram-range immunoprecipitated ChIP DNA and evidence for the importance of storage conditions of low nanogram-range purified DNA. We verified consistent high performance of a subset of the tested reagents. These results will facilitate the improvement of ChIP-seq methodology for low-input applications.

  8. Kaposi's sarcoma-associated herpesvirus-like DNA sequences in AIDS-related body-cavity-based lymphomas.

    PubMed

    Cesarman, E; Chang, Y; Moore, P S; Said, J W; Knowles, D M

    1995-05-04

    DNA fragments that appeared to belong to an unidentified human herpesvirus were recently found in more than 90 percent of Kaposi's sarcoma lesions associated with the acquired immunodeficiency syndrome (AIDS). These fragments were also found in 6 of 39 tissue samples without Kaposi's sarcoma, including 3 malignant lymphomas, from patients with AIDS, but not in samples from patients without AIDS. We examined the DNA of 193 lymphomas from 42 patients with AIDS and 151 patients who did not have AIDS. We searched the DNA for sequences of Kaposi's sarcoma-associated herpesvirus (KSHV) by Southern blot hybridization, the polymerase chain reaction (PCR), or both. The PCR products in the positive samples were sequenced and compared with the KSHV sequences in Kaposi's sarcoma tissues from patients with AIDS. KSHV sequences were identified in eight lymphomas in patients infected with the human immunodeficiency virus. All eight, and only these eight, were body-cavity-based lymphomas--that is, they were characterized by pleural, pericardial, or peritoneal lymphomatous effusions. All eight lymphomas also contained the Epstein-Barr viral genome. KSHV sequences were not found in the other 185 lymphomas. KSHV sequences were 40 to 80 times more abundant in the body-cavity-based lymphomas than in the Kaposi's sarcoma lesions. A high degree of conservation of KSHV sequences in Kaposi's sarcoma and in the eight lymphomas suggests the presence of the same agent in both lesions. The recently discovered KSHV DNA sequences occur in an unusual subgroup of AIDS-related B-cell lymphomas, but not in any other lymphoid neoplasm studied thus far. Our finding strongly suggests that a novel herpesvirus has a pathogenic role in AIDS-related body-cavity-based lymphomas.

  9. Nanopore Technology: A Simple, Inexpensive, Futuristic Technology for DNA Sequencing.

    PubMed

    Gupta, P D

    2016-10-01

    The importance of DNA sequencing in health care is fully established. Sanger capillary electrophoresis DNA sequencing is time-consuming and cumbersome, and hence more expensive. Because of its versatility, DNA sequencing has lately become a household name, and there is therefore an urgent need for a simple, fast, inexpensive DNA sequencing technology. Efforts were made at the beginning of this century, and nanopore DNA sequencing technology was developed; it is still in its infancy, but it is nevertheless the technology of the future.

  10. The genome-wide DNA sequence specificity of the anti-tumour drug bleomycin in human cells.

    PubMed

    Murray, Vincent; Chen, Jon K; Tanaka, Mark M

    2016-07-01

    The cancer chemotherapeutic agent, bleomycin, cleaves DNA at specific sites. For the first time, the genome-wide DNA sequence specificity of bleomycin breakage was determined in human cells. Utilising Illumina next-generation DNA sequencing techniques, over 200 million bleomycin cleavage sites were examined to elucidate the bleomycin genome-wide DNA selectivity. The genome-wide bleomycin cleavage data were analysed by four different methods to determine the cellular DNA sequence specificity of bleomycin strand breakage. For the most highly cleaved DNA sequences, the preferred site of bleomycin breakage was at 5'-GT* dinucleotide sequences (where the asterisk indicates the bleomycin cleavage site), with lesser cleavage at 5'-GC* dinucleotides. This investigation also determined longer bleomycin cleavage sequences, with preferred cleavage at 5'-GT*A and 5'-TGT* trinucleotide sequences, and 5'-TGT*A tetranucleotides. For cellular DNA, the hexanucleotide DNA sequence 5'-RTGT*AY (where R is a purine and Y is a pyrimidine) was the most highly cleaved DNA sequence. It was striking that alternating purine-pyrimidine sequences were highly cleaved by bleomycin. The highest intensity cleavage sites in cellular and purified DNA were very similar although there were some minor differences. Statistical nucleotide frequency analysis indicated a G nucleotide was present at the -3 position (relative to the cleavage site) in cellular DNA but was absent in purified DNA.
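
    As an illustration of the degenerate hexanucleotide 5'-RTGT*AY named above, the following sketch locates every occurrence of the pattern on one strand of a sequence. The function name and the example sequence are hypothetical; this shows only how the R/Y ambiguity codes translate into a search pattern and is not the analysis pipeline used in the study.

```python
import re

# R = purine (A or G), Y = pyrimidine (C or T).
# The lookahead lets overlapping occurrences be reported,
# e.g. ATGTAT and ATGTAC sharing the bases "AT".
RTGTAY = re.compile(r"(?=([AG]TGTA[CT]))")


def find_rtgtay_sites(seq):
    """Return (0-based start, matched hexanucleotide) for each 5'-RTGTAY site.

    Only the strand given is scanned; the reverse complement would need
    a second pass.
    """
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in RTGTAY.finditer(seq)]


# Invented example sequence containing two overlapping sites
sites = find_rtgtay_sites("CCATGTATGTACGG")
```

    In the abstract's notation the asterisk follows the fourth base of the motif, so the marked position lies at start + 3 in the 0-based coordinates returned here.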

  11. DEPPDB - DNA electrostatic potential properties database. Electrostatic properties of genome DNA elements.

    PubMed

    Osypov, Alexander A; Krutinin, Gleb G; Krutinina, Eugenia A; Kamzolova, Svetlana G

    2012-04-01

    Electrostatic properties of genome DNA are important to its interactions with different proteins, in particular those related to transcription. DEPPDB - DNA Electrostatic Potential (and other Physical) Properties Database - provides information on the electrostatic and other physical properties of genome DNA combined with its sequence and annotation of biological and structural properties of genomes and their elements. Genomes are organized on a taxonomical basis, supporting comparative and evolutionary studies. Currently, DEPPDB contains all completely sequenced bacterial, viral, mitochondrial, and plastid genomes according to the NCBI RefSeq, and some model eukaryotic genomes. Data for promoters, regulation sites, binding proteins, etc., are incorporated from established DBs and literature. The database is complemented by analytical tools. Calculations on user-supplied sequences are available. Case studies found that electrostatics complement DNA bending in the functioning of the E. coli plasmid BNT2 promoter, possibly affecting the host-environment metabolic switch. Transcription factor binding sites gravitate to high potential regions, confirming the universal importance of electrostatics in protein-DNA interactions beyond the classical promoter-RNA polymerase recognition and regulation. Other genome elements, such as terminators, also show electrostatic peculiarities. Most intriguing are gene starts, exhibiting taxonomic correlations. The need for studies of genome electrostatic properties is discussed.

  12. A sequence-dependent rigid-base model of DNA

    NASA Astrophysics Data System (ADS)

    Gonzalez, O.; Petkevičiutė, D.; Maddocks, J. H.

    2013-02-01

    A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. For example, due to the presence of frustration, the model can successfully predict the nonlocal changes in the minimum energy configuration of an oligomer that are consequent upon a local change of sequence at the level of a single point mutation.

  13. A sequence-dependent rigid-base model of DNA.

    PubMed

    Gonzalez, O; Petkevičiūtė, D; Maddocks, J H

    2013-02-07

    A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. For example, due to the presence of frustration, the model can successfully predict the nonlocal changes in the minimum energy configuration of an oligomer that are consequent upon a local change of sequence at the level of a single point mutation.

  14. Bisulfite Conversion of DNA: Performance Comparison of Different Kits and Methylation Quantitation of Epigenetic Biomarkers that Have the Potential to Be Used in Non-Invasive Prenatal Testing

    PubMed Central

    Leontiou, Chrysanthia A.; Hadjidaniel, Michael D.; Mina, Petros; Antoniou, Pavlos; Ioannides, Marios; Patsalis, Philippos C.

    2015-01-01

    Introduction: Epigenetic alterations, including DNA methylation, play an important role in the regulation of gene expression. Several methods exist for evaluating DNA methylation, but bisulfite sequencing remains the gold standard by which base-pair resolution of CpG methylation is achieved. The challenge of the method is that the desired outcome (conversion of unmethylated cytosines) positively correlates with the undesired side effects (DNA degradation and inappropriate conversion), thus several commercial kits try to adjust a balance between the two. The aim of this study was to compare the performance of four bisulfite conversion kits [Premium Bisulfite kit (Diagenode), EpiTect Bisulfite kit (Qiagen), MethylEdge Bisulfite Conversion System (Promega) and BisulFlash DNA Modification kit (Epigentek)] regarding conversion efficiency, DNA degradation and conversion specificity. Methods: Performance was tested by combining fully methylated and fully unmethylated λ-DNA controls in a series of spikes by means of Sanger sequencing (0%, 25%, 50% and 100% methylated spikes) and Next-Generation Sequencing (0%, 3%, 5%, 7%, 10%, 25%, 50% and 100% methylated spikes). We also studied the methylation status of two of our previously published differentially methylated regions (DMRs) at base resolution by using spikes of chorionic villus sample in whole blood. Results: The kits studied showed different but comparable results regarding DNA degradation, conversion efficiency and conversion specificity. However, the best performance was observed with the MethylEdge Bisulfite Conversion System (Promega) followed by the Premium Bisulfite kit (Diagenode). The DMRs, EP6 and EP10, were confirmed to be hypermethylated in the CVS and hypomethylated in whole blood. Conclusion: Our findings indicate that the MethylEdge Bisulfite Conversion System (Promega) was shown to have the best performance among the kits. In addition, the methylation level of two of our DMRs, EP6 and EP10, was confirmed. Finally, we showed that bisulfite amplicon sequencing is a suitable approach for methylation analysis of targeted regions. PMID:26247357

  15. Bisulfite Conversion of DNA: Performance Comparison of Different Kits and Methylation Quantitation of Epigenetic Biomarkers that Have the Potential to Be Used in Non-Invasive Prenatal Testing.

    PubMed

    Leontiou, Chrysanthia A; Hadjidaniel, Michael D; Mina, Petros; Antoniou, Pavlos; Ioannides, Marios; Patsalis, Philippos C

    2015-01-01

    Epigenetic alterations, including DNA methylation, play an important role in the regulation of gene expression. Several methods exist for evaluating DNA methylation, but bisulfite sequencing remains the gold standard by which base-pair resolution of CpG methylation is achieved. The challenge of the method is that the desired outcome (conversion of unmethylated cytosines) positively correlates with the undesired side effects (DNA degradation and inappropriate conversion), thus several commercial kits try to adjust a balance between the two. The aim of this study was to compare the performance of four bisulfite conversion kits [Premium Bisulfite kit (Diagenode), EpiTect Bisulfite kit (Qiagen), MethylEdge Bisulfite Conversion System (Promega) and BisulFlash DNA Modification kit (Epigentek)] regarding conversion efficiency, DNA degradation and conversion specificity. Performance was tested by combining fully methylated and fully unmethylated λ-DNA controls in a series of spikes by means of Sanger sequencing (0%, 25%, 50% and 100% methylated spikes) and Next-Generation Sequencing (0%, 3%, 5%, 7%, 10%, 25%, 50% and 100% methylated spikes). We also studied the methylation status of two of our previously published differentially methylated regions (DMRs) at base resolution by using spikes of chorionic villus sample in whole blood. The kits studied showed different but comparable results regarding DNA degradation, conversion efficiency and conversion specificity. However, the best performance was observed with the MethylEdge Bisulfite Conversion System (Promega) followed by the Premium Bisulfite kit (Diagenode). The DMRs, EP6 and EP10, were confirmed to be hypermethylated in the CVS and hypomethylated in whole blood. Our findings indicate that the MethylEdge Bisulfite Conversion System (Promega) was shown to have the best performance among the kits. In addition, the methylation level of two of our DMRs, EP6 and EP10, was confirmed. Finally, we showed that bisulfite amplicon sequencing is a suitable approach for methylation analysis of targeted regions.
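
    Because bisulfite treatment converts unmethylated cytosines (read as T after amplification) and leaves methylated cytosines as C, the methylation level at a CpG site is commonly estimated from read counts. The sketch below shows that common estimate; the function name and the counts are hypothetical, and the abstract does not specify the exact quantitation used.

```python
def methylation_percent(c_reads, t_reads):
    """Estimate percent methylation at one cytosine from bisulfite reads.

    c_reads: reads showing C (cytosine protected, i.e. methylated)
    t_reads: reads showing T (cytosine converted, i.e. unmethylated)
    """
    if c_reads < 0 or t_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return 100.0 * c_reads / total


# Invented counts resembling a 25% methylated spike
level = methylation_percent(c_reads=250, t_reads=750)
```

    With incomplete conversion some unmethylated cytosines remain as C, so this estimate is biased upwards, which is one reason conversion efficiency matters when kits are compared.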

  16. spads 1.0: a toolbox to perform spatial analyses on DNA sequence data sets.

    PubMed

    Dellicour, Simon; Mardulyn, Patrick

    2014-05-01

    SPADS 1.0 (for 'Spatial and Population Analysis of DNA Sequences') is a population genetic toolbox for characterizing genetic variability within and among populations from DNA sequences. In view of the drastic increase in genetic information available through sequencing methods, spads was specifically designed to deal with multilocus data sets of DNA sequences. It computes several summary statistics from populations or groups of populations, performs input file conversions for other population genetic programs and implements locus-by-locus and multilocus versions of two clustering algorithms to study the genetic structure of populations. The toolbox also includes two MATLAB and R functions, GDISPAL and GDIVPAL, to display differentiation and diversity patterns across landscapes. These functions aim to generate interpolating surfaces based on multilocus distance and diversity indices. In the case of multiple loci, such surfaces can represent a useful alternative to the multiple pie-chart maps traditionally used in phylogeography to represent the spatial distribution of genetic diversity. These coloured surfaces can also be used to compare different data sets or different diversity and/or distance measures estimated on the same data set. © 2013 John Wiley & Sons Ltd.

  17. DNA viewed as an out-of-equilibrium structure

    NASA Astrophysics Data System (ADS)

    Provata, A.; Nicolis, C.; Nicolis, G.

    2014-05-01

    The complexity of the primary structure of human DNA is explored using methods from nonequilibrium statistical mechanics, dynamical systems theory, and information theory. A collection of statistical analyses is performed on the DNA data and the results are compared with sequences derived from different stochastic processes. The use of χ² tests shows that DNA cannot be described as a low order Markov chain of order up to r = 6. Although detailed balance seems to hold at the level of a binary alphabet, it fails when all four base pairs are considered, suggesting spatial asymmetry and irreversibility. Furthermore, the block entropy does not increase linearly with the block size, reflecting the long-range nature of the correlations in the human genomic sequences. To probe locally the spatial structure of the chain, we study the exit distances from a specific symbol, the distribution of recurrence distances, and the Hurst exponent, all of which show power law tails and long-range characteristics. These results suggest that human DNA can be viewed as a nonequilibrium structure maintained in its state through interactions with a constantly changing environment. Based solely on the exit distance distribution accounting for the nonequilibrium statistics and using the Monte Carlo rejection sampling method, we construct a model DNA sequence. This method allows us to keep both long- and short-range statistical characteristics of the native DNA data. The model sequence presents the same characteristic exponents as the natural DNA but fails to capture spatial correlations and point-to-point details.
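
    The block entropy mentioned in this abstract can be sketched as the Shannon entropy of the distribution of overlapping length-n words. The code below is an illustrative assumption about how such a quantity might be computed, not the authors' implementation; the function name and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def block_entropy(seq, n):
    """Shannon entropy (in bits) of the overlapping length-n blocks of seq."""
    if n < 1 or n > len(seq):
        raise ValueError("block size must be between 1 and len(seq)")
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A strictly periodic toy sequence: four equiprobable symbols, yet only four
# distinct 2-blocks, so the entropy stops growing with block size.
periodic = "ACGT" * 50
h1 = block_entropy(periodic, 1)
h2 = block_entropy(periodic, 2)
```

    For a sequence of independent, identically distributed symbols the block entropy grows linearly with n (as n times the single-symbol entropy); the periodic toy sequence shows the opposite extreme, where the entropy saturates at about 2 bits.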

  18. DNA sequence-selective C8-linked pyrrolobenzodiazepine-heterocyclic polyamide conjugates show anti-tubercular-specific activities.

    PubMed

    Brucoli, Federico; Guzman, Juan D; Basher, Mohammad A; Evangelopoulos, Dimitrios; McMahon, Eleanor; Munshi, Tulika; McHugh, Timothy D; Fox, Keith R; Bhakta, Sanjib

    2016-12-01

    New chemotherapeutic agents with novel mechanisms of action are urgently needed to combat the tuberculosis pandemic. A library of 12 C8-linked pyrrolo[2,1-c][1,4]benzodiazepine (PBD)-heterocyclic polyamide conjugates (1-12) was evaluated for anti-tubercular activity and DNA sequence selectivity. The PBD conjugates were screened against slow-growing Mycobacterium bovis Bacillus Calmette-Guérin and M. tuberculosis H37Rv, and fast-growing Escherichia coli, Pseudomonas putida and Rhodococcus sp. RHA1 bacteria. DNase I footprinting and DNA thermal denaturation experiments were used to determine the molecules' DNA recognition properties. The PBD conjugates were highly selective for the mycobacterial strains and exhibited significant growth inhibitory activity against the pathogenic M. tuberculosis H37Rv, with compound 4 showing MIC values (MIC = 0.08 mg l⁻¹) similar to those of rifampin and isoniazid. DNase I footprinting results showed that the PBD conjugates with three heterocyclic moieties had enhanced sequence selectivity and produced larger footprints, with distinct cleavage patterns compared with the two-heterocyclic chain PBD conjugates. DNA melting experiments indicated a covalent binding of the PBD conjugates to two AT-rich DNA-duplexes containing either a central GGATCC or GTATAC sequence, and showed that the polyamide chains affect the interactions of the molecules with DNA. The PBD-C8 conjugates tested in this study have a remarkable anti-mycobacterial activity and can be further developed as DNA-targeted anti-tubercular drugs.

  19. Evolution in the block: common elements of 5S rDNA organization and evolutionary patterns in distant fish genera.

    PubMed

    Campo, Daniel; García-Vázquez, Eva

    2012-01-01

    The 5S rDNA is organized in the genome as tandemly repeated copies of a structural unit composed of a coding sequence plus a nontranscribed spacer (NTS). The coding region is highly conserved in evolution, whereas the NTS varies in both length and sequence. It has been proposed that 5S rRNA genes are members of a gene family that has arisen through concerted evolution. In this study, we describe the molecular organization and evolution of the 5S rDNA in the genera Lepidorhombus and Scophthalmus (Scophthalmidae) and compare it with the already known 5S rDNA of the very different genera Merluccius (Merluccidae) and Salmo (Salmoninae), to identify common structural elements or patterns for understanding 5S rDNA evolution in fish. High intra- and interspecific diversity within the 5S rDNA family in all the genera can be explained by a combination of duplications, deletions, and transposition events. Sequence blocks with high similarity in all the 5S rDNA members across species were identified for the four studied genera, with evidence of intense gene conversion within noncoding regions. We propose a model to explain the evolution of the 5S rDNA, in which the evolutionary units are blocks of nucleotides rather than the entire sequences or single nucleotides. This model implies a "two-speed" evolution: slow within blocks (homogenized by recombination) and fast within the gene family (diversified by duplications and deletions).

  20. DNA viewed as an out-of-equilibrium structure.

    PubMed

    Provata, A; Nicolis, C; Nicolis, G

    2014-05-01

    The complexity of the primary structure of human DNA is explored using methods from nonequilibrium statistical mechanics, dynamical systems theory, and information theory. A collection of statistical analyses is performed on the DNA data and the results are compared with sequences derived from different stochastic processes. The use of χ² tests shows that DNA cannot be described as a low order Markov chain of order up to r = 6. Although detailed balance seems to hold at the level of a binary alphabet, it fails when all four base pairs are considered, suggesting spatial asymmetry and irreversibility. Furthermore, the block entropy does not increase linearly with the block size, reflecting the long-range nature of the correlations in the human genomic sequences. To probe locally the spatial structure of the chain, we study the exit distances from a specific symbol, the distribution of recurrence distances, and the Hurst exponent, all of which show power law tails and long-range characteristics. These results suggest that human DNA can be viewed as a nonequilibrium structure maintained in its state through interactions with a constantly changing environment. Based solely on the exit distance distribution accounting for the nonequilibrium statistics and using the Monte Carlo rejection sampling method, we construct a model DNA sequence. This method allows us to keep both long- and short-range statistical characteristics of the native DNA data. The model sequence presents the same characteristic exponents as the natural DNA but fails to capture spatial correlations and point-to-point details.

  1. Digital signal processing methods for biosequence comparison.

    PubMed Central

    Benson, D C

    1990-01-01

    A method is discussed for DNA or protein sequence comparison using a finite field fast Fourier transform, a digital signal processing technique; and statistical methods are discussed for analyzing the output of this algorithm. This method compares two sequences of length N in computing time proportional to N log N compared to N² for methods currently used. This method makes it feasible to compare very long sequences. An example is given to show that the method correctly identifies sites of known homology. PMID:2349096
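
    The idea of counting matches at every relative offset in N log N time can be sketched with a transform-based cross-correlation. Note the substitution: the sketch below uses an ordinary complex floating-point FFT, not the finite field transform described in the abstract, and all names and the toy sequences are hypothetical. Each base gets a 0/1 indicator vector; correlating the indicator vectors of the two sequences and summing over the four bases yields the number of matching positions for each shift.

```python
import cmath


def fft(x, invert=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. No 1/n scaling."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def match_counts(a, b):
    """Return {shift: matches}, where shift k aligns a[i] with b[i + k]."""
    size = 2
    while size < len(a) + len(b):  # zero-pad so circular wrap-around is harmless
        size *= 2
    totals = [0.0] * size
    for base in "ACGT":
        ind_a = [1.0 if ch == base else 0.0 for ch in a] + [0.0] * (size - len(a))
        ind_b = [1.0 if ch == base else 0.0 for ch in b] + [0.0] * (size - len(b))
        fa, fb = fft(ind_a), fft(ind_b)
        # Cross-correlation theorem: corr = IFFT(conj(FFT(a)) * FFT(b)).
        corr = fft([x.conjugate() * y for x, y in zip(fa, fb)], invert=True)
        for k in range(size):
            totals[k] += corr[k].real / size
    # Negative shifts are stored at the end of the circular result.
    return {k: round(totals[k % size]) for k in range(-(len(a) - 1), len(b))}


query = "GATTACA"
target = "TTGATTACA"
counts = match_counts(query, target)
best_shift = max(counts, key=counts.get)
```

    A finite field transform would avoid the floating-point rounding that `round` absorbs here, which is presumably one reason the original method uses it.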

  2. Genome sequence diversity and clues to the evolution of variola (smallpox) virus.

    PubMed

    Esposito, Joseph J; Sammons, Scott A; Frace, A Michael; Osborne, John D; Olsen-Rasmussen, Melissa; Zhang, Ming; Govil, Dhwani; Damon, Inger K; Kline, Richard; Laker, Miriam; Li, Yu; Smith, Geoffrey L; Meyer, Hermann; Leduc, James W; Wohlhueter, Robert M

    2006-08-11

    Comparative genomics of 45 epidemiologically varied variola virus isolates from the past 30 years of the smallpox era indicate low sequence diversity, suggesting that there is probably little difference in the isolates' functional gene content. Phylogenetic clustering inferred three clades coincident with their geographical origin and case-fatality rate; the latter implicated putative proteins that mediate viral virulence differences. Analysis of the viral linear DNA genome suggests that its evolution involved direct descent and DNA end-region recombination events. Knowing the sequences will help understand the viral proteome and improve diagnostic test precision, therapeutics, and systems for their assessment.

  3. High-Resolution Whole-Genome Sequencing Reveals That Specific Chromatin Domains from Most Human Chromosomes Associate with Nucleoli

    PubMed Central

    van Koningsbruggen, Silvana; Gierliński, Marek; Schofield, Pietá; Martin, David; Barton, Geoffey J.; Ariyurek, Yavuz; den Dunnen, Johan T.

    2010-01-01

    The nuclear space is mostly occupied by chromosome territories and nuclear bodies. Although this organization of chromosomes affects gene function, relatively little is known about the role of nuclear bodies in the organization of chromosomal regions. The nucleolus is the best-studied subnuclear structure and forms around the rRNA repeat gene clusters on the acrocentric chromosomes. In addition to rDNA, other chromatin sequences also surround the nucleolar surface and may even loop into the nucleolus. These additional nucleolar-associated domains (NADs) have not been well characterized. We present here a whole-genome, high-resolution analysis of chromatin endogenously associated with nucleoli. We have used a combination of three complementary approaches, namely fluorescence comparative genome hybridization, high-throughput deep DNA sequencing and photoactivation combined with time-lapse fluorescence microscopy. The data show that specific sequences from most human chromosomes, in addition to the rDNA repeat units, associate with nucleoli in a reproducible and heritable manner. NADs have in common a high density of AT-rich sequence elements, low gene density and a statistically significant enrichment in transcriptionally repressed genes. Unexpectedly, both the direct DNA sequencing and fluorescence photoactivation data show that certain chromatin loci can specifically associate with either the nucleolus, or the nuclear envelope. PMID:20826608

  4. High-resolution whole-genome sequencing reveals that specific chromatin domains from most human chromosomes associate with nucleoli.

    PubMed

    van Koningsbruggen, Silvana; Gierlinski, Marek; Schofield, Pietá; Martin, David; Barton, Geoffey J; Ariyurek, Yavuz; den Dunnen, Johan T; Lamond, Angus I

    2010-11-01

    The nuclear space is mostly occupied by chromosome territories and nuclear bodies. Although this organization of chromosomes affects gene function, relatively little is known about the role of nuclear bodies in the organization of chromosomal regions. The nucleolus is the best-studied subnuclear structure and forms around the rRNA repeat gene clusters on the acrocentric chromosomes. In addition to rDNA, other chromatin sequences also surround the nucleolar surface and may even loop into the nucleolus. These additional nucleolar-associated domains (NADs) have not been well characterized. We present here a whole-genome, high-resolution analysis of chromatin endogenously associated with nucleoli. We have used a combination of three complementary approaches, namely fluorescence comparative genome hybridization, high-throughput deep DNA sequencing and photoactivation combined with time-lapse fluorescence microscopy. The data show that specific sequences from most human chromosomes, in addition to the rDNA repeat units, associate with nucleoli in a reproducible and heritable manner. NADs have in common a high density of AT-rich sequence elements, low gene density and a statistically significant enrichment in transcriptionally repressed genes. Unexpectedly, both the direct DNA sequencing and fluorescence photoactivation data show that certain chromatin loci can specifically associate with either the nucleolus, or the nuclear envelope.

  5. [Phylogenetic analysis of closely related Leuconostoc citreum species based on partial housekeeping genes].

    PubMed

    Lv, Qiang; Chen, Ming; Xu, Haiyan; Song, Yuqin; Sun, Zhihong; Dan, Tong; Sun, Tiansong

    2013-07-04

    Using the 16S rRNA, dnaA, murC and pyrG gene sequences, we identified the phylogenetic relationship among closely related Leuconostoc citreum species. Seven Leu. citreum strains originally isolated from sourdough were characterized by PCR methods to amplify the dnaA, murC and pyrG gene sequences, which were determined to assess their suitability as phylogenetic markers. Then, we estimated the genetic distance and constructed phylogenetic trees for the 16S rRNA gene and the three above-mentioned housekeeping genes, combined with the corresponding published sequences. Comparison of the phylogenetic trees showed that the topologies of the three housekeeping gene trees were consistent with that of the 16S rRNA gene tree. The homology of closely related Leu. citreum species differed among the dnaA, murC, pyrG and 16S rRNA gene sequences, ranging from 75.5% to 97.2%, 50.2% to 99.7%, 65.0% to 99.8% and 98.5% to 100%, respectively. The phylogenetic relationships inferred from the three housekeeping gene sequences were highly consistent with the results for the 16S rRNA gene sequence, while the genetic distances of these housekeeping genes were much higher than that of the 16S rRNA gene. Consequently, the dnaA, murC and pyrG genes are suitable for the classification and identification of closely related Leu. citreum species.

  6. Lineage divergence detected in the malaria vector Anopheles marajoara (Diptera: Culicidae) in Amazonian Brazil

    PubMed Central

    2010-01-01

    Background Cryptic species complexes are common among anophelines. Previous phylogenetic analysis based on the complete mtDNA COI gene sequences detected paraphyly in the Neotropical malaria vector Anopheles marajoara. The "Folmer region" detects a single taxon using a 3% divergence threshold. Methods To test the paraphyletic hypothesis and examine the utility of the Folmer region, genealogical trees based on a concatenated (white + 3' COI sequences) dataset and pairwise differentiation of COI fragments were examined. The population structure and demographic history were based on partial COI sequences for 294 individuals from 14 localities in Amazonian Brazil. 109 individuals from 12 localities were sequenced for the nDNA white gene, and 57 individuals from 11 localities were sequenced for the ribosomal DNA (rDNA) internal transcribed spacer 2 (ITS2). Results Distinct A. marajoara lineages were detected by combined genealogical analysis and were also supported among COI haplotypes using a median joining network and AMOVA, with time since divergence during the Pleistocene (<100,000 ya). COI sequences at the 3' end were more variable, demonstrating significant pairwise differentiation (3.82%) compared to the more moderate 2.92% detected by the Folmer region. Lineage 1 was present in all localities, whereas lineage 2 was restricted mainly to the west. Mismatch distributions for both lineages were bimodal, likely due to multiple colonization events and spatial expansion (~798 - 81,045 ya). There appears to be gene flow within, not between lineages, and a partial barrier was detected near Rio Jari in Amapá state, separating western and eastern populations. In contrast, both nDNA data sets (white gene sequences with or without the retention of the 4th intron, and ITS2 sequences and length) detected a single A. marajoara lineage. 
Conclusions Strong support for combined data with significant differentiation detected in the COI and absent in the nDNA suggest that the divergence is recent, and detectable only by the faster evolving mtDNA. A within subgenus threshold of >2% may be more appropriate among sister taxa in cryptic anopheline complexes than the standard 3%. Differences in demographic history and climatic changes may have contributed to mtDNA lineage divergence in A. marajoara. PMID:20929572
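
    The threshold argument in this abstract comes down to comparing a pairwise divergence against a cutoff. As a minimal sketch, the proportion of differing sites (p-distance) between two aligned sequences can be computed as below; the function name and the toy haplotypes are hypothetical, while 2.92%, 3.82%, 3% and 2% are the figures quoted in the abstract. The abstract does not state which distance measure was used, so the p-distance is an assumption.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical 100-bp haplotypes differing at three sites.
hap_a = "ACGTACGTAC" * 10
hap_b = "T" + hap_a[1:50] + "G" + hap_a[51:98] + "T" + hap_a[99:]
toy_divergence = p_distance(hap_a, hap_b)

# Divergences reported in the abstract, checked against both thresholds.
folmer, three_prime = 0.0292, 0.0382
folmer_splits_at_3pct = folmer >= 0.03       # single taxon at the standard 3%
folmer_splits_at_2pct = folmer >= 0.02       # two lineages at the proposed >2%
three_prime_splits_at_3pct = three_prime >= 0.03
```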

  7. Evolution of nuclear rDNA ITS sequences in the Cladophora albida/sericea clade (Chlorophyta).

    PubMed

    Bakker, F T; Olsen, J L; Stam, W T

    1995-06-01

    Ribosomal DNA ITS sequences were compared among 13 different species and biogeographic isolates from the monophyletic "albida/sericea clade" in the green algal genus Cladophora. Six distinct ITS sequence types were found, characterized by multiple insertions and deletions and high levels of nucleotide substitution. Conserved domains within the ITS regions indicate the presence of ITS secondary structure. Low transition/transversion ratios among the six types and nearly symmetrical tree-length frequency distributions indicate some saturation and low phylogenetic signal. Although branching order among five of the six ITS sequence types could not be resolved, estimates of ITS sequence divergence as compared with 18S divergence in a subset of the taxa suggest that the origin of the different ITS types is probably in the mid-Miocene (12 Ma ago) but that biogeographic isolates within a single ITS type (including both Pacific and Atlantic representatives) have probably dispersed on a time scale of thousands rather than millions of years.

  8. Mitochondrial Mutations in Subjects with Psychiatric Disorders

    PubMed Central

    Magnan, Christophe; van Oven, Mannis; Baldi, Pierre; Myers, Richard M.; Barchas, Jack D.; Schatzberg, Alan F.; Watson, Stanley J.; Akil, Huda; Bunney, William E.; Vawter, Marquis P.

    2015-01-01

    A considerable body of evidence supports the role of mitochondrial dysfunction in psychiatric disorders and mitochondrial DNA (mtDNA) mutations are known to alter brain energy metabolism, neurotransmission, and cause neurodegenerative disorders. Genetic studies focusing on common nuclear genome variants associated with these disorders have produced genome wide significant results but those studies have not directly studied mtDNA variants. The purpose of this study is to investigate, using next generation sequencing, the involvement of mtDNA variation in bipolar disorder, schizophrenia, major depressive disorder, and methamphetamine use. MtDNA extracted from multiple brain regions and blood were sequenced (121 mtDNA samples with an average of 8,800x coverage) and compared to an electronic database containing 26,850 mtDNA genomes. We confirmed novel and rare variants, and confirmed next generation sequencing error hotspots by traditional sequencing and genotyping methods. We observed a significant increase of non-synonymous mutations found in individuals with schizophrenia. Novel and rare non-synonymous mutations were found in psychiatric cases in mtDNA genes: ND6, ATP6, CYTB, and ND2. We also observed mtDNA heteroplasmy in brain at a locus previously associated with schizophrenia (T16519C). Large differences in heteroplasmy levels across brain regions within subjects suggest that somatic mutations accumulate differentially in brain regions. Finally, multiplasmy, a heteroplasmic measure of repeat length, was observed in brain from selective cases at a higher frequency than controls. These results offer support for increased rates of mtDNA substitutions in schizophrenia shown in our prior results. The variable levels of heteroplasmic/multiplasmic somatic mutations that occur in brain may be indicators of genetic instability in mtDNA. PMID:26011537

  9. Molecular characterization and phylogenetic analysis of a yak (Bos grunniens) κ-casein cDNA from lactating mammary gland.

    PubMed

    Bai, W L; Yin, R H; Dou, Q L; Jiang, W Q; Zhao, S J; Ma, Z J; Luo, G B; Zhao, Z H

    2011-04-01

    κ-Casein is one of the major proteins in the milk of mammals. It plays an important role in determining the size and specific function of milk micelles. We have previously identified and characterized a genetic variant of yak κ-casein by evaluating genomic DNA. Here, we isolate and characterize a yak κ-casein cDNA harboring the full-length open reading frame (ORF) from lactating mammary gland. Total RNA was extracted from mammary tissue of a lactating female yak, and the κ-casein cDNA was synthesized by RT-PCR, then cloned and sequenced. The obtained cDNA of 660 bp contained an ORF sufficient to encode the entire amino acid sequence of the κ-casein precursor protein, consisting of 190 amino acids with a signal peptide of 21 amino acids. Yak κ-casein has a predicted molecular mass of 19,006.588 Da with a calculated isoelectric point of 7.245. Compared with the corresponding sequences in GenBank of cattle, buffalo, sheep, goat, Arabian camel, horse, and rabbit, the yak κ-casein sequence had identity of 64.76-98.78% in cDNA, and identity of 44.79-98.42% and similarity of 53.65-98.42% in deduced amino acids, revealing a high homology with the other livestock species. Based on κ-casein cDNA sequences, the phylogenetic analysis indicated that yak κ-casein had a close relationship with that of cattle. This work might be useful in genetic engineering research on yak κ-casein.

  10. Sequence and Structure Dependent DNA-DNA Interactions

    NASA Astrophysics Data System (ADS)

    Kopchick, Benjamin; Qiu, Xiangyun

    Molecular forces between dsDNA strands are largely dominated by electrostatics and have been extensively studied. Quantitative knowledge has been accumulated on how DNA-DNA interactions are modulated by varied biological constituents such as ions, cationic ligands, and proteins. Despite its central role in biology, the sequence of DNA has not received substantial attention and "random" DNA sequences are typically used in biophysical studies. However, ~50% of the human genome is composed of non-random-sequence DNAs, particularly repetitive sequences. Furthermore, covalent modifications of DNA such as methylation play key roles in gene functions. Such DNAs with specific sequences or modifications often take on structures other than the canonical B-form. Here we present a series of quantitative measurements of the DNA-DNA forces with the osmotic stress method on different DNA sequences, from short repeats to the most frequent sequences in the genome, and to modifications such as bromination and methylation. We observe peculiar behaviors that appear to be strongly correlated with the incurred structural changes. We speculate on the causes in terms of the differences in hydration shell and DNA surface structures.

  11. Treatment of Metastatic Breast Cancer by Photodynamic Therapy Induced Anti-Tumor Immunity in a Murine Model

    DTIC Science & Technology

    2005-12-01

    dinucleotide and were more common in the genomes of bacteria compared to humans. Immunostimulatory sequences in bacterial (bDNA) that are structurally defined...stimulates B cells, natural killer (NK) cells, dendritic cells (DC), and macrophages, regardless of whether the DNA is in the form of genomic bDNA or

  12. 16S rRNA Gene Sequence Analysis of Drinking Water Using RNA and DNA Extracts as Targets for Clone Library Development

    EPA Science Inventory

    The bacterial composition of chlorinated drinking water was analyzed using 16S rRNA gene clone libraries derived from DNA extracts of 12 samples and compared to clone libraries previously generated using RNA extracts from the same samples. Phylogenetic analysis of 761 DNA-based ...

  13. ITS1: a DNA barcode better than ITS2 in eukaryotes?

    PubMed

    Wang, Xin-Cun; Liu, Chang; Huang, Liang; Bengtsson-Palme, Johan; Chen, Haimei; Zhang, Jian-Hui; Cai, Dayong; Li, Jian-Qin

    2015-05-01

    A DNA barcode is a short piece of DNA sequence used for species determination and discovery. The internal transcribed spacer (ITS/ITS2) region has been proposed as the standard DNA barcode for fungi and seed plants and has been widely used in DNA barcoding analyses for other biological groups, for example algae, protists and animals. The ITS region consists of both ITS1 and ITS2 regions. Here, a large-scale meta-analysis was carried out to compare ITS1 and ITS2 from three aspects: PCR amplification, DNA sequencing and species discrimination, in terms of the presence of DNA barcoding gaps, species discrimination efficiency, sequence length distribution, GC content distribution and primer universality. In total, 85 345 sequence pairs in 10 major groups of eukaryotes, including ascomycetes, basidiomycetes, liverworts, mosses, ferns, gymnosperms, monocotyledons, eudicotyledons, insects and fishes, covering 611 families, 3694 genera, and 19 060 species, were analysed. Using similarity-based methods, we calculated species discrimination efficiencies for ITS1 and ITS2 in all major groups, families and genera. Using Fisher's exact test, we found that ITS1 has significantly higher efficiencies than ITS2 in 17 of the 47 sample-rich families and 20 of the 49 sample-rich genera. By in silico PCR amplification evaluation, primer universality of the extensively applied ITS1 primers was found to be superior to that of ITS2 primers. Additionally, a shorter amplification product and a lower GC content were found to be two further advantages of ITS1 for sequencing. In summary, ITS1 represents a better DNA barcode than ITS2 for eukaryotic species. © 2014 John Wiley & Sons Ltd.
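
    Fisher's exact test, used in this abstract to compare discrimination efficiencies, can be sketched for a 2×2 table directly from the hypergeometric distribution. The abstract does not say whether a one- or two-sided test was applied; the one-sided version below is an assumption, and the function name and the counts are hypothetical illustrations rather than data from the study.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left cell under the hypergeometric
    null with all row and column totals held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)


# Hypothetical counts: rows are markers (ITS1, ITS2), columns are
# species (discriminated, not discriminated).
p_value = fisher_exact_greater(3, 1, 1, 3)
```

    With these toy counts the p-value is 17/70, about 0.243, so a difference of this size in such a small sample would not be called significant.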

  14. Multiplexed enrichment of rare DNA variants via sequence-selective and temperature-robust amplification

    PubMed Central

    Wu, Lucia R.; Chen, Sherry X.; Wu, Yalei; Patel, Abhijit A.; Zhang, David Yu

    2018-01-01

    Rare DNA-sequence variants hold important clinical and biological information, but existing detection techniques are expensive, complex, allele-specific, or don’t allow for significant multiplexing. Here, we report a temperature-robust polymerase-chain-reaction method, which we term blocker displacement amplification (BDA), that selectively amplifies all sequence variants, including single-nucleotide variants (SNVs), within a roughly 20-nucleotide window by 1,000-fold over wild-type sequences. This allows for easy detection and quantitation of hundreds of potential variants originally at ≤0.1% in allele frequency. BDA is compatible with inexpensive thermocycler instrumentation and employs a rationally designed competitive hybridization reaction to achieve comparable enrichment performance across annealing temperatures ranging from 56 °C to 64 °C. To show the sequence generality of BDA, we demonstrate enrichment of 156 SNVs and the reliable detection of single-digit copies. We also show that the BDA detection of rare driver mutations in cell-free DNA samples extracted from the blood plasma of lung-cancer patients is highly consistent with deep sequencing using molecular lineage tags, with a receiver operator characteristic accuracy of 95%. PMID:29805844
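
    The effect of a 1,000-fold enrichment on a variant starting at 0.1% allele frequency can be worked through with simple arithmetic. The sketch below assumes, as a simplification not stated in the abstract, that "fold enrichment" means the variant template is amplified that many times more than the wild-type template; the function name is hypothetical.

```python
def vaf_after_enrichment(initial_vaf, fold_enrichment):
    """Variant allele fraction after the variant is enriched over wild type."""
    if not 0.0 <= initial_vaf <= 1.0:
        raise ValueError("initial_vaf must be a fraction between 0 and 1")
    variant = initial_vaf * fold_enrichment   # relative variant signal
    wild_type = 1.0 - initial_vaf             # relative wild-type signal
    return variant / (variant + wild_type)


# A variant at 0.1% allele frequency, enriched 1,000-fold over wild type.
enriched = vaf_after_enrichment(0.001, 1000)
```

    Under this assumption the variant ends up at roughly 50% of the amplified product, which is why it becomes easy to detect with standard sequencing readouts.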

  15. Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma.

    PubMed

    Wrzeszczynski, Kazimierz O; Frank, Mayu O; Koyama, Takahiko; Rhrissorrakrai, Kahn; Robine, Nicolas; Utro, Filippo; Emde, Anne-Katrin; Chen, Bo-Juen; Arora, Kanika; Shah, Minita; Vacic, Vladimir; Norel, Raquel; Bilal, Erhan; Bergmann, Ewa A; Moore Vogel, Julia L; Bruce, Jeffrey N; Lassman, Andrew B; Canoll, Peter; Grommes, Christian; Harvey, Steve; Parida, Laxmi; Michelini, Vanessa V; Zody, Michael C; Jobanputra, Vaidehi; Royyuru, Ajay K; Darnell, Robert B

    2017-08-01

    To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each. Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs. More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts. The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible. NCT02725684.

  16. Extensive structural variations between mitochondrial genomes of CMS and normal peppers (Capsicum annuum L.) revealed by complete nucleotide sequencing.

    PubMed

    Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl

    2014-07-04

    Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between the mitochondrial genomes of pepper and tobacco, both members of the Solanaceae, revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In the comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, a large number of small sequence segments that were absent or found at different locations in the Jeju mitochondrial genome were combined together. The incorporation of repeats and the overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in the evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes.
Although a large portion of the sequence context was shared by the mitochondrial genomes of CMS and male-fertile pepper lines, extensive genome rearrangements were detected. CMS candidate genes were located on the edges of highly rearranged CMS-specific DNA regions and near repeat sequences. These characteristics were also detected among CMS-associated genes in other species, implying that a common mechanism might be involved in the evolution of CMS-associated genes.

  17. ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis

    PubMed Central

    2011-01-01

    Background Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor binding and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. Results Both technologies produce highly reproducible profiles within each platform, but ChIP-seq generally produces profiles with a better signal-to-noise ratio and allows detection of more peaks and narrower peaks. The sets of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from differences in both experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis. Conclusions Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis. PMID:21356108

  18. Rapid in silico cloning of genes using expressed sequence tags (ESTs).

    PubMed

    Gill, R W; Sanseau, P

    2000-01-01

    Expressed sequence tags (ESTs) are short single-pass DNA sequences obtained from either end of cDNA clones. These ESTs are derived from a vast number of cDNA libraries obtained from different species. Human ESTs are the bulk of the data and have been widely used to identify new members of gene families, as markers on the human chromosomes, to discover polymorphism sites and to compare expression patterns in different tissues or pathological states. Information strategies have been devised to query EST databases. Since most of the analysis is performed with a computer, the term "in silico" strategy has been coined. In this chapter we will review the current status of EST databases, the pros and cons of EST-type data and describe possible strategies to retrieve meaningful information.

  19. DNA barcoding of morphologically characterized mosquitoes belonging to the subfamily Culicinae from Sri Lanka.

    PubMed

    Weeraratne, Thilini Chathurika; Surendran, Sinnathamby Noble; Parakrama Karunaratne, S H P

    2018-04-25

    Vectors of mosquito-borne diseases in Sri Lanka, except for malaria, belong to the subfamily Culicinae, which includes nearly 84% of the mosquito fauna of the country. Hence, accurate and precise species identification of culicine mosquitoes is a crucial factor in implementing effective vector control strategies. During the present study, a combined effort using morphology and DNA barcoding was made to characterize mosquitoes of the subfamily Culicinae for the first time from nine districts of Sri Lanka. Cytochrome c oxidase subunit 1 (cox1) gene from the mitochondrial genome and the internal transcribed spacer 2 (ITS2) region from the nuclear ribosomal DNA were used for molecular characterization. According to morphological identification, the field collected adult mosquitoes belonged to 5 genera and 14 species, i.e. Aedes aegypti, Ae. albopictus, Ae. pallidostriatus, Aedes sp. 1, Armigeres sp. 1, Culex bitaeniorhynchus, Cx. fuscocephala, Cx. gelidus, Cx. pseudovishnui, Cx. quinquefasciatus, Cx. tritaeniorhynchus, Cx. whitmorei, Mansonia uniformis and Mimomyia chamberlaini. Molecular analyses of 62 cox1 and 36 ITS2 sequences were exclusively comparable with the morphological identifications of all the species except for Ae. pallidostriatus and Aedes sp. 1. Although the species identification of Armigeres sp. 1 specimens using morphological features was not possible during this study, DNA barcodes of the specimens matched 100% with the publicly available Ar. subalbatus sequences, giving their species status. Analysis of all the cox1 sequences (14 clades supported by strong bootstrap value in the Neighbor-Joining tree and interspecific distances of > 3%) showed the presence of 14 different species. This is the first available DNA sequence in the GenBank records for morphologically identified Ae. pallidostriatus. Aedes sp. 1 could not be identified morphologically or by publicly available sequences. Aedes aegypti, Ae. albopictus and all Culex species reported during the current study are vectors of human diseases. All these vector species showed comparatively high diversity. The current study reflects the significance of integrated systematic approach and use of cox1 and ITS genetic markers in mosquito taxonomy. Results of DNA barcoding were comparable with morphological identifications and, more importantly, DNA barcoding could accurately identify the species in the instances where the traditional morphological identification failed due to indistinguishable characters of damaged specimens and the presence of subspecies.
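
    The interspecific distance criterion (> 3%) can be illustrated with a short sketch that computes an uncorrected pairwise distance (p-distance) between two aligned barcode sequences. This is an assumption made for illustration: the abstract does not state which distance model was used, and the function names and the default threshold argument are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous 'N'
    are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def exceeds_threshold(seq_a, seq_b, threshold=0.03):
    """True when the pairwise distance is above the chosen cutoff."""
    return p_distance(seq_a, seq_b) > threshold


# Toy 10-base alignment with one difference: distance 0.1, above 3%.
distance = p_distance("ACGTACGTAC", "ACGTACGTAA")
```

    Real cox1 barcodes are several hundred bases long, so a 3% cutoff corresponds to roughly twenty differing sites rather than the single site in this toy alignment.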

  20. Authentication of Cordyceps sinensis by DNA Analyses: Comparison of ITS Sequence Analysis and RAPD-Derived Molecular Markers.

    PubMed

    Lam, Kelly Y C; Chan, Gallant K L; Xin, Gui-Zhong; Xu, Hong; Ku, Chuen-Fai; Chen, Jian-Ping; Yao, Ping; Lin, Huang-Quan; Dong, Tina T X; Tsim, Karl W K

    2015-12-15

    Cordyceps sinensis is an endoparasitic fungus widely used as a tonic and medicinal food in the practice of traditional Chinese medicine (TCM). In historical usage, Cordyceps specifically is referring to the species of C. sinensis. However, a number of closely related species are named themselves as Cordyceps, and they are sold commonly as C. sinensis. The substitutes and adulterants of C. sinensis are often introduced either intentionally or accidentally in the herbal market, which seriously affects the therapeutic effects or even leads to life-threatening poisoning. Here, we aim to identify Cordyceps by DNA sequencing technology. Two different DNA-based approaches were compared. The internal transcribed spacer (ITS) sequences and the random amplified polymorphic DNA (RAPD)-sequence characterized amplified region (SCAR) were developed here to authenticate different species of Cordyceps. Both approaches generally enabled discrimination of C. sinensis from others. The application of the two methods, supporting each other, increases the security of identification. For better reproducibility and faster analysis, the SCAR markers derived from the RAPD results provide a new method for quick authentication of Cordyceps.

  1. An optimized method for high quality DNA extraction from microalga Prototheca wickerhamii for genome sequencing.

    PubMed

    Jagielski, Tomasz; Gawor, Jan; Bakuła, Zofia; Zuchniewicz, Karolina; Żak, Iwona; Gromadka, Robert

    2017-01-01

    The complex cell wall structure of algae often precludes efficient extraction of their genetic material. The purpose of this study was to design a next-generation sequencing-suitable DNA isolation method for unicellular, achlorophyllous, yeast-like microalgae of the genus Prototheca, the only known plant pathogens of both humans and animals. The effectiveness of the newly proposed scheme was compared with five other, previously described methods, commonly used for DNA isolation from plants and/or yeasts, available either as laboratory-developed, in-house assays, based on liquid nitrogen grinding or different enzymatic digestion, or as commercially manufactured kits. All five, previously described, isolation assays yielded DNA concentrations lower than those obtained with the new method, averaging 16.15 ± 25.39 vs 74.2 ± 0.56 ng/µL, respectively. The new method was also superior in terms of DNA purity, as measured by A260/A280 (-0.41 ± 4.26 vs 2.02 ± 0.03), and A260/A230 (1.20 ± 1.12 vs 1.97 ± 0.07) ratios. Only the liquid nitrogen-based method yielded DNA of comparable quantity (60.96 ± 0.16 ng/µL) and quality (A260/A280 = 2.08 ± 0.02; A260/A230 = 2.23 ± 0.26). Still, the new method showed higher integrity, which was best illustrated upon electrophoretic analysis. Genomic DNA of Prototheca wickerhamii POL-1 strain isolated with the protocol herein proposed was successfully sequenced on the Illumina MiSeq platform. A new method for DNA isolation from Prototheca algae is described. The method, whose protocol involves glass beads pulverization and cesium chloride (CsCl) density gradient centrifugation, was demonstrated to be superior to the other common assays in terms of DNA quantity and quality. The method is also the first to offer the possibility of preparation of DNA template suitable for whole genome sequencing of Prototheca spp.

  2. Isolation from genomic DNA of sequences binding specific regulatory proteins by the acceleration of protein electrophoretic mobility upon DNA binding.

    PubMed

    Subrahmanyam, S; Cronan, J E

    1999-01-21

    We report an efficient and flexible in vitro method for the isolation of genomic DNA sequences that are the binding targets of a given DNA binding protein. This method takes advantage of the fact that binding of a protein to a DNA molecule generally increases the rate of migration of the protein in nondenaturing gel electrophoresis. By the use of a radioactively labeled DNA-binding protein and nonradioactive DNA coupled with PCR amplification from gel slices, we show that specific binding sites can be isolated from Escherichia coli genomic DNA. We have applied this method to isolate a binding site for FadR, a global regulator of fatty acid metabolism in E. coli. We have also isolated a second binding site for BirA, the biotin operon repressor/biotin ligase, from the E. coli genome that has a very low binding efficiency compared with the bio operator region.

  3. Construction and Evaluation of Normalized cDNA Libraries Enriched with Full-Length Sequences for Rapid Discovery of New Genes from Sisal (Agave sisalana Perr.) Different Developmental Stages

    PubMed Central

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-01-01

    To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing. PMID:23202944
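
    The reported non-redundancy follows directly from the clone and unigene counts given above. A minimal sketch of the arithmetic (the function name is hypothetical):

```python
def non_redundancy(unigenes, clones_sequenced):
    """Percentage of sequenced clones that represent distinct unigenes."""
    if clones_sequenced <= 0:
        raise ValueError("at least one sequenced clone is required")
    return 100.0 * unigenes / clones_sequenced


# Counts taken from the abstract: 3320 unigenes from 3875 sequenced clones.
value = round(non_redundancy(3320, 3875), 1)  # 85.7
```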

  4. IFI16 Preferentially Binds to DNA with Quadruplex Structure and Enhances DNA Quadruplex Formation.

    PubMed

    Hároníková, Lucia; Coufal, Jan; Kejnovská, Iva; Jagelská, Eva B; Fojta, Miroslav; Dvořáková, Petra; Muller, Petr; Vojtesek, Borivoj; Brázda, Václav

    2016-01-01

    Interferon-inducible protein 16 (IFI16) is a member of the HIN-200 protein family, containing two HIN domains and one PYRIN domain. IFI16 acts as a sensor of viral and bacterial DNA and is important for innate immune responses. IFI16 binds DNA and binding has been described to be DNA length-dependent, but a preference for supercoiled DNA has also been demonstrated. Here we report a specific preference of IFI16 for binding to quadruplex DNA compared to other DNA structures. IFI16 binds to quadruplex DNA with significantly higher affinity than to the same sequence in double stranded DNA. By circular dichroism (CD) spectroscopy we also demonstrated the ability of IFI16 to stabilize quadruplex structures with quadruplex-forming oligonucleotides derived from human telomere (HTEL) sequences and the MYC promoter. A novel H/D exchange mass spectrometry approach was developed to assess protein interactions with quadruplex DNA. Quadruplex DNA changed the IFI16 deuteration profile in parts of the PYRIN domain (aa 0-80) and in structurally identical parts of both HIN domains (aa 271-302 and aa 586-617) compared to single stranded or double stranded DNAs, supporting the preferential affinity of IFI16 for structured DNA. Our results reveal the importance of quadruplex DNA structure in IFI16 binding and improve our understanding of how IFI16 senses DNA. IFI16 selectivity for quadruplex structure provides a mechanistic framework for IFI16 in immunity and cellular processes including DNA damage responses and cell proliferation.

  5. Detection of Cytosine methylation in ancient DNA from five native american populations using bisulfite sequencing.

    PubMed

    Smith, Rick W A; Monroe, Cara; Bolnick, Deborah A

    2015-01-01

    While cytosine methylation has been widely studied in extant populations, relatively few studies have analyzed methylation in ancient DNA. Most existing studies of epigenetic marks in ancient DNA have inferred patterns of methylation in highly degraded samples using post-mortem damage to cytosines as a proxy for cytosine methylation levels. However, this approach limits the inference of methylation compared with direct bisulfite sequencing, the current gold standard for analyzing cytosine methylation at single nucleotide resolution. In this study, we used direct bisulfite sequencing to assess cytosine methylation in ancient DNA from the skeletal remains of 30 Native Americans ranging in age from approximately 230 to 4500 years before present. Unmethylated cytosines were converted to uracils by treatment with sodium bisulfite, bisulfite products of a CpG-rich retrotransposon were pyrosequenced, and C-to-T ratios were quantified for a single CpG position. We found that cytosine methylation is readily recoverable from most samples, given adequate preservation of endogenous nuclear DNA. In addition, our results indicate that the precision of cytosine methylation estimates is inversely correlated with aDNA preservation, such that samples of low DNA concentration show higher variability in measures of percent methylation than samples of high DNA concentration. In particular, samples in this study with a DNA concentration above 0.015 ng/μL generated the most consistent measures of cytosine methylation. This study presents evidence of cytosine methylation in a large collection of ancient human remains, and indicates that it is possible to analyze epigenetic patterns in ancient populations using direct bisulfite sequencing approaches.
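
    The quantification step can be sketched as follows. After bisulfite treatment, unmethylated cytosines read as T while methylated cytosines remain C, so percent methylation at a CpG position is commonly estimated as C / (C + T). The helper names and the replicate data are hypothetical, and the sketch is not the authors' analysis pipeline.

```python
from statistics import mean, stdev


def percent_methylation(c_signal, t_signal):
    """Percent methylation at one CpG from C and T signal after bisulfite."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no signal at this CpG position")
    return 100.0 * c_signal / total


def summarize_replicates(replicates):
    """Mean and sample standard deviation across (C, T) replicate pairs."""
    values = [percent_methylation(c, t) for c, t in replicates]
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


# Hypothetical replicate measurements for a single sample.
average, spread = summarize_replicates([(80, 20), (60, 40)])
```

    A larger spread across replicates corresponds to the higher variability the abstract reports for samples of low DNA concentration.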

  6. Hybridization chain reaction-based instantaneous derivatization technology for chemiluminescence detection of specific DNA sequences.

    PubMed

    Wang, Xin; Lau, Choiwan; Kai, Masaaki; Lu, Jianzhong

    2013-05-07

    We propose here a new amplifying strategy that uses hybridization chain reaction (HCR) to detect specific sequences of DNA, where stable DNA monomers assemble on the magnetic beads only upon exposure to a target DNA. Briefly, in the HCR process, two complementary stable species of hairpins coexist in solution until the introduction of initiator reporter strands triggers a cascade of hybridization events that yield nicked double helices analogous to alternating copolymers. Moreover, a "sandwich-type" detection strategy is employed in our design. Magnetic beads, which are functionalized with capture DNA, are reacted with the target, and sandwiched with the above nicked double helices. Then, chemiluminescence (CL) detection proceeds via an instantaneous derivatization reaction between a specific CL reagent, 3,4,5-trimethoxylphenylglyoxal (TMPG), and the guanine nucleotides within the target DNA, reporter strands and DNA monomers for the generation of light. Our results clearly show that the amplification detection of specific sequences of DNA achieves a better performance (e.g. wide linear response range, low detection limit, and high specificity) as compared to the traditional sandwich type (capture/target/reporter) assays. Upon modification, the approach presented could be extended to detect other types of targets. We believe that this simple technique is promising for improving medical diagnosis and treatment.

  7. Nanoscale Bio-engineering Solutions for Space Exploration: The Nanopore Sequencer

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor; Cozmuta, Ioana

    2004-01-01

    Characterization of biological systems at the molecular level and extraction of essential information for nano-engineering design to guide the nano-fabrication of solid-state sensors and molecular identification devices is a computational challenge. The alpha hemolysin protein ion channel is used as a model system for structural analysis of nucleic acids like DNA. Applied voltage draws a DNA strand and surrounding ionic solution through the biological nanopore. The subunits in the DNA strand block ion flow by differing amounts. Atomistic scale simulations are employed using NASA supercomputers to study DNA translocation, with the aim to enhance single DNA subunit identification. Compared to protein channels, solid-state nanopores offer a better temporal control of the translocation of DNA and the possibility to easily tune its chemistry to increase the signal resolution. Potential applications for NASA missions, besides real-time genome sequencing include astronaut health, life detection and decoding of various genomes.

  8. Nanoscale Bioengineering Solutions for Space Exploration: The Nanopore Sequencer

    NASA Technical Reports Server (NTRS)

    Cozmuta, Ioana; Stolc, Viktor

    2005-01-01

    Characterization of biological systems at the molecular level and extraction of essential information for nano-engineering design to guide the nano-fabrication of solid-state sensors and molecular identification devices is a computational challenge. The alpha hemolysin protein ion channel is used as a model system for structural analysis of nucleic acids like DNA. Applied voltage draws a DNA strand and surrounding ionic solution through the biological nanopore. The subunits in the DNA strand block ion flow by differing amounts. Atomistic scale simulations are employed using NASA supercomputers to study DNA translocation, with the aim to enhance single DNA subunit identification. Compared to protein channels, solid-state nanopores offer a better temporal control of the translocation of DNA and the possibility to easily tune its chemistry to increase the signal resolution. Potential applications for NASA missions, besides real-time genome sequencing include astronaut health, life detection and decoding of various genomes. http://phenomrph.arc.nasa.gov/index.php

  9. Evolution of rDNA in Nicotiana Allopolyploids: A Potential Link between rDNA Homogenization and Epigenetics

    PubMed Central

    Kovarik, Ales; Dadejova, Martina; Lim, Yoong K.; Chase, Mark W.; Clarkson, James J.; Knapp, Sandra; Leitch, Andrew R.

    2008-01-01

    Background The evolution and biology of rDNA have interested biologists for many years, in part, because of two intriguing processes: (1) nucleolar dominance and (2) sequence homogenization. We review patterns of evolution in rDNA in the angiosperm genus Nicotiana to determine consequences of allopolyploidy on these processes. Scope Allopolyploid species of Nicotiana are ideal for studying rDNA evolution because phylogenetic reconstruction of DNA sequences has revealed patterns of species divergence and their parents. From these studies we also know that polyploids formed over widely different timeframes (thousands to millions of years), enabling comparative and temporal studies of rDNA structure, activity and chromosomal distribution. In addition studies on synthetic polyploids enable the consequences of de novo polyploidy on rDNA activity to be determined. Conclusions We propose that rDNA epigenetic expression patterns established even in F1 hybrids have a material influence on the likely patterns of divergence of rDNA. It is the active rDNA units that are vulnerable to homogenization, which probably acts to reduce mutational load across the active array. Those rDNA units that are epigenetically silenced may be less vulnerable to sequence homogenization. Selection cannot act on these silenced genes, and they are likely to accumulate mutations and eventually be eliminated from the genome. It is likely that whole silenced arrays will be deleted in polyploids of 1 million years of age and older. PMID:18310159

  10. Immune-Related Transcriptome of Coptotermes formosanus Shiraki Workers: The Defense Mechanism

    PubMed Central

    Hussain, Abid; Li, Yi-Feng; Cheng, Yu; Liu, Yang; Chen, Chuan-Cheng; Wen, Shuo-Yang

    2013-01-01

    Formosan subterranean termites, Coptotermes formosanus Shiraki, live socially in microbial-rich habitats. To understand the molecular mechanism by which termites combat pathogenic microbes, a full-length normalized cDNA library and four Suppression Subtractive Hybridization (SSH) libraries were constructed from termite workers infected with entomopathogenic fungi (Metarhizium anisopliae and Beauveria bassiana), Gram-positive Bacillus thuringiensis and Gram-negative Escherichia coli, and the libraries were analyzed. From the high quality normalized cDNA library, 439 immune-related sequences were identified. These sequences were categorized as pattern recognition receptors (47 sequences), signal modulators (52 sequences), signal transducers (137 sequences), effectors (39 sequences) and others (164 sequences). From the SSH libraries, 27, 17, 22 and 15 immune-related genes were identified from each SSH library treated with M. anisopliae, B. bassiana, B. thuringiensis and E. coli, respectively. When the normalized cDNA library was compared with the SSH libraries, 37 immune-related clusters were found in common; 56 clusters were identified in the SSH libraries, and 259 were identified in the normalized cDNA library. The immune-related gene expression pattern was further investigated using quantitative real time PCR (qPCR). Important immune-related genes were characterized, and their potential functions were discussed based on the integrated analysis of the results. We suggest that normalized cDNA and SSH libraries enable us to discover functional genes transcriptome. The results remarkably expand our knowledge about immune-inducible genes in C. formosanus Shiraki and enable the future development of novel control strategies for the management of Formosan subterranean termites. PMID:23874972

  11. Evaluation of partial 16S ribosomal DNA sequencing for identification of nocardia species by using the MicroSeq 500 system with an expanded database.

    PubMed

    Cloud, Joann L; Conville, Patricia S; Croft, Ann; Harmsen, Dag; Witebsky, Frank G; Carroll, Karen C

    2004-02-01

    Identification of clinically significant nocardiae to the species level is important in patient diagnosis and treatment. A study was performed to evaluate Nocardia species identification obtained by partial 16S ribosomal DNA (rDNA) sequencing by the MicroSeq 500 system with an expanded database. The expanded portion of the database was developed from partial 5' 16S rDNA sequences derived from 28 reference strains (from the American Type Culture Collection and the Japanese Collection of Microorganisms). The expanded MicroSeq 500 system was compared to (i) conventional identification obtained from a combination of growth characteristics with biochemical and drug susceptibility tests; (ii) molecular techniques involving restriction enzyme analysis (REA) of portions of the 16S rRNA and 65-kDa heat shock protein genes; and (iii) when necessary, sequencing of a 999-bp fragment of the 16S rRNA gene. An unknown isolate was identified as a particular species if the sequence obtained by partial 16S rDNA sequencing by the expanded MicroSeq 500 system was 99.0% similar to that of the reference strain. Ninety-four nocardiae representing 10 separate species were isolated from patient specimens and examined by using the three different methods. Sequencing of partial 16S rDNA by the expanded MicroSeq 500 system resulted in only 72% agreement with conventional methods for species identification and 90% agreement with the alternative molecular methods. Molecular methods for identification of Nocardia species provide more accurate and rapid results than the conventional methods using biochemical and susceptibility testing. With an expanded database, the MicroSeq 500 system for partial 16S rDNA was able to correctly identify the human pathogens N. brasiliensis, N. cyriacigeorgica, N. farcinica, N. nova, N. otitidiscaviarum, and N. veterana.

  12. Genomic Organization of Repetitive DNA in Woodpeckers (Aves, Piciformes): Implications for Karyotype and ZW Sex Chromosome Differentiation

    PubMed Central

    Kretschmer, Rafael; Bertocchi, Natasha Avila; Degrandi, Tiago Marafiga; de Oliveira, Edivaldo Herculano Corrêa; Cioffi, Marcelo de Bello; Garnero, Analía del Valle; Gunski, Ricardo José

    2017-01-01

    Birds are characterized by a low proportion of repetitive DNA in their genome when compared to other vertebrates. Among birds, species belonging to the order Piciformes, such as woodpeckers, show a relatively higher amount of these sequences. The aim of this study was to analyze the distribution of different classes of repetitive DNA, including microsatellites, telomere sequences and 18S rDNA, in the karyotype of three Picidae species (Aves, Piciformes): Colaptes melanochloros (2n = 84), Colaptes campestris (2n = 84) and Melanerpes candidus (2n = 64), by means of fluorescence in situ hybridization. Clusters of 18S rDNA were found in one microchromosome pair in each of the three species, coinciding with a region of (CGG)10 sequence accumulation. Interstitial telomeric sequences were found in some macrochromosome pairs, indicating possible regions of fusions, which can be related to variation of diploid number in the family. Only one of the 11 different microsatellite sequences used did not produce any signals. Both species of genus Colaptes showed a similar distribution of microsatellite sequences, with some difference when compared to M. candidus. Microsatellites were found preferentially in the centromeric and telomeric regions of micro and macrochromosomes. However, some sequences produced patterns of interstitial bands in the Z chromosome, which corresponds to the largest element of the karyotype in all three species. This was not observed in the W chromosome of Colaptes melanochloros, which is heterochromatic in most of its length, but was not hybridized by any of the sequences used. These results highlight the importance of microsatellite sequences in differentiation of sex chromosomes, and the accumulation of these sequences is probably responsible for the enlargement of the Z chromosome. PMID:28081238

  13. Mitochondrial Genome Sequence of the Legume Vicia faba

    PubMed Central

    Negruk, Valentine

    2013-01-01

    The number of plant mitochondrial genomes sequenced exceeds two dozen. However, for a detailed comparative study of different phylogenetic branches more plant mitochondrial genomes should be sequenced. This article presents sequencing data and comparative analysis of mitochondrial DNA (mtDNA) of the legume Vicia faba. The size of the V. faba circular mitochondrial master chromosome of cultivar Broad Windsor was estimated as 588,000 bp with a genome complexity of 387,745 bp and 52 conservative mitochondrial genes; 32 of them encoding proteins, 3 rRNA, and 17 tRNA genes. Six tRNA genes were highly homologous to chloroplast genome sequences. In addition to the 52 conservative genes, 114 unique open reading frames (ORFs) were found, 36 without significant homology to any known proteins and 29 with homology to the Medicago truncatula nuclear genome and to other plant mitochondrial ORFs, 49 ORFs were not homologous to M. truncatula but possessed sequences with significant homology to other plant mitochondrial or nuclear ORFs. In general, the unique ORFs revealed very low homology to known closely related legumes, but several sequence homologies were found between V. faba, Beta vulgaris, Nicotiana tabacum, Vitis vinifera, and even the monocots Oryza sativa and Zea mays. Most likely these ORFs arose independently during angiosperm evolution (Kubo and Mikami, 2007; Kubo and Newton, 2008). Computational analysis revealed in total about 45% of V. faba mtDNA sequence being homologous to the Medicago truncatula nuclear genome (more than to any sequenced plant mitochondrial genome), and 35% of this homology ranging from a few dozen to 12,806 bp are located on chromosome 1. Apparently, mitochondrial rrn5, rrn18, rps10, ATP synthase subunit alpha, cox2, and tRNA sequences are part of transcribed nuclear mosaic ORFs. PMID:23675376

  14. Distinctive archaebacterial species associated with anaerobic rumen protozoan Entodinium caudatum.

    PubMed

    Tóthová, T; Piknová, M; Kisidayová, S; Javorský, P; Pristas, P

    2008-01-01

    The diversity of archaebacteria associated with anaerobic rumen protozoan Entodinium caudatum in long term in vitro culture was investigated by denaturing gradient gel electrophoresis (DGGE) analysis of hypervariable V3 region of archaebacterial 16S rRNA gene. PCR was accomplished directly from DNA extracted from a single protozoal cell and from total community genomic DNA and the obtained fingerprints were compared. The analysis indicated the presence of a solitary intensive band present in Entodinium caudatum single cell DNA, which had no counterparts in the profile from total DNA. The identity of archaebacterium represented by this band was determined by sequence analysis which showed that the sequence fell to the cluster of ciliate symbiotic methanogens identified recently by 16S gene library approach.

  15. A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences

    PubMed Central

    Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L.

    2017-01-01

    An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5′-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5′-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed near quantitatively. Shorter than full-length DNA sequences are first washed away from the capture support; the solid-phase purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. PMID:28628204

  16. DNA sequence of the lymphotropic variant of minute virus of mice, MVM(i), and comparison with the DNA sequence of the fibrotropic prototype strain.

    PubMed

    Astell, C R; Gardiner, E M; Tattersall, P

    1986-02-01

    The sequence of molecular clones of the genome of MVM(i), a lymphotropic variant of minute virus of mice, was determined and compared with that of MVM(p), the fibrotropic prototype strain. At the nucleotide level there are 163 base changes: 129 transitions and 34 transversions. Most nucleotide changes are silent, with only 27 amino acid changes predicted, of which 22 are conservative. Notable differences between the MVM(i) and MVM(p) genomes which may account for the cell specificities of these viruses occur within the 3' nontranslated regions. The differences discussed include the absence of a 65-base-pair direct repeat in MVM(i), the presence of only two polyadenylation sites in MVM(i) compared with four in MVM(p), and sequences that bear a resemblance to enhancer sequences. Also included in this paper is an important correction to the MVM(p) sequence (C.R. Astell, M. Thomson, M. Merchlinsky, and D. C. Ward, Nucleic Acids Res. 11:999-1018, 1983).
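
    The split of base changes into transitions and transversions can be illustrated with a short sketch. A transition exchanges a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T); any other substitution is a transversion. The function name and the toy sequences are hypothetical and are not the MVM genomes.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq_a, seq_b):
    """Return (transitions, transversions) between two aligned sequences.

    Positions holding anything other than A, C, G or T in either
    sequence (gaps, ambiguity codes) are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Toy alignment: A>G and C>T are transitions, G>C and T>A are transversions.
ts, tv = count_substitutions("AACCGGTT", "GACTGCTA")
```

    In the abstract, the two classes sum to the reported total: 129 + 34 = 163 base changes.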

  17. COMPETITIVE METAGENOMIC DNA HYBRIDIZATION IDENTIFIES HOST-SPECIFIC GENETIC MARKERS IN HUMAN FECAL MICROBIAL COMMUNITIES

    EPA Science Inventory

    Although recent technological advances in DNA sequencing and computational biology now allow scientists to compare entire microbial genomes, the use of these approaches to discern key genomic differences between natural microbial communities remains prohibitively expensive for mo...

  18. Identification of Bacterial DNA Markers for the Detection of Human and Cattle Fecal Pollution - SLIDES

    EPA Science Inventory

    Technological advances in DNA sequencing and computational biology allow scientists to compare entire microbial genomes. However, the use of these approaches to discern key genomic differences between natural microbial communities remains prohibitively expensive for most laborato...

  19. IDENTIFICATION OF BACTERIAL DNA MARKERS FOR THE DETECTION OF HUMAN AND CATTLE FECAL POLLUTION

    EPA Science Inventory

    Technological advances in DNA sequencing and computational biology allow scientists to compare entire microbial genomes. However, the use of these approaches to discern key genomic differences between natural microbial communities remains prohibitively expensive for most laborato...

  20. MitoAge: a database for comparative analysis of mitochondrial DNA, with a special focus on animal longevity.

    PubMed

    Toren, Dmitri; Barzilay, Thomer; Tacutu, Robi; Lehmann, Gilad; Muradian, Khachik K; Fraifeld, Vadim E

    2016-01-04

    Mitochondria are the only organelles in the animal cells that have their own genome. Due to a key role in energy production, generation of damaging factors (ROS, heat), and apoptosis, mitochondria and mtDNA in particular have long been considered one of the major players in the mechanisms of aging, longevity and age-related diseases. The rapidly increasing number of species with fully sequenced mtDNA, together with accumulated data on longevity records, provides a new fascinating basis for comparative analysis of the links between mtDNA features and animal longevity. To facilitate such analyses and to support the scientific community in carrying these out, we developed the MitoAge database containing calculated mtDNA compositional features of the entire mitochondrial genome, mtDNA coding (tRNA, rRNA, protein-coding genes) and non-coding (D-loop) regions, and codon usage/amino acids frequency for each protein-coding gene. MitoAge includes 922 species with fully sequenced mtDNA and maximum lifespan records. The database is available through the MitoAge website (www.mitoage.org or www.mitoage.info), which provides the necessary tools for searching, browsing, comparing and downloading the data sets of interest for selected taxonomic groups across the Kingdom Animalia. The MitoAge website assists in statistical analysis of different features of the mtDNA and their correlative links to longevity. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Selection of a DNA barcode for Nectriaceae from fungal whole-genomes.

    PubMed

    Zeng, Zhaoqing; Zhao, Peng; Luo, Jing; Zhuang, Wenying; Yu, Zhihe

    2012-01-01

    A DNA barcode is a short segment of sequence that is able to distinguish species. A barcode must ideally contain enough variation to distinguish every individual species and be easily obtained. Fungi of Nectriaceae are economically important and show high species diversity. To establish a standard DNA barcode for this group of fungi, the genomes of Neurospora crassa and 30 other filamentous fungi were compared. The expect value was treated as a criterion to recognize homologous sequences. Four candidate markers, Hsp90, AAC, CDC48, and EF3, were tested for their feasibility as barcodes in the identification of 34 well-established species belonging to 13 genera of Nectriaceae. Two hundred and fifteen sequences were analyzed. Intra- and inter-specific variations and the success rate of PCR amplification and sequencing were considered as important criteria for estimation of the candidate markers. Ultimately, the partial EF3 gene met the requirements for a good DNA barcode: No overlap was found between the intra- and inter-specific pairwise distances. The smallest inter-specific distance of EF3 gene was 3.19%, while the largest intra-specific distance was 1.79%. In addition, there was a high success rate in PCR and sequencing for this gene (96.3%). CDC48 showed sufficiently high sequence variation among species, but the PCR and sequencing success rate was 84% using a single pair of primers. Although the Hsp90 and AAC genes had higher PCR and sequencing success rates (96.3% and 97.5%, respectively), overlapping occurred between the intra- and inter-specific variations, which could lead to misidentification. Therefore, we propose the EF3 gene as a possible DNA barcode for the nectriaceous fungi.
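
    The "no overlap" criterion described in this record can be sketched in Python as a simple barcode-gap check. The function names are hypothetical, and the distance lists below are invented for illustration except for their extremes (1.79% and 3.19%), which are the values reported in the abstract for EF3.

```python
# Sketch only: a barcode gap exists when the smallest inter-specific distance
# exceeds the largest intra-specific distance.
def barcode_gap(intra_distances, inter_distances):
    """Return min(inter) - max(intra); a positive value means no overlap."""
    if not intra_distances or not inter_distances:
        raise ValueError("both distance lists must be non-empty")
    return min(inter_distances) - max(intra_distances)


def has_barcode_gap(intra_distances, inter_distances):
    """True when intra- and inter-specific distances do not overlap."""
    return barcode_gap(intra_distances, inter_distances) > 0


# Pairwise distances in percent. Only the extremes come from the abstract;
# the remaining values are made up to give the lists some content.
ef3_intra = [0.00, 0.52, 1.79]
ef3_inter = [3.19, 5.40, 12.80]

ef3_gap = barcode_gap(ef3_intra, ef3_inter)
ef3_ok = has_barcode_gap(ef3_intra, ef3_inter)
```

    With the reported extremes the gap is about 1.4 percentage points, which matches the abstract's statement that the two distance ranges do not overlap for EF3.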

  2. The cDNA sequence of a neutral horseradish peroxidase.

    PubMed

    Bartonek-Roxå, E; Eriksson, H; Mattiasson, B

    1991-02-16

    A cDNA clone encoding a horseradish (Armoracia rusticana) peroxidase has been isolated and characterized. The cDNA contains 1378 nucleotides excluding the poly(A) tail and the deduced protein contains 327 amino acids which includes a 28 amino acid leader sequence. The predicted amino acid sequence is nine amino acids shorter than the major isoenzyme belonging to the horseradish peroxidase C group (HRP-C) and the sequence shows 53.7% identity with this isoenzyme. The described clone encodes nine cysteines of which eight correspond well with the cysteines found in HRP-C. Five potential N-glycosylation sites with the general sequence Asn-X-Thr/Ser are present in the deduced sequence. Compared to the earlier described HRP-C this is three glycosylation sites less. The shorter sequence and fewer N-glycosylation sites give the native isoenzyme a molecular weight of several thousands less than the horseradish peroxidase C isoenzymes. Comparison with the net charge value of HRP-C indicates that the described cDNA clone encodes a peroxidase which has either the same or a slightly less basic pI value, depending on whether the encoded protein is N-terminally blocked or not. This excludes the possibility that HRP-n could belong to either the HRP-A, -D or -E groups. The low sequence identity (53.7%) with HRP-C indicates that the described clone does not belong to the HRP-C isoenzyme group and comparison of the total amino acid composition with the HRP-B group does not place the described clone within this isoenzyme group. Our conclusion is that the described cDNA clone encodes a neutral horseradish peroxidase which belongs to a new, not earlier described, horseradish peroxidase group.

  3. Molecular genetic characterization of the RD-114 gene family of endogenous feline retroviral sequences.

    PubMed Central

    Reeves, R H; O'Brien, S J

    1984-01-01

    RD-114 is a replication-competent, xenotropic retrovirus which is homologous to a family of moderately repetitive DNA sequences present at ca. 20 copies in the normal cellular genome of domestic cats. To examine the extent and character of genomic divergence of the RD-114 gene family as well as to assess their positional association within the cat genome, we have prepared a series of molecular clones of endogenous RD-114 DNA segments from a genomic library of cat cellular DNA. Their restriction endonuclease maps were compared with each other as well as to that of the prototype-inducible RD-114 which was molecularly cloned from a chronically infected human cell line. The endogenous sequences analyzed were similar to each other in that they were colinear with RD-114 proviral DNA, were bounded by long terminal redundancies, and conserved many restriction sites in the gag and pol regions. However, the env regions of many of the sequences examined were substantially deleted. Several of the endogenous RD-114 genomes contained a novel envelope sequence which was unrelated to the env gene of the prototype RD-114 env gene but which, like RD-114 and endogenous feline leukemia virus provirus, was found only in species of the genus Felis, and not in other closely related Felidae genera. The endogenous RD-114 sequences each had a distinct cellular flank which indicates that these sequences are not tandem but dispersed nonspecifically throughout the genome. Southern analysis of cat cellular DNA confirmed the conclusions about conserved restriction sites in endogenous sequences and indicated that a single locus may be responsible for the production of the major inducible form of RD-114. Images PMID:6090693

  4. Improving accuracy of DNA diet estimates using food tissue control materials and an evaluation of proxies for digestion bias.

    PubMed

    Thomas, Austen C; Jarman, Simon N; Haman, Katherine H; Trites, Andrew W; Deagle, Bruce E

    2014-08-01

    Ecologists are increasingly interested in quantifying consumer diets based on food DNA in dietary samples and high-throughput sequencing of marker genes. It is tempting to assume that food DNA sequence proportions recovered from diet samples are representative of consumer's diet proportions, despite the fact that captive feeding studies do not support that assumption. Here, we examine the idea of sequencing control materials of known composition along with dietary samples in order to correct for technical biases introduced during amplicon sequencing and biological biases such as variable gene copy number. Using the Ion Torrent PGM, we sequenced prey DNA amplified from scats of captive harbour seals (Phoca vitulina) fed a constant diet including three fish species in known proportions. Alongside, we sequenced a prey tissue mix matching the seals' diet to generate tissue correction factors (TCFs). TCFs improved the diet estimates (based on sequence proportions) for all species and reduced the average estimate error from 28 ± 15% (uncorrected) to 14 ± 9% (TCF-corrected). The experimental design also allowed us to infer the magnitude of prey-specific digestion biases and calculate digestion correction factors (DCFs). The DCFs were compared with possible proxies for differential digestion (e.g. fish protein%, fish lipid%) revealing a strong relationship between the DCFs and percent lipid of the fish prey, suggesting prey-specific corrections based on lipid content would produce accurate diet estimates in this study system. These findings demonstrate the value of parallel sequencing of food tissue mixtures in diet studies and offer new directions for future research in quantitative DNA diet analysis. © 2013 John Wiley & Sons Ltd.
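
    The abstract does not give the formula for the tissue correction factors. The Python sketch below shows one plausible formulation, assumed here for illustration: each factor is the known proportion of a species in the control mix divided by its observed sequence proportion, and corrected sample proportions are rescaled to sum to 1. The species labels and all numbers are hypothetical.

```python
# Sketch only: correction factors from a control mix of known composition.
# ASSUMPTION: factor = known proportion / observed sequence proportion.
def tissue_correction_factors(known_mix, observed_mix):
    """Per-species factor derived from the control tissue mix."""
    return {sp: known_mix[sp] / observed_mix[sp] for sp in known_mix}


def apply_correction(sample_proportions, factors):
    """Weight each species by its factor, then renormalise to sum to 1."""
    weighted = {sp: p * factors[sp] for sp, p in sample_proportions.items()}
    total = sum(weighted.values())
    return {sp: w / total for sp, w in weighted.items()}


# Hypothetical control mix: true composition vs. recovered sequence proportions.
known = {"fish_a": 0.5, "fish_b": 0.3, "fish_c": 0.2}
observed = {"fish_a": 0.6, "fish_b": 0.3, "fish_c": 0.1}

factors = tissue_correction_factors(known, observed)

# Sanity check: correcting the control mix itself recovers the known proportions.
corrected_control = apply_correction(observed, factors)
```

    Under this assumed formulation, a species that is under-represented in the sequence data (fish_c here) receives a factor above 1, and an over-represented species receives a factor below 1.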

  5. Nucleotide sequence of a complementary DNA encoding pea cytosolic copper/zinc superoxide dismutase. [Pisum sativum L

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    White, D.A.; Zilinskas, B.A.

    1991-08-01

    The authors now report the nucleotide sequence of the cytosolic Cu/Zn SOD cloned from a λgt11 cDNA library constructed from mRNA extracted from leaves of 7- to 10-d pea seedlings (Pisum sativum L.). The clone was isolated using a 22-base synthetic oligonucleotide complementary to the amino acid sequence CGIIGLQG. This sequence, found at the protein's carboxy terminus, is highly conserved among plant cytosolic Cu/Zn SODs but not chloroplastic Cu/Zn SODs. The 738-base pair sequence contains an open reading frame specifying 152 codons and a predicted M_r of 18,024 D. The deduced amino acid sequence is highly homologous (79-82% identity) with the sequences of other known plant cytosolic Cu/Zn SODs but less highly conserved (63-65%) when compared with several chloroplastic Cu/Zn SODs including pea (10).

  6. Assignment of the human caltractin gene (CALT) to Xq28 by fluorescence in situ hybridization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tanaka, Tanaka; Okui, Keiko; Nakamura, Yusuke

    1994-12-01

    The centrosome is the major microtubule-organizing center of interphase eukaryotic cells, and its duplication is essential to eukaryotic cell division. Caltractin, a structural component of centrosomes, is highly homologous in amino acid sequence to the product of the CDC31 gene of Saccharomyces cerevisiae. In S. cerevisiae, an important role for CDC31 in duplication of the spindle pole body (SPB), a kind of microtubule-organizing center, has been demonstrated by an experiment in which mutant CDC31 prevented SPB duplication and led to formation of a monopolar spindle. In view of the localization of human caltractin in centrosomes and the sequence homology it bears to yeast CDC31, it is reasonable to assume that caltractin functions in humans as CDC31 does in yeast. As a part of the Human Genome Project, we have been determining nucleotide sequences of DNA clones randomly selected from a directionally cloned cDNA library constructed from fetal brain mRNA obtained from Clontech (La Jolla, CA). By comparing 5′ partial DNA sequences of these cDNA clones with known DNA sequences in the database, we found one clone that was highly homologous to the caltractin gene of Chlamydomonas, which turned out to be the same as a human gene identified recently. 4 refs., 1 fig.

  7. Molecular Identification of Ectomycorrhizal Mycelium in Soil Horizons

    PubMed Central

    Landeweert, Renske; Leeflang, Paula; Kuyper, Thom W.; Hoffland, Ellis; Rosling, Anna; Wernars, Karel; Smit, Eric

    2003-01-01

    Molecular identification techniques based on total DNA extraction provide a unique tool for identification of mycelium in soil. Using molecular identification techniques, the ectomycorrhizal (EM) fungal community under coniferous vegetation was analyzed. Soil samples were taken at different depths from four horizons of a podzol profile. A basidiomycete-specific primer pair (ITS1F-ITS4B) was used to amplify fungal internal transcribed spacer (ITS) sequences from total DNA extracts of the soil horizons. Amplified basidiomycete DNA was cloned and sequenced, and a selection of the obtained clones was analyzed phylogenetically. Based on sequence similarity, the fungal clone sequences were sorted into 25 different fungal groups, or operational taxonomic units (OTUs). Out of 25 basidiomycete OTUs, 7 OTUs showed high nucleotide homology (≥99%) with known EM fungal sequences and 16 were found exclusively in the mineral soil. The taxonomic positions of six OTUs remained unclear. OTU sequences were compared to sequences from morphotyped EM root tips collected from the same sites. Of the 25 OTUs, 10 OTUs had ≥98% sequence similarity with these EM root tip sequences. The present study demonstrates the use of molecular techniques to identify EM hyphae in various soil types. This approach differs from the conventional method of EM root tip identification and provides a novel approach to examine EM fungal communities in soil. PMID:12514012

  8. A multi-scale analysis of bull sperm methylome revealed both species peculiarities and conserved tissue-specific features.

    PubMed

    Perrier, Jean-Philippe; Sellem, Eli; Prézelin, Audrey; Gasselin, Maxime; Jouneau, Luc; Piumi, François; Al Adhami, Hala; Weber, Michaël; Fritz, Sébastien; Boichard, Didier; Le Danvic, Chrystelle; Schibler, Laurent; Jammes, Hélène; Kiefer, Hélène

    2018-05-29

    Spermatozoa have a remarkable epigenome in line with their degree of specialization, their unique nature and different requirements for successful fertilization. Accordingly, perturbations in the establishment of DNA methylation patterns during male germ cell differentiation have been associated with infertility in several species. While bull semen is widely used in artificial insemination, the literature describing DNA methylation in bull spermatozoa is still scarce. The purpose of this study was therefore to characterize the bull sperm methylome relative to both bovine somatic cells and the sperm of other mammals through a multiscale analysis. The quantification of DNA methylation at CCGG sites using luminometric methylation assay (LUMA) highlighted the undermethylation of bull sperm compared to the sperm of rams, stallions, mice, goats and men. Total blood cells displayed a similarly high level of methylation in bulls and rams, suggesting that undermethylation of the bovine genome was specific to sperm. Annotation of CCGG sites in different species revealed no striking bias in the distribution of genome features targeted by LUMA that could explain undermethylation of bull sperm. To map DNA methylation at a genome-wide scale, bull sperm was compared with bovine liver, fibroblasts and monocytes using reduced representation bisulfite sequencing (RRBS) and immunoprecipitation of methylated DNA followed by microarray hybridization (MeDIP-chip). These two methods exhibited differences in terms of genome coverage, and consistently, two independent sets of sequences differentially methylated in sperm and somatic cells were identified for RRBS and MeDIP-chip. Remarkably, in the two sets most of the differentially methylated sequences were hypomethylated in sperm. In agreement with previous studies in other species, the sequences that were specifically hypomethylated in bull sperm targeted processes relevant to the germline differentiation program (piRNA metabolism, meiosis, spermatogenesis) and sperm functions (cell adhesion, fertilization), as well as satellites and rDNA repeats. These results highlight the undermethylation of bull spermatozoa when compared with both bovine somatic cells and the sperm of other mammals, and raise questions regarding the dynamics of DNA methylation in bovine male germline. Whether sperm undermethylation has potential interactions with structural variation in the cattle genome may deserve further attention.

  9. Homeologous plastid DNA transformation in tobacco is mediated by multiple recombination events.

    PubMed Central

    Kavanagh, T A; Thanh, N D; Lao, N T; McGrath, N; Peter, S O; Horváth, E M; Dix, P J; Medgyesy, P

    1999-01-01

    Efficient plastid transformation has been achieved in Nicotiana tabacum using cloned plastid DNA of Solanum nigrum carrying mutations conferring spectinomycin and streptomycin resistance. The use of the incompletely homologous (homeologous) Solanum plastid DNA as donor resulted in a Nicotiana plastid transformation frequency comparable with that of other experiments where completely homologous plastid DNA was introduced. Physical mapping and nucleotide sequence analysis of the targeted plastid DNA region in the transformants demonstrated efficient site-specific integration of the 7.8-kb Solanum plastid DNA and the exclusion of the vector DNA. The integration of the cloned Solanum plastid DNA into the Nicotiana plastid genome involved multiple recombination events as revealed by the presence of discontinuous tracts of Solanum-specific sequences that were interspersed between Nicotiana-specific markers. Marked position effects resulted in very frequent cointegration of the nonselected peripheral donor markers located adjacent to the vector DNA. Data presented here on the efficiency and features of homeologous plastid DNA recombination are consistent with the existence of an active RecA-mediated, but a diminished mismatch, recombination/repair system in higher-plant plastids. PMID:10388829

  10. Ribosomal RNA Genes Contribute to the Formation of Pseudogenes and Junk DNA in the Human Genome.

    PubMed

    Robicheau, Brent M; Susko, Edward; Harrigan, Amye M; Snyder, Marlene

    2017-02-01

    Approximately 35% of the human genome can be identified as sequence devoid of a selected-effect function, and not derived from transposable elements or repeated sequences. We provide evidence supporting a known origin for a fraction of this sequence. We show that: 1) highly degraded, but near full length, ribosomal DNA (rDNA) units, including both 45S and Intergenic Spacer (IGS), can be found at multiple sites in the human genome on chromosomes without rDNA arrays, 2) that these rDNA sequences have a propensity for being centromere proximal, and 3) that sequence at all human functional rDNA array ends is divergent from canonical rDNA to the point that it is pseudogenic. We also show that small sequence strings of rDNA (from 45S + IGS) can be found distributed throughout the genome and are identifiable as an "rDNA-like signal", representing 0.26% of the q-arm of HSA21 and ∼2% of the total sequence of other regions tested. The size of sequence strings found in the rDNA-like signal intergrade into the size of sequence strings that make up the full-length degrading rDNA units found scattered throughout the genome. We conclude that the displaced and degrading rDNA sequences are likely of a similar origin but represent different stages in their evolution towards random sequence. Collectively, our data suggests that over vast evolutionary time, rDNA arrays contribute to the production of junk DNA. The concept that the production of rDNA pseudogenes is a by-product of concerted evolution represents a previously under-appreciated process; we demonstrate here its importance. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  11. Comparison of Boiling and Robotics Automation Method in DNA Extraction for Metagenomic Sequencing of Human Oral Microbes.

    PubMed

    Yamagishi, Junya; Sato, Yukuto; Shinozaki, Natsuko; Ye, Bin; Tsuboi, Akito; Nagasaki, Masao; Yamashita, Riu

    2016-01-01

    The rapid improvement of next-generation sequencing performance now enables us to analyze huge sample sets with more than ten thousand specimens. However, DNA extraction can still be a limiting step in such metagenomic approaches. In this study, we analyzed human oral microbes to compare the performance of three DNA extraction methods: PowerSoil (a method widely used in this field), QIAsymphony (a robotics method), and a simple boiling method. Dental plaque was initially collected from three volunteers in the pilot study and then expanded to 12 volunteers in the follow-up study. Bacterial flora was estimated by sequencing the V4 region of 16S rRNA following species-level profiling. Our results indicate that the efficiency of PowerSoil and QIAsymphony was comparable to the boiling method. Therefore, the boiling method may be a promising alternative because of its simplicity, cost effectiveness, and short handling time. Moreover, this method was reliable for estimating bacterial species and could be used in the future to examine the correlation between oral flora and health status. Despite this, differences in the efficiency of DNA extraction for various bacterial species were observed among the three methods. Based on these findings, there is no "gold standard" for DNA extraction. In future, we suggest that the DNA extraction method should be selected on a case-by-case basis considering the aims and specimens of the study.

  12. The mitochondrial C16069T polymorphism, not mitochondrial D310 (D-loop) mononucleotide sequence variations, is associated with bladder cancer.

    PubMed

    Shakhssalim, Nasser; Houshmand, Massoud; Kamalidehghan, Behnam; Faraji, Abolfazl; Sarhangnejad, Reza; Dadgar, Sepideh; Mobaraki, Maryam; Rosli, Rozita; Sanati, Mohammad Hossein

    2013-12-05

    Bladder cancer is a relatively common and potentially life-threatening neoplasm that ranks ninth in terms of worldwide cancer incidence. The aim of this study was to determine deletions and sequence variations in the mitochondrial displacement loop (D-loop) region from the blood specimens and tumoral tissues of patients with bladder cancer, compared to adjacent non-tumoral tissues. The DNA from blood, tumoral tissues and adjacent non-tumoral tissues of twenty-six patients with bladder cancer and DNA from blood of 504 healthy controls from different ethnicities were investigated to determine sequence variation in the mitochondrial D-loop region using multiplex polymerase chain reaction (PCR), DNA sequencing and Southern blotting analysis. From a total of 110 variations, 48 were reported as new mutations. No deletions were detected in tumoral tissues, adjacent non-tumoral tissues and blood samples from patients. Although the polymorphisms at loci 16189, 16261 and 16311 were not significantly correlated with bladder cancer, the C16069T variation was significantly present in patient samples compared to control samples (p < 0.05). Interestingly, there was no significant difference (p > 0.05) of C variations, including C7TC6, C8TC6, C9TC6 and C10TC6, in D310 mitochondrial DNA between patients and control samples. Our study suggests that 16069 mitochondrial DNA D-Loop mutations may play a significant role in the etiology of bladder cancer and facilitate the definition of carcinogenesis-related mutations in human cancer.

  13. Second generation noninvasive fetal genome analysis reveals de novo mutations, single-base parental inheritance, and preferred DNA ends

    PubMed Central

    Chan, K. C. Allen; Jiang, Peiyong; Sun, Kun; Cheng, Yvonne K. Y.; Tong, Yu K.; Cheng, Suk Hang; Wong, Ada I. C.; Hudecova, Irena; Leung, Tak Y.; Chiu, Rossa W. K.; Lo, Yuk Ming Dennis

    2016-01-01

    Plasma DNA obtained from a pregnant woman was sequenced to a depth of 270× haploid genome coverage. Comparing the maternal plasma DNA sequencing data with the parental genomic DNA data and using a series of bioinformatics filters, fetal de novo mutations were detected at a sensitivity of 85% and a positive predictive value of 74%. These results represent a 169-fold improvement in the positive predictive value over previous attempts. Improvements in the interpretation of the sequence information of every base position in the genome allowed us to interrogate the maternal inheritance of the fetus for 618,271 of 656,676 (94.2%) heterozygous SNPs within the maternal genome. The fetal genotype at each of these sites was deduced individually, unlike previously, where the inheritance was determined for a collection of sites within a haplotype. These results represent a 90-fold enhancement in the resolution in determining the fetus’s maternal inheritance. Selected genomic locations were more likely to be found at the ends of plasma DNA molecules. We found that a subset of such preferred ends exhibited selectivity for fetal- or maternal-derived DNA in maternal plasma. The ratio of the number of maternal plasma DNA molecules with fetal preferred ends to those with maternal preferred ends showed a correlation with the fetal DNA fraction. Finally, this second generation approach for noninvasive fetal whole-genome analysis was validated in a pregnancy diagnosed with cardiofaciocutaneous syndrome with maternal plasma DNA sequenced to 195× coverage. The causative de novo BRAF mutation was successfully detected through the maternal plasma DNA analysis. PMID:27799561
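
    The performance figures in this record follow the standard definitions of sensitivity and positive predictive value, sketched below in Python. The true-positive, false-negative and false-positive counts are hypothetical, chosen only so that the ratios land near the reported 85% and 74%; the SNP counts are the ones given in the abstract.

```python
# Sketch only: standard definitions applied to hypothetical call counts.
def sensitivity(true_pos, false_neg):
    """Fraction of real mutations that were detected."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """Fraction of reported mutations that are real."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts (NOT from the paper).
tp, fn, fp = 85, 15, 30
sens = sensitivity(tp, fn)               # 0.85
ppv = positive_predictive_value(tp, fp)  # 85 / 115, about 0.74

# Counts quoted in the abstract: heterozygous maternal SNPs interrogated.
snp_fraction_percent = round(100 * 618_271 / 656_676, 1)
```

    The last line reproduces the 94.2% quoted in the abstract from the two SNP counts.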

  14. A Surrogate Approach to Study the Evolution of Noncoding DNA Elements That Organize Eukaryotic Genomes

    PubMed Central

    Vermaak, Danielle; Bayes, Joshua J.

    2009-01-01

    Comparative genomics provides a facile way to address issues of evolutionary constraint acting on different elements of the genome. However, several important DNA elements have not reaped the benefits of this new approach. Some have proved intractable to current day sequencing technology. These include centromeric and heterochromatic DNA, which are essential for chromosome segregation as well as gene regulation, but the highly repetitive nature of the DNA sequences in these regions make them difficult to assemble into longer contigs. Other sequences, like dosage compensation X chromosomal sites, origins of DNA replication, or heterochromatic sequences that encode piwi-associated RNAs, have proved difficult to study because they do not have recognizable DNA features that allow them to be described functionally or computationally. We have employed an alternate approach to the direct study of these DNA elements. By using proteins that specifically bind these noncoding DNAs as surrogates, we can indirectly assay the evolutionary constraints acting on these important DNA elements. We review the impact that such “surrogate strategies” have had on our understanding of the evolutionary constraints shaping centromeres, origins of DNA replication, and dosage compensation X chromosomal sites. These have begun to reveal that in contrast to the view that such structural DNA elements are either highly constrained (under purifying selection) or free to drift (under neutral evolution), some of them may instead be shaped by adaptive evolution and genetic conflicts (these are not mutually exclusive). These insights also help to explain why the same elements (e.g., centromeres and replication origins), which are so complex in some eukaryotic genomes, can be simple and well defined in others where similar conflicts do not exist. PMID:19635763

  15. A Dynamic Tandem Repeat in Monocotyledons Inferred from a Comparative Analysis of Chloroplast Genomes in Melanthiaceae.

    PubMed

    Do, Hoang Dang Khoa; Kim, Joo-Hwan

    2017-01-01

    Chloroplast genomes (cpDNA) are highly valuable resources for evolutionary studies of angiosperms, since they are highly conserved, are small in size, and play critical roles in plants. Slipped-strand mispairing (SSM) was assumed to be a mechanism for generating repeat units in cpDNA. However, research on the employment of different small repeated sequences through SSM events, which may induce the accumulation of distinct types of repeats within the same region in cpDNA, has not been documented. Here, we sequenced two chloroplast genomes from the endemic species Heloniopsis tubiflora (Korea) and Xerophyllum tenax (USA) to cover the gap between molecular data and explore "hot spots" for genomic events in Melanthiaceae. Comparative analysis of 23 complete cpDNA sequences revealed that there were different stages of deletion in the rps16 region across the Melanthiaceae. Based on the partial or complete loss of rps16 gene in cpDNA, we have firstly reported potential molecular markers for recognizing two sections (Veratrum and Fuscoveratrum) of Veratrum. Melanthiaceae exhibits a significant change in the junction between large single copy and inverted repeat regions, ranging from trnH_GUG to a part of rps3. Our results show an accumulation of tandem repeats in the rpl23-ycf2 regions of cpDNAs. Small conserved sequences exist and flank tandem repeats in further observation of this region across most of the examined taxa of Liliales. Therefore, we propose three scenarios in which different small repeated sequences were used during SSM events to generate newly distinct types of repeats. Occasionally, prior to the SSM process, point mutation event and double strand break repair occurred and induced the formation of initial repeat units which are indispensable in the SSM process. SSM may have likely occurred more frequently for short repeats than for long repeat sequences in tribe Parideae (Melanthiaceae, Liliales). Collectively, these findings add new evidence of dynamic results from SSM in chloroplast genomes which can be useful for further evolutionary studies in angiosperms. Additionally, genomic events in cpDNA are potential resources for mining molecular markers in Liliales.

  16. Single-cell genomic sequencing using Multiple Displacement Amplification.

    PubMed

    Lasken, Roger S

    2007-10-01

    Single microbial cells can now be sequenced using DNA amplified by the Multiple Displacement Amplification (MDA) reaction. The few femtograms of DNA in a bacterium are amplified into micrograms of high molecular weight DNA suitable for DNA library construction and Sanger sequencing. The MDA-generated DNA also performs well when used directly as template for pyrosequencing by the 454 Life Sciences method. While MDA from single cells loses some of the genomic sequence, this approach will greatly accelerate the pace of sequencing from uncultured microbes. The genetically linked sequences from single cells are also a powerful tool to be used in guiding genomic assembly of shotgun sequences of multiple organisms from environmental DNA extracts (metagenomic sequences).

  17. Distinguishing Functional DNA Words; A Method for Measuring Clustering Levels

    NASA Astrophysics Data System (ADS)

    Moghaddasi, Hanieh; Khalifeh, Khosrow; Darooneh, Amir Hossein

    2017-01-01

    Functional DNA sub-sequences and genome elements are spatially clustered through the genome just as keywords in literary texts. Therefore, some of the methods for ranking words in texts can also be used to compare different DNA sub-sequences. In analogy with the literary texts, here we claim that the distribution of distances between the successive sub-sequences (words) is q-exponential which is the distribution function in non-extensive statistical mechanics. Thus the q-parameter can be used as a measure of word clustering levels. Here, we analyzed the distribution of distances between consecutive occurrences of the 16 possible dinucleotides in human chromosomes to obtain their corresponding q-parameters. We found that CG, a biologically important two-letter word because of its methylation, has the highest clustering level. This finding shows the predictive ability of the method in biology. We also proposed that chromosome 18, with the largest value of q-parameter for promoters of genes, is more sensitive to diet and lifestyle. We extended our study to compare the genomes of some selected organisms and concluded that the clustering level of CGs increases in higher evolutionary organisms compared to lower ones.
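
    The raw quantity this abstract works with, the distances between consecutive occurrences of a word such as CG, can be sketched as follows. This is only an illustration: the function name and the toy sequence are hypothetical, and fitting the q-exponential to obtain the q-parameter is not shown.

```python
def occurrence_gaps(sequence: str, word: str) -> list[int]:
    """Distances between start positions of successive occurrences of `word`."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        # Advance by one position so that overlapping occurrences are also found.
        start = sequence.find(word, start + 1)
    # Pair each position with the next one and take the difference.
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


# Toy sequence (hypothetical); CG starts at positions 1, 5, 8 and 12.
gaps = occurrence_gaps("ACGTTCGACGGGCG", "CG")
print(gaps)  # [4, 3, 4]
```

    A histogram of such gaps over a whole chromosome would be the input to the distribution fit described in the abstract.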

  18. Scaling in nature: From DNA through heartbeats to weather

    NASA Astrophysics Data System (ADS)

    Havlin, S.; Buldyrev, S. V.; Bunde, A.; Goldberger, A. L.; Ivanov, P. Ch.; Peng, C.-K.; Stanley, H. E.

    1999-12-01

    The purpose of this talk is to describe some recent progress in applying scaling concepts to various systems in nature. We review several systems characterized by scaling laws such as DNA sequences, heartbeat rates and weather variations. We discuss the finding that the exponent α quantifying the scaling in DNA is smaller for coding than for noncoding sequences. We also discuss the application of fractal scaling analysis to the dynamics of heartbeat regulation, and report the recent finding that the scaling exponent α is smaller during sleep periods compared to wake periods. We also discuss the recent findings that suggest a universal scaling exponent characterizing the weather fluctuations.

  19. Scaling in nature: from DNA through heartbeats to weather

    NASA Technical Reports Server (NTRS)

    Havlin, S.; Buldyrev, S. V.; Bunde, A.; Goldberger, A. L.; Peng, C. K.; Stanley, H. E.

    1999-01-01

    The purpose of this report is to describe some recent progress in applying scaling concepts to various systems in nature. We review several systems characterized by scaling laws such as DNA sequences, heartbeat rates and weather variations. We discuss the finding that the exponent alpha quantifying the scaling in DNA is smaller for coding than for noncoding sequences. We also discuss the application of fractal scaling analysis to the dynamics of heartbeat regulation, and report the recent finding that the scaling exponent alpha is smaller during sleep periods compared to wake periods. We also discuss the recent findings that suggest a universal scaling exponent characterizing the weather fluctuations.

  20. Nucleosome core particles containing a poly(dA.dT) sequence element exhibit a locally distorted DNA structure.

    PubMed

    Bao, Yunhe; White, Cindy L; Luger, Karolin

    2006-08-25

    Poly(dA.dT) DNA sequence elements are thought to promote transcription by either excluding nucleosomes or by altering their structural or dynamic properties. Here, the stability and structure of a defined nucleosome core particle containing a 16 base-pair poly(dA.dT) element (A16 NCP) was investigated. The A16 NCP requires a significantly higher temperature for histone octamer sliding in vitro compared to comparable nucleosomes that do not contain a poly(dA.dT) element. Fluorescence resonance energy transfer showed that the interactions between the nucleosomal DNA ends and the histone octamer were destabilized in A16 NCP. The crystal structure of A16 NCP was determined to a resolution of 3.2 Å. The overall structure was maintained except for local deviations in DNA conformation. These results are consistent with previous in vivo and in vitro observations that poly(dA.dT) elements cause only modest changes in DNA accessibility and modest increases in steady-state transcription levels.

  1. Comparative scaffolding and gap filling of ancient bacterial genomes applied to two ancient Yersinia pestis genomes

    PubMed Central

    Doerr, Daniel; Chauve, Cedric

    2017-01-01

    Yersinia pestis is the causative agent of the bubonic plague, a disease responsible for several dramatic historical pandemics. Progress in ancient DNA (aDNA) sequencing rendered possible the sequencing of whole genomes of important human pathogens, including the ancient Y. pestis strains responsible for outbreaks of the bubonic plague in London in the 14th century and in Marseille in the 18th century, among others. However, aDNA sequencing data are still characterized by short reads and non-uniform coverage, so assembling ancient pathogen genomes remains challenging and often prevents a detailed study of genome rearrangements. It has recently been shown that comparative scaffolding approaches can improve the assembly of ancient Y. pestis genomes at a chromosome level. In the present work, we address the last step of genome assembly, the gap-filling stage. We describe an optimization-based method AGapEs (ancestral gap estimation) to fill in inter-contig gaps using a combination of a template obtained from related extant genomes and aDNA reads. We show how this approach can be used to refine comparative scaffolding by selecting contig adjacencies supported by a mix of unassembled aDNA reads and comparative signal. We applied our method to two Y. pestis data sets from the London and Marseille outbreaks, for which we obtained highly improved genome assemblies for both genomes, comprised of, respectively, five and six scaffolds with 95% of the assemblies supported by ancient reads. We analysed the genome evolution between both ancient genomes in terms of genome rearrangements, and observed a high level of synteny conservation between these strains. PMID:29114402

  2. MPN estimation of qPCR target sequence recoveries from whole cell calibrator samples.

    PubMed

    Sivaganesan, Mano; Siefring, Shawn; Varma, Manju; Haugland, Richard A

    2011-12-01

    DNA extracts from enumerated target organism cells (calibrator samples) have been used for estimating Enterococcus cell equivalent densities in surface waters by a comparative cycle threshold (Ct) qPCR analysis method. To compare surface water Enterococcus density estimates from different studies by this approach, either a consistent source of calibrator cells must be used or the estimates must account for any differences in target sequence recoveries from different sources of calibrator cells. In this report we describe two methods for estimating target sequence recoveries from whole cell calibrator samples based on qPCR analyses of their serially diluted DNA extracts and most probable number (MPN) calculation. The first method employed a traditional MPN calculation approach. The second method employed a Bayesian hierarchical statistical modeling approach and a Monte Carlo Markov Chain (MCMC) simulation method to account for the uncertainty in these estimates associated with different individual samples of the cell preparations, different dilutions of the DNA extracts and different qPCR analytical runs. The two methods were applied to estimate mean target sequence recoveries per cell from two different lots of a commercially available source of enumerated Enterococcus cell preparations. The mean target sequence recovery estimates (and standard errors) per cell from Lot A and B cell preparations by the Bayesian method were 22.73 (3.4) and 11.76 (2.4), respectively, when the data were adjusted for potential false positive results. Means were similar for the traditional MPN approach which cannot comparably assess uncertainty in the estimates. Cell numbers and estimates of recoverable target sequences in calibrator samples prepared from the two cell sources were also used to estimate cell equivalent and target sequence quantities recovered from surface water samples in a comparative Ct method. 
Our results illustrate the utility of the Bayesian method in accounting for uncertainty, the high degree of precision attainable by the MPN approach and the need to account for the differences in target sequence recoveries from different calibrator sample cell sources when they are used in the comparative Ct method. Published by Elsevier B.V.
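
    As a rough illustration of what a traditional MPN calculation involves, the sketch below computes the textbook maximum-likelihood MPN from positive/negative counts in a dilution series. It is not the authors' procedure: the function name, the argument layout and the search bracket are assumptions, and the Bayesian hierarchical model is not attempted.

```python
import math


def mpn_estimate(volumes, n_reactions, n_positive):
    """Maximum-likelihood MPN (targets per unit of sample) from a dilution series.

    volumes[i]      amount of original sample in each reaction at dilution i
    n_reactions[i]  number of replicate reactions run at dilution i
    n_positive[i]   number of those reactions that were positive
    """
    if sum(n_positive) == 0:
        return 0.0  # no positives: the estimate is zero
    if all(p == n for p, n in zip(n_positive, n_reactions)):
        raise ValueError("all reactions positive: the MPN is unbounded")

    total_volume = sum(v * n for v, n in zip(volumes, n_reactions))

    def score(lam):
        # Likelihood equation for Poisson-distributed targets:
        #   sum_i p_i * v_i / (1 - exp(-lam * v_i)) = sum_i n_i * v_i
        # The left side decreases as lam grows, so bisection finds the root.
        positive_term = sum(
            p * v / -math.expm1(-lam * v)
            for v, p in zip(volumes, n_positive)
            if p
        )
        return positive_term - total_volume

    lo, hi = 1e-9, 1e9  # assumed bracket for the concentration
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# One dilution, 10 reactions, 5 positive: the estimate is ln(2) per unit volume.
print(mpn_estimate([1.0], [10], [5]))
```

    With a single dilution the equation has the closed form -ln(1 - p/n)/v, which gives a quick check on the numerical solver.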

  3. Multiplexed detection of DNA sequences using a competitive displacement assay in a microfluidic SERRS-based device.

    PubMed

    Yazdi, Soroush H; Giles, Kristen L; White, Ian M

    2013-11-05

    We demonstrate sensitive and multiplexed detection of DNA sequences through a surface enhanced resonance Raman spectroscopy (SERRS)-based competitive displacement assay in an integrated microsystem. The use of the competitive displacement scheme, in which the target DNA sequence displaces a Raman-labeled reporter sequence that has lower affinity for the immobilized probe, enables detection of unlabeled target DNA sequences with a simple single-step procedure. In our implementation, the displacement reaction occurs in a microporous packed column of silica beads prefunctionalized with probe-reporter pairs. The use of a functionalized packed-bead column in a microfluidic channel provides two major advantages: (i) immobilization surface chemistry can be performed as a batch process instead of on a chip-by-chip basis, and (ii) the microporous network eliminates the diffusion limitations of a typical biological assay, which increases the sensitivity. Packed silica beads are also leveraged to improve the SERRS detection of the Raman-labeled reporter. Following displacement, the reporter adsorbs onto aggregated silver nanoparticles in a microfluidic mixer; the nanoparticle-reporter conjugates are then trapped and concentrated in the silica bead matrix, which leads to a significant increase in plasmonic nanoparticles and adsorbed Raman reporters within the detection volume as compared to an open microfluidic channel. The experimental results reported here demonstrate detection down to 100 pM of the target DNA sequence, and the experiments are shown to be specific, repeatable, and quantitative. Furthermore, we illustrate the advantage of using SERRS by demonstrating multiplexed detection. The sensitivity of the assay, combined with the advantages of multiplexed detection and single-step operation with unlabeled target sequences makes this method attractive for practical applications. 
Importantly, while we illustrate DNA sequence detection, the SERRS-based competitive displacement assay is applicable to detection of a variety of biological macromolecules, including proteins and proteolytic enzymes.

  4. Characterization of monomeric DNA-binding protein Histone H1 in Leishmania braziliensis.

    PubMed

    Carmelo, Emma; González, Gloria; Cruz, Teresa; Osuna, Antonio; Hernández, Mariano; Valladares, Basilio

    2011-08-01

    Histone H1 in Leishmania presents relevant differences compared to higher eukaryote counterparts, such as the lack of a DNA-binding central globular domain. Despite that, it is apparently fully functional since its differential expression levels have been related to changes in chromatin condensation and infectivity, among other features. The localization and the aggregation state of L. braziliensis H1 has been determined by immunolocalization, mass spectrometry, cross-linking and electrophoretic mobility shift assays. Analysis of H1 sequences from the Leishmania Genome Database revealed that our protein is included in a very divergent group of histones H1 that is present only in L. braziliensis. An antibody raised against recombinant L. braziliensis H1 recognized specifically that protein by immunoblot in L. braziliensis extracts, but not in other Leishmania species, a consequence of the sequence divergences observed among Leishmania species. Mass spectrometry analysis and in vitro DNA-binding experiments have also proven that L. braziliensis H1 is monomeric in solution, but oligomerizes upon binding to DNA. Finally, despite the lack of a globular domain, L. braziliensis H1 is able to form complexes with DNA in vitro, with higher affinity for supercoiled compared to linear DNA.

  5. Molecular Systematic of Three Species of Oithona (Copepoda, Cyclopoida) from the Atlantic Ocean: Comparative Analysis Using 28S rDNA

    PubMed Central

    Cepeda, Georgina D.; Blanco-Bercial, Leocadio; Bucklin, Ann; Berón, Corina M.; Viñas, María D.

    2012-01-01

    Species of Oithona (Copepoda, Cyclopoida) are highly abundant, ecologically important, and widely distributed throughout the world oceans. Although there are valid and detailed descriptions of the species, routine species identifications remain challenging due to their small size, subtle morphological diagnostic traits, and the description of geographic forms or varieties. This study examined three species of Oithona (O. similis, O. atlantica and O. nana) occurring in the Argentine sector of the South Atlantic Ocean based on DNA sequence variation of a 575 base-pair region of 28S rDNA, with comparative analysis of these species from other North and South Atlantic regions. DNA sequence variation clearly resolved and discriminated the species, and revealed low levels of intraspecific variation among North and South Atlantic populations of each species. The 28S rDNA region was thus shown to provide an accurate and reliable means of identifying the species throughout the sampled domain. Analysis of 28S rDNA variation for additional species collected throughout the global ocean will be useful to accurately characterize biogeographical distributions of the species and to examine phylogenetic relationships among them. PMID:22558245

  6. Human placental lactogen mRNA and its structural genes during pregnancy: quantitation with a complementary DNA.

    PubMed Central

    McWilliams, D; Callahan, R C; Boime, I

    1977-01-01

    A complementary DNA (cDNA) strand was transcribed from human placental lactogen (hPL) mRNA. Based on alkaline sucrose gradient centrifugation, the size of the cDNA was about 8 S, which would represent at least 80% of the hPL mRNA. Previously we showed that four to five times more hPL was synthesized in cell-free extracts derived from term as compared to first trimester placentas. Hybridization of the cDNA with RNA derived from placental tissue revealed that there were about four times more hPL mRNA sequences in total RNA from term placenta than in a comparable quantity of total first trimester RNA. Only background hybridization was observed when the cDNA was incubated with RNA prepared from human kidney. To test if this differential accumulation of hPL mRNA was the result of an amplification of hPL genes, we hybridized the labeled cDNA with cellular DNA from first trimester and term placentas and with DNA isolated from human brain. In all cases, the amount of hPL sequences was approximately two copies per haploid genome. Thus, the enhanced synthesis of hPL mRNA appears to result from a transcriptional activation rather than an amplification of the hPL gene. The increase likely reflects placental differentiation in which the proportion of syncytial trophoblast increases at term. PMID:66681

  7. Comparison of large-insert, small-insert and pyrosequencing libraries for metagenomic analysis.

    PubMed

    Danhorn, Thomas; Young, Curtis R; DeLong, Edward F

    2012-11-01

    The development of DNA sequencing methods for characterizing microbial communities has evolved rapidly over the past decades. To evaluate more traditional, as well as newer methodologies for DNA library preparation and sequencing, we compared fosmid, short-insert shotgun and 454 pyrosequencing libraries prepared from the same metagenomic DNA samples. GC content was elevated in all fosmid libraries, compared with shotgun and 454 libraries. Taxonomic composition of the different libraries suggested that this was caused by a relative underrepresentation of dominant taxonomic groups with low GC content, notably Prochlorales and the SAR11 cluster, in fosmid libraries. While these abundant taxa had a large impact on library representation, we also observed a positive correlation between taxon GC content and fosmid library representation in other low-GC taxa, suggesting a general trend. Analysis of gene category representation in different libraries indicated that the functional composition of a library was largely a reflection of its taxonomic composition, and no additional systematic biases against particular functional categories were detected at the level of sequencing depth in our samples. Another important but less predictable factor influencing the apparent taxonomic and functional library composition was the read length afforded by the different sequencing technologies. Our comparisons and analyses provide a detailed perspective on the influence of library type on the recovery of microbial taxa in metagenomic libraries and underscore the different uses and utilities of more traditional, as well as contemporary 'next-generation' DNA library construction and sequencing technologies for exploring the genomics of the natural microbial world.
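
    GC content, the quantity compared across library types above, is simple to compute for a read or contig. The following is a minimal sketch, assuming ambiguous bases such as N are excluded from the denominator; the function name is hypothetical and the example sequences are made up.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


print(gc_content("ATGC"))      # 0.5
print(gc_content("ggccNNat"))  # 4 of 6 counted bases are G or C
```

    Averaging this value over the reads of each library would give the kind of per-library comparison the abstract describes.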

  8. G-Anchor: a novel approach for whole-genome comparative mapping utilizing evolutionary conserved DNA sequences.

    PubMed

    Lenis, Vasileios Panagiotis E; Swain, Martin; Larkin, Denis M

    2018-05-01

    Cross-species whole-genome sequence alignment is a critical first step for genome comparative analyses, ranging from the detection of sequence variants to studies of chromosome evolution. Animal genomes are large and complex, and whole-genome alignment is a computationally intense process, requiring expensive high-performance computing systems due to the need to explore extensive local alignments. With hundreds of sequenced animal genomes available from multiple projects, there is an increasing demand for genome comparative analyses. Here, we introduce G-Anchor, a new, fast, and efficient pipeline that uses a strictly limited but highly effective set of local sequence alignments to anchor (or map) an animal genome to another species' reference genome. G-Anchor makes novel use of a databank of highly conserved DNA sequence elements. We demonstrate how these elements may be aligned to a pair of genomes, creating anchors. These anchors enable the rapid mapping of scaffolds from a de novo assembled genome to chromosome assemblies of a reference species. Our results demonstrate that G-Anchor can successfully anchor a vertebrate genome onto a phylogenetically related reference species genome using a desktop or laptop computer within a few hours and with comparable accuracy to that achieved by a highly accurate whole-genome alignment tool such as LASTZ. G-Anchor thus makes whole-genome comparisons accessible to researchers with limited computational resources. G-Anchor is a ready-to-use tool for anchoring a pair of vertebrate genomes. It may be used with large genomes that contain a significant fraction of evolutionarily conserved DNA sequences and that are not highly repetitive, polyploid, or excessively fragmented. G-Anchor is not a substitute for whole-genome aligning software but can be used for fast and accurate initial genome comparisons. G-Anchor is freely available and a ready-to-use tool for the pairwise comparison of two genomes.

  9. Prediction of TF target sites based on atomistic models of protein-DNA complexes

    PubMed Central

    Angarica, Vladimir Espinosa; Pérez, Abel González; Vasconcelos, Ana T; Collado-Vides, Julio; Contreras-Moreira, Bruno

    2008-01-01

    Background: The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results: Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion: Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition. PMID:18922190

  10. Molecular cloning of a gene encoding translation initiation factor (TIF) from Candida albicans.

    PubMed

    Mirbod, F; Nakashima, S; Kitajima, Y; Ghannoum, M A; Cannon, R D; Nozawa, Y

    1996-01-01

    The differential display technique was applied to compare mRNAs from two clinical isolates of Candida albicans with different virulence; high (potent strain, 16240) and low (weak strain, 18084) extracellular phospholipase activities. Complementary DNA fragments corresponding to several apparently differentially expressed mRNAs were recovered and sequenced. A complementary DNA fragment seen distinctly in the potent phospholipase producing strain was highly homologous to the yeast translation initiation factor (TIF). The selected DNA fragment was then used as a probe to isolate its corresponding complementary DNA clone from a library of C. albicans genomic DNA. The sequence of the isolated gene revealed an open reading frame of 1194 nucleotides with the potential to encode a protein of 397 amino acids with a predicted molecular weight of 43 kDa. Over its entire length, the amino acid sequence showed strong homology (78-89%) to Saccharomyces cerevisiae TIF and (63-80%) to mouse eIF-4A proteins. Therefore, our C. albicans gene was identified as TIF (Ca TIF). Northern blot analysis in the two strains of C. albicans revealed that Ca TIF expression is 1.5-fold higher in the potent phospholipase producing strain. The restriction endonuclease digestion of genomic DNA from this potent strain revealed at least two hybridized bands in Southern blot analysis, suggesting two or more closely related sequences in the C. albicans genome.
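
    The figures quoted for the open reading frame are mutually consistent, as the small check below shows. It assumes that the 1194-nucleotide count includes the stop codon and uses the common rule of thumb of about 110 Da per amino acid residue; both assumptions and the function names are illustrative and do not come from the abstract.

```python
AVERAGE_RESIDUE_MASS_DA = 110  # rule-of-thumb average, an assumption


def orf_protein_length(orf_nucleotides: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an open reading frame of the given length."""
    if orf_nucleotides % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    codons = orf_nucleotides // 3
    # The stop codon terminates translation and encodes no amino acid.
    return codons - 1 if includes_stop else codons


def approximate_mass_kda(n_residues: int) -> float:
    """Rough protein mass in kDa from the residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000


residues = orf_protein_length(1194)
print(residues)                        # 397
print(approximate_mass_kda(residues))  # about 43.7, close to the reported 43 kDa
```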

  11. Acquisition of New DNA Sequences After Infection of Chicken Cells with Avian Myeloblastosis Virus

    PubMed Central

    Shoyab, M.; Baluda, M. A.; Evans, R.

    1974-01-01

    DNA-RNA hybridization studies between 70S RNA from avian myeloblastosis virus (AMV) and an excess of DNA from (i) AMV-induced leukemic chicken myeloblasts or (ii) a mixture of normal and of congenitally infected K-137 chicken embryos producing avian leukosis viruses revealed the presence of fast- and slow-hybridizing virus-specific DNA sequences. However, the leukemic cells contained twice the level of AMV-specific DNA sequences observed in normal chicken embryonic cells. The fast-reacting sequences were two to three times more numerous in leukemic DNA than in DNA from the mixed embryos. The slow-reacting sequences had a reiteration frequency of approximately 9 and 6, in the two respective systems. Both the fast- and the slow-reacting DNA sequences in leukemic cells exhibited a higher Tm (2 °C) than the respective DNA sequences in normal cells. In normal and leukemic cells the slow hybrid sequences appeared to have a Tm which was 2 °C higher than that of the fast hybrid sequences. Individual non-virus-producing chicken embryos, either group-specific antigen positive or negative, contained 40 to 100 copies of the fast sequences and 2 to 6 copies of the slowly hybridizing sequences per cell genome. Normal rat cells did not contain DNA that hybridized with AMV RNA, whereas non-virus-producing rat cells transformed by B-77 avian sarcoma virus contained only the slowly reacting sequences. The results demonstrate that leukemic cells transformed by AMV contain new AMV-specific DNA sequences which were not present before infection. PMID:16789139

  12. Single-molecule analysis of DNA cross-links using nanopore technology

    NASA Astrophysics Data System (ADS)

    Wolna, Anna H.

    The alpha-hemolysin (alpha-HL) protein ion channel is a potential next-generation sequencing platform that has been extensively used to study nucleic acids at a single-molecule level. After applying a potential across a lipid bilayer, the imbedded alpha-HL allows monitoring of the duration and current levels of DNA translocation and immobilization. Because this method does not require DNA amplification prior to sequencing, all the DNA damage present in the cell at any given time will be present during the sequencing experiment. The goal of this research is to determine if these damage sites give distinguishable current levels beyond those observed for the canonical nucleobases. Because DNA cross-links are one of the most prevalent types of DNA damage occurring in vivo, the blockage current levels were determined for thymine-dimers, guanine(C8)-thymine(N3) cross-links and platinum adducts. All of these cross-links give a different blockage current level compared to the undamaged strands when immobilized in the ion channel, and they all can easily translocate across the alpha-HL channel. Additionally, the alpha-HL nanopore technique presents a unique opportunity to study the effects of DNA cross-links, such as thymine-dimers, on the secondary structure of DNA G-quadruplexes folded from the human telomere sequence. Using this single-molecule nanopore technique we can detect subtle structural differences that cannot be easily addressed using conventional methods. The human telomere plays crucial roles in maintaining genome stability. In the presence of suitable cations, the repetitive 5'-TTAGGG human telomere sequence can fold into G-quadruplexes that adopt the hybrid fold in vivo. The telomere sequence is hypersensitive to UV-induced thymine-dimer (T=T) formation, and yet the presence of thymine dimers does not cause telomere shortening. 
The potential structural disruption and thermodynamic stability of the T=T-containing natural telomere sequences were studied to understand how this damage is tolerated in telomeric DNA. The alpha-HL experiments determined that T=Ts disrupt double-chain reversal loop formation but are well tolerated in edgewise and diagonal loops of the hybrid G-quadruplexes. These studies demonstrated the power of the alpha-HL ion channel to analyze DNA modifications and secondary structures at a single-molecule level.

  13. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera which gives a decrease in signal.

  14. Electronic Transport in Single-Stranded DNA Molecule Related to Huntington's Disease

    NASA Astrophysics Data System (ADS)

    Sarmento, R. G.; Silva, R. N. O.; Madeira, M. P.; Frazão, N. F.; Sousa, J. O.; Macedo-Filho, A.

    2018-04-01

    We report a numerical analysis of the electronic transport in a single-chain DNA molecule consisting of 182 nucleotides. The DNA chains studied were extracted from a segment of the human chromosome 4p16.3, which were modified by expansion of CAG (cytosine-adenine-guanine) triplet repeats to mimic Huntington's disease. The mutated DNA chains were connected between two platinum electrodes to analyze the relationship between charge propagation in the molecule and Huntington's disease. The computations were performed within a tight-binding model, together with a transfer matrix technique, to investigate the current-voltage (I-V) curves of 23 types of DNA sequence and compare them with the distributions of the related CAG repeat numbers with the disease. All DNA sequences studied have a characteristic behavior of a semiconductor. In addition, the results showed a direct correlation between the current-voltage curves and the distributions of the CAG repeat numbers, suggesting possible applications in the development of DNA-based biosensors for molecular diagnostics.

  15. Chromosomal characteristics and distribution of rDNA sequences in the brook trout Salvelinus fontinalis (Mitchill, 1814).

    PubMed

    Śliwińska-Jewsiewicka, A; Kuciński, M; Kirtiklis, L; Dobosz, S; Ocalewicz, K; Jankun, Malgorzata

    2015-08-01

    Brook trout Salvelinus fontinalis (Mitchill, 1814) chromosomes have been analyzed using conventional and molecular cytogenetic techniques enabling characteristics and chromosomal location of heterochromatin, nucleolus organizer regions (NORs), ribosomal RNA-encoding genes and telomeric DNA sequences. The C-banding and chromosome digestion with the restriction endonucleases demonstrated distribution and heterogeneity of the heterochromatin in the brook trout genome. DNA sequences of the ribosomal RNA genes, namely the nucleolus-forming 28S (major) and non-nucleolus-forming 5S (minor) rDNAs, were physically mapped using fluorescence in situ hybridization (FISH) and primed in situ labelling. The minor rDNA locus was located on the subtelo-acrocentric chromosome pair No. 9, whereas the major rDNA loci were dispersed on 14 chromosome pairs, showing a considerable inter-individual variation in the number and location. The major and minor rDNA loci were located at different chromosomes. Multichromosomal location (3-6 sites) of the NORs was demonstrated by silver nitrate (AgNO3) impregnation. All Ag-positive, i.e. active, NORs corresponded to the GC-rich blocks of heterochromatin. FISH with telomeric probe showed the presence of the interstitial telomeric site (ITS) adjacent to the NOR/28S rDNA site on chromosome 11. This ITS was presumably a remnant of the chromosome rearrangement(s) leading to the genomic redistribution of the rDNA sequences. Comparative analysis of the cytogenetic data among several related salmonid species confirmed huge variation in the number and the chromosomal location of rRNA gene clusters in the Salvelinus genome.

  16. Coupling Spore Traps and Quantitative PCR Assays for Detection of the Downy Mildew Pathogens of Spinach (Peronospora effusa) and Beet (P. schachtii)

    PubMed Central

    Klosterman, Steven J.; Anchieta, Amy; McRoberts, Neil; Koike, Steven T.; Subbarao, Krishna V.; Voglmayr, Hermann; Choi, Young-Joon; Thines, Marco; Martin, Frank N.

    2016-01-01

    Downy mildew of spinach (Spinacia oleracea), caused by Peronospora effusa, is a constraint on production worldwide, including in California, where the majority of U.S. spinach is grown. The aim of this study was to develop a real-time quantitative polymerase chain reaction (qPCR) assay for detection of airborne inoculum of P. effusa in California. Among oomycete ribosomal DNA (rDNA) sequences examined for assay development, the highest nucleotide sequence identity was observed between rDNA sequences of P. effusa and P. schachtii, the cause of downy mildew on sugar beet and Swiss chard in the leaf beet group (Beta vulgaris subsp. vulgaris). Single-nucleotide polymorphisms were detected between P. effusa and P. schachtii in the 18S rDNA regions for design of P. effusa- and P. schachtii-specific TaqMan probes and reverse primers. An allele-specific probe and primer amplification method was applied to determine the frequency of both P. effusa and P. schachtii rDNA target sequences in pooled DNA samples, enabling quantification of rDNA of P. effusa from impaction spore trap samples collected from spinach production fields. The rDNA copy numbers of P. effusa were, on average, ≈3,300-fold higher from trap samples collected near an infected field compared with those levels recorded at a site without a nearby spinach field. In combination with disease-conducive weather forecasting, application of the assays may be helpful to time fungicide applications for disease management. PMID:24964150

  17. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    PubMed

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. 
    We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal. Furthermore, NGS of historical DNA enables recovering crucial genetic information from old type specimens that to date have remained mostly unutilized and, thus, opens up a new frontier for taxonomic research as well.

  18. Programmable DNA-binding proteins from Burkholderia provide a fresh perspective on the TALE-like repeat domain.

    PubMed

    de Lange, Orlando; Wolf, Christina; Dietze, Jörn; Elsaesser, Janett; Morbitzer, Robert; Lahaye, Thomas

    2014-06-01

    The tandem repeats of transcription activator like effectors (TALEs) mediate sequence-specific DNA binding using a simple code. Naturally, TALEs are injected by Xanthomonas bacteria into plant cells to manipulate the host transcriptome. In the laboratory TALE DNA binding domains are reprogrammed and used to target a fused functional domain to a genomic locus of choice. Research into the natural diversity of TALE-like proteins may provide resources for the further improvement of current TALE technology. Here we describe TALE-like proteins from the endosymbiotic bacterium Burkholderia rhizoxinica, termed Bat proteins. Bat repeat domains mediate sequence-specific DNA binding with the same code as TALEs, despite less than 40% sequence identity. We show that Bat proteins can be adapted for use as transcription factors and nucleases and that sequence preferences can be reprogrammed. Unlike TALEs, the core repeats of each Bat protein are highly polymorphic. This feature allowed us to explore alternative strategies for the design of custom Bat repeat arrays, providing novel insights into the functional relevance of non-RVD residues. The Bat proteins offer fertile grounds for research into the creation of improved programmable DNA-binding proteins and comparative insights into TALE-like evolution. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Directed evolution of the TALE N-terminal domain for recognition of all 5' bases.

    PubMed

    Lamb, Brian M; Mercer, Andrew C; Barbas, Carlos F

    2013-11-01

    Transcription activator-like effector (TALE) proteins can be designed to bind virtually any DNA sequence. General guidelines for design of TALE DNA-binding domains suggest that the 5'-most base of the DNA sequence bound by the TALE (the N0 base) should be a thymine. We quantified the N0 requirement by analysis of the activities of TALE transcription factors (TALE-TF), TALE recombinases (TALE-R) and TALE nucleases (TALENs) with each DNA base at this position. In the absence of a 5' T, we observed decreases in TALE activity up to >1000-fold in TALE-TF activity, up to 100-fold in TALE-R activity and up to 10-fold reduction in TALEN activity compared with target sequences containing a 5' T. To develop TALE architectures that recognize all possible N0 bases, we used structure-guided library design coupled with TALE-R activity selections to evolve novel TALE N-terminal domains to accommodate any N0 base. A G-selective domain and broadly reactive domains were isolated and characterized. The engineered TALE domains selected in the TALE-R format demonstrated modularity and were active in TALE-TF and TALEN architectures. Evolved N-terminal domains provide effective and unconstrained TALE-based targeting of any DNA sequence as TALE binding proteins and designer enzymes.

  20. Genome-wide comparison of medieval and modern Mycobacterium leprae.

    PubMed

    Schuenemann, Verena J; Singh, Pushpendra; Mendum, Thomas A; Krause-Kyora, Ben; Jäger, Günter; Bos, Kirsten I; Herbig, Alexander; Economou, Christos; Benjak, Andrej; Busso, Philippe; Nebel, Almut; Boldsen, Jesper L; Kjellström, Anna; Wu, Huihai; Stewart, Graham R; Taylor, G Michael; Bauer, Peter; Lee, Oona Y-C; Wu, Houdini H T; Minnikin, David E; Besra, Gurdyal S; Tucker, Katie; Roffey, Simon; Sow, Samba O; Cole, Stewart T; Nieselt, Kay; Krause, Johannes

    2013-07-12

    Leprosy was endemic in Europe until the Middle Ages. Using DNA array capture, we have obtained genome sequences of Mycobacterium leprae from skeletons of five medieval leprosy cases from the United Kingdom, Sweden, and Denmark. In one case, the DNA was so well preserved that full de novo assembly of the ancient bacterial genome could be achieved through shotgun sequencing alone. The ancient M. leprae sequences were compared with those of 11 modern strains, representing diverse genotypes and geographic origins. The comparisons revealed remarkable genomic conservation during the past 1000 years, a European origin for leprosy in the Americas, and the presence of an M. leprae genotype in medieval Europe now commonly associated with the Middle East. The exceptional preservation of M. leprae biomarkers, both DNA and mycolic acids, in ancient skeletons has major implications for palaeomicrobiology and human pathogen evolution.

  1. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
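    The two descriptors in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' software: the function names are hypothetical, and it assumes an in-frame sequence over A, C, G and T, with codons containing any other character simply skipped.

```python
from collections import Counter
from itertools import product


def base_count_descriptor(seq: str) -> str:
    """Return a base-count string in the abstract's style, e.g. 'A76C158G121T74'."""
    counts = Counter(seq.upper())
    return "".join(f"{base}{counts.get(base, 0)}" for base in "ACGT")


def codon_table(seq: str) -> dict:
    """Count in-frame codons, keyed AAA, AAC, ..., TTT (alphabetical order)."""
    seq = seq.upper()
    table = {"".join(codon): 0 for codon in product("ACGT", repeat=3)}
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if codon in table:  # assumption: skip codons with ambiguous bases
            table[codon] += 1
    return table


def codon_descriptor(seq: str) -> str:
    """Return the codon table as '(AAA)0(AAC)1...(TTT)0'."""
    return "".join(f"({codon}){n}" for codon, n in codon_table(seq).items())


if __name__ == "__main__":
    orf = "ATGGCCTAA"
    print(base_count_descriptor(orf))
    print(codon_descriptor(orf)[:24])
```

    Because both descriptors are short strings or fixed-length count vectors, two database entries can be compared by simple equality or by a vector distance before any alignment is attempted, which is the cross-referencing use the abstract proposes.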

  2. Methylation patterns of repetitive DNA sequences in germ cells of Mus musculus.

    PubMed

    Sanford, J; Forrester, L; Chapman, V; Chandley, A; Hastie, N

    1984-03-26

    The major and the minor satellite sequences of Mus musculus were undermethylated in both sperm and oocyte DNAs relative to the amount of undermethylation observed in adult somatic tissue DNA. This hypomethylation was specific for satellite sequences in sperm DNA. Dispersed repetitive and low copy sequences show a high degree of methylation in sperm DNA; however, a dispersed repetitive sequence was undermethylated in oocyte DNA. This finding suggests a difference in the amount of total genomic DNA methylation between sperm and oocyte DNA. The methylation levels of the minor satellite sequences did not change during spermiogenesis, and were not associated with the onset of meiosis or a specific stage in sperm development.

  3. Process of labeling specific chromosomes using recombinant repetitive DNA

    DOEpatents

    Moyzis, R.K.; Meyne, J.

    1988-02-12

    Chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members and consensus sequences of the repetitive DNA families for the chromosome preferential sequences. The selected low homology regions are then hybridized with chromosomes to determine those low homology regions hybridized with a specific chromosome under normal stringency conditions.

  4. Bicycle: a bioinformatics pipeline to analyze bisulfite sequencing data.

    PubMed

    Graña, Osvaldo; López-Fernández, Hugo; Fdez-Riverola, Florentino; González Pisano, David; Glez-Peña, Daniel

    2018-04-15

    High-throughput sequencing of bisulfite-converted DNA is a technique used to measure DNA methylation levels. Although a considerable number of computational pipelines have been developed to analyze such data, none of them tackles all the peculiarities of the analysis together, revealing limitations that can force the user to manually perform additional steps needed for a complete processing of the data. This article presents bicycle, an integrated, flexible analysis pipeline for bisulfite sequencing data. Bicycle analyzes whole genome bisulfite sequencing data, targeted bisulfite sequencing data and hydroxymethylation data. To show how bicycle overtakes other available pipelines, we compared them on a defined number of features that are summarized in a table. We also tested bicycle with both simulated and real datasets, to show its level of performance, and compared it to different state-of-the-art methylation analysis pipelines. Bicycle is publicly available under GNU LGPL v3.0 license at http://www.sing-group.org/bicycle. Users can also download a customized Ubuntu LiveCD including bicycle and other bisulfite sequencing data pipelines compared here. In addition, a docker image with bicycle and its dependencies, which allows a straightforward use of bicycle in any platform (e.g. Linux, OS X or Windows), is also available. Contact: ograna@cnio.es or dgpena@uvigo.es. Supplementary data are available at Bioinformatics online.
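    As background to what such pipelines measure: bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C, so the methylation level at a cytosine is commonly estimated as C / (C + T) over the reads covering it. The Python sketch below shows only that ratio. It is not bicycle's code or interface; the function name and the per-site read counts are hypothetical.

```python
from typing import Optional


def methylation_level(c_reads: int, t_reads: int) -> Optional[float]:
    """Estimate methylation at one cytosine as C / (C + T).

    c_reads: reads showing C (cytosine protected from conversion, i.e. methylated)
    t_reads: reads showing T (cytosine converted, i.e. unmethylated)
    Returns None when no read covers the site.
    """
    if c_reads < 0 or t_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


if __name__ == "__main__":
    # Hypothetical per-site counts: position -> (C reads, T reads)
    sites = {101: (30, 10), 145: (0, 25), 212: (0, 0)}
    for pos, (c, t) in sites.items():
        print(pos, methylation_level(c, t))
```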

  5. Non-invasive method to obtain DNA from freshwater mussels (Bivalvia: Unionidae)

    USGS Publications Warehouse

    Henley, W.F.; Grobler, P.J.; Neves, R.J.

    2006-01-01

    To determine whether DNA could be isolated from tissues obtained by brush-swabbing the mantle, viscera and foot, mantle-clips and swabbed cells were obtained from eight Quadrula pustulosa (Lea, 1831). DNA yields from clips and swabbings were 447.0 and 975.3 µg/µL, respectively. Furthermore, comparisons of sequences from the ND-1 mitochondrial gene region showed a 100% sequence agreement of DNA from cells obtained by clips and swabs. To determine the number of swabs needed to obtain adequate yields of DNA for analyses, the viscera and feet of 5 Q. pustulosa each were successively swabbed 2, 4 and 6 times. DNA yields from the 2, 4 and 6 swabbed mussel groups were 399.4, 833.8 and 852.6 ng/µL, respectively. ND-1 sequences from the lowest yield still provided 846-901 bp for the ND-1 region. Nevertheless, to ensure adequate DNA yield from cell samples obtained by swabbing, we recommend that 4 swab-strokes of the viscera and foot be obtained. The use of integumental swabbing for collection of cells for determination of genetic relationships among freshwater mussels is noninvasive, when compared with tissue collection by mantle-clipping. Therefore, its use is recommended for freshwater mussels, especially state-protected or federally listed mussel species.

  6. Sequence analysis of three mitochondrial DNA molecules reveals interesting differences among Saccharomyces yeasts

    PubMed Central

    Langkjær, R. B.; Casaregola, S.; Ussery, D. W.; Gaillardin, C.; Piškur, J.

    2003-01-01

    The complete sequences of mitochondrial DNA (mtDNA) from the two budding yeasts Saccharomyces castellii and Saccharomyces servazzii, consisting of 25 753 and 30 782 bp, respectively, were analysed and compared to Saccharomyces cerevisiae mtDNA. While some of the traits are very similar among Saccharomyces yeasts, others have highly diverged. The two mtDNAs are much more compact than that of S.cerevisiae and contain fewer introns and intergenic sequences, although they have almost the same coding potential. A few genes contain group I introns, but group II introns, otherwise found in S.cerevisiae mtDNA, are not present. Surprisingly, four genes (ATP6, COX2, COX3 and COB) in the mtDNA of S.servazzii contain, in total, five +1 frameshifts. mtDNAs of S.castellii, S.servazzii and S.cerevisiae contain all genes on the same strand, except for one tRNA gene. On the other hand, the gene order is very different. Several gene rearrangements have taken place upon separation of the Saccharomyces lineages, and even a part of the transcription units have not been preserved. It seems that the mechanism(s) involved in the generation of the rearrangements has had to ensure that all genes stayed encoded by the same DNA strand. PMID:12799436

  7. Massively parallel pyrosequencing of the mitochondrial genome with the 454 methodology in forensic genetics.

    PubMed

    Mikkelsen, Martin; Frank-Hansen, Rune; Hansen, Anders J; Morling, Niels

    2014-09-01

    Results of sequencing of whole mitochondrial genome, HV1 and HV2 DNA with the second generation system (SGS) Roche 454 GS Junior were compared with results of Sanger sequencing and SNP typing with SNaPshot single base extension detected with MALDI-TOF and capillary electrophoresis. We investigated the performance of the software analysis of the data, reproducibility, ability to sequence homopolymeric regions, detection of mixtures and heteroplasmy as well as the implications of the depth of coverage. We found full reproducibility between samples sequenced twice with SGS. We found close to full concordance between the mtDNA sequences of 26 samples obtained with (1) the 454 SGS method using a depth of coverage above 100 and (2) Sanger sequencing and SNP typing. The discrepancies were primarily observed in homopolymeric regions. The 454 SGS method was able to sequence 95% of the reads correctly in homopolymers up to 4 bases, and up to 6 bases could be sequenced with similar success if the results were carefully, visually inspected. The 454 technology was able to detect mixtures or heteroplasmy of approximately 10%. We detected previously unreported heteroplasmy in the GM9947A component of the NIST human mitochondrial DNA SRM-2392 standard reference material. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
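    To make the two figures in this abstract concrete (a depth of coverage above 100 and a detection limit of roughly 10%), the Python sketch below flags a position as a candidate mixture or heteroplasmy from per-base read counts. It is an assumption-laden illustration and not the study's analysis software: the function names are hypothetical, and using the abstract's two numbers as hard default thresholds is a choice made here for the example.

```python
def minor_allele_fraction(base_counts: dict) -> float:
    """Fraction of reads carrying the second most common base at a position."""
    depth = sum(base_counts.values())
    if depth == 0:
        return 0.0
    ranked = sorted(base_counts.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0
    return second / depth


def flag_heteroplasmy(base_counts: dict,
                      min_depth: int = 100,         # assumption: from "depth of coverage above 100"
                      min_fraction: float = 0.10):  # assumption: from "approximately 10%"
    """Return True when coverage exceeds min_depth and the minor base reaches min_fraction."""
    depth = sum(base_counts.values())
    return depth > min_depth and minor_allele_fraction(base_counts) >= min_fraction


if __name__ == "__main__":
    print(flag_heteroplasmy({"A": 180, "G": 20}))  # deep coverage, 10% minor base
    print(flag_heteroplasmy({"A": 45, "G": 5}))    # same fraction, too shallow
```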

  8. Refined annotation and assembly of the Tetrahymena thermophila genome sequence through EST analysis, comparative genomic hybridization, and targeted gap closure

    PubMed Central

    Coyne, Robert S; Thiagarajan, Mathangi; Jones, Kristie M; Wortman, Jennifer R; Tallon, Luke J; Haas, Brian J; Cassidy-Hanley, Donna M; Wiley, Emily A; Smith, Joshua J; Collins, Kathleen; Lee, Suzanne R; Couvillion, Mary T; Liu, Yifan; Garg, Jyoti; Pearlman, Ronald E; Hamilton, Eileen P; Orias, Eduardo; Eisen, Jonathan A; Methé, Barbara A

    2008-01-01

    Background: Tetrahymena thermophila, a widely studied model for cellular and molecular biology, is a binucleated single-celled organism with a germline micronucleus (MIC) and somatic macronucleus (MAC). The recent draft MAC genome assembly revealed low sequence repetitiveness, a result of the epigenetic removal of invasive DNA elements found only in the MIC genome. Such low repetitiveness makes complete closure of the MAC genome a feasible goal, which to achieve would require standard closure methods as well as removal of minor MIC contamination of the MAC genome assembly. Highly accurate preliminary annotation of Tetrahymena's coding potential was hindered by the lack of both comparative genomic sequence information from close relatives and significant amounts of cDNA evidence, thus limiting the value of the genomic information and also leaving unanswered certain questions, such as the frequency of alternative splicing. Results: We addressed the problem of MIC contamination using comparative genomic hybridization with purified MIC and MAC DNA probes against a whole genome oligonucleotide microarray, allowing the identification of 763 genome scaffolds likely to contain MIC-limited DNA sequences. We also employed standard genome closure methods to essentially finish over 60% of the MAC genome. For the improvement of annotation, we have sequenced and analyzed over 60,000 verified EST reads from a variety of cellular growth and development conditions. Using this EST evidence, a combination of automated and manual reannotation efforts led to updates that affect 16% of the current protein-coding gene models. By comparing EST abundance, many genes showing apparent differential expression between these conditions were identified. Rare instances of alternative splicing and uses of the non-standard amino acid selenocysteine were also identified. Conclusion: We report here significant progress in genome closure and reannotation of Tetrahymena thermophila.
    Our experience to date suggests that complete closure of the MAC genome is attainable. Using the new EST evidence, automated and manual curation has resulted in substantial improvements to the over 24,000 gene models, which will be valuable to researchers studying this model organism as well as for comparative genomics purposes. PMID:19036158

  9. Idiopathic slow transit constipation and megacolon are not associated with neurturin mutations.

    PubMed

    Chen, B; Knowles, C H; Scott, M; Anand, P; Williams, N S; Milbrandt, J; Tam, P K H

    2002-10-01

    Chronic idiopathic slow-transit constipation (ISTC) and idiopathic megacolon (IMC) are early-onset gastrointestinal motility disorders of unknown aetiology. The gene encoding the neurotrophic factor neurturin may be a candidate for these disorders, as neurturin-deficient mice have a similar enteric phenotype. In the present study, we tested this hypothesis. Genomic DNA from 26 cases of chronic idiopathic STC [with a family history of constipation in 15 (58%) and Hirschsprung's disease in two (8%)], and five cases of IMC [two familial (40%)] was screened by direct DNA sequencing using the fluorescent dideoxy terminator method. Results were compared with published sequence data and 24 control DNAs. Our results revealed several previously unreported common sequence polymorphisms, but overall frequencies were comparable between patients and controls. We conclude that mutation of neurturin is not a frequent cause of ISTC or IMC.

  10. Enlightenment of Yeast Mitochondrial Homoplasmy: Diversified Roles of Gene Conversion

    PubMed Central

    Ling, Feng; Mikawa, Tsutomu; Shibata, Takehiko

    2011-01-01

    Mitochondria have their own genomic DNA. Unlike the nuclear genome, each cell contains hundreds to thousands of copies of mitochondrial DNA (mtDNA). The copies of mtDNA tend to have heterogeneous sequences, due to the high frequency of mutagenesis, but are quickly homogenized within a cell (“homoplasmy”) during vegetative cell growth or through a few sexual generations. Heteroplasmy is strongly associated with mitochondrial diseases, diabetes and aging. Recent studies revealed that the yeast cell has the machinery to homogenize mtDNA, using a common DNA processing pathway with gene conversion; i.e., both genetic events are initiated by a double-stranded break, which is processed into 3′ single-stranded tails. One of the tails is base-paired with the complementary sequence of the recipient double-stranded DNA to form a D-loop (homologous pairing), in which repair DNA synthesis is initiated to restore the sequence lost by the breakage. Gene conversion generates sequence diversity, depending on the divergence between the donor and recipient sequences, especially when it occurs among a number of copies of a DNA sequence family with some sequence variations, such as in immunoglobulin diversification in chicken. MtDNA can be regarded as a sequence family, in which the members tend to be diversified by a high frequency of spontaneous mutagenesis. Thus, it would be interesting to determine why and how double-stranded breakage and D-loop formation induce sequence homogenization in mitochondria and sequence diversification in nuclear DNA. We will review the mechanisms and roles of mtDNA homoplasmy, in contrast to nuclear gene conversion, which diversifies gene and genome sequences, to provide clues toward understanding how the common DNA processing pathway results in such divergent outcomes. PMID:24710143

  11. Quantum Point Contact Single-Nucleotide Conductance for DNA and RNA Sequence Identification.

    PubMed

    Afsari, Sepideh; Korshoj, Lee E; Abel, Gary R; Khan, Sajida; Chatterjee, Anushree; Nagpal, Prashant

    2017-11-28

    Several nanoscale electronic methods have been proposed for high-throughput single-molecule nucleic acid sequence identification. While many studies display a large ensemble of measurements as "electronic fingerprints" with some promise for distinguishing the DNA and RNA nucleobases (adenine, guanine, cytosine, thymine, and uracil), important metrics such as accuracy and confidence of base calling fall well below the current genomic methods. Issues such as unreliable metal-molecule junction formation, variation of nucleotide conformations, insufficient differences between the molecular orbitals responsible for single-nucleotide conduction, and lack of rigorous base calling algorithms lead to overlapping nanoelectronic measurements and poor nucleotide discrimination, especially at low coverage on single molecules. Here, we demonstrate a technique for reproducible conductance measurements on conformation-constrained single nucleotides and an advanced algorithmic approach for distinguishing the nucleobases. Our quantum point contact single-nucleotide conductance sequencing (QPICS) method uses combed and electrostatically bound single DNA and RNA nucleotides on a self-assembled monolayer of cysteamine molecules. We demonstrate that by varying the applied bias and pH conditions, molecular conductance can be switched ON and OFF, leading to reversible nucleotide perturbation for electronic recognition (NPER). We utilize NPER as a method to achieve >99.7% accuracy for DNA and RNA base calling at low molecular coverage (∼12×) using unbiased single measurements on DNA/RNA nucleotides, which represents a significant advance compared to existing sequencing methods. These results demonstrate the potential for utilizing simple surface modifications and existing biochemical moieties in individual nucleobases for a reliable, direct, single-molecule, nanoelectronic DNA and RNA nucleotide identification method for sequencing.

  12. "First generation" automated DNA sequencing technology.

    PubMed

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  13. cDNA cloning, functional expression and cellular localization of rat liver mitochondrial electron-transfer flavoprotein-ubiquinone oxidoreductase protein.

    PubMed

    Huang, Shengbing; Song, Wei; Lin, Qishui

    2005-08-01

    A membrane-bound protein was purified from rat liver mitochondria. After being digested with V8 protease, two peptides containing identical 14 amino acid residue sequences were obtained. Using the 14 amino acid peptide derived DNA sequence as gene specific primer, the cDNA of correspondent gene 5'-terminal and 3'-terminal were obtained by RACE technique. The full-length cDNA that encoded a protein of 616 amino acids was thus cloned, which included the above mentioned peptide sequence. The full length cDNA was highly homologous to that of human ETF-QO, indicating that it may be the cDNA of rat ETF-QO. ETF-QO is an iron sulfur protein located in mitochondria inner membrane containing two kinds of redox center: FAD and [4Fe-4S] center. After comparing the sequence from the cDNA of the 616 amino acids protein with that of the mature protein of rat liver mitochondria, it was found that the N terminal 32 amino acid residues did not exist in the mature protein, indicating that the cDNA was that of ETF-QOp. When the cDNA was expressed in Saccharomyces cerevisiae with inducible vectors, the protein product was enriched in mitochondrial fraction and exhibited electron transfer activity (NBT reductase activity) of ETF-QO. Results demonstrated that the 32 amino acid peptide was a mitochondrial targeting peptide, and both FAD and iron-sulfur cluster were inserted properly into the expressed ETF-QO. ETF-QO had a high level expression in rat heart, liver and kidney. The fusion protein of GFP-ETF-QO co-localized with mitochondria in COS-7 cells.

  14. Sampling and pyrosequencing methods for characterizing bacterial communities in the human gut using 16S sequence tags.

    PubMed

    Wu, Gary D; Lewis, James D; Hoffmann, Christian; Chen, Ying-Yu; Knight, Rob; Bittinger, Kyle; Hwang, Jennifer; Chen, Jun; Berkowsky, Ronald; Nessel, Lisa; Li, Hongzhe; Bushman, Frederic D

    2010-07-30

    Intense interest centers on the role of the human gut microbiome in health and disease, but optimal methods for analysis are still under development. Here we present a study of methods for surveying bacterial communities in human feces using 454/Roche pyrosequencing of 16S rRNA gene tags. We analyzed fecal samples from 10 individuals and compared methods for storage, DNA purification and sequence acquisition. To assess reproducibility, we compared samples one cm apart on a single stool specimen for each individual. To analyze storage methods, we compared 1) immediate freezing at -80 °C, 2) storage on ice for 24 hours, and 3) storage on ice for 48 hours. For DNA purification methods, we tested three commercial kits and bead beating in hot phenol. Variations due to the different methodologies were compared to variation among individuals using two approaches--one based on presence-absence information for bacterial taxa (unweighted UniFrac) and the other taking into account their relative abundance (weighted UniFrac). In the unweighted analysis relatively little variation was associated with the different analytical procedures, and variation between individuals predominated. In the weighted analysis considerable variation was associated with the purification methods. Particularly notable was improved recovery of Firmicutes sequences using the hot phenol method. We also carried out surveys of the effects of different 454 sequencing methods (FLX versus Titanium) and amplification of different 16S rRNA variable gene segments. Based on our findings we present recommendations for protocols to collect, process and sequence bacterial 16S rDNA from fecal samples--some major points are 1) if feasible, bead-beating in hot phenol or use of the PSP kit improves recovery; 2) storage methods can be adjusted based on experimental convenience; 3) unweighted (presence-absence) comparisons are less affected by lysis method.
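    The contrast between presence-absence and abundance-weighted comparisons can be illustrated without a phylogenetic tree. The Python sketch below deliberately swaps in two simpler, tree-free measures, Jaccard distance (presence-absence) and Bray-Curtis dissimilarity on relative abundances, as stand-ins for unweighted and weighted UniFrac; it does not compute UniFrac itself. The taxon counts are made-up numbers chosen only to show the effect.

```python
def jaccard_distance(a: dict, b: dict) -> float:
    """Presence-absence distance: 1 - |shared taxa| / |taxa present in either sample|."""
    present_a = {taxon for taxon, n in a.items() if n > 0}
    present_b = {taxon for taxon, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis(a: dict, b: dict) -> float:
    """Abundance-weighted dissimilarity computed on relative abundances."""
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample needs at least one read")
    taxa = set(a) | set(b)
    # With proportions summing to 1 in each sample, Bray-Curtis is half the L1 distance.
    return 0.5 * sum(abs(a.get(t, 0) / total_a - b.get(t, 0) / total_b) for t in taxa)


if __name__ == "__main__":
    # Hypothetical read counts for one specimen processed with two lysis methods.
    kit = {"Firmicutes": 20, "Bacteroidetes": 80}
    hot_phenol = {"Firmicutes": 80, "Bacteroidetes": 20}
    print(jaccard_distance(kit, hot_phenol))  # same taxa detected by both methods
    print(bray_curtis(kit, hot_phenol))       # but very different proportions
```

    In this toy case both methods detect the same taxa, so the presence-absence distance is zero, while the shift in Firmicutes proportion produces a large abundance-weighted distance. That mirrors the abstract's observation that unweighted comparisons are less affected by lysis method.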

  15. Birth and death of genes linked to chromosomal inversion

    PubMed Central

    Furuta, Yoshikazu; Kawai, Mikihiko; Yahara, Koji; Takahashi, Noriko; Handa, Naofumi; Tsuru, Takeshi; Oshima, Kenshiro; Yoshida, Masaru; Azuma, Takeshi; Hattori, Masahira; Uchiyama, Ikuo; Kobayashi, Ichizo

    2011-01-01

    The birth and death of genes is central to adaptive evolution, yet the underlying genome dynamics remain elusive. The availability of closely related complete genome sequences helps to follow changes in gene contents and clarify their relationship to overall genome organization. Helicobacter pylori, bacteria in our stomach, are known for their extreme genome plasticity through mutation and recombination and will make a good target for such an analysis. In comparing their complete genome sequences, we found that gain and loss of genes (loci) for outer membrane proteins, which mediate host interaction, occurred at breakpoints of chromosomal inversions. Sequence comparison there revealed a unique mechanism of DNA duplication: DNA duplication associated with inversion. In this process, a DNA segment at one chromosomal locus is copied and inserted, in an inverted orientation, into a distant locus on the same chromosome, while the entire region between these two loci is also inverted. Recognition of this and three more inversion modes, which occur through reciprocal recombination between long or short sequence similarity or adjacent to a mobile element, allowed reconstruction of synteny evolution through inversion events in this species. These results will guide the interpretation of extensive DNA sequencing results for understanding long- and short-term genome evolution in various organisms and in cancer cells. PMID:21212362

  16. Comparative performance of high-density oligonucleotide sequencing and dideoxynucleotide sequencing of HIV type 1 pol from clinical samples.

    PubMed

    Günthard, H F; Wong, J K; Ignacio, C C; Havlir, D V; Richman, D D

    1998-07-01

    The performance of the high-density oligonucleotide array methodology (GeneChip) in detecting drug resistance mutations in HIV-1 pol was compared with that of automated dideoxynucleotide sequencing (ABI) of clinical samples, viral stocks, and plasmid-derived NL4-3 clones. Sequences from 29 clinical samples (plasma RNA, n = 17; lymph node RNA, n = 5; lymph node DNA, n = 7) from 12 patients, from 6 viral stock RNA samples, and from 13 NL4-3 clones were generated by both methods. Editing was done independently by a different investigator for each method before comparing the sequences. In addition, NL4-3 wild type (WT) and mutants were mixed in varying concentrations and sequenced by both methods. Overall, a concordance of 99.1% was found for a total of 30,865 bases compared. The comparison of clinical samples (plasma RNA and lymph node RNA and DNA) showed a slightly lower match of base calls, 98.8% for 19,831 nucleotides compared (protease region, 99.5%, n = 8272; RT region, 98.3%, n = 11,316), than for viral stocks and NL4-3 clones (protease region, 99.8%; RT region, 99.5%). Artificial mixing experiments showed a bias toward calling wild-type bases by GeneChip. Discordant base calls are most likely due to differential detection of mixtures. The concordance between GeneChip and ABI was high and appeared dependent on the nature of the templates (directly amplified versus cloned) and the complexity of mixes.

  17. Comparison between TRF2 and TRF1 of their telomeric DNA-bound structures and DNA-binding activities

    PubMed Central

    Hanaoka, Shingo; Nagadoi, Aritaka; Nishimura, Yoshifumi

    2005-01-01

    Mammalian telomeres consist of long tandem arrays of double-stranded telomeric TTAGGG repeats packaged by the telomeric DNA-binding proteins TRF1 and TRF2. Both contain a similar C-terminal Myb domain that mediates sequence-specific binding to telomeric DNA. In a DNA complex of TRF1, only the single Myb-like domain consisting of three helices can bind specifically to double-stranded telomeric DNA. TRF2 also binds to double-stranded telomeric DNA. Although the DNA binding mode of TRF2 is likely identical to that of TRF1, TRF2 plays an important role in the t-loop formation that protects the ends of telomeres. Here, to clarify the details of the double-stranded telomeric DNA-binding modes of TRF1 and TRF2, we determined the solution structure of the DNA-binding domain of human TRF2 bound to telomeric DNA; it consists of three helices, and like TRF1, the third helix recognizes TAGGG sequence in the major groove of DNA with the N-terminal arm locating in the minor groove. However, small but significant differences are observed; in contrast to the minor groove recognition of TRF1, in which an arginine residue recognizes the TT sequence, a lysine residue of TRF2 interacts with the TT part. We examined the telomeric DNA-binding activities of both DNA-binding domains of TRF1 and TRF2 and found that TRF1 binds more strongly than TRF2. Based on the structural differences of both domains, we created several mutants of the DNA-binding domain of TRF2 with stronger binding activities compared to the wild-type TRF2. PMID:15608118

  18. Complete mitochondrial DNA sequence of a tadpole shrimp (Triops cancriformis) and analysis of museum samples.

    PubMed

    Umetsu, Kazuo; Iwabuchi, Naruki; Yuasa, Isao; Saitou, Naruya; Clark, Paul F; Boxshall, Geoff; Osawa, Motoki; Igarashi, Keiji

    2002-12-01

    The complete mitochondrial DNA (mtDNA) of the tadpole shrimp Triops cancriformis was sequenced. The sequence consisted of 15,101 bp with an A+T content of 69%. Its gene arrangement was identical to that of the water flea (Daphnia pulex) and giant tiger prawn (Penaeus monodon), whereas it differed from that of the brine shrimp (Artemia franciscana) in the arrangement of its genes for tRNAs. Phylogenetic analysis revealed T. cancriformis to be more closely related to the water flea than to the brine shrimp and giant tiger prawn. We also compared the 16S rRNA sequences of five formalin-fixed tadpole shrimps that had been collected in five different locations and stored in a museum. The sequence divergence was in the range of 0-1.51%, suggesting that those samples were closely related to each other.

  19. Influence of DNA sequence on the structure of minicircles under torsional stress

    PubMed Central

    Wang, Qian; Irobalieva, Rossitza N.; Chiu, Wah; Schmid, Michael F.; Fogg, Jonathan M.; Zechiedrich, Lynn

    2017-01-01

    The sequence dependence of the conformational distribution of DNA under various levels of torsional stress is an important unsolved problem. Combining theory and coarse-grained simulations shows that the DNA sequence and a structural correlation due to topology constraints of a circle are the main factors that dictate the 3D structure of a 336 bp DNA minicircle under torsional stress. We found that DNA minicircle topoisomers can have multiple bend locations under high torsional stress and that the positions of these sharp bends are determined by the sequence, and by a positive mechanical correlation along the sequence. We showed that simulations and theory are able to provide sequence-specific information about individual DNA minicircles observed by cryo-electron tomography (cryo-ET). We provided a sequence-specific cryo-ET tomogram fitting of DNA minicircles, registering the sequence within the geometric features. Our results indicate that the conformational distribution of minicircles under torsional stress can be designed, which has important implications for using minicircle DNA for gene therapy. PMID:28609782

  20. Analysis of DNA Sequences by an Optical Time-Integrating Correlator: Proof-of-Concept Experiments.

    DTIC Science & Technology

    1992-05-01

    DNA ANALYSIS STRATEGY 4 2.1 Representation of DNA Bases 4 2.2 DNA Analysis Strategy 6 3.0 CUSTOM GENERATORS FOR DNA SEQUENCES 10 3.1 Hardware Design 10...of the DNA bases where each base is represented by a 7-bits long pseudorandom sequence. 5 Figure 4: Coarse analysis of a DNA sequence. 7 Figure 5: Fine...a 20-bases long database. 32 xiii LIST OF TABLES PAGE Table 1: Short representations of the DNA bases where each base is represented by 7-bits long

  1. In silico Analysis of 2085 Clones from a Normalized Rat Vestibular Periphery 3′ cDNA Library

    PubMed Central

    Roche, Joseph P.; Cioffi, Joseph A.; Kwitek, Anne E.; Erbe, Christy B.; Popper, Paul

    2005-01-01

    The inserts from 2400 cDNA clones isolated from a normalized Rattus norvegicus vestibular periphery cDNA library were sequenced and characterized. The Wackym-Soares vestibular 3′ cDNA library was constructed from the saccular and utricular maculae, the ampullae of all three semicircular canals and Scarpa's ganglia containing the somata of the primary afferent neurons, microdissected from 104 male and female rats. The inserts from 2400 randomly selected clones were sequenced from the 5′ end. Each sequence was compared, using the BLAST algorithm, against the GenBank nonredundant, rat genome, mouse genome and human genome databases to search for high homology alignments. Of the initial 2400 clones, 315 (13%) were found to be of poor quality and did not yield useful information, and therefore were eliminated from the analysis. Of the remaining 2085 sequences, 918 (44%) were found to represent 758 unique genes having useful annotations that were identified in databases within the public domain or in the published literature; these sequences were designated as known characterized sequences. A further 1141 sequences (55%) aligned with 1011 unique sequences that had no useful annotations and were designated as known but uncharacterized sequences. Of the remaining 26 sequences (1%), 24 aligned with rat genomic sequences, but none matched previously described rat expressed sequence tags or mRNAs. No significant alignment to the rat or human genomic sequences could be found for the remaining 2 sequences. Of the 2085 sequences analyzed, 86% were singletons. The known, characterized sequences were analyzed with the FatiGO online data-mining tool (http://fatigo.bioinfo.cnio.es/) to identify level 5 biological process gene ontology (GO) terms for each alignment and to group alignments with similar or identical GO terms. Numerous genes were identified that have not been previously shown to be expressed in the vestibular system. Further characterization of the novel cDNA sequences may lead to the identification of genes with vestibular-specific functions. Continued analysis of the rat vestibular periphery transcriptome should provide new insights into vestibular function and generate new hypotheses. Physiological studies are necessary to further elucidate the roles of the identified genes and novel sequences in vestibular function. PMID:16103642

  2. Polyfluorophore Labels on DNA: Dramatic Sequence Dependence of Quenching

    PubMed Central

    Teo, Yin Nah; Wilson, James N.

    2010-01-01

    We describe studies carried out in the DNA context to test how a common fluorescence quencher, dabcyl, interacts with oligodeoxynucleoside fluorophores (ODFs), a system of stacked, electronically interacting fluorophores built on a DNA scaffold. We tested twenty different tetrameric ODF sequences containing varied combinations and orderings of pyrene (Y), benzopyrene (B), perylene (E), dimethylaminostilbene (D), and spacer (S) monomers conjugated to the 3′ end of a DNA oligomer. Hybridization of this probe sequence to a dabcyl-labeled complementary strand resulted in strong quenching of fluorescence in 85% of the twenty ODF sequences. The high efficiency of quenching was also established by their large Stern–Volmer constants (KSV) of between 2.1 × 10^4 and 4.3 × 10^5 M^−1, measured with a free dabcyl quencher. Interestingly, quenching of ODFs displayed strong sequence dependence. This was particularly evident in anagrams of ODF sequences; for example, the sequence BYDS had a KSV that was approximately two orders of magnitude greater than that of BSDY, which has the same dye composition. Other anagrams, for example EDSY and ESYD, also displayed different responses upon quenching by dabcyl. Analysis of spectra showed that apparent excimer and exciplex emission bands were quenched at least an order of magnitude more efficiently than monomer emission bands. This suggests an important role played by delocalized excited states of the π stack of fluorophores in the amplified quenching of fluorescence. PMID:19780115

  3. Comparison of Methods of Detection of Exceptional Sequences in Prokaryotic Genomes.

    PubMed

    Rusinov, I S; Ershova, A S; Karyagina, A S; Spirin, S A; Alexeevski, A V

    2018-02-01

    Many proteins must recognize specific DNA sequences in order to function. The number of recognition sites and their distribution along the DNA might be of biological importance. For example, the number of restriction sites is often reduced in prokaryotic and phage genomes to decrease the probability of DNA cleavage by restriction endonucleases. We call a sequence an exceptional one if its frequency in a genome significantly differs from the one predicted by some mathematical model. An exceptional sequence could be either under- or over-represented, depending on its frequency in comparison with the predicted one. Exceptional sequences could be considered biologically meaningful, for example, as targets of DNA-binding proteins or as parts of abundant repetitive elements. Several methods are used to predict the frequency of a short sequence in a genome based on the actual frequencies of certain of its subsequences. The most popular are methods based on Markov chain models. However, no rigorous comparison of the methods has previously been performed. We compared three methods for the prediction of short sequence frequencies: the maximum-order Markov chain model-based method, the method that uses the geometric mean of extended Markovian estimates, and the method that utilizes frequencies of all subsequences including discontiguous ones. We applied them to restriction sites in complete genomes of 2500 prokaryotic species and demonstrated that the results depend greatly on the method used: lists of 5% of the most under-represented sites differed by up to 50%. The method designed by Burge and coauthors in 1992, which utilizes all subsequences of the sequence, showed a higher precision than the other two methods both on prokaryotic genomes and randomly generated sequences after computational imitation of selective pressure. We propose this method as the first choice for detection of exceptional sequences in prokaryotic genomes.
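
    The first of the three methods named above, the maximum-order Markov chain estimate, can be illustrated with a minimal Python sketch. It assumes the common textbook form of the estimator, in which the expected count of a word is the product of the counts of its two longest proper substrings divided by the count of their shared core; the abstract does not give the authors' exact formula, and the function names below are hypothetical.

```python
from collections import Counter


def count_kmers(genome: str, k: int) -> Counter:
    """Count every overlapping k-mer in the sequence."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def expected_count_max_order(genome: str, word: str) -> float:
    """Maximum-order Markov estimate (assumed textbook form):
    N(w[1..n-1]) * N(w[2..n]) / N(w[2..n-1])."""
    n = len(word)
    if n < 3:
        raise ValueError("word must be at least 3 letters long")
    shorter = count_kmers(genome, n - 1)
    core = count_kmers(genome, n - 2)[word[1:-1]]
    if core == 0:
        return 0.0
    return shorter[word[:-1]] * shorter[word[1:]] / core


def observed_to_expected(genome: str, word: str) -> float:
    """Ratio below 1 suggests under-representation, above 1 over-representation."""
    observed = count_kmers(genome, len(word))[word]
    expected = expected_count_max_order(genome, word)
    return observed / expected if expected else float("nan")
```

    A ratio well below 1 for a restriction site would mark it as a candidate under-represented (exceptional) sequence; deciding significance requires a statistical test that this sketch does not attempt.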

  4. Comparison of impedimetric detection of DNA hybridization on the various biosensors based on modified glassy carbon electrodes with PANHS and nanomaterials of RGO and MWCNTs.

    PubMed

    Benvidi, Ali; Tezerjani, Marzieh Dehghan; Jahanbani, Shahriar; Mazloum Ardakani, Mohammad; Moshtaghioun, Seyed Mohammad

    2016-01-15

    In this research, we have developed label-free DNA biosensors based on glassy carbon electrodes (GCE) modified with reduced graphene oxide (RGO) and carbon nanotubes (MWCNTs) for detection of DNA sequences. This paper compares the detection of the BRCA1 5382insC mutation using independent glassy carbon electrodes (GCE) modified with RGO and MWCNTs. A probe (BRCA1 5382insC mutation detection (ssDNA)) was then immobilized on the modified electrodes for a specific time. The immobilization of the probe and its hybridization with the target DNA (complementary DNA) were performed under optimum conditions using different electrochemical techniques such as cyclic voltammetry (CV) and electrochemical impedance spectroscopy (EIS). The proposed biosensors were used for determination of complementary DNA sequences. The non-modified DNA biosensor (1-pyrenebutyric acid-N-hydroxysuccinimide ester (PANHS)/GCE) revealed a linear relationship between ∆Rct and the logarithm of the complementary target DNA concentration ranging from 1.0×10(-16) mol L(-1) to 1.0×10(-10) mol L(-1) with a correlation coefficient of 0.992. For DNA biosensors modified with multi-wall carbon nanotubes (MWCNTs) and reduced graphene oxide (RGO), wider linear ranges and lower detection limits were obtained. For ssDNA/PANHS/MWCNTs/GCE, a linear range from 1.0×10(-17) mol L(-1) to 1.0×10(-10) mol L(-1) with a correlation coefficient of 0.993 was obtained, and for ssDNA/PANHS/RGO/GCE, a linear range from 1.0×10(-18) mol L(-1) to 1.0×10(-10) mol L(-1) with a correlation coefficient of 0.985. In addition, the biosensors were satisfactorily applied to discriminating complementary sequences from noncomplementary sequences, so they can be used for the detection of BRCA1-associated breast cancer. Copyright © 2015. Published by Elsevier B.V.

  5. Genes with stable DNA methylation levels show higher evolutionary conservation than genes with fluctuant DNA methylation levels.

    PubMed

    Zhang, Ruijie; Lv, Wenhua; Luan, Meiwei; Zheng, Jiajia; Shi, Miao; Zhu, Hongjie; Li, Jin; Lv, Hongchao; Zhang, Mingming; Shang, Zhenwei; Duan, Lian; Jiang, Yongshuai

    2015-11-24

    Different human genes often exhibit different degrees of stability in their DNA methylation levels between tissues, samples or cell types. This may be related to the evolution of human genome. Thus, we compared the evolutionary conservation between two types of genes: genes with stable DNA methylation levels (SM genes) and genes with fluctuant DNA methylation levels (FM genes). For long-term evolutionary characteristics between species, we compared the percentage of the orthologous genes, evolutionary rate dn/ds and protein sequence identity. We found that the SM genes had greater percentages of the orthologous genes, lower dn/ds, and higher protein sequence identities in all the 21 species. These results indicated that the SM genes were more evolutionarily conserved than the FM genes. For short-term evolutionary characteristics among human populations, we compared the single nucleotide polymorphism (SNP) density, and the linkage disequilibrium (LD) degree in HapMap populations and 1000 genomes project populations. We observed that the SM genes had lower SNP densities, and higher degrees of LD in all the 11 HapMap populations and 13 1000 genomes project populations. These results mean that the SM genes had more stable chromosome genetic structures, and were more conserved than the FM genes.

  6. Comparison of internal transcribed spacers and intergenic spacer regions of five common Iranian sheep bursate nematodes.

    PubMed

    Nabavi, Reza; Conneely, Brendan; McCarthy, Elaine; Good, Barbara; Shayan, Parviz; DE Waal, Theo

    2014-09-01

    Accurate identification of sheep nematodes is a critical point in epidemiological studies and monitoring of drug resistance in flocks. However, due to a close morphological similarity between the eggs and larval stages of many of these nematodes, such identification is not a trivial task. There are a number of studies showing that molecular targets in ribosomal DNA (internal transcribed spacers 1 and 2 and the intergenic spacer) are suitable for accurate identification of sheep bursate nematodes. The objective of the present study was to compare the ITS1, ITS2 and IGS regions of common Iranian bursate nematodes in order to choose the best target for specific identification methods. The first and second internal transcribed spacers (ITS1 and ITS2) and intergenic spacer (IGS) of the ribosomal DNA (rDNA) of 5 common Iranian bursate nematodes of sheep were sequenced. The sequences of some non-Iranian isolates were used for comparison in order to evaluate the variation in sequence homology between geographically different nematode populations. Comparison of the ITS1 and ITS2 sequences of Iranian nematodes showed the greatest similarity between Teladorsagia circumcincta and Marshallagia marshalli (94% and 88%, respectively), while Trichostrongylus colubriformis and M. marshalli showed the highest homology (99%) in the IGS sequences. Comparison of the spacer sequences of Iranian with non-Iranian isolates showed significantly higher variation in Haemonchus contortus compared to the other species. Both the ITS1 and ITS2 sequences are convenient targets for species-specific identification of Iranian bursate nematodes. On the other hand, the IGS region may be a less suitable molecular target.

  7. Laser mass spectrometry for DNA sequencing, disease diagnosis, and fingerprinting

    NASA Astrophysics Data System (ADS)

    Chen, C. H. Winston; Taranenko, N. I.; Zhu, Y. F.; Chung, C. N.; Allman, S. L.

    1997-05-01

    Since laser mass spectrometry has the potential for achieving very fast DNA analysis, we recently applied it to DNA sequencing, DNA typing for fingerprinting, and DNA screening for disease diagnosis. Two different approaches for sequencing DNA have been successfully demonstrated. One is to sequence DNA with DNA ladders produced from Sanger's enzymatic method. The other is to do direct sequencing without DNA ladders. The need for quick DNA typing for identification purposes is critical for forensic application. Our preliminary results indicate laser mass spectrometry can possibly be used for rapid DNA fingerprinting applications at a much lower cost than gel electrophoresis. Population screening for certain genetic diseases can be a very efficient step toward reducing medical costs through prevention. Since laser mass spectrometry can provide very fast DNA analysis, we applied laser mass spectrometry to disease diagnosis. Clinical samples with both base deletion and point mutation have been tested with complete success.

  8. A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region.

    PubMed

    Kress, W John; Erickson, David L

    2007-06-06

    A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.

  9. Isolation and sequence characterization of DNA-A genome of a new begomovirus strain associated with severe leaf curling symptoms of Jatropha curcas L.

    PubMed

    Chauhan, Sushma; Rahman, Hifzur; Mastan, Shaik G; Pamidimarri, D V N Sudheer; Reddy, Muppala P

    2018-07-20

    Begomoviruses, which belong to the family Geminiviridae, are associated with several disease symptoms, such as mosaic and leaf curling in Jatropha curcas. The molecular characterization of these viral strains will help in developing management strategies to control the disease. In this study, J. curcas plants that were infected with begomovirus and showed acute leaf curling symptoms were identified. The DNA-A segment from the pathogenic viral strain was isolated and sequenced. The sequenced genome was assembled and characterized in detail. The full-length DNA-A sequence was covered by primer walking. The genome sequence showed the general organization of DNA-A from begomovirus by the distribution of ORFs in both viral and anti-viral strands. The genome size ranged from 2844 bp to 2852 bp. Three strains with minor nucleotide variations were identified, and a phylogenetic analysis was performed by comparing the DNA-A segments from other reported begomovirus isolates. The maximum sequence similarity was observed with Euphorbia yellow mosaic virus (FN435995). In the phylogenetic tree, no clustering was observed with previously reported begomovirus strains isolated from the J. curcas host. The strains isolated in this study belong to a new begomoviral strain that elicits symptoms of leaf curling in J. curcas. The results indicate that the probable origin of the strains is Jatropha mosaic virus infecting J. gossypiifolia. The strains isolated in this study are referred to as Jatropha curcas leaf curl India virus (JCLCIV) based on the major symptoms exhibited by the host J. curcas. Copyright © 2018 Elsevier B.V. All rights reserved.

  10. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Technical Reports Server (NTRS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
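
    The n-gram entropy measurement in point (ii) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the Shannon entropy of overlapping n-grams in bits, and it assumes redundancy is defined as one minus the ratio of that entropy to its maximum for a four-letter alphabet (n times log2 4). Both function names are hypothetical.

```python
import math
from collections import Counter


def ngram_entropy(seq: str, n: int) -> float:
    """Shannon entropy (bits) of the overlapping n-gram distribution."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_redundancy(seq: str, n: int, alphabet_size: int = 4) -> float:
    """Assumed definition: 1 - H_n / (n * log2(alphabet_size))."""
    max_entropy = n * math.log2(alphabet_size)
    return 1.0 - ngram_entropy(seq, n) / max_entropy
```

    Under these assumptions a lower n-gram entropy gives a larger redundancy, which is the direction of the coding versus noncoding comparison reported above. Estimates from short sequences are biased because many n-grams are never observed, one of the intrinsic limitations the abstract alludes to.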

  11. A Sequence-Dependent DNA Condensation Induced by Prion Protein

    PubMed Central

    2018-01-01

    Different studies indicated that the prion protein induces hybridization of complementary DNA strands. Cell culture studies showed that the scrapie isoform of prion protein remained bound with the chromosome. In the present work, we used an oxazole dye, YOYO, as a reporter for quantitative characterization of the DNA condensation by prion protein. We observe that the prion protein induces greater fluorescence quenching of YOYO intercalated in DNA containing only GC bases compared to the DNA containing four bases, whereas the effect on dye bound to DNA containing only AT bases is marginal. DNA-condensing biological polyamines are less effective than prion protein in quenching of DNA-bound YOYO fluorescence. The prion protein induces marginal quenching of fluorescence of the dye bound to oligonucleotides, which are resistant to condensation. The ultrastructural studies with an electron microscope also validate the biophysical data. The GC bases of the target DNA are probably responsible for increased condensation in the presence of prion protein. To our knowledge, this is the first report of a human cellular protein inducing a sequence-dependent DNA condensation. The increased condensation of GC-rich DNA by prion protein may suggest a biological function of the prion protein and a role in its pathogenesis. PMID:29657864

  12. A Sequence-Dependent DNA Condensation Induced by Prion Protein.

    PubMed

    Bera, Alakesh; Biring, Sajal

    2018-01-01

    Different studies indicated that the prion protein induces hybridization of complementary DNA strands. Cell culture studies showed that the scrapie isoform of prion protein remained bound with the chromosome. In the present work, we used an oxazole dye, YOYO, as a reporter for quantitative characterization of the DNA condensation by prion protein. We observe that the prion protein induces greater fluorescence quenching of YOYO intercalated in DNA containing only GC bases compared to the DNA containing four bases, whereas the effect on dye bound to DNA containing only AT bases is marginal. DNA-condensing biological polyamines are less effective than prion protein in quenching of DNA-bound YOYO fluorescence. The prion protein induces marginal quenching of fluorescence of the dye bound to oligonucleotides, which are resistant to condensation. The ultrastructural studies with an electron microscope also validate the biophysical data. The GC bases of the target DNA are probably responsible for increased condensation in the presence of prion protein. To our knowledge, this is the first report of a human cellular protein inducing a sequence-dependent DNA condensation. The increased condensation of GC-rich DNA by prion protein may suggest a biological function of the prion protein and a role in its pathogenesis.

  13. Identification of species based on DNA barcode using k-mer feature vector and Random forest classifier.

    PubMed

    Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, A R

    2016-11-05

    DNA barcoding is a molecular diagnostic method that allows automated and accurate identification of species based on a short and standardized fragment of DNA. To this end, an attempt has been made in this study to develop a computational approach for identifying a species by comparing its barcode with the barcode sequences of known species present in the reference library. Each barcode sequence was first mapped onto a numeric feature vector based on k-mer frequencies, and then the Random forest methodology was employed on the transformed dataset for species identification. The proposed approach outperformed similarity-based, tree-based and diagnostic-based approaches and was found comparable with existing supervised learning based approaches in terms of species identification success rate, when compared using real and simulated datasets. Based on the proposed approach, an online web interface, SPIDBAR, has also been developed and made freely available at http://cabgrid.res.in:8080/spidbar/ for species identification by taxonomists. Copyright © 2016 Elsevier B.V. All rights reserved.
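
    The first step described above, mapping a barcode onto a numeric k-mer frequency vector, can be sketched as follows. This is a minimal illustration under stated assumptions, not the SPIDBAR implementation: the function name, the default k, the normalization to relative frequencies, and the choice to ignore k-mers containing ambiguous bases are all hypothetical. The classification step is not shown.

```python
from collections import Counter
from itertools import product
from typing import List


def kmer_feature_vector(seq: str, k: int = 2) -> List[float]:
    """Relative frequency of each of the 4**k DNA k-mers, in lexicographic order.

    k-mers containing characters other than A, C, G, T are ignored (assumption).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]
```

    Each barcode becomes a fixed-length vector regardless of its sequence length, so vectors from a labeled reference library could then be passed to any standard classifier, such as a random forest.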

  14. Micronuclear DNA of Oxytricha nova contains sequences with autonomously replicating activity in Saccharomyces cerevisiae.

    PubMed Central

    Colombo, M M; Swanton, M T; Donini, P; Prescott, D M

    1984-01-01

    Oxytricha nova is a hypotrichous ciliate with micronuclei and macronuclei. Micronuclei, which contain large, chromosomal-sized DNA, are genetically inert but undergo meiosis and exchange during cell mating. Macronuclei, which contain only small, gene-sized DNA molecules, provide all of the nuclear RNA needed to run the cell. After cell mating the macronucleus is derived from a micronucleus, a derivation that includes excision of the genes from chromosomes and elimination of the remaining DNA. The eliminated DNA includes all of the repetitious sequences and approximately 95% of the unique sequences. We cloned large restriction fragments from the micronucleus that confer replication ability on a replication-deficient plasmid in Saccharomyces cerevisiae. Sequences that confer replication ability are called autonomously replicating sequences. The frequency and effectiveness of autonomously replicating sequences in micronuclear DNA are similar to those reported for DNAs of other organisms introduced into yeast cells. Of the 12 micronuclear fragments with autonomously replicating sequence activity, 9 also showed homology to macronuclear DNA, indicating that they contain a macronuclear gene sequence. We conclude from this that autonomously replicating sequence activity is nonrandomly distributed throughout micronuclear DNA and is preferentially associated with those regions of micronuclear DNA that contain genes. PMID:6092934

  15. DNA sequence-dependent mechanics and protein-assisted bending in repressor-mediated loop formation

    PubMed Central

    Boedicker, James Q.; Garcia, Hernan G.; Johnson, Stephanie; Phillips, Rob

    2014-01-01

    As the chief informational molecule of life, DNA is subject to extensive physical manipulations. The energy required to deform double-helical DNA depends on sequence, and this mechanical code of DNA influences gene regulation, such as through nucleosome positioning. Here we examine the sequence-dependent flexibility of DNA in bacterial transcription factor-mediated looping, a context for which the role of sequence remains poorly understood. Using a suite of synthetic constructs repressed by the Lac repressor and two well-known sequences that show large flexibility differences in vitro, we make precise statistical mechanical predictions as to how DNA sequence influences loop formation and test these predictions using in vivo transcription and in vitro single-molecule assays. Surprisingly, sequence-dependent flexibility does not affect in vivo gene regulation. By theoretically and experimentally quantifying the relative contributions of sequence and the DNA-bending protein HU to DNA mechanical properties, we reveal that bending by HU dominates DNA mechanics and masks intrinsic sequence-dependent flexibility. Such a quantitative understanding of how mechanical regulatory information is encoded in the genome will be a key step towards a predictive understanding of gene regulation at single-base pair resolution. PMID:24231252

  16. A compositional segmentation of the human mitochondrial genome is related to heterogeneities in the guanine mutation rate

    PubMed Central

    Samuels, David C.; Boys, Richard J.; Henderson, Daniel A.; Chinnery, Patrick F.

    2003-01-01

    We applied a hidden Markov model segmentation method to the human mitochondrial genome to identify patterns in the sequence, to compare these patterns to the gene structure of mtDNA and to see whether these patterns reveal additional characteristics important for our understanding of genome evolution, structure and function. Our analysis identified three segmentation categories based upon the sequence transition probabilities. Category 2 segments corresponded to the tRNA and rRNA genes, with a greater strand-symmetry in these segments. Category 1 and 3 segments covered the protein-coding genes and almost all of the non-coding D-loop. Compared to category 1, the mtDNA segments assigned to category 3 had much lower guanine abundance. A comparison to two independent databases of mitochondrial mutations and polymorphisms showed that the high substitution rate of guanine in human mtDNA is largest in the category 3 segments. Analysis of synonymous mutations showed the same pattern. This suggests that this heterogeneity in the mutation rate is partly independent of respiratory chain function and is a direct property of the genome sequence itself. This has important implications for our understanding of mtDNA evolution and its use as a ‘molecular clock’ to determine the rate of population and species divergence. PMID:14530452

  17. Whole genome nucleosome sequencing identifies novel types of forensic markers in degraded DNA samples

    PubMed Central

    Dong, Chun-nan; Yang, Ya-dong; Li, Shu-jin; Yang, Ya-ran; Zhang, Xiao-jing; Fang, Xiang-dong; Yan, Jiang-wei; Cong, Bin

    2016-01-01

    In the case of mass disasters, missing persons and forensic casework, highly degraded biological samples are often encountered. It can be a challenge to analyze and interpret the DNA profiles from these samples. Here we provide a new strategy to solve the problem by taking advantage of the intrinsic structural properties of DNA. We have assessed the in vivo positions of more than 35 million putative nucleosome cores in human leukocytes using high-throughput whole genome sequencing, and identified 2,462 single nucleotide variations (SNVs) and 128 insertion-deletion polymorphisms (indels). After comparing the sequence reads with 44 STR loci commonly used in forensics, five STRs (TH01, TPOX, D18S51, DYS391, and D10S1248) were matched. We compared these “nucleosome protected STRs” (NPSTRs) with five other non-NPSTRs using mini-STR primer design, real-time PCR, and capillary gel electrophoresis on artificially degraded DNA. Moreover, genotyping performance of the five NPSTRs and five non-NPSTRs was also tested with real casework samples. All results show that loci located in nucleosomes are more likely to be successfully genotyped in degraded samples. In conclusion, after further strict validation, these markers could be incorporated into future forensic and paleontology identification kits, resulting in higher discriminatory power for certain degraded sample types. PMID:27189082

  18. Structure of CARB-4 and AER-1 Carbenicillin-Hydrolyzing β-Lactamases

    PubMed Central

    Sanschagrin, François; Bejaoui, Noureddine; Levesque, Roger C.

    1998-01-01

    We determined the nucleotide sequences of blaCARB-4 encoding CARB-4 and deduced a polypeptide of 288 amino acids. The gene was characterized as a variant of group 2c carbenicillin-hydrolyzing β-lactamases such as PSE-4, PSE-1, and CARB-3. The level of DNA homology between the bla genes for these β-lactamases varied from 98.7 to 99.9%, while that between these genes and blaCARB-4 encoding CARB-4 was 86.3%. The blaCARB-4 gene was acquired from some other source because it has a G+C content of 39.1%, compared to a G+C content of 67% for typical Pseudomonas aeruginosa genes. DNA sequencing revealed that blaAER-1 shared 60.8% DNA identity with blaPSE-3 encoding PSE-3. The deduced AER-1 β-lactamase peptide was compared to class A, B, C, and D enzymes and had 57.6% identity with PSE-3, including an STHK tetrad at the active site. For CARB-4 and AER-1, conserved canonical amino acid boxes typical of class A β-lactamases were identified in a multiple alignment. Analysis of the DNA sequences flanking blaCARB-4 and blaAER-1 confirmed the importance of gene cassettes acquired via integrons in bla gene distribution. PMID:9687391
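    The G+C comparison in the abstract above (39.1% for blaCARB-4 against 67% for typical Pseudomonas aeruginosa genes) rests on a simple base-composition calculation. The following is a minimal Python sketch of that calculation, not the authors' method; the function name and the toy sequence are hypothetical, and skipping ambiguous symbols such as N is an assumption made here.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Assumption: only unambiguous bases (A, C, G, T) are counted;
    any other symbol (e.g. N or a gap) is ignored.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


if __name__ == "__main__":
    toy_gene = "ATGCATTTAAGC"  # hypothetical toy sequence, not real data
    print(f"G+C content: {gc_content(toy_gene):.1f}%")
```

    A gene whose value falls far from the host genome's typical G+C percentage is a candidate for horizontal acquisition, which is the inference the abstract draws.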

  19. Divergent nuclear 18S rDNA paralogs in a turkey coccidium, Eimeria meleagrimitis, complicate molecular systematics and identification.

    PubMed

    El-Sherry, Shiem; Ogedengbe, Mosun E; Hafeez, Mian A; Barta, John R

    2013-07-01

    Multiple 18S rDNA sequences were obtained from two single-oocyst-derived lines of each of Eimeria meleagrimitis and Eimeria adenoeides. After analysing the 15 new 18S rDNA sequences from two lines of E. meleagrimitis and 17 new sequences from two lines of E. adenoeides, there were clear indications that divergent, paralogous 18S rDNA copies existed within the nuclear genome of E. meleagrimitis. In contrast, mitochondrial cytochrome c oxidase subunit I (COI) partial sequences from all lines of a particular Eimeria sp. were identical and, in phylogenetic analyses, COI sequences clustered unambiguously in monophyletic and highly-supported clades specific to individual Eimeria sp. Phylogenetic analysis of the new 18S rDNA sequences from E. meleagrimitis showed that they formed two distinct clades: Type A with four new sequences; and Type B with nine new sequences; both Types A and B sequences were obtained from each of the single-oocyst-derived lines of E. meleagrimitis. Together these rDNA types formed a well-supported E. meleagrimitis clade. Types A and B 18S rDNA sequences from E. meleagrimitis had a mean sequence identity of only 97.4% whereas mean sequence identity within types was 99.1-99.3%. The observed intraspecific sequence divergence among E. meleagrimitis 18S rDNA sequence types was even higher (approximately 2.6%) than the interspecific sequence divergence present between some well-recognized species such as Eimeria tenella and Eimeria necatrix (1.1%). Our observations suggest that, unlike COI sequences, 18S rDNA sequences are not reliable molecular markers to be used alone for species identification with coccidia, although 18S rDNA sequences have clear utility for phylogenetic reconstruction of apicomplexan parasites at the genus and higher taxonomic ranks. Copyright © 2013. Published by Elsevier Ltd.
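    The identity figures quoted above (97.4% between types, 99.1-99.3% within types) are pairwise identities over aligned sequences. As a rough illustration only, the Python sketch below computes percent identity for two pre-aligned sequences of equal length. The function name is hypothetical, and excluding every column that contains a gap in either sequence is an assumption; published values depend on how the authors handled gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: columns with a gap in either sequence are excluded
    from both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, not real 18S rDNA data.
    print(percent_identity("ACGT-A", "ACGAAA"))
```

    Sequence divergence in the sense used by the abstract is then 100 minus the identity.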

  20. Phylum- and Class-Specific PCR Primers for General Microbial Community Analysis

    PubMed Central

    Blackwood, Christopher B.; Oaks, Adam; Buyer, Jeffrey S.

    2005-01-01

    Amplification of a particular DNA fragment from a mixture of organisms by PCR is a common first step in methods of examining microbial community structure. The use of group-specific primers in community DNA profiling applications can provide enhanced sensitivity and phylogenetic detail compared to domain-specific primers. Other uses for group-specific primers include quantitative PCR and library screening. The purpose of the present study was to develop several primer sets targeting commonly occurring and important groups. Primers specific for the 16S ribosomal sequences of Alphaproteobacteria, Betaproteobacteria, Bacilli, Actinobacteria, and Planctomycetes and for parts of both the 18S ribosomal sequence and the internal transcribed spacer region of Basidiomycota were examined. Primers were tested by comparison to sequences in the ARB 2003 database, and chosen primers were further tested by cloning and sequencing from soil community DNA. Eighty-five to 100% of the sequences obtained from clone libraries were found to be placed with the groups intended as targets, demonstrating the specificity of the primers under field conditions. It will be important to reevaluate primers over time because of the continual growth of sequence databases and revision of microbial taxonomy. PMID:16204538

  1. The Comprehensive Phytopathogen Genomics Resource: a web-based resource for data-mining plant pathogen genomes.

    PubMed

    Hamilton, John P; Neeno-Eckwall, Eric C; Adhikari, Bishwo N; Perna, Nicole T; Tisserat, Ned; Leach, Jan E; Lévesque, C André; Buell, C Robin

    2011-01-01

    The Comprehensive Phytopathogen Genomics Resource (CPGR) provides a web-based portal for plant pathologists and diagnosticians to view the genome and transcriptome sequence status of 806 bacterial, fungal, oomycete, nematode, viral and viroid plant pathogens. Tools are available to search and analyze annotated genome sequences of 74 bacterial, fungal and oomycete pathogens. Oomycete and fungal genomes are obtained directly from GenBank, whereas bacterial genome sequences are downloaded from the A Systematic Annotation Package (ASAP) database that provides curation of genomes using comparative approaches. Curated lists of bacterial genes relevant to pathogenicity and avirulence are also provided. The Plant Pathogen Transcript Assemblies Database provides annotated assemblies of the transcribed regions of 82 eukaryotic genomes from publicly available single pass Expressed Sequence Tags. Data-mining tools are provided along with tools to create candidate diagnostic markers, an emerging use for genomic sequence data in plant pathology. The Plant Pathogen Ribosomal DNA (rDNA) database is a resource for pathogens that lack genome or transcriptome data sets and contains 131 755 rDNA sequences from GenBank for 17 613 species identified as plant pathogens and related genera. Database URL: http://cpgr.plantbiology.msu.edu.

  2. Characterization of Urtica dioica agglutinin isolectins and the encoding gene family.

    PubMed

    Does, M P; Ng, D K; Dekker, H L; Peumans, W J; Houterman, P M; Van Damme, E J; Cornelissen, B J

    1999-01-01

    Urtica dioica agglutinin (UDA) has previously been found in roots and rhizomes of stinging nettles as a mixture of UDA-isolectins. Protein and cDNA sequencing have shown that mature UDA is composed of two hevein domains and is processed from a precursor protein. The precursor contains a signal peptide, two in-tandem hevein domains, a hinge region and a carboxyl-terminal chitinase domain. Genomic fragments encoding precursors for UDA-isolectins have been amplified by five independent polymerase chain reactions on genomic DNA from stinging nettle ecotype Weerselo. One amplified gene was completely sequenced. As compared to the published cDNA sequence, the genomic sequence contains, besides two basepair substitutions, two introns located at the same positions as in other plant chitinases. By partial sequence analysis of 40 amplified genes, 16 different genes were identified which encode seven putative UDA-isolectins. The deduced amino acid sequences share 78.9-98.9% identity. In extracts of roots and rhizomes of stinging nettle ecotype Weerselo six out of these seven isolectins were detected by mass spectrometry. One of them is an acidic form, which has not been identified before. Our results demonstrate that UDA is encoded by a large gene family.

  3. Identification of maca (Lepidium meyenii Walp.) and its adulterants by a DNA-barcoding approach based on the ITS sequence.

    PubMed

    Chen, Jin-Jin; Zhao, Qing-Sheng; Liu, Yi-Lan; Zha, Sheng-Hua; Zhao, Bing

    2015-09-01

    Maca (Lepidium meyenii) is an herbaceous plant that grows in high plateaus and has been used as both food and folk medicine for centuries because of its benefits to human health. In the present study, ITS (internal transcribed spacer) sequences of forty-three maca samples, collected from different regions or vendors, were amplified and analyzed. The ITS sequences of nineteen potential adulterants of maca were also collected and analyzed. The results indicated that the ITS sequence of maca was consistent in all samples and unique when compared with its adulterants. Therefore, this DNA-barcoding approach based on the ITS sequence can be used for the molecular identification of maca and its adulterants. Copyright © 2015 China Pharmaceutical University. Published by Elsevier B.V. All rights reserved.

  4. Land, language, and loci: mtDNA in Native Americans and the genetic history of Peru.

    PubMed

    Lewis, Cecil M; Tito, Raúl Y; Lizárraga, Beatriz; Stone, Anne C

    2005-07-01

    Despite a long history of complex societies and despite extensive present-day linguistic and ethnic diversity, relatively few populations in Peru have been sampled for population genetic investigations. In order to address questions about the relationships between South American populations and about the extent of correlation between genetic distance, language, and geography in the region, mitochondrial DNA (mtDNA) hypervariable region I sequences and mtDNA haplogroup markers were examined in 33 individuals from the state of Ancash, Peru. These sequences were compared to those from 19 American Indian populations using diversity estimates, AMOVA tests, mismatch distributions, a multidimensional scaling plot, and regressions. The results show correlations between genetics, linguistics, and geographical affinities, with stronger correlations between genetics and language. Additionally, the results suggest a pattern of differential gene flow and drift in western vs. eastern South America, supporting previous mtDNA and Y chromosome investigations. (c) 2004 Wiley-Liss, Inc

  5. Pyrosequencing analysis for detection of a BRAFV600E mutation in an FNAB specimen of thyroid nodules.

    PubMed

    Kim, Suk Kyeong; Kim, Dong-Lim; Han, Hye Seung; Kim, Wan Seop; Kim, Seung Ja; Moon, Won Jin; Oh, Seo Young; Hwang, Tae Sook

    2008-06-01

    Fine-needle aspiration biopsy (FNAB) is the primary means of distinguishing benign from malignant and of guiding therapeutic intervention in thyroid nodules. However, 10% to 30% of cases with indeterminate cytology in FNAB need other diagnostic tools to refine diagnosis. We compared the pyrosequencing method with the conventional direct DNA sequencing analysis and investigated the usefulness of preoperative BRAF mutation analysis as an adjunct diagnostic tool with routine FNAB. A total of 103 surgically confirmed patients' FNA slides were recruited and DNA was extracted after atypical cells were scraped from the slides. BRAF mutation was analyzed by pyrosequencing and direct DNA sequencing. Sixty-three (77.8%) of 81 histopathologically diagnosed malignant nodules revealed positive BRAF mutation on pyrosequencing analysis. In detail, 63 (84.0%) of 75 papillary thyroid carcinoma (PTC) samples showed positive BRAF mutation, whereas 3 follicular thyroid carcinomas, 1 anaplastic carcinoma, 1 medullary thyroid carcinoma, and 1 metastatic lung carcinoma did not show BRAF mutation. None of 22 benign nodules had BRAF mutation in both pyrosequencing and direct DNA sequencing. Out of 27 thyroid nodules classified as 'indeterminate' on cytologic examination preoperatively, 21 (77.8%) cases turned out to be malignant: 18 PTCs (including 2 follicular variant types) and 3 follicular thyroid carcinomas. Among these, 13 (61.9%) classic PTCs had BRAF mutation. None of 6 benign nodules, including 3 follicular adenomas and 3 nodular hyperplasias, had BRAF mutation. Among 63 PTCs with positive BRAF mutation detected by pyrosequencing analysis, 3 cases did not show BRAF mutation by direct DNA sequencing. Although it was not statistically significant, pyrosequencing was superior to direct DNA sequencing in detecting the BRAF mutation of thyroid nodules (P=0.25). 
    Detecting BRAF mutation by pyrosequencing is more sensitive, faster, and less expensive than direct DNA sequencing and is proposed as an adjunct diagnostic tool in evaluating thyroid nodules of indeterminate cytology.

  6. DNA methylation of retrotransposons, DNA transposons and genes in sugar beet (Beta vulgaris L.).

    PubMed

    Zakrzewski, Falk; Schmidt, Martin; Van Lijsebettens, Mieke; Schmidt, Thomas

    2017-06-01

    The methylation of cytosines shapes the epigenetic landscape of plant genomes, coordinates transgenerational epigenetic inheritance, represses the activity of transposable elements (TEs), affects gene expression and, hence, can influence the phenotype. Sugar beet (Beta vulgaris ssp. vulgaris), an important crop that accounts for 30% of worldwide sugar needs, has a relatively small genome size (758 Mbp) consisting of approximately 485 Mbp repetitive DNA (64%), in particular satellite DNA, retrotransposons and DNA transposons. Genome-wide cytosine methylation in the sugar beet genome was studied in leaves and leaf-derived callus with a focus on repetitive sequences, including retrotransposons and DNA transposons, the major groups of repetitive DNA sequences, and compared with gene methylation. Genes showed a specific methylation pattern for CG, CHG (H = A, C, and T) and CHH sites, whereas the TE pattern differed, depending on the TE class (class 1, retrotransposons and class 2, DNA transposons). Along genes and TEs, CG and CHG methylation was higher than that of adjacent genomic regions. In contrast to the relatively low CHH methylation in retrotransposons and genes, the level of CHH methylation in DNA transposons was strongly increased, pointing to a functional role of asymmetric methylation in DNA transposon silencing. Comparison of genome-wide DNA methylation between sugar beet leaves and callus revealed a differential methylation upon tissue culture. Potential epialleles were hypomethylated (lower methylation) at CG and CHG sites in retrotransposons and genes and hypermethylated (higher methylation) at CHH sites in DNA transposons of callus when compared with leaves. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
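    The three sequence contexts named above (CG, CHG and CHH, with H = A, C or T) can be assigned by looking at the two bases downstream of each cytosine. The Python sketch below is an illustration only and is not taken from the study; the function names and toy sequence are hypothetical, and it scans the forward strand only, whereas a real methylation analysis would also consider the reverse strand and per-site read counts.

```python
from collections import Counter
from typing import Optional

H_BASES = set("ACT")  # H = A, C or T (anything but G)


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at position i as 'CG', 'CHG' or 'CHH'.

    Returns None if position i is not a C, or if the downstream bases
    are missing or ambiguous (assumption: such sites are skipped).
    """
    seq = seq.upper()
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CG"
    if seq[i + 1] not in H_BASES or i + 2 >= len(seq):
        return None
    if seq[i + 2] == "G":
        return "CHG"
    if seq[i + 2] in H_BASES:
        return "CHH"
    return None


def count_contexts(seq: str) -> Counter:
    """Count forward-strand cytosines per context."""
    contexts = (cytosine_context(seq, i) for i in range(len(seq)))
    return Counter(c for c in contexts if c is not None)


if __name__ == "__main__":
    toy = "CGCAGCTTC"  # hypothetical toy sequence
    print(count_contexts(toy))
```

    CG and CHG sites are symmetric (a cytosine sits opposite on the other strand), whereas CHH sites are asymmetric, which is the sense of "asymmetric methylation" in the abstract.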

  7. Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing.

    PubMed

    Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas

    2009-06-01

    The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.

  8. Comparative genomics of 9 novel Paenibacillus larvae bacteriophages

    PubMed Central

    Stamereilers, Casey; LeBlanc, Lucy; Yost, Diane; Amy, Penny S.; Tsourkas, Philippos K.

    2016-01-01

    American Foulbrood Disease, caused by the bacterium Paenibacillus larvae, is one of the most destructive diseases of the honeybee, Apis mellifera. Our group recently published the sequences of 9 new phages with the ability to infect and lyse P. larvae. Here, we characterize the genomes of these P. larvae phages, compare them to each other and to other sequenced P. larvae phages, and putatively identify protein function. The phage genomes are 38–45 kb in size and contain 68–86 genes, most of which appear to be unique to P. larvae phages. We classify P. larvae phages into 2 main clusters and one singleton based on nucleotide sequence identity. Three of the new phages show sequence similarity to other sequenced P. larvae phages, while the remaining 6 do not. We identified functions for roughly half of the P. larvae phage proteins, including structural, assembly, host lysis, DNA replication/metabolism, regulatory, and host-related functions. Structural and assembly proteins are highly conserved among our phages and are located at the start of the genome. DNA replication/metabolism, regulatory, and host-related proteins are located in the middle and end of the genome, and are not conserved, with many of these genes found in some of our phages but not others. All nine phages code for a conserved N-acetylmuramoyl-L-alanine amidase. Comparative analysis showed the phages use the “cohesive ends with 3′ overhang” DNA packaging strategy. This work is the first in-depth study of P. larvae phage genomics, and serves as a marker for future work in this area. PMID:27738559

  9. Molecular Characterization of Transgenic Events Using Next Generation Sequencing Approach.

    PubMed

    Guttikonda, Satish K; Marri, Pradeep; Mammadov, Jafar; Ye, Liang; Soe, Khaing; Richey, Kimberly; Cruse, James; Zhuang, Meibao; Gao, Zhifang; Evans, Clive; Rounsley, Steve; Kumpatla, Siva P

    2016-01-01

    Demand for the commercial use of genetically modified (GM) crops has been increasing in light of the projected growth of world population to nine billion by 2050. A prerequisite of paramount importance for regulatory submissions is the rigorous safety assessment of GM crops. One of the components of safety assessment is molecular characterization at the DNA level, which helps to determine the copy number, integrity and stability of a transgene; characterize the integration site within a host genome; and confirm the absence of vector DNA. Historically, molecular characterization has been carried out using Southern blot analysis coupled with Sanger sequencing. While this is a robust approach to characterizing transgenic crops, it is both time- and resource-consuming. The emergence of next-generation sequencing (NGS) technologies has provided a highly sensitive, cost- and labor-effective alternative to traditional Southern blot analysis for molecular characterization. Herein, we have demonstrated the successful application of both whole genome sequencing and target capture sequencing approaches for the characterization of single and stacked transgenic events and compared the results and inferences with the traditional method with respect to key criteria required for regulatory submissions.

  10. Scaling features of noncoding DNA

    NASA Technical Reports Server (NTRS)

    Stanley, H. E.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.

    1999-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene, and utilize this fact to build a Coding Sequence Finder Algorithm, which uses statistical ideas to locate the coding regions of an unknown DNA sequence. Finally, we describe briefly some recent work adapting to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function, and reporting that noncoding regions in eukaryotes display a larger redundancy than coding regions. Specifically, we consider the possibility that this result is solely a consequence of nucleotide concentration differences as first noted by Bonhoeffer and his collaborators. We find that cytosine-guanine (CG) concentration does have a strong "background" effect on redundancy. However, we find that for the purine-pyrimidine binary mapping rule, which is not affected by the difference in CG concentration, the Shannon redundancy for the set of analyzed sequences is larger for noncoding regions compared to coding regions.
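    The purine-pyrimidine mapping and Shannon redundancy mentioned above can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the authors' procedure: the abstract gives no formula, so redundancy is taken here as 1 - H_n / n, where H_n is the entropy in bits of overlapping n-letter words and n bits is the maximum for a two-letter alphabet. All function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter

PURINES = set("AG")
PYRIMIDINES = set("CT")


def to_purine_pyrimidine(seq: str) -> str:
    """Map A/G to 'R' and C/T to 'Y'; other symbols are dropped (assumption)."""
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append("R")
        elif base in PYRIMIDINES:
            out.append("Y")
    return "".join(out)


def block_entropy(symbols: str, n: int) -> float:
    """Shannon entropy (bits) of overlapping n-letter words in `symbols`."""
    words = [symbols[i:i + n] for i in range(len(symbols) - n + 1)]
    if not words:
        raise ValueError("sequence shorter than word length n")
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(seq: str, n: int = 1) -> float:
    """Redundancy 1 - H_n / n for the binary purine-pyrimidine sequence.

    Assumed definition: n bits is the maximum entropy of an n-letter word
    over a two-letter alphabet, so 0 means no redundancy and 1 means a
    fully predictable sequence.
    """
    symbols = to_purine_pyrimidine(seq)
    return 1.0 - block_entropy(symbols, n) / n


if __name__ == "__main__":
    # Hypothetical toy sequences, far too short for a real estimate.
    print(redundancy("AGAGAGGA", n=1))
    print(redundancy("ACGTACGT", n=1))
```

    Because A and G map to the same symbol, as do C and T, a shift in C+G concentration does not by itself change the binary sequence statistics, which is why the abstract treats this mapping as unaffected by CG concentration differences. Reliable entropy estimates need sequences much longer than the number of possible words.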

  11. Classification of Plant Associated Bacteria Using RIF, a Computationally Derived DNA Marker

    PubMed Central

    Schneider, Kevin L.; Marrero, Glorimar; Alvarez, Anne M.; Presting, Gernot G.

    2011-01-01

    A DNA marker that distinguishes plant associated bacteria at the species level and below was derived by comparing six sequenced genomes of Xanthomonas, a genus that contains many important phytopathogens. This DNA marker comprises a portion of the dnaA replication initiation factor (RIF). Unlike the rRNA genes, dnaA is a single copy gene in the vast majority of sequenced bacterial genomes, and amplification of RIF requires genus-specific primers. In silico analysis revealed that RIF has equal or greater ability to differentiate closely related species of Xanthomonas than the widely used ribosomal intergenic spacer region (ITS). Furthermore, in a set of 263 Xanthomonas, Ralstonia and Clavibacter strains, the RIF marker was directly sequenced in both directions with a success rate approximately 16% higher than that for ITS. RIF frameworks for Xanthomonas, Ralstonia and Clavibacter were constructed using 682 reference strains representing different species, subspecies, pathovars, races, hosts and geographic regions, and contain a total of 109 different RIF sequences. RIF sequences showed subspecific groupings but did not place strains of X. campestris or X. axonopodis into currently named pathovars nor R. solanacearum strains into their respective races, confirming previous conclusions that pathovar and race designations do not necessarily reflect genetic relationships. The RIF marker also was sequenced for 24 reference strains from three genera in the Enterobacteriaceae: Pectobacterium, Pantoea and Dickeya. RIF sequences of 70 previously uncharacterized strains of Ralstonia, Clavibacter, Pectobacterium and Dickeya matched, or were similar to, those of known reference strains, illustrating the utility of the frameworks to classify bacteria below the species level and rapidly match unknown isolates to reference strains. 
    The RIF sequence frameworks are available at the online RIF database, RIFdb, and can be queried for diagnostic purposes with RIF sequences obtained from unknown strains in both chromatogram and FASTA format. PMID:21533033

  12. DNA capture and next-generation sequencing can recover whole mitochondrial genomes from highly degraded samples for human identification

    PubMed Central

    2013-01-01

    Background Mitochondrial DNA (mtDNA) typing can be a useful aid for identifying people from compromised samples when nuclear DNA is too damaged, degraded or below detection thresholds for routine short tandem repeat (STR)-based analysis. Standard mtDNA typing, focused on PCR amplicon sequencing of the control region (HVS I and HVS II), is limited by the resolving power of this short sequence, which misses up to 70% of the variation present in the mtDNA genome. Methods We used in-solution hybridisation-based DNA capture (using DNA capture probes prepared from modern human mtDNA) to recover mtDNA from post-mortem human remains in which the majority of DNA is both highly fragmented (<100 base pairs in length) and chemically damaged. The method ‘immortalises’ the finite quantities of DNA in valuable extracts as DNA libraries, which is followed by the targeted enrichment of endogenous mtDNA sequences and characterisation by next-generation sequencing (NGS). Results We sequenced whole mitochondrial genomes for human identification from samples where standard nuclear STR typing produced only partial profiles or demonstrably failed and/or where standard mtDNA hypervariable region sequences lacked resolving power. Multiple rounds of enrichment can substantially improve coverage and sequencing depth of mtDNA genomes from highly degraded samples. The application of this method has led to the reliable mitochondrial sequencing of human skeletal remains from unidentified World War Two (WWII) casualties approximately 70 years old and from archaeological remains (up to 2,500 years old). Conclusions This approach has potential applications in forensic science, historical human identification cases, archived medical samples, kinship analysis and population studies. In particular the methodology can be applied to any case, involving human or non-human species, where whole mitochondrial genome sequences are required to provide the highest level of maternal lineage discrimination. 
    Multiple rounds of in-solution hybridisation-based DNA capture can retrieve whole mitochondrial genome sequences from even the most challenging samples. PMID:24289217

  13. Contrasting population structure from nuclear intron sequences and mtDNA of humpback whales.

    PubMed

    Palumbi, S R; Baker, C S

    1994-05-01

    Powerful analyses of population structure require information from multiple genetic loci. To help develop a molecular toolbox for obtaining this information, we have designed universal oligonucleotide primers that span conserved intron-exon junctions in a wide variety of animal phyla. We test the utility of exon-primed, intron-crossing amplifications by analyzing the variability of actin intron sequences from humpback, blue, and bowhead whales and comparing the results with mitochondrial DNA (mtDNA) haplotype data. Humpback actin introns fall into two major clades that exist in different frequencies in different oceanic populations. It is surprising that Hawaii and California populations, which are very distinct in mtDNAs, are similar in actin intron alleles. This discrepancy between mtDNA and nuclear DNA results may be due either to differences in genetic drift in mitochondrial and nuclear genes or to preferential movement of males, which do not transmit mtDNA to offspring, between separate breeding grounds. Opposing mtDNA and nuclear DNA results can help clarify otherwise hidden patterns of structure in natural populations.

  14. DNA demethylation activates genes in seed maternal integument development in rice (Oryza sativa L.).

    PubMed

    Wang, Yifeng; Lin, Haiyan; Tong, Xiaohong; Hou, Yuxuan; Chang, Yuxiao; Zhang, Jian

    2017-11-01

    DNA methylation is an important epigenetic modification that regulates various plant developmental processes. Rice seed integument determines the seed size. However, the role of DNA methylation in its development remains largely unknown. Here, we report the first dynamic DNA methylomic profiling of rice maternal integument before and after pollination by using a whole-genome bisulfite deep sequencing approach. Analysis of DNA methylation patterns identified 4238 differentially methylated regions underpinning 4112 differentially methylated genes, including GW2, DEP1, RGB1 and numerous other regulators that participate in maternal integument development. Bisulfite Sanger sequencing and qRT-PCR of six differentially methylated genes revealed extensive DNA hypomethylation triggered by double fertilization at IAP compared with IBP, suggesting that DNA demethylation might be a key mechanism to activate numerous maternally controlled genes. The results presented here not only greatly expand the rice methylome dataset, but also shed novel insight into the regulatory roles of DNA methylation in rice seed maternal integument development. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  15. Nuclear and mitochondrial rDNA variability in Crinipellis perniciosa from different geographic origins and hosts.

    PubMed

    de Arruda, Maricília C C; Ferreira, Marisa A S V; Miller, Robert N G; Resende, Mário Lúcio V; Felipe, Maria Sueli S

    2003-01-01

    Genetic variability in Crinipellis perniciosa, the causal organism of witches' broom disease in Theobroma cacao, was determined in strains originating from T. cacao and other susceptible host species Heteropterys acutifolia and Solanum lycocarpum in Brazil, in order to clarify host specificity and geographical variability. RFLP analysis of the ribosomal DNA ITS regions (rDNA ITS), and the mitochondrial DNA small subunit ribosomal DNA gene (mtDNA SSU rDNA) did not reveal any genetic variability in 120 tested strains, possibly serving only as species level markers. Genetic variability was observed in the ribosomal DNA IGS spacer region, in terms of IGS size, RFLPs and sequence data. Phylogenetic analyses (using CLUSTAL W, PHYLIP and TREEVIEW) indicated considerable differences between C. perniciosa strains from T. cacao and those from H. acutifolia (85-86%) and S. lycocarpum (95-96%). Sequence differences also indicated that C. perniciosa from T. cacao in Bahia is less variable (98%) when compared to the pathogen on T. cacao in Amazonas (97-98%), perhaps reflecting a recent introduction to T. cacao in Bahia.

  16. Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome.

    PubMed

    Singh, Vinod Kumar; Krishnamachari, Annangarachari

    2016-09-01

    Genome-wide experimental studies in Saccharomyces cerevisiae reveal that an autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS-like patterns in the genome. However, only a few hundred of these sites act as replicating sites, and the rest are considered dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content- and context-based analysis was performed on a set of replicating ACS sequences that bind the origin-recognition complex (ORC), denoted ORC-ACS, and non-replicating ACS sequences (nrACS) that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence-dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. The analysis reveals that ORC-ACS show marked differences from nrACS in nucleotide composition and in the context features of their vicinity. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within their nucleosome-free region. Profound differences between ORC-ACS and nrACS were observed in conformational features such as DNA helical twist, inclination angle and stacking energy. The distribution of ACS motifs in the non-coding segments shows that ORC-ACS lie farther from the adjacent gene start position than nrACS, thereby providing an accessible environment for ORC proteins. Our attempt is novel in considering the contextual view of the ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.

  17. Genomic resources for songbird research and their use in characterizing gene expression during brain development

    PubMed Central

    Li, XiaoChing; Wang, Xiu-Jie; Tannenhauser, Jonathan; Podell, Sheila; Mukherjee, Piali; Hertel, Moritz; Biane, Jeremy; Masuda, Shoko; Nottebohm, Fernando; Gaasterland, Terry

    2007-01-01

    Vocal learning and neuronal replacement have been studied extensively in songbirds, but until recently, few molecular and genomic tools for songbird research existed. Here we describe new molecular/genomic resources developed in our laboratory. We made cDNA libraries from zebra finch (Taeniopygia guttata) brains at different developmental stages. A total of 11,000 cDNA clones from these libraries, representing 5,866 unique gene transcripts, were randomly picked and sequenced from the 3′ ends. A web-based database was established for clone tracking, sequence analysis, and functional annotations. Our cDNA libraries were not normalized. Sequencing ESTs without normalization produced many developmental stage-specific sequences, yielding insights into patterns of gene expression at different stages of brain development. In particular, the cDNA library made from brains at posthatching day 30–50, corresponding to the period of rapid song system development and song learning, has the most diverse and richest set of genes expressed. We also identified five microRNAs whose sequences are highly conserved between zebra finch and other species. We printed cDNA microarrays and profiled gene expression in the high vocal center of both adult male zebra finches and canaries (Serinus canaria). Genes differentially expressed in the high vocal center were identified from the microarray hybridization results. Selected genes were validated by in situ hybridization. Networks among the regulated genes were also identified. These resources provide songbird biologists with tools for genome annotation, comparative genomics, and microarray gene expression analysis. PMID:17426146

  18. Ancient DNA in human bone remains from Pompeii archaeological site.

    PubMed

    Cipollaro, M; Di Bernardo, G; Galano, G; Galderisi, U; Guarino, F; Angelini, F; Cascino, A

    1998-06-29

    aDNA extraction and amplification procedures have been optimized for Pompeian human bone remains whose diagenesis has been determined by histological analysis. Amplifications of single-copy genes (X and Y amelogenin loci and Y-specific alphoid repeat sequences) have been performed and the results compared with anthropometric data on sexing.

  19. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    PubMed

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for the sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information (total number of nucleotide bases and A, T, G, C base contents with their respective percentages) and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation, and it provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer, Random DNA Analyser; GUI, graphical user interface; XAML, Extensible Application Markup Language.
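    The record above names the Nussinov dynamic programming algorithm and reverse complement generation. As a rough illustration only, the sketch below implements the textbook Nussinov base-pair maximization for a DNA string in Python. It is not RDNAnalyzer's code (the tool is written in C# and extends the algorithm in ways the abstract does not describe); the function names and the MIN_LOOP parameter are assumptions made for this example.

```python
# Textbook Nussinov base-pair maximization applied to a DNA string.
# Illustrative sketch: not RDNAnalyzer's implementation.

PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}  # Watson-Crick pairs only
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def nussinov(seq):
    """Return (maximum number of pairs, dot-bracket structure) for seq."""
    seq = seq.upper()
    n = len(seq)
    # best[i][j] holds the maximum number of pairs within seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + MIN_LOOP + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    inside = best[i + 1][k - 1]
                    right = best[k + 1][j] if k + 1 <= j else 0
                    score = max(score, 1 + inside + right)
            best[i][j] = score

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + MIN_LOOP + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                inside = best[i + 1][k - 1]
                right = best[k + 1][j] if k + 1 <= j else 0
                if best[i][j] == 1 + inside + right:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    stack.append((k + 1, j))
                    break
    return (best[0][n - 1] if n else 0), "".join(structure)
```

    For example, nussinov("GGGAAAACCC") returns (3, "(((....)))") and reverse_complement("ATGC") returns "GCAT". The score counts pairs only; a more realistic predictor would weight pairs by stacking energies.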

  20. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula

    PubMed Central

    Macas, Jiří; Neumann, Pavel; Navrátilová, Alice

    2007-01-01

    Background Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massive parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). Results Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. 
These data provide a starting point for further investigations of legume plant genomes based on their global comparative analysis and for the development of more sophisticated approaches for data mining. PMID:18031571

  1. Ancestral sequence reconstruction in primate mitochondrial DNA: compositional bias and effect on functional inference.

    PubMed

    Krishnan, Neeraja M; Seligmann, Hervé; Stewart, Caro-Beth; De Koning, A P Jason; Pollock, David D

    2004-10-01

    Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. 
The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and flexibility. To determine whether biased reconstructions using optimization methods might affect inferences of functional properties, ancestral primate mitochondrial tRNA sequences were inferred and helix-forming propensities for conserved pairs were evaluated in silico. For ambiguously reconstructed nucleotides at sites with high base composition variability, ancestral tRNA sequences from Bayesian analyses were more compatible with canonical base pairing than were those inferred by other methods. Thus, nucleotide bias in reconstructed sequences apparently can lead to serious bias and inaccuracies in functional predictions.
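    The contrast drawn above between optimization-based reconstructions and frequencies averaged over the Bayesian posterior can be illustrated with a toy calculation. The sketch below is an assumption-laden simplification: it takes per-site posterior probabilities as given (the numbers in the usage note are invented) and compares base composition from the single most probable base per site with composition averaged over the posterior. It does not reproduce the authors' conditional pathway methods.

```python
BASES = "ACGT"


def map_frequencies(site_posteriors):
    """Base composition when each site is assigned its single most probable base."""
    counts = dict.fromkeys(BASES, 0)
    for post in site_posteriors:
        best = max(BASES, key=lambda b: post[b])
        counts[best] += 1
    n = len(site_posteriors)
    return {b: counts[b] / n for b in BASES}


def posterior_mean_frequencies(site_posteriors):
    """Base composition averaged over the per-site posterior distributions."""
    n = len(site_posteriors)
    return {b: sum(post[b] for post in site_posteriors) / n for b in BASES}


# Hypothetical ancestor: every site is uncertain but slightly favors A.
sites = [{"A": 0.4, "C": 0.3, "G": 0.2, "T": 0.1}] * 4
print(map_frequencies(sites))             # A takes every site
print(posterior_mean_frequencies(sites))  # A stays near 0.4
```

    When many sites are uncertain in the same direction, picking the best base at each site exaggerates that base's frequency, which is one simple way to see how a deterministic compositional bias could arise.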

  2. Comparative molecular cytogenetics of major repetitive sequence families of three Dendrobium species (Orchidaceae) from Bangladesh

    PubMed Central

    Begum, Rabeya; Alam, Sheikh Shamimul; Menzel, Gerhard; Schmidt, Thomas

    2009-01-01

    Background and Aims Dendrobium species show tremendous morphological diversity and have broad geographical distribution. As repetitive sequence analysis is a useful tool to investigate the evolution of chromosomes and genomes, the aim of the present study was the characterization of repetitive sequences from Dendrobium moschatum for comparative molecular and cytogenetic studies in the related species Dendrobium aphyllum, Dendrobium aggregatum and representatives from other orchid genera. Methods In order to isolate highly repetitive sequences, a c0t-1 DNA plasmid library was established. Repeats were sequenced and used as probes for Southern hybridization. Sequence divergence was analysed using bioinformatic tools. Repetitive sequences were localized along orchid chromosomes by fluorescence in situ hybridization (FISH). Key Results Characterization of the c0t-1 library resulted in the detection of repetitive sequences including the (GA)n dinucleotide DmoO11, numerous Arabidopsis-like telomeric repeats and the highly amplified dispersed repeat DmoF14. The DmoF14 repeat is conserved in six Dendrobium species but diversified in representative species of three other orchid genera. FISH analyses showed the genome-wide distribution of DmoF14 in D. moschatum, D. aphyllum and D. aggregatum. Hybridization with the telomeric repeats demonstrated Arabidopsis-like telomeres at the chromosome ends of Dendrobium species. However, FISH using the telomeric probe revealed two pairs of chromosomes with strong intercalary signals in D. aphyllum. FISH showed the terminal position of 5S and 18S–5·8S–25S rRNA genes and a characteristic number of rDNA sites in the three Dendrobium species. Conclusions The repeated sequences isolated from D. moschatum c0t-1 DNA constitute major DNA families of the D. moschatum, D. aphyllum and D. aggregatum genomes with DmoF14 representing an ancient component of orchid genomes. 
Large intercalary telomere-like arrays suggest chromosomal rearrangements in D. aphyllum while the number and localization of rRNA genes as well as the species-specific distribution pattern of an abundant microsatellite reflect the genomic diversity of the three Dendrobium species. PMID:19635741

  3. Direct Detection and Sequencing of Damaged DNA Bases

    PubMed Central

    2011-01-01

    Products of various forms of DNA damage have been implicated in a variety of important biological processes, such as aging, neurodegenerative diseases, and cancer. There is therefore great interest in developing methods for interrogating damaged DNA in the context of sequencing. Here, we demonstrate that single-molecule, real-time (SMRT®) DNA sequencing can directly detect damaged DNA bases in the DNA template, as a by-product of the sequencing method, through an analysis of the DNA polymerase kinetics that are altered by the presence of a modified base. We demonstrate the sequencing of several DNA templates containing products of DNA damage, including 8-oxoguanine, 8-oxoadenine, O6-methylguanine, 1-methyladenine, O4-methylthymine, 5-hydroxycytosine, 5-hydroxyuracil, 5-hydroxymethyluracil, or thymine dimers, and show that these base modifications can be readily detected with single-modification resolution and DNA strand specificity. We characterize the distinct kinetic signatures generated by these DNA base modifications. PMID:22185597

  4. Direct detection and sequencing of damaged DNA bases.

    PubMed

    Clark, Tyson A; Spittle, Kristi E; Turner, Stephen W; Korlach, Jonas

    2011-12-20

    Products of various forms of DNA damage have been implicated in a variety of important biological processes, such as aging, neurodegenerative diseases, and cancer. There is therefore great interest in developing methods for interrogating damaged DNA in the context of sequencing. Here, we demonstrate that single-molecule, real-time (SMRT®) DNA sequencing can directly detect damaged DNA bases in the DNA template, as a by-product of the sequencing method, through an analysis of the DNA polymerase kinetics that are altered by the presence of a modified base. We demonstrate the sequencing of several DNA templates containing products of DNA damage, including 8-oxoguanine, 8-oxoadenine, O6-methylguanine, 1-methyladenine, O4-methylthymine, 5-hydroxycytosine, 5-hydroxyuracil, 5-hydroxymethyluracil, or thymine dimers, and show that these base modifications can be readily detected with single-modification resolution and DNA strand specificity. We characterize the distinct kinetic signatures generated by these DNA base modifications.

  5. A comprehensive list of cloned human DNA sequences

    PubMed Central

    Schmidtke, Jörg; Cooper, David N.

    1987-01-01

    A list of DNA sequences cloned from the human genome is presented. Intended as a guide to clone availability, this list includes published reports of cDNA, genomic and synthetic clones comprising gene and pseudogene sequences, uncharacterised DNA segments and repetitive DNA elements. PMID:3575113

  6. A comprehensive list of cloned human DNA sequences

    PubMed Central

    Schmidtke, Jörg; Cooper, David N.

    1990-01-01

    A list of DNA sequences cloned from the human genome is presented. Intended as a guide to clone availability, this list includes published reports of cDNA, genomic and synthetic clones comprising gene and pseudogene sequences, uncharacterised DNA segments and repetitive DNA elements. PMID:2333227

  7. A comprehensive list of cloned human DNA sequences

    PubMed Central

    Schmidtke, Jörg; Cooper, David N.

    1988-01-01

    A list of DNA sequences cloned from the human genome is presented. Intended as a guide to clone availability, this list includes published reports of cDNA, genomic and synthetic clones comprising gene and pseudogene sequences, uncharacterised DNA segments and repetitive DNA elements. PMID:3368330

  8. A comprehensive list of cloned human DNA sequences

    PubMed Central

    Schmidtke, Jörg; Cooper, David N.

    1989-01-01

    A list of DNA sequences cloned from the human genome is presented. Intended as a guide to clone availability, this list includes published reports of cDNA, genomic and synthetic clones comprising gene and pseudogene sequences, uncharacterised DNA segments and repetitive DNA elements. PMID:2654889

  9. Contrasting morphological and DNA barcode-suggested species boundaries among shallow-water amphipod fauna from the southern European Atlantic coast.

    PubMed

    Lobo, Jorge; Ferreira, Maria S; Antunes, Ilisa C; Teixeira, Marcos A L; Borges, Luisa M S; Sousa, Ronaldo; Gomes, Pedro A; Costa, Maria Helena; Cunha, Marina R; Costa, Filipe O

    2017-02-01

    In this study we compared DNA barcode-suggested species boundaries with morphology-based species identifications in the amphipod fauna of the southern European Atlantic coast. DNA sequences of the cytochrome c oxidase subunit I barcode region (COI-5P) were generated for 43 morphospecies (178 specimens) collected along the Portuguese coast which, together with publicly available COI-5P sequences, produced a final dataset comprising 68 morphospecies and 295 sequences. Seventy-five BINs (Barcode Index Numbers) were assigned to these morphospecies, of which 48 were concordant (i.e., 1 BIN = 1 species), 8 were taxonomically discordant, and 19 were singletons. Twelve species had matching sequences (<2% distance) with conspecifics from distant locations (e.g., North Sea). Seven morphospecies were assigned to multiple, and highly divergent, BINs, including specimens of Corophium multisetosum (18% divergence) and Dexamine spiniventris (16% divergence), which originated from sampling locations on the west coast of Portugal (only about 36 and 250 km apart, respectively). We also found deep divergence (4%-22%) among specimens of seven species from Portugal compared to those from the North Sea and Italy. The detection of evolutionarily meaningful divergence among populations of several amphipod species from southern Europe reinforces the need for a comprehensive re-assessment of the diversity of this faunal group.
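    The distance figures quoted above (matches at <2%, divergences of 4%-22%) can be made concrete with a small calculation. The sketch below assumes an uncorrected p-distance over aligned COI-5P sequences; the abstract does not say which distance model was used, and the function names and the default 0.02 threshold are choices made for this example, not the study's pipeline or the BIN algorithm.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    valid = set("ACGT")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            if x != y:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def within_threshold(a, b, threshold=0.02):
    """True when two barcodes differ by less than the chosen distance threshold."""
    return p_distance(a, b) < threshold
```

    With a 100-site alignment, one mismatch gives a distance of 0.01 (a match under the 2% rule), whereas 18 mismatches give 0.18, comparable to the divergence reported for Corophium multisetosum.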

  10. cDNA identification, comparison and phylogenetic aspects of lombricine kinase from two oligochaete species.

    PubMed

    Doumen, Chris

    2010-06-01

    Creatine kinase and arginine kinase are the typical representatives of an eight-member phosphagen kinase family, whose members play important roles in the cellular energy metabolism of animals. The phylum Annelida underwent a series of evolutionary processes that resulted in rapid divergence and radiation of these enzymes, producing the greatest diversity of the phosphagen kinases within this phylum. Lombricine kinase (EC 2.7.3.5) is one such enzyme, and sequence information for it is rather limited compared to other phosphagen kinases. This study presents data on the cDNA sequences of lombricine kinase from two oligochaete species, the California blackworm (Lumbriculus variegatus) and the sludge worm (Tubifex tubifex). The deduced amino acid sequences are analyzed and compared with other selected phosphagen kinases, including two additional lombricine kinase sequences extracted from DNA databases, and provide further insights into the evolution and position of these enzymes within the phosphagen kinase family. The data confirm the presence of a deleted region within the flexible loop (the GS region) of all six examined lombricine kinases. A phylogenetic analysis of these six lombricine kinases clearly positions the enzymes together in a small subcluster within the larger creatine kinase (EC 2.7.3.2) clade. 2010. Published by Elsevier Inc.

  11. Kilo-sequencing: an ordered strategy for rapid DNA sequence data acquisition.

    PubMed Central

    Barnes, W M; Bevan, M

    1983-01-01

    A strategy for rapid DNA sequence acquisition in an ordered, nonrandom manner, while retaining all of the conveniences of the dideoxy method with M13 transducing phage DNA template, is described. Target DNA 3 to 14 kb in size can be stably carried by our M13 vectors. Suitable targets are stretches of DNA which lack an enzyme recognition site which is unique on our cloning vectors and adjacent to the sequencing primer; current sites that are so useful when lacking are Pst, Xba, HindIII, BglII, EcoRI. By an in vitro procedure, we cut RF DNA once randomly and once specifically, to create thousands of deletions which start at the unique restriction site adjacent to the dideoxy sequencing primer and extend various distances across the target DNA. Phage carrying a desired size of deletions, whose DNA as template will give rise to DNA sequence data in a desired location along the target DNA, may be purified by electrophoresis alive on agarose gels. Phage running in the same location on the agarose gel thus conveniently give rise to nucleotide sequence data from the same kilobase of target DNA. PMID:6298723

  12. African-American mitochondrial DNAs often match mtDNAs found in multiple African ethnic groups

    PubMed Central

    Ely, Bert; Wilson, Jamie Lee; Jackson, Fatimah; Jackson, Bruce A

    2006-01-01

    Background Mitochondrial DNA (mtDNA) haplotypes have become popular tools for tracing maternal ancestry, and several companies offer this service to the general public. Numerous studies have demonstrated that human mtDNA haplotypes can be used with confidence to identify the continent where the haplotype originated. Ideally, mtDNA haplotypes could also be used to identify a particular country or ethnic group from which the maternal ancestor emanated. However, the geographic distribution of mtDNA haplotypes is greatly influenced by the movement of both individuals and population groups. Consequently, common mtDNA haplotypes are shared among multiple ethnic groups. We have studied the distribution of mtDNA haplotypes among West African ethnic groups to determine how often mtDNA haplotypes can be used to reconnect Americans of African descent to a country or ethnic group of a maternal African ancestor. The nucleotide sequence of the mtDNA hypervariable segment I (HVS-I) usually provides sufficient information to assign a particular mtDNA to the proper haplogroup, and it contains most of the variation that is available to distinguish a particular mtDNA haplotype from closely related haplotypes. In this study, samples of general African-American and specific Gullah/Geechee HVS-I haplotypes were compared with two databases of HVS-I haplotypes from sub-Saharan Africa, and the incidence of perfect matches recorded for each sample. Results When two independent African-American samples were analyzed, more than half of the sampled HVS-I mtDNA haplotypes exactly matched common haplotypes that were shared among multiple African ethnic groups. Another 40% did not match any sequence in the database, and fewer than 10% were an exact match to a sequence from a single African ethnic group. 
Differences in the regional distribution of haplotypes were observed in the African database, and the African-American haplotypes were more likely to match haplotypes found in ethnic groups from West or West Central Africa than those found in eastern or southern Africa. Fewer than 14% of the African-American mtDNA sequences matched sequences from only West Africa or only West Central Africa. Conclusion Our database of sub-Saharan mtDNA sequences includes the most common haplotypes that are shared among ethnic groups from multiple regions of Africa. These common haplotypes have been found in half of all sub-Saharan Africans. More than 60% of the remaining haplotypes differ from the common haplotypes at a single nucleotide position in the HVS-I region, and they are likely to occur at varying frequencies within sub-Saharan Africa. However, the finding that 40% of the African-American mtDNAs analyzed had no match in the database indicates that only a small fraction of the total number of African haplotypes has been identified. In addition, the finding that fewer than 10% of African-American mtDNAs matched mtDNA sequences from a single African region suggests that few African Americans might be able to trace their mtDNA lineages to a particular region of Africa, and even fewer will be able to trace their mtDNA to a single ethnic group. However, no firm conclusions should be made until a much larger database is available. It is clear, however, that when identical mtDNA haplotypes are shared among many ethnic groups from different parts of Africa, it is impossible to determine which single ethnic group was the source of a particular maternal ancestor based on the mtDNA sequence. PMID:17038170
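    The matching procedure described in this record sorts each HVS-I haplotype by whether it matches no database sequence, sequences from a single ethnic group, or sequences shared among several groups. The sketch below shows that bookkeeping in Python under simple assumptions: haplotypes are compared as exact strings, and the database layout, group names, and haplotype labels are hypothetical placeholders rather than data from the study.

```python
from collections import Counter


def classify_match(haplotype, database):
    """Classify one haplotype against a {group name: set of haplotypes} database.

    Returns a category label and the sorted list of groups with an exact match.
    """
    groups = sorted(g for g, haps in database.items() if haplotype in haps)
    if not groups:
        return "no match", groups
    if len(groups) == 1:
        return "single group", groups
    return "multiple groups", groups


def summarize(sample, database):
    """Fraction of sampled haplotypes falling into each match category."""
    counts = Counter(classify_match(h, database)[0] for h in sample)
    return {category: n / len(sample) for category, n in counts.items()}


# Hypothetical placeholder data; the labels are not real haplotypes or groups.
database = {
    "group_1": {"hapA", "hapB"},
    "group_2": {"hapA", "hapC"},
    "group_3": {"hapA"},
}
sample = ["hapA", "hapA", "hapC", "hapX"]
print(summarize(sample, database))
```

    In this toy sample, half of the haplotypes match a sequence shared by several groups, a quarter match a single group, and a quarter have no match. The proportions reported in the abstract come from the authors' databases, not from anything shown here.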

  13. Silicene nanoribbon as a new DNA sequencing device

    NASA Astrophysics Data System (ADS)

    Alesheikh, Sara; Shahtahmassebi, Nasser; Roknabadi, Mahmood Rezaee; Pilevar Shahri, Raheleh

    2018-02-01

    The importance of DNA sequencing in many fields motivates the search for fast and inexpensive methods. Nanotechnology contributes by introducing nanostructures that can be used for DNA sequencing. In this work we study the interaction between a zigzag silicene nanoribbon and DNA nucleobases using DFT and the non-equilibrium Green's function approach, to investigate the possibility of using zigzag silicene nanoribbons as biosensors for DNA sequencing.

  14. Isolation and characterization of target sequences of the chicken CdxA homeobox gene.

    PubMed Central

    Margalit, Y; Yarus, S; Shapira, E; Gruenbaum, Y; Fainsod, A

    1993-01-01

    The DNA binding specificity of the chicken homeodomain protein CDXA was studied. Using a CDXA-glutathione-S-transferase fusion protein, DNA fragments containing the binding site for this protein were isolated. The sources of DNA were oligonucleotides with random sequence and chicken genomic DNA. The DNA fragments isolated were sequenced and tested in DNA binding assays. Sequencing revealed that most DNA fragments are AT-rich, which is a common feature of homeodomain binding sites. Electrophoretic mobility shift assays showed that the different target sequences isolated bind to the CDXA protein with different affinities. The specific sequences bound by the CDXA protein in the isolated genomic fragments were determined by DNase I footprinting. From the footprinted sequences, the CDXA consensus binding site was determined. The CDXA protein binds the consensus sequence A, A/T, T, A/T, A, T, A/G. The CAUDAL binding site in the ftz promoter is also included in this consensus sequence. When tested, some of the genomic target sequences were capable of enhancing the transcriptional activity of reporter plasmids when introduced into CDXA-expressing cells. This study determined the DNA sequence specificity of the CDXA protein and also shows that this protein can further activate transcription in cells in culture. PMID:7909943
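    The seven-position consensus reported above (A, A/T, T, A/T, A, T, A/G) translates directly into a pattern that can be scanned against a sequence. The sketch below uses Python's standard re module; the function name is hypothetical, only the given strand is searched, and an exact consensus match says nothing about binding affinity, which the abstract notes varies between target sequences.

```python
import re

# Consensus A, A/T, T, A/T, A, T, A/G written as a regular expression.
# The lookahead lets overlapping candidate sites be reported as well.
CDXA_CONSENSUS = re.compile(r"(?=(A[AT]T[AT]AT[AG]))")


def find_consensus_sites(seq):
    """Return (0-based start, matched 7-mer) for each consensus hit on the given strand."""
    return [(m.start(), m.group(1)) for m in CDXA_CONSENSUS.finditer(seq.upper())]
```

    For example, find_consensus_sites("GGATTTATGCC") returns [(2, "ATTTATG")]. To cover both strands, the same scan could be run on the reverse complement of the sequence.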

  15. Sequence periodicity in nucleosomal DNA and intrinsic curvature.

    PubMed

    Nair, T Murlidharan

    2010-05-17

    Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy, the more rigid the DNA helix; thus it is natural to expect that sequences that are involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understanding their sequence-dependent structures. Higher order DNA structures and the distribution of molecular bend loci associated with 146-base nucleosome core DNA sequences from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. The higher order structures associated with nucleosomes of C. elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results point to the possibility that context-dependent curvature of varying degrees is associated with nucleosomal DNA.

  16. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    PubMed Central

    2011-01-01

    Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. 
Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns. PMID:21599934
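
    The abstract above reports 4,068 simple sequence repeats (SSRs) in the melon EST collection but does not say which tool or thresholds were used. As a rough illustration only, the sketch below finds perfect tandem repeats of a fixed unit length with a regular expression; the function name `find_ssrs` and the default thresholds are hypothetical and are not taken from the study.

```python
import re

def find_ssrs(seq, unit_len=2, min_repeats=4):
    """Return (start, unit, copies) for each perfect tandem repeat.

    Assumption: an SSR is a unit of `unit_len` bases repeated at least
    `min_repeats` times with no mismatches. Note that homopolymer runs
    (e.g. AAAAAAAA) also match as repeats of the unit AA.
    """
    # Capture one unit, then require it to recur min_repeats - 1 more times.
    pattern = re.compile(rf"([ACGT]{{{unit_len}}})\1{{{min_repeats - 1},}}")
    return [(m.start(), m.group(1), len(m.group(0)) // unit_len)
            for m in pattern.finditer(seq)]

# Toy sequences, not real melon ESTs.
dinucleotide_hits = find_ssrs("TTAGAGAGAGAGCC")
trinucleotide_hits = find_ssrs("GGCATCATCATGG", unit_len=3, min_repeats=3)
```

    Dedicated SSR finders usually also handle compound and imperfect repeats, which this sketch ignores.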

  17. Aberrant DNA methylation patterns of spermatozoa in men with unexplained infertility.

    PubMed

    Urdinguio, Rocío G; Bayón, Gustavo F; Dmitrijeva, Marija; Toraño, Estela G; Bravo, Cristina; Fraga, Mario F; Bassas, Lluís; Larriba, Sara; Fernández, Agustín F

    2015-05-01

    Are there DNA methylation alterations in sperm that could explain the reduced biological fertility of male partners from couples with unexplained infertility? DNA methylation patterns, not only at specific loci but also at Alu Yb8 repetitive sequences, are altered in infertile individuals compared with fertile controls. Aberrant DNA methylation of sperm has been associated with human male infertility in patients demonstrating either deficiencies in the process of spermatogenesis or low semen quality. Case and control prospective study. This study compares 46 sperm samples obtained from 17 normospermic fertile men and 29 normospermic infertile patients. Illumina Infinium HD Human Methylation 450K arrays were used to identify genomic regions showing differences in sperm DNA methylation patterns between five fertile and seven infertile individuals. Additionally, global DNA methylation of sperm was measured using the Methylamp Global DNA Methylation Quantification Ultra kit (Epigentek) in 14 samples, and DNA methylation at several repetitive sequences (LINE-1, Alu Yb8, NBL2, D4Z4) measured by bisulfite pyrosequencing in 44 sperm samples. A sperm-specific DNA methylation pattern was obtained by comparing the sperm methylomes with the DNA methylomes of differentiated somatic cells using data obtained from methylation arrays (Illumina 450 K) of blood, neural and glial cells deposited in public databases. In this study we conduct, for the first time, a genome-wide study to identify alterations of sperm DNA methylation in individuals with unexplained infertility that may account for the differences in their biological fertility compared with fertile individuals. We have identified 2752 CpGs showing aberrant DNA methylation patterns, and more importantly, these differentially methylated CpGs were significantly associated with CpG sites which are specifically methylated in sperm when compared with somatic cells. 
We also found statistically significant (P < 0.001) associations between DNA hypomethylation and regions corresponding to those which, in somatic cells, are enriched in the repressive histone mark H3K9me3, and between DNA hypermethylation and regions enriched in H3K4me1 and CTCF, suggesting that the relationship between chromatin context and aberrant DNA methylation of sperm in infertile men could be locus-dependent. Finally, we also show that DNA methylation patterns, not only at specific loci but also at several repetitive sequences (LINE-1, Alu Yb8, NBL2, D4Z4), were lower in sperm than in somatic cells. Interestingly, sperm samples at Alu Yb8 repetitive sequences of infertile patients showed significantly lower DNA methylation levels than controls. Our results are descriptive and further studies would be needed to elucidate the functional effects of aberrant DNA methylation on male fertility. Overall, our data suggest that aberrant sperm DNA methylation might contribute to fertility impairment in couples with unexplained infertility and they provide a promising basis for future research. This work has been financially supported by Fundación Cientifica de la AECC (to R.G.U.); IUOPA (to G.F.B.); FICYT (to E.G.T.); the Spanish National Research Council (CSIC; 200820I172 to M.F.F.); Fundación Ramón Areces (to M.F.F); the Plan Nacional de I+D+I 2008-2011/2013-2016/FEDER (PI11/01728 to AF.F., PI12/01080 to M.F.F. and PI12/00361 to S.L.); the PN de I+D+I 2008-20011 and the Generalitat de Catalunya (2009SGR01490). A.F.F. is sponsored by ISCIII-Subdirección General de Evaluación y Fomento de la Investigación (CP11/00131). S.L. is sponsored by the Researchers Stabilization Program from the Spanish National Health System (CES09/020). The IUOPA is supported by the Obra Social Cajastur, Spain. © The Author 2015. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. 

  18. [Current applications of high-throughput DNA sequencing technology in antibody drug research].

    PubMed

    Yu, Xin; Liu, Qi-Gang; Wang, Ming-Rong

    2012-03-01

    Since the publication in 2005 of a high-throughput DNA sequencing technology based on PCR carried out in oil emulsions, high-throughput DNA sequencing platforms have evolved into a robust technology for sequencing genomes and diverse DNA libraries. Antibody libraries with vast numbers of members currently serve as a foundation for discovering novel antibody drugs, and high-throughput DNA sequencing technology makes it possible to rapidly identify functional antibody variants with desired properties. Herein we present a review of current applications of high-throughput DNA sequencing technology in the analysis of antibody library diversity, sequencing of CDR3 regions, identification of potent antibodies based on sequence frequency, discovery of functional genes, and combination with various display technologies, so as to provide an alternative approach to the discovery and development of antibody drugs.

  19. Mitochondrial DNA and Y-chromosomal diversity in ancient populations of domestic sheep (Ovis aries) in Finland: comparison with contemporary sheep breeds.

    PubMed

    Niemi, Marianna; Bläuer, Auli; Iso-Touru, Terhi; Nyström, Veronica; Harjula, Janne; Taavitsainen, Jussi-Pekka; Storå, Jan; Lidén, Kerstin; Kantanen, Juha

    2013-01-22

    Several molecular and population genetic studies have focused on the native sheep breeds of Finland. In this work, we investigated their ancestral sheep populations from Iron Age, Medieval and Post-Medieval periods by sequencing a partial mitochondrial DNA D-loop and the 5'-promoter region of the SRY gene. We compared the maternal (mitochondrial DNA haplotypes) and paternal (SNP oY1) genetic diversity of ancient sheep in Finland with modern domestic sheep populations in Europe and Asia to study temporal changes in genetic variation and affinities between ancient and modern populations. A 523-bp mitochondrial DNA sequence was successfully amplified for 26 of 36 ancient sheep samples, i.e. five, seven and 14 samples representative of Iron Age, Medieval and Post-Medieval sheep, respectively. Genetic diversity was analyzed within the cohorts. This ancient dataset was compared with present-day data consisting of 94 animals from 10 contemporary European breeds and with GenBank DNA sequence data to carry out a haplotype sharing analysis. Among the 18 ancient mitochondrial DNA haplotypes identified, 14 were present in the modern breeds. Ancient haplotypes were assigned to the highly divergent ovine haplogroups A and B, haplogroup B being the major lineage within the cohorts. Only two haplotypes were detected in the Iron Age samples, while the genetic diversity of the Medieval and Post-Medieval cohorts was higher. For three of the ancient DNA samples, Y-chromosome SRY gene sequences were amplified, indicating that they originated from rams. The SRY gene of these three ancient ram samples contained SNP G-oY1, which is frequent in modern north-European sheep breeds. Our study did not reveal any sign of major population replacement of native sheep in Finland since the Iron Age. 
Variations in the availability of archaeological remains may explain differences in genetic diversity estimates and patterns within the cohorts rather than demographic events that occurred in the past. Our ancient DNA results fit well with the genetic context of domestic sheep as determined by analyses of modern north-European sheep breeds.

  20. Mitochondrial DNA and Y-chromosomal diversity in ancient populations of domestic sheep (Ovis aries) in Finland: comparison with contemporary sheep breeds

    PubMed Central

    2013-01-01

    Background Several molecular and population genetic studies have focused on the native sheep breeds of Finland. In this work, we investigated their ancestral sheep populations from Iron Age, Medieval and Post-Medieval periods by sequencing a partial mitochondrial DNA D-loop and the 5’-promoter region of the SRY gene. We compared the maternal (mitochondrial DNA haplotypes) and paternal (SNP oY1) genetic diversity of ancient sheep in Finland with modern domestic sheep populations in Europe and Asia to study temporal changes in genetic variation and affinities between ancient and modern populations. Results A 523-bp mitochondrial DNA sequence was successfully amplified for 26 of 36 ancient sheep samples, i.e. five, seven and 14 samples representative of Iron Age, Medieval and Post-Medieval sheep, respectively. Genetic diversity was analyzed within the cohorts. This ancient dataset was compared with present-day data consisting of 94 animals from 10 contemporary European breeds and with GenBank DNA sequence data to carry out a haplotype sharing analysis. Among the 18 ancient mitochondrial DNA haplotypes identified, 14 were present in the modern breeds. Ancient haplotypes were assigned to the highly divergent ovine haplogroups A and B, haplogroup B being the major lineage within the cohorts. Only two haplotypes were detected in the Iron Age samples, while the genetic diversity of the Medieval and Post-Medieval cohorts was higher. For three of the ancient DNA samples, Y-chromosome SRY gene sequences were amplified, indicating that they originated from rams. The SRY gene of these three ancient ram samples contained SNP G-oY1, which is frequent in modern north-European sheep breeds. Conclusions Our study did not reveal any sign of major population replacement of native sheep in Finland since the Iron Age. 
Variations in the availability of archaeological remains may explain differences in genetic diversity estimates and patterns within the cohorts rather than demographic events that occurred in the past. Our ancient DNA results fit well with the genetic context of domestic sheep as determined by analyses of modern north-European sheep breeds. PMID:23339395

  1. DNA fingerprinting, DNA barcoding, and next generation sequencing technology in plants.

    PubMed

    Sucher, Nikolaus J; Hennell, James R; Carles, Maria C

    2012-01-01

    DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.

  2. Mammalian DNA enriched for replication origins is enriched for snap-back sequences.

    PubMed

    Zannis-Hadjopoulos, M; Kaufmann, G; Martin, R G

    1984-11-15

    Using the instability of replication loops as a method for the isolation of double-stranded nascent DNA, extruded DNA enriched for replication origins was obtained and denatured. Snap-back DNA, single-stranded DNA with inverted repeats (palindromic sequences), reassociates rapidly into stem-loop structures with zero-order kinetics when conditions are changed from denaturing to renaturing, and can be assayed by chromatography on hydroxyapatite. Origin-enriched nascent DNA strands from mouse, rat and monkey cells growing either synchronously or asynchronously were purified and assayed for the presence of snap-back sequences. The results show that origin-enriched DNA is also enriched for snap-back sequences, implying that some origins for mammalian DNA replication contain or lie near palindromic sequences.
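
    The snap-back sequences described above are inverted repeats: a stretch of bases followed, possibly after a short loop, by its reverse complement on the same strand. The study detected them biochemically (hydroxyapatite chromatography), not computationally; the sketch below only illustrates what such a motif looks like in sequence terms. The function names and the stem and loop limits are hypothetical choices.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def find_inverted_repeats(seq, stem=4, min_loop=0, max_loop=10):
    """Return (left_start, right_start, left_arm) for each stem-loop candidate.

    A candidate is a `stem`-base arm whose reverse complement occurs
    downstream on the same strand, separated by min_loop..max_loop bases.
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > n:
                break  # right arm would run past the end of the sequence
            if seq[j:j + stem] == target:
                hits.append((i, j, left))
    return hits

# Toy sequence: ACGG ... CCGT can pair into a 4-bp stem with a 3-base loop.
hits = find_inverted_repeats("TTACGGATACCGTTT")
```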

  3. Large-scale DNA Barcode Library Generation for Biomolecule Identification in High-throughput Screens.

    PubMed

    Lyons, Eli; Sheridan, Paul; Tremmel, Georg; Miyano, Satoru; Sugano, Sumio

    2017-10-24

    High-throughput screens allow for the identification of specific biomolecules with characteristics of interest. In barcoded screens, DNA barcodes are linked to target biomolecules in a manner allowing for the target molecules making up a library to be identified by sequencing the DNA barcodes using Next Generation Sequencing. To be useful in experimental settings, the DNA barcodes in a library must satisfy certain constraints related to GC content, homopolymer length, Hamming distance, and blacklisted subsequences. Here we report a novel framework to quickly generate large-scale libraries of DNA barcodes for use in high-throughput screens. We show that our framework dramatically reduces the computation time required to generate large-scale DNA barcode libraries, compared with a naïve approach to DNA barcode library generation. As a proof of concept, we demonstrate that our framework is able to generate a library consisting of one million DNA barcodes for use in a fragment antibody phage display screening experiment. We also report generating a general purpose one billion DNA barcode library, the largest such library yet reported in literature. Our results demonstrate the value of our novel large-scale DNA barcode library generation framework for use in high-throughput screening applications.
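
    The four constraints named in the abstract (GC content, homopolymer length, Hamming distance, blacklisted subsequences) can be made concrete with a small filter. The sketch below is a simple greedy approach, of the kind the abstract contrasts with the authors' framework, and is not that framework itself; all function names and threshold values are hypothetical.

```python
from itertools import groupby

def gc_content(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def max_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max(len(list(run)) for _, run in groupby(seq))

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def passes_filters(seq, gc_range=(0.4, 0.6), max_run=2, blacklist=()):
    """Check the per-barcode constraints (thresholds are illustrative)."""
    lo, hi = gc_range
    return (lo <= gc_content(seq) <= hi
            and max_homopolymer(seq) <= max_run
            and not any(bad in seq for bad in blacklist))

def build_library(candidates, min_distance=3, **filters):
    """Greedy selection: keep a candidate only if it passes the filters
    and differs from every kept barcode in at least min_distance positions.
    Cost grows with candidates x library size, so this does not scale."""
    library = []
    for seq in candidates:
        if passes_filters(seq, **filters) and all(
                hamming(seq, kept) >= min_distance for kept in library):
            library.append(seq)
    return library

candidates = ["ACGTACGT", "ACGTACGA", "AAAACGCG", "TGCATGCA", "GGATCCAT"]
library = build_library(candidates, blacklist=("GGATCC",))
```

    In this toy run the second candidate is rejected for being too close to the first, the third for a four-base homopolymer, and the last for containing the blacklisted subsequence.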

  4. Characterizing DNA preservation in degraded specimens of Amara alpina (Carabidae: Coleoptera).

    PubMed

    Heintzman, Peter D; Elias, Scott A; Moore, Karen; Paszkiewicz, Konrad; Barnes, Ian

    2014-05-01

    DNA preserved in degraded beetle (Coleoptera) specimens, including those derived from dry-stored museum and ancient permafrost-preserved environments, could provide a valuable resource for researchers interested in species and population histories over timescales from decades to millennia. However, the potential of these samples as genetic resources is currently unassessed. Here, using Sanger and Illumina shotgun sequence data, we explored DNA preservation in specimens of the ground beetle Amara alpina, from both museum and ancient environments. Nearly all museum specimens had amplifiable DNA, with the maximum amplifiable fragment length decreasing with age. Amplification of DNA was only possible in 45% of ancient specimens. Preserved mitochondrial DNA fragments were significantly longer than those of nuclear DNA in both museum and ancient specimens. Metagenomic characterization of extracted DNA demonstrated that parasite-derived sequences, including Wolbachia and Spiroplasma, are recoverable from museum beetle specimens. Ancient DNA extracts contained beetle DNA in amounts comparable to museum specimens. Overall, our data demonstrate that there is great potential for both museum and ancient specimens of beetles in future genetic studies, and we see no reason why this would not be the case for other orders of insect. © 2013 John Wiley & Sons Ltd.

  5. Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma

    PubMed Central

    Wrzeszczynski, Kazimierz O.; Frank, Mayu O.; Koyama, Takahiko; Rhrissorrakrai, Kahn; Robine, Nicolas; Utro, Filippo; Emde, Anne-Katrin; Chen, Bo-Juen; Arora, Kanika; Shah, Minita; Vacic, Vladimir; Norel, Raquel; Bilal, Erhan; Bergmann, Ewa A.; Moore Vogel, Julia L.; Bruce, Jeffrey N.; Lassman, Andrew B.; Canoll, Peter; Grommes, Christian; Harvey, Steve; Parida, Laxmi; Michelini, Vanessa V.; Zody, Michael C.; Jobanputra, Vaidehi; Royyuru, Ajay K.

    2017-01-01

    Objective: To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each. Methods: Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs. Results: More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts. Conclusions: The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible. ClinicalTrials.gov identifier: NCT02725684. PMID:28740869

  6. False positives complicate ancient pathogen identifications using high-throughput shotgun sequencing

    PubMed Central

    2014-01-01

    Background Identification of historic pathogens is challenging since false positives and negatives are a serious risk. Environmental non-pathogenic contaminants are ubiquitous. Furthermore, public genetic databases contain limited information regarding these species. High-throughput sequencing may help reliably detect and identify historic pathogens. Results We shotgun-sequenced eight 16th-century Mixtec individuals from the site of Teposcolula Yucundaa (Oaxaca, Mexico) who are reported to have died from the huey cocoliztli (‘Great Pestilence’ in Nahuatl), an unknown disease that decimated native Mexican populations during the Spanish colonial period, in order to identify the pathogen. Comparison of these sequences with those deriving from the surrounding soil and from four precontact individuals from the site found a wide variety of contaminant organisms that confounded analyses. Without the comparative sequence data from the precontact individuals and soil, false positives for Yersinia pestis and rickettsiosis could have been reported. Conclusions False positives and negatives remain problematic in ancient DNA analyses despite the application of high-throughput sequencing. Our results suggest that several studies claiming the discovery of ancient pathogens may need further verification. Additionally, true single molecule sequencing’s short read lengths, inability to sequence through DNA lesions, and limited ancient-DNA-specific technical development hinder its application to palaeopathology. PMID:24568097

  7. A pooling-based approach to mapping genetic variants associated with DNA methylation

    PubMed Central

    Kaplow, Irene M.; MacIsaac, Julia L.; Mah, Sarah M.; McEwen, Lisa M.; Kobor, Michael S.; Fraser, Hunter B.

    2015-01-01

    DNA methylation is an epigenetic modification that plays a key role in gene regulation. Previous studies have investigated its genetic basis by mapping genetic variants that are associated with DNA methylation at specific sites, but these have been limited to microarrays that cover <2% of the genome and cannot account for allele-specific methylation (ASM). Other studies have performed whole-genome bisulfite sequencing on a few individuals, but these lack statistical power to identify variants associated with DNA methylation. We present a novel approach in which bisulfite-treated DNA from many individuals is sequenced together in a single pool, resulting in a truly genome-wide map of DNA methylation. Compared to methods that do not account for ASM, our approach increases statistical power to detect associations while sharply reducing cost, effort, and experimental variability. As a proof of concept, we generated deep sequencing data from a pool of 60 human cell lines; we evaluated almost twice as many CpGs as the largest microarray studies and identified more than 2000 genetic variants associated with DNA methylation. We found that these variants are highly enriched for associations with chromatin accessibility and CTCF binding but are less likely to be associated with traits indirectly linked to DNA, such as gene expression and disease phenotypes. In summary, our approach allows genome-wide mapping of genetic variants associated with DNA methylation in any tissue of any species, without the need for individual-level genotype or methylation data. PMID:25910490

  8. A pooling-based approach to mapping genetic variants associated with DNA methylation

    DOE PAGES

    Kaplow, Irene M.; MacIsaac, Julia L.; Mah, Sarah M.; ...

    2015-04-24

    DNA methylation is an epigenetic modification that plays a key role in gene regulation. Previous studies have investigated its genetic basis by mapping genetic variants that are associated with DNA methylation at specific sites, but these have been limited to microarrays that cover <2% of the genome and cannot account for allele-specific methylation (ASM). Other studies have performed whole-genome bisulfite sequencing on a few individuals, but these lack statistical power to identify variants associated with DNA methylation. We present a novel approach in which bisulfite-treated DNA from many individuals is sequenced together in a single pool, resulting in a truly genome-wide map of DNA methylation. Compared to methods that do not account for ASM, our approach increases statistical power to detect associations while sharply reducing cost, effort, and experimental variability. As a proof of concept, we generated deep sequencing data from a pool of 60 human cell lines; we evaluated almost twice as many CpGs as the largest microarray studies and identified more than 2000 genetic variants associated with DNA methylation. Here we found that these variants are highly enriched for associations with chromatin accessibility and CTCF binding but are less likely to be associated with traits indirectly linked to DNA, such as gene expression and disease phenotypes. In summary, our approach allows genome-wide mapping of genetic variants associated with DNA methylation in any tissue of any species, without the need for individual-level genotype or methylation data.

  9. Specific DNA binding activity of T antigen subclasses varies among different SV40-transformed cell lines.

    PubMed

    Burger, C; Fanning, E

    1983-04-15

    Large tumor antigen (T antigen) occurs in at least three different oligomeric subclasses in cells infected or transformed by simian virus 40 (SV40): 5-7 S, 14-16 S, and 23-25 S. The 23-25 S form is complexed with a host phosphoprotein (p53). The DNA binding properties of these three subclasses of T antigen from nine different cell lines and free p53 protein were compared using an immunoprecipitation assay. All three subclasses of T antigen bound specifically to SV40 DNA sequences near the origin of replication. However, the DNA binding activity varied between different cell lines over a 40- to 50-fold range. The 23-25 S and 14-16 S forms from most of the cell lines tested bound much less SV40 origin DNA than 5-7 S T antigen. The free p53 phosphoprotein did not bind specifically to any SV40 DNA sequences.

  10. Computational Design of DNA-Binding Proteins.

    PubMed

    Thyme, Summer; Song, Yifan

    2016-01-01

    Predicting the outcome of engineered and naturally occurring sequence perturbations to protein-DNA interfaces requires accurate computational modeling technologies. It has been well established that computational design to accommodate small numbers of DNA target site substitutions is possible. This chapter details the basic method of design used in the Rosetta macromolecular modeling program that has been successfully used to modulate the specificity of DNA-binding proteins. More recently, combining computational design and directed evolution has become a common approach for increasing the success rate of protein engineering projects. The power of such high-throughput screening depends on computational methods producing multiple potential solutions. Therefore, this chapter describes several protocols for increasing the diversity of designed output. Lastly, we describe an approach for building comparative models of protein-DNA complexes in order to utilize information from homologous sequences. These models can be used to explore how nature modulates specificity of protein-DNA interfaces and potentially can even be used as starting templates for further engineering.

  11. nrDNA:mtDNA copy number ratios as a comparative metric for evolutionary and conservation genetics.

    PubMed

    Goodall-Copestake, William Paul

    2018-05-12

    Identifying genetic cues of functional relevance is key to understanding the drivers of evolution and increasingly important for the conservation of biodiversity. This study introduces nuclear ribosomal DNA (nrDNA) to mitochondrial DNA (mtDNA) copy number ratios as a metric with which to screen for this functional genetic variation prior to more extensive omics analyses. To illustrate the metric, quantitative PCR was used to estimate nrDNA (18S) to mtDNA (16S) copy number ratios in muscle tissue from samples of two zooplankton species: Salpa thompsoni caught near Elephant Island (Southern Ocean) and S. fusiformis sampled off Gough Island (South Atlantic). Average 18S:16S ratios in these samples were 9:1 and 3:1, respectively. nrDNA 45S arrays and mitochondrial genomes were then deep sequenced to uncover the sources of intra-individual genetic variation underlying these 18S:16S copy number differences. The deep sequencing profiles obtained were consistent with genetic changes resulting from adaptive processes, including an expansion of nrDNA and damage to mtDNA in S. thompsoni, potentially in response to the polar environment. Beyond this example from zooplankton, nrDNA:mtDNA copy number ratios offer a promising metric to help identify genetic variation of functional relevance in animals more broadly.
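
    The abstract does not state how the qPCR readings were converted into 18S:16S copy number ratios. One common way to estimate a relative copy number from threshold-cycle (Ct) values is the delta-Ct calculation sketched below. It assumes both assays amplify with the same efficiency (2.0 means perfect doubling per cycle), which may not match the study's actual procedure; the function name and the Ct values are hypothetical.

```python
def copy_number_ratio(ct_nuclear, ct_mito, efficiency=2.0):
    """Estimate the nrDNA:mtDNA copy number ratio from qPCR Ct values.

    Assumption: both targets amplify with the same per-cycle efficiency,
    so a target that crosses the threshold d cycles earlier started with
    efficiency ** d times as many template copies.
    """
    return efficiency ** (ct_mito - ct_nuclear)

# Hypothetical readings: the 18S assay crosses the threshold 2 cycles
# before the 16S assay, implying about four 18S copies per 16S copy.
ratio = copy_number_ratio(ct_nuclear=20.0, ct_mito=22.0)
```

    Calibrating each assay against a standard curve would avoid the equal-efficiency assumption.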

  12. DNA sequence determinants controlling affinity, stability and shape of DNA complexes bound by the nucleoid protein Fis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hancock, Stephen P.; Stella, Stefano; Cascio, Duilio

    The abundant Fis nucleoid protein selectively binds poorly related DNA sequences with high affinities to regulate diverse DNA reactions. Fis binds DNA primarily through DNA backbone contacts and selects target sites by reading conformational properties of DNA sequences, most prominently intrinsic minor groove widths. High-affinity binding requires Fis-stabilized DNA conformational changes that vary depending on DNA sequence. In order to better understand the molecular basis for high affinity site recognition, we analyzed the effects of DNA sequence within and flanking the core Fis binding site on binding affinity and DNA structure. X-ray crystal structures of Fis-DNA complexes containing variable sequences in the noncontacted center of the binding site or variations within the major groove interfaces show that the DNA can adapt to the Fis dimer surface asymmetrically. We show that the presence and position of pyrimidine-purine base steps within the major groove interfaces affect both local DNA bending and minor groove compression to modulate affinities and lifetimes of Fis-DNA complexes. Sequences flanking the core binding site also modulate complex affinities, lifetimes, and the degree of local and global Fis-induced DNA bending. In particular, a G immediately upstream of the 15 bp core sequence inhibits binding and bending, and A-tracts within the flanking base pairs increase both complex lifetimes and global DNA curvatures. Taken together, our observations support a revised DNA motif specifying high-affinity Fis binding and highlight the range of conformations that Fis-bound DNA can adopt. Lastly, the affinities and DNA conformations of individual Fis-DNA complexes are likely to be tailored to their context-specific biological functions.

  13. DNA sequence determinants controlling affinity, stability and shape of DNA complexes bound by the nucleoid protein Fis

    DOE PAGES

    Hancock, Stephen P.; Stella, Stefano; Cascio, Duilio; ...

    2016-03-09

    The abundant Fis nucleoid protein selectively binds poorly related DNA sequences with high affinities to regulate diverse DNA reactions. Fis binds DNA primarily through DNA backbone contacts and selects target sites by reading conformational properties of DNA sequences, most prominently intrinsic minor groove widths. High-affinity binding requires Fis-stabilized DNA conformational changes that vary depending on DNA sequence. In order to better understand the molecular basis for high affinity site recognition, we analyzed the effects of DNA sequence within and flanking the core Fis binding site on binding affinity and DNA structure. X-ray crystal structures of Fis-DNA complexes containing variable sequences in the noncontacted center of the binding site or variations within the major groove interfaces show that the DNA can adapt to the Fis dimer surface asymmetrically. We show that the presence and position of pyrimidine-purine base steps within the major groove interfaces affect both local DNA bending and minor groove compression to modulate affinities and lifetimes of Fis-DNA complexes. Sequences flanking the core binding site also modulate complex affinities, lifetimes, and the degree of local and global Fis-induced DNA bending. In particular, a G immediately upstream of the 15 bp core sequence inhibits binding and bending, and A-tracts within the flanking base pairs increase both complex lifetimes and global DNA curvatures. Taken together, our observations support a revised DNA motif specifying high-affinity Fis binding and highlight the range of conformations that Fis-bound DNA can adopt. Lastly, the affinities and DNA conformations of individual Fis-DNA complexes are likely to be tailored to their context-specific biological functions.

  14. Specific minor groove solvation is a crucial determinant of DNA binding site recognition

    PubMed Central

    Harris, Lydia-Ann; Williams, Loren Dean; Koudelka, Gerald B.

    2014-01-01

    The DNA sequence preferences of nearly all sequence specific DNA binding proteins are influenced by the identities of bases that are not directly contacted by protein. Discrimination between non-contacted base sequences is commonly based on the differential abilities of DNA sequences to allow narrowing of the DNA minor groove. However, the factors that govern the propensity of minor groove narrowing are not completely understood. Here we show that the differential abilities of various DNA sequences to support formation of a highly ordered and stable minor groove solvation network are a key determinant of non-contacted base recognition by a sequence-specific binding protein. In addition, disrupting the solvent network in the non-contacted region of the binding site alters the protein's ability to recognize contacted base sequences at positions 5–6 bases away. This observation suggests that DNA solvent interactions link contacted and non-contacted base recognition by the protein. PMID:25429976

  15. A comprehensive resource of genomic, epigenomic and transcriptomic sequencing data for the black truffle Tuber melanosporum

    PubMed Central

    2014-01-01

    Background: Tuber melanosporum, also known in the gastronomic community as “truffle”, features one of the largest fungal genomes (125 Mb) with an exceptionally high transposable element (TE) and repetitive DNA content (>58%). The main purpose of DNA methylation in fungi is TE silencing. As obligate outcrossing organisms, truffles are bound to a sexual mode of propagation, which together with TEs is thought to represent a major force driving the evolution of DNA methylation. Thus, it was of interest to examine if and how T. melanosporum exploits DNA methylation to maintain genome integrity. Findings: We performed whole-genome DNA bisulfite sequencing and mRNA sequencing on different developmental stages of T. melanosporum; namely, fruitbody (“truffle”), free-living mycelium and ectomycorrhiza. The data revealed a high rate of cytosine methylation (>44%), selectively targeting TEs rather than genes with a strong preference for CpG sites. Whole genome DNA sequencing uncovered multiple TE-enriched, copy number variant regions bearing a significant fraction of hypomethylated and expressed TEs, almost exclusively in free-living mycelium propagated in vitro. Treatment of mycelia with 5-azacytidine partially reduced DNA methylation and increased TE transcription. Our transcriptome assembly also resulted in the identification of a set of novel transcripts from 614 genes. Conclusions: The datasets presented here provide valuable and comprehensive (epi)genomic information that can be of interest for evolutionary genomics studies of multicellular (filamentous) fungi, in particular Ascomycetes belonging to the subphylum, Pezizomycotina. Evidence derived from comparative methylome and transcriptome analyses indicates that a non-exhaustive and partly reversible methylation process operates in truffles. PMID:25392735

  16. A comprehensive resource of genomic, epigenomic and transcriptomic sequencing data for the black truffle Tuber melanosporum.

    PubMed

    Chen, Pao-Yang; Montanini, Barbara; Liao, Wen-Wei; Morselli, Marco; Jaroszewicz, Artur; Lopez, David; Ottonello, Simone; Pellegrini, Matteo

    2014-01-01

    Tuber melanosporum, also known in the gastronomic community as "truffle", features one of the largest fungal genomes (125 Mb) with an exceptionally high transposable element (TE) and repetitive DNA content (>58%). The main purpose of DNA methylation in fungi is TE silencing. As obligate outcrossing organisms, truffles are bound to a sexual mode of propagation, which together with TEs is thought to represent a major force driving the evolution of DNA methylation. Thus, it was of interest to examine if and how T. melanosporum exploits DNA methylation to maintain genome integrity. We performed whole-genome DNA bisulfite sequencing and mRNA sequencing on different developmental stages of T. melanosporum; namely, fruitbody ("truffle"), free-living mycelium and ectomycorrhiza. The data revealed a high rate of cytosine methylation (>44%), selectively targeting TEs rather than genes with a strong preference for CpG sites. Whole genome DNA sequencing uncovered multiple TE-enriched, copy number variant regions bearing a significant fraction of hypomethylated and expressed TEs, almost exclusively in free-living mycelium propagated in vitro. Treatment of mycelia with 5-azacytidine partially reduced DNA methylation and increased TE transcription. Our transcriptome assembly also resulted in the identification of a set of novel transcripts from 614 genes. The datasets presented here provide valuable and comprehensive (epi)genomic information that can be of interest for evolutionary genomics studies of multicellular (filamentous) fungi, in particular Ascomycetes belonging to the subphylum, Pezizomycotina. Evidence derived from comparative methylome and transcriptome analyses indicates that a non-exhaustive and partly reversible methylation process operates in truffles.
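    As a rough illustration of how a cytosine methylation rate can be derived from bisulfite sequencing reads, the sketch below pools read counts per sequence context. It rests on the standard bisulfite principle that unmethylated cytosines read as T while methylated cytosines still read as C. The `CytosineSite` record, the `min_coverage` threshold, and the function names are assumptions made for illustration; the abstract does not describe the authors' actual pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CytosineSite:
    """Hypothetical per-site summary of bisulfite reads (not from the paper)."""
    context: str   # sequence context, e.g. "CG", "CHG" or "CHH"
    c_reads: int   # reads still showing C (interpreted as methylated)
    t_reads: int   # reads showing T (interpreted as unmethylated)


def methylation_level(site):
    """Fraction of reads supporting methylation at one site, or None if uncovered."""
    total = site.c_reads + site.t_reads
    return site.c_reads / total if total else None


def pooled_level_by_context(sites, min_coverage=4):
    """Pool read counts per context, skipping sites below the coverage threshold."""
    c_sum = defaultdict(int)
    total_sum = defaultdict(int)
    for site in sites:
        coverage = site.c_reads + site.t_reads
        if coverage == 0 or coverage < min_coverage:
            continue  # too few reads to trust this site
        c_sum[site.context] += site.c_reads
        total_sum[site.context] += coverage
    return {ctx: c_sum[ctx] / total_sum[ctx] for ctx in total_sum}
```

    Pooling counts weights well-covered sites more heavily than averaging per-site fractions would; either choice is defensible, and the one shown here is only one option.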

  17. Detection of the free living amoeba Naegleria fowleri by using conventional and real-time PCR based on a single copy DNA sequence.

    PubMed

    Régoudis, Estelle; Pélandakis, Michel

    2016-02-01

    The amoeba-flagellate Naegleria fowleri is a causative agent of primary amoebic meningoencephalitis (PAM). This thermophilic species occurs worldwide and tends to proliferate in warm aquatic environments. PAM cases remain rare, but the infection is mostly fatal. Here, we describe a single-copy region that has been cloned and sequenced and was used for both conventional and real-time PCR. Targeting a single-copy DNA sequence allows direct quantification of N. fowleri cells. The real-time PCR results give a detection limit of 1 copy per reaction with high reproducibility, without the need for a TaqMan probe. This procedure is of interest compared with other procedures, which are mostly based on the detection of multi-copy DNA in combination with a TaqMan probe. Copyright © 2015 Elsevier Inc. All rights reserved.
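    To make the link between a single-copy target and direct cell quantification concrete, here is a minimal sketch of absolute quantification from a real-time PCR standard curve of the usual form Ct = slope × log10(copies) + intercept. The function names and the slope and intercept values in the usage note are illustrative assumptions, not values from the study. With a single-copy target, the estimated copy number can be read as a cell count, assuming one genome copy per cell.

```python
import math


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the curve slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Slope of an ideal assay (template doubles every cycle), about -3.32.
IDEAL_SLOPE = -1.0 / math.log10(2)
```

    For example, with the hypothetical curve `slope=IDEAL_SLOPE` and `intercept=38.0`, a reaction whose Ct equals the intercept corresponds to a single starting copy, and every decrease of about 3.32 cycles corresponds to a tenfold increase in starting copies.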

  18. Highly sensitive fluorescence quantitative detection of specific DNA sequences with molecular beacons and nucleic acid dye SYBR Green I.

    PubMed

    Xiang, Dongshan; Zhai, Kun; Xiang, Wenjun; Wang, Lianzhi

    2014-11-01

    A highly sensitive fluorescence method for quantitative detection of specific DNA sequences is developed based on a molecular beacon (MB) and the nucleic acid dye SYBR Green I, using synchronous fluorescence analysis. It is demonstrated with an oligonucleotide sequence of wild-type HBV (target DNA) as a model system. In this strategy, the fluorophore of the MB is designed to be a 6-carboxyfluorescein group (FAM), whose maximum excitation and emission wavelengths are both very close to those of SYBR Green I. In the presence of target DNA, the MBs hybridize with the target DNA to form double-stranded DNA (dsDNA); the fluorophore FAM is separated from the quencher BHQ-1, and thus the fluorophore emits fluorescence. At the same time, SYBR Green I binds to the dsDNA, and its fluorescence intensity is significantly enhanced. When target DNA is detected by synchronous fluorescence analysis, the fluorescence peaks of FAM and SYBR Green I overlap completely, so the fluorescence signal of the system is significantly enhanced. Thus, highly sensitive quantitative fluorescence detection of DNA can be realized. Under the optimum conditions, the total fluorescence intensity of FAM and SYBR Green I exhibits a good linear dependence on the concentration of target DNA in the range from 2×10⁻¹¹ to 2.5×10⁻⁹ M. The detection limit for target DNA is estimated to be 9×10⁻¹² M (3σ). Compared with previously reported methods for detecting DNA with MBs, the proposed method can significantly enhance the detection sensitivity. Copyright © 2014 Elsevier B.V. All rights reserved.
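    The linear range and the 3σ detection limit quoted above follow a common calibration pattern: fit a straight line to signal versus concentration, then divide three times the standard deviation of blank measurements by the slope. The sketch below shows that arithmetic under the assumption that this conventional definition applies; the function names are hypothetical, and no data from the paper are reproduced.

```python
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def detection_limit(blank_signals, slope):
    """3-sigma limit: three sample standard deviations of the blank over the slope."""
    return 3 * statistics.stdev(blank_signals) / slope
```

    The result has the same concentration units as the calibration x-values, so a slope fitted against molar concentrations yields a detection limit in M.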

  19. Comparison of mitochondrial DNA control region sequence and microsatellite DNA analyses in estimating population structure and gene flow rates in Atlantic sturgeon Acipenser oxyrinchus

    USGS Publications Warehouse

    Wirgin, I.; Waldman, J.; Stabile, J.; Lubinski, B.; King, T.

    2002-01-01

    Atlantic sturgeon Acipenser oxyrinchus is large, long-lived, and anadromous with subspecies distributed along the Atlantic (A. oxyrinchus oxyrinchus) and Gulf of Mexico (A. o. desotoi) coasts of North America. Although it is not certain if extirpation of some population units has occurred, because of anthropogenic influences abundances of all populations are low compared with historical levels. Informed management of A. oxyrinchus demands a detailed knowledge of its population structure, levels of genetic diversity, and likelihood to home to natal rivers. We compared the use of mitochondrial DNA (mtDNA) control region sequence and microsatellite nuclear DNA (nDNA) analyses in identifying the stock structure and homing fidelity of Atlantic and Gulf coast populations of A. oxyrinchus. The approaches were concordant in that they revealed moderate to high levels of genetic diversity and suggested that populations of Atlantic sturgeon are highly structured. At least six genetically distinct management units were detected using the two approaches among the rivers surveyed. Mitochondrial DNA sequences revealed a significant cline in haplotype diversity along the Atlantic coast with monomorphism observed in Canadian populations. High levels of nDNA diversity were also observed among populations along the Atlantic coast, including the two Canadian populations, probably resulting from the more rapid rate of mutational and evolutionary change at microsatellite loci. Estimates of gene flow among populations were similar between both approaches with the exception that because of mtDNA monomorphism in Canadian populations, gene flow estimates between them were unobtainable. Analyses of both genomes provided high resolution and confidence in characterizing the population structure of Atlantic sturgeon. Microsatellite analysis was particularly informative in delineating population structure in rivers that were recently glaciated and may prove diagnostic in rivers that are geographically proximal along the south Atlantic coast of the US.
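    Haplotype diversity, and the reason a monomorphic population carries no signal for it, can be illustrated with Nei's widely used estimator h = n/(n − 1) × (1 − Σ pᵢ²), where pᵢ is the sample frequency of haplotype i. The abstract does not state which estimator the authors used, so the sketch below is an assumption for illustration, with hypothetical haplotype labels in the usage note.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    counts = Counter(haplotypes)
    # Sum of squared sample frequencies of each distinct haplotype.
    sum_sq_freq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1 - sum_sq_freq)
```

    A sample in which every individual shares one haplotype, such as `["A"] * 10`, gives a diversity of zero. With no variation to compare, estimates that depend on mtDNA differences between such populations cannot be computed, which matches the unobtainable gene flow estimates reported for the Canadian populations.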

  20. A Method for Preparing DNA Sequencing Templates Using a DNA-Binding Microplate

    PubMed Central

    Yang, Yu; Hebron, Haroun R.; Hang, Jun

    2009-01-01

    A DNA-binding matrix was immobilized on the surface of a 96-well microplate and used for plasmid DNA preparation for DNA sequencing. The same DNA-binding plate was used for bacterial growth, cell lysis, DNA purification, and storage. In a single step using one buffer, bacterial cells were lysed by enzymes, and released DNA was captured on the plate simultaneously. After two wash steps, DNA was eluted and stored in the same plate. Inclusion of phosphates in the culture medium was found to enhance the yield of plasmid significantly. Purified DNA samples were used successfully in DNA sequencing with high consistency and reproducibility. Eleven vectors and nine libraries were tested using this method. In 10 μl sequencing reactions using 3 μl sample and 0.25 μl BigDye Terminator v3.1, the results from a 3730xl sequencer gave a success rate of 90–95% and read-lengths of 700 bases or more. The method is fully automatable and convenient for manual operation as well. It enables reproducible, high-throughput, rapid production of DNA with purity and yields sufficient for high-quality DNA sequencing at a substantially reduced cost. PMID:19568455
