Science.gov

Sample records for gc-content dna codes

  1. Biased gene conversion and GC-content evolution in the coding sequences of reptiles and vertebrates.

    PubMed

    Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas

    2015-01-01

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrates species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamic of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins. PMID:25527834
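The GC3 statistic used in this record can be illustrated with a minimal sketch. It assumes an in-frame coding sequence given as a plain string; the function name `gc3` and the handling of ambiguous bases are illustrative assumptions, not the authors' pipeline.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C among third codon positions of an in-frame CDS.

    Assumption: the sequence starts at codon position 1; any trailing
    partial codon is ignored, and non-ACGT symbols are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # drop a trailing partial codon
    thirds = cds[2:usable:3]              # every third base, starting at index 2
    called = [b for b in thirds if b in "ACGT"]
    if not called:
        raise ValueError("no unambiguous third-codon positions found")
    return sum(b in "GC" for b in called) / len(called)


if __name__ == "__main__":
    # Codons ATG GCC AAA TAA have third bases G, C, A, A.
    print(gc3("ATGGCCAAATAA"))
```

Because only third positions are sampled, insertions and deletions that preserve the reading frame do not shift which bases are counted, which is one reason the abstract describes GC3 as an insertion/deletion-free marker.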

  2. Biased Gene Conversion and GC-Content Evolution in the Coding Sequences of Reptiles and Vertebrates

    PubMed Central

    Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas

    2015-01-01

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrates species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamic of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins. PMID:25527834

  3. Taxonomic use of DNA G+C content and DNA-DNA hybridization in the genomic age.

    PubMed

    Meier-Kolthoff, Jan P; Klenk, Hans-Peter; Göker, Markus

    2014-02-01

    The G+C content of a genome is frequently used in taxonomic descriptions of species and genera. In the past it has been determined using conventional, indirect methods, but it is nowadays reasonable to calculate the DNA G+C content directly from the increasingly available and affordable genome sequences. The expected increase in accuracy, however, might alter the way in which the G+C content is used for drawing taxonomic conclusions. We here re-estimate the literature assumption that the G+C content can vary up to 3-5 % within species using genomic datasets. The resulting G+C content differences are compared with DNA-DNA hybridization (DDH) similarities calculated in silico using the GGDC web server, with 70% similarity as the gold standard threshold for species boundaries. The results indicate that the G+C content, if computed from genome sequences, varies no more than 1% within species. Statistical models based on larger differences alone can reject the hypothesis that two strains belong to the same species. Because DDH similarities between two non-type strains occur in the genomic datasets, we also examine to what extent and under which conditions such a similarity could be <70% even though the similarity of either strain to a type strain was ≥ 70%. In theory, their similarity could be as low as 50%, whereas empirical data suggest a boundary closer (but not identical) to 70%. However, it is shown that using a 50% boundary would not affect the conclusions regarding the DNA G+C content. Hence, we suggest that discrepancies between G+C content data provided in species descriptions on the one hand and those recalculated after genome sequencing on the other hand ≥ 1% are due to significant inaccuracies of the applied conventional methods and accordingly call for emendations of species descriptions. PMID:24505073
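A minimal sketch of the direct calculation described in this record: G+C content computed from sequence, followed by a comparison against the 1% within-species bound. The function names are hypothetical, the 1% figure is read here as one percentage point, and the check is only the simple threshold, not the statistical models the abstract mentions.

```python
def gc_content(seq: str) -> float:
    """G+C content in percent, computed over unambiguous bases only."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


def exceeds_within_species_range(seq_a: str, seq_b: str,
                                 threshold: float = 1.0) -> bool:
    """True if two genomes differ in G+C content by more than `threshold`
    percentage points (assumed reading of the 1% bound in the abstract)."""
    return abs(gc_content(seq_a) - gc_content(seq_b)) > threshold


if __name__ == "__main__":
    # Toy strings stand in for whole-genome sequences.
    print(gc_content("GGCCAATT"))
    print(exceeds_within_species_range("GGCC", "AATT"))
```

A difference above the threshold argues against two strains belonging to the same species; a difference below it is not, on its own, evidence that they do.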

  4. MITOCHONDRIAL DNA IN THE OOGAMOCHLAMYS CLADE (CHLOROPHYCEAE): HIGH GC CONTENT AND UNIQUE GENOME ARCHITECTURE FOR GREEN ALGAE.

    PubMed

    Borza, Tudor; Redmond, Erin K; Laflamme, Mark; Lee, Robert W

    2009-12-01

    Most mitochondrial genomes in the green algal phylum Chlorophyta are AT-rich, circular-mapping DNA molecules. However, mitochondrial genomes from the Reinhardtii clade of the Chlorophyceae lineage are linear and sometimes fragmented into subgenomic forms. Moreover, Polytomella capuana, from the Reinhardtii clade, has an elevated GC content (57.2%). In the present study, we examined mitochondrial genome conformation and GC bias in the Oogamochlamys clade of the Chlorophyceae, which phylogenetic data suggest is closely related to the Reinhardtii clade. Total DNA from selected Oogamochlamys taxa, including four Lobochlamys culleus (H. Ettl) Pröschold, B. Marin, U. G. Schlöss. et Melkonian strains, Lobochlamys segnis (H. Ettl) Pröschold, B. Marin, U. G. Schlöss. et Melkonian, and Oogamochlamys gigantea (O. Dill) Pröschold, B. Marin, U. G. Schlöss. et Melkonian, was subjected to Southern blot analyses with cob and cox1 probes, and the results suggest that the mitochondrial genome of these taxa is represented by multiple-sized linear DNA fragments with overlapping homologies. On the basis of these data, we propose that linear mitochondrial DNA with a propensity to become fragmented arose in an ancestor common to the Reinhardtii and Oogamochlamys clades or even earlier in the evolutionary history of the Chlorophyceae. Analyses of partial cob and cox1 sequences from these Oogamochlamys taxa revealed an unusually high GC content (49.9%-65.1%) and provided evidence for the accumulation of cob and cox1 pseudogenes and truncated sequences in the mitochondrial genome of all L. culleus strains examined. PMID:27032590

  5. DNA codes

    SciTech Connect

    Torney, D. C.

    2001-01-01

    We have begun to characterize a variety of codes, motivated by potential implementation as (quaternary) DNA n-sequences, with letters denoted A, C, G, and T. The first codes we studied are the most reminiscent of conventional group codes. For these codes, Hamming similarity was generalized so that the score for matched letters takes more than one value, depending upon which letters are matched [2]. These codes consist of n-sequences satisfying an upper bound on the similarities, summed over the letter positions, of distinct codewords. We chose similarity 2 for matches of letters A and T and 3 for matches of the letters C and G, providing a rough approximation to double-strand bond energies in DNA. An inherent novelty of DNA codes is 'reverse complementation'. The latter may be defined, as follows, not only for alphabets of size four, but, more generally, for any even-size alphabet. All that is required is a matching of the letters of the alphabet: a partition into pairs. Then, the reverse complement of a codeword is obtained by reversing the order of its letters and replacing each letter by its match. For DNA, the matching is AT/CG because these are the Watson-Crick bonding pairs. Reversal arises because two DNA sequences form a double strand with opposite relative orientations. Thus, as will be described in detail, because in vitro decoding involves the formation of double-stranded DNA from two codewords, it is reasonable to assume - for universal applicability - that the reverse complement of any codeword is also a codeword. In particular, self-reverse complementary codewords are expressly forbidden in reverse-complement codes. Thus, an appropriate distance between all pairs of codewords must, when large, effectively prohibit binding between the respective codewords: to form a double strand. Only reverse-complement pairs of codewords should be able to bind. For most applications, a DNA code is to be bi-partitioned, such that the reverse-complementary pairs are separated
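The definitions in this record (reverse complementation under the AT/CG matching, and a Hamming-style similarity scoring 2 for A or T matches and 3 for C or G matches) can be sketched as follows. The function names are hypothetical, and the closure check reflects only the properties stated in the abstract, not the full code construction.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}   # Watson-Crick matching
MATCH_SCORE = {"A": 2, "T": 2, "C": 3, "G": 3}          # scores from the abstract


def reverse_complement(word: str) -> str:
    """Reverse the letters and replace each letter by its match."""
    return "".join(COMPLEMENT[b] for b in reversed(word))


def similarity(u: str, v: str) -> int:
    """Weighted Hamming similarity: sum of match scores over positions
    where the two equal-length words carry the same letter."""
    if len(u) != len(v):
        raise ValueError("codewords must have equal length")
    return sum(MATCH_SCORE[a] for a, b in zip(u, v) if a == b)


def is_self_reverse_complementary(word: str) -> bool:
    """Such words are forbidden in reverse-complement codes."""
    return reverse_complement(word) == word


def closed_under_reverse_complement(code: set) -> bool:
    """True if the reverse complement of every codeword is also a codeword."""
    return all(reverse_complement(w) in code for w in code)


if __name__ == "__main__":
    print(reverse_complement("AACG"))
    print(similarity("ACGT", "ACGA"))
    print(is_self_reverse_complementary("ACGT"))
    print(closed_under_reverse_complement({"AACG", "CGTT"}))
```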

  6. Ecological and evolutionary significance of genomic GC content diversity in monocots

    PubMed Central

    Šmarda, Petr; Bureš, Petr; Horová, Lucie; Leitch, Ilia J.; Mucina, Ladislav; Pacini, Ettore; Tichý, Lubomír; Grulich, Vít; Rotreklová, Olga

    2014-01-01

    Genomic DNA base composition (GC content) is predicted to significantly affect genome functioning and species ecology. Although several hypotheses have been put forward to address the biological impact of GC content variation in microbial and vertebrate organisms, the biological significance of GC content diversity in plants remains unclear because of a lack of sufficiently robust genomic data. Using flow cytometry, we report genomic GC contents for 239 species representing 70 of 78 monocot families and compare them with genomic characters, a suite of life history traits and climatic niche data using phylogeny-based statistics. GC content of monocots varied between 33.6% and 48.9%, with several groups exceeding the GC content known for any other vascular plant group, highlighting their unusual genome architecture and organization. GC content showed a quadratic relationship with genome size, with the decreases in GC content in larger genomes possibly being a consequence of the higher biochemical costs of GC base synthesis. Dramatic decreases in GC content were observed in species with holocentric chromosomes, whereas increased GC content was documented in species able to grow in seasonally cold and/or dry climates, possibly indicating an advantage of GC-rich DNA during cell freezing and desiccation. We also show that genomic adaptations associated with changing GC content might have played a significant role in the evolution of the Earth’s contemporary biota, such as the rise of grass-dominated biomes during the mid-Tertiary. One of the major selective advantages of GC-rich DNA is hypothesized to be facilitating more complex gene regulation. PMID:25225383

  7. On the molecular mechanism of GC content variation among eubacterial genomes

    PubMed Central

    2012-01-01

    Background: As a key parameter of genome sequence variation, the GC content of bacterial genomes has been investigated for over half a century, and many hypotheses have been put forward to explain this GC content variation and its relationship to other fundamental processes. Previously, we classified eubacteria into dnaE-based groups (the dimeric combination of DNA polymerase III alpha subunits), according to a hypothesis where GC content variation is essentially governed by genome replication and DNA repair mechanisms. Further investigation led to the discovery that two major mutator genes, polC and dnaE2, may be responsible for genomic GC content variation. Consequently, an in-depth analysis was conducted to evaluate various potential intrinsic and extrinsic factors in association with GC content variation among eubacterial genomes. Results: Mutator genes, especially those with dominant effects on the mutation spectra, are biased towards either GC or AT richness, and they alter genomic GC content in the two opposite directions. Increased bacterial genome size (or gene number) appears to rely on increased genomic GC content; however, it is unclear whether the changes are directly related to certain environmental pressures. Certain environmental and bacteriological features are related to GC content variation, but their trends are more obvious when analyzed under the dnaE-based grouping scheme. Most terrestrial, plant-associated, and nitrogen-fixing bacteria are members of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those dwelling in aquatic environments, are largely members of the dnaE1|polV group. Conclusion: Our studies provide several lines of evidence indicating that DNA polymerase III α subunit and its isoforms participating in either replication (such as polC) or SOS mutagenesis/translesion synthesis (such as dnaE2), play dominant roles in determining GC variability. Other environmental or bacteriological factors, such

  8. Insights from the GC content analysis of 76 genome survey sequences (GSS) from Elaeis oleiferaψ

    PubMed Central

    Bhore, Subhash J; Kassim, Amelia; Shah, Farida H

    2010-01-01

    South American oil palm (Elaeis oleifera) is not cultivated on a large scale in tropical countries like Malaysia because of the low yield of palm oil derived from its fruit mesocarp. However, its fruit mesocarp oil contains about 68.6% oleic acid (C18:1), which is more than double that of the commercially cultivated oil palm, E. guineensis Jacq. Tenera (hybrid of Dura (♀) x Pisifera (♂)). It is also known that E. oleifera is a good source of tocotrienols and carotenoids. Therefore, it is of interest to know the genome sequence of E. oleifera. The objective of this study is to generate genome survey sequences (GSS) to gain insight into the GC content of the E. oleifera genome. The nuclear genomic DNA isolated from young leaf tissues was digested with EcoRI and NdeI/DraI restriction enzymes, and three genomic DNA libraries were constructed using Lambda ZAP-II, pGEM®-T Easy, and pDONR 222™ as cloning vectors. The 76 generated GSSs were analyzed using bioinformatics tools. The analysis indicates that the adenine, cytosine, guanine and thymine contents of the generated GSSs are 30%, 20%, 20%, and 30%, respectively. In conclusion, based on the precise GC content analysis of the 76 randomly isolated GSSs using bioinformatics tools, we hypothesize that the GC content of the E. oleifera genome is 40%. The hypothesized 40% GC content is expected to remain close to the GC content obtained from whole-genome analysis. ψThe nucleotide sequence data reported in this paper have been submitted to the dbGSS division of the international DNA database (GenBank/DDBJ/EMBL) under accession numbers DX575945-DX575972 and EI798032-EI798079. Abbreviations: gDNA, nuclear genomic DNA; GSSs, genome survey sequences; SAOP, South American oil palm. PMID:21364775

  9. [Comparison study on the methods for finding borders between coding and non-coding DNA regions in rice].

    PubMed

    Sun, Yi-Gang; Gao, Lei; Zhang, Zhong-Hua; Xue, Qing-Zhong

    2005-07-01

    Entropy-based divergence measures have provided a compelling tool for evaluating sequence complexity, predicting CpG islands, and detecting borders between coding and non-coding DNA regions. In this paper, two new divergence measures, the alpha-KL divergence and the alpha-Jensen-Shannon divergence, were defined, and a coarse-graining vector of the codons corresponding to amino acids was proposed according to codon GC-content, in order to improve the computational approach to finding borders between coding and non-coding regions in rice. A comparison of the accuracies obtained with different measures (the Jensen-Shannon divergence, the Jensen-Renyi divergence, the alpha-KL divergence and the alpha-Jensen-Shannon divergence) showed that the recognition efficiency of the new information measures with vector coarse-graining increased 4- to 5-fold over that of Bernaola's method at the 'stop codon' of coding regions in rice. PMID:16120591
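As an illustration of divergence-based border detection, the sketch below uses the ordinary length-weighted Jensen-Shannon divergence on single-nucleotide compositions. It is not the alpha-KL or alpha-Jensen-Shannon measure defined in the paper, and it does not apply the codon coarse-graining; the function names and the toy sequence are assumptions.

```python
from collections import Counter
from math import log2


def entropy(counts: Counter, total: int) -> float:
    """Shannon entropy (bits) of the symbol frequencies in `counts`."""
    return -sum(c / total * log2(c / total) for c in counts.values() if c)


def js_divergence(left: str, right: str) -> float:
    """Length-weighted Jensen-Shannon divergence between two segments:
    H(whole) - (n_l/n) H(left) - (n_r/n) H(right)."""
    n_l, n_r = len(left), len(right)
    n = n_l + n_r
    h_all = entropy(Counter(left + right), n)
    return (h_all
            - (n_l / n) * entropy(Counter(left), n_l)
            - (n_r / n) * entropy(Counter(right), n_r))


def best_border(seq: str, min_len: int = 1) -> int:
    """Split index that maximizes the divergence between the two sides.
    Raises ValueError if the sequence is too short for any split."""
    candidates = range(min_len, len(seq) - min_len + 1)
    return max(candidates, key=lambda i: js_divergence(seq[:i], seq[i:]))


if __name__ == "__main__":
    # An AT-only stretch followed by a GC-only stretch; the border is at index 8.
    print(best_border("ATATATATGCGCGCGC"))
```

The scan places the border where the compositions on either side differ most; on real data a significance criterion would also be needed to decide whether the best split is a genuine boundary.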

  10. Identification and prevention of a GC content bias in SAGE libraries.

    PubMed

    Margulies, E H; Kardia, S L; Innis, J W

    2001-06-15

    Serial Analysis of Gene Expression (SAGE) is becoming a widely used gene expression profiling method for the study of development, cancer and other human diseases. Investigators using SAGE rely heavily on the quantitative aspect of this method for cataloging gene expression and comparing multiple SAGE libraries. We have developed additional computational and statistical tools to assess the quality and reproducibility of a SAGE library. Using these methods, a critical variable in the SAGE protocol was identified that has the potential to bias the Tag distribution relative to the GC content of the 10 bp SAGE Tag DNA sequence. We also detected this bias in a number of publicly available SAGE libraries. It is important to note that the GC content bias went undetected by quality control procedures in the current SAGE protocol and was only identified with the use of these statistical analyses on as few as 750 SAGE Tags. In addition to keeping any solution of free DiTags on ice, an analysis of the GC content should be performed before sequencing large numbers of SAGE Tags to be confident that SAGE libraries are free from experimental bias. PMID:11410683
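The GC content check recommended in this record can begin with a simple tabulation of tags by their number of G or C bases. The sketch below covers only that descriptive step; the function names and the 10 bp default are assumptions drawn from the abstract's description of the tag length, and the statistical comparison used by the authors is not reproduced.

```python
from collections import Counter


def tag_gc_histogram(tags, tag_length: int = 10):
    """Count tags by number of G/C bases (0..tag_length).
    Tags of the wrong length or with non-ACGT symbols are skipped."""
    hist = Counter()
    for tag in tags:
        tag = tag.upper()
        if len(tag) != tag_length or set(tag) - set("ACGT"):
            continue
        hist[tag.count("G") + tag.count("C")] += 1
    return [hist.get(k, 0) for k in range(tag_length + 1)]


def mean_gc_fraction(tags, tag_length: int = 10) -> float:
    """Mean GC fraction over all valid tags."""
    hist = tag_gc_histogram(tags, tag_length)
    n = sum(hist)
    if n == 0:
        raise ValueError("no valid tags")
    return sum(k * c for k, c in enumerate(hist)) / (n * tag_length)


if __name__ == "__main__":
    demo = ["AAAAAAAAAA", "GGGGGCCCCC", "ACGTACGTAC", "ACGT"]  # last tag is skipped
    print(tag_gc_histogram(demo))
    print(mean_gc_fraction(demo))
```

Comparing such a histogram from a small pilot set of tags against a reference library is one way to spot a skew before committing to large-scale sequencing.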

  11. Complete chloroplast genome sequences of Drimys, Liriodendron, and Piper: Implications for the phylogeny of magnoliids and the evolution of GC content

    SciTech Connect

    Zhengqiu, C.; Penaflor, C.; Kuehl, J.V.; Leebens-Mack, J.; Carlson, J.; dePamphilis, C.W.; Boore, J.L.; Jansen, R.K.

    2006-06-01

    the inverted repeat due to the presence of rRNA genes and lowest in the small single copy region where most NADH genes are located. Phylogenetic analyses using maximum parsimony and maximum likelihood methods were performed on DNA sequences of 61 protein-coding genes. Trees from both analyses provided strong support for the monophyly of magnoliids and two strongly supported groups were identified, the Canellales/Piperales and the Laurales/Magnoliales. The phylogenies also provided moderate to strong support for the basal position of Amborella, and a sister relationship of magnoliids to a clade that includes monocots and eudicots. The complete sequences of three magnoliid chloroplast genomes provide new data from the largest basal angiosperm clade. Evolutionary comparisons of these new genome sequences, combined with other published angiosperm genomes, confirm that GC content is unevenly distributed across the genome by location, codon position, and functional group. Furthermore, phylogenetic analyses provide the strongest support so far for the hypothesis that the magnoliids are sister to a large clade that includes both monocots and eudicots.

  12. DNA: Polymer and molecular code

    NASA Astrophysics Data System (ADS)

    Shivashankar, G. V.

    1999-10-01

    The thesis work focusses upon two aspects of DNA, the polymer and the molecular code. Our approach was to bring single molecule micromanipulation methods to the study of DNA. It included a home built optical microscope combined with an atomic force microscope and an optical tweezer. This combined approach led to a novel method to graft a single DNA molecule onto a force cantilever using the optical tweezer and local heating. With this method, a force versus extension assay of double stranded DNA was realized. The resolution was about 10 picoN. To improve on this force measurement resolution, a simple light backscattering technique was developed and used to probe the DNA polymer flexibility and its fluctuations. It combined the optical tweezer to trap a DNA tethered bead and the laser backscattering to detect the bead's Brownian fluctuations. With this technique the resolution was about 0.1 picoN with a millisecond access time, and the whole entropic part of the DNA force-extension was measured. With this experimental strategy, we measured the polymerization of the protein RecA on an isolated double stranded DNA. We observed the progressive decoration of RecA on the λ DNA molecule, which results in the extension of λ, due to unwinding of the double helix. The dynamics of polymerization, the resulting change in the DNA entropic elasticity and the role of ATP hydrolysis were the main parts of the study. A simple model for RecA assembly on DNA was proposed. This work presents a first step in the study of genetic recombination. Recently we have started a study of equilibrium binding which utilizes fluorescence polarization methods to probe the polymerization of RecA on single stranded DNA. In addition to the study of material properties of DNA and DNA-RecA, we have developed experiments for which the code of the DNA is central. We studied one aspect of DNA as a molecular code, using different techniques. In particular the programmatic use of template specificity makes

  13. The Bimodal Distribution of Genic GC Content Is Ancestral to Monocot Species

    PubMed Central

    Clément, Yves; Fustier, Margaux-Alison; Nabholz, Benoit; Glémin, Sylvain

    2015-01-01

    In grasses such as rice or maize, the distribution of genic GC content is well known to be bimodal. It is mainly driven by GC content at third codon positions (GC3 for short). This feature is thought to be specific to grasses as closely related species like banana have a unimodal GC3 distribution. GC3 is associated with numerous genomics features and uncovering the origin of this peculiar distribution will help understanding the potential roles and consequences of GC3 variations within and between genomes. Until recently, the origin of the peculiar GC3 distribution in grasses has remained unknown. Thanks to the recent publication of several complete genomes and transcriptomes of nongrass monocots, we studied more than 1,000 groups of one-to-one orthologous genes in seven grasses and three outgroup species (banana, palm tree, and yam). Using a maximum likelihood-based method, we reconstructed GC3 at several ancestral nodes. We found that the bimodal GC3 distribution observed in extant grasses is ancestral to both grasses and most monocot species, and that other species studied here have lost this peculiar structure. We also found that GC3 in grass lineages is globally evolving very slowly and that the decreasing GC3 gradient observed from 5′ to 3′ along coding sequences is also conserved and ancestral to monocots. This result strongly challenges the previous views on the specificity of grass genomes and we discuss its implications for the possible causes of the evolution of GC content in monocots. PMID:25527839

  14. The mutation spectrum in genomic late replication domains shapes mammalian GC content.

    PubMed

    Kenigsberg, Ephraim; Yehuda, Yishai; Marjavaara, Lisette; Keszthelyi, Andrea; Chabes, Andrei; Tanay, Amos; Simon, Itamar

    2016-05-19

    Genome sequence compositions and epigenetic organizations are correlated extensively across multiple length scales. Replication dynamics, in particular, is highly correlated with GC content. We combine genome-wide time of replication (ToR) data, topological domains maps and detailed functional epigenetic annotations to study the correlations between replication timing and GC content at multiple scales. We find that the decrease in genomic GC content at large scale late replicating regions can be explained by mutation bias favoring A/T nucleotide, without selection or biased gene conversion. Quantification of the free dNTP pool during the cell cycle is consistent with a mechanism involving replication-coupled mutation spectrum that favors AT nucleotides at late S-phase. We suggest that mammalian GC content composition is shaped by independent forces, globally modulating mutation bias and locally selecting on functional element. Deconvoluting these forces and analyzing them on their native scales is important for proper characterization of complex genomic correlations. PMID:27085808

  15. The mutation spectrum in genomic late replication domains shapes mammalian GC content

    PubMed Central

    Kenigsberg, Ephraim; Yehuda, Yishai; Marjavaara, Lisette; Keszthelyi, Andrea; Chabes, Andrei; Tanay, Amos; Simon, Itamar

    2016-01-01

    Genome sequence compositions and epigenetic organizations are correlated extensively across multiple length scales. Replication dynamics, in particular, is highly correlated with GC content. We combine genome-wide time of replication (ToR) data, topological domains maps and detailed functional epigenetic annotations to study the correlations between replication timing and GC content at multiple scales. We find that the decrease in genomic GC content at large scale late replicating regions can be explained by mutation bias favoring A/T nucleotide, without selection or biased gene conversion. Quantification of the free dNTP pool during the cell cycle is consistent with a mechanism involving replication-coupled mutation spectrum that favors AT nucleotides at late S-phase. We suggest that mammalian GC content composition is shaped by independent forces, globally modulating mutation bias and locally selecting on functional element. Deconvoluting these forces and analyzing them on their native scales is important for proper characterization of complex genomic correlations. PMID:27085808

  16. Advantages of Single-Molecule Real-Time Sequencing in High-GC Content Genomes

    PubMed Central

    Shin, Seung Chul; Ahn, Do Hwan; Kim, Su Jin; Lee, Hyoungseok; Oh, Tae-Jin; Lee, Jong Eun; Park, Hyun

    2013-01-01

    Next-generation sequencing has become the most widely used sequencing technology in genomics research, but it has inherent drawbacks when dealing with high-GC content genomes. Recently, single-molecule real-time sequencing technology (SMRT) was introduced as a third-generation sequencing strategy to compensate for this drawback. Here, we report that the unbiased and longer read length of SMRT sequencing markedly improved genome assembly with high GC content via gap filling and repeat resolution. PMID:23894349

  17. Selection Maintains Low Genomic GC Content in Marine SAR11 Lineages.

    PubMed

    Luo, Haiwei; Thompson, Luke R; Stingl, Ulrich; Hughes, Austin L

    2015-10-01

    The genomic G+C content of ocean bacteria varies from below 30% to over 60%. This broad range of base composition is likely shaped by distinct mutational processes, recombination, effective population size, and selection driven by environmental factors. A number of studies have hypothesized that depletion of G/C in genomes of marine bacterioplankton cells is an adaptation to the nitrogen-poor pelagic oceans, but they failed to disentangle environmental factors from mutational biases and population history. Here, we reconstructed the evolutionary changes of bases at synonymous sites in genomes of two marine SAR11 populations and a freshwater counterpart with its evolutionary origin rooted in the marine lineage. Although they all have similar genome sizes, DNA repair gene repertoire, and base compositions, there is a stronger bias toward A/T changes, a reduced frequency of nitrogenous amino acids, and an exclusive occurrence of polyamine, opine, and taurine transport systems in the ocean populations, consistent with a greater nitrogen stress in surface oceans compared with freshwater lakes. Furthermore, the ratio of nonsynonymous to synonymous nucleotide diversity is not statistically distinguishable among these populations, suggesting that population history has a limited effect. Taken together, the ecological transition of SAR11 from ocean to freshwater habitats makes nitrogen more available to these organisms, and thus relaxation of purifying selection drove a genome-wide reduction in the frequency of G/C to A/T changes in the freshwater population. PMID:26116859

  18. DNA-guided establishment of nucleosome patterns within coding regions of a eukaryotic genome

    PubMed Central

    Beh, Leslie Y.; Müller, Manuel M.; Muir, Tom W.; Kaplan, Noam; Landweber, Laura F.

    2015-01-01

    A conserved hallmark of eukaryotic chromatin architecture is the distinctive array of well-positioned nucleosomes downstream from transcription start sites (TSS). Recent studies indicate that trans-acting factors establish this stereotypical array. Here, we present the first genome-wide in vitro and in vivo nucleosome maps for the ciliate Tetrahymena thermophila. In contrast with previous studies in yeast, we find that the stereotypical nucleosome array is preserved in the in vitro reconstituted map, which is governed only by the DNA sequence preferences of nucleosomes. Remarkably, this average in vitro pattern arises from the presence of subsets of nucleosomes, rather than the whole array, in individual Tetrahymena genes. Variation in GC content contributes to the positioning of these sequence-directed nucleosomes and affects codon usage and amino acid composition in genes. Given that the AT-rich Tetrahymena genome is intrinsically unfavorable for nucleosome formation, we propose that these “seed” nucleosomes—together with trans-acting factors—may facilitate the establishment of nucleosome arrays within genes in vivo, while minimizing changes to the underlying coding sequences. PMID:26330564

  19. GC-Content Evolution in Bacterial Genomes: The Biased Gene Conversion Hypothesis Expands

    PubMed Central

    Lassalle, Florent; Périan, Séverine; Bataillon, Thomas; Nesme, Xavier; Duret, Laurent; Daubin, Vincent

    2015-01-01

    The characterization of functional elements in genomes relies on the identification of the footprints of natural selection. In this quest, taking into account neutral evolutionary processes such as mutation and genetic drift is crucial because these forces can generate patterns that may obscure or mimic signatures of selection. In mammals, and probably in many eukaryotes, another such confounding factor called GC-Biased Gene Conversion (gBGC) has been documented. This mechanism generates patterns identical to what is expected under selection for higher GC-content, specifically in highly recombining genomic regions. Recent results have suggested that a mysterious selective force favouring higher GC-content exists in Bacteria but the possibility that it could be gBGC has been excluded. Here, we show that gBGC is probably at work in most if not all bacterial species. First we find a consistent positive relationship between the GC-content of a gene and evidence of intra-genic recombination throughout a broad spectrum of bacterial clades. Second, we show that the evolutionary force responsible for this pattern is acting independently from selection on codon usage, and could potentially interfere with selection in favor of optimal AU-ending codons. A comparison with data from human populations shows that the intensity of gBGC in Bacteria is comparable to what has been reported in mammals. We propose that gBGC is not restricted to sexual Eukaryotes but also widespread among Bacteria and could therefore be an ancestral feature of cellular organisms. We argue that if gBGC occurs in bacteria, it can account for previously unexplained observations, such as the apparent non-equilibrium of base substitution patterns and the heterogeneity of gene composition within bacterial genomes. Because gBGC produces patterns similar to positive selection, it is essential to take this process into account when studying the evolutionary forces at work in bacterial genomes. PMID:25659072

  20. Determination of GC content of Thermotoga maritima, Thermotoga neapolitana and Thermotoga thermarum strains: A GC dataset for higher level hierarchical classification.

    PubMed

    Rekadwad, Bhagwan N; Khobragade, Chandrahasya N

    2016-09-01

    A total of 16 complete genome sequences of hyperthermophilic Thermotoga strains, viz. Thermotoga maritima (AE000512, CP004077, CP007013, CP011107, NC_000853, NC_021214, NC_023151, NZ_CP011107, CP011108, NZ_CP011108, CP010967 & NZ_CP010967), Thermotoga neapolitana (CP000916 & NC_011978) and Thermotoga thermarum (CP002351 & NC_015707), were retrieved from the NCBI BioSample database. ENDMEMO GC was used to generate data on GC content in Thermotoga sp. DNA sequences. Maximum GC content was observed in Thermotoga strains AE000512 & NC_000853 (69 %GC), followed by NZ_CP011108, CP011108, NZ_CP011107, NC_023151, NC_021214, CP011107 & CP004077 (68.5 %GC), followed by NZ_CP010967 & CP010967 (68.3 %GC), followed by CP000916, CP007013 & NC_011978 (68 %GC), followed by CP002351 & NC_015707 (67 %GC) strains. The use of GC dataset ratios helps in higher level hierarchical classification in Bacterial Systematics in addition to phenotypic and other genotypic characters. PMID:27331105
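
    For readers unfamiliar with the %GC figure reported in this record, the following is a minimal sketch of how a GC percentage is usually computed from a nucleotide string. It is not the ENDMEMO GC tool; the function name gc_content and the choice to ignore ambiguous symbols such as N are assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Assumption: symbols other than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


# Toy sequence, not taken from any of the Thermotoga accessions.
print(f"{gc_content('ATGCGCGGTTAACCGN'):.1f} %GC")
```

    Applied to a whole genome, the same ratio gives one number per accession, which is the kind of value the record compares across strains.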

  1. Using Huffman coding method to visualize and analyze DNA sequences.

    PubMed

    Qi, Zhao-Hui; Li, Ling; Qi, Xiao-Qin

    2011-11-30

    On the basis of the Huffman coding method, we propose a new graphical representation of DNA sequence. The representation can avoid degeneracy and loss of information in the transfer of data from a DNA sequence to its graphical representation. Then a multicomponent vector from the representation is introduced to characterize quantitatively DNA sequences. The components of the vector are derived from the graphical representation of DNA primary sequence. The examination of similarities and dissimilarities among the complete coding sequences of β-globin gene of 11 species and six ND6 proteins shows the utility of the scheme. PMID:21953557
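
    The abstract does not describe how the graphical representation is built, so the sketch below only illustrates the underlying Huffman coding step: assigning shorter bit strings to more frequent nucleotides. The function name huffman_codes and the toy sequence are hypothetical, and this should not be read as the authors' method.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_codes(seq: str) -> Dict[str, str]:
    """Build a Huffman code (symbol -> bit string) from symbol frequencies."""
    if not seq:
        raise ValueError("empty sequence")
    freq = Counter(seq)
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Prefix the two lightest subtrees with 0 and 1, then merge them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


toy = "AAAAAAAACCCCGGT"
codes = huffman_codes(toy)
encoded = "".join(codes[b] for b in toy)
print(codes, len(encoded))
```

    Because the code is prefix-free, the bit string can be decoded back to the original sequence without separators, which is consistent with the record's claim of avoiding loss of information.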

  2. DNA barcode goes two-dimensions: DNA QR code web server.

    PubMed

    Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113

  3. BioCode: Two biologically compatible Algorithms for embedding data in non-coding and coding regions of DNA

    PubMed Central

    2013-01-01

    Background In recent times, the application of deoxyribonucleic acid (DNA) has diversified with the emergence of fields such as DNA computing and DNA data embedding. DNA data embedding, also known as DNA watermarking or DNA steganography, aims to develop robust algorithms for encoding non-genetic information in DNA. Inherently DNA is a digital medium whereby the nucleotide bases act as digital symbols, a fact which underpins all bioinformatics techniques, and which also makes trivial information encoding using DNA straightforward. However, the situation is more complex in methods which aim at embedding information in the genomes of living organisms. DNA is susceptible to mutations, which act as a noisy channel from the point of view of information encoded using DNA. This means that the DNA data embedding field is closely related to digital communications. Moreover it is a particularly unique digital communications area, because important biological constraints must be observed by all methods. Many DNA data embedding algorithms have been presented to date, all of which operate in one of two regions: non-coding DNA (ncDNA) or protein-coding DNA (pcDNA). Results This paper proposes two novel DNA data embedding algorithms jointly called BioCode, which operate in ncDNA and pcDNA, respectively, and which comply fully with stricter biological restrictions. Existing methods comply with some elementary biological constraints, such as preserving protein translation in pcDNA. However there exist further biological restrictions which no DNA data embedding methods to date account for. Observing these constraints is key to increasing the biocompatibility and in turn, the robustness of information encoded in DNA. Conclusion The algorithms encode information in near optimal ways from a coding point of view, as we demonstrate by means of theoretical and empirical (in silico) analyses. Also, they are shown to encode information in a robust way, such that mutations have isolated

  4. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  5. Ancient DNA sequence revealed by error-correcting codes

    PubMed Central

    Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  6. DNA Barcoding through Quaternary LDPC Codes

    PubMed Central

    Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

    2015-01-01

    For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10^-2 per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10^-9 at the expense of a rate of read losses just in the order of 10^-6. PMID:26492348
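
    The trade-off between read losses and sample misidentification can be illustrated with a much simpler scheme than the LDPC or BCH codes of this record. The sketch below is an assumption-laden toy: it measures the minimum pairwise Hamming distance d of a barcode set, from which up to (d - 1) // 2 mismatches are guaranteed correctable, and assigns a read to a barcode only when exactly one barcode lies within the allowed number of mismatches. The barcode set and all function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(barcodes: List[str]) -> int:
    """Smallest pairwise Hamming distance in the barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign(read: str, barcodes: List[str], max_mismatches: int) -> Optional[str]:
    """Return the unique barcode within max_mismatches, else None (a read loss)."""
    hits = [b for b in barcodes if hamming(read, b) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Toy 6-nt barcode set chosen by hand so that the minimum distance is 3.
toy_barcodes = ["AAAAAA", "AAACCC", "CCCAAA", "CCCCCC"]
d = min_distance(toy_barcodes)
t = (d - 1) // 2  # mismatches guaranteed to be correctable
print(d, t, assign("AAAACC", toy_barcodes, t), assign("AACCAA", toy_barcodes, t))
```

    A read that falls outside every barcode's mismatch radius is discarded rather than guessed, which is the toy analogue of accepting some read loss in exchange for fewer misidentified samples.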

  7. Contrasting GC-content dynamics across 33 mammalian genomes: relationship with life-history traits and chromosome sizes.

    PubMed

    Romiguier, Jonathan; Ranwez, Vincent; Douzery, Emmanuel J P; Galtier, Nicolas

    2010-08-01

    The origin, evolution, and functional relevance of genomic variations in GC content are a long-debated topic, especially in mammals. Most of the existing literature, however, has focused on a small number of model species and/or limited sequence data sets. We analyzed more than 1000 orthologous genes in 33 fully sequenced mammalian genomes, reconstructed their ancestral isochore organization in the maximum likelihood framework, and explored the evolution of third-codon position GC content in representatives of 16 orders and 27 families. We showed that the previously reported erosion of GC-rich isochores is not a general trend. Several species (e.g., shrew, microbat, tenrec, rabbit) have independently undergone a marked increase in GC content, with a widening gap between the GC-poorest and GC-richest classes of genes. The intensively studied apes and (especially) murids do not reflect the general placental pattern. We correlated GC-content evolution with species life-history traits and cytology. Significant effects of body mass and genome size were detected, with each being consistent with the GC-biased gene conversion model. PMID:20530252
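
    Third-codon-position GC content (often written GC3) is the quantity tracked in this record and several others in the list. A minimal sketch of its computation follows; it assumes the input is an in-frame coding sequence starting at the first codon position, and the function name gc3 is hypothetical.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C among third codon positions of an in-frame CDS.

    Assumptions: the sequence starts at codon position 1, trailing bases that
    do not complete a codon are dropped, and ambiguous symbols are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third = [cds[i] for i in range(2, usable, 3) if cds[i] in "ACGT"]
    if not third:
        raise ValueError("no usable third codon positions")
    return sum(b in "GC" for b in third) / len(third)


# Toy CDS with codons ATG GCC AAA TTC; third positions are G, C, A, C.
print(gc3("ATGGCCAAATTC"))
```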

  8. Genes Translocated into the Plastid Inverted Repeat Show Decelerated Substitution Rates and Elevated GC Content.

    PubMed

    Li, Fay-Wei; Kuo, Li-Yaung; Pryer, Kathleen M; Rothfels, Carl J

    2016-01-01

    Plant chloroplast genomes (plastomes) are characterized by an inverted repeat (IR) region and two larger single copy (SC) regions. Patterns of molecular evolution in the IR and SC regions differ, most notably by a reduced rate of nucleotide substitution in the IR compared to the SC region. In addition, the organization and structure of plastomes is fluid, and rearrangements through time have repeatedly shuffled genes into and out of the IR, providing recurrent natural experiments on how chloroplast genome structure can impact rates and patterns of molecular evolution. Here we examine four loci (psbA, ycf2, rps7, and rps12 exon 2-3) that were translocated from the SC into the IR during fern evolution. We use a model-based method, within a phylogenetic context, to test for substitution rate shifts. All four loci show a significant, 2- to 3-fold deceleration in their substitution rate following translocation into the IR, a phenomenon not observed in any other, nontranslocated plastid genes. Also, we show that after translocation, the GC content of the third codon position and of the noncoding regions is significantly increased, implying that gene conversion within the IR is GC-biased. Taken together, our results suggest that the IR region not only reduces substitution rates, but also impacts nucleotide composition. This finding highlights a potential vulnerability of correlating substitution rate heterogeneity with organismal life history traits without knowledge of the underlying genome structure. PMID:27401175

  9. Genes Translocated into the Plastid Inverted Repeat Show Decelerated Substitution Rates and Elevated GC Content

    PubMed Central

    Li, Fay-Wei; Kuo, Li-Yaung; Pryer, Kathleen M.; Rothfels, Carl J.

    2016-01-01

    Plant chloroplast genomes (plastomes) are characterized by an inverted repeat (IR) region and two larger single copy (SC) regions. Patterns of molecular evolution in the IR and SC regions differ, most notably by a reduced rate of nucleotide substitution in the IR compared to the SC region. In addition, the organization and structure of plastomes is fluid, and rearrangements through time have repeatedly shuffled genes into and out of the IR, providing recurrent natural experiments on how chloroplast genome structure can impact rates and patterns of molecular evolution. Here we examine four loci (psbA, ycf2, rps7, and rps12 exon 2–3) that were translocated from the SC into the IR during fern evolution. We use a model-based method, within a phylogenetic context, to test for substitution rate shifts. All four loci show a significant, 2- to 3-fold deceleration in their substitution rate following translocation into the IR, a phenomenon not observed in any other, nontranslocated plastid genes. Also, we show that after translocation, the GC content of the third codon position and of the noncoding regions is significantly increased, implying that gene conversion within the IR is GC-biased. Taken together, our results suggest that the IR region not only reduces substitution rates, but also impacts nucleotide composition. This finding highlights a potential vulnerability of correlating substitution rate heterogeneity with organismal life history traits without knowledge of the underlying genome structure. PMID:27401175

  10. On fuzzy semantic similarity measure for DNA coding.

    PubMed

    Ahmad, Muneer; Jung, Low Tang; Bhuiyan, Md Al-Amin

    2016-02-01

    A coding measure scheme numerically translates the DNA sequence to a time domain signal for protein coding regions identification. A number of coding measure schemes based on numerology, geometry, fixed mapping, statistical characteristics and chemical attributes of nucleotides have been proposed in recent decades. Such coding measure schemes lack the biologically meaningful aspects of nucleotide data and hence do not significantly discriminate coding regions from non-coding regions. This paper presents a novel fuzzy semantic similarity measure (FSSM) coding scheme centering on FSSM codons' clustering and genetic code context of nucleotides. Certain natural characteristics of nucleotides i.e. appearance as a unique combination of triplets, preserving special structure and occurrence, and ability to own and share density distributions in codons have been exploited in FSSM. The nucleotides' fuzzy behaviors, semantic similarities and defuzzification based on the center of gravity of nucleotides revealed a strong correlation between nucleotides in codons. The proposed FSSM coding scheme attains a significant enhancement in coding regions identification i.e. 36-133% as compared to other existing coding measure schemes tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms. PMID:26773936

  11. Synthesis of Amplified DNA That Codes for Ribosomal RNA

    PubMed Central

    Crippa, Marco; Tocchini-Valentini, Glauco P.

    1971-01-01

    During the amplification stage in ovaries, the complete repetitive unit of the DNA that codes for ribosomal RNA in Xenopus appears to be transcribed. This large RNA transcript is found in a complex with DNA. Substitution experiments with 5-bromodeoxyuridine do not show any evidence that a complete amplified cistron is used as a template for further amplification. A derivative of rifampicin, 2′,5′-dimethyl-N(4′)benzyl-N(4′)[desmethyl] rifampicin, preferentially inhibits the DNA synthesis responsible for ribosomal gene amplification. These results are consistent with the hypothesis that RNA-dependent DNA synthesis is involved in gene amplification. PMID:5288254

  12. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
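
    The abstract does not give the algorithm itself. As background only, correlation properties of this kind are commonly studied with a "DNA walk", in which each base contributes a step up or down and the root-mean-square fluctuation of the walk is examined over windows of length l. The sketch below assumes a purine/pyrimidine step rule (pyrimidine +1, purine -1); the rule, the function names and the toy sequence are illustrative assumptions, not the published method.

```python
import math
from typing import List


def dna_walk(seq: str) -> List[int]:
    """Cumulative walk: +1 for a pyrimidine (C, T), -1 for a purine (A, G)."""
    walk, pos = [0], 0
    for base in seq.upper():
        if base in "CT":
            pos += 1
        elif base in "AG":
            pos -= 1
        else:
            continue  # assumption: ambiguous symbols are skipped
        walk.append(pos)
    return walk


def rms_fluctuation(walk: List[int], l: int) -> float:
    """Root-mean-square fluctuation of walk displacements over distance l."""
    deltas = [walk[i + l] - walk[i] for i in range(len(walk) - l)]
    if not deltas:
        raise ValueError("window length exceeds walk length")
    mean = sum(deltas) / len(deltas)
    return math.sqrt(sum((d - mean) ** 2 for d in deltas) / len(deltas))


toy_walk = dna_walk("CACACACA")
print(toy_walk, rms_fluctuation(toy_walk, 1), rms_fluctuation(toy_walk, 2))
```

    In practice the fluctuation is computed for many values of l and its growth with l is compared between sequence regions; how that comparison is turned into coding-region predictions is not stated in the record.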

  13. Parallelizing a DNA simulation code for the Cray MTA-2.

    PubMed

    Bokhari, Shahid H; Glaser, Matthew A; Jordan, Harry F; Lansac, Yves; Sauer, Jon R; Van Zeghbroeck, Bart

    2002-01-01

    The Cray MTA-2 (Multithreaded Architecture) is an unusual parallel supercomputer that promises ease of use and high performance. We describe our experience on the MTA-2 with a molecular dynamics code, SIMU-MD, that we are using to simulate the translocation of DNA through a nanopore in a silicon based ultrafast sequencer. Our sequencer is constructed using standard VLSI technology and consists of a nanopore surrounded by Field Effect Transistors (FETs). We propose to use the FETs to sense variations in charge as a DNA molecule translocates through the pore and thus differentiate between the four building block nucleotides of DNA. We were able to port SIMU-MD, a serial C code, to the MTA with only a modest effort and with good performance. Our porting process needed neither a parallelism support platform nor attention to the intimate details of parallel programming and interprocessor communication, as would have been the case with more conventional supercomputers. PMID:15838145

  14. DNA as a Binary Code: How the Physical Structure of Nucleotide Bases Carries Information

    ERIC Educational Resources Information Center

    McCallister, Gary

    2005-01-01

    The DNA triplet code also functions as a binary code. Because double-ring compounds cannot bind to double-ring compounds in the DNA code, the sequence of bases classified simply as purines or pyrimidines can encode for smaller groups of possible amino acids. This is an intuitive approach to teaching the DNA code. (Contains 6 figures.)
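
    The purine/pyrimidine reading of a sequence can be shown in a few lines. In this sketch the double-ring purines (A, G) are written as 1 and the single-ring pyrimidines (C, T) as 0; the choice of which class maps to which digit, and the function name to_binary, are arbitrary assumptions.

```python
PURINES = frozenset("AG")      # double-ring bases
PYRIMIDINES = frozenset("CT")  # single-ring bases


def to_binary(seq: str) -> str:
    """Rewrite a DNA sequence as a purine (1) / pyrimidine (0) bit string."""
    bits = []
    for base in seq.upper():
        if base in PURINES:
            bits.append("1")
        elif base in PYRIMIDINES:
            bits.append("0")
        else:
            raise ValueError(f"unexpected symbol: {base!r}")
    return "".join(bits)


# Each codon collapses to one of 8 three-bit classes, e.g. ATG and GCC:
print(to_binary("ATG"), to_binary("GCC"))
```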

  15. Structural Code for DNA Recognition Revealed in Crystal Structures of Papillomavirus E2-DNA Targets

    NASA Astrophysics Data System (ADS)

    Rozenberg, Haim; Rabinovich, Dov; Frolow, Felix; Hegde, Rashmi S.; Shakked, Zippora

    1998-12-01

    Transcriptional regulation in papillomaviruses depends on sequence-specific binding of the regulatory protein E2 to several sites in the viral genome. Crystal structures of bovine papillomavirus E2 DNA targets reveal a conformational variant of B-DNA characterized by a roll-induced writhe and helical repeat of 10.5 bp per turn. A comparison between the free and the protein-bound DNA demonstrates that the intrinsic structure of the DNA regions contacted directly by the protein and the deformability of the DNA region that is not contacted by the protein are critical for sequence-specific protein/DNA recognition and hence for gene-regulatory signals in the viral system. We show that the selection of dinucleotide or longer segments with appropriate conformational characteristics, when positioned at correct intervals along the DNA helix, can constitute a structural code for DNA recognition by regulatory proteins. This structural code facilitates the formation of a complementary protein-DNA interface that can be further specified by hydrogen bonds and nonpolar interactions between the protein amino acids and the DNA bases.

  16. Non-coding RNAs in DNA damage response

    PubMed Central

    Liu, Yunhua; Lu, Xiongbin

    2012-01-01

    Genome-wide studies have revealed that human and other mammalian genomes are pervasively transcribed and produce thousands of regulatory non-protein-coding RNAs (ncRNAs), including miRNAs, siRNAs, piRNAs and long non-coding RNAs (lncRNAs). Emerging evidence suggests that these ncRNAs also play a pivotal role in genome integrity and stability via the regulation of DNA damage response (DDR). In this review, we discuss recent findings on the interplay of ncRNAs with the canonical DDR signaling pathway, with a particular emphasis on miRNAs and lncRNAs. While the expression of ncRNAs is regulated in the DDR, the DDR is also subjected to regulation by those DNA damage-responsive ncRNAs. In addition, the roles of those Dicer- and Drosha-dependent small RNAs produced in the vicinity of double-strand break sites are also described. PMID:23226613

  17. Extra-coding RNAs regulate neuronal DNA methylation dynamics

    PubMed Central

    Savell, Katherine E.; Gallus, Nancy V. N.; Simon, Rhiana C.; Brown, Jordan A.; Revanna, Jasmin S.; Osborn, Mary Katherine; Song, Esther Y.; O'Malley, John J.; Stackhouse, Christian T.; Norvil, Allison; Gowher, Humaira; Sweatt, J. David; Day, Jeremy J.

    2016-01-01

    Epigenetic mechanisms such as DNA methylation are essential regulators of the function and information storage capacity of neurons. DNA methylation is highly dynamic in the developing and adult brain, and is actively regulated by neuronal activity and behavioural experiences. However, it is presently unclear how methylation status at individual genes is targeted for modification. Here, we report that extra-coding RNAs (ecRNAs) interact with DNA methyltransferases and regulate neuronal DNA methylation. Expression of ecRNA species is associated with gene promoter hypomethylation, is altered by neuronal activity, and is overrepresented at genes involved in neuronal function. Knockdown of the Fos ecRNA locus results in gene hypermethylation and mRNA silencing, and hippocampal expression of Fos ecRNA is required for long-term fear memory formation in rats. These results suggest that ecRNAs are fundamental regulators of DNA methylation patterns in neuronal systems, and reveal a promising avenue for therapeutic targeting in neuropsychiatric disease states. PMID:27384705

  18. Extra-coding RNAs regulate neuronal DNA methylation dynamics.

    PubMed

    Savell, Katherine E; Gallus, Nancy V N; Simon, Rhiana C; Brown, Jordan A; Revanna, Jasmin S; Osborn, Mary Katherine; Song, Esther Y; O'Malley, John J; Stackhouse, Christian T; Norvil, Allison; Gowher, Humaira; Sweatt, J David; Day, Jeremy J

    2016-01-01

    Epigenetic mechanisms such as DNA methylation are essential regulators of the function and information storage capacity of neurons. DNA methylation is highly dynamic in the developing and adult brain, and is actively regulated by neuronal activity and behavioural experiences. However, it is presently unclear how methylation status at individual genes is targeted for modification. Here, we report that extra-coding RNAs (ecRNAs) interact with DNA methyltransferases and regulate neuronal DNA methylation. Expression of ecRNA species is associated with gene promoter hypomethylation, is altered by neuronal activity, and is overrepresented at genes involved in neuronal function. Knockdown of the Fos ecRNA locus results in gene hypermethylation and mRNA silencing, and hippocampal expression of Fos ecRNA is required for long-term fear memory formation in rats. These results suggest that ecRNAs are fundamental regulators of DNA methylation patterns in neuronal systems, and reveal a promising avenue for therapeutic targeting in neuropsychiatric disease states. PMID:27384705

  19. DNA information: from digital code to analogue structure.

    PubMed

    Travers, A A; Muskhelishvili, G; Thompson, J M T

    2012-06-28

    The digital linear coding carried by the base pairs in the DNA double helix is now known to have an important component that acts by altering, along its length, the natural shape and stiffness of the molecule. In this way, one region of DNA is structurally distinguished from another, constituting an additional form of encoded information manifest in three-dimensional space. These shape and stiffness variations help in guiding and facilitating the DNA during its three-dimensional spatial interactions. Such interactions with itself allow communication between genes and enhanced wrapping and histone-octamer binding within the nucleosome core particle. Meanwhile, interactions with proteins can have a reduced entropic binding penalty owing to advantageous sequence-dependent bending anisotropy. Sequence periodicity within the DNA, giving a corresponding structural periodicity of shape and stiffness, also influences the supercoiling of the molecule, which, in turn, plays an important facilitating role. In effect, the super-helical density acts as an analogue regulatory mode in contrast to the more commonly acknowledged purely digital mode. Many of these ideas are still poorly understood, and represent a fundamental and outstanding biological question. This review gives an overview of very recent developments, and hopefully identifies promising future lines of enquiry. PMID:22615471

  20. Integrative RNA-seq and microarray data analysis reveals GC content and gene length biases in the psoriasis transcriptome

    PubMed Central

    Xing, Xianying; Voorhees, John J.; Elder, James T.; Johnston, Andrew; Gudjonsson, Johann E.

    2014-01-01

    Gene expression profiling of psoriasis has driven research advances and may soon provide the basis for clinical applications. For expression profiling studies, RNA-seq is now a competitive technology, but RNA-seq results may differ from those obtained by microarray. We therefore compared findings obtained by RNA-seq with those from eight microarray studies of psoriasis. RNA-seq and microarray datasets identified similar numbers of differentially expressed genes (DEGs), with certain genes uniquely identified by each technology. Correspondence between platforms and the balance of increased to decreased DEGs was influenced by mRNA abundance, GC content, and gene length. Weakly expressed genes, genes with low GC content, and long genes were all biased toward decreased expression in psoriasis lesions. The strength of these trends differed among array datasets, most likely due to variations in RNA quality. Gene length bias was by far the strongest trend and was evident in all datasets regardless of the expression profiling technology. The effect was due to differences between lesional and uninvolved skin with respect to the genome-wide correlation between gene length and gene expression, which was consistently more negative in psoriasis lesions. These findings demonstrate the complementary nature of RNA-seq and microarray technology and show that integrative analysis of both data types can provide a richer view of the transcriptome than strict reliance on a single method alone. Our results also highlight factors affecting correspondence between technologies, and we have established that gene length is a major determinant of differential expression in psoriasis lesions. PMID:24844236

  1. Insights into corn genes derived from large-scale cDNA sequencing.

    PubMed

    Alexandrov, Nickolai N; Brover, Vyacheslav V; Freidin, Stanislav; Troukhan, Maxim E; Tatarinova, Tatiana V; Zhang, Hongyu; Swaller, Timothy J; Lu, Yu-Ping; Bouck, John; Flavell, Richard B; Feldmann, Kenneth A

    2009-01-01

    We present a large portion of the transcriptome of Zea mays, including ESTs representing 484,032 cDNA clones from 53 libraries and 36,565 fully sequenced cDNA clones, out of which 31,552 clones are non-redundant. These and other previously sequenced transcripts have been aligned with available genome sequences and have provided new insights into the characteristics of gene structures and promoters within this major crop species. We found that although the average number of introns per gene is about the same in corn and Arabidopsis, corn genes have more alternatively spliced isoforms. Examination of the nucleotide composition of coding regions reveals that corn genes, as well as genes of other Poaceae (Grass family), can be divided into two classes according to the GC content at the third position in the amino acid encoding codons. Many of the transcripts that have lower GC content at the third position have dicot homologs but the high GC content transcripts tend to be more specific to the grasses. The high GC content class is also enriched with intronless genes. Together this suggests that an identifiable class of genes in plants is associated with the Poaceae divergence. Furthermore, because many of these genes appear to be derived from ancestral genes that do not contain introns, this evolutionary divergence may be the result of horizontal gene transfer from species not only with different codon usage but possibly that did not have introns, perhaps outside of the plant kingdom. By comparing the cDNAs described herein with the non-redundant set of corn mRNAs in GenBank, we estimate that there are about 50,000 different protein coding genes in Zea. All of the sequence data from this study have been submitted to DDBJ/GenBank/EMBL under accession numbers EU940701-EU977132 (FLI cDNA) and FK944382-FL482108 (EST). PMID:18937034

  2. Coding DNA repeated throughout intergenic regions of the Arabidopsis thaliana genome: Evolutionary footprints of RNA silencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Pyknons are non-random sequence patterns significantly repeated throughout non-coding genomic DNA that also appear at least once among genes. They are interesting because they portend an unforeseen connection between coding and non-coding DNA. Pyknons have only been discovered in the human genome,...

  3. Multifractal detrended cross-correlation analysis of coding and non-coding DNA sequences through chaos-game representation

    NASA Astrophysics Data System (ADS)

    Pal, Mayukha; Satish, B.; Srinivas, K.; Rao, P. Madhusudana; Manimaran, P.

    2015-10-01

    We propose a new approach combining the chaos game representation and the two dimensional multifractal detrended cross correlation analysis methods to examine multifractal behavior in power law cross correlation between any pair of nucleotide sequences of unequal lengths. In this work, we analyzed the characteristic behavior of coding and non-coding DNA sequences of eight prokaryotes. The results show the presence of strong multifractal nature between coding and non-coding sequences of all data sets. We found that this integrative approach helps us to consider complete DNA sequences for characterization, and further it may be useful for classification, clustering, identification of class affiliation of nucleotide sequences etc. with high precision.
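
    The chaos game representation step can be sketched as follows. This is only an illustration under stated assumptions: the corner assignment (A, C, G, T on the unit square) is one common convention and may differ from the one the authors used, and the multifractal cross-correlation analysis itself is not reproduced.

```python
# Assumed corner convention for the unit square; other layouts are in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game(seq: str) -> list[tuple[float, float]]:
    """Map a nucleotide sequence to points in the unit square.

    Starting from the centre, each base moves the current point halfway
    toward that base's corner. Characters other than A, C, G, T are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    print(chaos_game("AG"))
```

    Because each sequence becomes a two-dimensional point set regardless of its length, sequences of unequal lengths can be binned onto the same grid before comparison.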

  4. Differences in codon bias and GC content contribute to the balanced expression of TLR7 and TLR9.

    PubMed

    Newman, Zachary R; Young, Janet M; Ingolia, Nicholas T; Barton, Gregory M

    2016-03-01

    The innate immune system detects diverse microbial species with a limited repertoire of immune receptors that recognize nucleic acids. The cost of this immune surveillance strategy is the potential for inappropriate recognition of self-derived nucleic acids and subsequent autoimmune disease. The relative expression of two closely related receptors, Toll-like receptor (TLR) 7 and TLR9, is balanced to allow recognition of microbial nucleic acids while limiting recognition of self-derived nucleic acids. Situations that tilt this balance toward TLR7 promote inappropriate responses, including autoimmunity; therefore, tight control of expression is critical for proper homeostasis. Here we report that differences in codon bias limit TLR7 expression relative to TLR9. Codon optimization of Tlr7 increases protein levels as well as responses to ligands, but, unexpectedly, these changes only modestly affect translation. Instead, we find that much of the benefit attributed to codon optimization is actually the result of enhanced transcription. Our findings, together with other recent examples, challenge the dogma that codon optimization primarily increases translation. We propose that suboptimal codon bias, which correlates with low guanine-cytosine (GC) content, limits transcription of certain genes. This mechanism may establish low levels of proteins whose overexpression leads to particularly deleterious effects, such as TLR7. PMID:26903634

  5. The most frequent short sequences in non-coding DNA.

    PubMed

    Subirana, Juan A; Messeguer, Xavier

    2010-03-01

    The purpose of this work is to determine the most frequent short sequences in non-coding DNA. They may play a role in maintaining the structure and function of eukaryotic chromosomes. We present a simple method for the detection and analysis of such sequences in several genomes, including Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. We also study two chromosomes of man and mouse with a length similar to the whole genomes of the other species. We provide a list of the most common sequences of 9-14 bases in each genome. As expected, they are present in human Alu sequences. Our programs may also give a graph and a list of their position in the genome. Detection of clusters is also possible. In most cases, these sequences contain few alternating regions. Their intrinsic structure and their influence on nucleosome formation are not known. In particular, we have found new features of short sequences in C. elegans, which are distributed in heterogeneous clusters. They appear as punctuation marks in the chromosomes. Such clusters are not found in either A. thaliana or D. melanogaster. We discuss the possibility that they play a role in centromere function and homolog recognition in meiosis. PMID:19966278
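
    Finding the most frequent short sequences of a fixed length amounts to counting overlapping k-mers. The sketch below is a simple in-memory illustration, not the authors' program; the function name and the decision to skip windows containing characters other than A, C, G, T are assumptions.

```python
from collections import Counter


def most_common_kmers(seq: str, k: int, n: int) -> list[tuple[str, int]]:
    """Return the n most frequent overlapping k-mers in seq with their counts."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            counts[kmer] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    print(most_common_kmers("ATATAT", 2, 1))
```

    For whole genomes and k between 9 and 14, as in the abstract, a dictionary of counts can become large, so a real analysis would likely need a more memory-conscious approach.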

  6. Improved Lower Bounds of DNA Tags Based on a Modified Genetic Algorithm

    PubMed Central

    Wang, Bin; Wei, Xiaopeng; Dong, Jing; Zhang, Qiang

    2015-01-01

    The well-known massively parallel sequencing method is efficient and it can obtain sequence data from multiple individual samples. In order to ensure that sequencing, replication, and oligonucleotide synthesis errors do not result in tags (or barcodes) that are unrecoverable or confused, the tag sequences should be abundant and sufficiently different. Recently, many design methods have been proposed for correcting errors in data using error-correcting codes. The existing tag sets contain small tag sequences, so we used a modified genetic algorithm to improve the lower bound of the tag sets in this study. Compared with previous research, our algorithm is effective for designing sets of DNA tags. Moreover, the GC content determined by existing methods includes an imprecise range. Thus, we improved the GC content determination method to obtain tag sets that control the GC content in a more precise range. Finally, previous studies have only considered perfect self-complementarity. Thus, we considered the crossover between different tags and introduced an improved constraint into the design of tag sets. PMID:25693135
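
    The two constraints named above, a controlled GC content range and sufficient difference between tags, can be illustrated with a small greedy search. This is deliberately not the modified genetic algorithm of the study: it is a plain lexicographic greedy filter, it uses Hamming distance as the difference measure (an assumption), and it ignores the self-complementarity constraint.

```python
from itertools import product


def gc_fraction(tag: str) -> float:
    """Fraction of G or C bases in a tag."""
    return sum(base in "GC" for base in tag) / len(tag)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_tags(length: int, min_dist: int, gc_low: float, gc_high: float) -> list[str]:
    """Greedily collect tags that satisfy the GC range and pairwise distance."""
    chosen = []
    for bases in product("ACGT", repeat=length):
        tag = "".join(bases)
        if not gc_low <= gc_fraction(tag) <= gc_high:
            continue
        if all(hamming(tag, other) >= min_dist for other in chosen):
            chosen.append(tag)
    return chosen


if __name__ == "__main__":
    tags = greedy_tags(length=4, min_dist=3, gc_low=0.5, gc_high=0.5)
    print(len(tags), tags)
```

    The size of the set returned by any such construction is a lower bound on the maximum possible tag set for those parameters, which is the sense in which a better search procedure can improve the bound.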

  7. What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?

    NASA Astrophysics Data System (ADS)

    Liebovitch, Larry

    1998-03-01

    The longest term correlations in living systems are the information stored in DNA which reflects the evolutionary history of an organism. The 4 bases (A,T,G,C) encode sequences of amino acids as well as locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error check mechanisms. When a single strand of DNA is replicated the complementary base is inserted in the new strand. Sometimes the wrong base is inserted that sticks out disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes, that slide along the DNA, can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols and so the information is encoded in a DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error checking code, where some digits are functions of other digits to maintain the fidelity of transmitted information. Does DNA also utilize a DIGITAL error checking code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises the interesting mathematical problem: How does one determine whether some symbols in a sequence of symbols are a function of other symbols. It also bears on the issue of determining algorithmic complexity: What is the function that generates the shortest algorithm for reproducing the symbol sequence. The error checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test if some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C. We did not find

  8. Stochastic model of homogeneous coding and latent periodicity in DNA sequences.

    PubMed

    Chaley, Maria; Kutyrkin, Vladimir

    2016-02-01

    The concept of latent triplet periodicity in coding DNA sequences, which has been extensively discussed earlier, is confirmed by analysis of a number of eukaryotic genomes, in which latent periodicity of a new type, called profile periodicity, is recognized in the CDSs. An original model of Stochastic Homogeneous Organization of Coding (SHOC-model) in a textual string is proposed. This model explains the existence of latent profile periodicity and regularity in DNA sequences. PMID:26656186

  9. ZINBA integrates local covariates with DNA-seq data to identify broad and narrow regions of enrichment, even within amplified genomic regions

    PubMed Central

    2011-01-01

    ZINBA (Zero-Inflated Negative Binomial Algorithm) identifies genomic regions enriched in a variety of ChIP-seq and related next-generation sequencing experiments (DNA-seq), calling both broad and narrow modes of enrichment across a range of signal-to-noise ratios. ZINBA models and accounts for factors that co-vary with background or experimental signal, such as G/C content, and identifies enrichment in genomes with complex local copy number variations. ZINBA provides a single unified framework for analyzing DNA-seq experiments in challenging genomic contexts. Software website: http://code.google.com/p/zinba/ PMID:21787385

  10. Sequences encoding identical peptides for the analysis and manipulation of coding DNA

    PubMed Central

    Sánchez, Joaquín

    2013-01-01

    The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; the study of such sequences could directly reveal properties of coding DNA that are independent of peptide sequences. For practical purposes SEIP might also be manipulated for e.g. heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out if they could be used to create new coding sequences. We hence attempted replacement of human by E. coli codons via dicodon exchange but found that full replacement was not possible; this indicated robust species-specific dicodon tendencies. To test another form of codon replacement we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and we then re-constructed the GFP coding DNA with human tetra-peptide-coding sequences. Results provide proof-of-principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct in pieces a protein coding DNA with sequences from a different organism; the latter might be exploited in heterologous protein expression. PMID:23861567

  11. Sequences encoding identical peptides for the analysis and manipulation of coding DNA.

    PubMed

    Sánchez, Joaquín

    2013-01-01

    The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; the study of such sequences could directly reveal properties of coding DNA that are independent of peptide sequences. For practical purposes SEIP might also be manipulated for e.g. heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out if they could be used to create new coding sequences. We hence attempted replacement of human by E. coli codons via dicodon exchange but found that full replacement was not possible; this indicated robust species-specific dicodon tendencies. To test another form of codon replacement we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and we then re-constructed the GFP coding DNA with human tetra-peptide-coding sequences. Results provide proof-of-principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct in pieces a protein coding DNA with sequences from a different organism; the latter might be exploited in heterologous protein expression. PMID:23861567

  12. Synonymous codon bias and functional constraint on GC3-related DNA backbone dynamics in the prokaryotic nucleoid

    PubMed Central

    Babbitt, Gregory A.; Alawad, Mohammed A.; Schulze, Katharina V.; Hudson, André O.

    2014-01-01

    While mRNA stability has been demonstrated to control rates of translation, generating both global and local synonymous codon biases in many unicellular organisms, this cannot adequately explain why codon bias strongly tracks neighboring intergene GC content, suggesting that structural dynamics of DNA might also influence codon choice. Because minor groove width is highly governed by 3-base periodicity in GC, the existence of triplet-based codons might imply a functional role for the optimization of local DNA molecular dynamics via GC content at synonymous sites (≈GC3). We confirm a strong association between GC3-related intrinsic DNA flexibility and codon bias across 24 different prokaryotic multiple whole-genome alignments. We develop a novel test of natural selection targeting synonymous sites and demonstrate that GC3-related DNA backbone dynamics have been subject to moderate selective pressure, perhaps contributing to our observation that many genes possess extreme DNA backbone dynamics for their given protein space. This dual function of codons may impose universal functional constraints affecting the evolution of synonymous and non-synonymous sites. We propose that synonymous sites may have evolved as an ‘accessory’ during an early expansion of a primordial genetic code, allowing for multiplexed protein coding and structural dynamic information within the same molecular context. PMID:25200075

  13. An improved Huffman coding method for archiving text, images, and music characters in DNA.

    PubMed

    Ailenberg, Menachem; Rotstein, Ori

    2009-09-01

    An improved Huffman coding method for information storage in DNA is described. The method entails the utilization of modified unambiguous base assignment that enables efficient coding of characters. A plasmid-based library with efficient and reliable information retrieval and assembly with uniquely designed primers is described. We illustrate our approach by synthesis of DNA that encodes text, images, and music, which could easily be retrieved by DNA sequencing using the specific primers. The method is simple and lends itself to automated information retrieval. PMID:19852760

  14. Heterogeneous base distribution in mitochondrial DNA of Neurospora crassa.

    PubMed Central

    Terpstra, P; Holtrop, M; Kroon, A

    1977-01-01

    The mitochondrial DNA of Neurospora crassa has a heterogeneous intramolecular base distribution. A contiguous piece, representing at least 30% of the total genome, has a G+C content that is 6% lower than the overall G+C content of the DNA. The genes for both ribosomal RNAs are contained in the remaining, relatively G+C rich, part of the genome. PMID:141040
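
    A heterogeneous intramolecular base distribution of this kind is often visualized today by computing G+C content in windows along a sequence. The sketch below shows that idea only; the 1977 study predates routine sequencing and did not use this computation, and the window and step sizes are free parameters chosen by the analyst.

```python
def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start, G+C fraction) for each full window along seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(base in "GC" for base in chunk) / window
        result.append((start, gc))
    return result


if __name__ == "__main__":
    print(windowed_gc("GGGGAAAA", window=4, step=4))
```

    Comparing each window's value with the overall G+C fraction would flag contiguous stretches that are relatively G+C poor or G+C rich.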

  15. Is there an error correcting code in the base sequence in DNA?

    PubMed Central

    Liebovitch, L S; Tao, Y; Todorov, A T; Levine, L

    1996-01-01

    Modern methods of encoding information into digital form include error check digits that are functions of the other information digits. When digital information is transmitted, the values of the error check digits can be computed from the information digits to determine whether the information has been received accurately. These error correcting codes make it possible to detect and correct common errors in transmission. The sequence of bases in DNA is also a digital code consisting of four symbols: A, C, G, and T. Does DNA also contain an error correcting code? Such a code would allow repair enzymes to protect the fidelity of nonreplicating DNA and increase the accuracy of replication. If a linear block error correcting code is present in DNA then some bases would be a linear function of the other bases in each set of bases. We developed an efficient procedure to determine whether such an error correcting code is present in the base sequence. We illustrate the use of this procedure by using it to analyze the lac operon and the gene for cytochrome c. These genes do not appear to contain such a simple error correcting code. PMID:8874027
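
    To make the idea of a check base that is a linear function of the other bases concrete, here is a small sketch that tests one specific, hypothetical code against a sequence. It is not the authors' search procedure, which looks for any such code; the numeric mapping of bases, the block layout (check base last), and the coefficients are all assumptions chosen for illustration.

```python
# Assumed numeric coding of the four bases for arithmetic modulo 4.
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}


def satisfies_check(seq: str, block_len: int, coeffs: tuple[int, ...]) -> bool:
    """Test whether the last base of every complete block equals a fixed
    linear combination (mod 4) of the preceding bases in that block.

    A sequence with no complete block passes vacuously.
    """
    if len(coeffs) != block_len - 1:
        raise ValueError("need one coefficient per information base")
    values = [BASE_VALUE[base] for base in seq.upper()]
    for start in range(0, len(values) - block_len + 1, block_len):
        block = values[start:start + block_len]
        # zip stops after the information bases, so the check base is excluded
        predicted = sum(c * v for c, v in zip(coeffs, block)) % 4
        if predicted != block[-1]:
            return False
    return True


if __name__ == "__main__":
    # Blocks ACC and GGA: 0+1 = 1 (C) and 2+2 = 4 mod 4 = 0 (A).
    print(satisfies_check("ACCGGA", block_len=3, coeffs=(1, 1)))
```

    A single failing block is enough to rule out that particular code for the sequence, which is why an exhaustive or algebraic search over possible codes is needed to reach a general conclusion.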

  16. Palindromic repetitive DNA elements with coding potential in Methanocaldococcus jannaschii.

    PubMed

    Suyama, Mikita; Lathe, Warren C; Bork, Peer

    2005-10-10

    We have identified 141 novel palindromic repetitive elements in the genome of euryarchaeon Methanocaldococcus jannaschii. The total length of these elements is 14.3kb, which corresponds to 0.9% of the total genomic sequence and 6.3% of all extragenic regions. The elements can be divided into three groups (MJRE1-3) based on the sequence similarity. The low sequence identity within each of the groups suggests rather old origin of these elements in M. jannaschii. Three MJRE2 elements were located within the protein coding regions without disrupting the coding potential of the host genes, indicating that insertion of repeats might be a widespread mechanism to enhance sequence diversity in coding regions. PMID:16182294

  17. TOWARDS A PROBABILISTIC RECOGNITION CODE FOR PROTEIN-DNA INTERACTIONS

    SciTech Connect

    P. BENOS; ET AL

    2000-09-01

    We are investigating the rules that govern protein-DNA interactions, using a statistical mechanics based formalism that is related to the Boltzmann Machine of the neural net literature. Our approach is data-driven: probabilistic algorithms are used to model protein-DNA interactions, given SELEX and phage data as input. Under the "one-to-one" model for interactions (i.e. one amino acid contacts one base), we can successfully identify the wild-type binding sites of EGR and MIG protein families. The predictions using our method are as good as or better than those of methods existing in the literature; moreover, our methodology offers the potential to capitalize in quantitative detail on more data as it becomes available.

  18. Role of GC-biased mutation pressure on synonymous codon choice in Micrococcus luteus, a bacterium with a high genomic GC-content.

    PubMed Central

    Ohama, T; Muto, A; Osawa, S

    1990-01-01

    The GC (G + C, or G or C)-contents of codon silent positions in all two-codon sets and three codons AUY/A (Ile), and in most of the family boxes of Micrococcus luteus (genomic GC-content: 74%) are 95% to 100% in both the highly and weakly expressed genes. In some family boxes, there is a decrease in NNC codons and an increase in NNG codons from the highly expressed to weakly expressed genes without apparent involvement of NNU and NNA codons. From these observations, we conclude that the selective use of synonymous codons in M. luteus may be largely determined by GC-biased mutation pressure and that in the highly expressed genes tRNAs would act as a weak selection pressure in some family boxes. Available data suggest that the effect of selection pressure by tRNAs on the synonymous codon choice becomes more apparent in the highly expressed genes in eubacteria with intermediate GC-contents such as Escherichia coli and Bacillus subtilis, and that the U/C ratio of the codon third positions in NNU/C-type two-codon sets in the weakly expressed genes would represent the approximate magnitude of directional mutation pressure throughout eubacteria. PMID:2326195

  19. Study of E. coli Hfq’s RNA annealing acceleration and duplex destabilization activities using substrates with different GC-contents

    PubMed Central

    Doetsch, Martina; Stampfl, Sabine; Fürtig, Boris; Beich-Frandsen, Mads; Saxena, Krishna; Lybecker, Meghan; Schroeder, Renée

    2013-01-01

    Folding of RNA molecules into their functional three-dimensional structures is often supported by RNA chaperones, some of which can catalyse the two elementary reactions helix disruption and helix formation. Hfq is one such RNA chaperone, but its strand displacement activity is controversial. Whereas some groups found Hfq to destabilize secondary structures, others did not observe such an activity with their RNA substrates. We studied Hfq’s activities using a set of short RNAs of different thermodynamic stabilities (GC-contents from 4.8% to 61.9%), but constant length. We show that Hfq’s strand displacement as well as its annealing activity are strongly dependent on the substrate’s GC-content. However, this is due to Hfq’s preferred binding of AU-rich sequences and not to the substrate’s thermodynamic stability. Importantly, Hfq catalyses both annealing and strand displacement with comparable rates for different substrates, hinting at RNA strand diffusion and annealing nucleation being rate-limiting for both reactions. Hfq’s strand displacement activity is a result of the thermodynamic destabilization of the RNA through preferred single-strand binding whereas annealing acceleration is independent from Hfq’s thermodynamic influence. Therefore, the two apparently disparate activities annealing acceleration and duplex destabilization are not in energetic conflict with each other. PMID:23104381

  20. Functional characterization and inhibition of the type II DNA topoisomerase coded by African swine fever virus.

    PubMed

    Coelho, João; Ferreira, Fernando; Martins, Carlos; Leitão, Alexandre

    2016-06-01

    DNA topoisomerases are essential for DNA metabolism and while their role is well studied in prokaryotes and eukaryotes, it is less known for virally-encoded topoisomerases. African swine fever virus (ASFV) is a nucleo-cytoplasmic large DNA virus that infects Ornithodoros ticks and all members of the family Suidae, representing a global threat for pig husbandry with no effective vaccine nor treatment. It was recently demonstrated that ASFV codes for a type II topoisomerase, highlighting a possible target for control of the virus. In this work, the ASFV DNA topoisomerase II was expressed in Saccharomyces cerevisiae and found to efficiently decatenate kDNA and to processively relax supercoiled DNA. Optimal conditions for its activity were determined and its sensitivity to a panel of topoisomerase poisons and inhibitors was evaluated. Overall, our results provide new knowledge on viral topoisomerases and on ASFV, as well as a possible target for the control of this virus. PMID:27060564

  1. A novel Lie algebra of the genetic code over the Galois field of four DNA bases.

    PubMed

    Sánchez, Robersy; Grau, Ricardo; Morgado, Eberto

    2006-07-01

    Starting from the order of the four DNA bases in the Boolean lattice, a novel Lie Algebra of the genetic code is proposed. Here, the main partitions of the genetic code table were obtained as equivalence classes of quotient spaces of the genetic code vector space over the Galois field of the four DNA bases. The new algebraic structure shows strong connections among algebraic relationships, codon assignments and physicochemical properties of amino acids. Moreover, a distance defined between codons expresses a physicochemical meaning. It was also noticed that the distance between wild type and mutant codons tends to be small in mutational variants of four genes: human phenylalanine hydroxylase, human beta-globin, HIV-1 protease and HIV-1 reverse transcriptase. These results strongly suggest that deterministic rules in genetic code origin must be involved. PMID:16780898

  2. Peculiar symmetry of DNA sequences and evidence suggesting its evolutionary origin in a primeval genetic code

    NASA Astrophysics Data System (ADS)

    Jolivet, R.; Rothen, F.

    2001-08-01

    Statistical analysis of the distribution of codons in DNA coding sequences of bacteria or archaea suggests that, at some stage of the prebiotic world, the most successful RNA replicating sequences afforded some tendency toward a weak form of palindromic symmetry, namely complementary symmetry. As a consequence, as soon as the machinery allowing translation into proteins was beginning to settle, we assume that primeval versions of the genetic code essentially consisted of pairs of sense-antisense codons. Present-day DNA sequences display footprints of this early symmetry, provided that statistics are made over coding sequences issued from groups of organisms and not only from the genome of an individual species. These fossil traces are proven to be significant from the statistical point of view. They shed some light onto the possible evolution of the genetic code and set some constraints on the way it had to follow.

  3. Differential DNA methylation profiles of coding and non-coding genes define hippocampal sclerosis in human temporal lobe epilepsy

    PubMed Central

    Miller-Delaney, Suzanne F.C.; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C.; Bray, Isabella M.; Reynolds, James P.; Gwinn, Ryder; Stallings, Raymond L.

    2015-01-01

    Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among the differentially methylated. Finally, a panel of 13 methylation-sensitive microRNAs was identified in temporal lobe epilepsy, including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and the differential methylation of long non-coding RNA was documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain. PMID

  4. Non-Coding RNA: Sequence-Specific Guide for Chromatin Modification and DNA Damage Signaling

    PubMed Central

    Francia, Sofia

    2015-01-01

    Chromatin conformation shapes the environment in which our genome is transcribed into RNA. Transcription is a source of DNA damage, thus it often occurs concomitantly with DNA damage signaling. Growing amounts of evidence suggest that different types of RNAs can, independently from their protein-coding properties, directly affect chromatin conformation, transcription and splicing, as well as promote the activation of the DNA damage response (DDR) and DNA repair. Therefore, transcription paradoxically functions to both threaten and safeguard genome integrity. On the other hand, DNA damage signaling is known to modulate chromatin to suppress transcription of the surrounding genetic unit. It is thus intriguing to understand how transcription can modulate DDR signaling while, in turn, DDR signaling represses transcription of chromatin around the DNA lesion. An unexpected player in this field is the RNA interference (RNAi) machinery, which plays roles in transcription, splicing and chromatin modulation in several organisms. Non-coding RNAs (ncRNAs) and several protein factors involved in the RNAi pathway are well known master regulators of chromatin while only recent reports show their involvement in DDR. Here, we discuss the experimental evidence supporting the idea that ncRNAs act at the genomic loci from which they are transcribed to modulate chromatin, DDR signaling and DNA repair. PMID:26617633

  5. Diversity and Recombination of Dispersed Ribosomal DNA and Protein Coding Genes in Microsporidia

    PubMed Central

    Ironside, Joseph Edward

    2013-01-01

    Microsporidian strains are usually classified on the basis of their ribosomal DNA (rDNA) sequences. Although rDNA occurs as multiple copies, in most non-microsporidian species copies within a genome occur as tandem arrays and are homogenised by concerted evolution. In contrast, microsporidian rDNA units are dispersed throughout the genome in some species, and on this basis are predicted to undergo reduced concerted evolution. Furthermore many microsporidian species appear to be asexual and should therefore exhibit reduced genetic diversity due to a lack of recombination. Here, DNA sequences are compared between microsporidia with different life cycles in order to determine the effects of concerted evolution and sexual reproduction upon the diversity of rDNA and protein coding genes. Comparisons of cloned rDNA sequences between microsporidia of the genus Nosema with different life cycles provide evidence of intragenomic variability coupled with strong purifying selection. This suggests a birth and death process of evolution. However, some concerted evolution is suggested by clustering of rDNA sequences within species. Variability of protein-coding sequences indicates that considerable intergenomic variation also occurs between microsporidian cells within a single host. Patterns of variation in microsporidian DNA sequences indicate that additional diversity is generated by intragenomic and/or intergenomic recombination between sequence variants. The discovery of intragenomic variability coupled with strong purifying selection in microsporidian rRNA sequences supports the hypothesis that concerted evolution is reduced when copies of a gene are dispersed rather than repeated tandemly. The presence of intragenomic variability also renders the use of rDNA sequences for barcoding microsporidia questionable. Evidence of recombination in the single-copy genes of putatively asexual microsporidia suggests that these species may undergo cryptic sexual reproduction, a

  6. DNA methylation patterns of protein-coding genes and long non-coding RNAs in males with schizophrenia

    PubMed Central

    LIAO, QI; WANG, YUNLIANG; CHENG, JIA; DAI, DONGJUN; ZHOU, XINGYU; ZHANG, YUZHENG; LI, JINFENG; YIN, HONGLEI; GAO, SHUGUI; DUAN, SHIWEI

    2015-01-01

    Schizophrenia (SCZ) is one of the most complex mental illnesses affecting ~1% of the population worldwide. SCZ pathogenesis is considered to be a result of genetic as well as epigenetic alterations. Previous studies have aimed to identify the causative genes of SCZ. However, DNA methylation of long non-coding RNAs (lncRNAs) involved in SCZ has not been fully elucidated. In the present study, a comprehensive genome-wide analysis of DNA methylation was conducted using samples from two male patients with paranoid and undifferentiated SCZ, respectively. Methyl-CpG binding domain protein-enriched genome sequencing was used. In the two patients with paranoid and undifferentiated SCZ, 1,397 and 1,437 peaks were identified, respectively. Bioinformatic analysis demonstrated that peaks were enriched in protein-coding genes, which exhibited nervous system and brain functions. A number of these peaks in gene promoter regions may affect gene expression and, therefore, influence SCZ-associated pathways. Furthermore, 7 and 20 lncRNAs, respectively, in the Refseq database were hypermethylated. According to the lncRNA dataset in the NONCODE database, ~30% of intergenic peaks overlapped with novel lncRNA loci. The results of the present study demonstrated that aberrant hypermethylation of lncRNA genes may be an important epigenetic factor associated with SCZ. However, further studies using larger sample sizes are required. PMID:26503909

  7. A Molecular Bar-Coded DNA Repair Resource for Pooled Toxicogenomic Screens

    PubMed Central

    Rooney, John P.; Patil, Ashish; Zappala, Maria R.; Conklin, Douglas S.; Cunningham, Richard P.; Begley, Thomas J.

    2008-01-01

    DNA damage from exogenous and endogenous sources can promote mutations and cell death. Fortunately, cells contain DNA repair and damage signalling pathways to reduce the mutagenic and cytotoxic effects of DNA damage. The identification of specific DNA repair proteins and the coordination of DNA repair pathways after damage has been a central theme to the field of Genetic Toxicology and we have developed a tool for use in this area. We have produced 99 molecular bar-coded Escherichia coli gene-deletion mutants specific to DNA repair and damage signalling pathways, and each bar-coded mutant can be tracked in pooled format using bar-code specific microarrays. Our design adapted bar-codes developed for the Saccharomyces cerevisiae Gene Deletion Project, which allowed us to utilize an available microarray product for pooled gene-exposure studies. Microarray-based screens were used for en masse identification of individual mutants sensitive to methyl methanesulfonate (MMS). As expected, gene deletion mutants specific to direct, base excision, and recombinational DNA repair pathways were identified as MMS-sensitive in our pooled assay, thus validating our resource. We have demonstrated that molecular bar-codes designed for S. cerevisiae are transferable to E. coli, and that they can be used with pre-existing microarrays to perform competitive growth experiments. Further, when comparing microarray to traditional plate-based screens both overlapping and distinct results were obtained, which is a novel technical finding, with discrepancies between the two approaches explained by differences in output measurements (DNA content versus cell mass). The microarray-based classification of Δtag and ΔdinG cells as depleted after MMS exposure, contrary to plate-based methods, led to the discovery that Δtag and ΔdinG cells show a filamentation phenotype after MMS exposure, thus accounting for the discrepancy. A novel biological finding is the observation that while ΔdinG cells

  8. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Technical Reports Server (NTRS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
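
    The two tests named in this abstract can be illustrated with a minimal sketch. It assumes overlapping n-tuples, a plug-in entropy estimate, and a redundancy defined as 1 - H(n) / (n log2 4); the abstract does not give the authors' exact estimators, so the function names and these definitions are illustrative assumptions only.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count overlapping n-tuples ("words") in a nucleotide string."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_ranking(seq, n):
    """Return (word, count) pairs sorted from most to least frequent.

    Plotting log(count) against log(rank) is the usual Zipf-style view.
    """
    return ngram_counts(seq, n).most_common()


def ngram_entropy(seq, n):
    """Plug-in Shannon entropy (bits) of the n-tuple distribution."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_redundancy(seq, n, alphabet_size=4):
    """Assumed definition: 1 - H(n) / H_max, with H_max = n * log2(alphabet_size)."""
    return 1.0 - ngram_entropy(seq, n) / (n * math.log2(alphabet_size))


if __name__ == "__main__":
    toy = "ACGTACGTAAAAACGT"  # toy input, not real genomic data
    print(zipf_ranking(toy, 2)[:3])
    print(ngram_entropy(toy, 2), ngram_redundancy(toy, 2))
```

    Under this definition a sequence that uses all n-tuples equally often has redundancy 0, and a homopolymer has redundancy 1. Plug-in estimates are biased downward for short sequences, which is one reason results on small inputs should be read cautiously.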

  9. Differentiating the Protein Coding and Noncoding RNA Segments of DNA Using Shannon Entropy

    NASA Astrophysics Data System (ADS)

    Mazaheri, P.; Shirazi, A. H.; Saeedi, N.; Reza Jafari, G.; Sahimi, Muhammad

    The complexity of DNA sequences is evaluated in order to differentiate between protein-coding and noncoding RNA segments. The method is based on computing the Shannon entropy of the sequences. By comparing the entropy of the original sequence with that of its shuffled one, we identify the source of the difference between the two segments and their relative contributions to the sequence. To demonstrate the method, the DNA sequences of the bacterium Clostridium difficile 630 (G + C = 29.1%) and Bdellovibrio bacteriovorus (G + C = 50.6%) are analyzed, which are representatives of bacteria with unbalanced and balanced nucleotide content, respectively. It is shown that in both bacteria, regardless of nucleotide content, ΔrS — the relative difference of the two entropies — is significantly greater in protein-coding regions, when compared with noncoding RNA segments.
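
    A minimal sketch of the original-versus-shuffled comparison described above. It assumes the entropy is taken over overlapping k-letter words and that the relative difference is (S_shuffled - S_original) / S_shuffled; the abstract does not state the word length or the exact normalisation behind ΔrS, so `block_entropy`, `relative_entropy_difference` and the default k=3 are hypothetical choices.

```python
import math
import random
from collections import Counter


def block_entropy(seq, k):
    """Shannon entropy (bits) of the overlapping k-word distribution."""
    words = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(words.values())
    return -sum((c / total) * math.log2(c / total) for c in words.values())


def relative_entropy_difference(seq, k=3, seed=0):
    """Compare a sequence with a shuffled copy of itself.

    Shuffling keeps the nucleotide composition but destroys the order,
    so any difference comes from correlations between neighbouring bases.
    """
    letters = list(seq)
    random.Random(seed).shuffle(letters)  # seeded for reproducibility
    s_original = block_entropy(seq, k)
    s_shuffled = block_entropy("".join(letters), k)
    if s_shuffled == 0.0:  # e.g. a homopolymer: nothing to compare
        return 0.0
    return (s_shuffled - s_original) / s_shuffled


if __name__ == "__main__":
    periodic = "ACGT" * 50  # toy, strongly ordered sequence
    print(relative_entropy_difference(periodic, k=3))
```

    With k = 1 the difference is zero by construction, because shuffling does not change single-nucleotide frequencies; the comparison only becomes informative for words of two or more letters.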

  10. Non-coding chloroplast DNA for plant molecular systematics at the infrageneric level.

    PubMed

    Böhle, U R; Hilger, H; Cerff, R; Martin, W F

    1994-01-01

    With primers constructed against highly conserved regions of tRNA genes (trnT(UGU), trnL(UAA) and trnF(GAA)) in chloroplast DNA, we have amplified two different non-coding spacers and one intron from four species within the genus Echium L. (Boraginaceae) and from two confamilial outgroups. The trnT(UGU)-trnL(UAA) intergenic spacer contains a greater number of polymorphic sites than the trnL(UAA) intron or the trnL(UAA)-trnF(GAA) intergenic spacer. We analyzed a total of 11 kb of sequence data from this non-coding DNA. Total nucleotide divergence between Echium species is on the order of 1% for these regions, all of which possess infrageneric length polymorphisms. The latter two regions contain indels which occur only in the 14 Macaronesian Island endemic species of Echium studied and suggest that these may form a monophyletic group. PMID:7994117

  11. Junk DNA and the long non-coding RNA twist in cancer genetics

    PubMed Central

    Ling, Hui; Vincent, Kimberly; Pichler, Martin; Fodde, Riccardo; Berindan-Neagoe, Ioana; Slack, Frank J.; Calin, George A

    2015-01-01

    The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions, and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNA in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNA, often via interaction with proteins, functions in specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function, and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single nucleotide polymorphism (SNP) located within lncRNAs may be functionally associated with the individual’s susceptibility to cancer. PMID:25619839

  12. Junk DNA and the long non-coding RNA twist in cancer genetics.

    PubMed

    Ling, H; Vincent, K; Pichler, M; Fodde, R; Berindan-Neagoe, I; Slack, F J; Calin, G A

    2015-09-24

    The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNA in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNA, often via interaction with proteins, functions in specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single-nucleotide polymorphism (SNP) located within lncRNAs may be functionally associated with the individual's susceptibility to cancer. PMID:25619839

  13. A molecular code dictates sequence-specific DNA recognition by homeodomains.

    PubMed Central

    Damante, G; Pellizzari, L; Esposito, G; Fogolari, F; Viglino, P; Fabbro, D; Tell, G; Formisano, S; Di Lauro, R

    1996-01-01

    Most homeodomains bind to DNA sequences containing the motif 5'-TAAT-3'. The homeodomain of thyroid transcription factor 1 (TTF-1HD) binds to sequences containing a 5'-CAAG-3' core motif, delineating a new mechanism for differential DNA recognition by homeodomains. We investigated the molecular basis of the DNA binding specificity of TTF-1HD by both structural and functional approaches. As already suggested by the three-dimensional structure of TTF-1HD, the DNA binding specificities of the TTF-1, Antennapedia and Engrailed homeodomains, either wild-type or mutants, indicated that the amino acid residue in position 54 is involved in the recognition of the nucleotide at the 3' end of the core motif 5'-NAAN-3'. The nucleotide at the 5' position of this core sequence is recognized by the amino acids located in position 6, 7 and 8 of the TTF-1 and Antennapedia homeodomains. These data, together with previous suggestions on the role of amino acids in position 50, indicate that the DNA binding specificity of homeodomains can be determined by a combinatorial molecular code. We also show that some specific combinations of the key amino acid residues involved in DNA recognition do not follow a simple, additive rule. PMID:8890172

  14. HyDEn: a hybrid steganocryptographic approach for data encryption using randomized error-correcting DNA codes.

    PubMed

    Tulpan, Dan; Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge

    2013-01-01

    This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach. PMID:23984392
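
    The error-correcting idea behind DNA Hamming code words can be sketched as follows. The four-word code book below is a toy invented for illustration; it is not HyDEn's code set, and the private key and cyclic permutations mentioned in the abstract are not modelled here.

```python
from itertools import combinations

# Hypothetical toy code book: one DNA word per character.
# Every pair of words differs at all 5 positions (minimum distance 5).
CODEBOOK = {"a": "ACGTA", "b": "CATGC", "c": "GTACG", "d": "TGCAT"}


def hamming(x, y):
    """Number of positions at which two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have equal length")
    return sum(p != q for p, q in zip(x, y))


def min_distance(words):
    """Smallest pairwise Hamming distance in a set of code words."""
    return min(hamming(x, y) for x, y in combinations(list(words), 2))


def encode(text, codebook=CODEBOOK):
    """Concatenate the code word of each character."""
    return "".join(codebook[ch] for ch in text)


def decode(dna, codebook=CODEBOOK, word_len=5):
    """Nearest-code-word decoding, chunk by chunk.

    Corrects up to (d_min - 1) // 2 substitutions per word.
    """
    chunks = [dna[i:i + word_len] for i in range(0, len(dna), word_len)]
    return "".join(
        min(codebook, key=lambda ch: hamming(chunk, codebook[ch]))
        for chunk in chunks
    )


if __name__ == "__main__":
    message = encode("abcd")
    corrupted = "T" + message[1:]  # one substitution in the first word
    print(decode(corrupted))
```

    Because the toy code book has minimum distance 5, nearest-word decoding recovers a character after up to two substitutions in its word; insertions and deletions would need a different scheme.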

  15. HyDEn: A Hybrid Steganocryptographic Approach for Data Encryption Using Randomized Error-Correcting DNA Codes

    PubMed Central

    Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge

    2013-01-01

    This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach. PMID:23984392

  16. Estimation of correlations between copy-number variants in non-coding DNA.

    PubMed

    Stamoulis, Catherine

    2011-01-01

    Allelic DNA aberrations across our genome have been associated with normal human genetic heterogeneity as well as with a number of diseases and disorders. When copy-number variations (CNVs) occur in gene-coding regions, known relationships between genes may help us understand correlations between CNVs. However, a large number of these aberrations occur in non-coding, extragenic regions and their correlations may be characterized only quantitatively, e.g., probabilistically, but not functionally. Using a signal processing approach to CNV detection, we identified distributed CNVs in short, non-coding regions across chromosomes and investigated their potential correlations. We estimated predominantly local correlations between CNVs within the same chromosome, and a small number of apparently random long-distance correlations. PMID:22255599

  17. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.

  18. DNA strand breaks induced by electrons simulated with Nanodosimetry Monte Carlo Simulation Code: NASIC.

    PubMed

    Li, Junli; Li, Chunyan; Qiu, Rui; Yan, Congchong; Xie, Wenzhang; Wu, Zhen; Zeng, Zhi; Tung, Chuanjong

    2015-09-01

    The method of Monte Carlo simulation is a powerful tool to investigate the details of radiation biological damage at the molecular level. In this paper, a Monte Carlo code called NASIC (Nanodosimetry Monte Carlo Simulation Code) was developed. It includes a physical module, a pre-chemical module, a chemical module, a geometric module and a DNA damage module. The physical module can simulate physical tracks of low-energy electrons in liquid water event-by-event. More than one set of inelastic cross sections was calculated by applying the dielectric function method of Emfietzoglou's optical-data treatments, with different optical data sets and dispersion models. In the pre-chemical module, the ionised and excited water molecules undergo dissociation processes. In the chemical module, the produced radiolytic chemical species diffuse and react. In the geometric module, an atomic model of 46 chromatin fibres in a spherical nucleus of human lymphocyte was established. In the DNA damage module, the direct damages induced by the energy depositions of the electrons and the indirect damages induced by the radiolytic chemical species were calculated. The parameters should be adjusted so that the simulation results agree with the experimental results. In this paper, the influence of the inelastic cross sections and the vibrational excitation reaction on the parameters and on the DNA strand break yields was studied. Further work on NASIC is underway. PMID:25883312

  19. DNA bar coding and pyrosequencing to identify rare HIV drug resistance mutations.

    PubMed

    Hoffmann, Christian; Minkah, Nana; Leipzig, Jeremy; Wang, Gary; Arens, Max Q; Tebas, Pablo; Bushman, Frederic D

    2007-01-01

    Treatment of HIV-infected individuals with antiretroviral agents selects for drug-resistant mutants, resulting in frequent treatment failures. Although the major antiretroviral resistance mutations are routinely characterized by DNA sequencing, treatment failures are still common, probably in part because undetected rare resistance mutations facilitate viral escape. Here we combined DNA bar coding and massively parallel pyrosequencing to quantify rare drug resistance mutations. Using DNA bar coding, we were able to analyze seven viral populations in parallel, overall characterizing 118 093 sequence reads of average length 103 bp. Analysis of a control HIV mixture showed that resistance mutations present as 5% of the population could be readily detected without false positive calls. In three samples of multidrug-resistant HIV populations from patients, all the drug-resistant mutations called by conventional analysis were identified, as well as four additional low abundance drug resistance mutations, some of which would be expected to influence the response to antiretroviral therapy. Methods for sensitive characterization of HIV resistance alleles have been reported, but only the pyrosequencing method allows all the positions at risk for drug resistance mutations to be interrogated deeply for many HIV populations in a single experiment. PMID:17576693

  20. Comparative Sequence Analysis of the Non-Protein-Coding Mitochondrial DNA of Inbred Rat Strains

    PubMed Central

    Abhyankar, Avinash; Park, Hee-Bok; Tonolo, Giancarlo; Luthman, Holger

    2009-01-01

    The proper function of mammalian mitochondria necessitates a coordinated expression of both nuclear and mitochondrial genes, most likely due to the co-evolution of nuclear and mitochondrial genomes. The non-protein coding regions of mitochondrial DNA (mtDNA) including the D-loop, tRNA and rRNA genes form a major component of this regulated expression unit. Here we present comparative analyses of the non-protein-coding regions from 27 Rattus norvegicus mtDNA sequences. There were two variable positions in 12S rRNA, 20 in 16S rRNA, eight within the tRNA genes and 13 in the D-loop. Only one of the three neutrality tests used demonstrated statistically significant evidence for selection in 16S rRNA and tRNA-Cys. Based on our analyses of conserved sequences, we propose that some of the variable nucleotide positions identified in 16S rRNA and tRNA-Cys, and the D-loop might be important for mitochondrial function and its regulation. PMID:19997590

  1. DANIO-CODE: Toward an Encyclopedia of DNA Elements in Zebrafish

    PubMed Central

    2016-01-01

    The zebrafish has emerged as a model organism for genomics studies. The symposium “Toward an encyclopedia of DNA elements in zebrafish” held in London in December 2014, was coorganized by Ferenc Müller and Fiona Wardle. This meeting is a follow-up of a similar previous workshop held 2 years earlier and represents a push toward the formalization of a community effort to annotate functional elements in the zebrafish genome. The meeting brought together zebrafish researchers, bioinformaticians, as well as members of established consortia, to exchange scientific findings and experience, as well as to discuss the initial steps toward the formation of a DANIO-CODE consortium. In this study, we provide the latest updates on the current progress of the consortium's efforts, opening up a broad invitation to researchers to join in and contribute to DANIO-CODE. PMID:26671609

  2. Resetting the histone code at CDKN2A in HNSCC by inhibition of DNA methylation.

    PubMed

    Coombes, Madelene M; Briggs, Katrina L; Bone, James R; Clayman, Gary L; El-Naggar, Adel K; Dent, Sharon Y R

    2003-12-01

    Head and neck squamous cell carcinoma (HNSCC) is the fifth most frequent cancer in the US. Several genetic and epigenetic alterations are associated with HNSCC tumorigenesis, including inactivation of CDKN2A, which encodes the p16 tumor suppressor, in cell lines and primary tumors by DNA methylation. Reactivation of tumor suppressor genes by DNA-demethylating agents and histone deacetylase (HDAC) inhibitors shows therapeutic promise for other cancers. Therefore, we investigated the ability of these agents to reactivate p16 in Tu159 HNSCC cells. Treatment of cells with 5-aza-2'-deoxycytidine (5-aza-dC) increases CDKN2A expression and slightly increases histone H3 acetylation at this gene. No reactivation of CDKN2A is observed upon treatment with the HDAC inhibitor trichostatin A (TSA), but synergistic reactivation of CDKN2A is observed upon sequential treatment of Tu159 cells with both 5-aza-dC and TSA. Silencing of CDKN2A in Tu159 cells is correlated with increased methylation of histone H3 at lysine 9 and decreased methylation at lysine 4 relative to the upstream p15 gene promoter. Interestingly, global levels of H3-K9 methylation are decreased upon treatment with 5-aza-dC. Together these data indicate that DNA methylation is a dominant epigenetic mark for silencing of CDKN2A in Tu159 tumor cells. Moreover, changes in DNA methylation can reset the histone code by impacting multiple H3 modifications. PMID:14654786

  3. The relationship between non-protein-coding DNA and eukaryotic complexity.

    PubMed

    Taft, Ryan J; Pheasant, Michael; Mattick, John S

    2007-03-01

    There are two intriguing paradoxes in molecular biology--the inconsistent relationship between organismal complexity and (1) cellular DNA content and (2) the number of protein-coding genes--referred to as the C-value and G-value paradoxes, respectively. The C-value paradox may be largely explained by varying ploidy. The G-value paradox is more problematic, as the extent of protein coding sequence remains relatively static over a wide range of developmental complexity. We show by analysis of sequenced genomes that the relative amount of non-protein-coding sequence increases consistently with complexity. We also show that the distribution of introns in complex organisms is non-random. Genes composed of large amounts of intronic sequence are significantly overrepresented amongst genes that are highly expressed in the nervous system, and amongst genes downregulated in embryonic stem cells and cancers. We suggest that the informational paradox in complex organisms may be explained by the expansion of cis-acting regulatory elements and genes specifying trans-acting non-protein-coding RNAs. PMID:17295292

  4. SV-Bay: structural variant detection in cancer genomes using a Bayesian approach with correction for GC-content and read mappability

    PubMed Central

    Iakovishina, Daria; Janoueix-Lerosey, Isabelle; Barillot, Emmanuel; Regnier, Mireille; Boeva, Valentina

    2016-01-01

    Motivation: Whole genome sequencing of paired-end reads can be applied to characterize the landscape of large somatic rearrangements of cancer genomes. Several methods for detecting structural variants with whole genome sequencing data have been developed. So far, none of these methods has combined information about abnormally mapped read pairs connecting rearranged regions and associated global copy number changes automatically inferred from the same sequencing data file. Our aim was to create a computational method that could use both types of information, i.e. normal and abnormal reads, and demonstrate that by doing so we can highly improve both sensitivity and specificity rates of structural variant prediction. Results: We developed a computational method, SV-Bay, to detect structural variants from whole genome sequencing mate-pair or paired-end data using a probabilistic Bayesian approach. This approach takes into account depth of coverage by normal reads and abnormalities in read pair mappings. To estimate the model likelihood, SV-Bay considers GC-content and read mappability of the genome, thus making important corrections to the expected read count. For the detection of somatic variants, SV-Bay makes use of a matched normal sample when it is available. We validated SV-Bay on simulated datasets and an experimental mate-pair dataset for the CLB-GA neuroblastoma cell line. The comparison of SV-Bay with several other methods for structural variant detection demonstrated that SV-Bay has better prediction accuracy both in terms of sensitivity and false-positive detection rate. Availability and implementation: https://github.com/InstitutCurie/SV-Bay Contact: valentina.boeva@inserm.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26740523

  5. Analysis of phylogeny and codon usage bias and relationship of GC content, amino acid composition with expression of the structural nif genes.

    PubMed

    Mondal, Sunil Kanti; Kundu, Sudip; Das, Rabindranath; Roy, Sujit

    2016-08-01

    Bacteria and archaea have evolved with the ability to fix atmospheric dinitrogen in the form of ammonia, catalyzed by the nitrogenase enzyme complex which comprises three structural genes nifK, nifD and nifH. The nifK and nifD genes encode the beta and alpha subunits, respectively, of component 1, while nifH encodes component 2 of nitrogenase. Phylogeny based on nifDHK has indicated that Cyanobacteria is closer to Proteobacteria alpha and gamma, but this is not supported by the tree based on 16S rRNA. The evolutionary ancestor for the different trees was also different. The GC1 and GC2% analysis showed more consistency than GC3%, which appeared to be low for Firmicutes, Cyanobacteria and Euryarchaeota while highest in Proteobacteria beta, and clearly showed the proportional effect on the codon usage with a few exceptions. A few genes from Firmicutes, Euryarchaeota, Proteobacteria alpha and delta were found under mutational pressure. These nif genes with low and high GC3% from different classes of organisms showed similar expected number of codons. Distribution of the genes and codons, based on codon usage demonstrated opposite pattern for different orientation of mirror plane when compared with each other. Overall our results provide a comprehensive analysis on the evolutionary relationship of the three structural nif genes, nifK, nifD and nifH, respectively, in the context of codon usage bias, GC content relationship and amino acid composition of the encoded proteins and exploration of crucial statistical method for the analysis of positive data with non-constant variance to identify the shape factors of codon adaptation index. PMID:26309237
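
    GC1, GC2 and GC3 are the G+C percentages at the first, second and third positions of codons. A minimal sketch of that calculation, assuming an in-frame coding sequence given as a plain string; the function name and the handling of trailing partial codons are illustrative choices, not taken from the paper.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3) as percentages for an in-frame coding sequence.

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop an incomplete final codon
    percentages = []
    for position in range(3):
        bases = cds[position:usable:3]  # every third base from this offset
        gc = sum(base in "GC" for base in bases)
        percentages.append(100.0 * gc / len(bases) if bases else 0.0)
    return tuple(percentages)


if __name__ == "__main__":
    gc1, gc2, gc3 = gc_by_codon_position("ATGGCCTAA")  # toy sequence
    print(f"GC1={gc1:.1f}%  GC2={gc2:.1f}%  GC3={gc3:.1f}%")
```

    For the toy sequence ATG GCC TAA the third positions are G, C and A, so GC3 is two out of three; real analyses would typically also exclude stop codons and ambiguous bases.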

  6. A Two-Locus Global DNA Barcode for Land Plants: The Coding rbcL Gene Complements the Non-Coding trnH-psbA Spacer Region

    PubMed Central

    Kress, W. John; Erickson, David L.

    2007-01-01

    Background: A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Methodology/Principal Findings: Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. Conclusions/Significance: A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination. PMID:17551588

  7. Coding region SNP analysis to enhance dog mtDNA discrimination power in forensic casework.

    PubMed

    Verscheure, Sophie; Backeljau, Thierry; Desmyter, Stijn

    2015-01-01

    The high population frequencies of three control region haplotypes contribute to the low discrimination power of the dog mtDNA control region. It also diminishes the evidential power of a match with one of these haplotypes in forensic casework. A mitochondrial genome study of 214 Belgian dogs suggested 26 polymorphic coding region sites that successfully resolved dogs with the three most frequent control region haplotypes. In this study, three SNP assays were developed to determine the identity of the 26 informative sites. The control region of 132 newly sampled dogs was sequenced and added to the study of 214 dogs. The assays were applied to 58 dogs of the haplotypes of interest, which confirmed their suitability for enhancing dog mtDNA discrimination power. In the Belgian population study of 346 dogs, the set of 26 sites divided the dogs into 25 clusters of mtGenome sequences with substantially lower population frequency estimates than their control region sequences. In case of a match with one of the three control region haplotypes, using these three SNP assays in conjunction with control region sequencing would augment the exclusion probability of dog mtDNA analysis from 92.9% to 97.0%. PMID:25299153

  8. Detection of spurious interruptions of protein-coding regions in cloned cDNA sequences by GeneMark analysis.

    PubMed

    Hirosawa, M; Ishikawa, K; Nagase, T; Ohara, O

    2000-09-01

    cDNA is an artificial copy of mRNA and, therefore, no cDNA can be completely free from suspicion of cloning errors. Because overlooking these cloning errors results in serious misinterpretation of cDNA sequences, development of an alerting system targeting spurious sequences in cloned cDNAs is an urgent requirement for massive cDNA sequence analysis. We describe here the application of a modified GeneMark program, originally designed for prokaryotic gene finding, for detection of artifacts in cDNA clones. This program serves to provide a warning when any spurious split of protein-coding regions is detected through statistical analysis of cDNA sequences based on Markov models. In this study, 817 cDNA sequences deposited in public databases by us were subjected to analysis using this alerting system to assess its sensitivity and specificity. The results indicated that any spurious split of protein-coding regions in cloned cDNAs could be sensitively detected and systematically revised by means of this system after the experimental validation of the alerts. Furthermore, this study offered us, for the first time, statistical data regarding the rates and types of errors causing protein-coding splits in cloned cDNAs obtained by conventional cloning methods. PMID:10984451

  9. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach

    SciTech Connect

    Uberbacher, E.C.; Mural, R.J. (Univ. of Tennessee, Oak Ridge)

    1991-12-15

    Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. The authors describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, the authors' method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration the coding recognition module identifies 90% of coding exons of length 100 bases or greater with less than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which the authors are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts.

  10. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach.

    PubMed Central

    Uberbacher, E C; Mural, R J

    1991-01-01

    Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. We describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, our method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration the "coding recognition module" identifies 90% of coding exons of length 100 bases or greater with less than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which we are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts. PMID:1763041

  11. Basal jawed vertebrate phylogeny inferred from multiple nuclear DNA-coded genes

    PubMed Central

    Kikugawa, Kanae; Katoh, Kazutaka; Kuraku, Shigehiro; Sakurai, Hiroshi; Ishida, Osamu; Iwabe, Naoyuki; Miyata, Takashi

    2004-01-01

    Background: Phylogenetic analyses of jawed vertebrates based on mitochondrial sequences often result in confusing inferences which are obviously inconsistent with generally accepted trees. In particular, in a hypothesis by Rasmussen and Arnason based on mitochondrial trees, cartilaginous fishes have a terminal position in a paraphyletic cluster of bony fishes. No previous analysis based on nuclear DNA-coded genes could significantly reject the mitochondrial trees of jawed vertebrates. Results: We have cloned and sequenced seven nuclear DNA-coded genes from 13 vertebrate species. These sequences, together with sequences available from databases including 13 jawed vertebrates from eight major groups (cartilaginous fishes, bichir, chondrosteans, gar, bowfin, teleost fishes, lungfishes and tetrapods) and an outgroup (a cyclostome and a lancelet), have been subjected to phylogenetic analyses based on the maximum likelihood method. Conclusion: Cartilaginous fishes have been inferred to be basal to other jawed vertebrates, which is consistent with the generally accepted view. The minimum log-likelihood difference between the maximum likelihood tree and trees not supporting the basal position of cartilaginous fishes is 18.3 ± 13.1. The hypothesis by Rasmussen and Arnason has been significantly rejected with the minimum log-likelihood difference of 123 ± 23.3. Our tree has also shown that living holosteans, comprising bowfin and gar, form a monophyletic group which is the sister group to teleost fishes. This is consistent with a formerly prevalent view of vertebrate classification, although inconsistent with both of the current morphology-based and mitochondrial sequence-based trees. Furthermore, the bichir has been shown to be the basal ray-finned fish. Tetrapods and lungfish have formed a monophyletic cluster in the tree inferred from the concatenated alignment, being consistent with the currently prevalent view. It also remains possible that tetrapods are more closely

  12. High resolution methylome map of rat indicates role of intragenic DNA methylation in identification of coding region.

    PubMed

    Sati, Satish; Tanwar, Vinay Singh; Kumar, K Anand; Patowary, Ashok; Jain, Vaibhav; Ghosh, Sourav; Ahmad, Shadab; Singh, Meghna; Reddy, S Umakar; Chandak, Giriraj Ratan; Raghunath, Manchala; Sivasubbu, Sridhar; Chakraborty, Kausik; Scaria, Vinod; Sengupta, Shantanu

    2012-01-01

    DNA methylation is crucial for gene regulation and maintenance of genomic stability. Rat has been a key model system in understanding mammalian systemic physiology; however, a detailed rat methylome remains uncharacterized to date. Here, we present the first high resolution methylome of rat liver generated using a methylated DNA immunoprecipitation and high-throughput sequencing (MeDIP-Seq) approach. We observed that within the DNA/RNA repeat elements, simple repeats harbor the highest degree of methylation. Promoter hypomethylation and exon hypermethylation were common features in both RefSeq genes and expressed genes (as evaluated by proteomic approach). We also found that although CpG islands were generally hypomethylated, about 6% of them were methylated and a large proportion (37%) of methylated islands fell within the exons. Notably, we observed significant differences in methylation of terminal exons (UTRs); methylation being more pronounced in coding/partially coding exons compared to the non-coding exons. Further, events like alternate exon splicing (cassette exon) and intron retentions were marked by DNA methylation and these regions are retained in the final transcript. Thus, we suggest that DNA methylation could play a crucial role in marking coding regions thereby regulating alternative splicing. Apart from generating the first high resolution methylome map of rat liver tissue, the present study provides several critical insights into methylome organization and extends our understanding of interplay between epigenome, gene expression and genome stability. PMID:22355382

  13. Improved PCR Amplification of Broad Spectrum GC DNA Templates

    PubMed Central

    Guido, Nicholas; Starostina, Elena; Leake, Devin; Saaem, Ishtiaq

    2016-01-01

    Many applications in molecular biology can benefit from improved PCR amplification of DNA segments containing a wide range of GC content. Conventional PCR amplification of DNA sequences with regions of GC less than 30%, or higher than 70%, is complex due to secondary structures that block the DNA polymerase as well as mispriming and mis-annealing of the DNA. This complexity will often generate incomplete or nonspecific products that hamper downstream applications. In this study, we address multiplexed PCR amplification of DNA segments containing a wide range of GC content. In order to mitigate amplification complications due to high or low GC regions, we tested a combination of different PCR cycling conditions and chemical additives. To assess the fate of specific oligonucleotide (oligo) species with varying GC content in a multiplexed PCR, we developed a novel method of sequence analysis. Here we show that subcycling during the amplification process significantly improved amplification of short template pools (~200 bp), particularly when the template contained a low percent of GC. Furthermore, the combination of subcycling and 7-deaza-dGTP achieved efficient amplification of short templates ranging from 10–90% GC composition. Moreover, we found that 7-deaza-dGTP improved the amplification of longer products (~1000 bp). These methods provide an updated approach for PCR amplification of DNA segments containing a broad range of GC content. PMID:27271574
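
    A minimal sketch of how templates might be screened for the problematic GC ranges mentioned above. The 30% and 70% cut-offs come from the abstract; the function names and the handling of ambiguous bases are assumptions made for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases A, C, G and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / total


def classify_template(seq: str, low: float = 0.30, high: float = 0.70) -> str:
    """Label a template by GC fraction using the abstract's 30%/70% bounds."""
    frac = gc_fraction(seq)
    if frac < low:
        return "low-GC"
    if frac > high:
        return "high-GC"
    return "moderate-GC"


print(classify_template("ATATATATGC"))  # GC fraction 0.2
```

    Ambiguity codes such as N are ignored here rather than counted in the denominator; other conventions exist.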

  14. 3-base periodicity in coding DNA is affected by intercodon dinucleotides

    PubMed Central

    Sánchez, Joaquín

    2011-01-01

    All coding DNAs exhibit 3-base periodicity (TBP), which may be defined as the tendency of nucleotides and higher order n-tuples, e.g. trinucleotides (triplets), to be preferentially spaced by 3, 6, 9, etc. bases, and we have proposed an association between TBP and clustering of same-phase triplets. We here investigated if TBP was affected by intercodon dinucleotide tendencies and whether clustering of same-phase triplets was involved. Under constant protein sequence, intercodon dinucleotide frequencies depend on the distribution of synonymous codons. So, possible effects were revealed by randomly exchanging synonymous codons without altering protein sequences to subsequently document changes in TBP via frequency distribution of distances (FDD) of DNA triplets. A tripartite positive correlation was found between intercodon dinucleotide frequencies, clustering of same-phase triplets and TBP. So, intercodon C|A (where “|” indicates the boundary between codons) was more frequent in native human DNA than in the codon-shuffled sequences; higher C|A frequency occurred along with more frequent clustering of C|AN triplets (where N jointly represents A, C, G and T) and with intense CAN TBP. The opposite was found for C|G, which was less frequent in native than in shuffled sequences; lower C|G frequency occurred together with reduced clustering of C|GN triplets and with less intense CGN TBP. We hence propose that intercodon dinucleotides affect TBP via same-phase triplet clustering. A possible biological relevance of our findings is briefly discussed. PMID:21814388
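
    The idea of a frequency distribution of distances can be sketched as follows. The abstract does not define FDD precisely, so this sketch assumes it counts the distances between all pairs of occurrences of one triplet, up to a chosen maximum; the function names and the in-phase summary statistic are hypothetical.

```python
from collections import Counter


def triplet_distance_distribution(seq, triplet, max_distance=30):
    """Count distances between occurrences of `triplet` (all reading phases)."""
    seq, triplet = seq.upper(), triplet.upper()
    positions = [i for i in range(len(seq) - 2) if seq[i:i + 3] == triplet]
    counts = Counter()
    for idx, start in enumerate(positions):
        for later in positions[idx + 1:]:
            distance = later - start
            if distance > max_distance:
                break  # positions are sorted, so later ones are farther still
            counts[distance] += 1
    return counts


def in_phase_share(counts):
    """Share of counted distances that are multiples of 3."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for d, n in counts.items() if d % 3 == 0) / total


fdd = triplet_distance_distribution("CAGCAGCAGCAGCAG", "CAG")
print(sorted(fdd.items()), in_phase_share(fdd))
```

    A sequence with strong 3-base periodicity for the chosen triplet would show an in-phase share well above the one third expected if distances were spread evenly over the three phases.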

  15. Segmentation of DNA into Coding and Noncoding Regions Based on Recursive Entropic Segmentation and Stop-Codon Statistics

    NASA Astrophysics Data System (ADS)

    Nicorici, Daniel; Astola, Jaakko

    2004-12-01

    Heterogeneous DNA sequences can be partitioned into homogeneous domains that are comprised of the four nucleotides A, C, G, and T and the stop-codons. Recursively, we apply a new entropic segmentation method on DNA sequences using Jensen-Shannon and Jensen-Rényi divergences in order to find the borders between coding and noncoding DNA regions. We have chosen 12- and 18-symbol alphabets that capture (i) the differential nucleotide composition in codons, and (ii) the differential stop-codon composition along all the three phases in both strands of the DNA. The new segmentation method is based on the Jensen-Rényi divergence measure, nucleotide statistics, and stop-codon statistics in both DNA strands. The recursive segmentation process requires no prior training on known datasets. Consequently, for three entire genomes of bacteria, we find that the use of nucleotide composition, stop-codon composition, and Jensen-Rényi divergence improve the accuracy of finding the borders between coding and noncoding regions in DNA sequences.
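
    One step of entropic segmentation can be sketched as below. This is a simplification under stated assumptions: it uses the plain four-letter alphabet and the length-weighted Jensen-Shannon divergence only, not the 12- and 18-symbol alphabets, the stop-codon statistics or the Jensen-Rényi divergence described in the abstract. A recursive version would reapply `best_cut` to each resulting segment until a stopping criterion is met.

```python
import math
from collections import Counter


def shannon_entropy(counts, total):
    """Shannon entropy in bits of a symbol-count table."""
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values() if n > 0)


def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two subsequences."""
    n_left, n_right = len(left), len(right)
    n = n_left + n_right
    c_left, c_right = Counter(left), Counter(right)
    h_all = shannon_entropy(c_left + c_right, n)
    return (h_all
            - (n_left / n) * shannon_entropy(c_left, n_left)
            - (n_right / n) * shannon_entropy(c_right, n_right))


def best_cut(seq, min_len=1):
    """Return (position, divergence) of the cut maximizing the divergence."""
    best_pos, best_score = None, -1.0
    for pos in range(min_len, len(seq) - min_len + 1):
        score = js_divergence(seq[:pos], seq[pos:])
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score


print(best_cut("AAAAAAAAGGGGGGGG"))
```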

  16. URF6, Last Unidentified Reading Frame of Human mtDNA, Codes for an NADH Dehydrogenase Subunit

    NASA Astrophysics Data System (ADS)

    Chomyn, Anne; Cleeter, Michael W. J.; Ragan, C. Ian; Riley, Marcia; Doolittle, Russell F.; Attardi, Giuseppe

    1986-10-01

    The polypeptide encoded in URF6, the last unassigned reading frame of human mitochondrial DNA, has been identified with antibodies to peptides predicted from the DNA sequence. Antibodies prepared against highly purified respiratory chain NADH dehydrogenase from beef heart or against the cytoplasmically synthesized 49-kilodalton iron-sulfur subunit isolated from this enzyme complex, when added to a deoxycholate or a Triton X-100 mitochondrial lysate of HeLa cells, specifically precipitated the URF6 product together with the six other URF products previously identified as subunits of NADH dehydrogenase. These results strongly point to the URF6 product as being another subunit of this enzyme complex. Thus, almost 60% of the protein coding capacity of mammalian mitochondrial DNA is utilized for the assembly of the first enzyme complex of the respiratory chain. The absence of such information in yeast mitochondrial DNA dramatizes the variability in gene content of different mitochondrial genomes.

  17. Methods for sequencing GC-rich and CCT repeat DNA templates

    DOEpatents

    Robinson, Donna L.

    2007-02-20

    The present invention is directed to a PCR-based method of cycle sequencing DNA and other polynucleotide sequences having high CG content and regions of high GC content, and includes for example DNA strands with a high Cytosine and/or Guanosine content and repeated motifs such as CCT repeats.

  18. Structure of the gene coding for the sequence-specific DNA-methyltransferase of the B. subtilis phage SPR.

    PubMed Central

    Pósfai, G; Baldauf, F; Erdei, S; Pósfai, J; Venetianer, P; Kiss, A

    1984-01-01

    The nucleotide sequence of the gene coding for the 5'-GGCC and 5'-CCGG specific DNA methyltransferase of the Bacillus subtilis phage SPR was determined by the Maxam-Gilbert procedure. Transcriptional and translational signals of the sequence were assigned with the help of S1 mapping and translation in E. coli minicells. The gene codes for a 49 kd polypeptide. The amino acid sequence of the SPR methylase shows regions of homology with the sequence of the 5'-GGCC-specific BspRI modification methylase. PMID:6096817

  19. DNA sequence variation in a non-coding region of low recombination on the human X chromosome.

    PubMed

    Kaessmann, H; Heissig, F; von Haeseler, A; Pääbo, S

    1999-05-01

    DNA sequence variation has become a major source of insight regarding the origin and history of our species as well as an important tool for the identification of allelic variants associated with disease. Comparative sequencing of DNA has to date focused mainly on mitochondrial (mt) DNA, which due to its apparent lack of recombination and high evolutionary rate lends itself well to the study of human evolution. These advantages also entail limitations. For example, the high mutation rate of mtDNA results in multiple substitutions that make phylogenetic analysis difficult and, because mtDNA is maternally inherited, it reflects only the history of females. For the history of males, the non-recombining part of the paternally inherited Y chromosome can be studied. The extent of variation on the Y chromosome is so low that variation at particular sites known to be polymorphic rather than entire sequences are typically determined. It is currently unclear how some forms of analysis (such as the coalescent) should be applied to such data. Furthermore, the lack of recombination means that selection at any locus affects all 59 Mb of DNA. To gauge the extent and pattern of point substitutional variation in non-coding parts of the human genome, we have sequenced 10 kb of non-coding DNA in a region of low recombination at Xq13.3. Analysis of this sequence in 69 individuals representing all major linguistic groups reveals the highest overall diversity in Africa, whereas deep divergences also exist in Asia. The time elapsed since the most recent common ancestor (MRCA) is 535,000 ± 119,000 years. We expect this type of nuclear locus to provide more answers about the genetic origin and history of humans. PMID:10319866

  20. Comparative analyses of distributions and functions of Z-DNA in Arabidopsis and rice.

    PubMed

    Zhou, Chan; Zhou, Fengfeng; Xu, Ying

    2009-04-01

    Left-handed Z-DNA is an energetically unfavorable DNA structure that could form mostly under certain physiological conditions and was known to be involved in a number of cellular activities such as transcription regulation. We have compared the distributions and functions of Z-DNA in the genomes of Arabidopsis and rice, and observed that Z-DNA occurs in rice at least 9 times more often than in Arabidopsis; similar observations hold for other monocots and dicots. In addition, Z-DNA is significantly enriched in the coding regions of Arabidopsis, and in the high-GC-content regions of rice. Based on our analyses, we speculate that Z-DNA may play a role in regulating the expression of transcription factors, inhibitors, translation repressors, succinate dehydrogenases and glutathione-disulfide reductases in Arabidopsis, and it may affect the expression of vesicle and nucleosome genes and genes involved in alcohol transporter activity, stem cell maintenance, meristem development and reproductive structure development in rice. PMID:19103278

  1. Signalign: An Ontology of DNA as Signal for Comparative Gene Structure Prediction Using Information-Coding-and-Processing Techniques.

    PubMed

    Yu, Ning; Guo, Xuan; Gu, Feng; Pan, Yi

    2016-03-01

    Conventional character-analysis-based techniques in genome analysis manifest three main shortcomings: inefficiency, inflexibility, and incompatibility. In our previous research, a general framework called DNA As X was proposed for character-analysis-free techniques to overcome these shortcomings, where X is an intermediate such as digit, code, signal, vector, tree, graph network, and so on. In this paper, we further implement an ontology of DNA As Signal by designing a tool named Signalign for comparative gene structure analysis, in which DNA sequences are converted into signal series, processed by a modified method of dynamic time warping and measured by signal-to-noise ratio (SNR). The ontology of DNA As Signal integrates the principles and concepts of other disciplines, including information coding theory and signal processing, into sequence analysis and processing. Compared with conventional character-analysis-based methods, Signalign can not only have equivalent or superior performance, but also enrich the tools and the knowledge library of computational biology by extending the domain from character/string to diverse areas. The evaluation results validate the success of the character-analysis-free technique for improved performance in comparative gene structure prediction. PMID:27046906
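
    The general idea of treating DNA as a signal and aligning it by dynamic time warping can be sketched as follows. This is the textbook DTW recurrence, not Signalign's modified method, and the nucleotide-to-number mapping is a hypothetical choice made only for illustration.

```python
def dna_to_signal(seq, mapping=None):
    """Convert a DNA string to a numeric series (the mapping is an assumption)."""
    mapping = mapping or {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}
    return [mapping[base] for base in seq.upper()]


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances alone
                                    cost[i][j - 1],      # b advances alone
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


print(dtw_distance(dna_to_signal("AACGT"), dna_to_signal("ACGT")))
```

    Because warping lets one sample match several, a repeated base in one sequence costs nothing here, which differs from how a character-based alignment would score the same insertion.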

  2. Natural Selection on Coding and Noncoding DNA Sequences Is Associated with Virulence Genes in a Plant Pathogenic Fungus

    PubMed Central

    Rech, Gabriel E.; Sanz-Martín, José M.; Anisimova, Maria; Sukno, Serenella A.; Thon, Michael R.

    2014-01-01

    Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5′ untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen. PMID:25193312

  3. Run-length encoding graphic rules, biochemically editable designs and steganographical numeric data embedment for DNA-based cryptographical coding system.

    PubMed

    Kawano, Tomonori

    2013-03-01

    There have been a wide variety of approaches for handling the pieces of DNA as the "unplugged" tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, watermarks and cryptography. In the present article, novel designs of artificial genes as the media for storing the digitally compressed data for images are proposed for bio-computing purpose while natural genes principally encode for proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given "passwords" and/or secret numbers using DNA sequences. The "passwords" of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original "passwords." The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interest over the image-coding DNA are also discussed. PMID:23750303
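
    For readers unfamiliar with run-length encoding, a generic sketch on one row of a binary font image is given below. The article's own encoding rule and its mapping of runs onto nucleotides are not described in the abstract, so nothing here should be read as that rule; the row string is a made-up example.

```python
from itertools import groupby


def rle_encode(row):
    """Collapse a string of pixel symbols into (symbol, run_length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(row)]


def rle_decode(runs):
    """Expand (symbol, run_length) pairs back into the original string."""
    return "".join(symbol * length for symbol, length in runs)


row = "0001101111"  # hypothetical pixel row: 0 = background, 1 = ink
runs = rle_encode(row)
print(runs)
print(rle_decode(runs) == row)
```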

  4. Complete Mitochondrial DNA Sequence of the Mucoralean Fungus Absidia glauca, a Model for Studying Host-Parasite Interactions

    PubMed Central

    Ellenberger, Sabrina; Burmester, Anke

    2016-01-01

    The mitochondrial DNA (mtDNA) of Absidia glauca has been completely sequenced. It is 63,080 bp long, has a G+C content of 28%, and contains the standard fungal gene set. A. glauca is the recipient in a laboratory model for horizontal gene transfer with Parasitella parasitica as a donor of nuclei and mitochondria. PMID:27013042

  5. Triplet code-independent programming of living systems organisation by DNA: the link with intelligence and memory.

    PubMed

    Adams, D H

    1995-05-01

    Previous suggestions from this laboratory (3), (a) that within its molecular electronic structure, DNA houses a computer-analog program of immense complexity, operating independently of, but complementary to, triplet coding and (b) that, inter alia, this program is the driving force for organising and executing the construction of species individuals in three dimensions, are extended in the present communication. It is now concluded that the DNA program also embodies an 'intelligence' component, which extends its organising ability both qualitatively and quantitatively beyond any of the heavily circumscribed 'self-organising' attributes claimed to be associated with naturally occurring inanimate systems. Further, that as part of the developmental process, a program component organises the fabrication of mammalian central nervous systems, including that of human beings with the associated attributes of intelligence, creativity and constructional skills. It is further suggested that the sophisticated random access memory system associated with human beings in particular may be explicable in terms of an extension of the DNA programming system: basically this involves the latter operating as computer-type 'hardware' for the storage of long-term memory and interacting with, primarily, glial cell RNA, acting as 'software' and storing short term traces. Finally, it is suggested that such an interrelationship between DNA/RNA molecular electronic structures can provide the necessary memory storage capacity and flexibility and also facilitates random access to the long-term DNA memory store. PMID:8583976

  6. Long non-coding RNAs as novel expression signatures modulate DNA damage and repair in cadmium toxicology

    PubMed Central

    Zhou, Zhiheng; Liu, Haibai; Wang, Caixia; Lu, Qian; Huang, Qinhai; Zheng, Chanjiao; Lei, Yixiong

    2015-01-01

    Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in a variety of physiological and pathophysiological processes. Our study aimed to investigate whether lncRNAs as novel expression signatures are able to modulate DNA damage and repair in cadmium (Cd) toxicity. There were aberrant expression profiles of lncRNAs in 35th Cd-induced cells as compared to untreated 16HBE cells. siRNA-mediated knockdown of ENST00000414355 inhibited the growth of DNA-damaged cells and decreased the expressions of DNA-damage related genes (ATM, ATR and ATRIP), while increasing the expressions of DNA-repair related genes (DDB1, DDB2, OGG1, ERCC1, MSH2, RAD50, XRCC1 and BARD1). Cadmium increased ENST00000414355 expression in the lung of Cd-exposed rats in a dose-dependent manner. A significant positive correlation was observed between blood ENST00000414355 expression and urinary/blood Cd concentrations, and there were significant correlations of lncRNA-ENST00000414355 expression with the expressions of target genes in the lung of Cd-exposed rats and the blood of Cd-exposed workers. These results indicate that some lncRNAs are aberrantly expressed in Cd-treated 16HBE cells. lncRNA-ENST00000414355 may serve as a signature for DNA damage and repair related to the epigenetic mechanisms underlying cadmium toxicity and become a novel biomarker of cadmium toxicity. PMID:26472689

  7. Long non-coding RNAs as novel expression signatures modulate DNA damage and repair in cadmium toxicology

    NASA Astrophysics Data System (ADS)

    Zhou, Zhiheng; Liu, Haibai; Wang, Caixia; Lu, Qian; Huang, Qinhai; Zheng, Chanjiao; Lei, Yixiong

    2015-10-01

    Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in a variety of physiological and pathophysiological processes. Our study aimed to investigate whether lncRNAs as novel expression signatures are able to modulate DNA damage and repair in cadmium (Cd) toxicity. There were aberrant expression profiles of lncRNAs in 35th Cd-induced cells as compared to untreated 16HBE cells. siRNA-mediated knockdown of ENST00000414355 inhibited the growth of DNA-damaged cells and decreased the expressions of DNA-damage related genes (ATM, ATR and ATRIP), while increasing the expressions of DNA-repair related genes (DDB1, DDB2, OGG1, ERCC1, MSH2, RAD50, XRCC1 and BARD1). Cadmium increased ENST00000414355 expression in the lung of Cd-exposed rats in a dose-dependent manner. A significant positive correlation was observed between blood ENST00000414355 expression and urinary/blood Cd concentrations, and there were significant correlations of lncRNA-ENST00000414355 expression with the expressions of target genes in the lung of Cd-exposed rats and the blood of Cd-exposed workers. These results indicate that some lncRNAs are aberrantly expressed in Cd-treated 16HBE cells. lncRNA-ENST00000414355 may serve as a signature for DNA damage and repair related to the epigenetic mechanisms underlying cadmium toxicity and become a novel biomarker of cadmium toxicity.

  8. The DNA sequence and biology of human chromosome 19

    SciTech Connect

    Grimwood, J; Gordon, L A; Olsen, A; Terry, A; Schmutz, J; Lamerdin, J; Hellsten, U; Goodstein, D; Couronne, O; Tran-Gyamfi, M

    2004-04-06

    Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high GC content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in Mendelian disorders, including familial hypercholesterolemia and insulin-resistant diabetes. Nearly one quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

  9. Evolutionary genomics in Metazoa: the mitochondrial DNA as a model system.

    PubMed

    Saccone, C; De Giorgi, C; Gissi, C; Pesole, G; Reyes, A

    1999-09-30

    One of the most important aspects of mitochondrial (mt) genome evolution in Metazoa is constancy of size and gene content of mtDNA, whose plasticity is maintained through a great variety of gene rearrangements probably mediated by tRNA genes. The trend of mtDNA to maintain the same genetic structure within a phylum (e.g., Chordata) is generally accepted, although more recent reports show that a considerable number of transpositions are observed also between closely related organisms. Base composition of mtDNA is extremely variable. Genome GC content is often low and, when it increases, the two complementary bases distribute asymmetrically, creating, particularly in vertebrates, a negative GC-skew. In mammals, we have found coding strand base composition and average degree of gene conservation to be related to the asymmetric replication mechanism of mtDNA. A quantitative measurement of mtDNA evolutionary rate has revealed that each of the various components has a different evolutionary rate. Non-synonymous rates are gene specific and fall in a range comparable to that of nuclear genes, whereas synonymous rates are about 22-fold higher in mt than in nuclear genes. tRNA genes are among the most conserved but, when compared to their nuclear counterparts, they evolve 100 times faster. Finally, we describe some molecular phylogenetic reconstructions which have produced unexpected outcomes, and might change our vision of the classification of living organisms. PMID:10570997
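
    The GC-skew mentioned above is conventionally computed as (G − C)/(G + C) on one strand, so a negative value means that strand carries more C than G. A minimal sketch follows; the window and step sizes in the usage line are arbitrary illustrative values.

```python
def gc_skew(seq):
    """Return (G - C) / (G + C) for one strand, or None if it has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return None
    return (g - c) / (g + c)


def windowed_gc_skew(seq, window, step):
    """Return (start, skew) pairs for windows sliding along the sequence."""
    return [(start, gc_skew(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


print(gc_skew("ACCCGT"))                # more C than G gives a negative skew
print(windowed_gc_skew("GGCC", 2, 2))
```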

  10. Bio-bar-code dendrimer-like DNA as signal amplifier for cancerous cells assay using ruthenium nanoparticle-based ultrasensitive chemiluminescence detection.

    PubMed

    Bi, Sai; Hao, Shuangyuan; Li, Li; Zhang, Shusheng

    2010-09-01

    Bio-bar-code dendrimer-like DNA (bbc-DL-DNA) is employed as a label for the amplification assay of cancer cells in combination with the newly explored chemiluminescence (CL) system of luminol-H₂O₂-Ru³⁺ and specificity of structure-switching aptamers selected by cell-based SELEX. PMID:20652188

  11. Counterintuitive DNA Sequence Dependence in Supercoiling-Induced DNA Melting

    PubMed Central

    Vlijm, Rifka; v.d. Torre, Jaco; Dekker, Cees

    2015-01-01

    The metabolism of DNA in cells relies on the balance between hybridized double-stranded DNA (dsDNA) and local de-hybridized regions of ssDNA that provide access to binding proteins. Traditional melting experiments, in which short pieces of dsDNA are heated up until the point of melting into ssDNA, have determined that AT-rich sequences have a lower binding energy than GC-rich sequences. In cells, however, the double-stranded backbone of DNA is destabilized by negative supercoiling, and not by temperature. To investigate what the effect of GC content is on DNA melting induced by negative supercoiling, we studied DNA molecules with a GC content ranging from 38% to 77%, using single-molecule magnetic tweezer measurements in which the length of a single DNA molecule is measured as a function of applied stretching force and supercoiling density. At low force (<0.5 pN), supercoiling results in twisting of the dsDNA backbone and loop formation (plectonemes), without inducing any DNA melting. This process was not influenced by the DNA sequence. When negative supercoiling is introduced at increasing force, local melting of DNA is introduced. We measured for the different DNA molecules a characteristic force Fchar, at which negative supercoiling induces local melting of the dsDNA. Surprisingly, GC-rich sequences melt at lower forces than AT-rich sequences: Fchar = 0.56 pN for 77% GC but 0.73 pN for 38% GC. An explanation for this counterintuitive effect is provided by the realization that supercoiling densities of a few percent only induce melting of a few percent of the base pairs. As a consequence, denaturation bubbles occur in local AT-rich regions and the sequence-dependent effect arises from an increased DNA bending/torsional energy associated with the plectonemes. This new insight indicates that an increased GC-content adjacent to AT-rich DNA regions will enhance local opening of the double-stranded DNA helix. PMID:26513573

  12. A novel non-coding RNA lncRNA-JADE connects DNA damage signalling to histone H4 acetylation

    PubMed Central

    Wan, Guohui; Hu, Xiaoxiao; Liu, Yunhua; Han, Cecil; Sood, Anil K; Calin, George A; Zhang, Xinna; Lu, Xiongbin

    2013-01-01

    A prompt and efficient DNA damage response (DDR) eliminates the detrimental effects of DNA lesions in eukaryotic cells. Basic and preclinical studies suggest that the DDR is one of the primary anti-cancer barriers during tumorigenesis. The DDR involves a complex network of processes that detect and repair DNA damage, in which long non-coding RNAs (lncRNAs), a new class of regulatory RNAs, may play an important role. In the current study, we identified a novel lncRNA, lncRNA-JADE, that is induced after DNA damage in an ataxia-telangiectasia mutated (ATM)-dependent manner. LncRNA-JADE transcriptionally activates Jade1, a key component in the HBO1 (human acetylase binding to ORC1) histone acetylation complex. Consequently, lncRNA-JADE induces histone H4 acetylation in the DDR. Markedly higher levels of lncRNA-JADE were observed in human breast tumours in comparison with normal breast tissues. Knockdown of lncRNA-JADE significantly inhibited breast tumour growth in vivo. On the basis of these results, we propose that lncRNA-JADE is a key functional link that connects the DDR to histone H4 acetylation, and that dysregulation of lncRNA-JADE may contribute to breast tumorigenesis. PMID:24097061

  13. Comparison of nanodosimetric parameters of track structure calculated by the Monte Carlo codes Geant4-DNA and PTra

    NASA Astrophysics Data System (ADS)

    Lazarakis, P.; Bug, M. U.; Gargioni, E.; Guatelli, S.; Rabus, H.; Rosenfeld, A. B.

    2012-03-01

    The concept of nanodosimetry is based on the assumption that initial damage to cells is related to the number of ionizations (the ionization cluster size) directly produced by single particles within, or in the close vicinity of, short segments of DNA. The ionization cluster-size distribution and other nanodosimetric quantities, however, are not directly measurable in biological targets and our current knowledge is mostly based on numerical simulations of particle tracks in water, calculating track structure parameters for nanometric target volumes. The assessment of nanodosimetric quantities derived from particle-track calculations using different Monte Carlo codes plays, therefore, an important role for a more accurate evaluation of the initial damage to cells and, as a consequence, of the biological effectiveness of ionizing radiation. The aim of this work is to assess the differences in the calculated nanodosimetric quantities obtained with Geant4-DNA as compared to those of the ad hoc particle-track Monte Carlo code ‘PTra’ developed at Physikalisch-Technische Bundesanstalt (PTB), Germany. The comparison of the two codes was made for incident electrons of energy in the range between 50 eV and 10 keV, for protons of energy between 300 keV and 10 MeV, and for alpha particles of energy between 1 and 10 MeV as these were the energy ranges available in both codes at the time this investigation was carried out. Good agreement was found for nanodosimetric characteristics of track structure calculated in the high-energy range of each particle type. For lower energies, significant differences were observed, most notably in the estimates of the biological effectiveness. The largest relative differences obtained were over 50%; however, generally the order of magnitude was between 10% and 20%.

  14. Comparison of nanodosimetric parameters of track structure calculated by the Monte Carlo codes Geant4-DNA and PTra.

    PubMed

    Lazarakis, P; Bug, M U; Gargioni, E; Guatelli, S; Rabus, H; Rosenfeld, A B

    2012-03-01

    The concept of nanodosimetry is based on the assumption that initial damage to cells is related to the number of ionizations (the ionization cluster size) directly produced by single particles within, or in the close vicinity of, short segments of DNA. The ionization cluster-size distribution and other nanodosimetric quantities, however, are not directly measurable in biological targets and our current knowledge is mostly based on numerical simulations of particle tracks in water, calculating track structure parameters for nanometric target volumes. The assessment of nanodosimetric quantities derived from particle-track calculations using different Monte Carlo codes plays, therefore, an important role for a more accurate evaluation of the initial damage to cells and, as a consequence, of the biological effectiveness of ionizing radiation. The aim of this work is to assess the differences in the calculated nanodosimetric quantities obtained with Geant4-DNA as compared to those of the ad hoc particle-track Monte Carlo code 'PTra' developed at Physikalisch-Technische Bundesanstalt (PTB), Germany. The comparison of the two codes was made for incident electrons of energy in the range between 50 eV and 10 keV, for protons of energy between 300 keV and 10 MeV, and for alpha particles of energy between 1 and 10 MeV as these were the energy ranges available in both codes at the time this investigation was carried out. Good agreement was found for nanodosimetric characteristics of track structure calculated in the high-energy range of each particle type. For lower energies, significant differences were observed, most notably in the estimates of the biological effectiveness. The largest relative differences obtained were over 50%; however, generally the order of magnitude was between 10% and 20%. PMID:22330641

  15. Roles of DNA mutation in the coding region and DNA methylation in the 5' flanking region of BRCA1 in canine mammary tumors.

    PubMed

    Qiu, Hengbin; Lin, Deigui

    2016-07-01

    The Breast cancer 1, early onset gene (BRCA1) is known to be significantly associated with human familial breast cancer and has been identified as playing an important role in canine mammary tumors. Here, genetic variations in the coding region and DNA methylation in the 5' flanking region of BRCA1 in canine mammary tumor samples, 15 each of benign and malignant against 10 normal canine mammary tissue samples, were analyzed using the direct sequencing method. The results indicated two point mutations each in the coding region of canine BRCA1 in one benign mammary tumor sample (4702G>T and 4765G>T) and in one malignant canine mammary tumor sample (3619A>G and 4006G>A). No mutations were detected in the normal canine mammary tissue samples. The 4702G>T mutation was found to terminate further translation. The physical effect of the 4765G>T mutation was found to be the replacement of the glutamate residue with glutamine. The physical effect of the 3619A>G mutation was found to be the replacement of the threonine residue with alanine, and that of mutation 4006G>A was the replacement of the valine residue with isoleucine in the BRCA1 protein. Bisulfite sequencing detected methylated CpG sites in one canine malignant mammary tumor sample. In conclusion, the present study elucidated the mutational status of the BRCA1 coding region and methylation status of the 5' flanking region of BRCA1 in canine mammary tumors. PMID:26888582

  16. Roles of DNA mutation in the coding region and DNA methylation in the 5′ flanking region of BRCA1 in canine mammary tumors

    PubMed Central

    QIU, Hengbin; LIN, Deigui

    2016-01-01

    The Breast cancer 1, early onset gene (BRCA1) is known to be significantly associated with human familial breast cancer and has been identified as playing an important role in canine mammary tumors. Here, genetic variations in the coding region and DNA methylation in the 5′ flanking region of BRCA1 in canine mammary tumor samples, 15 each of benign and malignant against 10 normal canine mammary tissue samples, were analyzed using the direct sequencing method. The results indicated two point mutations each in the coding region of canine BRCA1 in one benign mammary tumor sample (4702G>T and 4765G>T) and in one malignant canine mammary tumor sample (3619A>G and 4006G>A). No mutations were detected in the normal canine mammary tissue samples. The 4702G>T mutation was found to terminate further translation. The physical effect of the 4765G>T mutation was found to be the replacement of the glutamate residue with glutamine. The physical effect of the 3619A>G mutation was found to be the replacement of the threonine residue with alanine, and that of mutation 4006G>A was the replacement of the valine residue with isoleucine in the BRCA1 protein. Bisulfite sequencing detected methylated CpG sites in one canine malignant mammary tumor sample. In conclusion, the present study elucidated the mutational status of the BRCA1 coding region and methylation status of the 5′ flanking region of BRCA1 in canine mammary tumors. PMID:26888582

  17. Recognition Code of ZNF191(243-368) and Its Interaction with DNA

    PubMed Central

    Zhao, Dongxin; Huang, Zhongxian

    2015-01-01

    ZNF191(243-368) is the C-terminal region of ZNF191 which contains a putative DNA-binding domain of four Cys2His2 zinc finger motifs. In this study, an expression vector of a fusion protein of ZNF191(243-368) with glutathione-S-transferase (GST) was constructed and transformed into Escherichia coli BL21. The fusion protein GST-ZNF191(243-368) was expressed using this vector to investigate the protein-DNA binding reaction through an affinity selection strategy on the basis of the binding quality of the zinc finger domain. Results showed that ZNF191(243-368) can selectively bind with sequences and react with genes which contain an AGGG core. However, the recognition mechanism of Cys2His2 zinc finger proteins to DNA warrants further investigation. PMID:26457075

  18. Non-coding RNA generated following lariat-debranching mediates targeting of AID to DNA

    PubMed Central

    Zheng, Simin; Vuong, Bao Q.; Vaidyanathan, Bharat; Lin, Jia-Yu; Huang, Feng-Ting; Chaudhuri, Jayanta

    2015-01-01

    SUMMARY Transcription through immunoglobulin switch (S) regions is essential for class switch recombination (CSR) but no molecular function of the transcripts has been described. Likewise, recruitment of activation-induced cytidine deaminase (AID) to S regions is critical for CSR; however, the underlying mechanism has not been fully elucidated. Here, we demonstrate that intronic switch RNA acts in trans to target AID to S region DNA. AID binds directly to switch RNA through G-quadruplexes formed by the RNA molecules. Disruption of this interaction by mutation of a key residue in the putative RNA-binding domain of AID impairs recruitment of AID to S region DNA, thereby abolishing CSR. Additionally, inhibition of RNA lariat processing leads to loss of AID localization to S regions and compromises CSR; both defects can be rescued by exogenous expression of switch transcripts in a sequence-specific manner. These studies uncover an RNA-mediated mechanism of targeting AID to DNA. PMID:25957684

  19. DNA sequencing and bar-coding using solid-state nanopores.

    PubMed

    Atas, Evrim; Singer, Alon; Meller, Amit

    2012-12-01

    Nanopores have emerged as a prominent single-molecule analytic tool with particular promise for genomic applications. In this review, we discuss two potential applications of the nanopore sensors: First, we present a nanopore-based single-molecule DNA sequencing method that utilizes optical detection for massively parallel throughput. Second, we describe a method by which nanopores can be used as single-molecule genotyping tools. For DNA sequencing, the distinction among the four types of DNA nucleobases is achieved by employing a biochemical procedure for DNA expansion. In this approach, each nucleobase in each DNA strand is converted into one of four predefined unique 16-mers in a process that preserves the nucleobase sequence. The resulting converted strands are then hybridized to a library of four molecular beacons, each carrying a unique fluorophore tag, that are perfect complements to the 16-mers used for conversion. Solid-state nanopores are then used to sequentially remove these beacons, one after the other, leading to a series of photon bursts in four colors that can be optically detected. Single-molecule genotyping is achieved by tagging the DNA fragments with γ-modified synthetic peptide nucleic acid probes coupled to an electronic characterization of the complexes using solid-state nanopores. This method can be used to identify and differentiate genes with a high level of sequence similarity at the single-molecule level, but different pathology or response to treatment. We will illustrate this method by differentiating the pol gene for two highly similar human immunodeficiency virus subtypes, paving the way for a novel diagnostics platform for viral classification. PMID:23109189
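
The conversion step described in this abstract (each nucleobase replaced by one of four predefined 16-mers, then read out as a series of colored photon bursts) can be illustrated with a minimal Python sketch. The four 16-mer sequences and the color names below are hypothetical placeholders chosen only for illustration; the abstract does not list the actual sequences or fluorophores.

```python
# Hypothetical 16-mers and colors: placeholders for illustration only,
# not the sequences or fluorophores used in the reviewed method.
CODE_16MERS = {
    "A": "AACCGGTTAACCGGTT",
    "C": "ACACACACGTGTGTGT",
    "G": "AGAGAGAGCTCTCTCT",
    "T": "ATATCGCGATATCGCG",
}
BEACON_COLOR = {"A": "red", "C": "green", "G": "blue", "T": "yellow"}
BLOCK = 16  # length of each converted unit


def expand(seq: str) -> str:
    """Replace every base by its 16-mer, preserving the base order."""
    return "".join(CODE_16MERS[base] for base in seq.upper())


def color_series(expanded: str) -> list[str]:
    """Walk the expanded strand block by block and report the color of the
    beacon that would pair with each 16-mer."""
    decode = {mer: base for base, mer in CODE_16MERS.items()}
    blocks = (expanded[i:i + BLOCK] for i in range(0, len(expanded), BLOCK))
    return [BEACON_COLOR[decode[block]] for block in blocks]


def read_sequence(expanded: str) -> str:
    """Recover the original bases from the expanded strand."""
    color_to_base = {color: base for base, color in BEACON_COLOR.items()}
    return "".join(color_to_base[c] for c in color_series(expanded))
```

For example, `expand("GAT")` yields a 48-nt strand, and `read_sequence(expand("GATTACA"))` returns the original `"GATTACA"`, which is the sense in which the conversion preserves the nucleobase sequence.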

  20. Human phosphoribosylformylglycineamide amidotransferase (FGARAT): regional mapping, complete coding sequence, isolation of a functional genomic clone, and DNA sequence analysis.

    PubMed

    Patterson, D; Bleskan, J; Gardiner, K; Bowersox, J

    1999-11-01

    Purines play essential roles in many cellular functions, including DNA replication, transcription, intra- and extra-cellular signaling, energy metabolism, and as coenzymes for many biochemical reactions. The de-novo synthesis of purines requires 10 enzymatic steps for the production of inosine monophosphate (IMP). Defects in purine metabolism are associated with human diseases. Further, many anticancer agents function as inhibitors of the de-novo biosynthetic pathway. Genes or cDNAs for most of the enzymes comprising this pathway have been isolated from humans or other mammals. One notable exception is the phosphoribosylformylglycineamide amidotransferase (FGARAT) gene, which encodes the fourth step of this pathway. This gene has been cloned from numerous microorganisms and from Drosophila melanogaster and C. elegans. We report here the identification of a human cDNA containing the coding region of the FGARAT mRNA and the isolation of a P1 clone that contains an intact human FGARAT gene. The P1 clone corrects the purine auxotrophy and protein deficiency of Chinese hamster ovary (CHO) cell mutants (AdeB) deficient in both the activity and the protein for FGARAT. The P1 clone was used to regionally map the FGARAT gene to chromosome region 17p13, a location consistent with our prior assignment of this gene to chromosome 17. A comparison of the DNA sequence of the human FGARAT and FGARAT DNA sequence from 17 other organisms is reported. The isolation of this gene means that DNA clones for all the 10 steps of IMP synthesis have been isolated from humans or other mammals. PMID:10548741

  1. Absence of Novel CYP4F2 and VKORC1 Coding Region DNA Variants in Patients Requiring High Warfarin Doses

    PubMed Central

    Burmester, James K.; Berg, Richard L.; Glurich, Ingrid; Yale, Steven H.; Schmelzer, John R.; Caldwell, Michael D.

    2011-01-01

    Objective: Warfarin is an FDA-approved oral anticoagulant for long-term prevention of thromboembolism. Substantial inter-individual variation in dosing requirements and the narrow therapeutic index of this widely-prescribed drug make safe initiation and dose stabilization challenging. Single nucleotide polymorphisms (SNPs) occurring in CYP2C9, VKORC1, and CYP4F2 genes are known to impact dose, and VKORC1 and CYP4F2 polymorphisms are associated with higher therapeutic dose requirements in our cohort. However, the most advanced regression models using personal, clinical, and genetic factors to predict individual stable dose account for only 50% to 60% of the observed variability in stable therapeutic dose in Caucasians. Design and Methods: In this study, we used DNA sequence analysis to determine whether additional variants in CYP4F2 and VKORC1 gene coding regions contribute to variable dosing requirements among individuals for whom the actual dose was the highest relative to regression model-predicted dose. Results and Conclusions: No novel DNA variants in the coding regions of these genes were identified among subjects requiring high warfarin doses, suggesting that other factors yet to be defined contribute to variability in warfarin dose requirements in this subset of our cohort. PMID:21562135

  2. Molecular cloning of the cDNA coding for mouse aldehyde oxidase: tissue distribution and regulation in vivo by testosterone.

    PubMed Central

    Kurosaki, M; Demontis, S; Barzago, M M; Garattini, E; Terao, M

    1999-01-01

    The cDNA coding for mouse aldehyde oxidase (AO), a molybdoflavoprotein, has been isolated and characterized. The cDNA is 4347 nt long and consists of an open reading frame predicting a polypeptide of 1333 amino acid residues, with 5' and 3' untranslated regions of 13 and 335 nt respectively. The apparent molecular mass of the translation product in vitro derived from the corresponding cRNA is consistent with that of the monomeric subunit of the AO holoenzyme. The cDNA codes for a catalytically active form of AO, as demonstrated by transient transfection experiments conducted in the HC11 mouse mammary epithelial cell line. The deduced primary structure of the AO protein contains consensus sequences for two distinct 2Fe-2S redox centres and a molybdopterin-binding site. The amino acid sequence of the mouse AO has a high degree of similarity with the human and bovine counterparts, and a significant degree of relatedness to AO proteins of plant origin. Northern blot and in situ hybridization analyses demonstrate that hepatocytes, cardiocytes, lung endothelial or epithelial cells and oesophagus epithelial cells express high levels of AO mRNA. In the various tissues and organs considered, the level of AO mRNA expression is not strictly correlated with the amount of the corresponding protein, suggesting that the synthesis of the AO enzyme is under translational or post-translational control. In addition, we observed sex-related regulation of AO protein synthesis. In the liver of male animals, despite similar amounts of AO mRNA, the levels of the AO enzyme and corresponding polypeptide are significantly higher than those in female animals. Treatment of female mice with testosterone increases the amounts of AO mRNA and of the relative translation product to levels similar to those in male animals. PMID:10377246

  3. An Abundant Class of Non-coding DNA Can Prevent Stochastic Gene Silencing in the C. elegans Germline.

    PubMed

    Frøkjær-Jensen, Christian; Jain, Nimit; Hansen, Loren; Davis, M Wayne; Li, Yongbin; Zhao, Di; Rebora, Karine; Millet, Jonathan R M; Liu, Xiao; Kim, Stuart K; Dupuy, Denis; Jorgensen, Erik M; Fire, Andrew Z

    2016-07-14

    Cells benefit from silencing foreign genetic elements but must simultaneously avoid inactivating endogenous genes. Although chromatin modifications and RNAs contribute to maintenance of silenced states, the establishment of silenced regions will inevitably reflect underlying DNA sequence and/or structure. Here, we demonstrate that a pervasive non-coding DNA feature in Caenorhabditis elegans, characterized by 10-base pair periodic An/Tn-clusters (PATCs), can license transgenes for germline expression within repressive chromatin domains. Transgenes containing natural or synthetic PATCs are resistant to position effect variegation and stochastic silencing in the germline. Among endogenous genes, intron length and PATC-character undergo dramatic changes as orthologs move from active to repressive chromatin over evolutionary time, indicating a dynamic character to the An/Tn periodicity. We propose that PATCs form the basis of a cellular immune system, identifying certain endogenous genes in heterochromatic contexts as privileged while foreign DNA can be suppressed with no requirement for a cellular memory of prior exposure. PMID:27374334

  4. Codon usage, genetic code and phylogeny of Dictyostelium discoideum mitochondrial DNA as deduced from a 7.3-kb region.

    PubMed

    Angata, K; Kuroe, K; Yanagisawa, K; Tanaka, Y

    1995-02-01

    We have sequenced a region (7,376-bp) of the mitochondrial (mt) DNA (54 kb) of the cellular slime mold, Dictyostelium discoideum. From the DNA and amino-acid sequence comparisons with known sequences, genes for ATPase subunit 9 (ATP9), cytochrome b (CYTB), NADH dehydrogenase subunits 1, 3 and 6 (ND1, ND3 and ND6), small subunit rRNA (SSU rRNA) and seven tRNAs (Arg, Asn, Cys, Lys, f-Met, Met and Pro) have been identified. The sequenced region of the mtDNA has a high average A + T-content (70.8%). The A + T-content of protein-genes (73.6%) is considerably higher than that of RNA genes (61.3%). Even with the strong AT-bias, the genetic code employed is most probably the universal one. All seven tRNAs are able to form typical clover leaf structures. The molecular phylogenetic trees of CYTB and SSU rRNA suggest that D. discoideum is closer to green plants than to animals and fungi. PMID:7736610
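
The A + T-content figures quoted in this abstract are simple base-composition percentages. A minimal Python sketch of that calculation follows; the function names are illustrative, and skipping ambiguous symbols such as `N` is an assumption rather than something the abstract specifies.

```python
def at_content(seq: str) -> float:
    """Percentage of A and T among the unambiguous bases of a DNA sequence.

    Symbols other than A, C, G, T (for example N) are ignored; this is an
    assumption made for the sketch.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    at = sum(base in "AT" for base in counted)
    return 100.0 * at / len(counted)


def gc_content(seq: str) -> float:
    """Percentage of G and C, the complement of the A + T-content."""
    return 100.0 - at_content(seq)
```

Applied separately to the protein-coding and RNA-coding portions of a sequence, the same function would give the kind of per-category comparison reported above.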

  5. Specific gene hypomethylation and cancer: New insights into coding region feature trends

    PubMed Central

    Daura-Oller, Elias; Cabre, Maria; Montero, Miguel A; Paternain, Jose L; Romeu, Antoni

    2009-01-01

    To assess whether structural features of coding regions play a role in the hypomethylation of specific genes, the G+C content and the occurrence of CpG islands, repeats and retrotransposable elements in demethylated genes related to cancer have been evaluated. A comparative analysis among different cancer types has also been performed. This inter-cancer comparison of coding region features provides insights into the structural trends and patterns present in the studied cancers. PMID:19707296

  6. Run-length encoding graphic rules, biochemically editable designs and steganographical numeric data embedment for DNA-based cryptographical coding system

    PubMed Central

    Kawano, Tomonori

    2013-01-01

    There have been a wide variety of approaches for handling pieces of DNA as “unplugged” tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, watermarks and cryptography. In the present article, novel designs of artificial genes as the media for storing digitally compressed image data are proposed for bio-computing purposes, whereas natural genes principally encode proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given “passwords” and/or secret numbers using DNA sequences. The “passwords” of interest were decomposed into single letters and translated into font images coded on separate DNA chains with both the coding regions, in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original “passwords.” The latter processes require molecular biological tools for digestion and ligation of the fragmented DNA molecules, targeting the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interest over the image-coding DNA are also discussed. PMID:23750303
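
The general idea of run-length encoding a row of image pixels into a DNA string can be sketched in Python as below. This is not the encoding rule proposed in the article, which the abstract does not spell out; the three-base layout (one base for the pixel value, two base-4 digits for the run length) is a hypothetical scheme assumed here purely for illustration.

```python
BASES = "ACGT"   # base-4 digits: A=0, C=1, G=2, T=3 (assumed mapping)
MAX_RUN = 16     # two base-4 digits can store run lengths 1..16


def run_lengths(bits: str) -> list[tuple[str, int]]:
    """Split a string of '0'/'1' pixels into (value, length) runs,
    capping each run at MAX_RUN so it fits in two base-4 digits."""
    runs: list[list] = []
    for bit in bits:
        if runs and runs[-1][0] == bit and runs[-1][1] < MAX_RUN:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return [(value, length) for value, length in runs]


def encode_row(bits: str) -> str:
    """Encode each run as three bases: value (A=0, T=1) + length-1 in base 4."""
    out = []
    for value, length in run_lengths(bits):
        high, low = divmod(length - 1, 4)
        out.append(("A" if value == "0" else "T") + BASES[high] + BASES[low])
    return "".join(out)


def decode_row(dna: str) -> str:
    """Invert encode_row by reading the strand three bases at a time."""
    bits = []
    for i in range(0, len(dna), 3):
        value, high, low = dna[i:i + 3]
        length = BASES.index(high) * 4 + BASES.index(low) + 1
        bits.append(("0" if value == "A" else "1") * length)
    return "".join(bits)
```

Under this assumed scheme the pixel row `0001111100` becomes `AAGTCAAAC`, and `decode_row` restores the original row. A fixed-width layout was chosen so that decoding needs no delimiters; the article's own rule may differ.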

  7. Fine-tuning the ubiquitin code at DNA double-strand breaks: deubiquitinating enzymes at work

    PubMed Central

    Citterio, Elisabetta

    2015-01-01

    Ubiquitination is a reversible protein modification broadly implicated in cellular functions. Signaling processes mediated by ubiquitin (ub) are crucial for the cellular response to DNA double-strand breaks (DSBs), one of the most dangerous types of DNA lesions. In particular, the DSB response critically relies on active ubiquitination by the RNF8 and RNF168 ub ligases at the chromatin, which is essential for proper DSB signaling and repair. How this pathway is fine-tuned and what the functional consequences are of its deregulation for genome integrity and tissue homeostasis are subject of intense investigation. One important regulatory mechanism is by reversal of substrate ubiquitination through the activity of specific deubiquitinating enzymes (DUBs), as supported by the implication of a growing number of DUBs in DNA damage response processes. Here, we discuss the current knowledge of how ub-mediated signaling at DSBs is controlled by DUBs, with main focus on DUBs targeting histone H2A and on their recent implication in stem cell biology and cancer. PMID:26442100

  8. Mitochondrial comparative genomics and phylogenetic signal assessment of mtDNA among arbuscular mycorrhizal fungi.

    PubMed

    Nadimi, Maryam; Daubois, Laurence; Hijri, Mohamed

    2016-05-01

    Mitochondrial (mt) genes, such as cytochrome C oxidase genes (cox), have been widely used for barcoding in many groups of organisms, although this approach has been less powerful in the fungal kingdom due to the rapid evolution of their mt genomes. The use of mt genes in phylogenetic studies of Dikarya has been met with success, while early diverging fungal lineages remain less studied, particularly the arbuscular mycorrhizal fungi (AMF). Advances in next-generation sequencing have substantially increased the number of publicly available mtDNA sequences for the Glomeromycota. As a result, comparison of mtDNA across key AMF taxa can now be applied to assess the phylogenetic signal of individual mt coding genes, as well as concatenated subsets of coding genes. Here we show comparative analyses of publicly available mt genomes of Glomeromycota, augmented with two mtDNA genomes that were newly sequenced for this study (Rhizophagus irregularis DAOM240159 and Glomus aggregatum DAOM240163), resulting in 16 complete mtDNA datasets. R. irregularis isolate DAOM240159 and G. aggregatum isolate DAOM240163 showed mt genomes measuring 72,293 bp and 69,505 bp with G+C contents of 37.1% and 37.3%, respectively. We assessed the phylogenies inferred from single mt genes and complete sets of coding genes, which are referred to as "supergenes" (16 concatenated coding genes), using Shimodaira-Hasegawa tests, in order to identify genes that best described AMF phylogeny. We found that the rnl, nad5, cox1, and nad2 genes, as well as a concatenated subset of these genes, provided phylogenies that were similar to the supergene set. This mitochondrial genomic analysis was also combined with principal coordinate and partitioning analyses, which helped to unravel certain evolutionary relationships in the Rhizophagus genus and for G. aggregatum within the Glomeromycota. We showed evidence to support the position of G. aggregatum within the R. irregularis 'species complex'. PMID:26868331

  9. Generation and analysis of end sequence database for T-DNA tagging lines in rice.

    PubMed

    An, Suyoung; Park, Sunhee; Jeong, Dong-Hoon; Lee, Dong-Yeon; Kang, Hong-Gyu; Yu, Jung-Hwa; Hur, Junghe; Kim, Sung-Ryul; Kim, Young-Hea; Lee, Miok; Han, Soonki; Kim, Soo-Jin; Yang, Jungwon; Kim, Eunjoo; Wi, Soo Jin; Chung, Hoo Sun; Hong, Jong-Pil; Choe, Vitnary; Lee, Hak-Kyung; Choi, Jung-Hee; Nam, Jongmin; Kim, Seong-Ryong; Park, Phun-Bum; Park, Ky Young; Kim, Woo Taek; Choe, Sunghwa; Lee, Chin-Bum; An, Gynheung

    2003-12-01

    We analyzed 6749 lines tagged by the gene trap vector pGA2707. This resulted in the isolation of 3793 genomic sequences flanking the T-DNA. Among the insertions, 1846 T-DNAs were integrated into genic regions, and 1864 were located in intergenic regions. Frequencies were also higher at the beginning and end of the coding regions and upstream near the ATG start codon. The overall GC content at the insertion sites was close to that measured from the entire rice (Oryza sativa) genome. Functional classification of these 1846 tagged genes showed a distribution similar to that observed for all the genes in the rice chromosomes. This indicates that T-DNA insertion is not biased toward a particular class of genes. There were 764, 327, and 346 T-DNA insertions in chromosomes 1, 4 and 10, respectively. Insertions were not evenly distributed; frequencies were higher at the ends of the chromosomes and lower near the centromere. At certain sites, the frequency was higher than in the surrounding regions. This sequence database will be valuable in identifying knockout mutants for elucidating gene function in rice. This resource is available to the scientific community at http://www.postech.ac.kr/life/pfg/risd. PMID:14630961

  10. Genomic DNA sequence of a rice gene coding for a pullulanase-type of starch debranching enzyme.

    PubMed

    Francisco, P B; Zhang, Y; Park, S Y; Ogata, N; Yamanouchi, H; Nakamura, Y

    1998-09-01

    A genomic DNA containing a rice (Oryza sativa L., cv. Norin-8) gene coding for a pullulanase-type starch debranching enzyme (EC 3.2.1.41) was sequenced (EMBL/GenBank/DDBJ accession number AB012915). Along the 15,248 bp DNA, the pullulanase gene is split into 26 exons. The four pullulanase consensus regions are positioned in the middle portion of the sequence and are separated by long introns and 1-3 exons. Comparison of the rice cv. Norin-8 pullulanase genomic structure with that of barley pullulanase (limit dextrinase) (F. Lok et al., EMBL/GenBank/DDBJ accession number AF022725) indicates that most of the pullulanase exons are highly conserved. Alignment of the nucleotide bases of rice exon 8 with those of barley exon 8-intron 8-exon 9 fragment suggests that the 85 bp internal sequence of rice exon 8 was originally an intron, a possibility further indicated by the absence in barley and spinach (A. Renz et al., EMBL/GenBank/DDBJ accession number X83969) pullulanases of amino acid residues encoded by the 85 bp fragment. PMID:9748665

  11. DNA Damage-Induced Transcription of Transposable Elements and Long Non-coding RNAs in Arabidopsis Is Rare and ATM-Dependent.

    PubMed

    Wang, Zhenxing; Schwacke, Rainer; Kunze, Reinhard

    2016-08-01

    Induction and mobilization of transposable elements (TEs) following DNA damage or other stresses has been reported in prokaryotes and eukaryotes. Recently it was discovered that eukaryotic TEs are frequently associated with long non-coding RNAs (lncRNAs), many of which are also upregulated by stress. Yet, it is unknown whether DNA damage-induced transcriptional activation of TEs and lncRNAs occurs sporadically or is a synchronized, genome-wide response. Here we investigated the transcriptome of Arabidopsis wild-type (WT) and ataxia telangiectasia mutated (atm) mutant plants 3 h after induction of DNA damage. In WT, expression of 5.2% of the protein-coding genes is ≥2-fold changed, whereas in atm plants, only 2.6% of these genes are regulated, and the response of genes associated with DNA repair, replication, and cell cycle is largely lost. In contrast, only less than 0.6% of TEs and lncRNAs respond to DNA damage in WT plants, and the regulation of ≥95% of them is ATM-dependent. The ATM-downstream factors BRCA1, DRM1, JMJ30, AGO2, and the ATM-independent AGO4 participate in the regulation of individual TEs and lncRNAs. Remarkably, protein-coding genes located adjacent to DNA damage-responsive TEs and lncRNAs are frequently coexpressed, which is consistent with the hypothesis that TEs and lncRNAs located close to genes commonly function as controlling elements. PMID:27150037

  12. First approximation of a stereochemical rationale for the genetic code based on the topography and physicochemical properties of "cavities" constructed from models of DNA.

    PubMed Central

    Hendry, L B; Bransome, E D; Hutson, M S; Campbell, L K

    1981-01-01

    To examine the question of whether or not the genetic code has a stereochemical basis, we used artificial constructs of the topography and physicochemical features of unique "cavities" formed by removal of the second codon base in B-DNA. The effects of base changes on the stereochemistry of the cavities are consistent with the pattern of the genetic code. Fits into the cavities of the side chains of the 20 L amino acids involved in protein synthesis can be demonstrated by using conventional physicochemical principles of hydrogen bonding and steric constraints. The specificity of the fits is remarkably consistent with the genetic code. PMID:6950386

  13. Identification of internal transcribed spacer sequence motifs in truffles: a first step toward their DNA bar coding.

    PubMed

    El Karkouri, Khalid; Murat, Claude; Zampieri, Elisa; Bonfante, Paola

    2007-08-01

    This work presents DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat unit which are useful for the identification of five European and Asiatic truffles (Tuber magnatum, T. melanosporum, T. indicum, T. aestivum, and T. mesentericum). Truffles are edible mycorrhizal ascomycetes that show similar morphological characteristics but that have distinct organoleptic and economic values. A total of 36 out of 46 ITS1 or ITS2 sequence motifs have allowed an accurate in silico distinction of the five truffles to be made (i.e., by pattern matching and/or BLAST analysis on downloaded GenBank sequences and directly against GenBank databases). The motifs considered the intraspecific genetic variability of each species, including rare haplotypes, and assigned their respective species from either the ascocarps or ectomycorrhizas. The data indicate that short ITS1 or ITS2 motifs (≤50 bp in size) can be considered promising tools for truffle species identification. A dot blot hybridization analysis of T. magnatum and T. melanosporum compared with other close relatives or distant lineages allowed at least one highly specific motif to be identified for each species. These results were confirmed in a blind test which included new field isolates. The current work has provided a reliable new tool for a truffle oligonucleotide bar code and identification in ecological and evolutionary studies. PMID:17601808
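
The in silico pattern-matching step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the motif strings and the `identify_species` helper are hypothetical placeholders, and the published ITS1/ITS2 motifs are not reproduced here.

```python
# Hypothetical motifs for illustration only; these are NOT the published ITS motifs.
SPECIES_MOTIFS = {
    "Tuber magnatum": ["ACGTTGCA", "GGATCCTA"],
    "Tuber melanosporum": ["TTGACCGA"],
}


def identify_species(sequence, motifs=SPECIES_MOTIFS):
    """Return the sorted names of species with at least one motif in the sequence."""
    seq = sequence.upper()
    hits = []
    for species, patterns in motifs.items():
        # Exact substring matching; a real screen might also allow mismatches
        # or search the reverse complement.
        if any(pattern in seq for pattern in patterns):
            hits.append(species)
    return sorted(hits)
```

A sequence that matches motifs from more than one species would return several names, which is one reason species-specific motifs matter for this kind of screen.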

  14. A pathogenic non-coding RNA induces changes in dynamic DNA methylation of ribosomal RNA genes in host plants.

    PubMed

    Martinez, German; Castellano, Mayte; Tortosa, Maria; Pallas, Vicente; Gomez, Gustavo

    2014-02-01

    Viroids are plant-pathogenic non-coding RNAs able to interfere with as yet poorly known host-regulatory pathways and to cause alterations recognized as diseases. The way in which these RNAs coerce the host to express symptoms remains to be totally deciphered. In recent years, diverse studies have proposed a close interplay between viroid-induced pathogenesis and RNA silencing, supporting the belief that viroid-derived small RNAs mediate the post-transcriptional cleavage of endogenous mRNAs by acting as elicitors of symptoms expression. Although the evidence supporting the role of viroid-derived small RNAs in pathogenesis is robust, the possibility that this phenomenon can be a more complex process, also involving viroid-induced alterations in plant gene expression at transcriptional levels, has been considered. Here we show that plants infected with the 'Hop stunt viroid' accumulate high levels of sRNAs derived from ribosomal transcripts. This effect was correlated with an increase in the transcription of ribosomal RNA (rRNA) precursors during infection. We observed that the transcriptional reactivation of rRNA genes correlates with a modification of DNA methylation in their promoter region and revealed that some rRNA genes are demethylated and transcriptionally reactivated during infection. This study reports a previously unknown mechanism associated with viroid (or any other pathogenic RNA) infection in plants providing new insights into aspects of host alterations induced by the viroid infectious cycle. PMID:24178032

  15. cDNA sequence coding for the alpha'-chain of the third complement component in the African lungfish.

    PubMed

    Sato, A; Sültmann, H; Mayer, W E; Figueroa, F; Tichy, H; Klein, J

    1999-04-01

    cDNA clones coding for almost the entire C3 alpha-chain of the African lungfish (Protopterus aethiopicus), a representative of the Sarcopterygii (lobe-finned fishes), were sequenced and characterized. From the sequence it is deduced that the lungfish C3 molecule is probably a disulphide-bonded alpha:beta dimer similar to that of the C3 components of other jawed vertebrates. The deduced sequence contains conserved sites presumably recognized by proteolytic enzymes (e.g. factor I) involved in the activation and inactivation of the component. It also contains the conserved thioester region and the putative site for binding properdin. However, the site for the interaction with complement receptor 2 and factor H are poorly conserved. Either complement receptor 2 and factor H are not present in the lungfish or they bind to different residues at the same or a different site than mammalian complement receptor 2 and factor H. The C3 alpha-chain sequences faithfully reflect the phylogenetic relationships among vertebrate classes and can therefore be used to help to resolve the long-standing controversy concerning the origin of the tetrapods. PMID:10219761

  16. Genome defense against exogenous nucleic acids in eukaryotes by non-coding DNA occurs through CRISPR-like mechanisms in the cytosol and the bodyguard protection in the nucleus.

    PubMed

    Qiu, Guo-Hua

    2016-01-01

    In this review, the protective function of the abundant non-coding DNA in the eukaryotic genome is discussed from the perspective of genome defense against exogenous nucleic acids. Peripheral non-coding DNA has been proposed to act as a bodyguard that protects the genome and the central protein-coding sequences from ionizing radiation-induced DNA damage. In the proposed mechanism of protection, the radicals generated by water radiolysis in the cytosol and IR energy are absorbed, blocked and/or reduced by peripheral heterochromatin; then, the DNA damage sites in the heterochromatin are removed and expelled from the nucleus to the cytoplasm through nuclear pore complexes, most likely through the formation of extrachromosomal circular DNA. To strengthen this hypothesis, this review summarizes the experimental evidence supporting the protective function of non-coding DNA against exogenous nucleic acids. Based on these data, I hypothesize herein about the presence of an additional line of defense formed by small RNAs in the cytosol in addition to their bodyguard protection mechanism in the nucleus. Therefore, exogenous nucleic acids may be initially inactivated in the cytosol by small RNAs generated from non-coding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. Exogenous nucleic acids may enter the nucleus, where some are absorbed and/or blocked by heterochromatin and others integrate into chromosomes. The integrated fragments and the sites of DNA damage are removed by repetitive non-coding DNA elements in the heterochromatin and excluded from the nucleus. Therefore, the normal eukaryotic genome and the central protein-coding sequences are triply protected by non-coding DNA against invasion by exogenous nucleic acids. This review provides evidence supporting the protective role of non-coding DNA in genome defense. PMID:27036064

  17. Systematic comparison of gene expression through analysis of cDNA fragments within or near to the protein-coding region.

    PubMed

    Ke, Y; Jing, C; Rudland, P S; Smith, P H; Foster, C S

    1999-02-01

    Life is controlled by the timely and ordered expression of genes. Identification of important genes involved in specific physiological and pathological conditions requires efficient methods to analyse differential gene expression. We describe a novel strategy, namely complete comparison of gene expression (CCGE), for a systematic assessment of differentially expressed genes. In the CCGE method, double-stranded cDNA is digested with two restriction enzymes that cut with different frequencies; the representative cDNA fragments are generated within or near the protein-coding region. After being flanked by two different types of adapters and amplified by nested suppression PCR, the selected cDNA fragments, representing the entire cDNA population, can be divided into 256 subsets, then amplified and compared in a systematic manner. PMID:9889292

  18. Isolation and characterization of a cDNA clone for the complete protein coding region of the delta subunit of the mouse acetylcholine receptor.

    PubMed Central

    LaPolla, R J; Mayne, K M; Davidson, N

    1984-01-01

    A mouse cDNA clone has been isolated that contains the complete coding region of a protein highly homologous to the delta subunit of the Torpedo acetylcholine receptor (AcChoR). The cDNA library was constructed in the vector lambda 10 from membrane-associated poly(A)+ RNA from BC3H-1 mouse cells. Surprisingly, the delta clone was selected by hybridization with cDNA encoding the gamma subunit of the Torpedo AcChoR. The nucleotide sequence of the mouse cDNA clone contains an open reading frame of 520 amino acids. This amino acid sequence exhibits 59% and 50% sequence homology to the Torpedo AcChoR delta and gamma subunits, respectively. However, the mouse nucleotide sequence has several stretches of high homology with the Torpedo gamma subunit cDNA, but not with delta. The mouse protein has the same general structural features as do the Torpedo subunits. It is encoded by a 3.3-kilobase mRNA. There is probably only one, but at most two, chromosomal genes coding for this or closely related sequences. PMID:6096870

  19. DNA.

    ERIC Educational Resources Information Center

    Felsenfeld, Gary

    1985-01-01

    Structural form, bonding scheme, and chromatin structure of and gene-modification experiments with deoxyribonucleic acid (DNA) are described. Indicates that DNA's double helix is variable and also flexible as it interacts with regulatory and other molecules to transfer hereditary messages. (DH)

  20. New Insights into the Lake Chad Basin Population Structure Revealed by High-Throughput Genotyping of Mitochondrial DNA Coding SNPs

    PubMed Central

    Černý, Viktor; Carracedo, Ángel

    2011-01-01

    Background Located in the Sudan belt, the Chad Basin forms a remarkable ecosystem, where several unique agricultural and pastoral techniques have been developed. Both from an archaeological and a genetic point of view, this region has been interpreted to be the center of a bidirectional corridor connecting West and East Africa, as well as a meeting point for populations coming from North Africa through the Saharan desert. Methodology/Principal Findings Samples from twelve ethnic groups from the Chad Basin (n = 542) have been high-throughput genotyped for 230 coding region mitochondrial DNA (mtDNA) Single Nucleotide Polymorphisms (mtSNPs) using Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight (MALDI-TOF) mass spectrometry. This set of mtSNPs allowed for much better phylogenetic resolution than previous studies of this geographic region, enabling new insights into its population history. Notable haplogroup (hg) heterogeneity has been observed in the Chad Basin mirroring the different demographic histories of these ethnic groups. As estimated using a Bayesian framework, nomadic populations showed negative growth which was not always correlated to their estimated effective population sizes. Nomads also showed lower diversity values than sedentary groups. Conclusions/Significance Compared to sedentary population, nomads showed signals of stronger genetic drift occurring in their ancestral populations. These populations, however, retained more haplotype diversity in their hypervariable segments I (HVS-I), but not their mtSNPs, suggesting a more ancestral ethnogenesis. Whereas the nomadic population showed a higher Mediterranean influence signaled mainly by sub-lineages of M1, R0, U6, and U5, the other populations showed a more consistent sub-Saharan pattern. Although lifestyle may have an influence on diversity patterns and hg composition, analysis of molecular variance has not identified these differences. The present study indicates that analysis of mt

  1. DNA-LCEB: a high-capacity and mutation-resistant DNA data-hiding approach by employing encryption, error correcting codes, and hybrid twofold and fourfold codon-based strategy for synonymous substitution in amino acids.

    PubMed

    Hafeez, Ibbad; Khan, Asifullah; Qadir, Abdul

    2014-11-01

    Data-hiding in deoxyribonucleic acid (DNA) sequences can be used to develop an organic memory and to track parent genes in offspring as well as in genetically modified organisms. However, the main concerns regarding data-hiding in DNA sequences are the survival of the organism and successful extraction of the watermark from DNA. This implies that the organism should live and reproduce without any functional disorder even in the presence of the embedded data. Consequently, performing synonymous substitution in amino acids for watermarking becomes a primary option. In this regard, a hybrid watermark embedding strategy that employs synonymous substitution in both twofold and fourfold codons of amino acids is proposed. This work thus presents a high-capacity and mutation-resistant watermarking technique, DNA-LCEB, for hiding secret information in DNA of living organisms. By employing the different types of synonymous codons of amino acids, the data storage capacity has been significantly increased. It is further observed that the proposed DNA-LCEB employing a combination of synonymous substitution, lossless compression, encryption, and Bose-Chaudhuri-Hocquenghem coding is secure and performs better in terms of both capacity and robustness compared to existing DNA data-hiding schemes. The proposed DNA-LCEB is tested against different mutations, including silent, missense, and nonsense mutations, and provides substantial improvement in terms of mutation detection/correction rate and bits per nucleotide. A web application for DNA-LCEB is available at http://111.68.99.218/DNA-LCEB. PMID:25195035
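
The core idea of hiding data by synonymous substitution can be illustrated with a deliberately simplified sketch. It is not the DNA-LCEB algorithm: it handles only fourfold-degenerate codons of the standard genetic code, omits the twofold codons, compression, encryption, and BCH coding mentioned in the abstract, and the names `embed` and `extract` and the 2-bit base mapping are assumptions made for illustration.

```python
# Two-base prefixes whose third position is fourfold degenerate in the
# standard genetic code (Ala, Gly, Pro, Thr, Val, Leu, Arg, Ser).
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "CG", "TC"}

# Assumed mapping of third-position bases to 2-bit symbols (illustrative choice).
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def embed(cds, bits):
    """Write two payload bits into the third base of each fourfold codon."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if len(bits) % 2 != 0:
        raise ValueError("payload must contain an even number of bits")
    out, pos = [], 0
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        if codon[:2] in FOURFOLD_PREFIXES and pos < len(bits):
            # Changing only the third base keeps the encoded amino acid.
            codon = codon[:2] + BITS_TO_BASE[bits[pos:pos + 2]]
            pos += 2
        out.append(codon)
    if pos < len(bits):
        raise ValueError("not enough fourfold codons to hold the payload")
    return "".join(out)


def extract(cds, n_bits):
    """Read back the first n_bits payload bits from the fourfold codons."""
    bits = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        if codon[:2] in FOURFOLD_PREFIXES and 2 * len(bits) < n_bits:
            bits.append(BASE_TO_BITS[codon[2]])
    return "".join(bits)
```

For example, `embed("ATGGCTGGAAAACCGTAA", "011000")` rewrites only the third bases of the GCT, GGA, and CCG codons, so the translated protein is unchanged, and `extract` on the result with `n_bits=6` recovers the payload.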

  2. Detecting selection in the blue crab, Callinectes sapidus, using DNA sequence data from multiple nuclear protein-coding genes.

    PubMed

    Yednock, Bree K; Neigel, Joseph E

    2014-01-01

    The identification of genes involved in the adaptive evolution of non-model organisms with uncharacterized genomes constitutes a major challenge. This study employed a rigorous and targeted candidate gene approach to test for positive selection on protein-coding genes of the blue crab, Callinectes sapidus. Four genes with putative roles in physiological adaptation to environmental stress were chosen as candidates. A fifth gene not expected to play a role in environmental adaptation was used as a control. Large samples (n>800) of DNA sequences from C. sapidus were used in tests of selective neutrality based on sequence polymorphisms. In combination with these, sequences from the congener C. similis were used in neutrality tests based on interspecific divergence. In multiple tests, significant departures from neutral expectations, indicative of positive selection, were found for the candidate gene trehalose 6-phosphate synthase (tps). These departures could not be explained by any of the historical population expansion or bottleneck scenarios that were evaluated in coalescent simulations. Evidence was also found for balancing selection at ATP-synthase subunit 9 (atps) using a maximum likelihood version of the Hudson, Kreitman, and Aguadé test, and positive selection favoring amino acid replacements within ATP/ADP translocase (ant) was detected using the McDonald-Kreitman test. In contrast, test statistics for the control gene, ribosomal protein L12 (rpl), which presumably has experienced the same demographic effects as the candidate loci, were not significantly different from neutral expectations and could readily be explained by demographic effects. Together, these findings demonstrate the utility of the candidate gene approach for investigating adaptation at the molecular level in a marine invertebrate for which extensive genomic resources are not available. PMID:24896825
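
The McDonald-Kreitman test mentioned in this abstract compares nonsynonymous and synonymous changes that are fixed between species (Dn, Ds) with those that are polymorphic within a species (Pn, Ps). The sketch below is a generic illustration using only the Python standard library; the counts in the usage note are illustrative and are not taken from the blue crab study, and the function names are assumptions.

```python
from math import comb


def neutrality_index(dn, ds, pn, ps):
    """NI = (Pn/Ps) / (Dn/Ds); values below 1 suggest excess nonsynonymous divergence."""
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
```

With illustrative counts Dn = 7, Ds = 17, Pn = 2, Ps = 42, `neutrality_index(7, 17, 2, 42)` is well below 1 and `fisher_exact_two_sided(7, 17, 2, 42)` falls below the conventional 0.05 threshold, the pattern usually read as evidence for positive selection on amino acid replacements.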

  3. Gene control in eukaryotes and the c-value paradox "excess" DNA as an impediment to transcription of coding sequences.

    PubMed

    Zuckerkandl, E

    1976-12-31

    Ways in which control of gene activity may lead to the observed high DNA content per haploid eukaryote genome are examined. It is proposed that deoxyribonucleoprotein (DNP) acts as a barrier to transcription at two distinct structural levels. At the lower level, melting of the nucleosome supercoil (quaternary structure) and of the nucleosomes (tertiary structure) might be brought about by the process of transcription itself. After unwinding the barrier section, the polymerase would eventually reach the structural gene. The transcripts of noncoding sequences, at least as far as their "unique" sequence components are concerned, may thus have filled their main function through the very process of transcription. The possibility of an inverse relationship between the length of the DNP barrier and the rates of transcription of the coding sequences is to some extent supported by available data. Different modes of coordination between the transcription of mRNA and of hnRNA from a single functional unit of gene action (funga) are considered. An analysis of gene control at high structural levels of DNP is made on the basis of other data, in relation to the concepts of eurygenic and stenogenic control. The concept of a euryon is introduced, namely of a set of linked fungas under common eurygenic control. Structure of order higher than quaternary can be inferred to exist in larger chromomeres of polytene chromosomes and in corresponding sections of ordinary chromosomes. Only moderate amounts of highest order interphase euchromatic structure are likely to be able to be accommodated in average chromomeres and none in very thin chromomeres. Puffs are interpreted as the melting of highest order interphase structure, and the absence of puffs during transcription as the absence of this highest order structure in the resting state of the chromomeres. Genes that are constantly active in all tissues may dispense with highest order interphase structure and with the corresponding control

  4. Screening for Functional Non-coding Genetic Variants Using Electrophoretic Mobility Shift Assay (EMSA) and DNA-affinity Precipitation Assay (DAPA).

    PubMed

    Miller, Daniel E; Patel, Zubin H; Lu, Xiaoming; Lynch, Arthur T; Weirauch, Matthew T; Kottyan, Leah C

    2016-01-01

    Population and family-based genetic studies typically result in the identification of genetic variants that are statistically associated with a clinical disease or phenotype. For many diseases and traits, most variants are non-coding, and are thus likely to act by impacting subtle, comparatively hard to predict mechanisms controlling gene expression. Here, we describe a general strategic approach to prioritize non-coding variants, and screen them for their function. This approach involves computational prioritization using functional genomic databases followed by experimental analysis of differential binding of transcription factors (TFs) to risk and non-risk alleles. For both electrophoretic mobility shift assay (EMSA) and DNA affinity precipitation assay (DAPA) analysis of genetic variants, a synthetic DNA oligonucleotide (oligo) is used to identify factors in the nuclear lysate of disease or phenotype-relevant cells. For EMSA, the oligonucleotides with or without bound nuclear factors (often TFs) are analyzed by non-denaturing electrophoresis on a tris-borate-EDTA (TBE) polyacrylamide gel. For DAPA, the oligonucleotides are bound to a magnetic column and the nuclear factors that specifically bind the DNA sequence are eluted and analyzed through mass spectrometry or with a reducing sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) followed by Western blot analysis. This general approach can be widely used to study the function of non-coding genetic variants associated with any disease, trait, or phenotype. PMID:27585267

  5. H3.3 demarcates GC-rich coding and subtelomeric regions and serves as potential memory mark for virulence gene expression in Plasmodium falciparum.

    PubMed

    Fraschka, Sabine Anne-Kristin; Henderson, Rob Wilhelmus Maria; Bártfai, Richárd

    2016-01-01

    Histones, by packaging and organizing the DNA into chromatin, serve as essential building blocks for eukaryotic life. The basic structure of the chromatin is established by four canonical histones (H2A, H2B, H3 and H4), while histone variants are more commonly utilized to alter the properties of specific chromatin domains. H3.3, a variant of histone H3, was found to have diverse localization patterns and functions across species but has been rather poorly studied in protists. Here we present the first genome-wide analysis of H3.3 in the malaria-causing, apicomplexan parasite, P. falciparum, which revealed a complex occupancy profile consisting of conserved and parasite-specific features. In contrast to other histone variants, PfH3.3 primarily demarcates euchromatic coding and subtelomeric repetitive sequences. Stable occupancy of PfH3.3 in these regions is largely uncoupled from the transcriptional activity and appears to be primarily dependent on the GC-content of the underlying DNA. Importantly, PfH3.3 specifically marks the promoter region of an active and poised, but not inactive antigenic variation (var) gene, thereby potentially contributing to immune evasion. Collectively, our data suggest that PfH3.3, together with other histone variants, indexes the P. falciparum genome to functionally distinct domains and contributes to a key survival strategy of this deadly pathogen. PMID:27555062
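
Since this record relates histone occupancy to the GC-content of the underlying DNA, a minimal sketch of how GC-content is commonly computed along a sequence may help. It is a generic illustration and not the analysis used in the study; the function names, window size, and step are assumptions.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T); 0.0 if none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def windowed_gc(seq, window, step):
    """GC-content in consecutive windows, returned as (start, fraction) pairs."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Profiles like the one returned by `windowed_gc` could then be compared with an occupancy track over the same coordinates, for instance by correlation.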

  6. H3.3 demarcates GC-rich coding and subtelomeric regions and serves as potential memory mark for virulence gene expression in Plasmodium falciparum

    PubMed Central

    Fraschka, Sabine Anne-Kristin; Henderson, Rob Wilhelmus Maria; Bártfai, Richárd

    2016-01-01

    Histones, by packaging and organizing the DNA into chromatin, serve as essential building blocks for eukaryotic life. The basic structure of the chromatin is established by four canonical histones (H2A, H2B, H3 and H4), while histone variants are more commonly utilized to alter the properties of specific chromatin domains. H3.3, a variant of histone H3, was found to have diverse localization patterns and functions across species but has been rather poorly studied in protists. Here we present the first genome-wide analysis of H3.3 in the malaria-causing, apicomplexan parasite, P. falciparum, which revealed a complex occupancy profile consisting of conserved and parasite-specific features. In contrast to other histone variants, PfH3.3 primarily demarcates euchromatic coding and subtelomeric repetitive sequences. Stable occupancy of PfH3.3 in these regions is largely uncoupled from the transcriptional activity and appears to be primarily dependent on the GC-content of the underlying DNA. Importantly, PfH3.3 specifically marks the promoter region of an active and poised, but not inactive antigenic variation (var) gene, thereby potentially contributing to immune evasion. Collectively, our data suggest that PfH3.3, together with other histone variants, indexes the P. falciparum genome to functionally distinct domains and contributes to a key survival strategy of this deadly pathogen. PMID:27555062

  7. Reduced-Median-Network Analysis of Complete Mitochondrial DNA Coding-Region Sequences for the Major African, Asian, and European Haplogroups

    PubMed Central

    Herrnstadt, Corinna; Elson, Joanna L.; Fahy, Eoin; Preston, Gwen; Turnbull, Douglass M.; Anderson, Christen; Ghosh, Soumitra S.; Olefsky, Jerrold M.; Beal, M. Flint; Davis, Robert E.; Howell, Neil

    2002-01-01

    The evolution of the human mitochondrial genome is characterized by the emergence of ethnically distinct lineages or haplogroups. Nine European, seven Asian (including Native American), and three African mitochondrial DNA (mtDNA) haplogroups have been identified previously on the basis of the presence or absence of a relatively small number of restriction-enzyme recognition sites or on the basis of nucleotide sequences of the D-loop region. We have used reduced-median-network approaches to analyze 560 complete European, Asian, and African mtDNA coding-region sequences from unrelated individuals to develop a more complete understanding of sequence diversity both within and between haplogroups. A total of 497 haplogroup-associated polymorphisms were identified, 323 (65%) of which were associated with one haplogroup and 174 (35%) of which were associated with two or more haplogroups. Approximately one-half of these polymorphisms are reported for the first time here. Our results confirm and substantially extend the phylogenetic relationships among mitochondrial genomes described elsewhere from the major human ethnic groups. Another important result is that there were numerous instances both of parallel mutations at the same site and of reversion (i.e., homoplasy). It is likely that homoplasy in the coding region will confound evolutionary analysis of small sequence sets. By a linkage-disequilibrium approach, additional evidence for the absence of human mtDNA recombination is presented here. PMID:11938495

  8. Massively parallel sequencing of the entire control region and targeted coding region SNPs of degraded mtDNA using a simplified library preparation method.

    PubMed

    Lee, Eun Young; Lee, Hwan Young; Oh, Se Yoon; Jung, Sang-Eun; Yang, In Seok; Lee, Yang-Han; Yang, Woo Ick; Shin, Kyoung-Jin

    2016-05-01

    The application of next-generation sequencing (NGS) to forensic genetics is being explored by an increasing number of laboratories because of the potential of high-throughput sequencing for recovering genetic information from multiple markers and multiple individuals in a single run. A cumbersome and technically challenging library construction process is required for NGS. In this study, we propose a simplified library preparation method for mitochondrial DNA (mtDNA) analysis that involves two rounds of PCR amplification. In the first round of multiplex PCR, six fragments covering the entire mtDNA control region and 22 fragments covering interspersed single nucleotide polymorphisms (SNPs) in the coding region that can be used to determine global haplogroups and East Asian haplogroups were amplified using template-specific primers with read sequences. In the following step, indices and platform-specific sequences for the MiSeq® system (Illumina) were added by PCR. The barcoded library produced using this simplified workflow was successfully sequenced on the MiSeq system using the MiSeq Reagent Nano Kit v2. A total of 0.4 GB of sequences, 80.6% with base quality of >Q30, were obtained from 12 degraded DNA samples and mapped to the revised Cambridge Reference Sequence (rCRS). A relatively even read count was obtained for all amplicons, with an average coverage of 5200× and a less than three-fold read count difference between amplicons per sample. Control region sequences were successfully determined, and all samples were assigned to the relevant haplogroups. In addition, enhanced discrimination was observed by adding coding region SNPs to the control region in in silico analysis. Because the developed multiplex PCR system amplifies small-sized amplicons (<250 bp), NGS analysis using the library preparation method described here allows mtDNA analysis using highly degraded DNA samples. PMID:26844917

  9. A sandwich-hybridization assay for simultaneous determination of HIV and tuberculosis DNA targets based on signal amplification by quantum dots-PowerVision™ polymer coding nanotracers.

    PubMed

    Yan, Zhongdan; Gan, Ning; Zhang, Huairong; Wang, De; Qiao, Li; Cao, Yuting; Li, Tianhua; Hu, Futao

    2015-09-15

    A novel sandwich-hybridization assay for simultaneous electrochemical detection of multiple DNA targets related to human immunodeficiency virus (HIV) and tuberculosis (TB) was developed based on different quantum dots-PowerVision™ polymer nanotracers. The polymer nanotracers were fabricated by immobilizing SH-labeled oligonucleotides (s-HIV or s-TB), which can partially hybridize with the respective target DNA (HIV or TB), on gold nanoparticles (Au NPs) and then modifying them with PowerVision™ (PV) polymer-encapsulated quantum dots (CdS or PbS) as signal tags. PV is a dendrimer enzyme-linked polymer, which can immobilize abundant QDs to amplify the stripping voltammetry signals from the metal ions (Pb or Cd). The capture probes were prepared by immobilizing SH-labeled oligonucleotides, which are complementary to HIV and TB DNA, on magnetic Fe3O4@Au beads (GMPs). After sandwich hybridization, the polymer nanotracers together with the HIV and TB DNA targets were simultaneously introduced onto the surface of the GMPs. The two encoding metal ions (Cd²⁺ and Pb²⁺) were then used to differentiate the two DNA targets by their distinct anodic stripping voltammetric peaks at -0.84 V (Cd) and -0.61 V (Pb). Because of the excellent signal amplification of the polymer nanotracers and the great specificity of the DNA targets, this assay could detect target DNA at concentrations as low as 0.2 femtomolar and exhibited excellent selectivity, with a dynamic range from 0.5 fM to 500 pM. These results demonstrate that this electrochemical coding assay has great potential for screening additional DNA targets by changing the probes. PMID:25911447

  10. The Use and Effectiveness of Triple Multiplex System for Coding Region Single Nucleotide Polymorphism in Mitochondrial DNA Typing of Archaeologically Obtained Human Skeletons from Premodern Joseon Tombs of Korea

    PubMed Central

    Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon

    2015-01-01

    A previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of coding region SNP markers and control region mutation motifs. In this study, we tested whether the same triple multiplex analysis of coding region SNPs is also applicable to ancient samples from East Asia as a complement to sequence analysis of the mtDNA control region. In the Joseon skeleton samples, the mtDNA haplogroup determined by coding region SNP markers fell within the same haplogroup assigned by sequence analysis of the control region. Considering that control region mtDNA sequencing of ancient samples in previous studies has produced a considerable number of errors, coding region SNP analysis can serve as a good complement to conventional haplogroup determination, especially for archaeological human bone samples buried underground over long periods. PMID:26345190

  11. DNA

    ERIC Educational Resources Information Center

    Stent, Gunther S.

    1970-01-01

    This history for molecular genetics and its explanation of DNA begins with an analysis of the Golden Jubilee essay papers, 1955. The paper ends stating that the higher nervous system is the one major frontier of biological inquiry which still offers some romance of research. (Author/VW)

  12. Genetic code evolution reveals the neutral emergence of mutational robustness, and information as an evolutionary constraint.

    PubMed

    Massey, Steven E

    2015-01-01

    The standard genetic code (SGC) is central to molecular biology and its origin and evolution is a fundamental problem in evolutionary biology, the elucidation of which promises to reveal much about the origins of life. In addition, we propose that study of its origin can also reveal some fundamental and generalizable insights into mechanisms of molecular evolution, utilizing concepts from complexity theory. The first is that beneficial traits may arise by non-adaptive processes, via a process of "neutral emergence". The structure of the SGC is optimized for the property of error minimization, which reduces the deleterious impact of point mutations. Via simulation, it can be shown that genetic codes with error minimization superior to the SGC can emerge in a neutral fashion simply by a process of genetic code expansion via tRNA and aminoacyl-tRNA synthetase duplication, whereby similar amino acids are added to codons related to that of the parent amino acid. This process of neutral emergence has implications beyond that of the genetic code, as it suggests that not all beneficial traits have arisen by the direct action of natural selection; we term these "pseudaptations", and discuss a range of potential examples. Secondly, consideration of genetic code deviations (codon reassignments) reveals that these are mostly associated with a reduction in proteome size. This code malleability implies the existence of a proteomic constraint on the genetic code, proportional to the size of the proteome (P), and that its reduction in size leads to an "unfreezing" of the codon - amino acid mapping that defines the genetic code, consistent with Crick's Frozen Accident theory. The concept of a proteomic constraint may be extended to propose a general informational constraint on genetic fidelity, which may be used to explain variously, differences in mutation rates in genomes with differing proteome sizes, differences in DNA repair capacity and genome GC content between organisms, a

  13. Genetic Code Evolution Reveals the Neutral Emergence of Mutational Robustness, and Information as an Evolutionary Constraint

    PubMed Central

    Massey, Steven E.

    2015-01-01

    The standard genetic code (SGC) is central to molecular biology and its origin and evolution is a fundamental problem in evolutionary biology, the elucidation of which promises to reveal much about the origins of life. In addition, we propose that study of its origin can also reveal some fundamental and generalizable insights into mechanisms of molecular evolution, utilizing concepts from complexity theory. The first is that beneficial traits may arise by non-adaptive processes, via a process of “neutral emergence”. The structure of the SGC is optimized for the property of error minimization, which reduces the deleterious impact of point mutations. Via simulation, it can be shown that genetic codes with error minimization superior to the SGC can emerge in a neutral fashion simply by a process of genetic code expansion via tRNA and aminoacyl-tRNA synthetase duplication, whereby similar amino acids are added to codons related to that of the parent amino acid. This process of neutral emergence has implications beyond that of the genetic code, as it suggests that not all beneficial traits have arisen by the direct action of natural selection; we term these “pseudaptations”, and discuss a range of potential examples. Secondly, consideration of genetic code deviations (codon reassignments) reveals that these are mostly associated with a reduction in proteome size. This code malleability implies the existence of a proteomic constraint on the genetic code, proportional to the size of the proteome (P), and that its reduction in size leads to an “unfreezing” of the codon – amino acid mapping that defines the genetic code, consistent with Crick’s Frozen Accident theory. The concept of a proteomic constraint may be extended to propose a general informational constraint on genetic fidelity, which may be used to explain variously, differences in mutation rates in genomes with differing proteome sizes, differences in DNA repair capacity and genome GC content

  14. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-01-01

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  15. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-02-20

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  16. Salamander Hox clusters contain repetitive DNA and expanded non-coding regions: a typical Hox structure for non-mammalian tetrapod vertebrates?

    PubMed Central

    2013-01-01

    Hox genes encode transcription factors that regulate embryonic and post-embryonic developmental processes. The expression of Hox genes is regulated in part by the tight, spatial arrangement of conserved coding and non-coding sequences. The potential for evolutionary changes in Hox cluster structure is thought to be low among vertebrates; however, recent studies of a few non-mammalian taxa suggest greater variation than originally thought. Using next generation sequencing of large genomic fragments (>100 kb) from the red spotted newt (Notophthalamus viridescens), we found that the arrangement of Hox cluster genes was conserved relative to orthologous regions from other vertebrates, but the length of introns and intergenic regions varied. In particular, the distance between hoxd13 and hoxd11 is longer in newt than orthologous regions from vertebrate species with expanded Hox clusters and is predicted to exceed the length of the entire HoxD clusters (hoxd13–hoxd4) of humans, mice, and frogs. Many repetitive DNA sequences were identified for newt Hox clusters, including an enrichment of DNA transposon-like sequences relative to non-coding genomic fragments. Our results suggest that Hox cluster expansion and transposon accumulation are common features of non-mammalian tetrapod vertebrates. PMID:23561734

  17. Undetectable levels of N6-methyl adenine in mouse DNA: Cloning and analysis of PRED28, a gene coding for a putative mammalian DNA adenine methyltransferase.

    PubMed

    Ratel, David; Ravanat, Jean-Luc; Charles, Marie-Pierre; Platet, Nadine; Breuillaud, Lionel; Lunardi, Joël; Berger, François; Wion, Didier

    2006-05-29

    Three methylated bases, 5-methylcytosine, N4-methylcytosine and N6-methyladenine (m6A), can be found in DNA. However, to date, only 5-methylcytosine has been detected in mammalian genomes. To reinvestigate the presence of m6A in mammalian DNA, we used a highly sensitive method capable of detecting one N6-methyldeoxyadenosine per million nucleosides. Our results suggest that the total mouse genome contains, if any, less than 10^3 m6A. Experiments were next performed on PRED28, a putative mammalian N6-DNA methyltransferase. The murine PRED28 encodes two alternatively spliced RNAs. However, although recombinant PRED28 proteins are found in the nucleus, no evidence for an adenine-methyltransferase activity was detected. PMID:16684535

  18. Cloning and sequence of a cDNA coding for the human beta-migrating endothelial-cell-type plasminogen activator inhibitor.

    PubMed Central

    Ny, T; Sawdey, M; Lawrence, D; Millan, J L; Loskutoff, D J

    1986-01-01

    A lambda gt11 expression library containing cDNA inserts prepared from human placental mRNA was screened immunologically using an antibody probe developed against the beta-migrating plasminogen activator inhibitor (beta-PAI) purified from cultured bovine aortic endothelial cells. Thirty-four positive clones were isolated after screening 7 × 10^5 phages. Three clones (lambda 1.2, lambda 3, and lambda 9.2) were randomly picked and further characterized. These contained inserts 1.9, 3.0, and 1.9 kilobases (kb) long, respectively. Escherichia coli lysogenic for lambda 9.2, but not for lambda gt11, produced a fusion protein of 180 kDa that was recognized by affinity-purified antibodies against the bovine aortic endothelial cell beta-PAI and had beta-PAI activity when analyzed by reverse fibrin autography. The largest cDNA insert was sequenced and shown to be 2944 base pairs (bp) long. It has a large 3' untranslated region [1788 bp, excluding the poly(A) tail] and contains the entire coding region of the mature protein but lacks the initiation codon and part of the signal peptide coding region at the 5' terminus. The two clones carrying the 1.9-kb cDNA inserts were partially sequenced and shown to be identical to the 3.0-kb cDNA except that they were truncated, lacking much of the 3' untranslated region. Blot hybridization analysis of electrophoretically fractionated RNA from the human fibrosarcoma cell line HT-1080 was performed using the 3.0-kb cDNA as hybridization probe. Two distinct transcripts, 2.2 and 3.0 kb, were detected, suggesting that the 1.9-kb cDNA may have been copied from the shorter RNA transcript. The amino acid sequence deduced from the cDNA was aligned with the NH2-terminal sequence of the human beta-PAI. Based on this alignment, the mature human beta-PAI is 379 amino acids long and contains an NH2-terminal valine. The deduced amino acid sequence has extensive (30%) homology with alpha 1-antitrypsin and antithrombin III, indicating that the beta

  19. Cloning and Stable Expression of cDNA Coding For Platelet Endothelial Cell Adhesion Molecule -1 (PECAM-1, CD31) in NIH-3T3 Cell Line

    PubMed Central

    Salehi-Lalemarzi, Hamed; Shanehbandi, Dariush; Shafaghat, Farzaneh; Abbasi-Kenarsari, Hajar; Baradaran, Behzad; Movassaghpour, Ali Akbar; Kazemi, Tohid

    2015-01-01

    Purpose: PECAM-1 (CD31) is a glycoprotein expressed on endothelial and bone marrow precursor cells. It plays important roles in angiogenesis, maintenance and integration of the cytoskeleton and direction of leukocytes to the site of inflammation. We aimed to clone the cDNA coding for human CD31 from KG1a for further subcloning and expression in NIH-3T3 mouse cell line. Methods: CD31 cDNA was cloned from KG1a cell line after total RNA extraction and cDNA synthesis. Pfu DNA polymerase-amplified specific band was ligated to pGEMT-easy vector and sub-cloned in pCMV6-Neo expression vector. After transfection of NIH-3T3 cells using 3 μg of recombinant construct and 6 μl of JetPEI transfection reagent, stable expression was obtained by selection of cells by G418 antibiotic and confirmed by surface flow cytometry. Results: 2235 bp specific band was aligned completely to human CD31 reference sequence in NCBI database. Transient and stable expression of human CD31 on transfected NIH-3T3 mouse fibroblast cells was achieved (23% and 96%, respectively) as shown by flow cytometry. Conclusion: Due to murine origin of NIH-3T3 cell line, CD31-expressing NIH-3T3 cells could be useful as immunogen in production of diagnostic monoclonal antibodies against human CD31, with no need for purification of recombinant proteins. PMID:26236664

  20. Cloning and expression of a cDNA coding for the human platelet-derived growth factor receptor: Evidence for more than one receptor class

    SciTech Connect

    Gronwald, R.G.K.; Grant, F.J.; Haldeman, B.A.; Hart, C.E.; O'Hara, P.J.; Hagen, F.S.; Ross, R.; Bowen-Pope, D.F.; Murray, M.J.

    1988-05-01

    The complete nucleotide sequence of a cDNA encoding the human platelet-derived growth factor (PDGF) receptor is presented. The cDNA contains an open reading frame that codes for a protein of 1106 amino acids. Comparison to the mouse PDGF receptor reveals an overall amino acid sequence identity of 86%. This sequence identity rises to 98% in the cytoplasmic split tyrosine kinase domain. RNA blot hybridization analysis of poly(A)+ RNA from human dermal fibroblasts detects a major and a minor transcript using the cDNA as a probe. Baby hamster kidney cells, transfected with an expression vector containing the receptor cDNA, express an ~190-kDa cell surface protein that is recognized by an anti-human PDGF receptor antibody. The recombinant PDGF receptor is functional in the transfected baby hamster kidney cells as demonstrated by ligand-induced phosphorylation of the receptor. Binding properties of the recombinant PDGF receptor were also assessed with pure preparations of BB and AB isoforms of PDGF. Unlike human dermal fibroblasts, which bind both isoforms with high affinity, the transfected baby hamster kidney cells bind only the BB isoform of PDGF with high affinity. This observation is consistent with the existence of more than one PDGF receptor class.

  1. Analysis of the coding potential of the ORF in the control region of the female-transmitted Mytilus mtDNA.

    PubMed

    Minoiu, Ioana; Burzyński, Artur; Breton, Sophie

    2016-01-15

    Key elements in determining the sex-specific transmission of the female and male mitochondrial genomes in Mytilus species with doubly uniparental inheritance of mtDNA are suspected to be contained in the control region. A novel F genome-specific open reading frame (ORF) identified in this region has previously been hypothesized to be involved in the DUI mechanism. In their recent work Kyriakou et al. (2014a) questioned the functionality of this ORF. Here, we present evidence that this ORF is transcribed and may thus code for a functional product. PMID:26424598

  2. XR-C1, a new CHO cell mutant which is defective in DNA-PKcs, is impaired in both V(D)J coding and signal joint formation.

    PubMed Central

    Errami, A; He, D M; Friedl, A A; Overkamp, W J; Morolli, B; Hendrickson, E A; Eckardt-Schupp, F; Oshimura, M; Lohman, P H; Jackson, S P; Zdzienicka, M Z

    1998-01-01

    DNA-dependent protein kinase (DNA-PK) plays an important role in DNA double-strand break (DSB) repair and V(D)J recombination. We have isolated a new X-ray-sensitive CHO cell line, XR-C1, which is impaired in DSB repair and which was assigned to complementation group 7, the group that is defective in the XRCC7/SCID (Prkdc) gene encoding the catalytic subunit of DNA-PK (DNA-PKcs). Consistent with this complementation analysis, XR-C1 cells lacked detectable DNA-PKcs protein, did not display DNA-PK catalytic activity and were complemented by the introduction of a single human chromosome 8 (providing the Prkdc gene). The impact of the XR-C1 mutation on V(D)J recombination was quite different from that found in most rodent cells defective in DNA-PKcs, which are preferentially blocked in coding joint formation, whereas XR-C1 cells were defective in forming both coding and signal joints. These results suggest that DNA-PKcs is required for both coding and signal joint formation during V(D)J recombination and that the XR-C1 mutant cell line may prove to be a useful tool in understanding this pathway. PMID:9628911

  3. Ribosomal DNA analysis of tsetse and non-tsetse transmitted Ethiopian Trypanosoma vivax strains in view of improved molecular diagnosis.

    PubMed

    Fikru, Regassa; Matetovici, Irina; Rogé, Stijn; Merga, Bekana; Goddeeris, Bruno Maria; Büscher, Philippe; Van Reet, Nick

    2016-04-15

    Animal trypanosomosis caused by Trypanosoma vivax (T. vivax) is a devastating disease causing serious economic losses. Most molecular diagnostics for T. vivax infection target the ribosomal DNA locus (rDNA) but are challenged by the heterogeneity among T. vivax strains. In this study, we investigated the rDNA heterogeneity of Ethiopian T. vivax strains in relation to their presence in tsetse-infested and tsetse-free areas and its effect on molecular diagnosis. We sequenced the rDNA loci of six Ethiopian (three from tsetse-infested and three from tsetse-free areas) and one Nigerian T. vivax strain. We analysed the obtained sequences in silico for primer mismatches of some commonly used diagnostic PCR assays and for GC content. With these data, we selected some rDNA diagnostic PCR assays for evaluation of their diagnostic accuracy. Furthermore, we constructed two phylogenetic networks based on sequences within the smaller subunit (SSU) of 18S and within the 5.8S and internal transcribed spacer 2 (ITS2) to assess the relatedness of Ethiopian T. vivax strains to strains from other African countries and from South America. In silico analysis of the rDNA sequence showed important mismatches of some published diagnostic PCR primers and high GC content of T. vivax rDNA. The evaluation of selected diagnostic PCR assays with specimens from cattle under natural T. vivax challenge showed that this high GC content interferes with the diagnostic accuracy of PCR, especially in cases of mixed infections with T. congolense. Adding betaine to the PCR reaction mixture can enhance the amplification of T. vivax rDNA but decreases the sensitivity for T. congolense and Trypanozoon. The networks illustrated that Ethiopian T. vivax strains are considerably heterogeneous and that two strains (one from a tsetse-infested and one from a tsetse-free area) are more related to the West African and South American strains than to the East African strains. The rDNA locus sequence of six Ethiopian T. vivax
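
    The in silico primer-mismatch check mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `min_mismatches`, the ungapped sliding-window comparison, and the toy sequences are assumptions made for illustration, and only the forward strand is scanned.

```python
def min_mismatches(primer: str, template: str) -> tuple[int, int]:
    """Slide the primer along the template without gaps.

    Returns (fewest mismatches found, 0-based offset of that window).
    Hypothetical helper: real primer evaluation would also scan the
    reverse complement and weight mismatches near the 3' end.
    """
    primer, template = primer.upper(), template.upper()
    if len(primer) > len(template):
        raise ValueError("primer is longer than template")
    best = (len(primer) + 1, -1)
    for i in range(len(template) - len(primer) + 1):
        window = template[i:i + len(primer)]
        mismatches = sum(a != b for a, b in zip(primer, window))
        if mismatches < best[0]:
            best = (mismatches, i)
    return best


# Toy sequences, not T. vivax rDNA.
print(min_mismatches("ACGT", "TTACGATT"))
```

    A primer whose best site still shows several mismatches would be a candidate for the kind of reduced diagnostic accuracy the abstract describes.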

  4. Sequence analysis of a non-classified, non-occluded DNA virus that causes salivary gland hypertrophy of Musca domestica, MdSGHV

    PubMed Central

    Garcia-Maruniak, Alejandra; Maruniak, James E.; Farmerie, William; Boucias, Drion G.

    2008-01-01

    The genome of the virus that causes salivary gland hypertrophy in Musca domestica (MdSGHV) was sequenced. This non-classified, enveloped, double stranded, circular DNA virus had a 124,279 bp genome. The G+C content was 43.5% with 108 putative methionine-initiated open reading frames (ORFs). Thirty ORFs had homology to database proteins: eleven to proteins coded by both baculoviruses and nudiviruses (p74, pif-1, pif-2, pif-3, odv-e66, rr1, rr2, iap, dUTPase, MMP, and Ac81-like), seven to nudiviruses (mcp, dhfr, ts, tk and three unknown proteins), one to baculovirus (Ac150-like), one to herpesvirus (dna pol), and ten to cellular proteins. Mass spectrum analysis of the viral particles’ protein components identified 29 structural ORFs, with only p74 and odv-e66 previously characterized as baculovirus structural proteins. Although most of the homology observed was to nudiviruses, phylogenetic analysis showed that MdSGHV was not closely related to them or to the baculoviruses. PMID:18495197
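
    The G+C content reported above is the percentage of G and C bases among all bases in the sequence. A minimal sketch of that calculation follows; the function name `gc_content` and the toy sequence are illustrative assumptions, and ambiguous symbols such as N are simply ignored here.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Only A, C, G and T are counted; other symbols (e.g. N) are skipped.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / total


toy = "ATGCGCATTA"  # hypothetical toy sequence, not MdSGHV data
print(f"{gc_content(toy):.1f}%")
```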

  5. Temporal and spatial trends in prey composition of wahoo Acanthocybium solandri: a diet analysis from the central North Pacific Ocean using visual and DNA bar-coding techniques.

    PubMed

    Oyafuso, Z S; Toonen, R J; Franklin, E C

    2016-04-01

    A diet analysis was conducted on 444 wahoo Acanthocybium solandri caught in the central North Pacific Ocean longline fishery and a nearshore troll fishery surrounding the Hawaiian Islands from June to December 2014. In addition to traditional observational methods of stomach contents, a DNA bar-coding approach was integrated into the analysis by sequencing the cytochrome c oxidase subunit 1 (COI) region of the mtDNA genome to taxonomically identify individual prey items that could not be classified visually to species. For nearshore-caught A. solandri, juvenile pre-settlement reef fish species from various families dominated the prey composition during the summer months, followed primarily by Carangidae in autumn months. Gempylidae, Echeneidae and Scombridae were dominant prey taxa from the offshore fishery. Molidae was a common prey family found in stomachs collected north-east of the Hawaiian Archipelago while tetraodontiform reef fishes, known to have extended pelagic stages, were prominent prey items south-west of the Hawaiian Islands. The diet composition of A. solandri was indicative of an adaptive feeder and thus revealed dominant geographic and seasonal abundances of certain taxa from various ecosystems in the marine environment. The addition of molecular bar-coding to the traditional visual method of prey identifications allowed for a more comprehensive range of the prey field of A. solandri to be identified and should be used as a standard component in future diet studies. PMID:27059148

  6. Characterization of Non-coding DNA Satellites Associated with Sweepoviruses (Genus Begomovirus, Geminiviridae) - Definition of a Distinct Class of Begomovirus-Associated Satellites.

    PubMed

    Lozano, Gloria; Trenado, Helena P; Fiallo-Olivé, Elvira; Chirinos, Dorys; Geraud-Pouey, Francis; Briddon, Rob W; Navas-Castillo, Jesús

    2016-01-01

    Begomoviruses (family Geminiviridae) are whitefly-transmitted, plant-infecting single-stranded DNA viruses that cause crop losses throughout the warmer parts of the World. Sweepoviruses are a phylogenetically distinct group of begomoviruses that infect plants of the family Convolvulaceae, including sweet potato (Ipomoea batatas). Two classes of subviral molecules are often associated with begomoviruses, particularly in the Old World; the betasatellites and the alphasatellites. An analysis of sweet potato and Ipomoea indica samples from Spain and Merremia dissecta samples from Venezuela identified small non-coding subviral molecules in association with several distinct sweepoviruses. The sequences of 18 clones were obtained and found to be structurally similar to tomato leaf curl virus-satellite (ToLCV-sat, the first DNA satellite identified in association with a begomovirus), with a region with significant sequence identity to the conserved region of betasatellites, an A-rich sequence, a predicted stem-loop structure containing the nonanucleotide TAATATTAC, and a second predicted stem-loop. These sweepovirus-associated satellites join an increasing number of ToLCV-sat-like non-coding satellites identified recently. Although sharing some features with betasatellites, evidence is provided to suggest that the ToLCV-sat-like satellites are distinct from betasatellites and should be considered a separate class of satellites, for which the collective name deltasatellites is proposed. PMID:26925037
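
    Locating the nonanucleotide TAATATTAC in a candidate satellite sequence is a plain string search on both strands. The sketch below assumes a linear sequence and uses hypothetical helper names (`revcomp`, `find_motif`); a circular molecule would also need a check across the origin, which is not handled here.

```python
MOTIF = "TAATATTAC"  # nonanucleotide named in the abstract
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str = MOTIF) -> list[tuple[int, str]]:
    """Return sorted (0-based start, strand) hits of motif in a linear sequence."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", revcomp(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Toy sequence, not a real satellite clone.
print(find_motif("CCTAATATTACCC"))
```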

  7. Characterization of Non-coding DNA Satellites Associated with Sweepoviruses (Genus Begomovirus, Geminiviridae) – Definition of a Distinct Class of Begomovirus-Associated Satellites

    PubMed Central

    Lozano, Gloria; Trenado, Helena P.; Fiallo-Olivé, Elvira; Chirinos, Dorys; Geraud-Pouey, Francis; Briddon, Rob W.; Navas-Castillo, Jesús

    2016-01-01

    Begomoviruses (family Geminiviridae) are whitefly-transmitted, plant-infecting single-stranded DNA viruses that cause crop losses throughout the warmer parts of the World. Sweepoviruses are a phylogenetically distinct group of begomoviruses that infect plants of the family Convolvulaceae, including sweet potato (Ipomoea batatas). Two classes of subviral molecules are often associated with begomoviruses, particularly in the Old World; the betasatellites and the alphasatellites. An analysis of sweet potato and Ipomoea indica samples from Spain and Merremia dissecta samples from Venezuela identified small non-coding subviral molecules in association with several distinct sweepoviruses. The sequences of 18 clones were obtained and found to be structurally similar to tomato leaf curl virus-satellite (ToLCV-sat, the first DNA satellite identified in association with a begomovirus), with a region with significant sequence identity to the conserved region of betasatellites, an A-rich sequence, a predicted stem–loop structure containing the nonanucleotide TAATATTAC, and a second predicted stem–loop. These sweepovirus-associated satellites join an increasing number of ToLCV-sat-like non-coding satellites identified recently. Although sharing some features with betasatellites, evidence is provided to suggest that the ToLCV-sat-like satellites are distinct from betasatellites and should be considered a separate class of satellites, for which the collective name deltasatellites is proposed. PMID:26925037

  8. Arabidopsis RNASE THREE LIKE2 Modulates the Expression of Protein-Coding Genes via 24-Nucleotide Small Interfering RNA-Directed DNA Methylation

    PubMed Central

    Hachet, Mélanie; Comella, Pascale; Zytnicki, Matthias; Vaucheret, Hervé

    2016-01-01

    RNaseIII enzymes catalyze the cleavage of double-stranded RNA (dsRNA) and have diverse functions in RNA maturation. Arabidopsis thaliana RNASE THREE LIKE2 (RTL2), which carries one RNaseIII and two dsRNA binding (DRB) domains, is a unique Arabidopsis RNaseIII enzyme resembling the budding yeast small interfering RNA (siRNA)-producing Dcr1 enzyme. Here, we show that RTL2 modulates the production of a subset of small RNAs and that this activity depends on both its RNaseIII and DRB domains. However, the mode of action of RTL2 differs from that of Dcr1. Whereas Dcr1 directly cleaves dsRNAs into 23-nucleotide siRNAs, RTL2 likely cleaves dsRNAs into longer molecules, which are subsequently processed into small RNAs by the DICER-LIKE enzymes. Depending on the dsRNA considered, RTL2-mediated maturation either improves (RTL2-dependent loci) or reduces (RTL2-sensitive loci) the production of small RNAs. Because the vast majority of RTL2-regulated loci correspond to transposons and intergenic regions producing 24-nucleotide siRNAs that guide DNA methylation, RTL2 depletion modifies DNA methylation in these regions. Nevertheless, 13% of RTL2-regulated loci correspond to protein-coding genes. We show that changes in 24-nucleotide siRNA levels also affect DNA methylation levels at such loci and inversely correlate with mRNA steady state levels, thus implicating RTL2 in the regulation of protein-coding gene expression. PMID:26764378

  9. Cloning and sequence analysis of cDNA coding for a lectin from Helianthus tuberosus callus and its jasmonate-induced expression.

    PubMed

    Nakagawa, R; Yasokawa, D; Okumura, Y; Nagashima, K

    2000-06-01

    Two lectins (designated as HTA I and HTA II) that seemed to be isolectins were found in Helianthus tuberosus callus. cDNA encoding HTA I was isolated from a ZAP Express expression library by immunoselection by using the anti-HTA antiserum. The sequence of this cDNA consisted of 432 bp nucleotides coding for a polypeptide of 143 amino acid residues (Mr, 15,314). When introduced into E. coli, the cDNA directed the synthesis of active HTA I as indicated by the hemagglutination activity. The deduced amino acid sequence showed homology with some lectins and jasmonate-induced proteins. When callus was cultured in the presence of methyl jasmonate (MeJA), the hemagglutination activity increased in a dose-dependent manner. The levels of expression of the HTA protein and of the corresponding mRNA also increased in the treated callus. In view of these results, HTA I is considered to be a jasmonate-induced protein. PMID:10923797
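
    The figures in this abstract are mutually consistent if the 432 bp count includes a stop codon: 143 residues plus one stop codon make 144 codons of 3 nucleotides each. The small sketch below spells out that arithmetic; treating the stop codon as part of the 432 bp is an assumption, since the abstract does not say so explicitly.

```python
def coding_length_bp(n_residues: int, include_stop: bool = True) -> int:
    """Length in bp of an open reading frame encoding n_residues amino acids."""
    codons = n_residues + (1 if include_stop else 0)
    return 3 * codons


print(coding_length_bp(143))                      # with stop codon
print(coding_length_bp(143, include_stop=False))  # sense codons only
print(round(15314 / 143, 1))                      # mean residue mass in Da, from the stated Mr
```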

  10. The non-coding B2 RNA binds to the DNA cleft and active-site region of RNA polymerase II.

    PubMed

    Ponicsan, Steven L; Houel, Stephane; Old, William M; Ahn, Natalie G; Goodrich, James A; Kugel, Jennifer F

    2013-10-01

    The B2 family of short interspersed elements is transcribed into non-coding RNA by RNA polymerase III. The ~180-nt B2 RNA has been shown to potently repress mRNA transcription by binding tightly to RNA polymerase II (Pol II) and assembling with it into complexes on promoter DNA, where it keeps the polymerase from properly engaging the promoter DNA. Mammalian Pol II is an ~500-kDa complex that contains 12 different protein subunits, providing many possible surfaces for interaction with B2 RNA. We found that the carboxy-terminal domain of the largest Pol II subunit was not required for B2 RNA to bind Pol II and repress transcription in vitro. To identify the surface on Pol II to which the minimal functional region of B2 RNA binds, we coupled multi-step affinity purification, reversible formaldehyde cross-linking, peptide sequencing by mass spectrometry, and analysis of peptide enrichment. The Pol II peptides most highly recovered after cross-linking to B2 RNA mapped to the DNA binding cleft and active-site region of Pol II. These studies determine the location of a defined nucleic acid binding site on a large, native, multi-subunit complex and provide insight into the mechanism of transcriptional repression by B2 RNA. PMID:23416138

  11. Two hybrid plasmids with D. melanogaster DNA sequences complementary to mRNA coding for the major heat shock protein.

    PubMed

    Schedl, P; Artavanis-Tsakonas, S; Steward, R; Gehring, W J; Mirault, M E; Goldschmidt-Clermont, M; Moran, L; Tissières, A

    1978-08-01

    The isolation and partial characterization of two cloned segments of Drosophila melanogaster DNA containing "heat shock" gene sequences is described. We have inserted sheared embryonic D. melanogaster DNA by the poly(dA-dT) connector method (Lobban and Kaiser, 1973) into the R1 restriction site of the ampicillin-resistant plasmid pSF2124 (So, Gill and Falkow, 1975). A collection of independent hybrid plasmids was screened by colony hybridization (Grunstein and Hogness, 1975) for sequences complementary to in vitro labeled polysomal poly(A)+ heat shock RNA. Two clones were identified which contain sequences complementary to a heat shock mRNA species that directs the in vitro synthesis of the 70,000 dalton heat-induced polypeptide. Both cloned segments hybridize in situ to the heat-induced puff sites located at 87A and 87C of the salivary gland polytene chromosomes. PMID:99246

  12. Restriction maps of the regions coding for methicillin and tobramycin resistances on chromosomal DNA in methicillin-resistant staphylococci.

    PubMed Central

    Ubukata, K; Nonoguchi, R; Matsuhashi, M; Song, M D; Konno, M

    1989-01-01

    Chromosomal BamHI DNA fragments containing both the mecA gene encoding the penicillin-binding protein responsible for methicillin resistance and the aadD gene encoding 4',4"-adenylyltransferase responsible for tobramycin resistance were cloned from three methicillin- and tobramycin-resistant strains of Staphylococcus aureus and one strain of Staphylococcus epidermidis. Physical maps of the fragments were similar, suggesting their unique origin. PMID:2817861

  13. Physical Model for the Evolution of the Genetic Code

    NASA Astrophysics Data System (ADS)

    Yamashita, Tatsuro; Narikiyo, Osamu

    2011-12-01

    Using the shape space of codons and tRNAs, we give a consistent physical description of genetic code evolution on the basis of the codon capture and ambiguous intermediate scenarios. In the lowest-dimensional version of our description, a physical quantity, the codon level, is introduced. In terms of the codon levels, the two scenarios are classified into two different routes of the evolutionary process. In the case of the ambiguous intermediate scenario, we perform an evolutionary simulation implementing cost selection of amino acids and confirm a rapid transition of the code. Such rapidity reduces the difficulty posed by non-unique translation of the code in the intermediate state, which is the weakness of the scenario. In the case of the codon capture scenario, survival against mutations under mutational pressure minimizing GC content in genomes is simulated, and it is demonstrated that cells which experience only neutral mutations survive.

  14. Phylogenetic analysis of Pythium insidiosum Thai strains using cytochrome oxidase II (COX II) DNA coding sequences and internal transcribed spacer regions (ITS).

    PubMed

    Kammarnjesadakul, Patcharee; Palaga, Tanapat; Sritunyalucksana, Kallaya; Mendoza, Leonel; Krajaejun, Theerapong; Vanittanakom, Nongnuch; Tongchusak, Songsak; Denduangboripant, Jessada; Chindamporn, Ariya

    2011-04-01

    To investigate the phylogenetic relationship among Pythium insidiosum isolates in Thailand, we investigated the genomic DNA of 31 P. insidiosum strains isolated from humans and environmental sources from Thailand, and two from North and Central America. We used PCR to amplify the partial COX II DNA coding sequences and the ITS regions of these isolates. The nucleotide sequences of both amplicons were analyzed by the Bioedit program. Phylogenetic analysis using genetic distance method with Neighbor Joining (NJ) approach was performed using the MEGA4 software. Additional sequences of three other Pythium species, Phytophthora sojae and Lagenidium giganteum were employed as outgroups. The sizes of the COX II amplicons varied from 558-564 bp, whereas the ITS products varied from approximately 871-898 bp. Corrected sequence divergences with Kimura 2-parameter model calculated for the COX II and the ITS DNA sequences ranged between 0.0000-0.0608 and 0.0000-0.2832, respectively. Phylogenetic analysis using both the COX II and the ITS DNA sequences showed similar trees, where we found three sister groups (A(TH), B(TH), and C(TH)) among P. insidiosum strains. All Thai isolates from clinical cases and environmental sources were placed in two separated sister groups (B(TH) and C(TH)), whereas the Americas isolates were grouped into A(TH). Although the phylogenetic tree based on both regions showed similar distribution, the COX II phylogenetic tree showed higher resolution than the one using the ITS sequences. Our study indicates that COX II gene is the better of the two alternatives to study the phylogenetic relationships among P. insidiosum strains. PMID:20818919
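
    The corrected divergences quoted in this record come from the Kimura 2-parameter model. As a rough illustration only (this is not the MEGA4 implementation used by the authors), the textbook K2P formula d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)) can be sketched as follows, where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The function name and the decision to skip gaps and ambiguous bases are assumptions made for this sketch.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (an assumption of this sketch, similar to pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log / math.sqrt raise ValueError when the sequences are too
    # divergent for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

    For closely related sequences the corrected value stays near the raw proportion of differing sites, which is in line with the small COX II divergences reported above.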

  15. Multilocus sequence analysis supports the taxonomic position of Astragalus glycyphyllos symbionts based on DNA-DNA hybridization.

    PubMed

    Gnat, Sebastian; Małek, Wanda; Oleńska, Ewa; Wdowiak-Wróbel, Sylwia; Kalita, Michał; Rogalski, Jerzy; Wójcik, Magdalena

    2016-04-01

    In this study, the phylogenetic relationship and taxonomic status of six strains, representing different phenons and genomic groups of Astragalus glycyphyllos symbionts, originating from Poland, were established by comparative analysis of five concatenated housekeeping gene sequences (atpD, dnaK, glnA, recA and rpoB), DNA-DNA hybridization and total DNA G+C content. Maximum-likelihood phylogenetic analysis of combined atpD, dnaK, glnA, recA and rpoB sequence data placed the studied bacteria into the clade comprising the genus Mesorhizobium. In the core gene phylograms, four A. glycyphyllos nodule isolates (AG1, AG7, AG15 and AG27) formed a cluster common with Mesorhizobium ciceri, whereas the two other A. glycyphyllos symbionts (AG17 and AG22) were grouped together with Mesorhizobium amorphae and M. septentrionale. The species position of the studied bacteria was clarified by DNA-DNA hybridization. The DNA-DNA relatedness between isolates AG1, AG7, AG15 and AG27 and reference strain M. ciceri USDA 3383T was 76.4-84.2 %, and all these A. glycyphyllos nodulators were defined as members of the genomospecies M. ciceri. DNA-DNA relatedness for isolates AG17 and AG22 and the reference strain M. amorphae ICMP 15022T was 77.5 and 80.1 %, respectively. We propose that the nodule isolates AG17 and AG22 belong to the genomic species M. amorphae. Additionally, it was found that the total DNA G+C content of the six test A. glycyphyllos symbionts was 59.4-62.1 mol%, within the range for species of the genus Mesorhizobium. PMID:26704062

  16. The evolution of the coding exome of the Arabidopsis species - the influences of DNA methylation, relative exon position, and exon length

    PubMed Central

    2014-01-01

    Background The evolution of the coding exome is a major driving force of functional divergence both between species and between protein isoforms. Exons at different positions in the transcript or in different transcript isoforms may (1) mutate at different rates due to variations in DNA methylation level; and (2) serve distinct biological roles, and thus be differentially targeted by natural selection. Furthermore, intrinsic exonic features, such as exon length, may also affect the evolution of individual exons. Importantly, the evolutionary effects of these intrinsic/extrinsic features may differ significantly between animals and plants. Such inter-lineage differences, however, have not been systematically examined. Results Here we examine how DNA methylation at CpG dinucleotides (CpG methylation), in the context of intrinsic exonic features (exon length and relative exon position in the transcript), influences the evolution of coding exons of Arabidopsis thaliana. We observed fairly different evolutionary patterns in A. thaliana as compared with those reported for animals. Firstly, the mutagenic effect of CpG methylation is the strongest for internal exons and the weakest for first exons despite the stringent selective constraints on the former group. Secondly, the mutagenic effect of CpG methylation increases significantly with length in first exons but not in the other two exon groups. Thirdly, CpG methylation level is correlated with evolutionary rates (dS, dN, and the dN/dS ratio) with markedly different patterns among the three exon groups. The correlations are generally positive, negative, and mixed for first, last, and internal exons, respectively. Fourthly, exon length is a CpG methylation-independent indicator of evolutionary rates, particularly for dN and the dN/dS ratio in last and internal exons. Finally, the evolutionary patterns of coding exons with regard to CpG methylation differ significantly between Arabidopsis species and mammals. 

  17. The tamas gene, identified as a mutation that disrupts larval behavior in Drosophila melanogaster, codes for the mitochondrial DNA polymerase catalytic subunit (DNApol-gamma125).

    PubMed Central

    Iyengar, B; Roote, J; Campos, A R

    1999-01-01

    From a screen of pupal lethal lines of Drosophila melanogaster we identified a mutant strain that displayed a reproducible reduction in the larval response to light. Moreover, this mutant strain showed defects in the development of the adult visual system and failure to undergo behavioral changes characteristic of the wandering stage. The foraging third instar larvae remained in the food substrate for a prolonged period and died at or just before pupariation. Using a new assay for individual larval photobehavior we determined that the lack of response to light in these mutants was due to a primary deficit in locomotion. The mutation responsible for these phenotypes was mapped to the lethal complementation group l(2)34Dc, which we renamed tamas (translated from Sanskrit as "dark inertia"). Sequencing of mutant alleles demonstrated that tamas codes for the mitochondrial DNA polymerase catalytic subunit (DNApol-gamma125). PMID:10581287

  18. Rare Failures of DNA Bar Codes to Separate Morphologically Distinct Species in a Biodiversity Survey of Iberian Leaf Beetles

    PubMed Central

    Baselga, Andrés; Gómez-Rodríguez, Carola; Novoa, Francisco; Vogler, Alfried P.

    2013-01-01

    During a survey of genetic and species diversity patterns of leaf beetle (Coleoptera: Chrysomelidae) assemblages across the Iberian Peninsula we found a broad congruence between morphologically delimited species and variation in the cytochrome oxidase (cox1) gene. However, one species pair each in the genera Longitarsus Berthold and Pachybrachis Chevrolat was inseparable using molecular methods, whereas diagnostic morphological characters (including male or female genitalia) unequivocally separated the named species. Parsimony haplotype networks and maximum likelihood trees built from cox1 showed high genetic structure within each species pair, but no correlation with the morphological types and neither with geographic distributions. This contrasted with all analysed congeneric species, which were recovered as monophyletic. A limited number of specimens were sequenced for the nuclear 18S rRNA gene, which showed no or very limited variation within the species pair and no separation of morphological types. These results suggest that processes of lineage sorting for either group are lagging behind the clear morphological and presumably reproductive separation. In the Iberian chrysomelids, incongruence between DNA-based and morphological delimitations is a rare exception, but the discovery of these species pairs may be useful as an evolutionary model for studying the process of speciation in this ecological and geographical setting. In addition, the study of biodiversity patterns based on DNA requires an evolutionary understanding of these incongruences and their potential causes. PMID:24040352

  19. Color bar coding the BRCA1 gene on combed DNA: a useful strategy for detecting large gene rearrangements.

    PubMed

    Gad, S; Aurias, A; Puget, N; Mairal, A; Schurra, C; Montagna, M; Pages, S; Caux, V; Mazoyer, S; Bensimon, A; Stoppa-Lyonnet, D

    2001-05-01

    Genetic linkage data have shown that alterations of the BRCA1 gene are responsible for the majority of hereditary breast and ovarian cancers. BRCA1 germline mutations, however, are found less frequently than expected. Mutation detection strategies, which are generally based on the polymerase chain reaction, therefore focus on point and small gene alterations. These approaches do not allow for the detection of large gene rearrangements, which also can be involved in BRCA1 alterations. Indeed, a few of them, spread over the entire BRCA1 gene, have been detected recently by Southern blotting or transcript analysis. We have developed an alternative strategy allowing a panoramic view of the BRCA1 gene, based on dynamic molecular combing and the design of a full four-color bar code of the BRCA1 region. The strategy was tested with the study of four large BRCA1 rearrangements previously reported. In addition, when screening a series of 10 breast and ovarian cancer families negatively tested for point mutation in BRCA1/2, we found an unreported 17-kb BRCA1 duplication encompassing exons 3 to 8. The detection of rearrangements as small as 2 to 6 kb with respect to the normal size of the studied fragment is achieved when the BRCA1 region is divided into 10 fragments. In addition, as the BRCA1 bar code is a morphologic approach, the direct observation of complex and likely underreported rearrangements, such as inversions and insertions, becomes possible. PMID:11284038

  20. Application of DNA bar codes for screening of industrially important fungi: the haplotype of Trichoderma harzianum sensu stricto indicates superior chitinase formation.

    PubMed

    Nagy, Viviana; Seidl, Verena; Szakacs, George; Komoń-Zelazowska, Monika; Kubicek, Christian P; Druzhinina, Irina S

    2007-11-01

    Selection of suitable strains for biotechnological purposes is frequently a random process supported by high-throughput methods. Using chitinase production by Hypocrea lixii/Trichoderma harzianum as a model, we tested whether fungal strains with superior enzyme formation may be diagnosed by DNA bar codes. We analyzed sequences of two phylogenetic marker loci, internal transcribed spacer 1 (ITS1) and ITS2 of the rRNA-encoding gene cluster and the large intron of the elongation factor 1-alpha gene, tef1, from 50 isolates of H. lixii/T. harzianum, which were also tested to determine their ability to produce chitinases in solid-state fermentation (SSF). Statistically supported superior chitinase production was obtained for strains carrying one of the observed ITS1 and ITS2 and tef1 alleles corresponding to an allele of T. harzianum type strain CBS 226.95. A tef1-based DNA bar code tool, TrichoCHIT, for rapid identification of these strains was developed. The geographic origin of the strains was irrelevant for chitinase production. The improved chitinase production by strains containing this haplotype was not due to better growth on N-acetyl-beta-D-glucosamine or glucosamine. Isoenzyme electrophoresis showed that neither the isoenzyme profile of N-acetyl-beta-glucosaminidases or the endochitinases nor the intensity of staining of individual chitinase bands correlated with total chitinase in the culture filtrate. The superior chitinase producers did not exhibit similarly increased cellulase formation. Biolog Phenotype MicroArray analysis identified lack of N-acetyl-beta-D-mannosamine utilization as a specific trait of strains with the chitinase-overproducing haplotype. This observation was used to develop a plate screening assay for rapid microbiological identification of the strains. The data illustrate that desired industrial properties may be an attribute of certain populations within a species, and screening procedures should thus include a balanced mixture of all

  1. PCR assay based on DNA coding for 16S rRNA for detection and identification of mycobacteria in clinical samples.

    PubMed Central

    Kox, L F; van Leeuwen, J; Knijper, S; Jansen, H M; Kolk, A H

    1995-01-01

    A PCR and a reverse cross blot hybridization assay were developed for the detection and identification of mycobacteria in clinical samples. The PCR amplifies a part of the DNA coding for 16S rRNA with a set of primers that is specific for the genus Mycobacterium and that flanks species-specific sequences within the genes coding for 16S rRNA. The PCR product is analyzed in a reverse cross blot hybridization assay with probes specific for M. tuberculosis complex (pTub1), M. avium (pAvi3), M. intracellulare (pInt5 and pInt7), M. kansasii complex-M. scrofulaceum complex (pKan1), M. xenopi (pXen1), M. fortuitum (pFor1), M. smegmatis (pSme1), and Mycobacterium spp. (pMyc5a). The PCR assay can detect 10 fg of DNA, the equivalent of two mycobacteria. The specificities of the probes were tested with 108 mycobacterial strains (33 species) and 31 nonmycobacterial strains (of 17 genera). The probes pAvi3, pInt5, pInt7, pKan1, pXen1, and pMyc5a were specific. With probes pTub1, pFor1, and pSme1, slight cross hybridization occurred. However, the mycobacterial strains from which the cross-hybridizing PCR products were derived belonged to nonpathogenic or nonopportunistic species which do not occur in clinical samples. The test was used on 31 different clinical specimens obtained from patients suspected of having mycobacterial disease, including a patient with a double mycobacterial infection. The samples included sputum, bronchoalveolar lavage, tissue biopsy samples, cerebrospinal fluid, pus, peritoneal fluid, pleural fluid, and blood. The results of the PCR assay agreed with those of conventional identification methods or with clinical data, showing that the test can be used for the direct and rapid detection and identification of mycobacteria in clinical samples. PMID:8586707
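
    The statement that 10 fg of DNA corresponds to about two mycobacteria can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only: the genome size of roughly 4.4 Mb (typical of M. tuberculosis) and the average mass of about 650 g/mol per base pair are assumptions that do not appear in the abstract, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
AVG_BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one base pair


def genome_mass_fg(genome_bp, bp_mass=AVG_BP_MASS_G_PER_MOL):
    """Mass of one double-stranded genome copy in femtograms."""
    grams = genome_bp * bp_mass / AVOGADRO
    return grams * 1e15  # 1 g = 1e15 fg


def genome_equivalents(dna_fg, genome_bp):
    """Number of genome copies represented by a given mass of DNA."""
    return dna_fg / genome_mass_fg(genome_bp)


if __name__ == "__main__":
    assumed_genome_bp = 4.4e6  # assumption, not stated in the abstract
    print(f"one genome: {genome_mass_fg(assumed_genome_bp):.2f} fg")
    print(f"10 fg: {genome_equivalents(10.0, assumed_genome_bp):.1f} genomes")
```

    Under these assumptions one genome weighs a little under 5 fg, so 10 fg is about two genome copies, consistent with the detection limit quoted in the abstract.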

  2. Loss of genes for DNA recombination and repair in the reductive genome evolution of thioautotrophic symbionts of Calyptogena clams

    PubMed Central

    2011-01-01

    Background Two Calyptogena clam intracellular obligate symbionts, Ca. Vesicomyosocius okutanii (Vok; C. okutanii symbiont) and Ca. Ruthia magnifica (Rma; C. magnifica symbiont), have small genomes (1.02 and 1.16 Mb, respectively) with low G+C contents (31.6% and 34.0%, respectively) and are thought to be in an ongoing stage of reductive genome evolution (RGE). They lack recA and some genes for DNA repair, including mutY. The loss of recA and mutY is thought to contribute to the stabilization of their genome architectures and GC bias, respectively. To understand how these genes were lost from the symbiont genomes, we surveyed these genes in the genomes from 10 other Calyptogena clam symbionts using the polymerase chain reaction (PCR). Results Phylogenetic trees reconstructed using concatenated 16S and 23S rRNA gene sequences showed that the symbionts formed two clades, clade I (symbionts of C. kawamurai, C. laubieri, C. kilmeri, C. okutanii and C. soyoae) and clade II (those of C. pacifica, C. fausta, C. nautilei, C. stearnsii, C. magnifica, C. fossajaponica and C. phaseoliformis). recA was detected by PCR with consensus primers for recA in the symbiont of C. phaseoliformis. A detailed homology search revealed a remnant recA in the Rma genome. Using PCR with a newly designed primer set, intact recA or its remnant was detected in clade II symbionts. In clade I symbionts, the recA coding region was found to be mostly deleted. In the Rma genome, a pseudogene of mutY was found. Using PCR with newly designed primer sets, mutY was not found in clade I symbionts but was found in clade II symbionts. The G+C content of 16S and 23S rRNA genes in symbionts lacking mutY was significantly lower than in those with mutY. Conclusions The extant Calyptogena clam symbionts in clade II were shown to have recA and mutY or their remnants, while those in clade I did not. The present results indicate that the extant symbionts are losing these genes in RGE, and that the loss of mut

  3. Cloning by differential screening of a Xenopus cDNA coding for a protein highly homologous to cdc2.

    PubMed Central

    Paris, J; Le Guellec, R; Couturier, A; Le Guellec, K; Omilli, F; Camonis, J; MacNeill, S; Philippe, M

    1991-01-01

    Fertilization of Xenopus laevis eggs triggers a period of rapid cell division comprising 12 nearly synchronous mitoses. Protein synthesis is required for these divisions, and new proteins appear after fertilization. Other proteins, however, which are synthesized in the unfertilized egg, are no longer made in the early embryo. To identify such proteins, a differential screen of an egg cDNA library gave nine clones corresponding to mRNAs that are deadenylylated soon after fertilization. The sequence of one of these clones (Eg1) revealed a high homology to p34cdc2, the kinase subunit of maturation-promoting factor. Only 12 amino acids in the deduced amino acid sequence were unique to Eg1 when its sequence was compared to all other known examples of cdc2. Despite this strong similarity, however, Eg1 was unable to complement a yeast cdc2- mutant in Schizosaccharomyces pombe or a cdc28 mutant of Saccharomyces cerevisiae. Four Eg1 transcripts, two major and two minor, were found in Xenopus oocytes and early embryos. These RNAs appeared very early (stage I) in oogenesis and their level remained constant until the midblastula transition, at which time they declined. Eg1 RNA is found in the poly(A)+ fraction of oocytes only between the time of meiotic maturation and fertilization--that is to say, in the unfertilized egg. At fertilization the RNA loses its poly(A) tail and at the same time leaves the polyribosomes. PMID:1704128

  4. Fast turnover of genome transcription across evolutionary time exposes entire non-coding DNA to de novo gene emergence

    PubMed Central

    Neme, Rafik; Tautz, Diethard

    2016-01-01

    Deep sequencing analyses have shown that a large fraction of genomes is transcribed, but the significance of this transcription is much debated. Here, we characterize the phylogenetic turnover of poly-adenylated transcripts in a comprehensive sampling of taxa of the mouse (genus Mus), spanning a phylogenetic distance of 10 Myr. Using deep RNA sequencing we find that at a given sequencing depth transcriptome coverage becomes saturated within a taxon, but keeps extending when compared between taxa, even at this very shallow phylogenetic level. Our data show a high turnover of transcriptional states between taxa and that no major transcript-free islands exist across evolutionary time. This suggests that the entire genome can be transcribed into poly-adenylated RNA when viewed at an evolutionary time scale. We conclude that any part of the non-coding genome can potentially become subject to evolutionary functionalization via de novo gene evolution within relatively short evolutionary time spans. DOI: http://dx.doi.org/10.7554/eLife.09977.001 PMID:26836309

  5. 3D-Trajectories Adopted by Coding and Regulatory DNA Elements: First-Passage Times for Genomic Interactions

    PubMed Central

    Lucas, Joseph S.; Zhang, Yaojun; Dudko, Olga K.; Murre, Cornelis

    2014-01-01

    SUMMARY During B lymphocyte development, immunoglobulin heavy chain variable (VH), diversity (DH) and joining (JH) segments assemble to generate a diverse antigen receptor repertoire. Here we have marked the distal VH and DH-JH-Eμ regions with Tet-operator binding sites and traced their 3D-trajectories in pro-B cells transduced with a retrovirus encoding Tet-repressor-EGFP. We found that these elements displayed fractional Langevin motion (fLm) due to the viscoelastic hindrance from the surrounding network of proteins and chromatin fibers. Using fractional Langevin dynamics modeling, we found that, with high probability, DHJH elements reach a VH element within minutes. Spatial confinement emerged as the dominant parameter that determined the frequency of such encounters. We propose that the viscoelastic nature of the nuclear environment causes coding elements and regulatory elements to bounce back and forth in a spring-like fashion until specific genomic interactions are established and that spatial confinement of topological domains largely controls first-passage times for genomic interactions. PMID:24998931

  6. Cloning and DNA sequence of the gene coding for Clostridium thermocellum cellulase Ss (CelS), a major cellulosome component.

    PubMed Central

    Wang, W K; Kruus, K; Wu, J H

    1993-01-01

    Clostridium thermocellum ATCC 27405 produces an extracellular cellulase system capable of hydrolyzing crystalline cellulose. The enzyme system involves a multicomponent protein aggregate (the cellulosome) with a total molecular weight in the millions, impeding mechanistic studies. However, two major components of the aggregate, SS (M(r) = 82,000) and SL (M(r) = 250,000), which act synergistically to hydrolyze crystalline cellulose, have been identified (J. H. D. Wu, W. H. Orme-Johnson, and A. L. Demain, Biochemistry 27:1703-1709, 1988). To further study this synergism, we cloned and sequenced the gene (celS) coding for the SS (CelS) protein by using a degenerate, inosine-containing oligonucleotide probe whose sequence was derived from the N-terminal amino acid sequence of the CelS protein. The open reading frame of celS consisted of 2,241 bp encoding 741 amino acid residues. It encoded the N-terminal amino acid sequence and two internal peptide sequences determined for the native CelS protein. A putative ribosome binding site was identified at the 5' end of the gene. A putative signal peptide of 27 amino acid residues was adjacent to the N terminus of the CelS protein. The predicted molecular weight of the secreted protein was 80,670. The celS gene contained a conserved reiterated sequence encoding 24 amino acid residues found in proteins encoded by many other clostridial cel or xyn genes. A palindromic structure was found downstream from the open reading frame. The celS gene is unique among the known cel genes of C. thermocellum. However, it is highly homologous to the partial open reading frame found in C. cellulolyticum and in Caldocellum saccharolyticum, indicating that these genes belong to a new family of cel genes. PMID:8444792

  7. DNA nucleoside composition and methylation in several species of microalgae

    SciTech Connect

    Jarvis, E.E.; Dunahay, T.G.; Brown, L.M.

    1992-06-01

    Total DNA was isolated from 10 species of microalgae, including representatives of the Chlorophyceae (Chlorella ellipsoidea, Chlamydomonas reinhardtii, and Monoraphidium minutum), Bacillariophyceae (Cyclotella cryptica, Navicula saprophila, Nitzschia pusilla, and Phaeodactylum tricornutum), Charophyceae (Stichococcus sp.), Dinophyceae (Crypthecodinium cohnii), and Prasinophyceae (Tetraselmis suecica). Control samples of Escherichia coli and calf thymus DNA were also analyzed. The nucleoside base composition of each DNA sample was determined by reversed-phase high performance liquid chromatography. All samples contained 5-methyldeoxycytidine, although at widely varying levels. In M. minutum, about one-third of the cytidine residues were methylated. Restriction analysis supported this high degree of methylation in M. minutum and suggested that methylation is biased toward 5'-CG dinucleotides. The guanosine + cytosine (GC) contents of the green algae were, with the exception of Stichococcus sp., consistently higher than those of the diatoms. Monoraphidium minutum exhibited an extremely high GC content of 71%. Such a value is rare among eukaryotic organisms and might indicate an unusual codon usage. This work is important for developing strategies for transformation and gene cloning in these algae. 46 refs., 1 fig., 2 tabs.
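
    When base composition is measured as nucleoside amounts (as with the HPLC approach described in this record), GC content and the methylated fraction of cytidine follow from simple ratios. The sketch below assumes molar amounts are already available for the five nucleosides; the dictionary keys, the function name and the example numbers are hypothetical and are not data from the study.

```python
def composition_summary(mol):
    """Summarise nucleoside composition.

    mol: dict of molar amounts with keys "dA", "dT", "dG", "dC", "m5dC"
    (the key names are an assumption of this sketch).
    """
    cytidine_total = mol["dC"] + mol["m5dC"]  # methylated C still pairs with G
    total = mol["dA"] + mol["dT"] + mol["dG"] + cytidine_total
    if total <= 0 or cytidine_total <= 0:
        raise ValueError("need positive nucleoside amounts")
    return {
        "gc_fraction": (mol["dG"] + cytidine_total) / total,
        "methylated_c_fraction": mol["m5dC"] / cytidine_total,
    }


if __name__ == "__main__":
    # Made-up amounts chosen only to illustrate the arithmetic.
    example = {"dA": 15.0, "dT": 15.0, "dG": 35.0, "dC": 23.0, "m5dC": 12.0}
    print(composition_summary(example))
```

    Counting 5-methyldeoxycytidine together with deoxycytidine matters here: leaving it out would understate GC content in heavily methylated genomes.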

  8. Testing the use of ITS rDNA and protein-coding genes in the generic and species delimitation of the lichen genus Usnea (Parmeliaceae, Ascomycota).

    PubMed

    Truong, Camille; Divakar, Pradeep K; Yahr, Rebecca; Crespo, Ana; Clerc, Philippe

    2013-08-01

    In lichen-forming fungi, traditional taxonomical concepts are frequently in conflict with molecular data, and identifying appropriate taxonomic characters to describe phylogenetic clades remains challenging in many groups. The selection of suitable markers for the reconstruction of solid phylogenetic hypotheses is therefore fundamental. The lichen genus Usnea is highly diverse, with more than 350 estimated species, distributed in polar, temperate and tropical regions. The phylogeny and classification of Usnea have been a matter of debate, given the lack of phenotypic characters to describe phylogenetic clades and the low degree of resolution of phylogenetic trees. In this study, we investigated the phylogenetic relationships of 52 Usnea species from across the genus, based on ITS rDNA, nuLSU, and two protein-coding genes RPB1 and MCM7. ITS comprised several highly variable regions, containing substantial genetic signal, but also susceptible to causing bias in the generation of the alignment. We compared several methods of alignment of ITS and found that a simultaneous optimization of alignment and phylogeny (using BAli-phy) improved significantly both the topology and the resolution of the phylogenetic tree. However the resolution was even better when using protein-coding genes, especially RPB1 although it is less variable. The phylogeny based on the concatenated dataset revealed that the genus Usnea is subdivided into four highly-supported clades, corresponding to the traditionally circumscribed subgenera Eumitria, Dolichousnea, Neuropogon and Usnea. However, characters that have been used to describe these clades are often homoplasious within the phylogeny and their parallel evolution is suggested. On the other hand, most of the species were reconstructed as monophyletic, indicating that combinations of phenotypic characters are suitable discriminators for delimitating species, but are inadequate to describe generic subdivisions. PMID:23603312

  9. Rheostatic Regulation of the SERCA/Phospholamban Membrane Protein Complex Using Non-Coding RNA and Single-Stranded DNA oligonucleotides

    PubMed Central

    Soller, Kailey J.; Verardi, Raffaello; Jing, Meng; Abrol, Neha; Yang, Jing; Walsh, Naomi; Vostrikov, Vitaly V.; Robia, Seth L.; Bowser, Michael T.; Veglia, Gianluigi

    2015-01-01

    The membrane protein complex between sarco(endo)plasmic reticulum Ca2+-ATPase (SERCA) and phospholamban (PLN) is a prime therapeutic target for reversing cardiac contractile dysfunctions caused by calcium mishandling. So far, however, efforts to develop drugs specific for this protein complex have failed. Here, we show that non-coding RNAs and single-stranded DNAs (ssDNAs) interact with and regulate the function of the SERCA/PLN complex in a tunable manner. Both in HEK cells expressing the SERCA/PLN complex, as well as in cardiac sarcoplasmic reticulum preparations, these short oligonucleotides bind and reverse PLN’s inhibitory effects on SERCA, increasing the ATPase’s apparent Ca2+ affinity. Solid-state NMR experiments revealed that ssDNA interacts with PLN specifically, shifting the conformational equilibrium of the SERCA/PLN complex from an inhibitory to a non-inhibitory state. Importantly, we achieved rheostatic control of SERCA function by modulating the length of ssDNAs. Since restoration of Ca2+ flux to physiological levels represents a viable therapeutic avenue for cardiomyopathies, our results suggest that oligonucleotide-based drugs could be used to fine-tune SERCA function to counterbalance the extent of the pathological insults. PMID:26292938

  10. Lichenase and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-15

    The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-β-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.

  11. Short unligated sticky ends enable the observation of circularised DNA by atomic force and electron microscopies.

    PubMed

    Révet, B; Fourcade, A

    1998-05-01

    A comparative study of the stabilisation of DNA sticky ends by divalent cations was carried out by atomic force microscopy (AFM), electron microscopy and agarose gel electrophoresis. At room temperature, molecules bearing such extremities are immediately oligomerised or circularised by addition of Mg2+ or Ca2+. This phenomenon, more clearly detected by AFM, requires the presence of uranyl salt, which stabilises the structures induced by Mg2+ or Ca2+. DNA fragments were obtained by restriction enzymes producing sticky ends of 2 or 4 nucleotides (nt) in length with different guanine plus cytosine (GC) contents. The stability of the pairing is high when ends of 4 nt display a 100% GC-content. In that case, 95% of DNA fragments are maintained circular by the divalent cations, although 2 nt GC-sticky ends are sufficient for a stable pairing. DNA fragments with one blunt end and the other sticky appear as dimers in the presence of Mg2+. Dimerisation was analysed by varying the lengths and concentrations of DNA fragments, the base composition of the sticky ends, and also the temperature. Our observation provides a new powerful tool for construction of inverted dimers, and circularisation, ligation analysis or short bases sequence interaction studies. PMID:9547265

  12. Analysis of the complete DNA sequence of murine cytomegalovirus.

    PubMed Central

    Rawlinson, W D; Farrell, H E; Barrell, B G

    1996-01-01

    The complete DNA sequence of the Smith strain of murine cytomegalovirus (MCMV) was determined from virion DNA by using a whole-genome shotgun approach. The genome has an overall G+C content of 58.7%, consists of 230,278 bp, and is arranged as a single unique sequence with short (31-bp) terminal direct repeats and several short internal repeats. Significant similarity to the genome of the sequenced human cytomegalovirus (HCMV) strain AD169 is evident, particularly for 78 open reading frames encoded by the central part of the genome. There is a very similar distribution of G+C content across the two genomes. Sequences toward the ends of the MCMV genome encode tandem arrays of homologous glycoproteins (gps) arranged as two gene families. The left end encodes 15 gps that represent one family, and the right end encodes a different family of 11 gps. A homolog (m144) of cellular major histocompatibility complex (MHC) class I genes is located at the end of the genome opposite the HCMV MHC class I homolog (UL18). G protein-coupled receptor (GCR) homologs (M33 and M78) occur in positions congruent with two (UL33 and UL78) of the four putative HCMV GCR homologs. Counterparts of all of the known enzyme homologs in HCMV are present in the MCMV genome, including the phosphotransferase gene (M97), whose product phosphorylates ganciclovir in HCMV-infected cells, and the assembly protein (M80). PMID:8971012

  13. Time scale for cyclostome evolution inferred with a phylogenetic diagnosis of hagfish and lamprey cDNA sequences.

    PubMed

    Kuraku, Shigehiro; Kuratani, Shigeru

    2006-12-01

    The Cyclostomata consists of the two orders Myxiniformes (hagfishes) and Petromyzoniformes (lampreys), and its monophyly has been unequivocally supported by recent molecular phylogenetic studies. Under this updated vertebrate phylogeny, we performed in silico evolutionary analyses using currently available cDNA sequences of cyclostomes. We first calculated the GC-content at four-fold degenerate sites (GC(4)), which revealed that an extremely high GC-content is shared by all the lamprey species we surveyed, whereas no striking pattern in GC-content was observed in any of the hagfish species surveyed. We then estimated the timing of diversification in cyclostome evolution using nucleotide and amino acid sequences. We obtained divergence times of 470-390 million years ago (Mya) in the Ordovician-Silurian-Devonian Periods for the interordinal split between Myxiniformes and Petromyzoniformes; 90-60 Mya in the Cretaceous-Tertiary Periods for the split between the two hagfish subfamilies, Myxininae and Eptatretinae; 280-220 Mya in the Permian-Triassic Periods for the split between the two lamprey subfamilies, Geotriinae and Petromyzoninae; and 30-10 Mya in the Tertiary Period for the split between the two lamprey genera, Petromyzon and Lethenteron. This evolutionary configuration indicates that Myxiniformes and Petromyzoniformes diverged shortly after the common ancestor of cyclostomes split from the future gnathostome lineage. Our results also suggest that intra-subfamilial diversification in hagfish and lamprey lineages (especially those distributed in the northern hemisphere) occurred in the Cretaceous or Tertiary Periods. PMID:17261918
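    GC-content at four-fold degenerate sites (GC(4)) can be sketched as follows. The abstract does not say how the sites were identified, so this sketch assumes the standard genetic code, an in-frame coding sequence, and the eight codon families whose third position is fully degenerate; the function name `gc4` is hypothetical.

```python
# First two bases of the eight four-fold degenerate codon families in the
# standard genetic code: Leu (CT), Val (GT), Ser (TC), Pro (CC), Thr (AC),
# Ala (GC), Arg (CG) and Gly (GG).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def gc4(cds: str) -> float:
    """G+C fraction at third positions of four-fold degenerate codons.

    Assumes cds starts in frame; a trailing partial codon is ignored.
    Returns NaN when the sequence contains no four-fold degenerate site.
    """
    cds = cds.upper()
    gc = total = 0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            total += 1
            gc += codon[2] in "GC"
    return gc / total if total else float("nan")
```

    For instance, in `GCCGCAATG` only the two alanine codons contribute a site, and one of the two third positions is G or C.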

  14. Lactococcus lactis carrying the pValac DNA expression vector coding for IL-10 reduces inflammation in a murine model of experimental colitis

    PubMed Central

    2014-01-01

    Background Inflammatory bowel diseases (IBD) are intestinal disorders characterized by inflammation in the gastrointestinal tract. Interleukin-10 is one of the most important anti-inflammatory cytokines involved in the intestinal immune system and because of its role in downregulating inflammatory cascades, its potential for IBD therapy is under study. We previously presented the development of an invasive strain of Lactococcus lactis (L. lactis) producing Fibronectin Binding Protein A (FnBPA) which was capable of delivering, directly to host cells, a eukaryotic DNA expression vector coding for IL-10 of Mus musculus (pValac:il-10) and diminish inflammation in a trinitrobenzene sulfonic acid (TNBS)-induced mouse model of intestinal inflammation. As a new therapeutic strategy against IBD, the aim of this work was to evaluate the therapeutic effect of two L. lactis strains (the same invasive strain evaluated previously and the wild-type strain) carrying the therapeutic pValac:il-10 plasmid in the prevention of inflammation in a dextran sodium sulphate (DSS)-induced mouse model. Results Results obtained showed that not only delivery of the pValac:il-10 plasmid by the invasive strain L. lactis MG1363 FnBPA+, but also by the wild-type strain L. lactis MG1363, was effective at diminishing intestinal inflammation (lower inflammation scores and higher IL-10 levels in the intestinal tissues, accompanied by decrease of IL-6) in the DSS-induced IBD mouse model. Conclusions Administration of both L. lactis strains carrying the pValac:il-10 plasmid was effective at diminishing inflammation in this murine model of experimental colitis, showing their potential for therapeutic intervention of IBD. PMID:25106058

  15. The influence of protein coding sequences on protein folding rates of all-β proteins.

    PubMed

    Li, Rui Fang; Li, Hong

    2011-06-01

    It is currently believed that the protein folding rate is related to the protein's structure and its amino acid sequence. However, few studies have examined whether the protein folding rate is influenced by the corresponding mRNA sequence. In this paper, we analyzed the possible relationship between protein folding rates and the corresponding mRNA sequences. The content of guanine and cytosine (GC content) of palindromes in the protein coding sequence was introduced as a new parameter and added to Gromiha's model for predicting protein folding rates to inspect its effect on the protein folding process. The multiple linear regression analysis and jack-knife test show that the new parameter is significant. The linear correlation coefficient between the experimental and the predicted values of the protein folding rates increased significantly from 0.96 to 0.99, and the population variance decreased from 0.50 to 0.24 compared with Gromiha's results. The results show that the GC content of palindromes in the corresponding protein coding sequence really influences the protein folding rate. Further analysis indicates that this kind of effect mostly comes from the synonymous codon usage and from the information of palindrome structure itself, but not from the translation information from codons to amino acids. PMID:21613670

  16. DNA sequences, recombinant DNA molecules and processes producing human phospholipase inhibitor polypeptides

    SciTech Connect

    Wallner, B.P.; Pepinsky, R.B.; Garwin, J.L.

    1989-11-07

    This patent describes a recombinant DNA molecule. It comprises a DNA sequence coding for a phospholipase inhibitor polypeptide and being selected from the group consisting of: the cDNA insert of ALC, DNA sequences which code on expression for a phospholipase inhibitor, and DNA sequences which are degenerate as a result of the genetic code to either of the foregoing DNA sequences and which code on expression for a phospholipase inhibitor.

  17. The Cipher Code of Simple Sequence Repeats in "Vampire Pathogens".

    PubMed

    Zou, Geng; Bello-Orti, Bernardo; Aragon, Virginia; Tucker, Alexander W; Luo, Rui; Ren, Pinxing; Bi, Dingren; Zhou, Rui; Jin, Hui

    2015-01-01

    Blood inside mammals is a forbidden area for the majority of prokaryotic microbes; however, red blood cells tropism microbes, like "vampire pathogens" (VP), succeed in matching scarce nutrients and surviving strong immunity reactions. Here, we found VP of Mycoplasma, Rhizobiales, and Rickettsiales showed significantly higher counts of (AG)n dimeric simple sequence repeats (Di-SSRs) in the genomes, coding and non-coding regions than non Vampire Pathogens (N_VP). Regression analysis indicated a significant correlation between GC content and the span of (AG)n-Di-SSR variation. Gene Ontology (GO) terms with abundance of (AG)3-Di-SSRs shared by the VP strains were associated with purine nucleotide metabolism (FDR < 0.01), indicating an adaptation to the limited availability of purine and nucleotide precursors in blood. Di-amino acids coded by (AG)n-Di-SSRs included all three six-fold code amino acids (Arg, Leu and Ser) and significantly higher counts of Di-amino acids coded by (AG)3, (GA)3, and (TC)3 in VP than N_VP. Furthermore, significant differences (P < 0.001) on the numbers of triplexes formed from (AG)n-Di-SSRs between VP and N_VP in Mycoplasma suggested the potential role of (AG)n-Di-SSRs in gene regulation. PMID:26215592
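    Counting (AG)n dimeric simple sequence repeats can be illustrated with a regular expression. This is only a sketch: the minimum repeat number, the restriction to the given strand, and the function name `find_di_ssrs` are assumptions, since the abstract does not give the detection criteria used in the study.

```python
import re


def find_di_ssrs(seq: str, unit: str = "AG", min_repeats: int = 3):
    """Return (start, repeat_count) for each run of `unit` repeated
    at least `min_repeats` times in tandem on the given strand."""
    # Builds a pattern such as (?:AG){3,} for the default arguments.
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    return [
        (m.start(), len(m.group()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]
```

    With the default threshold, `find_di_ssrs("TTAGAGAGCCAGAGT")` reports the (AG)3 run starting at position 2 and skips the shorter (AG)2 run; lowering `min_repeats` to 2 reports both.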

  18. Deciphering the Combinatorial DNA-binding Code of the CCAAT-binding Complex and the Iron-regulatory Basic Region Leucine Zipper (bZIP) Transcription Factor HapX*

    PubMed Central

    Hortschansky, Peter; Ando, Eriko; Tuppatsch, Katja; Arikawa, Hisashi; Kobayashi, Tetsuo; Kato, Masashi; Haas, Hubertus; Brakhage, Axel A.

    2015-01-01

    The heterotrimeric CCAAT-binding complex (CBC) is evolutionarily conserved in eukaryotic organisms, including fungi, plants, and mammals. The CBC consists of three subunits, which are named in the filamentous fungus Aspergillus nidulans HapB, HapC, and HapE. HapX, a fourth CBC subunit, was identified exclusively in fungi, except for Saccharomyces cerevisiae and the closely related Saccharomycotina species. The CBC-HapX complex acts as the master regulator of iron homeostasis. HapX belongs to the class of basic region leucine zipper transcription factors. We demonstrated that the CBC and HapX bind cooperatively to bipartite DNA motifs with a general HapX/CBC/DNA 2:1:1 stoichiometry in a class of genes that are repressed by HapX-CBC in A. nidulans during iron limitation. This combinatorial binding mode requires protein-protein interaction between the N-terminal domain of HapE and the N-terminal CBC binding domain of HapX as well as sequence-specific DNA binding of both the CBC and HapX. Initial binding of the CBC to CCAAT boxes is mandatory for DNA recognition of HapX. HapX specifically targets the minimal motif 5′-GAT-3′, which is located at a distance of 11–12 bp downstream of the respective CCAAT box. Single nucleotide substitutions at the 5′- and 3′-end of the GAT motif as well as different spacing between the CBC and HapX DNA-binding sites revealed a remarkable promiscuous DNA-recognition mode of HapX. This flexible DNA-binding code may have evolved as a mechanism for fine-tuning the transcriptional activity of CBC-HapX at distinct target promoters. PMID:25589790
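    The bipartite arrangement described above, a CCAAT box followed by a 5′-GAT-3′ motif 11–12 bp downstream, can be searched for with a short scan. The sketch below rests on assumptions not stated in the abstract: the distance is counted as the number of bases between the last base of CCAAT and the first base of GAT, only the given strand is scanned, and the function name is hypothetical.

```python
import re


def find_bipartite_sites(seq: str, gaps=(11, 12)):
    """Return (ccaat_start, gat_start) pairs where GAT begins `gap` bases
    after the end of a CCAAT box, for any gap in `gaps`."""
    seq = seq.upper()
    hits = []
    # The lookahead lets overlapping CCAAT occurrences be found as well.
    for m in re.finditer(r"(?=CCAAT)", seq):
        box_end = m.start() + len("CCAAT")
        for gap in gaps:
            gat_start = box_end + gap
            if seq[gat_start:gat_start + 3] == "GAT":
                hits.append((m.start(), gat_start))
    return hits
```

    Because the abstract reports that HapX tolerates substitutions and altered spacing, an exact-match scan like this one would miss many functional sites; it only illustrates the minimal motif geometry.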

  19. Genome size and DNA base composition of geophytes: the mirror of phenology and ecology?

    PubMed Central

    Veselý, Pavel; Bureš, Petr; Šmarda, Petr; Pavlíček, Tomáš

    2012-01-01

    Background and Aims Genome size is known to affect various plant traits such as stomatal size, seed mass, and flower or shoot phenology. However, these associations are not well understood for species with very large genomes, which are largely represented by geophytic plants. No detailed associations are known between DNA base composition and genome size or species ecology. Methods Genome sizes and GC contents were measured in 219 geophytes together with tentative morpho-anatomical and ecological traits. Key Results Increased genome size was associated with earliness of flowering and tendency to grow in humid conditions, and there was a positive correlation between an increase in stomatal size in species with extremely large genomes. Seed mass of geophytes was closely related to their ecology, but not to genomic parameters. Genomic DNA GC content showed a unimodal relationship with genome size but no relationship with species ecology. Conclusions Evolution of genome size in geophytes is closely related to their ecology and phenology and is also associated with remarkable changes in DNA base composition. Although geophytism together with producing larger cells appears to be an advantageous strategy for fast development of an organism in seasonal habitats, the drought sensitivity of large stomata may restrict the occurrence of geophytes with very large genomes to regions not subject to water stress. PMID:22021815

  20. A robust two-step PCR method of template DNA production for high-throughput cell-free protein synthesis.

    PubMed

    Yabuki, Takashi; Motoda, Yoko; Hanada, Kazuharu; Nunokawa, Emi; Saito, Miyuki; Seki, Eiko; Inoue, Makoto; Kigawa, Takanori; Yokoyama, Shigeyuki

    2007-12-01

    A two-step PCR method has been developed for the robust, high-throughput production of linear templates ready for cell-free protein synthesis. The construct made from the cDNA expresses a target protein region with N- and/or C-terminal tags. The procedure consists only of mixing, dilution, and PCR steps, and is free from cloning and purification steps. In the first step of the two-step PCR, a target region within the coding sequence is amplified using two gene-specific forward and reverse primers, which contain the linker sequences and the terminal sequences of the target region. The second PCR concatenates the first PCR product with the N- and C-terminal double-stranded fragments, which contain the linker sequences as well as the sequences for the tag(s) and the initiation and termination, respectively, for T7 transcription and ribosomal translation, and amplifies it with the universal primer. Proteins can be fused with a variety of tags, such as natural poly-histidine, glutathione-S-transferase, maltose-binding protein, and/or streptavidin-binding peptide. The two-step PCR method was successfully applied to 42 human target protein regions with various GC contents (38-77%). The robustness of the two-step PCR method against possible fluctuations of experimental conditions in practical use was explored. The second PCR product was obtained at 60-120 μg/ml, and was used without purification as a template at a concentration of 2-4 μg/ml in an Escherichia coli coupled transcription-translation system. This combination of two-step PCR with cell-free protein synthesis is suitable for the rapid production of proteins in milligram quantities for genome-scale studies. PMID:18167031

  1. Molecular cloning of amyloid cDNA derived from mRNA of the Alzheimer disease brain: coding and noncoding regions of the fetal precursor mRNA are expressed in the cortex

    SciTech Connect

    Zain, S.B.; Salim, M.; Chou, W.G.; Sajdel-Sulkowska, E.M.; Majocha, R.E.; Marotta, C.A.

    1988-02-01

    To gain insight into factors associated with the excessive accumulation of β-amyloid in the Alzheimer disease (AD) brain, the present studies were initiated to distinguish between a unique primary structure of the AD-specific amyloid precursor mRNA vis a vis other determinants that may affect amyloid levels. Previous molecular cloning experiments focused on amyloid derived from sources other than AD cases. In the present work, the authors cloned and characterized amyloid cDNA derived directly from AD brain mRNA. Poly(A)+ RNA from AD cortices was used for the preparation of λgt11 recombinant cDNA libraries. An insert of 1564 nucleotides was isolated that included the β-amyloid domain and corresponded to 75% of the coding region and approximately 70% of the 3'-noncoding region of the fetal precursor amyloid cDNA reported by others. On RNA blots, the AD amyloid mRNA consisted of a doublet of 3.2 and 3.4 kilobases. In control and AD cases, the amyloid mRNA levels were nonuniform and were independent of glial-specific mRNA levels. Based on the sequence analysis data, they conclude that a segment of the amyloid gene is expressed in the AD cortex as a high molecular weight precursor mRNA with major coding and 3'-noncoding regions that are identical to the fetal brain gene product.

  2. Molecular cloning and characterization of the cDNA coding for the biotin-containing subunit of 3-methylcrotonoyl-CoA carboxylase: identification of the biotin carboxylase and biotin-carrier domains.

    PubMed Central

    Song, J; Wurtele, E S; Nikolau, B J

    1994-01-01

    Soybean genomic clones were isolated based on hybridization to probes that code for the conserved biotinylation domain of biotin-containing enzymes. The corresponding cDNA was isolated and expressed in Escherichia coli through fusion to the bacterial trpE gene. The resulting chimeric protein was biotinylated in E. coli. Antibodies raised against the chimeric protein reacted specifically with an 85-kDa biotin-containing polypeptide from soybean and inhibited 3-methylcrotonoyl-CoA carboxylase (EC 6.4.1.4) activity in cell-free extracts of soybean leaves. Thus, the isolated soybean gene and corresponding cDNA code for the 85-kDa biotin-containing subunit of 3-methylcrotonoyl-CoA carboxylase. The nucleotide sequence of the cDNA and portions of the genomic clones was determined. Comparison of the deduced amino acid sequence of the biotin-containing subunit of 3-methylcrotonoyl-CoA carboxylase with sequences of other biotin enzymes suggests that this subunit contains the functional domains for the first half-reaction catalyzed by all biotin-dependent carboxylases--namely, the carboxylation of biotin. These domains are arranged serially on the polypeptide, with the biotin carboxylase domain at the amino terminus and the biotin-carboxyl carrier domain at the carboxyl terminus. PMID:8016064

  3. Clinical coding. Code breakers.

    PubMed

    Mathieson, Steve

    2005-02-24

    --The advent of payment by results has seen the role of the clinical coder pushed to the fore in England. --Examinations for a clinical coding qualification began in 1999. In 2004, approximately 200 people took the qualification. --Trusts are attracting people to the role by offering training from scratch or through modern apprenticeships. PMID:15768716

  4. Compositional changes in RNA, DNA and proteins for bacterial adaptation to higher and lower temperatures.

    PubMed

    Nakashima, Hiroshi; Fukuchi, Satoshi; Nishikawa, Ken

    2003-04-01

    It is known that in thermophiles the G+C content of ribosomal RNA linearly correlates with growth temperature, while that of genomic DNA does not. Although the G+C contents (singlet) of the genomic DNAs of thermophiles and mesophiles do not differ significantly, the dinucleotide (doublet) compositions of the two bacterial groups clearly do. The average amino acid compositions of proteins of the two groups are also distinct. Based on these facts, we here analyzed the DNA and protein compositions of various bacteria in terms of the optimal growth temperature (OGT). Regression analyses of the sequence data for thermophilic, mesophilic and psychrophilic bacteria revealed good linear relationships between OGT and the dinucleotide compositions of DNA, and between OGT and the amino acid compositions of proteins. Together with the above-mentioned linear relationship between ribosomal RNA and OGT, the DNA and protein compositions can be regarded as thermostability measures for RNA, DNA and proteins, covering a wide range of temperatures. Both the DNA and proteins of psychrophiles apparently exhibit characteristics diametrically opposite to those of thermophiles. The physicochemical parameters of dinucleotides suggested that supercoiling of DNA is relevant to its thermostability. Protein stability in thermophiles is realized primarily through global changes that increase charged residues (i.e., Glu, Arg, and Lys) on the molecular surface of all proteins. This kind of global change is attainable through a change in the amino acid composition coupled with alterations in the DNA base composition. The general strategies of thermophiles and psychrophiles for adaptation to higher and lower temperatures, respectively, that are suggested by the present study are discussed. PMID:12761299
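    The distinction drawn above between singlet (G+C) and doublet (dinucleotide) composition can be made concrete with a small sketch. It counts overlapping dinucleotides on a single strand and skips any pair containing a non-ACGT symbol; these choices and the function name are assumptions for illustration, not the procedure used in the study.

```python
from collections import Counter


def dinucleotide_freqs(seq: str) -> dict[str, float]:
    """Relative frequencies of overlapping dinucleotides in seq."""
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if set(p) <= set("ACGT"))
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}
```

    Two sequences with identical G+C content can still differ in this table: `AAGC` and `ACAG` are both 50% G+C, yet the first contains the dinucleotides AA, AG and GC while the second contains AC, CA and AG.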

  5. Virus-coded origin of a 32,000-dalton protein from avian retrovirus cores: structural relatedness of p32 and the beta polypeptide of the avian retrovirus DNA polymerase.

    PubMed Central

    Schiff, R D; Grandgenett, D P

    1978-01-01

    A 32,000-dalton protein (p32) located in avian retrovirus cores was immunoprecipitated from [35S]methionine-labeled avian myeloblastosis virus (AMV) propagated in cultured chicken embryo fibroblast cells by an antiserum preparation (sarc III) derived from tumor-bearing hamsters injected with cloned and passaged cells from an avian sarcoma virus-induced primary hamster tumor. Since sarc III serum apparently contained antibodies only to virus-coded proteins and not to chicken cellular proteins, the immunoprecipitation of p32 from AMV by sarc III serum strongly suggested that p32 is virus coded. The origin of p32 was more definitively established by demonstrating the existence of a structural relationship between p32 and the AMV DNA polymerase. AMV p32 cross-reacted with the beta polypeptide of AMV αβ DNA polymerase in radioimmunoprecipitation and radioimmunoprecipitation inhibition assays, indicating that p32 and beta share common antigenic determinants. This relationship was clarified by sodium dodecyl sulfate-polyacrylamide gel electrophoretic analysis of the peptides generated by limited proteolysis of 125I-labeled AMV DNA polymerase polypeptides and of 125I-labeled AMV p32 by chymotrypsin or Staphylococcus aureus V-8 protease. The peptides which appeared during proteolytic digestion of p32 were a subset of those produced by digestion of the beta polypeptide; however, p32 had no discernible peptides in common with the alpha polypeptide. Further, all of the peptides produced by limited proteolysis of beta were present in the digests of either p32 or alpha. Our findings suggest that p32 is apparently derived by cleavage of the beta polypeptide of AMV DNA polymerase, presumably at a site near or identical to that at which alpha is generated from beta by proteolytic cleavage. PMID:81316

  6. ITS1: a DNA barcode better than ITS2 in eukaryotes?

    PubMed

    Wang, Xin-Cun; Liu, Chang; Huang, Liang; Bengtsson-Palme, Johan; Chen, Haimei; Zhang, Jian-Hui; Cai, Dayong; Li, Jian-Qin

    2015-05-01

    A DNA barcode is a short piece of DNA sequence used for species determination and discovery. The internal transcribed spacer (ITS/ITS2) region has been proposed as the standard DNA barcode for fungi and seed plants and has been widely used in DNA barcoding analyses for other biological groups, for example algae, protists and animals. The ITS region consists of both ITS1 and ITS2 regions. Here, a large-scale meta-analysis was carried out to compare ITS1 and ITS2 from three aspects: PCR amplification, DNA sequencing and species discrimination, in terms of the presence of DNA barcoding gaps, species discrimination efficiency, sequence length distribution, GC content distribution and primer universality. In total, 85 345 sequence pairs in 10 major groups of eukaryotes, including ascomycetes, basidiomycetes, liverworts, mosses, ferns, gymnosperms, monocotyledons, eudicotyledons, insects and fishes, covering 611 families, 3694 genera, and 19 060 species, were analysed. Using similarity-based methods, we calculated species discrimination efficiencies for ITS1 and ITS2 in all major groups, families and genera. Using Fisher's exact test, we found that ITS1 has significantly higher efficiencies than ITS2 in 17 of the 47 families and 20 of the 49 genera, which are sample-rich. By in silico PCR amplification evaluation, primer universality of the extensively applied ITS1 primers was found to be superior to that of ITS2 primers. Additionally, a shorter amplification product and lower GC content were found to be two further advantages of ITS1 for sequencing. In summary, ITS1 represents a better DNA barcode than ITS2 for eukaryotic species. PMID:25187125

  7. Preparation of Proper Immunogen by Cloning and Stable Expression of cDNA coding for Human Hematopoietic Stem Cell Marker CD34 in NIH-3T3 Mouse Fibroblast Cell Line

    PubMed Central

    Shafaghat, Farzaneh; Abbasi-Kenarsari, Hajar; Majidi, Jafar; Movassaghpour, Ali Akbar; Shanehbandi, Dariush; Kazemi, Tohid

    2015-01-01

    Purpose: Transmembrane CD34 glycoprotein is the most important marker for identification, isolation and enumeration of hematopoietic stem cells (HSCs). We aimed in this study to clone the cDNA coding for human CD34 from KG1a cell line and stably express in mouse fibroblast cell line NIH-3T3. Such artificial cell line could be useful as proper immunogen for production of mouse monoclonal antibodies. Methods: CD34 cDNA was cloned from KG1a cell line after total RNA extraction and cDNA synthesis. Pfu DNA polymerase-amplified specific band was ligated to pGEMT-easy TA-cloning vector and sub-cloned in pCMV6-Neo expression vector. After transfection of NIH-3T3 cells using 3 μg of recombinant construct and 6 μl of JetPEI transfection reagent, stable expression was obtained by selection of cells by G418 antibiotic and confirmed by surface flow cytometry. Results: 1158 bp specific band was aligned completely to reference sequence in NCBI database corresponding to long isoform of human CD34. Transient and stable expression of human CD34 on transfected NIH-3T3 mouse fibroblast cells was achieved (25% and 95%, respectively) as shown by flow cytometry. Conclusion: Cloning and stable expression of human CD34 cDNA was successfully performed and validated by standard flow cytometric analysis. Due to murine origin of NIH-3T3 cell line, CD34-expressing NIH-3T3 cells could be useful as immunogen in production of diagnostic monoclonal antibodies against human CD34. This approach could bypass the need for purification of recombinant proteins produced in eukaryotic expression systems. PMID:25789221

  8. DNA polymorphism in morels: complete sequences of the internal transcribed spacer of genes coding for rRNA in Morchella esculenta (yellow morel) and Morchella conica (black morel).

    PubMed Central

    Wipf, D; Munch, J C; Botton, B; Buscot, F

    1996-01-01

    The internal transcribed spacer (ITS) of the gene coding for rRNA was sequenced in both directions with the gene walking technique in a black morel (Morchella conica) and a yellow morel (M. esculenta) to elucidate the ITS length discrepancy between the two species groups (750-bp ITS in black morels and 1,150-bp ITS in yellow morels). PMID:8795250

  9. Speech coding

    NASA Astrophysics Data System (ADS)

    Gersho, Allen

    1990-05-01

    Recent advances in algorithms and techniques for speech coding now permit high quality voice reproduction at remarkably low bit rates. The advent of powerful single-chip signal processors has made it cost effective to implement these new and sophisticated speech coding algorithms for many important applications in voice communication and storage. Some of the main ideas underlying the algorithms of major interest today are reviewed. The concept of removing redundancy by linear prediction is reviewed, first in the context of predictive quantization or DPCM. Then linear predictive coding, adaptive predictive coding, and vector quantization are discussed. The concepts of excitation coding via analysis-by-synthesis, vector sum excitation codebooks, and adaptive postfiltering are explained. The main ideas of vector excitation coding (VXC) or code excited linear prediction (CELP) are presented. Finally, low-delay VXC coding and phonetic segmentation for VXC are described.

  10. GC-Rich Extracellular DNA Induces Oxidative Stress, Double-Strand DNA Breaks, and DNA Damage Response in Human Adipose-Derived Mesenchymal Stem Cells

    PubMed Central

    Kostyuk, Svetlana; Smirnova, Tatiana; Kameneva, Larisa; Porokhovnik, Lev; Speranskij, Anatolij; Ershova, Elizaveta; Stukalov, Sergey; Izevskaya, Vera; Veiko, Natalia

    2015-01-01

    Background. Cell free DNA (cfDNA) circulates throughout the bloodstream of both healthy people and patients with various diseases. CfDNA is substantially enriched in its GC-content as compared with human genomic DNA. Principal Findings. Exposure of haMSCs to GC-DNA induces short-term oxidative stress (determined with H2DCFH-DA) and results in both single- and double-strand DNA breaks (comet assay and γH2AX foci). As a result, the expression of repair genes (BRCA1 (RT-PCR), PCNA (FACS)) and antiapoptotic genes (BCL2 (RT-PCR and FACS), BCL2A1, BCL2L1, BIRC3, and BIRC2 (RT-PCR)) increases significantly in the cells. Under the action of GC-DNA, the mitochondrial potential increased. Here we show that GC-rich extracellular DNA stimulates adipocyte differentiation of human adipose-derived mesenchymal stem cells (haMSCs). Exposure to GC-DNA leads to an increase in the RNA levels of PPARG2 and LPL (RT-PCR), in the level of fatty acid binding protein FABP4 (FACS analysis) and in the level of fat (Oil Red O). Conclusions. GC-rich fragments in the pool of cfDNA can potentially induce oxidative stress and a DNA damage response and affect the direction of mesenchymal stem cell differentiation in human adipose-derived mesenchymal stem cells. Such a response may be one of the causes of obesity or osteoporosis. PMID:26273425

  11. Uplink Coding

    NASA Technical Reports Server (NTRS)

    Pollara, Fabrizio; Hamkins, Jon; Dolinar, Sam; Andrews, Ken; Divsalar, Dariush

    2006-01-01

    This viewgraph presentation reviews uplink coding. The purpose and goals of the briefing are to (1) show a plan for using uplink coding and describe its benefits; (2) define possible solutions and their applicability to different types of uplink, including emergency uplink; (3) concur with our conclusions so we can embark on a plan to use the proposed uplink system; (4) identify the need for the development of appropriate technology and its infusion in the DSN; and (5) gain advocacy to implement uplink coding in flight projects. Action Item EMB04-1-14: show a plan for using uplink coding, including showing where it is useful or not (include discussion of emergency uplink coding).

  12. Promoter-restricted histone code, not the differentially methylated DNA regions or antisense transcripts, marks the imprinting status of IGF2R in human and mouse.

    PubMed

    Vu, Thanh H; Li, Tao; Hoffman, Andrew R

    2004-10-01

    Imprinting of the mouse Igf2r depends upon an intronic differentially methylated DNA region (DMR) and the presence of the Air antisense transcript. However, biallelic expression of mouse Igf2r in brain occurs despite the presence of Air, and biallelic expression of human IGF2R in peripheral tissues occurs despite the presence of an intronic DMR. We examined histone modifications throughout the mouse and human Igf2r/IGF2R using chromatin immuno-precipitation (ChIP) assays in combination with quantitative real time PCR. Methylation of Lys4 and Lys9 of histone H3 in the promoter regions marks the active and silenced alleles, respectively. We measured di- and tri-methyl Lys4 and Lys9 across the Igf2r and Air promoters. While both di- and tri-methyl Lys4 marked the active Igf2r and the active Air allele, tri-methyl Lys9, but not di-methyl Lys9, marked the suppressed Air allele. We show here that enrichment of parental allele-specific histone modifications in the promoter region, rather than the presence of DNA methylation or antisense transcription, correctly identifies the tissue- and species- specific imprinting status of Igf2r/IGF2R. We discuss these findings in light of recent progress in identifying specific components of the epigenetic marks in imprinted genes. PMID:15294879

  13. The coding region of the UFGT gene is a source of diagnostic SNP markers that allow single-locus DNA genotyping for the assessment of cultivar identity and ancestry in grapevine (Vitis vinifera L.)

    PubMed Central

    2013-01-01

    Background Vitis vinifera L. is one of society’s most important agricultural crops with a broad genetic variability. The difficulty in recognizing grapevine genotypes based on ampelographic traits and secondary metabolites prompted the development of molecular markers suitable for achieving variety genetic identification. Findings Here, we propose a comparison between a multi-locus barcoding approach based on six chloroplast markers and a single-copy nuclear gene sequencing method using five coding regions combined with a character-based system with the aim of reconstructing cultivar-specific haplotypes and genotypes to be exploited for the molecular characterization of 157 V. vinifera accessions. The analysis of the chloroplast target regions proved the inadequacy of the DNA barcoding approach at the subspecies level, and hence further DNA genotyping analyses were targeted on the sequences of five nuclear single-copy genes amplified across all of the accessions. The sequencing of the coding region of the UFGT nuclear gene (UDP-glucose: flavonoid 3-O-glucosyltransferase, the key enzyme for the accumulation of anthocyanins in berry skins) enabled the discovery of discriminant SNPs (1/34 bp) and the reconstruction of 130 V. vinifera distinct genotypes. Most of the genotypes proved to be cultivar-specific, and only a few genotypes were shared by several closely related cultivars. Conclusion On the whole, this technique was successful for inferring SNP-based genotypes of grapevine accessions suitable for assessing the genetic identity and ancestry of international cultivars and also useful for corroborating some hypotheses regarding the origin of local varieties, suggesting several issues of misidentification (synonymy/homonymy). PMID:24298902

  14. Cloning and sequence analysis of the coding sequence of β-actin cDNA from the Chinese alligator and suitable internal reference primers from the β-actin gene.

    PubMed

    Zhu, H N; Zhang, S Z; Zhou, Y K; Wang, C L; Wu, X B

    2015-01-01

    β-Actin is an essential component of the cytoskeleton and is stably expressed in various tissues of animals, thus, it is commonly used as an internal reference for gene expression studies. In this study, a 1731-bp fragment of β-actin cDNA from Alligator sinensis was obtained using the homology cloning technique. Sequence analysis showed that this fragment contained the complete coding sequence of the β-actin gene (1128 bp), encoding 375 amino acids. The amino acid sequence of β-actin is highly conserved and its nucleotide sequence is slightly variable. Multiple alignment analyses showed that the nucleotide sequence of the β-actin gene from A. sinensis is very similar to sequences from birds, with 94-95% identity. Ten pairs of primers with different product sizes and different annealing temperatures were screened by PCR amplification, agarose gel electrophoresis, and DNA sequencing, and could be used as internal reference primers in gene expression studies. This study expands our knowledge of β-actin gene phylogenetic evolution and provides a basis for quantitative gene expression studies in A. sinensis. PMID:26505364
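
    The reported lengths can be cross-checked with simple codon arithmetic. The sketch below assumes (the abstract does not say so) that the 1128-bp coding sequence includes the stop codon; the function name is hypothetical.

```python
def cds_length_bp(n_amino_acids: int, include_stop: bool = True) -> int:
    """Length in base pairs of a coding sequence for a protein of n_amino_acids residues.

    Each amino acid is encoded by one 3-bp codon; the stop codon adds 3 bp
    when it is counted as part of the coding sequence (an assumption here).
    """
    codons = n_amino_acids + (1 if include_stop else 0)
    return 3 * codons


# 375 amino acids plus one stop codon: (375 + 1) * 3 bp
print(cds_length_bp(375))  # 1128
```

    Under that assumption, 375 residues correspond to exactly the 1128 bp stated in the abstract.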

  15. Phylogenetic footprinting of non-coding RNA: hammerhead ribozyme sequences in a satellite DNA family of Dolichopoda cave crickets (Orthoptera, Rhaphidophoridae)

    PubMed Central

    2010-01-01

    Background The great variety in sequence, length, complexity, and abundance of satellite DNA has made it difficult to ascribe any function to this genome component. Recent studies have shown that satellite DNA can be transcribed and be involved in regulation of chromatin structure and gene expression. Some satellite DNAs, such as the pDo500 sequence family in Dolichopoda cave crickets, have a catalytic hammerhead (HH) ribozyme structure and activity embedded within each repeat. Results We assessed the phylogenetic footprints of the HH ribozyme within the pDo500 sequences from 38 different populations representing 12 species of Dolichopoda. The HH region was significantly more conserved than the non-hammerhead (NHH) region of the pDo500 repeat. In addition, stems were more conserved than loops. In stems, several compensatory mutations were detected that maintain base pairing. The core region of the HH ribozyme was affected by very few nucleotide substitutions and the cleavage position was altered only once among 198 sequences. RNA folding of the HH sequences revealed that a potentially active HH ribozyme can be found in most of the Dolichopoda populations and species. Conclusions The phylogenetic footprints suggest that the HH region of the pDo500 sequence family is selected for function in Dolichopoda cave crickets. However, the functional role of HH ribozymes in eukaryotic organisms is unclear. The possible functions have been related to trans cleavage of an RNA target by a ribonucleoprotein and regulation of gene expression. Whether the HH ribozyme in Dolichopoda is involved in similar functions remains to be investigated. Future studies need to demonstrate how the observed nucleotide changes and evolutionary constraint have affected the catalytic efficiency of the hammerhead. PMID:20047671

  16. Influence of the sequence on elastic properties of long DNA chains

    NASA Astrophysics Data System (ADS)

    Vaillant, C.; Audit, B.; Thermes, C.; Arnéodo, A.

    2003-03-01

    We revisit the results of single-molecule DNA stretching experiments using a rodlike chain (RLC) model that explicitly includes some intrinsic structural disorder induced by the sequence. The investigation of artificial and real genomic sequences shows that the wormlike chain model reproduces quite well the data but with an effective bend stiffness Aeff, which underestimates the true elastic bend stiffness A, independently of the elastic twist stiffness C. Mainly dominated by the amplitude of the structural disorder, this correction seems rather insensitive to the presence of long-range correlations. This RLC model is shown to remarkably fit the experimental data for λ-DNA when considering A≃70±10 nm (>Aeff≃50 nm), in good agreement with previous experimental estimates of the “dynamic” persistent length. From the analysis of large human contigs, we speculate about the possible dependence of Aeff and/or A upon the (G+C) content of the considered sequence.

  17. DNA as information.

    PubMed

    Wills, Peter R

    2016-03-13

    This article reviews contributions to this theme issue covering the topic 'DNA as information' in relation to the structure of DNA, the measure of its information content, the role and meaning of information in biology and the origin of genetic coding as a transition from uninformed to meaningful computational processes in physical systems. PMID:26857666

  18. FY05 LDRD Final Report Investigation of AAA+ protein machines that participate in DNA replication, recombination, and in response to DNA damage LDRD Project Tracking Code: 04-LW-049

    SciTech Connect

    Sawicka, D; de Carvalho-Kavanagh, M S; Barsky, D; Venclovas, C

    2006-12-04

    The AAA+ proteins are remarkable macromolecules that are able to self-assemble into nanoscale machines. These protein machines play critical roles in many cellular processes, including the processes that manage a cell's genetic material, but the mechanism at the molecular level has remained elusive. We applied computational molecular modeling, combined with advanced sequence analysis and available biochemical and genetic data, to structurally characterize eukaryotic AAA+ proteins and the protein machines they form. With these models we have examined intermolecular interactions in three-dimensions (3D), including both interactions between the components of the AAA+ complexes and the interactions of these protein machines with their partners. These computational studies have provided new insights into the molecular structure and the mechanism of action for AAA+ protein machines, thereby facilitating a deeper understanding of processes involved in DNA metabolism.

  19. The Cipher Code of Simple Sequence Repeats in “Vampire Pathogens”

    PubMed Central

    Zou, Geng; Bello-Orti, Bernardo; Aragon, Virginia; Tucker, Alexander W.; Luo, Rui; Ren, Pinxing; Bi, Dingren; Zhou, Rui; Jin, Hui

    2015-01-01

    Blood inside mammals is a forbidden area for the majority of prokaryotic microbes; however, red blood cell-tropic microbes, like “vampire pathogens” (VP), succeed in coping with scarce nutrients and surviving strong immune reactions. Here, we found that VP of Mycoplasma, Rhizobiales, and Rickettsiales showed significantly higher counts of (AG)n dimeric simple sequence repeats (Di-SSRs) in the genomes, coding regions and non-coding regions than non-vampire pathogens (N_VP). Regression analysis indicated a significant correlation between GC content and the span of (AG)n-Di-SSR variation. Gene Ontology (GO) terms with abundance of (AG)3-Di-SSRs shared by the VP strains were associated with purine nucleotide metabolism (FDR < 0.01), indicating an adaptation to the limited availability of purine and nucleotide precursors in blood. Di-amino acids coded by (AG)n-Di-SSRs included all three six-fold code amino acids (Arg, Leu and Ser), and counts of Di-amino acids coded by (AG)3, (GA)3, and (TC)3 were significantly higher in VP than in N_VP. Furthermore, significant differences (P < 0.001) in the numbers of triplexes formed from (AG)n-Di-SSRs between VP and N_VP in Mycoplasma suggested the potential role of (AG)n-Di-SSRs in gene regulation. PMID:26215592
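
    Counting (AG)n repeats in a sequence can be sketched with a regular expression. This is only an illustration: the function name, the minimum of three repeat units, and the choice to count maximal non-overlapping runs on a single strand are assumptions, not the counting rules used in the study.

```python
import re


def count_di_ssr(seq: str, unit: str = "AG", min_repeats: int = 3) -> int:
    """Count maximal, non-overlapping runs of `unit` repeated at least `min_repeats` times."""
    # For unit="AG" and min_repeats=3 the pattern is (?:AG){3,}
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    return sum(1 for _ in pattern.finditer(seq.upper()))


# Toy sequence (made up): one (AG)3 run, one (AG)2 run (too short), one (AG)4 run
toy = "TTAGAGAGCCAGAGTTAGAGAGAG"
print(count_di_ssr(toy))  # 2
```

    Because the quantifier is greedy, a run such as (AG)4 is counted once rather than as two overlapping (AG)3 matches.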

  20. DNA Nanotechnology-- Architectures Designed with DNA

    NASA Astrophysics Data System (ADS)

    Han, Dongran

    As the genetic information storage vehicle, deoxyribonucleic acid (DNA) molecules are essential to all known living organisms and many viruses. It is amazing that such a large amount of information about how life develops can be stored in these tiny molecules. Countless scientists, especially some biologists, are trying to decipher the genetic information stored in these captivating molecules. Meanwhile, another group of researchers, nanotechnologists in particular, have discovered that the unique and concise structural features of DNA together with its information coding ability can be utilized for nano-construction efforts. This idea culminated in the birth of the field of DNA nanotechnology which is the main topic of this dissertation. The ability of rationally designed DNA strands to self-assemble into arbitrary nanostructures without external direction is the basis of this field. A series of novel design principles for DNA nanotechnology are presented here, from topological DNA nanostructures to complex and curved DNA nanostructures, from pure DNA nanostructures to hybrid RNA/DNA nanostructures. As one of the most important and pioneering fields in controlling the assembly of materials (both DNA and other materials) at the nanoscale, DNA nanotechnology is developing at a dramatic speed and as more and more construction approaches are invented, exciting advances will emerge in ways that we may or may not predict.

  1. Computer Code

    NASA Technical Reports Server (NTRS)

    1985-01-01

    COSMIC MINIVER, a computer code developed by NASA for analyzing aerodynamic heating and heat transfer on the Space Shuttle, has been used by Marquardt Company to analyze heat transfer on Navy/Air Force missile bodies. The code analyzes heat transfer by four different methods which can be compared for accuracy. MINIVER saved Marquardt three months in computer time and $15,000.

  2. Cloning and sequence analysis of partial genomic DNA coding for HtrA-type serine protease of Wolbachia from human lymphatic filarial parasite, Wuchereria bancrofti

    PubMed Central

    Dhamodharan, R; Hoti, SL; Sivapragasam, G; Das, MK

    2011-01-01

    Background: Periplasmic serine proteases of HtrA type of Wolbachia have been shown to play a role in the pathogenesis of filarial disease. Aims: This study aimed to sequence Wb-HtrA serine protease and analyze its phylogenetic position by comparing with other filarial and non-filarial nematode homologs. Materials and Methods: Partial HtrA gene fragment was amplified from DNA isolated from periodic and sub-periodic Wuchereria bancrofti parasites collected from Pondicherry and Nicobar islands, respectively. The amplicons were sequenced, and sequence homology and phylogenetic relationship with other filarial and non-filarial nematodes were analyzed. Results: Partial orthologue of HtrA-type serine protease from Wolbachia of W. bancrofti was amplified, cloned and sequenced. The deduced amino acid sequence exhibited 87%, 81% and 74% identity with the homologous Wolbachia proteases identified from Brugia malayi, Onchocerca volvulus and Drosophila melanogaster, respectively. The Wb-HtrA has orthologues in several proteobacteria with very high homology and hence is highly conserved not only among Wolbachia of filarial parasites but also across proteobacteria. The phylogenetic tree constructed using Neighbor-Joining method showed two main clusters: cluster-I containing bacteria that dwell in diverse habitats such as soil, fresh and marine waters and plants and cluster-II comprising Anaplasma sp. and Ehrlichia, and Wolbachia endosymbionts of insects and nematodes, in distinct groups. Conclusions: HtrA-type serine protease from Wolbachia of W. bancrofti is highly conserved among filarial parasites. It will be of interest to know whether filarial Wolbachia HtrA type of serine protease might influence apoptosis and lymphatic epithelium, thereby playing a role in the filarial pathogenesis. Such information will be useful for identifying targets for the development of newer drugs for filariasis treatment, especially for preventing lymphatic pathology. PMID:23508470

  3. Genome-wide profiling of yeast DNA:RNA hybrid prone sites with DRIP-chip.

    PubMed

    Chan, Yujia A; Aristizabal, Maria J; Lu, Phoebe Y T; Luo, Zongli; Hamza, Akil; Kobor, Michael S; Stirling, Peter C; Hieter, Philip

    2014-04-01

    DNA:RNA hybrid formation is emerging as a significant cause of genome instability in biological systems ranging from bacteria to mammals. Here we describe the genome-wide distribution of DNA:RNA hybrid prone loci in Saccharomyces cerevisiae by DNA:RNA immunoprecipitation (DRIP) followed by hybridization on tiling microarray. These profiles show that DNA:RNA hybrids preferentially accumulated at rDNA, Ty1 and Ty2 transposons, telomeric repeat regions and a subset of open reading frames (ORFs). The latter are generally highly transcribed and have high GC content. Interestingly, significant DNA:RNA hybrid enrichment was also detected at genes associated with antisense transcripts. The expression of antisense-associated genes was also significantly altered upon overexpression of RNase H, which degrades the RNA in hybrids. Finally, we uncover mutant-specific differences in the DRIP profiles of a Sen1 helicase mutant, RNase H deletion mutant and Hpr1 THO complex mutant compared to wild type, suggesting different roles for these proteins in DNA:RNA hybrid biology. Our profiles of DNA:RNA hybrid prone loci provide a resource for understanding the properties of hybrid-forming regions in vivo, extend our knowledge of hybrid-mitigating enzymes, and contribute to models of antisense-mediated gene regulation. A summary of this paper was presented at the 26th International Conference on Yeast Genetics and Molecular Biology, August 2013. PMID:24743342

  4. Genome-Wide Profiling of Yeast DNA:RNA Hybrid Prone Sites with DRIP-Chip

    PubMed Central

    Lu, Phoebe Y. T.; Luo, Zongli; Hamza, Akil; Kobor, Michael S.; Stirling, Peter C.; Hieter, Philip

    2014-01-01

    DNA:RNA hybrid formation is emerging as a significant cause of genome instability in biological systems ranging from bacteria to mammals. Here we describe the genome-wide distribution of DNA:RNA hybrid prone loci in Saccharomyces cerevisiae by DNA:RNA immunoprecipitation (DRIP) followed by hybridization on tiling microarray. These profiles show that DNA:RNA hybrids preferentially accumulated at rDNA, Ty1 and Ty2 transposons, telomeric repeat regions and a subset of open reading frames (ORFs). The latter are generally highly transcribed and have high GC content. Interestingly, significant DNA:RNA hybrid enrichment was also detected at genes associated with antisense transcripts. The expression of antisense-associated genes was also significantly altered upon overexpression of RNase H, which degrades the RNA in hybrids. Finally, we uncover mutant-specific differences in the DRIP profiles of a Sen1 helicase mutant, RNase H deletion mutant and Hpr1 THO complex mutant compared to wild type, suggesting different roles for these proteins in DNA:RNA hybrid biology. Our profiles of DNA:RNA hybrid prone loci provide a resource for understanding the properties of hybrid-forming regions in vivo, extend our knowledge of hybrid-mitigating enzymes, and contribute to models of antisense-mediated gene regulation. A summary of this paper was presented at the 26th International Conference on Yeast Genetics and Molecular Biology, August 2013. PMID:24743342

  5. Direct Sequencing from the Minimal Number of DNA Molecules Needed to Fill a 454 Picotiterplate

    PubMed Central

    Martínez-Priego, Llúcia; D’Auria, Giussepe; Calafell, Francesc; Moya, Andrés

    2014-01-01

    The large amount of DNA needed to prepare a library in next generation sequencing protocols hinders direct sequencing of small DNA samples. This limitation is usually overcome by the enrichment of such samples with whole genome amplification (WGA), mostly by multiple displacement amplification (MDA) based on φ29 polymerase. However, this technique can be biased by the GC content of the sample and is prone to the development of chimeras as well as contamination during enrichment, which contributes to undesired noise during sequence data analysis, and also hampers the proper functional and/or taxonomic assignments. An alternative to MDA is direct DNA sequencing (DS), which represents the theoretical gold standard in genome sequencing. In this work, we explore the possibility of sequencing the genome of Escherichia coli from the minimum number of DNA molecules required for pyrosequencing, according to the notion of one-bead-one-molecule. Using an optimized protocol for DS, we constructed a shotgun library containing the minimum number of DNA molecules needed to fill a selected region of a picotiterplate. We gathered most of the reference genome extension with uniform coverage. We compared the DS method with MDA applied to the same amount of starting DNA. As expected, MDA yielded a sparse and biased read distribution, with a very high amount of unassigned and unspecific DNA amplifications. The optimized DS protocol allows unbiased sequencing to be performed from samples with a very small amount of DNA. PMID:24887077

  6. In silico prediction of long intergenic non-coding RNAs in sheep.

    PubMed

    Bakhtiarizadeh, Mohammad Reza; Hosseinpour, Batool; Arefnezhad, Babak; Shamabadi, Narges; Salami, Seyed Alireza

    2016-04-01

    Long non-coding RNAs (lncRNAs) are transcribed RNA molecules >200 nucleotides in length that do not encode proteins and serve as key regulators of diverse biological processes. Recently, thousands of long intergenic non-coding RNAs (lincRNAs), a type of lncRNA, have been identified in mammals using massively parallel sequencing technologies. The availability of the genome sequence of sheep (Ovis aries) has allowed genomic prediction of non-coding RNAs. This is the first study to identify lincRNAs using RNA-seq data of eight different tissues of sheep, including brain, heart, kidney, liver, lung, ovary, skin, and white adipose. A computational pipeline was employed to characterize 325 putative lincRNAs with high confidence from eight important tissues of sheep using different criteria such as GC content, exon number, gene length, co-expression analysis, stability, and tissue-specific scores. Sixty-four putative lincRNAs displayed tissue-specific expression. The highest number of tissue-specific lincRNAs was found in skin and brain. All novel lincRNAs that aligned to the human and mouse lincRNAs had conserved synteny. The closest protein-coding genes were enriched in 11 significant GO terms such as limb development, appendage development, striated muscle tissue development, and multicellular organismal development. The findings reported here have important implications for the study of the sheep genome. PMID:27002388

  7. Statistical properties of DNA sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
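
    The detrended fluctuation analysis mentioned above can be sketched as follows. This is a minimal standard-library illustration, not the authors' implementation: the base-to-step mapping (+1 for pyrimidines, -1 otherwise), the box sizes, non-overlapping boxes, and all function names are assumptions.

```python
import math
import random


def dna_walk(seq):
    """Cumulative 'DNA walk': +1 for a pyrimidine (C/T), -1 for any other symbol."""
    walk, position = [], 0
    for base in seq.upper():
        position += 1 if base in "CT" else -1
        walk.append(position)
    return walk


def fluctuation(walk, n):
    """RMS deviation of the walk from a least-squares line fitted in each box of n points."""
    boxes = len(walk) // n
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    squared = 0.0
    for b in range(boxes):
        seg = walk[b * n:(b + 1) * n]
        y_mean = sum(seg) / n
        # Slope of the local linear trend inside this box
        slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(seg)) / sxx
        # Accumulate squared residuals after removing the local trend
        squared += sum((y - y_mean - slope * (x - x_mean)) ** 2 for x, y in enumerate(seg))
    return math.sqrt(squared / (boxes * n))


def dfa_exponent(seq, box_sizes=(8, 16, 32, 64, 128)):
    """Slope of log F(n) against log n; fails if any F(n) is zero (e.g. a homopolymer)."""
    walk = dna_walk(seq)
    xs = [math.log(n) for n in box_sizes]
    ys = [math.log(fluctuation(walk, n)) for n in box_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


# Synthetic, uncorrelated sequence; such sequences are expected to give a value near 0.5
random.seed(0)
shuffled = "".join(random.choice("ACGT") for _ in range(8192))
print(round(dfa_exponent(shuffled), 2))
```

    In this framing, an exponent clearly above 0.5 would indicate long-range correlation of the kind the abstract reports for genes containing non-coding regions.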

  8. Statistical properties of DNA sequences

    NASA Astrophysics Data System (ADS)

    Peng, C.-K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-02-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range - indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the “non-stationarity” feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33 301 coding and 29 453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.

  9. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques and is often used interchangeably with speech coding is the term voice coding. This term is more generic in the sense that the

  10. Investigating the dynamics of surface-immobilized DNA nanomachines.

    PubMed

    Dunn, Katherine E; Trefzer, Martin A; Johnson, Steven; Tyrrell, Andy M

    2016-01-01

    Surface-immobilization of molecules can have a profound influence on their structure, function and dynamics. Toehold-mediated strand displacement is often used in solution to drive synthetic nanomachines made from DNA, but the effects of surface-immobilization on the mechanism and kinetics of this reaction have not yet been fully elucidated. Here we show that the kinetics of strand displacement in surface-immobilized nanomachines are significantly different to those of the solution phase reaction, and we attribute this to the effects of intermolecular interactions within the DNA layer. We demonstrate that the dynamics of strand displacement can be manipulated by changing strand length, concentration and G/C content. By inserting mismatched bases it is also possible to tune the rates of the constituent displacement processes (toehold-binding and branch migration) independently, and information can be encoded in the time-dependence of the overall reaction. Our findings will facilitate the rational design of surface-immobilized dynamic DNA nanomachines, including computing devices and track-based motors. PMID:27387252

  11. Investigating the dynamics of surface-immobilized DNA nanomachines

    PubMed Central

    Dunn, Katherine E.; Trefzer, Martin A.; Johnson, Steven; Tyrrell, Andy M.

    2016-01-01

    Surface-immobilization of molecules can have a profound influence on their structure, function and dynamics. Toehold-mediated strand displacement is often used in solution to drive synthetic nanomachines made from DNA, but the effects of surface-immobilization on the mechanism and kinetics of this reaction have not yet been fully elucidated. Here we show that the kinetics of strand displacement in surface-immobilized nanomachines are significantly different to those of the solution phase reaction, and we attribute this to the effects of intermolecular interactions within the DNA layer. We demonstrate that the dynamics of strand displacement can be manipulated by changing strand length, concentration and G/C content. By inserting mismatched bases it is also possible to tune the rates of the constituent displacement processes (toehold-binding and branch migration) independently, and information can be encoded in the time-dependence of the overall reaction. Our findings will facilitate the rational design of surface-immobilized dynamic DNA nanomachines, including computing devices and track-based motors. PMID:27387252

  12. DNA structure and function.

    PubMed

    Travers, Andrew; Muskhelishvili, Georgi

    2015-06-01

    The proposal of a double-helical structure for DNA over 60 years ago provided an eminently satisfying explanation for the heritability of genetic information. But why is DNA, and not RNA, now the dominant biological information store? We argue that, in addition to its coding function, the ability of DNA, unlike RNA, to adopt a B-DNA structure confers advantages both for information accessibility and for packaging. The information encoded by DNA is both digital - the precise base specifying, for example, amino acid sequences - and analogue. The latter determines the sequence-dependent physicochemical properties of DNA, for example, its stiffness and susceptibility to strand separation. Most importantly, DNA chirality enables the formation of supercoiling under torsional stress. We review recent evidence suggesting that DNA supercoiling, particularly that generated by DNA translocases, is a major driver of gene regulation and patterns of chromosomal gene organization, and in its guise as a promoter of DNA packaging enables DNA to act as an energy store to facilitate the passage of translocating enzymes such as RNA polymerase. PMID:25903461

  13. High resolution melting (HRM) analysis of DNA--its role and potential in food analysis.

    PubMed

    Druml, Barbara; Cichna-Markl, Margit

    2014-09-01

    DNA based methods play an increasing role in food safety control and food adulteration detection. Recent papers show that high resolution melting (HRM) analysis is an interesting approach. It involves amplification of the target of interest in the presence of a saturation dye by the polymerase chain reaction (PCR) and subsequent melting of the amplicons by gradually increasing the temperature. Since the melting profile depends on the GC content, length, sequence and strand complementarity of the product, HRM analysis is highly suitable for the detection of single-base variants and small insertions or deletions. The review gives an introduction into HRM analysis, covers important aspects in the development of an HRM analysis method and describes how HRM data are analysed and interpreted. Then we discuss the potential of HRM analysis based methods in food analysis, i.e. for the identification of closely related species and cultivars and the identification of pathogenic microorganisms. PMID:24731338
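
    Why a single-base variant can shift a melting profile is easy to illustrate with GC content and a rule-of-thumb melting temperature. The sketch below uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is a rough estimate intended for short oligonucleotides and is not the HRM analysis method the review describes; the sequences are made up and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return gc / acgt


def wallace_tm(seq: str) -> int:
    """Wallace-rule melting temperature estimate in degrees Celsius (short oligos only)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


reference = "ATGCATGC"   # made-up sequence
variant = "ATGCGTGC"     # same sequence with one A->G substitution
print(gc_content(reference), wallace_tm(reference))  # 0.5 24
print(gc_content(variant), wallace_tm(variant))      # 0.625 26
```

    The estimate ignores length and nearest-neighbour effects, which the abstract lists alongside GC content as determinants of the real melting profile.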

  14. QR Codes

    ERIC Educational Resources Information Center

    Lai, Hsin-Chih; Chang, Chun-Yen; Li, Wen-Shiane; Fan, Yu-Lin; Wu, Ying-Tien

    2013-01-01

    This study presents an m-learning method that incorporates Integrated Quick Response (QR) codes. This learning method not only achieves the objectives of outdoor education, but it also increases applications of Cognitive Theory of Multimedia Learning (CTML) (Mayer, 2001) in m-learning for practical use in a diverse range of outdoor locations. When…

  15. Sequence-dependent nanometer-scale conformational dynamics of individual RecBCD–DNA complexes

    PubMed Central

    Carter, Ashley R.; Seaberg, Maasa H.; Fan, Hsiu-Fang; Sun, Gang; Wilds, Christopher J.; Li, Hung-Wen; Perkins, Thomas T.

    2016-01-01

    RecBCD is a multifunctional enzyme that possesses both helicase and nuclease activities. To gain insight into the mechanism of its helicase function, RecBCD unwinding at low adenosine triphosphate (ATP) (2–4 μM) was measured using an optical-trapping assay featuring 1 base-pair (bp) precision. Instead of uniformly sized steps, we observed forward motion convolved with rapid, large-scale (∼4 bp) variations in DNA length. We interpret this motion as conformational dynamics of the RecBCD–DNA complex in an unwinding-competent state, arising, in part, by an enzyme-induced, back-and-forth motion relative to the dsDNA that opens and closes the duplex. Five observations support this interpretation. First, these dynamics were present in the absence of ATP. Second, the onset of the dynamics was coupled to RecBCD entering into an unwinding-competent state that required a sufficiently long 5′ strand to engage the RecD helicase. Third, the dynamics were modulated by the GC-content of the dsDNA. Fourth, the dynamics were suppressed by an engineered interstrand cross-link in the dsDNA that prevented unwinding. Finally, these dynamics were suppressed by binding of a specific non-hydrolyzable ATP analog. Collectively, these observations show that during unwinding, RecBCD binds to DNA in a dynamic mode that is modulated by the nucleotide state of the ATP-binding pocket. PMID:27220465

  16. Determination of 5-methylcytosine from plant DNA by high-performance liquid chromatography.

    PubMed

    Wagner, I; Capesius, I

    1981-06-26

    The relative amounts of the five nucleosides (deoxycytidine, 5-methyldeoxycytidine, deoxyadenosine, deoxyguanosine and thymidine) in the DNA of nine plant species, one plant satellite DNA, and one animal species were determined by high performance liquid chromatography. The method allows the clean separation of the nucleosides from 10 microgram samples within 15 min. The following values for the proportion of methylated cytosines among all cytosines were obtained: Lobularia maritima 18.5%, Nicotiana tabacum 32.6%, Pisum sativum 23.2%, Rhinanthus minor 29.2%, Sinapis alba 12.2%, Vicia faba 30.5%, Viscum album 23.2%, Cymbidium pumilum 18.8%, Cymbidium pumilum AT-rich satellite DNA 15.8%, Triticum aestivum 22.4%. DNA of an animal, the gerbil, Meriones unguiculatus, had a methylation percentage of 3.1%. An estimate of the GC content based on the buoyant density of DNA tends to be lower than the actual value, whereas an estimate based on the melting temperature tends to be higher. This supports the finding by other authors that DNA methylation decreases the buoyant density and may increase the melting temperature at high m5C concentration. PMID:7272310
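
    The quantity reported here, the proportion of methylated cytosines among all cytosines, follows directly from the measured nucleoside amounts. The sketch below uses invented molar amounts purely for illustration; the function names are hypothetical, and counting 5-methyldeoxycytidine toward GC content is an assumption.

```python
def methylation_percent(dc: float, m5dc: float) -> float:
    """Percentage of methylated cytosines among all cytosines: m5dC / (dC + m5dC)."""
    total = dc + m5dc
    if total <= 0:
        raise ValueError("no cytosine nucleosides measured")
    return 100.0 * m5dc / total


def gc_percent(da: float, dt: float, dg: float, dc: float, m5dc: float) -> float:
    """GC content from nucleoside composition, counting m5dC as a cytosine (assumption)."""
    total = da + dt + dg + dc + m5dc
    if total <= 0:
        raise ValueError("no nucleosides measured")
    return 100.0 * (dg + dc + m5dc) / total


# Invented relative molar amounts, not data from the study
print(methylation_percent(dc=15.0, m5dc=5.0))                       # 25.0
print(gc_percent(da=30.0, dt=30.0, dg=20.0, dc=15.0, m5dc=5.0))     # 40.0
```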

  17. Sequence-dependent nanometer-scale conformational dynamics of individual RecBCD-DNA complexes.

    PubMed

    Carter, Ashley R; Seaberg, Maasa H; Fan, Hsiu-Fang; Sun, Gang; Wilds, Christopher J; Li, Hung-Wen; Perkins, Thomas T

    2016-07-01

    RecBCD is a multifunctional enzyme that possesses both helicase and nuclease activities. To gain insight into the mechanism of its helicase function, RecBCD unwinding at low adenosine triphosphate (ATP) (2-4 μM) was measured using an optical-trapping assay featuring 1 base-pair (bp) precision. Instead of uniformly sized steps, we observed forward motion convolved with rapid, large-scale (∼4 bp) variations in DNA length. We interpret this motion as conformational dynamics of the RecBCD-DNA complex in an unwinding-competent state, arising, in part, by an enzyme-induced, back-and-forth motion relative to the dsDNA that opens and closes the duplex. Five observations support this interpretation. First, these dynamics were present in the absence of ATP. Second, the onset of the dynamics was coupled to RecBCD entering into an unwinding-competent state that required a sufficiently long 5' strand to engage the RecD helicase. Third, the dynamics were modulated by the GC-content of the dsDNA. Fourth, the dynamics were suppressed by an engineered interstrand cross-link in the dsDNA that prevented unwinding. Finally, these dynamics were suppressed by binding of a specific non-hydrolyzable ATP analog. Collectively, these observations show that during unwinding, RecBCD binds to DNA in a dynamic mode that is modulated by the nucleotide state of the ATP-binding pocket. PMID:27220465

  18. Chilean Pitavia more closely related to Oceania and Old World Rutaceae than to Neotropical groups: evidence from two cpDNA non-coding regions, with a new subfamilial classification of the family

    PubMed Central

    Groppo, Milton; Kallunki, Jacquelyn A.; Pirani, José Rubens; Antonelli, Alexandre

    2012-01-01

    The position of the plant genus Pitavia within an infrafamilial phylogeny of Rutaceae (rue, or orange family) was investigated with the use of two non-coding regions from cpDNA, the trnL-trnF region and the rps16 intron. The only species of the genus, Pitavia punctata Molina, is restricted to the temperate forests of the Coastal Cordillera of Central-Southern Chile and threatened by loss of habitat. The genus traditionally has been treated as part of tribe Zanthoxyleae (subfamily Rutoideae) where it constitutes the monogeneric tribe Pitaviinae. This tribe and genus are characterized by fruits of 1 to 4 fleshy drupelets, unlike the dehiscent fruits typical of the subfamily. Fifty-five taxa of Rutaceae, representing 53 genera (nearly one-third of those in the family) and all subfamilies, tribes, and almost all subtribes of the family were included. Parsimony and Bayesian inference were used to infer the phylogeny; six taxa of Meliaceae, Sapindaceae, and Simaroubaceae, all members of Sapindales, were also used as out-groups. Results from both analyses were congruent and showed Pitavia as sister to Flindersia and Lunasia, both genera with species scattered through Australia, Philippines, Moluccas, New Guinea and the Malayan region, and phylogenetically far from other Neotropical Rutaceae, such as the Galipeinae (Galipeeae, Rutoideae) and Pteleinae (Toddalieae, former Toddalioideae). Additionally, a new circumscription of the subfamilies of Rutaceae is presented and discussed. Only two subfamilies (both monophyletic) are recognized: Cneoroideae (including Dictyolomatoideae, Spathelioideae, Cneoraceae, and Ptaeroxylaceae) and Rutoideae (including not only traditional Rutoideae but also Aurantioideae, Flindersioideae, and Toddalioideae). As a consequence, Aurantioideae (Citrus and allies) is reduced to tribal rank as Aurantieae. PMID:23717188

  19. Recombinant DNA means and method

    SciTech Connect

    Alford, B.L.; Mao, J.I.; Moir, D.T.; Taunton-Rigby, A.; Vovis, G.F.

    1987-05-19

    This patent describes a transformed living cell selected from the group consisting of fungi, yeast and bacteria, and containing genetic material derived from recombinant DNA material and coding for bovine rennin.

  20. Genomics dataset of unidentified disclosed isolates.

    PubMed

    Rekadwad, Bhagwan N

    2016-09-01

    Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was analyzed. The QR codes are helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from a restriction digestion study, which is helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here provides new data for exploration of unique DNA sequences for evaluation, identification, comparison and analysis. PMID:27408929
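
    The link between AT/GC content and thermal stability mentioned above can be illustrated with a short sketch. It computes the GC percentage of a sequence and applies the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in degrees C, which is only a rough estimate for short oligonucleotides. The function names and the example sequence are assumptions made for illustration; the dataset's own procedure is not described in the abstract.

```python
def gc_content(seq: str) -> float:
    """GC percentage, counting only unambiguous A/C/G/T bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (degrees C) by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C); intended for short oligos only.
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


example = "ATGCGC"  # hypothetical sequence, not from the dataset
print(round(gc_content(example), 1))  # 66.7
print(wallace_tm(example))            # 20
```

    Each G or C contributes twice as much as an A or T in this rule, which reflects the higher stability of GC-rich duplexes.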

  1. Complete Genome Sequence for Treponema sp. OMZ 838 (ATCC 700772, DSM 16789), Isolated from a Necrotizing Ulcerative Gingivitis Lesion

    PubMed Central

    Chan, Yuki; Ma, Angel P. Y.; Lacap-Bugler, Donnabella C.; Huo, Yong-Biao; Keung Leung, W.

    2014-01-01

    The oral treponeme bacterium Treponema sp. OMZ 838 was originally isolated from a human necrotizing ulcerative gingivitis (NUG) lesion. Its taxonomic status remains uncertain. The complete genome sequence length was determined to be 2,708,067 bp, with a G+C content of 44.58%, and 2,236 predicted coding DNA sequences (CDS). PMID:25540346

  2. Draft Genome Sequence of the Bacteriocinogenic Strain Enterococcus faecalis DBH18, Isolated from Mallard Ducks (Anas platyrhynchos)

    PubMed Central

    Arbulu, Sara; Jimenez, Juan J.; Borrero, Juan; Sánchez, Jorge; Frantzen, Cyril; Herranz, Carmen; Nes, Ingolf F.; Cintas, Luis M.; Diep, Dzung B.

    2016-01-01

    Here, we report the draft genome sequence of Enterococcus faecalis DBH18, a bacteriocinogenic lactic acid bacterium (LAB) isolated from mallard ducks (Anas platyrhynchos). The assembly contains 2,836,724 bp, with a G+C content of 37.6%. The genome is predicted to contain 2,654 coding DNA sequences (CDSs) and 50 RNAs. PMID:27417838

  3. Draft Genome Sequence of the Bacteriocinogenic Strain Enterococcus faecalis DBH18, Isolated from Mallard Ducks (Anas platyrhynchos).

    PubMed

    Arbulu, Sara; Jimenez, Juan J; Borrero, Juan; Sánchez, Jorge; Frantzen, Cyril; Herranz, Carmen; Nes, Ingolf F; Cintas, Luis M; Diep, Dzung B; Hernández, Pablo E

    2016-01-01

    Here, we report the draft genome sequence of Enterococcus faecalis DBH18, a bacteriocinogenic lactic acid bacterium (LAB) isolated from mallard ducks (Anas platyrhynchos). The assembly contains 2,836,724 bp, with a G+C content of 37.6%. The genome is predicted to contain 2,654 coding DNA sequences (CDSs) and 50 RNAs. PMID:27417838

  4. Diversity and distribution of single-stranded DNA phages in the North Atlantic Ocean.

    PubMed

    Tucker, Kimberly P; Parsons, Rachel; Symonds, Erin M; Breitbart, Mya

    2011-05-01

    Knowledge of marine phages is highly biased toward double-stranded DNA (dsDNA) phages; however, recent metagenomic surveys have also identified single-stranded DNA (ssDNA) phages in the oceans. Here, we describe two complete ssDNA phage genomes that were reconstructed from a viral metagenome from 80 m depth at the Bermuda Atlantic Time-series Study (BATS) site in the northwestern Sargasso Sea and examine their spatial and temporal distributions. Both genomes (SARssφ1 and SARssφ2) exhibited similarity to known phages of the Microviridae family in terms of size, GC content, genome organization and protein sequence. PCR amplification of the replication initiation protein (Rep) gene revealed narrow and distinct depth distributions for the newly described ssDNA phages within the upper 200 m of the water column at the BATS site. Comparison of Rep gene sequences obtained from the BATS site over time revealed changes in the diversity of ssDNA phages over monthly time scales, although some nearly identical sequences were recovered from samples collected 4 years apart. Examination of ssDNA phage diversity along transects through the North Atlantic Ocean revealed a positive correlation between genetic distance and geographic distance between sampling sites. Together, the data suggest fundamental differences between the distribution of these ssDNA phages and the distribution of known marine dsDNA phages, possibly because of differences in host range, host distribution, virion stability, or viral evolution mechanisms and rates. Future work needs to elucidate the host ranges for oceanic ssDNA phages and determine their ecological roles in the marine ecosystem. PMID:21124487

  5. Improving the performance of true single molecule sequencing for ancient DNA

    PubMed Central

    2012-01-01

    Background: Second-generation sequencing technologies have revolutionized our ability to recover genetic information from the past, allowing the characterization of the first complete genomes from past individuals and extinct species. Recently, third-generation Helicos sequencing platforms, which perform true Single-Molecule DNA Sequencing (tSMS), have shown great potential for sequencing DNA molecules from Pleistocene fossils. Here, we aim to further improve the performance of tSMS for ancient DNA by testing two novel tSMS template preparation methods for Pleistocene bone fossils, namely oligonucleotide spiking and treatment with DNA phosphatase. Results: We found that a significantly larger fraction of the horse genome could be covered following oligonucleotide spiking, although not reproducibly and at the cost of extra post-sequencing filtering procedures and skewed %GC content. In contrast, we showed that treating ancient DNA extracts with DNA phosphatase improved the amount of endogenous sequence information recovered per sequencing channel by up to 3.3-fold, while still providing molecular signatures of endogenous ancient DNA damage, including cytosine deamination and fragmentation by depurination. Additionally, we confirmed the existence of molecular preservation niches in large bone crystals from which DNA could be preferentially extracted. Conclusions: We propose DNA phosphatase treatment as a mechanism to increase sequence coverage of ancient genomes when using Helicos tSMS as a sequencing platform. Together with mild denaturation temperatures that favor access to endogenous ancient templates over modern DNA contaminants, this simple preparation procedure can improve overall Helicos tSMS performance when damaged DNA templates are targeted. PMID:22574620

  6. Genomic and cDNA sequence tags of the hyperthermophilic archaeon Pyrobaculum aerophilum.

    PubMed Central

    Völkl, P; Markiewicz, P; Baikalov, C; Fitz-Gibbon, S; Stetter, K O; Miller, J H

    1996-01-01

    The hyperthermophilic archaeum Pyrobaculum aerophilum grows optimally at 100 °C with a doubling time of 180 min. It is a member of the phylogenetically ancient Thermoproteales order, but differs significantly from all other members by its facultatively aerobic metabolism. Due to its simple cultivation requirements and its nearly 100% plating efficiency, it was chosen as a model organism for studying the genome organization of hyperthermophilic ancient archaea. With a DNA G+C content of 52 mol%, sequence analysis was readily feasible. At least some of the mRNA of P. aerophilum carried poly-A tails, facilitating the construction of a cDNA library. A total of 245 sequence tags from a poly-A primed cDNA library and 55 sequence tags from a genomic library containing 1-2 kb Sau3AI fragments were analyzed, and the corresponding amino acid sequences were compared with protein sequences from databases. Fourteen percent of the cDNA and >9% of genomic DNA sequence tags revealed significant similarities to proteins in the databases. Matches were obtained to proteins from archaeal, bacterial and eukaryal sources. Some sequences showed greatest similarity to eukaryal rather than to bacterial versions of proteins; other matches were found to proteins which had previously only been found in eukaryotes. PMID:8948626

  7. Facile, High Quality Sequencing of Bacterial Genomes from Small Amounts of DNA

    PubMed Central

    Vuyisich, Momchilo; Arefin, Ayesha; Davenport, Karen; Feng, Shihai; Gleasner, Cheryl; McMurry, Kim; Parson-Quintana, Beverly; Price, Jennifer; Scholz, Matthew; Chain, Patrick

    2014-01-01

    Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). There have been few studies to determine the effects of the input DNA amount or library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes. We have evaluated the utility of NEBNext Ultra for resequencing and de novo assembly of four bacterial genomes and compared its performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp.), which have the highest GC content and are the longest, we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used. PMID:25478564

  8. DNA-based watermarks using the DNA-Crypt algorithm

    PubMed Central

    Heider, Dominik; Barnekow, Angelika

    2007-01-01

    Background: The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program, leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results: The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. DNA-Crypt can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations, which occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion: The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences, similar to all steganographic algorithms. PMID:17535434
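
    The abstract names the Hamming code as one of the mutation correction codes. As an illustration of the general idea only, the sketch below implements the textbook Hamming(7,4) code on plain bits: four data bits are protected by three parity bits, and any single flipped bit can be located and corrected. How DNA-Crypt maps bits onto bases and which Hamming variant it uses are not stated in the abstract, so the function names and the bit-level layout here are assumptions.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3  # 1-based position, 0 means no error
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                   # simulate a single "mutation"
print(hamming74_decode(word))  # [1, 0, 1, 1]
```

    Two or more flipped bits within the same 7-bit block exceed what this code can correct, which is consistent with the abstract's point that the choice of a correction code depends on sequence length and mutation rate.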

  9. Structural diversity of supercoiled DNA

    NASA Astrophysics Data System (ADS)

    Irobalieva, Rossitza N.; Fogg, Jonathan M.; Catanese, Daniel J.; Sutthibutpong, Thana; Chen, Muyuan; Barker, Anna K.; Ludtke, Steven J.; Harris, Sarah A.; Schmid, Michael F.; Chiu, Wah; Zechiedrich, Lynn

    2015-10-01

    By regulating access to the genetic code, DNA supercoiling strongly affects DNA metabolism. Despite its importance, however, much about supercoiled DNA (positively supercoiled DNA, in particular) remains unknown. Here we use electron cryo-tomography together with biochemical analyses to investigate structures of individual purified DNA minicircle topoisomers with defined degrees of supercoiling. Our results reveal that each topoisomer, negative or positive, adopts a unique and surprisingly wide distribution of three-dimensional conformations. Moreover, we uncover striking differences in how the topoisomers handle torsional stress. As negative supercoiling increases, bases are increasingly exposed. Beyond a sharp supercoiling threshold, we also detect exposed bases in positively supercoiled DNA. Molecular dynamics simulations independently confirm the conformational heterogeneity and provide atomistic insight into the flexibility of supercoiled DNA. Our integrated approach reveals the three-dimensional structures of DNA that are essential for its function.

  10. Structural diversity of supercoiled DNA

    PubMed Central

    Irobalieva, Rossitza N.; Fogg, Jonathan M.; Catanese, Daniel J.; Sutthibutpong, Thana; Chen, Muyuan; Barker, Anna K.; Ludtke, Steven J.; Harris, Sarah A.; Schmid, Michael F.; Chiu, Wah; Zechiedrich, Lynn

    2015-01-01

    By regulating access to the genetic code, DNA supercoiling strongly affects DNA metabolism. Despite its importance, however, much about supercoiled DNA (positively supercoiled DNA, in particular) remains unknown. Here we use electron cryo-tomography together with biochemical analyses to investigate structures of individual purified DNA minicircle topoisomers with defined degrees of supercoiling. Our results reveal that each topoisomer, negative or positive, adopts a unique and surprisingly wide distribution of three-dimensional conformations. Moreover, we uncover striking differences in how the topoisomers handle torsional stress. As negative supercoiling increases, bases are increasingly exposed. Beyond a sharp supercoiling threshold, we also detect exposed bases in positively supercoiled DNA. Molecular dynamics simulations independently confirm the conformational heterogeneity and provide atomistic insight into the flexibility of supercoiled DNA. Our integrated approach reveals the three-dimensional structures of DNA that are essential for its function. PMID:26455586