Science.gov

Sample records for gc-content dna codes

  1. Biased gene conversion and GC-content evolution in the coding sequences of reptiles and vertebrates.

    PubMed

    Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas

    2014-12-19

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrate species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamics of coding sequence GC-content are largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins.
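
    The GC3 statistic used in this record is the fraction of G or C among the third positions of the codons of a coding sequence. A minimal sketch of that calculation follows; the function name `gc3` and the example sequence are illustrative assumptions, not taken from the paper.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C among third-codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # drop a trailing partial codon
    third = cds[2:usable:3]               # every third base, starting at index 2
    called = [b for b in third if b in "ACGT"]   # skip ambiguous bases such as N
    if not called:
        raise ValueError("no unambiguous third-codon positions")
    return sum(b in "GC" for b in called) / len(called)


# Hypothetical example: codons ATG GCC AAG TTT have third bases G, C, G, T.
print(gc3("ATGGCCAAGTTT"))  # 0.75
```

    The sketch assumes the sequence starts in frame at its first base; a real pipeline would also have to decide how to treat stop codons and partial transcripts.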

  2. Biased Gene Conversion and GC-Content Evolution in the Coding Sequences of Reptiles and Vertebrates

    PubMed Central

    Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas

    2015-01-01

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrate species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamics of coding sequence GC-content are largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins. PMID:25527834

  3. GC content evolution in coding regions of angiosperm genomes: a unifying hypothesis.

    PubMed

    Glémin, Sylvain; Clément, Yves; David, Jacques; Ressayre, Adrienne

    2014-07-01

    In angiosperms (as in other species), GC content varies along and between genes, within a genome, and between genomes of different species, but the reason for this distribution is still an open question. Grass genomes are particularly intriguing because they exhibit a strong bimodal distribution of genic GC content and a sharp 5'-3' decreasing GC content gradient along most genes. Here, we propose a unifying model to explain the main patterns of GC content variation at the gene and genome scale. We argue that GC content patterns could be mainly determined by the interactions between gene structure, recombination patterns, and GC-biased gene conversion. Recent studies on fine-scale recombination maps in angiosperms support this hypothesis and previous results also fit this model. We propose that our model could be used as a null hypothesis to search for additional forces that affect GC content in angiosperms.
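
    A 5'-3' GC gradient of the kind described in this record can be made visible by computing GC content in windows along a gene. The sketch below is only an illustration: the function names, window size, step and toy sequence are assumptions, not the authors' method.

```python
def gc_fraction(seq: str) -> float:
    """GC fraction of a sequence, ignoring bases other than A, C, G, T."""
    seq = seq.upper()
    called = sum(seq.count(b) for b in "ACGT")
    if called == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / called


def gc_profile(seq: str, window: int = 6, step: int = 3):
    """Return (start, GC fraction) for each window from the 5' to the 3' end."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy gene that is GC-rich at the 5' end and AT-rich at the 3' end.
gene = "GCGCGC" + "GCATAT" + "ATATAT"
for start, value in gc_profile(gene, window=6, step=6):
    print(start, round(value, 2))
```

    On the toy sequence the profile falls from 1.0 to 0.0 along the gene; real analyses would use much larger windows and average over many genes.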

  4. Large-scale evaluation of experimentally determined DNA G+C contents with whole genome sequences of prokaryotes.

    PubMed

    Kim, Mincheol; Park, Sang-Cheol; Baek, Inwoo; Chun, Jongsik

    2015-03-01

    Historically, DNA G+C content has played a critical role in the description of bacterial and archaeal species. Despite its importance in prokaryote taxonomy, its accuracy has been questioned because of the methodological heterogeneity and measurement errors of conventional methods. Here we investigated the accuracy of experimentally determined DNA G+C contents by comparing them with reference values calculated from whole genome sequences. The large-scale comparison revealed that G+C contents determined by high-performance liquid chromatography and buoyant density centrifugation were closer to the genome-derived reference values than those generated by the thermal denaturation method. However, there was a substantial degree of discrepancy between values obtained by conventional methods and genome-derived reference values. The majority of the differences fell outside the acceptable range (i.e. 1 mol% G+C content difference) for species delimitation of prokaryotes. In contrast, when average nucleotide identity (ANI) was correlated with G+C difference among genomes, most G+C differences were confined to less than 1% within species. Therefore, G+C values from error-prone conventional methods are not meaningful in the description of bacterial and archaeal species. For taxonomic purposes, DNA G+C content should be calculated directly from high-quality genome sequences with a sequencing depth of coverage of at least 16×.
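
    The comparison described in this record can be sketched as computing G+C content (mol%) directly from assembled sequences and checking whether an experimental value lies within 1 mol% of it. The function names, contigs and the experimental value below are made-up illustrations, not data from the study.

```python
def gc_mol_percent(contigs) -> float:
    """G+C content in mol% over all contigs, ignoring ambiguous bases."""
    gc = at = 0
    for seq in contigs:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


def within_tolerance(experimental: float, genome_derived: float,
                     tolerance: float = 1.0) -> bool:
    """True if the two G+C values differ by at most `tolerance` mol%."""
    return abs(experimental - genome_derived) <= tolerance


# Hypothetical two-contig assembly and a hypothetical laboratory measurement.
reference = gc_mol_percent(["GGCCAATT", "GCGCNNAT"])
print(round(reference, 2), within_tolerance(58.5, reference))
```

    The 1 mol% default mirrors the acceptable range quoted in the abstract; the sketch does not address sequencing depth or assembly quality.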

  5. Relevance of GC content to the conservation of DNA polymerase III/mismatch repair system in Gram-positive bacteria

    PubMed Central

    Akashi, Motohiro; Yoshikawa, Hirofumi

    2013-01-01

    The mechanism of DNA replication is one of the driving forces of genome evolution. Bacterial DNA polymerase III, the primary complex of DNA replication, consists of PolC and DnaE. PolC is conserved in Gram-positive bacteria, especially in the Firmicutes with low GC content, whereas DnaE is widely conserved in most Gram-negative and Gram-positive bacteria. PolC contains two domains, the 3′-5′exonuclease domain and the polymerase domain, while DnaE only possesses the polymerase domain. Accordingly, DnaE does not have the proofreading function; in Escherichia coli, another enzyme DnaQ performs this function. In most bacteria, the fidelity of DNA replication is maintained by 3′-5′ exonuclease and a mismatch repair (MMR) system. However, we found that most Actinobacteria (a group of Gram-positive bacteria with high GC content) appear to have lost the MMR system and chromosomes may be replicated by DnaE-type DNA polymerase III with DnaQ-like 3′-5′ exonuclease. We tested the mutation bias of Bacillus subtilis, which belongs to the Firmicutes and found that the wild type strain is AT-biased while the mutS-deletant strain is remarkably GC-biased. If we presume that DnaE tends to make mistakes that increase GC content, these results can be explained by the mutS deletion (i.e., deletion of the MMR system). Thus, we propose that GC content is regulated by DNA polymerase and MMR system, and the absence of polC genes, which participate in the MMR system, may be the reason for the increase of GC content in Gram-positive bacteria such as Actinobacteria. PMID:24062730

  6. Diversity in isochore structure among cold-blooded vertebrates based on GC content of coding and non-coding sequences.

    PubMed

    Fortes, Gloria G; Bouza, Carmen; Martínez, Paulino; Sánchez, Laura

    2007-03-01

    To review the general view of the different compositional structures of warm- and cold-blooded vertebrate genomes, we made use of the increasing number of genetic sequences, including coding (exon) and non-coding (intron) regions, that have been deposited in the databases in recent years. The nucleotide distributions of the third codon positions (GC3) were analyzed in 1510 coding sequences (CDS) of fish, 1414 CDS of amphibians and 320 CDS of reptiles. In addition, the relationship between the GC content of 74, 56 and 25 CDS of fish, amphibians and reptiles, respectively, and that of their corresponding introns (GCI) was considered. In accordance with recent data, sequence analysis showed the presence of very GC3-rich CDS in these poikilothermic vertebrates. However, very high diversity in compositional patterns was found among different orders of fish, amphibians and reptiles. Significant positive correlations between GC3 and GCI were also confirmed for the genes analyzed. Nevertheless, introns proved to be poorer in GC than their corresponding CDS, this difference being larger than in the human genome. Because of the limited number of available sequences that include both exons and introns, we must be cautious about the results derived from them. However, the indication that coding sequences are richer in GC than their corresponding introns could help explain the discrepancy between sequence analysis and the ultracentrifugation studies in cold-blooded vertebrates, which did not predict the existence of GC-rich isochores.

  7. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents

    PubMed Central

    Liu, Sophia S.; Hockenberry, Adam J.; Lancichinetti, Andrea; Jewett, Michael C.

    2016-01-01

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems. PMID:27835644
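
    The maximum-entropy idea in this record can be illustrated for the simplest case: a fixed amino acid sequence and a target expected GC content. Under those two constraints, the least-biased choice gives each synonymous codon a probability proportional to exp(lambda × number of G/C bases), with lambda tuned to hit the target. The sketch below is an independent illustration and is not the NullSeq package or its API; all names are assumptions, and the standard genetic code is assumed.

```python
import math
import random

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
SYNONYMS = {}
for codon, aa in zip(CODONS, AMINO):
    SYNONYMS.setdefault(aa, []).append(codon)


def gc_count(codon):
    return sum(base in "GC" for base in codon)


def codon_probs(aa, lam):
    """Maximum-entropy codon probabilities for one amino acid at a given lambda."""
    weights = [math.exp(lam * gc_count(c)) for c in SYNONYMS[aa]]
    total = sum(weights)
    return [w / total for w in weights]


def expected_gc(protein, lam):
    """Expected GC fraction of a random CDS encoding `protein`."""
    total = 0.0
    for aa in protein:
        probs = codon_probs(aa, lam)
        total += sum(p * gc_count(c) for p, c in zip(probs, SYNONYMS[aa]))
    return total / (3 * len(protein))


def solve_lambda(protein, target_gc, lo=-20.0, hi=20.0, iters=60):
    """Bisection on lambda; expected GC increases monotonically with lambda."""
    if not expected_gc(protein, lo) < target_gc < expected_gc(protein, hi):
        raise ValueError("target GC is not reachable for this protein")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_gc(protein, mid) < target_gc:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def random_cds(protein, target_gc, rng=random):
    """Sample one coding sequence for `protein` with expected GC = target_gc."""
    lam = solve_lambda(protein, target_gc)
    return "".join(
        rng.choices(SYNONYMS[aa], weights=codon_probs(aa, lam))[0]
        for aa in protein
    )


print(random_cds("MKLVAGS", 0.5, random.Random(0)))
```

    Only the expected GC content matches the target; any single sampled sequence fluctuates around it, and the fluctuation shrinks as the protein gets longer.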

  8. DNA G+C content of the third codon position and codon usage biases of human genes.

    PubMed

    Sueoka, N; Kawanishi, Y

    2000-12-30

    The human genome, as in other eukaryotes, has a wide heterogeneity in the DNA base composition. The evolutionary basis for this heterogeneity has been unknown. A previous study of the human genome (846 genes analyzed) has shown that, in the major range of the G+C content in the third codon position (0.25-0.75), biases from the Parity Rule 2 (PR2) among the synonymous codons of the four-codon amino acids are similar except in the highest G+C range (Sueoka, N., 1999. Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene 238, 53-58.). PR2 is an intra-strand rule where A=T and G=C are expected when there are no biases between the two complementary strands of DNA in mutation and selection rates (substitution rates). In this study, 14,026 human genes were analyzed. In addition, the third codon positions of two-codon amino acids were analyzed. New results show the following: (a) The G+C contents of the third codon position of human genes are scattered in the G+C range of 0.22-0.96 in the third codon position. (b) The PR2 biases are similar in the range of 0.25-0.75, whereas, in the high G+C range (0.75-0.96; 13% of the genes), the PR2-bias fingerprints are different from those of the major range. (c) Unlike the PR2 biases, the G+C contents of the third codon position for both four-codon and two-codon amino acids are all correlated almost perfectly with the G+C content of the third codon position over the total G+C ranges. These results support the notion that the directional mutation pressure, rather than the directional selection pressure, is mainly responsible for the heterogeneity of the G+C content of the third codon position.
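
    Deviation from Parity Rule 2 at the third codon position is commonly summarized by the ratios A3/(A3+T3) and G3/(G3+C3), both of which equal 0.5 when A = T and G = C hold. The sketch below computes these together with GC3 for one sequence; it pools all codons for brevity, whereas the study analyzes four-codon and two-codon amino acids separately. The function name and the example are assumptions.

```python
from collections import Counter


def pr2_third_position(cds: str) -> dict:
    """GC3 and the two PR2 ratios from third-codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[2:usable:3])
    a, t, g, c = (counts[b] for b in "ATGC")
    if a + t == 0 or g + c == 0:
        raise ValueError("need at least one A/T and one G/C third position")
    return {
        "GC3": (g + c) / (a + t + g + c),
        "A3/(A3+T3)": a / (a + t),   # 0.5 expected under PR2
        "G3/(G3+C3)": g / (g + c),   # 0.5 expected under PR2
    }


# Hypothetical run of alanine codons: GCA GCT GCG GCC GCA GCG.
print(pr2_third_position("GCAGCTGCGGCCGCAGCG"))
```

    In this toy input the third bases are A, T, G, C, A, G, so GC3 is 0.5 while both PR2 ratios are 2/3, a purine excess on the coding strand.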

  9. Two aspects of DNA base composition: G+C content and translation-coupled deviation from intra-strand rule of A = T and G = C.

    PubMed

    Sueoka, N

    1999-07-01

    The relative contribution of mutation and selection to the G+C content of DNA was analyzed in bacterial species having widely different G+C contents. The analysis used two methods that were developed previously. The first method was to plot the average G+C content of a set of nucleotides against the G+C content of the third codon position for each gene. This method was used to present the G+C distribution of the third codon position and to assess the relative neutrality of a set of nucleotides to that of the G+C content of the third codon position. The second method was to plot the intrastrand bias of the third codon position from Parity Rule 2 (PR2), where A = T and G = C. It was found that whereas intragenomic distributions of the DNA G+C content of these bacteria are narrow in the majority of species, in some species the G+C content of the minor class of genes distributes over wider ranges than the major class of genes. On the other hand, ubiquitous PR2 biases are amino acid specific and independent of the G+C content of DNA, so that when averaged over the amino acids, the biases are small and not correlated with the DNA G+C content. Therefore, translation-coupled PR2 biases are unlikely to explain the wide range of G+C contents among different species. Considering all data available, it was concluded that the amino acid-specific PR2 bias has only a minor effect, if any, on the average G+C content. In addition, PR2 bias patterns of different species show phylogenetic relationships, and the pattern can be used as a taxal fingerprint.

  10. Modified ‘one amino acid-one codon’ engineering of high GC content TaqII-coding gene from thermophilic Thermus aquaticus results in radical expression increase

    PubMed Central

    2014-01-01

    Background An industrial approach to protein production demands maximization of cloned gene expression, balanced with the recombinant host’s viability. Expression of toxic genes from thermophiles poses particular difficulties due to high GC content, mRNA secondary structures, rare codon usage and impaired replication of the coding plasmid in the host. TaqII belongs to a family of bifunctional enzymes, which are a fusion of the restriction endonuclease (REase) and methyltransferase (MTase) activities in a single polypeptide. The family contains thermostable REases with distinct specificities: TspGWI, TaqII, Tth111II/TthHB27I, TspDTI and TsoI and a few enzymes found in mesophiles. Although not isoschizomers, the enzymes exhibit amino acid (aa) sequence homologies, have molecular sizes of ~120 kDa, share a common modular architecture, resemble Type-I enzymes and cleave DNA 11/9 nt from the recognition sites; their activity is affected by S-adenosylmethionine (SAM). Results We describe the taqIIRM gene design, cloning and expression of the prototype TaqII. The enzyme amount in natural hosts is extremely low. To improve expression of the taqIIRM gene in Escherichia coli (E. coli), we designed and cloned a fully synthetic, low GC content, low mRNA secondary structure, codon-optimized taqIIRM gene under a bacteriophage lambda (λ) PR promoter. Codon usage based on a modified ‘one amino acid–one codon’ strategy, weighted towards low GC content codons, resulted in approximately 10-fold higher expression of the synthetic gene. 718 of the 1105 codons were changed, comprising 65% of the taqIIRM gene. We intentionally chose this less effective strategy, rather than a ‘codon randomization’ strategy that results in high expression yields, to keep in vivo TaqII production sub-optimal and thereby decrease the high ‘toxicity’ of the REase-MTase protein. Conclusions Recombinant wt and synthetic taqIIRM genes were cloned and expressed in E. coli. The modified

  11. Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position.

    PubMed

    Sueoka, N

    1999-09-30

    The genome of higher eukaryotes consists of genes having a widely heterogeneous base composition at the third codon position. Ubiquitous variability of the DNA base composition has the following two aspects: intragenomic heterogeneity of the G+C content and the amino-acid-specific translation-coupled biases from the Parity Rule 2 (PR2). PR2 is an intrastrand rule where A = T and G = C are expected if there is no bias in mutation and selection between the two complementary strands of DNA. To examine whether or not the biases from PR2 are responsible for the wide heterogeneity of the DNA G+C content in human, the third codon position of 846 human genes was analyzed. Genes were separated into six groups according to their G+C content of the third codon position, and each group was examined for the translation-coupled PR2 biases in the nucleotide composition of the third codon position for two- and four-codon amino acids. The results show that genes in the different G+C content groups have similar PR2 biases, indicating that the intragenomic heterogeneity of the G+C content is not correlated with translation-coupled biases from the PR2. Therefore, the heterogeneity of the G+C content is likely to be determined by some other mechanism (e.g. locally variable directional mutation pressures) than amino-acid-specific selections for the codon preference.

  12. Mitochondrial DNA in the Oogamochlamys clade (Chlorophyceae): high GC content and unique genome architecture for green algae.

    PubMed

    Borza, Tudor; Redmond, Erin K; Laflamme, Mark; Lee, Robert W

    2009-12-01

    Most mitochondrial genomes in the green algal phylum Chlorophyta are AT-rich, circular-mapping DNA molecules. However, mitochondrial genomes from the Reinhardtii clade of the Chlorophyceae lineage are linear and sometimes fragmented into subgenomic forms. Moreover, Polytomella capuana, from the Reinhardtii clade, has an elevated GC content (57.2%). In the present study, we examined mitochondrial genome conformation and GC bias in the Oogamochlamys clade of the Chlorophyceae, which phylogenetic data suggest is closely related to the Reinhardtii clade. Total DNA from selected Oogamochlamys taxa, including four Lobochlamys culleus (H. Ettl) Pröschold, B. Marin, U. G. Schlöss. et Melkonian strains, Lobochlamys segnis (H. Ettl) Pröschold, B. Marin, U. G. Schlöss. et Melkonian, and Oogamochlamys gigantea (O. Dill) Pröschold, B. Marin, U. G. Schlöss. et Melkonian, was subjected to Southern blot analyses with cob and cox1 probes, and the results suggest that the mitochondrial genome of these taxa is represented by multiple-sized linear DNA fragments with overlapping homologies. On the basis of these data, we propose that linear mitochondrial DNA with a propensity to become fragmented arose in an ancestor common to the Reinhardtii and Oogamochlamys clades or even earlier in the evolutionary history of the Chlorophyceae. Analyses of partial cob and cox1 sequences from these Oogamochlamys taxa revealed an unusually high GC content (49.9%-65.1%) and provided evidence for the accumulation of cob and cox1 pseudogenes and truncated sequences in the mitochondrial genome of all L. culleus strains examined.

  13. Correlation of Inter-Locus Polyglutamine Toxicity with CAG•CTG Triplet Repeat Expandability and Flanking Genomic DNA GC Content

    PubMed Central

    Nestor, Colm E.; Monckton, Darren G.

    2011-01-01

    Dynamic expansions of toxic polyglutamine (polyQ)-encoding CAG repeats in ubiquitously expressed, but otherwise unrelated, genes cause a number of late-onset progressive neurodegenerative disorders, including Huntington disease and the spinocerebellar ataxias. As polyQ toxicity in these disorders increases with repeat length, the intergenerational expansion of unstable CAG repeats leads to anticipation, an earlier age-at-onset in successive generations. Crucially, disease associated alleles are also somatically unstable and continue to expand throughout the lifetime of the individual. Interestingly, the inherited polyQ length mediating a specific age-at-onset of symptoms varies markedly between disorders. It is widely assumed that these inter-locus differences in polyQ toxicity are mediated by protein context effects. Previously, we demonstrated that the tendency of expanded CAG•CTG repeats to undergo further intergenerational expansion (their ‘expandability’) also differs between disorders and these effects are strongly correlated with the GC content of the genomic flanking DNA. Here we show that the inter-locus toxicity of the expanded polyQ tracts of these disorders also correlates with both the expandability of the underlying CAG repeat and the GC content of the genomic DNA flanking sequences. Inter-locus polyQ toxicity does not correlate with properties of the mRNA or protein sequences, with polyQ location within the gene or protein, or steady state transcript levels in the brain. These data suggest that the observed inter-locus differences in polyQ toxicity are not mediated solely by protein context effects, but that genomic context is also important, an effect that may be mediated by modifying the rate at which somatic expansion of the DNA delivers proteins to their cytotoxic state. PMID:22163004

  14. GC Content Increased at CpG Flanking Positions of Fish Genes Compared with Sea Squirt Orthologs as a Mechanism for Reducing Impact of DNA Methylation

    PubMed Central

    Wang, Yong; Leung, Frederick C. C.

    2008-01-01

    Background Fractional DNA methylation in sea squirts evolved to global DNA methylation in fish. The impact of global DNA methylation is reflected by more CpG depletions and/or more A/T to G/C changes at CpG flanking positions due to context-dependent mutations of methylated CpG sites. Methods and Findings In this report, we demonstrate that the sea squirt genes have undergone more CpG to TpG/CpA substitutions than the fish orthologs using homologous fragments from orthologous genes among Ciona intestinalis, Ciona savignyi, fugufish and zebrafish. To avoid premature transcription, the TGA sites derived from CGA were largely converted to TGG in sea squirt genes. By contrast, a significant increment of GC content at CpG flanking positions was shown in fish genes. The positively selected A/T to G/C substitutions, in combination with the CpG to TpG/CpA substitutions, are the sources of the extremely low CpG observed/expected ratios in vertebrates. The nonsynonymous substitutions caused by the GC content increase have resulted in frequent amino acid replacements in the directions that were not noticed previously. Conclusion The increased GC content at CpG flanking positions can reduce CpG loss in fish genes and attenuate the impact of DNA methylation on CpG-containing codons, probably accounting for evolution towards vertebrates. PMID:19005573
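
    The CpG observed/expected ratio mentioned in this record is often computed as (number of CG dinucleotides × sequence length) / (number of C × number of G). The sketch below uses that common definition, which may differ in detail from the one the authors applied; the function name and example sequences are assumptions.

```python
def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        raise ValueError("sequence needs at least one C and one G")
    n_cpg = seq.count("CG")   # CG cannot overlap itself, so count() is exact
    return n_cpg * len(seq) / (n_c * n_g)


# Hypothetical fragments: one keeps a CpG, the other has lost them (CG -> TG/CA).
print(cpg_obs_exp("CCGGTTAA"))  # 2.0
print(cpg_obs_exp("TGCATGCA"))  # 0.0
```

    Values well below 1 indicate CpG depletion, the pattern the abstract attributes to mutation of methylated CpG sites.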

  15. GC content increased at CpG flanking positions of fish genes compared with sea squirt orthologs as a mechanism for reducing impact of DNA methylation.

    PubMed

    Wang, Yong; Leung, Frederick C C

    2008-01-01

    Fractional DNA methylation in sea squirts evolved to global DNA methylation in fish. The impact of global DNA methylation is reflected by more CpG depletions and/or more A/T to G/C changes at CpG flanking positions due to context-dependent mutations of methylated CpG sites. In this report, we demonstrate that the sea squirt genes have undergone more CpG to TpG/CpA substitutions than the fish orthologs using homologous fragments from orthologous genes among Ciona intestinalis, Ciona savignyi, fugufish and zebrafish. To avoid premature transcription, the TGA sites derived from CGA were largely converted to TGG in sea squirt genes. By contrast, a significant increment of GC content at CpG flanking positions was shown in fish genes. The positively selected A/T to G/C substitutions, in combination with the CpG to TpG/CpA substitutions, are the sources of the extremely low CpG observed/expected ratios in vertebrates. The nonsynonymous substitutions caused by the GC content increase have resulted in frequent amino acid replacements in the directions that were not noticed previously. The increased GC content at CpG flanking positions can reduce CpG loss in fish genes and attenuate the impact of DNA methylation on CpG-containing codons, probably accounting for evolution towards vertebrates.

  16. Easy DNA extraction method and optimisation of PCR-Temporal Temperature Gel Electrophoresis to identify the predominant high and low GC-content bacteria from dairy products.

    PubMed

    Parayre, Sandrine; Falentin, Hélène; Madec, Marie-Noëlle; Sivieri, Katia; Le Dizes, Anne-Sophie; Sohier, Danièle; Lortal, Sylvie

    2007-06-01

    Molecular fingerprinting of bacterial ecosystems has recently increased in food microbiology. The aim of this work was to develop a rapid and easy method to extract DNA from various cheeses, and to optimize the separation of low and high GC-content bacteria by PCR-Temporal Temperature Gel Electrophoresis (PCR-TTGE). Seventy-six strains belonging to 50 of the most frequently encountered bacterial species in dairy products were used to construct a database. Specific PCR-TTGE ladders containing 17 species forming a regular scale were created. Amplicons of these species were sequenced and the GC-content plotted against the migration distance: the correlation coefficients obtained were r² = 0.97 and r² = 0.99 for high and low GC-contents, respectively. The extraction method developed did not use any harmful solvent such as phenol/chloroform. The concentrations of DNA extracted from hard cooked and pressed cheeses, quantified by PicoGreen molecular probes, were between 0.7 and 6 µg/g for core samples and 8 to 30 µg/g for rind samples. Experimental as well as commercial dairy products were analysed using the developed method and the reproducibility of the profiles was 89%. The method appears to be particularly efficient in the characterization of the ecosystem of cheese rinds.

  17. DNA codes

    SciTech Connect

    Torney, D. C.

    2001-01-01

    We have begun to characterize a variety of codes, motivated by potential implementation as (quaternary) DNA n-sequences, with letters denoted A, C, G, T. The first codes we studied are the most reminiscent of conventional group codes. For these codes, Hamming similarity was generalized so that the score for matched letters takes more than one value, depending upon which letters are matched [2]. These codes consist of n-sequences satisfying an upper bound on the similarities, summed over the letter positions, of distinct codewords. We chose similarity 2 for matches of letters A and T and 3 for matches of the letters C and G, providing a rough approximation to double-strand bond energies in DNA. An inherent novelty of DNA codes is 'reverse complementation'. The latter may be defined, as follows, not only for alphabets of size four, but, more generally, for any even-size alphabet. All that is required is a matching of the letters of the alphabet: a partition into pairs. Then, the reverse complement of a codeword is obtained by reversing the order of its letters and replacing each letter by its match. For DNA, the matching is AT/CG because these are the Watson-Crick bonding pairs. Reversal arises because two DNA sequences form a double strand with opposite relative orientations. Thus, as will be described in detail, because in vitro decoding involves the formation of double-stranded DNA from two codewords, it is reasonable to assume - for universal applicability - that the reverse complement of any codeword is also a codeword. In particular, self-reverse complementary codewords are expressly forbidden in reverse-complement codes. Thus, an appropriate distance between all pairs of codewords must, when large, effectively prohibit binding between the respective codewords: to form a double strand. Only reverse-complement pairs of codewords should be able to bind. For most applications, a DNA code is to be bi-partitioned, such that the reverse-complementary pairs are separated
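
    The definitions in this record (reverse complement via the AT/CG matching, and a weighted Hamming similarity scoring 2 for matched A or T and 3 for matched C or G) can be sketched as follows. The function names and the four-word example code are illustrative assumptions, not taken from the report.

```python
from itertools import combinations

MATCH = {"A": "T", "T": "A", "C": "G", "G": "C"}   # Watson-Crick matching
SCORE = {"A": 2, "T": 2, "C": 3, "G": 3}           # weights from the abstract


def reverse_complement(word: str) -> str:
    """Reverse the letters and replace each letter by its match."""
    return "".join(MATCH[b] for b in reversed(word))


def similarity(x: str, y: str) -> int:
    """Weighted Hamming similarity summed over positions where x and y agree."""
    if len(x) != len(y):
        raise ValueError("codewords must have equal length")
    return sum(SCORE[a] for a, b in zip(x, y) if a == b)


def is_reverse_complement_code(code) -> bool:
    """Closed under reverse complementation, with no self-reverse-complementary word."""
    words = set(code)
    return all(
        reverse_complement(w) in words and reverse_complement(w) != w
        for w in words
    )


def max_pairwise_similarity(code) -> int:
    """Largest similarity between two distinct codewords."""
    return max(similarity(x, y) for x, y in combinations(sorted(set(code)), 2))


code = ["AACC", "GGTT", "ACAC", "GTGT"]   # hypothetical length-4 code
print(is_reverse_complement_code(code), max_pairwise_similarity(code))
```

    A word such as ACGT is its own reverse complement, so any set containing it fails the check, matching the restriction stated in the abstract.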

  18. rDNA mapping, heterochromatin characterization and AT/GC content of Agapanthus africanus (L.) Hoffmanns (Agapanthaceae).

    PubMed

    Reis, Aryane C; Franco, Ana Luiza; Campos, Victória R; Souza, Flávia R; Zorzatto, Cristiane; Viccini, Lyderson F; Sousa, Saulo M

    2016-01-01

    Agapanthus (Agapanthaceae) has 10 described species. However, most taxonomists differ with respect to this number because of the great phenotypic plasticity of the species. Cytogenetics has been an important tool to aid plant taxon identification and, to date, all taxa of Agapanthus L'Héritier studied cytologically have presented 2n = 30. Although the species possess large chromosomes, the group is karyologically little explored. This work aimed to increase the cytogenetic knowledge of Agapanthus africanus (L.) Hoffmanns using chromosome banding techniques with DAPI/CMA3 and fluorescent in situ hybridization (FISH). In addition, flow cytometry was used to determine DNA content and the percentage of AT/GC nitrogenous bases. The plants studied showed 2n = 30 chromosomes, ranging from 4.34 to 8.55 µm, with the karyotype formula (KF) = 10m + 5sm. Through FISH, one 45S rDNA signal was observed proximal to the centromere of chromosome 7, while for 5S rDNA sites we observed one signal proximal to the centromere of chromosome 9. The 2C DNA content estimated for the species was 2C = 24.4, with 59% AT and 41% GC. Our data provide an important update to the biology and cytotaxonomy of Agapanthus africanus (L.) Hoffmanns.

  19. Ecological and evolutionary significance of genomic GC content diversity in monocots

    PubMed Central

    Šmarda, Petr; Bureš, Petr; Horová, Lucie; Leitch, Ilia J.; Mucina, Ladislav; Pacini, Ettore; Tichý, Lubomír; Grulich, Vít; Rotreklová, Olga

    2014-01-01

    Genomic DNA base composition (GC content) is predicted to significantly affect genome functioning and species ecology. Although several hypotheses have been put forward to address the biological impact of GC content variation in microbial and vertebrate organisms, the biological significance of GC content diversity in plants remains unclear because of a lack of sufficiently robust genomic data. Using flow cytometry, we report genomic GC contents for 239 species representing 70 of 78 monocot families and compare them with genomic characters, a suite of life history traits and climatic niche data using phylogeny-based statistics. GC content of monocots varied between 33.6% and 48.9%, with several groups exceeding the GC content known for any other vascular plant group, highlighting their unusual genome architecture and organization. GC content showed a quadratic relationship with genome size, with the decreases in GC content in larger genomes possibly being a consequence of the higher biochemical costs of GC base synthesis. Dramatic decreases in GC content were observed in species with holocentric chromosomes, whereas increased GC content was documented in species able to grow in seasonally cold and/or dry climates, possibly indicating an advantage of GC-rich DNA during cell freezing and desiccation. We also show that genomic adaptations associated with changing GC content might have played a significant role in the evolution of the Earth’s contemporary biota, such as the rise of grass-dominated biomes during the mid-Tertiary. One of the major selective advantages of GC-rich DNA is hypothesized to be facilitating more complex gene regulation. PMID:25225383

  20. On the molecular mechanism of GC content variation among eubacterial genomes

    PubMed Central

    2012-01-01

    Background As a key parameter of genome sequence variation, the GC content of bacterial genomes has been investigated for over half a century, and many hypotheses have been put forward to explain this GC content variation and its relationship to other fundamental processes. Previously, we classified eubacteria into dnaE-based groups (the dimeric combination of DNA polymerase III alpha subunits), according to a hypothesis where GC content variation is essentially governed by genome replication and DNA repair mechanisms. Further investigation led to the discovery that two major mutator genes, polC and dnaE2, may be responsible for genomic GC content variation. Consequently, an in-depth analysis was conducted to evaluate various potential intrinsic and extrinsic factors in association with GC content variation among eubacterial genomes. Results Mutator genes, especially those with dominant effects on the mutation spectra, are biased towards either GC or AT richness, and they alter genomic GC content in the two opposite directions. Increased bacterial genome size (or gene number) appears to rely on increased genomic GC content; however, it is unclear whether the changes are directly related to certain environmental pressures. Certain environmental and bacteriological features are related to GC content variation, but their trends are more obvious when analyzed under the dnaE-based grouping scheme. Most terrestrial, plant-associated, and nitrogen-fixing bacteria are members of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those dwelling in aquatic environments, are largely members of the dnaE1|polV group. Conclusion Our studies provide several lines of evidence indicating that DNA polymerase III α subunit and its isoforms participating in either replication (such as polC) or SOS mutagenesis/translesion synthesis (such as dnaE2), play dominant roles in determining GC variability. 
Other environmental or bacteriological factors, such

  1. Analytical Biases Associated with GC-Content in Molecular Evolution

    PubMed Central

    Romiguier, Jonathan; Roux, Camille

    2017-01-01

    Molecular evolution is being revolutionized by high-throughput sequencing allowing an increased amount of genome-wide data available for multiple species. While base composition summarized by GC-content is one of the first metrics measured in genomes, its genomic distribution is a frequently neglected feature in downstream analyses based on DNA sequence comparisons. Here, we show how base composition heterogeneity among loci and taxa can bias common molecular evolution analyses such as phylogenetic tree reconstruction, detection of natural selection and estimation of codon usage. We then discuss the biological, technical and methodological causes of these GC-associated biases and suggest approaches to overcome them. PMID:28261263

  2. High GC content causes orphan proteins to be intrinsically disordered.

    PubMed

    Basile, Walter; Sachenkova, Oxana; Light, Sara; Elofsson, Arne

    2017-03-01

    De novo creation of protein coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins need to, at the bare minimum, not cause serious harm to the organism, meaning that they should for instance not aggregate. Therefore, although the creation of short ORFs could be truly random, the fixation should be subjected to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge no valid explanation for this difference has been proposed. To solve this riddle we studied structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in the properties between proteins of different ages. However, when we take the GC content into account we noted that it could explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with codons coding for disorder promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for fixation of orphan proteins. Instead these proteins largely resemble random proteins given a particular GC level. During evolution the properties of a protein change faster than the GC level causing the relationship between disorder and GC to gradually weaken.

  3. GC Content Heterogeneity Transition of Conserved Noncoding Sequences Occurred at the Emergence of Vertebrates

    PubMed Central

    Hettiarachchi, Nilmini; Saitou, Naruya

    2016-01-01

    Conserved non-coding sequences (CNSs) of Eukaryotes are known to be significantly enriched in regulatory sequences. CNSs of diverse lineages follow different patterns in abundance, sequence composition, and location. Here, we report a thorough analysis of CNSs in diverse groups of Eukaryotes with respect to GC content heterogeneity. We examined 24 fungi, 19 invertebrates, and 12 non-mammalian vertebrates so as to find lineage-specific features of CNSs. We found that fungi and invertebrate CNSs are predominantly GC rich, as we previously observed in plants, whereas vertebrate CNSs are GC poor. This result suggests that the CNS GC content transition occurred from the ancestral GC rich state of Eukaryotes to GC poor in the vertebrate lineage due to the enrollment of GC poor transcription factor binding sites that are lineage specific. CNS GC content is closely linked with the nucleosome occupancy that determines the location and structural architecture of DNAs. PMID:28040773

  4. Mutational pressure is a cause of inter- and intragenomic differences in GC-content of simplex and varicello viruses.

    PubMed

    Khrustalev, Vladislav Victorovich; Barkovsky, Eugene Victorovich

    2009-08-01

    Total GC-content (G+C), GC-content in codon positions and 0-fold, 2-fold and 4-fold degenerate sites in all coding districts from 10 completely sequenced genomes of simplex and varicello viruses have been calculated by the original "Coding Genome Scanner" algorithm. The low coefficient of correlation (R<0.5) between 3GC and G+C in all coding districts from unique regions (UL and US) of an alphaherpesvirus genome is a new criterion of the strong mutational pressure that is the process of increasing the rates of nonsynonymous mutations because of the extreme saturation (GC-pressure) or desaturation (AT-pressure) of third (liberal) codon positions with G and C. Unique regions of HSV1, HSV2, CeHV1, CeHV2, CeHV16 and BoHV5 are under the influence of strong GC-pressure caused mostly by AT to GC transversions. Unique regions of EqHV1 are under the influence of weak GC-pressure. In unique regions of CeHV9 AT-pressure is strong; in EqHV4 and VZV unique regions AT-pressure is weak. Mutational AT-pressure in CeHV9 and VZV is caused mostly by transitions, while in EqHV4 it is caused mostly by transversions. The level of 3GC in coding districts situated in long terminal inverted repeats (LTR) of all these viruses is much higher than in coding districts from UL and US. Higher GC-content does not seem to depend on the gene itself, but it does depend on its location. V67 gene of EqHV1 is situated in LTR (3GC=0.853), while V67 gene of EqHV4 is situated in US (3GC=0.397). Higher rates of AT to GC transversions in coding districts situated in LTR should be due to the "anatomy" of long terminal inverted repeats. The process of AT to GC transversions is thought to take place only in double-stranded DNA. Indeed, in the potential secondary structure formed by single-stranded genomic DNA of alphaherpesviruses only joined inverted repeats should be double-stranded.

  5. Insights from the GC content analysis of 76 genome survey sequences (GSS) from Elaeis oleifera ψ

    PubMed Central

    Bhore, Subhash J; Kassim, Amelia; Shah, Farida H

    2010-01-01

    South American oil palm (Elaeis oleifera) is not cultivated on a large scale in tropical countries like Malaysia because of the low yield of palm oil derived from its fruit mesocarp. However, its fruit mesocarp oil contains about 68.6% oleic acid (C18:1), which is more than double that of the commercially cultivated oil palm, E. guineensis Jacq Tenera (hybrid of Dura (♀) x Pisifera (♂)). It is also known that E. oleifera is a good source of tocotrienols and carotenoids. Therefore, it is of interest to know the genome sequence of E. oleifera. The objective of this study is to generate genome survey sequences (GSS) to gain insight into the GC content of the E. oleifera genome. The nuclear genomic DNA isolated from young leaf tissues was digested with EcoRI and NdeI/DraI restriction enzymes, and three genomic DNA libraries were constructed using Lambda ZAP-II, pGEM®-T Easy, and pDONR 222™ as cloning vectors. The 76 generated GSSs were analyzed using bioinformatics tools. The analysis indicates that the adenine, cytosine, guanine and thymine contents of the generated GSSs are 30%, 20%, 20%, and 30%, respectively. In conclusion, based on the GC content analysis of the 76 randomly isolated GSSs, we hypothesize that the GC content of the E. oleifera genome is 40%. The hypothesized 40% GC content is expected to remain close to the GC content based on whole genome analysis. ψ The nucleotide sequence data reported in this paper have been submitted to the dbGSS division of the international DNA database (GenBank/DDBJ/EMBL) under accession numbers DX575945-DX575972 and EI798032-EI798079. Abbreviations: gDNA, nuclear genomic DNA; GSSs, genome survey sequences; SAOP, South American oil palm. PMID:21364775
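
    The arithmetic behind the 40% figure above (G 20% + C 20%) can be illustrated with a minimal sketch. The function names and the toy sequence are assumptions made for illustration only; they are not the bioinformatics tools used in the study.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T among the unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_percent(seq):
    """GC content is simply the G percentage plus the C percentage."""
    composition = base_composition(seq)
    return composition["G"] + composition["C"]


# Hypothetical toy sequence with the same proportions as reported above:
# 30% A, 20% C, 20% G, 30% T.
toy = "AAATTTCCGG"
print(base_composition(toy))
print(gc_percent(toy))
```

    Ambiguous symbols such as N are ignored in the denominator here, which is one common convention; a real survey would need to state which convention it follows.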

  6. Evolution of genome size and genomic GC content in carnivorous holokinetics (Droseraceae).

    PubMed

    Veleba, Adam; Šmarda, Petr; Zedek, František; Horová, Lucie; Šmerda, Jakub; Bureš, Petr

    2017-02-01

    Studies in the carnivorous family Lentibulariaceae in recent years have resulted in the discovery of the smallest plant genomes and an unusual pattern of genomic GC content evolution. However, scarcity of genomic data in other carnivorous clades still prevents a generalization of the observed patterns. Here the aim was to fill this gap by mapping genome evolution in the second largest carnivorous family, Droseraceae, where this evolution may be affected by chromosomal holokinetism in Drosera. METHODS: The genome size and genomic GC content of 71 Droseraceae species were measured by flow cytometry. A dated phylogeny was constructed, and the evolution of both genomic parameters and their relationship to species climatic niches were tested using phylogeny-based statistics. The 2C genome size of Droseraceae varied between 488 and 10 927 Mbp, and the GC content ranged between 37·1 and 44·7 %. The genome sizes and genomic GC content of carnivorous and holocentric species did not differ from those of their non-carnivorous and monocentric relatives. The genomic GC content positively correlated with genome size and annual temperature fluctuations. The genome size and chromosome numbers were inversely correlated in the Australian clade of Drosera. CONCLUSIONS: Our results indicate that neither carnivory (nutrient scarcity) nor holokinetism has a prominent effect on the size and DNA base composition of Droseraceae genomes. However, the holokinetic drive seems to affect karyotype evolution in one of the major clades of Drosera. Our survey confirmed that the evolution of GC content is tightly connected with the evolution of genome size and also with environmental conditions.

  7. DNA: Polymer and molecular code

    NASA Astrophysics Data System (ADS)

    Shivashankar, G. V.

    1999-10-01

    The thesis work focusses upon two aspects of DNA, the polymer and the molecular code. Our approach was to bring single molecule micromanipulation methods to the study of DNA. It included a home-built optical microscope combined with an atomic force microscope and an optical tweezer. This combined approach led to a novel method to graft a single DNA molecule onto a force cantilever using the optical tweezer and local heating. With this method, a force versus extension assay of double stranded DNA was realized. The resolution was about 10 pN. To improve on this force measurement resolution, a simple light backscattering technique was developed and used to probe the DNA polymer flexibility and its fluctuations. It combined the optical tweezer to trap a DNA-tethered bead and the laser backscattering to detect the bead's Brownian fluctuations. With this technique the resolution was about 0.1 pN with a millisecond access time, and the whole entropic part of the DNA force-extension was measured. With this experimental strategy, we measured the polymerization of the protein RecA on an isolated double stranded DNA. We observed the progressive decoration of RecA on the λ DNA molecule, which results in the extension of λ, due to unwinding of the double helix. The dynamics of polymerization, the resulting change in the DNA entropic elasticity and the role of ATP hydrolysis were the main parts of the study. A simple model for RecA assembly on DNA was proposed. This work presents a first step in the study of genetic recombination. Recently we have started a study of equilibrium binding which utilizes fluorescence polarization methods to probe the polymerization of RecA on single stranded DNA. In addition to the study of material properties of DNA and DNA-RecA, we have developed experiments for which the code of the DNA is central. We studied one aspect of DNA as a molecular code, using different techniques.
In particular the programmatic use of template specificity makes

  8. A tailing genome walking method suitable for genomes with high local GC content.

    PubMed

    Liu, Taian; Fang, Yongxiang; Yao, Wenjuan; Guan, Qisai; Bai, Gang; Jing, Zhizhong

    2013-10-15

    The tailing genome walking strategies are simple and efficient. However, they sometimes can be restricted due to the low stringency of homo-oligomeric primers. Here we modified their conventional tailing step by adding polythymidine and polyguanine to the target single-stranded DNA (ssDNA). The tailed ssDNA was then amplified exponentially with a specific primer in the known region and a primer comprising 5' polycytosine and 3' polyadenosine. The successful application of this novel method for identifying integration sites mediated by φC31 integrase in goat genome indicates that the method is more suitable for genomes with high complexity and local GC content.

  9. Genome Size and GC Content Evolution of Festuca: Ancestral Expansion and Subsequent Reduction

    PubMed Central

    Šmarda, Petr; Bureš, Petr; Horová, Lucie; Foggi, Bruno; Rossi, Graziano

    2008-01-01

    Background and Aims Plant evolution is well known to be frequently associated with remarkable changes in genome size and composition; however, the knowledge of long-term evolutionary dynamics of these processes still remains very limited. Here a study is made of the fine dynamics of quantitative genome evolution in Festuca (fescue), the largest genus in Poaceae (grasses). Methods Using flow cytometry (PI, DAPI), measurements were made of DNA content (2C-value), monoploid genome size (Cx-value), average chromosome size (C/n-value) and cytosine + guanine (GC) content of 101 Festuca taxa and 14 of their close relatives. The results were compared with the existing phylogeny based on ITS and trnL-F sequences. Key Results The divergence of the fescue lineage from related Poeae was predated by about a 2-fold monoploid genome and chromosome size enlargement, and apparent GC content enrichment. The backward reduction of these parameters, running parallel in both main evolutionary lineages of fine-leaved and broad-leaved fescues, appears to diverge among the existing species groups. The most dramatic reductions are associated with the most recently and rapidly evolving groups which, in combination with recent intraspecific genome size variability, indicate that the reduction process is probably ongoing and evolutionarily young. This dynamics may be a consequence of GC-rich retrotransposon proliferation and removal. Polyploids derived from parents with a large genome size and high GC content (mostly allopolyploids) had smaller Cx- and C/n-values and only slightly deviated from parental GC content, whereas polyploids derived from parents with small genome and low GC content (mostly autopolyploids) generally had a markedly increased GC content and slightly higher Cx- and C/n-values. Conclusions The present study indicates the high potential of general quantitative characters of the genome for understanding the long-term processes of genome evolution, testing evolutionary

  10. Superimposed Code Theoretic Analysis of DNA Codes and DNA Computing

    DTIC Science & Technology

    2010-03-01

    that the hybridization that occurs between a DNA strand and its Watson-Crick complement can be used to perform mathematical computation. This research... Watson-Crick (WC) duplex, e.g., TCGCA TCGCA . Note that non-WC duplexes can form and such a formation is called a cross-hybridization. Cross...5’GAAAGTCGCGTA3’ Watson-Crick (WC) Duplexes TACGCGACTTTC Cross Hybridized (CH) Duplexes ATTTTTGCGTTA GAAAAAGAAGAA Coding Strands for Ligation
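
    The Watson-Crick pairing mentioned in this excerpt can be made concrete with a short sketch. It assumes that the second strand of a WC duplex is written 5' to 3' as the reverse complement of the first; the function names are illustrative and do not come from the report.

```python
# Translation table for the four standard bases (A pairs with T, C with G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Return the Watson-Crick partner of a strand, read 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def is_wc_duplex(strand_a, strand_b):
    """True if the two strands are exact reverse complements of each other."""
    return reverse_complement(strand_a) == strand_b.upper()


coding = "GAAAGTCGCGTA"  # the 5'...3' strand quoted in the excerpt
print(reverse_complement(coding))
print(is_wc_duplex(coding, "TACGCGACTTTC"))
```

    Under that assumption, the strand 5'GAAAGTCGCGTA3' pairs with TACGCGACTTTC, the sequence listed under WC duplexes in the excerpt. Any other pairing, such as those listed under cross-hybridized duplexes, fails the `is_wc_duplex` check.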

  11. Dissecting the contributions of GC content and codon usage to gene expression in the model alga Chlamydomonas reinhardtii

    PubMed Central

    Barahimipour, Rouhollah; Strenkert, Daniela; Neupert, Juliane; Schroda, Michael; Merchant, Sabeeha S.; Bock, Ralph

    2015-01-01

    Summary The efficiency of gene expression in all organisms depends on the nucleotide composition of the coding region. GC content and codon usage are the two key sequence features known to influence gene expression, but the underlying molecular mechanisms are not entirely clear. Here we have determined the relative contributions of GC content and codon usage to the efficiency of nuclear gene expression in the unicellular green alga Chlamydomonas reinhardtii. By comparing gene variants that encode an identical amino acid sequence but differ in their GC content and/or codon usage, we show that codon usage is the key factor determining translational efficiency and, surprisingly, also mRNA stability. By contrast, unfavorable GC content affects gene expression at the level of the chromatin structure by triggering heterochromatinization. We further show that mutant algal strains that permit high-level transgene expression are less susceptible to epigenetic transgene suppression and do not establish a repressive chromatin structure at the transgenic locus. Our data disentangle the relationship between GC content and codon usage, and suggest simple strategies to overcome the transgene expression problem in Chlamydomonas. PMID:26402748

  12. The decline of isochores in mammals: an assessment of the GC content variation along the mammalian phylogeny.

    PubMed

    Belle, Elise M S; Duret, Laurent; Galtier, Nicolas; Eyre-Walker, Adam

    2004-06-01

    Whether isochores, the large-scale variation of the GC content in mammalian genomes, are being maintained has recently been questioned. It has been suggested that GC-rich isochores originated in the ancestral amniote genome but that whatever force gave rise to them is no longer effective and that isochores are now disappearing from mammalian genomes. Here we investigated the evolution of the GC content of 41 coding genes in 6 to 66 species of mammals by estimating the ancestral GC content using a method which allows for different rates of substitution between sites. We found a highly significant decrease in the GC content during early mammalian evolution, as well as a weaker but still significant decrease in the GC content of GC-rich genes later in at least three groups of mammals: primates, rodents, and carnivores. These results are of interest because they confirm the recently suggested disappearance of GC-rich isochores in some mammalian genomes, and more importantly, they suggest that this disappearance started very early in mammalian evolution.

  13. Random Coding Bounds for DNA Codes Based on Fibonacci Ensembles of DNA Sequences

    DTIC Science & Technology

    2008-07-01

    Dates covered: 6 Jul 08 – 11 Jul 08. ...coding bound on the rate of DNA codes is proved. To obtain the bound, we use some ensembles of DNA sequences which are generalizations of the Fibonacci sequences. Subject terms: DNA Codes, Fibonacci Ensembles, DNA Computing, Code Optimization

  14. GC Content-Based Pan-Pox Universal PCR Assays for Poxvirus Detection

    PubMed Central

    Li, Yu; Meyer, Hermann; Zhao, Hui; Damon, Inger K.

    2010-01-01

    Chordopoxviruses of the subfamily Chordopoxvirinae, family Poxviridae, infect vertebrates and consist of at least eight genera with broad host ranges. For most chordopoxviruses, the number of viral genes and their relative order are highly conserved in the central region. The GC content of chordopoxvirus genomes, however, evolved into two distinct types: those with genome GC content of more than 60% and those with a content of less than 40% GC. Two standard PCR assays were developed to identify chordopoxviruses based on whether the target virus has a low or high GC content. In design of the assays, the genus Avipoxvirus, which encodes major rearrangements of gene clusters, was excluded. These pan-pox assays amplify DNA from more than 150 different isolates and strains, including from primary clinical materials, from all seven targeted genera of chordopoxviruses and four unclassified new poxvirus species. The pan-pox assays represent an important advance for the screening and diagnosis of human and animal poxvirus infections, and the technology used is accessible to many laboratories worldwide. PMID:19906902

  15. Complete chloroplast genome sequences of Drimys, Liriodendron, and Piper: Implications for the phylogeny of magnoliids and the evolution of GC content

    SciTech Connect

    Zhengqiu, C.; Penaflor, C.; Kuehl, J.V.; Leebens-Mack, J.; Carlson, J.; dePamphilis, C.W.; Boore, J.L.; Jansen, R.K.

    2006-06-01

    the inverted repeat due to the presence of rRNA genes and lowest in the small single copy region where most NADH genes are located. Phylogenetic analyses using maximum parsimony and maximum likelihood methods were performed on DNA sequences of 61 protein-coding genes. Trees from both analyses provided strong support for the monophyly of magnoliids and two strongly supported groups were identified, the Canellales/Piperales and the Laurales/Magnoliales. The phylogenies also provided moderate to strong support for the basal position of Amborella, and a sister relationship of magnoliids to a clade that includes monocots and eudicots. The complete sequences of three magnoliid chloroplast genomes provide new data from the largest basal angiosperm clade. Evolutionary comparisons of these new genome sequences, combined with other published angiosperm genomes, confirm that GC content is unevenly distributed across the genome by location, codon position, and functional group. Furthermore, phylogenetic analyses provide the strongest support so far for the hypothesis that the magnoliids are sister to a large clade that includes both monocots and eudicots.

  16. Superimposed Code Theoretic Analysis of DNA Codes and DNA Computing

    DTIC Science & Technology

    2008-01-01

    complements of one another and the DNA duplex formed is a Watson-Crick (WC) duplex. However, there are many instances when the formation of non-WC...that the user’s requirements for probe selection are met based on the Watson-Crick probe locality within a target. The second type, called

  17. [Analysis of correlation of local GC level in human protein coding genes].

    PubMed

    Chen, Xiang-Gui; Hu, Jun; Yang, Xiao

    2008-09-01

    GC level is an important feature of genomic composition, and it significantly improves our understanding of the structure, function and evolution of genes. In this paper, the nonredundant DNA sequences of 7,992 human protein-coding genes were retrieved from a public database, and the local GC levels of different sequence regions and the correlations between them were analyzed. The results showed that the GC levels of different sequence regions were strikingly nonuniform. The 5' untranslated regions were the most GC-rich, with an average GC content of 62.5%. The 3' untranslated regions were the most GC-poor, with an average GC content of 43.97%. GC contents of 3' flanking sequences closely matched the GC levels of the large DNA fragments where the genes were located. Although the GC contents of open reading frames (ORFs) were higher than those of introns, 3' untranslated regions and 3' flanking sequences, high correlation existed among the GC contents of the four regions. The average GC content of the third codon position (GC3) was 58.9%, higher than that of the first and second positions, and it showed high correlation with the GC contents of ORFs, with a correlation coefficient of 0.91, in addition to its significant association with the GC contents of introns, 3' untranslated regions and 3' flanking sequences. Moreover, the linear regression of GC3 against GC contents of 3' flanking sequences yielded a slope of 1.25. Thus, GC3 was a sensitive indicator of GC change in the local genome. As for 5' flanking sequences, 5' untranslated regions, and first and second codon positions, however, their GC levels exhibited weaker correlation with those of other regions. These results suggest that the third codon positions, introns, 3' untranslated regions and 3' flanking sequences may evolve similarly, while first and second codon positions, 5' flanking sequences and 5' untranslated regions are expected to be under more selective pressure to maintain their functions.
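
    The position-specific GC levels discussed above (first, second and third codon positions, the last being GC3) can be computed along the following lines. This is a minimal sketch that assumes an in-frame coding sequence starting at the first base of a codon; the function name and the toy sequence are hypothetical and are not taken from the paper.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3) as percentages for an in-frame coding sequence.

    Trailing bases that do not complete a codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop an incomplete final codon
    percentages = []
    for position in range(3):
        bases = seq[position:usable:3]  # every third base from this offset
        gc_count = sum(base in "GC" for base in bases)
        percentages.append(100.0 * gc_count / len(bases) if bases else 0.0)
    return tuple(percentages)


# Toy example: codons ATG, GCC, AAG.
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCAAG")
print(gc1, gc2, gc3)
```

    In the toy sequence every third-position base is G or C, so GC3 is 100% while GC1 and GC2 are each one third. This shows how the three positions can diverge within the same gene.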

  18. Incorporating DNA Methylation Dynamics Into Epigenetic Codes

    PubMed Central

    Szulwach, Keith E.; Jin, Peng

    2014-01-01

    Summary Genomic function is dictated by a combination of DNA sequence and the molecular mechanisms controlling access to genetic information. Access to DNA can be determined by the interpretation of covalent modifications that influence the packaging of DNA into chromatin, including DNA methylation and histone modifications. These modifications are believed to be forms of “epigenetic codes” that exist in discernable combinations that reflect cellular phenotype. Although DNA methylation is known to play important roles in gene regulation and genomic function, its contribution to the encoding of epigenetic information is just beginning to emerge. Here we discuss paradigms associated with the various components of DNA methylation/demethylation and recent advances in the understanding of its dynamic regulation in the genome, integrating these mechanisms into a framework to explain how DNA methylation could contribute to epigenetic codes. PMID:24242211

  19. GC-Content of Synonymous Codons Profoundly Influences Amino Acid Usage

    PubMed Central

    Li, Jing; Zhou, Jun; Wu, Ying; Yang, Sihai; Tian, Dacheng

    2015-01-01

    Amino acids typically are encoded by multiple synonymous codons that are not used with the same frequency. Codon usage bias has drawn considerable attention, and several explanations have been offered, including variation in GC-content between species. Focusing on a simple parameter—combined GC proportion of all the synonymous codons for a particular amino acid, termed GCsyn—we try to deepen our understanding of the relationship between GC-content and amino acid/codon usage in more detail. We analyzed 65 widely distributed representative species and found a close association between GCsyn, GC-content, and amino acid usage. The overall usages of the four amino acids with the greatest GCsyn and the five amino acids with the lowest GCsyn both vary with the regional GC-content, whereas the usage of the remaining 11 amino acids with intermediate GCsyn is less variable. More interestingly, we discovered that codon usage frequencies are nearly constant in regions with similar GC-content. We further quantified the effects of regional GC-content variation (low to high) on amino acid usage and found that GC-content determines the usage variation of amino acids, especially those with extremely high GCsyn, which accounts for 76.7% of the changed GC-content for those regions. Our results suggest that GCsyn correlates with GC-content and has an impact on codon/amino acid usage. These findings suggest a novel approach to understanding the role of codon and amino acid usage in shaping genomic architecture and evolutionary patterns of organisms. PMID:26248983
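
    One plausible reading of the GCsyn definition above is the fraction of G and C bases across all synonymous codons of an amino acid. The sketch below computes it from the standard genetic code under that reading, with every codon weighted equally; both choices are assumptions and may differ from the authors' exact procedure. The names `CODON_TABLE` and `gcsyn` are illustrative.

```python
from collections import defaultdict

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def gcsyn():
    """Map each amino acid to the GC fraction over its synonymous codons."""
    codons_by_aa = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "*":  # skip stop codons
            codons_by_aa[amino_acid].append(codon)
    return {
        amino_acid: sum(c.count("G") + c.count("C") for c in codons)
        / (3 * len(codons))
        for amino_acid, codons in codons_by_aa.items()
    }


values = gcsyn()
ranked = sorted(values, key=values.get, reverse=True)
print("highest GCsyn:", ranked[:4])
print("lowest GCsyn:", ranked[-5:])
```

    Under this equal-weight reading, the four highest values belong to Ala, Gly, Pro and Arg and the five lowest to Ile, Phe, Lys, Asn and Tyr, leaving 11 intermediate amino acids. This is consistent with the 4/11/5 split described in the abstract.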

  20. The mutation spectrum in genomic late replication domains shapes mammalian GC content

    PubMed Central

    Kenigsberg, Ephraim; Yehuda, Yishai; Marjavaara, Lisette; Keszthelyi, Andrea; Chabes, Andrei; Tanay, Amos; Simon, Itamar

    2016-01-01

    Genome sequence compositions and epigenetic organizations are correlated extensively across multiple length scales. Replication dynamics, in particular, is highly correlated with GC content. We combine genome-wide time of replication (ToR) data, topological domain maps and detailed functional epigenetic annotations to study the correlations between replication timing and GC content at multiple scales. We find that the decrease in genomic GC content at large-scale late-replicating regions can be explained by mutation bias favoring A/T nucleotides, without selection or biased gene conversion. Quantification of the free dNTP pool during the cell cycle is consistent with a mechanism involving a replication-coupled mutation spectrum that favors AT nucleotides at late S-phase. We suggest that mammalian GC content composition is shaped by independent forces, globally modulating mutation bias and locally selecting on functional elements. Deconvoluting these forces and analyzing them on their native scales is important for proper characterization of complex genomic correlations. PMID:27085808

  1. Telomeres, histone code, and DNA damage response.

    PubMed

    Misri, S; Pandita, S; Kumar, R; Pandita, T K

    2008-01-01

    Genomic stability is maintained by telomeres, the end terminal structures that protect chromosomes from fusion or degradation. Shortening or loss of telomeric repeats or altered telomere chromatin structure is correlated with telomere dysfunction such as chromosome end-to-end associations that could lead to genomic instability and gene amplification. The structure at the end of telomeres is such that its DNA differs from DNA double strand breaks (DSBs) to avoid nonhomologous end-joining (NHEJ), which is accomplished by forming a unique higher order nucleoprotein structure. Telomeres are attached to the nuclear matrix and have a unique chromatin structure. Whether this special structure is maintained by specific chromatin changes is yet to be thoroughly investigated. Chromatin modifications implicated in transcriptional regulation are thought to be the result of a code on the histone proteins (histone code). This code, involving phosphorylation, acetylation, methylation, ubiquitylation, and sumoylation of histones, is believed to regulate chromatin accessibility either by disrupting chromatin contacts or by recruiting non-histone proteins to chromatin. The histone code in which distinct histone tail-protein interactions promote engagement may be the deciding factor for choosing specific DSB repair pathways. Recent evidence suggests that such mechanisms are involved in DNA damage detection and repair. Altered telomere chromatin structure has been linked to defective DNA damage response (DDR), and eukaryotic cells have evolved DDR mechanisms utilizing proficient DNA repair and cell cycle checkpoints in order to maintain genomic stability. Recent studies suggest that chromatin modifying factors play a critical role in the maintenance of genomic stability. This review will summarize the role of DNA damage repair proteins specifically ataxia-telangiectasia mutated (ATM) and its effectors and the telomere complex in maintaining genome stability.

  2. Telomeres, histone code, and DNA damage response

    PubMed Central

    Misri, S.; Pandita, S.; Kumar, R.; Pandita, T.K.

    2009-01-01

    Genomic stability is maintained by telomeres, the end terminal structures that protect chromosomes from fusion or degradation. Shortening or loss of telomeric repeats or altered telomere chromatin structure is correlated with telomere dysfunction such as chromosome end-to-end associations that could lead to genomic instability and gene amplification. The structure at the end of telomeres is such that its DNA differs from DNA double strand breaks (DSBs) to avoid nonhomologous end-joining (NHEJ), which is accomplished by forming a unique higher order nucleoprotein structure. Telomeres are attached to the nuclear matrix and have a unique chromatin structure. Whether this special structure is maintained by specific chromatin changes is yet to be thoroughly investigated. Chromatin modifications implicated in transcriptional regulation are thought to be the result of a code on the histone proteins (histone code). This code, involving phosphorylation, acetylation, methylation, ubiquitylation, and sumoylation of histones, is believed to regulate chromatin accessibility either by disrupting chromatin contacts or by recruiting non-histone proteins to chromatin. The histone code in which distinct histone tail-protein interactions promote engagement may be the deciding factor for choosing specific DSB repair pathways. Recent evidence suggests that such mechanisms are involved in DNA damage detection and repair. Altered telomere chromatin structure has been linked to defective DNA damage response (DDR), and eukaryotic cells have evolved DDR mechanisms utilizing proficient DNA repair and cell cycle checkpoints in order to maintain genomic stability. Recent studies suggest that chromatin modifying factors play a critical role in the maintenance of genomic stability. This review will summarize the role of DNA damage repair proteins specifically ataxia-telangiectasia mutated (ATM) and its effectors and the telomere complex in maintaining genome stability. PMID:19188699

  3. MicroRNA Stability in FFPE Tissue Samples: Dependence on GC Content

    PubMed Central

    Kakimoto, Yu; Tanaka, Masayuki; Kamiguchi, Hiroshi; Ochiai, Eriko; Osawa, Motoki

    2016-01-01

    MicroRNAs (miRNAs) are small non-coding RNAs responsible for fine-tuning of gene expression at post-transcriptional level. The alterations in miRNA expression levels profoundly affect human health and often lead to the development of severe diseases. Currently, high throughput analyses, such as microarray and deep sequencing, are performed in order to identify miRNA biomarkers, using archival patient tissue samples. MiRNAs are more robust than longer RNAs, and resistant to extreme temperatures, pH, and formalin-fixed paraffin-embedding (FFPE) process. Here, we have compared the stability of miRNAs in FFPE cardiac tissues using next-generation sequencing. The mode read length in FFPE samples was 11 nucleotides (nt), while that in the matched frozen samples was 22 nt. Although the read counts were increased 1.7-fold in FFPE samples, compared with those in the frozen samples, the average miRNA mapping rate decreased from 32.0% to 9.4%. These results indicate that, in addition to the fragmentation of longer RNAs, miRNAs are to some extent degraded in FFPE tissues as well. The expression profiles of total miRNAs in two groups were highly correlated (0.88 GC content (p<0.0001). The unequal degradation of each miRNA affected the abundance ranking in the library, and miR-133a was shown to be the most abundant in FFPE cardiac tissues instead of miR-1, which was predominant before fixation. Subsequent quantitative PCR (qPCR) analyses revealed that miRNAs with GC content of less than 40% are more degraded than GC-rich miRNAs (p<0.0001). We showed that deep sequencing data obtained using FFPE samples cannot be directly compared with that of fresh frozen samples. The combination of miRNA deep sequencing and other quantitative analyses, such as qPCR, may improve the utility of archival FFPE tissue samples. PMID:27649415
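
    The 40% GC threshold reported in this abstract can be illustrated with a short calculation. The following is a minimal sketch, not the authors' pipeline: the function names and the two toy sequences are hypothetical and are not real miRNA sequences.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in an RNA or DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def is_gc_poor(seq: str, threshold: float = 0.40) -> bool:
    """True if GC content is below the threshold (40%, as in the abstract)."""
    return gc_content(seq) < threshold


# Toy sequences, invented for illustration only.
toy_sequences = {"toy_low_gc": "AUAUAUAUGC", "toy_high_gc": "GGCCAUGGCC"}

for name, seq in toy_sequences.items():
    print(name, round(gc_content(seq), 2), "GC-poor" if is_gc_poor(seq) else "GC-rich")
```

    Under the abstract's finding, sequences flagged as GC-poor by such a check would be the ones expected to degrade more in FFPE tissue.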

  4. The evolution of genomic GC content undergoes a rapid reversal within the genus Plasmodium.

    PubMed

    Nikbakht, Hamid; Xia, Xuhua; Hickey, Donal A

    2014-09-01

    The genome of the malarial parasite Plasmodium falciparum is extremely AT rich. This bias toward a low GC content is a characteristic of several, but not all, species within the genus Plasmodium. We compared 4283 orthologous pairs of protein-coding sequences between Plasmodium falciparum and the less AT-biased Plasmodium vivax. Our results indicate that the common ancestor of these two species was also extremely AT rich. This means that, although there was a strong bias toward A+T during the early evolution of the ancestral Plasmodium lineage, there was a subsequent reversal of this trend during the more recent evolution of some species, such as P. vivax. Moreover, we show that not only is the P. vivax genome losing its AT richness, it is actually gaining a very significant degree of GC richness. This example illustrates the potential volatility of nucleotide content during the course of molecular evolution. Such reversible fluxes in nucleotide content within lineages could have important implications for phylogenetic reconstruction based on molecular sequence data.

  5. A convolutional code-based sequence analysis model and its application.

    PubMed

    Liu, Xiao; Geng, Xiaoli

    2013-04-16

    A new approach for encoding DNA sequences as input for DNA sequence analysis is proposed using the error correction coding theory of communication engineering. The encoder was designed as a convolutional code model whose generator matrix is designed based on the degeneracy of codons, with a codon treated in the model as an informational unit. The utility of the proposed model was demonstrated through the analysis of twelve prokaryote and nine eukaryote DNA sequences having different GC contents. Distinct differences in code distances were observed near the initiation and termination sites in the open reading frame, which provided a well-regulated characterization of the DNA sequences. Clearly distinguished period-3 features appeared in the coding regions, and the characteristic average code distances of the analyzed sequences were approximately proportional to their GC contents, particularly in the selected prokaryotic organisms, presenting the potential utility as an added taxonomic characteristic for use in studying the relationships of living organisms.

  6. DNA-guided establishment of nucleosome patterns within coding regions of a eukaryotic genome

    PubMed Central

    Beh, Leslie Y.; Müller, Manuel M.; Muir, Tom W.; Kaplan, Noam; Landweber, Laura F.

    2015-01-01

    A conserved hallmark of eukaryotic chromatin architecture is the distinctive array of well-positioned nucleosomes downstream from transcription start sites (TSS). Recent studies indicate that trans-acting factors establish this stereotypical array. Here, we present the first genome-wide in vitro and in vivo nucleosome maps for the ciliate Tetrahymena thermophila. In contrast with previous studies in yeast, we find that the stereotypical nucleosome array is preserved in the in vitro reconstituted map, which is governed only by the DNA sequence preferences of nucleosomes. Remarkably, this average in vitro pattern arises from the presence of subsets of nucleosomes, rather than the whole array, in individual Tetrahymena genes. Variation in GC content contributes to the positioning of these sequence-directed nucleosomes and affects codon usage and amino acid composition in genes. Given that the AT-rich Tetrahymena genome is intrinsically unfavorable for nucleosome formation, we propose that these “seed” nucleosomes—together with trans-acting factors—may facilitate the establishment of nucleosome arrays within genes in vivo, while minimizing changes to the underlying coding sequences. PMID:26330564

  7. DNA-guided establishment of nucleosome patterns within coding regions of a eukaryotic genome.

    PubMed

    Beh, Leslie Y; Müller, Manuel M; Muir, Tom W; Kaplan, Noam; Landweber, Laura F

    2015-11-01

    A conserved hallmark of eukaryotic chromatin architecture is the distinctive array of well-positioned nucleosomes downstream from transcription start sites (TSS). Recent studies indicate that trans-acting factors establish this stereotypical array. Here, we present the first genome-wide in vitro and in vivo nucleosome maps for the ciliate Tetrahymena thermophila. In contrast with previous studies in yeast, we find that the stereotypical nucleosome array is preserved in the in vitro reconstituted map, which is governed only by the DNA sequence preferences of nucleosomes. Remarkably, this average in vitro pattern arises from the presence of subsets of nucleosomes, rather than the whole array, in individual Tetrahymena genes. Variation in GC content contributes to the positioning of these sequence-directed nucleosomes and affects codon usage and amino acid composition in genes. Given that the AT-rich Tetrahymena genome is intrinsically unfavorable for nucleosome formation, we propose that these "seed" nucleosomes--together with trans-acting factors--may facilitate the establishment of nucleosome arrays within genes in vivo, while minimizing changes to the underlying coding sequences.

  8. DNA Code Validation Using Experimental Fluorescence Measurements and Thermodynamic Calculations

    DTIC Science & Technology

    2004-03-01

    Summary: A DNA code is a collection of single-stranded DNA molecules. In DNA hybridization assays, the formation of any Watson-Crick ... combinations represent the canonical Watson-Crick pairings. To obtain the reverse complement of a strand of DNA, one must first reverse the order of the ... DNA codes. Using software designed by A. Macula and V. Rykov (Macula, 2003), a set of 13 pairs, (X, WC(X)), of Watson-Crick reverse complementary
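
    The reverse-complement operation mentioned in this snippet can be sketched as follows. This is a minimal illustration assuming the standard A-T and G-C pairings; the function names are hypothetical and are not taken from the Macula and Rykov software.

```python
# Canonical Watson-Crick pairings for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Reverse the strand, then replace each base by its pairing partner."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def is_wc_pair(x: str, y: str) -> bool:
    """True if y is the Watson-Crick reverse complement of x, i.e. y == WC(x)."""
    return y.upper() == reverse_complement(x)


print(reverse_complement("AACGT"))  # ACGTT
```

    Applying the operation twice returns the original strand, which is why a code can be described as a set of pairs (X, WC(X)).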

  9. Ligation-mediated PCR with a back-to-back adapter reduces amplification bias resulting from variations in GC content.

    PubMed

    Ishihara, Satoru; Kotomura, Naoe; Yamamoto, Naoki; Ochiai, Hiroshi

    2017-08-15

    Ligation-mediated polymerase chain reaction (LM-PCR) is a common technique for amplification of a pool of DNA fragments. Here, a double-stranded oligonucleotide consisting of two primer sequences in back-to-back orientation was designed as an adapter for LM-PCR. When DNA fragments were ligated with this adapter, the fragments were sandwiched between two adapters in random orientations. In the ensuing PCR, ligation products linked at each end to an opposite side of the adapter, i.e. to a distinct primer sequence, were preferentially amplified compared with products linked at each end to an identical primer sequence. The use of this adapter in LM-PCR reduced the impairment of PCR by substrate DNA with a high GC content, compared with the use of traditional LM-PCR adapters. This result suggested that our method has the potential to contribute to reduction of the amplification bias that is caused by an intrinsic property of the sequence context in substrate DNA. A DNA preparation obtained from a chromatin immunoprecipitation assay using pulldown of a specific form of histone H3 was successfully amplified using the modified LM-PCR, and the amplified products could be used as probes in a fluorescence in situ hybridization analysis.

  10. V(D)J recombination coding junction formation without DNA homology: processing of coding termini.

    PubMed Central

    Boubnov, N V; Wills, Z P; Weaver, D T

    1993-01-01

    Coding junction formation in V(D)J recombination generates diversity in the antigen recognition structures of immunoglobulin and T-cell receptor molecules by combining processes of deletion of terminal coding sequences and addition of nucleotides prior to joining. We have examined the role of coding end DNA composition in junction formation with plasmid substrates containing defined homopolymers flanking the recombination signal sequence elements. We found that coding junctions formed efficiently with or without terminal DNA homology. The extent of junctional deletion was conserved independent of coding ends with increased, partial, or no DNA homology. Interestingly, G/C homopolymer coding ends showed reduced deletion regardless of DNA homology. Therefore, DNA homology cannot be the primary determinant that stabilizes coding end structures for processing and joining. PMID:8413286

  11. GC content around splice sites affects splicing through pre-mRNA secondary structures

    PubMed Central

    2011-01-01

    Background: Alternative splicing increases protein diversity by generating multiple transcript isoforms from a single gene through different combinations of exons or through different selections of splice sites. It has been reported that RNA secondary structures are involved in alternative splicing. Here we perform a genomic study of RNA secondary structures around splice sites in humans (Homo sapiens), mice (Mus musculus), fruit flies (Drosophila melanogaster), and nematodes (Caenorhabditis elegans) to further investigate this phenomenon. Results: We observe that GC content around splice sites is closely associated with the splice site usage in multiple species. RNA secondary structure is the possible explanation, because the structural stability difference among alternative splice sites, constitutive splice sites, and skipped splice sites can be explained by the GC content difference. Alternative splice sites tend to be GC-enriched and exhibit more stable RNA secondary structures in all of the considered species. In humans and mice, splice sites of first exons and long exons tend to be GC-enriched and hence form more stable structures, indicating the special role of RNA secondary structures in promoter proximal splicing events and the splicing of long exons. In addition, GC-enriched exon-intron junctions tend to be overrepresented in tissue-specific alternative splice sites, indicating the functional consequence of the GC effect. Compared with regions far from splice sites and decoy splice sites, real splice sites are GC-enriched. We also found that the GC-content effect is much stronger than the nucleotide-order effect to form stable secondary structures. Conclusion: All of these results indicate that GC content is related to splice site usage and it may mediate the splicing process through RNA secondary structures. PMID:21281513

  12. GC content around splice sites affects splicing through pre-mRNA secondary structures.

    PubMed

    Zhang, Jing; Kuo, C C Jay; Chen, Liang

    2011-01-31

    Alternative splicing increases protein diversity by generating multiple transcript isoforms from a single gene through different combinations of exons or through different selections of splice sites. It has been reported that RNA secondary structures are involved in alternative splicing. Here we perform a genomic study of RNA secondary structures around splice sites in humans (Homo sapiens), mice (Mus musculus), fruit flies (Drosophila melanogaster), and nematodes (Caenorhabditis elegans) to further investigate this phenomenon. We observe that GC content around splice sites is closely associated with the splice site usage in multiple species. RNA secondary structure is the possible explanation, because the structural stability difference among alternative splice sites, constitutive splice sites, and skipped splice sites can be explained by the GC content difference. Alternative splice sites tend to be GC-enriched and exhibit more stable RNA secondary structures in all of the considered species. In humans and mice, splice sites of first exons and long exons tend to be GC-enriched and hence form more stable structures, indicating the special role of RNA secondary structures in promoter proximal splicing events and the splicing of long exons. In addition, GC-enriched exon-intron junctions tend to be overrepresented in tissue-specific alternative splice sites, indicating the functional consequence of the GC effect. Compared with regions far from splice sites and decoy splice sites, real splice sites are GC-enriched. We also found that the GC-content effect is much stronger than the nucleotide-order effect to form stable secondary structures. All of these results indicate that GC content is related to splice site usage and it may mediate the splicing process through RNA secondary structures.

  13. Crossing Disciplinary Lines--Bar Codes and DNA Codes.

    ERIC Educational Resources Information Center

    Liao, Thomas T.

    1997-01-01

    Discusses strategies that enable students to learn ideas and concepts in the context of how modern communication technology is designed and operates. Describes a course that integrates the study of math, science, and technology into topics that are engaging to students. Presents an activity that introduces students to digital coding and compares…

  14. DNA codes and information: formal structures and relational causes.

    PubMed

    Sternberg, Richard V

    2008-09-01

    Recently the terms "codes" and "information" as used in the context of molecular biology have been the subject of much discussion. Here I propose that a variety of structural realism can assist us in rethinking the concepts of DNA codes and information apart from semantic criteria. Using the genetic code as a theoretical backdrop, a necessary distinction is made between codes qua symbolic representations and information qua structure that accords with data. Structural attractors are also shown to be entailed by the mapping relation that any DNA code is a part of (as the domain). In this framework, these attractors are higher-order informational structures that obviate any "DNA-centric" reductionism. In addition to the implications that are discussed, this approach validates the array of coding systems now recognized in molecular biology.

  15. Alu elements in primates are preferentially lost from areas of high GC content

    PubMed Central

    Brookfield, John FY

    2013-01-01

    The currently-accepted dogma when analysing human Alu transposable elements is that ‘young’ Alu elements are found in low GC regions and ‘old’ Alus in high GC regions. The correlation between high GC regions and high gene frequency regions make this observation particularly difficult to explain. Although a number of studies have tackled the problem, no analysis has definitively explained the reason for this trend. These observations have been made by relying on the subfamily as a proxy for age of an element. In this study, we suggest that this is a misleading assumption and instead analyse the relationship between the taxonomic distribution of an individual element and its surrounding GC environment. An analysis of 103906 Alu elements across 6 human chromosomes was carried out, using the presence of orthologous Alu elements in other primate species as a proxy for age. We show that the previously-reported effect of GC content correlating with subfamily age is not reflected by the ages of the individual elements. Instead, elements are preferentially lost from areas of high GC content over time. The correlation between GC content and subfamily may be due to a change in insertion bias in the young subfamilies. The link between Alu subfamily age and GC region was made due to an over-simplification of the data and is incorrect. We suggest that use of subfamilies as a proxy for age is inappropriate and that the analysis of ortholog presence in other primate species provides a deeper insight into the data. PMID:23717800

  16. BioCode: Two biologically compatible Algorithms for embedding data in non-coding and coding regions of DNA

    PubMed Central

    2013-01-01

    Background: In recent times, the application of deoxyribonucleic acid (DNA) has diversified with the emergence of fields such as DNA computing and DNA data embedding. DNA data embedding, also known as DNA watermarking or DNA steganography, aims to develop robust algorithms for encoding non-genetic information in DNA. Inherently DNA is a digital medium whereby the nucleotide bases act as digital symbols, a fact which underpins all bioinformatics techniques, and which also makes trivial information encoding using DNA straightforward. However, the situation is more complex in methods which aim at embedding information in the genomes of living organisms. DNA is susceptible to mutations, which act as a noisy channel from the point of view of information encoded using DNA. This means that the DNA data embedding field is closely related to digital communications. Moreover it is a particularly unique digital communications area, because important biological constraints must be observed by all methods. Many DNA data embedding algorithms have been presented to date, all of which operate in one of two regions: non-coding DNA (ncDNA) or protein-coding DNA (pcDNA). Results: This paper proposes two novel DNA data embedding algorithms jointly called BioCode, which operate in ncDNA and pcDNA, respectively, and which comply fully with stricter biological restrictions. Existing methods comply with some elementary biological constraints, such as preserving protein translation in pcDNA. However there exist further biological restrictions which no DNA data embedding methods to date account for. Observing these constraints is key to increasing the biocompatibility and in turn, the robustness of information encoded in DNA. Conclusion: The algorithms encode information in near optimal ways from a coding point of view, as we demonstrate by means of theoretical and empirical (in silico) analyses. Also, they are shown to encode information in a robust way, such that mutations have isolated

  17. BioCode: two biologically compatible Algorithms for embedding data in non-coding and coding regions of DNA.

    PubMed

    Haughton, David; Balado, Félix

    2013-04-09

    In recent times, the application of deoxyribonucleic acid (DNA) has diversified with the emergence of fields such as DNA computing and DNA data embedding. DNA data embedding, also known as DNA watermarking or DNA steganography, aims to develop robust algorithms for encoding non-genetic information in DNA. Inherently DNA is a digital medium whereby the nucleotide bases act as digital symbols, a fact which underpins all bioinformatics techniques, and which also makes trivial information encoding using DNA straightforward. However, the situation is more complex in methods which aim at embedding information in the genomes of living organisms. DNA is susceptible to mutations, which act as a noisy channel from the point of view of information encoded using DNA. This means that the DNA data embedding field is closely related to digital communications. Moreover it is a particularly unique digital communications area, because important biological constraints must be observed by all methods. Many DNA data embedding algorithms have been presented to date, all of which operate in one of two regions: non-coding DNA (ncDNA) or protein-coding DNA (pcDNA). This paper proposes two novel DNA data embedding algorithms jointly called BioCode, which operate in ncDNA and pcDNA, respectively, and which comply fully with stricter biological restrictions. Existing methods comply with some elementary biological constraints, such as preserving protein translation in pcDNA. However there exist further biological restrictions which no DNA data embedding methods to date account for. Observing these constraints is key to increasing the biocompatibility and in turn, the robustness of information encoded in DNA. The algorithms encode information in near optimal ways from a coding point of view, as we demonstrate by means of theoretical and empirical (in silico) analyses. Also, they are shown to encode information in a robust way, such that mutations have isolated effects. Furthermore, the
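
    The abstract's remark that nucleotide bases act as digital symbols can be illustrated with the trivial encoding it alludes to: two bits per base. The sketch below is not the BioCode algorithm and observes none of the biological constraints the abstract discusses; the bit-to-base mapping (A=00, C=01, G=10, T=11) and the function names are assumptions made for illustration.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(dna: str) -> bytes:
    """Inverse of bytes_to_dna; expects a length that is a multiple of four."""
    if len(dna) % 4:
        raise ValueError("length must be a multiple of 4 bases")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)  # raises ValueError on non-ACGT
        out.append(byte)
    return bytes(out)


encoded = bytes_to_dna(b"Hi")
print(encoded)                # CAGACGGC
print(dna_to_bytes(encoded))  # b'Hi'
```

    A single base substitution in such a sequence corrupts two bits of the payload, which is one way to see why the abstract treats mutation as a noisy channel.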

  18. Chloroplast DNA codes for transfer RNA.

    PubMed Central

    McCrea, J M; Hershberger, C L

    1976-01-01

    Transfer RNA's were isolated from Euglena gracilis. Chloroplast cistrons for tRNA were quantitated by hybridizing tRNA to ct DNA. Species of tRNA hybridizing to ct DNA were partially purified by hybridization-chromatography. The tRNA's hybridizing to ct DNA and nuclear DNA appear to be different. Total cellular tRNA was hybridized to ct DNA to an equivalent of approximately 25 cistrons. The total cellular tRNA was also separated into 2 fractions by chromatography on dihydroxyboryl substituted amino ethyl cellulose. Fraction I hybridized to both nuclear and ct DNA. Hybridizations to ct DNA indicated approximately 18 cistrons. Fraction II-tRNA hybridized only to ct DNA, saturating at a level of approximately 7 cistrons. The tRNA from isolated chloroplasts hybridized to both chloroplast and nuclear DNA. The level of hybridization to ct DNA indicated approximately 18 cistrons. Fraction II-type tRNA could not be detected in the isolated chloroplasts. PMID:823529

  19. DNA Barcoding through Quaternary LDPC Codes

    PubMed Central

    Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

    2015-01-01

    For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10(-2) per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10(-9) at the expense of a rate of read losses just in the order of 10(-6). PMID:26492348

  20. DNA Barcoding through Quaternary LDPC Codes.

    PubMed

    Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

    2015-01-01

    For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10(-2) per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10(-9) at the expense of a rate of read losses just in the order of 10(-6).
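
    The trade-off between read losses and sample misidentification described in this abstract can be illustrated with the simplest form of error-tolerant demultiplexing: nearest-barcode assignment by Hamming distance. This sketch is not the LDPC or BCH decoding used in the paper; the toy barcodes, the mismatch threshold, and the function names are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_barcode: str, barcodes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode is uniquely closest to the read.

    Returns None (a read loss) if the closest barcode is too far away or if
    two barcodes tie, rather than risk a sample misidentification.
    """
    best_name, best_dist, tie = None, None, False
    for name, code in barcodes.items():
        dist = hamming(read_barcode, code)
        if best_dist is None or dist < best_dist:
            best_name, best_dist, tie = name, dist, False
        elif dist == best_dist:
            tie = True
    if best_dist is None or tie or best_dist > max_mismatches:
        return None
    return best_name


# Toy 6-nt barcodes that differ pairwise in all six positions.
toy_barcodes = {"s1": "AAAAAA", "s2": "CCCGGG", "s3": "TTTCCC"}

print(assign_sample("AAATAA", toy_barcodes))  # s1 (one mismatch corrected)
print(assign_sample("AACCGG", toy_barcodes))  # None (discarded as a read loss)
```

    Raising max_mismatches recovers more reads but increases the chance of assigning a read to the wrong sample, which is the balance the abstract quantifies for LDPC barcodes.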

  1. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-07-10

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.

  2. Ancient DNA sequence revealed by error-correcting codes

    PubMed Central

    Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  3. DNA barcode goes two-dimensions: DNA QR code web server.

    PubMed

    Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.

  4. Protection of the genome and central protein-coding sequences by non-coding DNA against DNA damage from radiation.

    PubMed

    Qiu, Guo-Hua

    2015-01-01

    Non-coding DNA comprises a very large proportion of the total genomic content in higher organisms, but its function remains largely unclear. Non-coding DNA sequences constitute the majority of peripheral heterochromatin, which has been hypothesized to be the genome's 'bodyguard' against DNA damage from chemicals and radiation for almost four decades. The bodyguard protective function of peripheral heterochromatin in genome defense has been strengthened by the results from numerous recent studies, which are summarized in this review. These data have suggested that cells and/or organisms with a higher level of heterochromatin and more non-coding DNA sequences, including longer telomeric DNA and rDNAs, exhibit a lower frequency of DNA damage, higher radioresistance and longer lifespan after IR exposure. In addition, the majority of heterochromatin is peripherally located in the three-dimensional structure of genome organization. Therefore, the peripheral heterochromatin with non-coding DNA could play a protective role in genome defense against DNA damage from ionizing radiation by both absorbing the radicals from water radiolysis in the cytosol and reducing the energy of IR. However, the bodyguard protection by heterochromatin has been challenged by the observation that DNA damage is less frequently detected in peripheral heterochromatin than in euchromatin, which is inconsistent with the expectation and simulation results. Previous studies have also shown that the DNA damage in peripheral heterochromatin is rarely repaired and moves more quickly, broadly and outwardly to approach the nuclear pore complex (NPC). Additionally, it has been shown that extrachromosomal circular DNAs (eccDNAs) are formed in the nucleus, highly detectable in the cytoplasm (particularly under stress conditions) and shuttle between the nucleus and the cytoplasm. Based on these studies, this review speculates that the sites of DNA damage in peripheral heterochromatin could occur more

  5. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  7. Comparative Genomics of Burkholderia singularis sp. nov., a Low G+C Content, Free-Living Bacterium That Defies Taxonomic Dissection of the Genus Burkholderia

    PubMed Central

    Vandamme, Peter; Peeters, Charlotte; De Smet, Birgit; Price, Erin P.; Sarovich, Derek S.; Henry, Deborah A.; Hird, Trevor J.; Zlosnik, James E. A.; Mayo, Mark; Warner, Jeffrey; Baker, Anthony; Currie, Bart J.; Carlier, Aurélien

    2017-01-01

    Four Burkholderia pseudomallei-like isolates of human clinical origin were examined by a polyphasic taxonomic approach that included comparative whole genome analyses. The results demonstrated that these isolates represent a rare and unusual, novel Burkholderia species for which we propose the name B. singularis. The type strain is LMG 28154T (=CCUG 65685T). Its genome sequence has an average mol% G+C content of 64.34%, which is considerably lower than that of other Burkholderia species. The reduced G+C content of strain LMG 28154T was characterized by a genome wide AT bias that was not due to reduced GC-biased gene conversion or reductive genome evolution, but might have been caused by an altered DNA base excision repair pathway. B. singularis can be differentiated from other Burkholderia species by multilocus sequence analysis, MALDI-TOF mass spectrometry and a distinctive biochemical profile that includes the absence of nitrate reduction, a mucoid appearance on Columbia sheep blood agar, and a slowly positive oxidase reaction. Comparisons with publicly available whole genome sequences demonstrated that strain TSV85, an Australian water isolate, also represents the same species and therefore, to date, B. singularis has been recovered from human or environmental samples on three continents. PMID:28932212

  8. Comparative Genomics of Burkholderia singularis sp. nov., a Low G+C Content, Free-Living Bacterium That Defies Taxonomic Dissection of the Genus Burkholderia.

    PubMed

    Vandamme, Peter; Peeters, Charlotte; De Smet, Birgit; Price, Erin P; Sarovich, Derek S; Henry, Deborah A; Hird, Trevor J; Zlosnik, James E A; Mayo, Mark; Warner, Jeffrey; Baker, Anthony; Currie, Bart J; Carlier, Aurélien

    2017-01-01

    Four Burkholderia pseudomallei-like isolates of human clinical origin were examined by a polyphasic taxonomic approach that included comparative whole genome analyses. The results demonstrated that these isolates represent a rare and unusual, novel Burkholderia species for which we propose the name B. singularis. The type strain is LMG 28154(T) (=CCUG 65685(T)). Its genome sequence has an average mol% G+C content of 64.34%, which is considerably lower than that of other Burkholderia species. The reduced G+C content of strain LMG 28154(T) was characterized by a genome wide AT bias that was not due to reduced GC-biased gene conversion or reductive genome evolution, but might have been caused by an altered DNA base excision repair pathway. B. singularis can be differentiated from other Burkholderia species by multilocus sequence analysis, MALDI-TOF mass spectrometry and a distinctive biochemical profile that includes the absence of nitrate reduction, a mucoid appearance on Columbia sheep blood agar, and a slowly positive oxidase reaction. Comparisons with publicly available whole genome sequences demonstrated that strain TSV85, an Australian water isolate, also represents the same species and therefore, to date, B. singularis has been recovered from human or environmental samples on three continents.

  9. The correlation coefficient of GC content of the genome-wide genes is positively correlated with animal evolutionary relationships.

    PubMed

    Du, Hongli; Hu, Haofu; Meng, Yuhuan; Zheng, Weihao; Ling, Fei; Wang, Jufang; Zhang, Xiquan; Nie, Qinghua; Wang, Xiaoning

    2010-09-24

    In this study, we present a new method for evaluating animal evolutionary relationships. We used the GC% levels of genome-wide genes to determine the correlation between the GC% content and evolutionary relationship. The correlation coefficients of the GC% content of the orthologous genes of the paired animal species were calculated for a total of 21 species, and the evolutionary branching dates of these 21 species were derived from fossil records. The correlation coefficient of the GC% content of the orthologous genes of the species pair under study served as an indicator of their evolutionary relationship. Moreover, there was a decreasing linear relationship between the correlation coefficient and evolutionary branching date (R² = 0.930).
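
    The per-pair statistic described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `gc_percent` and `pearson_r` and the toy sequences are assumptions made here, and the sketch assumes the orthologous genes have already been paired up in matching order.

```python
from math import sqrt


def gc_percent(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: gene i of species A is the ortholog of gene i of species B.
species_a = ["ATGCGC", "ATATAT", "GGGCCC", "ATGCAT"]
species_b = ["ATGCGG", "ATATAA", "GGGCCG", "ATGAAT"]

gc_a = [gc_percent(s) for s in species_a]
gc_b = [gc_percent(s) for s in species_b]
r = pearson_r(gc_a, gc_b)
print(f"correlation of GC% across orthologs: {r:.3f}")
```

    In the study's terms, one such coefficient would be computed per species pair and then compared against the pair's branching date; that regression step is not shown here.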

  10. Nonextensive statistical approach to non-coding human DNA

    NASA Astrophysics Data System (ADS)

    Oikonomou, Th.; Provata, A.; Tirnakli, U.

    2008-04-01

    We use q-exponential distributions, which maximize the nonextensive entropy $S_q$ (defined as $S_q \equiv (1-\sum_i p_i^q)/(q-1)$), to study the size distributions of non-coding DNA (including introns and intergenic regions) in all human chromosomes. We show that the value of the exponent q describing the non-coding size distributions is similar for all chromosomes and lies in the range 2≤q≤2.3, with the exception of chromosomes X and Y.
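
    For orientation, the standard textbook forms of the entropy and of the q-exponential are written out below. The abstract does not state the exact fitted expression, so the scale parameter $\beta$ and the size variable $x$ are assumptions introduced here for illustration.

```latex
% Nonextensive (Tsallis) entropy, as defined in the abstract:
S_q \equiv \frac{1 - \sum_i p_i^{\,q}}{q - 1},
\qquad
\lim_{q \to 1} S_q = -\sum_i p_i \ln p_i .

% Standard q-exponential function:
e_q(x) \equiv \bigl[\, 1 + (1 - q)\, x \,\bigr]_{+}^{1/(1-q)},
\qquad
\lim_{q \to 1} e_q(x) = e^{x} .

% Assumed form of a q-exponential size distribution (beta > 0 is a scale parameter):
P(x) \propto e_q(-\beta x) = \bigl[\, 1 + (q - 1)\, \beta x \,\bigr]^{-1/(q-1)},
\qquad q > 1 .
```

    Under this assumed form, large sizes follow a power-law tail $P(x) \sim x^{-1/(q-1)}$, so the reported range 2≤q≤2.3 would correspond to tail exponents between roughly 0.77 and 1.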

  11. Genes Translocated into the Plastid Inverted Repeat Show Decelerated Substitution Rates and Elevated GC Content

    PubMed Central

    Li, Fay-Wei; Kuo, Li-Yaung; Pryer, Kathleen M.; Rothfels, Carl J.

    2016-01-01

    Plant chloroplast genomes (plastomes) are characterized by an inverted repeat (IR) region and two larger single copy (SC) regions. Patterns of molecular evolution in the IR and SC regions differ, most notably by a reduced rate of nucleotide substitution in the IR compared to the SC region. In addition, the organization and structure of plastomes is fluid, and rearrangements through time have repeatedly shuffled genes into and out of the IR, providing recurrent natural experiments on how chloroplast genome structure can impact rates and patterns of molecular evolution. Here we examine four loci (psbA, ycf2, rps7, and rps12 exon 2–3) that were translocated from the SC into the IR during fern evolution. We use a model-based method, within a phylogenetic context, to test for substitution rate shifts. All four loci show a significant, 2- to 3-fold deceleration in their substitution rate following translocation into the IR, a phenomenon not observed in any other, nontranslocated plastid genes. Also, we show that after translocation, the GC content of the third codon position and of the noncoding regions is significantly increased, implying that gene conversion within the IR is GC-biased. Taken together, our results suggest that the IR region not only reduces substitution rates, but also impacts nucleotide composition. This finding highlights a potential vulnerability of correlating substitution rate heterogeneity with organismal life history traits without knowledge of the underlying genome structure. PMID:27401175

  12. Analysis of Ribosome-Associated mRNAs in Rice Reveals the Importance of Transcript Size and GC Content in Translation

    PubMed Central

    Zhao, Dongyan; Hamilton, John P.; Hardigan, Michael; Yin, Dongmei; He, Tao; Vaillancourt, Brieanne; Reynoso, Mauricio; Pauluzzi, Germain; Funkhouser, Scott; Cui, Yuehua; Bailey-Serres, Julia; Jiang, Jiming; Buell, C. Robin; Jiang, Ning

    2016-01-01

    Gene expression is controlled at transcriptional and post-transcriptional levels including decoding of messenger RNA (mRNA) into polypeptides via ribosome-mediated translation. Translational regulation has been intensively studied in the model dicot plant Arabidopsis thaliana, and in this study, we assessed the translational status [proportion of steady-state mRNA associated with ribosomes] of mRNAs by Translating Ribosome Affinity Purification followed by mRNA-sequencing (TRAP-seq) in rice (Oryza sativa), a model monocot plant and the most important food crop. A survey of three tissues found that most transcribed rice genes are translated whereas few transposable elements are associated with ribosomes. Genes with short and GC-rich coding regions are overrepresented in ribosome-associated mRNAs, suggesting that the GC-richness characteristic of coding sequences in grasses may be an adaptation that favors efficient translation. Transcripts with retained introns and extended 5′ untranslated regions are underrepresented on ribosomes, and rice genes belonging to different evolutionary lineages exhibited differential enrichment on the ribosomes that was associated with GC content. Genes involved in photosynthesis and stress responses are preferentially associated with ribosomes, whereas genes in epigenetic regulation pathways are the least enriched on ribosomes. Such variation is more dramatic in rice than that in Arabidopsis and is correlated with the wide variation of GC content of transcripts in rice. Taken together, variation in the translation status of individual transcripts reflects important mechanisms of gene regulation, which may have a role in evolution and diversification. PMID:27852012

  13. Contrasting GC-content dynamics across 33 mammalian genomes: Relationship with life-history traits and chromosome sizes

    PubMed Central

    Romiguier, Jonathan; Ranwez, Vincent; Douzery, Emmanuel J.P.; Galtier, Nicolas

    2010-01-01

    The origin, evolution, and functional relevance of genomic variations in GC content are a long-debated topic, especially in mammals. Most of the existing literature, however, has focused on a small number of model species and/or limited sequence data sets. We analyzed more than 1000 orthologous genes in 33 fully sequenced mammalian genomes, reconstructed their ancestral isochore organization in the maximum likelihood framework, and explored the evolution of third-codon position GC content in representatives of 16 orders and 27 families. We showed that the previously reported erosion of GC-rich isochores is not a general trend. Several species (e.g., shrew, microbat, tenrec, rabbit) have independently undergone a marked increase in GC content, with a widening gap between the GC-poorest and GC-richest classes of genes. The intensively studied apes and (especially) murids do not reflect the general placental pattern. We correlated GC-content evolution with species life-history traits and cytology. Significant effects of body mass and genome size were detected, with each being consistent with the GC-biased gene conversion model. PMID:20530252
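
    Third-codon-position GC content (often written GC3) is straightforward to compute from an in-frame coding sequence. The sketch below is only an illustration; the function name `gc3` is chosen here, and it assumes the input starts at codon position 1 and has a length that is a multiple of three.

```python
def gc3(cds: str) -> float:
    """Fraction of G/C among third codon positions of an in-frame coding sequence."""
    s = cds.upper()
    if len(s) == 0 or len(s) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    thirds = s[2::3]  # every third base, starting from codon position 3
    counted = [b for b in thirds if b in "ACGT"]  # skip ambiguous symbols such as N
    if not counted:
        raise ValueError("no unambiguous bases at third codon positions")
    return sum(b in "GC" for b in counted) / len(counted)


# Codons ATG, GCC, AAA have third bases G, C, A.
print(gc3("ATGGCCAAA"))
```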

  14. Preliminary analysis of length and GC content variation in the ribosomal first internal transcribed spacer (ITS1) of marine animals.

    PubMed

    Chow, S; Ueno, Y; Toyokawa, M; Oohara, I; Takeyama, H

    2009-01-01

    Length and guanine-cytosine (GC) content of the ribosomal first internal transcribed spacer (ITS1) were compared across a wide variety of marine animal species, and its phylogenetic utility was investigated. From a total of 773 individuals representing 599 species, we only failed to amplify the ITS1 sequence from 87 individuals by polymerase chain reaction with universal ITS1 primers. No species was found to have an ITS1 region shorter than 100 bp. In general, the ITS1 sequences of vertebrates were longer (318 to 2,318 bp) and richer in GC content (56.8% to 78%) than those of invertebrates (117 to 1,613 bp and 35.8% to 71.3%, respectively). Specifically, gelatinous animals (Cnidaria and Ctenophora) were observed to have short ITS1 sequences (118 to 422 bp) with lower GC content (35.8% to 61.7%) than the other animal taxa. Mollusca and Crustacea were diverse groups with respect to ITS1 length, ranging from 108 to 1,118 and 182 to 1,613 bp, respectively. No universal relationship between length and GC content was observed. Our data indicated that ITS1 has a limited utility for phylogenetic analysis as obtaining confident sequence alignment was often impossible between different genera of the same family and even between congeneric species.

  15. Contrasting GC-content dynamics across 33 mammalian genomes: relationship with life-history traits and chromosome sizes.

    PubMed

    Romiguier, Jonathan; Ranwez, Vincent; Douzery, Emmanuel J P; Galtier, Nicolas

    2010-08-01

    The origin, evolution, and functional relevance of genomic variations in GC content are a long-debated topic, especially in mammals. Most of the existing literature, however, has focused on a small number of model species and/or limited sequence data sets. We analyzed more than 1000 orthologous genes in 33 fully sequenced mammalian genomes, reconstructed their ancestral isochore organization in the maximum likelihood framework, and explored the evolution of third-codon position GC content in representatives of 16 orders and 27 families. We showed that the previously reported erosion of GC-rich isochores is not a general trend. Several species (e.g., shrew, microbat, tenrec, rabbit) have independently undergone a marked increase in GC content, with a widening gap between the GC-poorest and GC-richest classes of genes. The intensively studied apes and (especially) murids do not reflect the general placental pattern. We correlated GC-content evolution with species life-history traits and cytology. Significant effects of body mass and genome size were detected, with each being consistent with the GC-biased gene conversion model.

  16. DNA as a Binary Code: How the Physical Structure of Nucleotide Bases Carries Information

    ERIC Educational Resources Information Center

    McCallister, Gary

    2005-01-01

    The DNA triplet code also functions as a binary code. Because double-ring compounds cannot bind to double-ring compounds in the DNA code, the sequence of bases classified simply as purines or pyrimidines can encode for smaller groups of possible amino acids. This is an intuitive approach to teaching the DNA code. (Contains 6 figures.)
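
    The purine/pyrimidine view described above can be illustrated with a short sketch. The choice of 1 for purines (double-ring A, G) and 0 for pyrimidines (single-ring C, T) is an arbitrary assumption made here, as are the names `to_binary` and `codon_classes`; the article's own notation may differ.

```python
from itertools import product

PURINES = frozenset("AG")       # double-ring bases
PYRIMIDINES = frozenset("CTU")  # single-ring bases (U included for RNA input)


def to_binary(seq: str) -> str:
    """Rewrite a nucleotide sequence as a string of 1 (purine) and 0 (pyrimidine)."""
    bits = []
    for base in seq.upper():
        if base in PURINES:
            bits.append("1")
        elif base in PYRIMIDINES:
            bits.append("0")
        else:
            raise ValueError(f"unexpected symbol: {base!r}")
    return "".join(bits)


def codon_classes() -> dict:
    """Group the 64 DNA codons by their three-bit purine/pyrimidine pattern."""
    classes = {}
    for codon in ("".join(p) for p in product("ACGT", repeat=3)):
        classes.setdefault(to_binary(codon), []).append(codon)
    return classes


print(to_binary("ATGC"))
for pattern, codons in sorted(codon_classes().items()):
    print(pattern, codons)
```

    Each three-bit pattern covers eight of the 64 codons, which is the sense in which the binary reading narrows a codon down to a smaller group of possible amino acids.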

  18. Free Energy Gap and Statistical Thermodynamic Fidelity of DNA Codes

    DTIC Science & Technology

    2007-10-01

    reverse-complement unless otherwise stated. For strand x, let Nx denote its complement. A (perfect) Watson-Crick duplex is the joining of complement...is possible for complementary sequences to form a non-perfectly aligned duplex, we will call any x W Nx duplex a Watson-Crick (WC) duplex. Two...DATES COVERED (From - To) 4. TITLE AND SUBTITLE FREE ENERGY GAP AND STATISTICAL THERMODYNAMIC FIDELITY OF DNA CODES 5a. CONTRACT NUMBER FA8750-07

  19. Structural Code for DNA Recognition Revealed in Crystal Structures of Papillomavirus E2-DNA Targets

    NASA Astrophysics Data System (ADS)

    Rozenberg, Haim; Rabinovich, Dov; Frolow, Felix; Hegde, Rashmi S.; Shakked, Zippora

    1998-12-01

    Transcriptional regulation in papillomaviruses depends on sequence-specific binding of the regulatory protein E2 to several sites in the viral genome. Crystal structures of bovine papillomavirus E2 DNA targets reveal a conformational variant of B-DNA characterized by a roll-induced writhe and helical repeat of 10.5 bp per turn. A comparison between the free and the protein-bound DNA demonstrates that the intrinsic structure of the DNA regions contacted directly by the protein and the deformability of the DNA region that is not contacted by the protein are critical for sequence-specific protein/DNA recognition and hence for gene-regulatory signals in the viral system. We show that the selection of dinucleotide or longer segments with appropriate conformational characteristics, when positioned at correct intervals along the DNA helix, can constitute a structural code for DNA recognition by regulatory proteins. This structural code facilitates the formation of a complementary protein-DNA interface that can be further specified by hydrogen bonds and nonpolar interactions between the protein amino acids and the DNA bases.

  20. Superimposed Code Theoretic Analysis of Deoxyribonucleic Acid (DNA) Codes and DNA Computing

    DTIC Science & Technology

    2010-01-01

    hybridization that occurs between a DNA strand and its Watson-Crick complement can be used to perform mathematical computation. This research addresses how the...are 5′→3′ and strands with strikethrough are 3′→5′. A dsDNA duplex formed between a strand and its reverse complement is called a Watson-Crick (WC... [Figure residue: Watson-Crick (WC) duplexes, 5′-TACGCGACTTTC-3′ paired with 5′-GAAAGTCGCGTA-3′, and ATCAAACGATGC paired with GCATCGTTTGAT]
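
    The reverse-complement relation behind a Watson-Crick duplex can be checked with a few lines of code. This is a generic sketch rather than anything taken from the report; the names `reverse_complement` and `is_wc_pair` are chosen here, and the test strands are the ones that appear in the snippet above.

```python
# Translation table for the Watson-Crick base pairs A-T and C-G.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the reverse complement of a 5'->3' DNA strand, also read 5'->3'."""
    s = strand.upper()
    if set(s) - set("ACGT"):
        raise ValueError("strand must contain only A, C, G, T")
    return s.translate(_COMPLEMENT)[::-1]


def is_wc_pair(x: str, y: str) -> bool:
    """True if y is the reverse complement of x, i.e. the two can form a perfect WC duplex."""
    return reverse_complement(x) == y.upper()


print(reverse_complement("TACGCGACTTTC"))
print(is_wc_pair("ATCAAACGATGC", "GCATCGTTTGAT"))
```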

  1. Extra-coding RNAs regulate neuronal DNA methylation dynamics

    PubMed Central

    Savell, Katherine E.; Gallus, Nancy V. N.; Simon, Rhiana C.; Brown, Jordan A.; Revanna, Jasmin S.; Osborn, Mary Katherine; Song, Esther Y.; O'Malley, John J.; Stackhouse, Christian T.; Norvil, Allison; Gowher, Humaira; Sweatt, J. David; Day, Jeremy J.

    2016-01-01

    Epigenetic mechanisms such as DNA methylation are essential regulators of the function and information storage capacity of neurons. DNA methylation is highly dynamic in the developing and adult brain, and is actively regulated by neuronal activity and behavioural experiences. However, it is presently unclear how methylation status at individual genes is targeted for modification. Here, we report that extra-coding RNAs (ecRNAs) interact with DNA methyltransferases and regulate neuronal DNA methylation. Expression of ecRNA species is associated with gene promoter hypomethylation, is altered by neuronal activity, and is overrepresented at genes involved in neuronal function. Knockdown of the Fos ecRNA locus results in gene hypermethylation and mRNA silencing, and hippocampal expression of Fos ecRNA is required for long-term fear memory formation in rats. These results suggest that ecRNAs are fundamental regulators of DNA methylation patterns in neuronal systems, and reveal a promising avenue for therapeutic targeting in neuropsychiatric disease states. PMID:27384705

  2. Hiding message into DNA sequence through DNA coding and chaotic maps.

    PubMed

    Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

    2014-09-01

    The paper proposes an improved reversible substitution method for hiding data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance robustness and enlarge the hiding capacity: encoding the secret message by DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than the competing methods with respect to robustness and capacity.
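
    Only the first of the four measures, DNA coding, is sketched below, because the abstract does not give the details of the others. The two-bits-per-base table (00→A, 01→C, 10→G, 11→T) is a common convention assumed here, not necessarily the paper's rule, and the function names are illustrative.

```python
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T
INDEX = {b: i for i, b in enumerate(BASES)}


def encode_dna(message: bytes) -> str:
    """Encode bytes as DNA, four bases per byte, most significant bit pair first."""
    out = []
    for byte in message:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode_dna(seq: str) -> bytes:
    """Invert encode_dna; the sequence length must be a multiple of four."""
    s = seq.upper()
    if len(s) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(s), 4):
        byte = 0
        for base in s[i:i + 4]:
            byte = (byte << 2) | INDEX[base]  # KeyError on non-ACGT symbols
        out.append(byte)
    return bytes(out)


encoded = encode_dna(b"hi")
print(encoded, decode_dna(encoded))
```

    Encryption, chaotic-map location selection, and embedding by the complementary rule would operate on the output of this step and are deliberately left out.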

  3. An efficient and rapid method for cDNA cloning from difficult templates using codon optimization and SOE-PCR: with human RANK and TIMP2 gene as examples.

    PubMed

    Huang, Gang; Wen, Qianjun; Gao, Qiangguo; Zhang, Fang; Bai, Yun

    2011-10-01

    As gene cloning from difficult templates with regionalized high GC content is a long recognized problem, we have developed a novel and reliable method to clone such genes. Firstly, the high GC content region of the target cDNA was synthesized directly after codon optimization and the remaining cDNA fragment without high GC content was generated by routine RT-PCR. Then the entire redesigned coding sequence of the target gene was obtained by fusing the two cDNA fragments with SOE-PCR (splicing by overlapping extension-PCR). We have cloned the human RANK gene (ten exons; CDS 1851 bp) using this strategy. The redesigned cDNA was transfected into a eukaryotic expression system (A459 cells) to verify its expression, which was confirmed by RT-PCR and western blotting. To validate our method, we also successfully cloned the human TIMP2 gene (five exons; CDS 660 bp), which likewise has a regionalized high GC content. Our strategy for combining codon optimization and SOE-PCR to clone difficult genes is thus feasible and potentially universally applicable.

  4. Multifractal detrended cross-correlation analysis of coding and non-coding DNA sequences through chaos-game representation

    NASA Astrophysics Data System (ADS)

    Pal, Mayukha; Satish, B.; Srinivas, K.; Rao, P. Madhusudana; Manimaran, P.

    2015-10-01

    We propose a new approach combining the chaos game representation and the two dimensional multifractal detrended cross correlation analysis methods to examine multifractal behavior in power law cross correlation between any pair of nucleotide sequences of unequal lengths. In this work, we analyzed the characteristic behavior of coding and non-coding DNA sequences of eight prokaryotes. The results show the presence of strong multifractal nature between coding and non-coding sequences of all data sets. We found that this integrative approach helps us to consider complete DNA sequences for characterization, and further it may be useful for classification, clustering, identification of class affiliation of nucleotide sequences etc. with high precision.

  5. Imperfect DNA mirror repeats in E. coli TnsA and other protein-coding DNA.

    PubMed

    Lang, Dorothy M

    2005-09-01

    DNA imperfect mirror repeats (DNA-IMRs) are ubiquitous in protein-coding DNA. However, they overlap and often have different centers of symmetry, making it difficult to evaluate their relationship to each other and to specific DNA and protein motifs and structures. This paper describes a systematic method of determining a hierarchy for DNA-IMRs and evaluates their relationship to protein structural elements (PSEs)--helices, turns and beta-sheets. DNA-IMRs are identified by two different methods--DNA-IMRs terminated by reverse dinucleotides (rd-IMRs) and DNA-IMRs terminated by a single (mono) matching nucleotide (m-IMRs). Both rd-IMRs and m-IMRs are evaluated in 17 proteins, and illustrated in detail for TnsA. For each of the proteins, Fisher's exact test (FET) is used to measure the coincidence between the terminal dinucleotides of rd-IMRs and the terminal amino acids of individual PSEs. A significant correlation over a span of about 3 nt was found for each protein. The correlation is robust and for most genes, all rd-IMRs16 nt contain approximately 88% of the potential functional motifs. The protein translation of the longest rd- and m-IMRs span sequences important to the protein's structure and function. In all 17 proteins studied, the population of rd-IMRs is substantially less than the expected number and the population of m-IMRs greater than the expected number, indicating strong selective pressures. The association of rd-IMRs with PSEs restricts their spatial distribution, and therefore, their number. The greater than predicted number of m-IMRs indicates that DNA symmetry exists throughout the entire protein-coding region and may stabilize the sequence.

  6. Context-dependent mutation rates may cause spurious signatures of a fixation bias favoring higher GC-content in humans.

    PubMed

    Hernandez, Ryan D; Williamson, Scott H; Zhu, Lan; Bustamante, Carlos D

    2007-10-01

    Understanding the proximate and ultimate causes underlying the evolution of nucleotide composition in mammalian genomes is of fundamental interest to the study of molecular evolution. Comparative genomics studies have revealed that many more substitutions occur from G and C nucleotides to A and T nucleotides than the reverse, suggesting that mammalian genomes are not at equilibrium for base composition. Analysis of human polymorphism data suggests that mutations that increase GC-content tend to be at much higher frequencies than those that decrease or preserve GC-content when the ancestral allele is inferred via parsimony using the chimpanzee genome. These observations have been interpreted as evidence for a fixation bias in favor of G and C alleles due to either positive natural selection or biased gene conversion. Here, we test the robustness of this interpretation to violations of the parsimony assumption using a data set of 21,488 noncoding single nucleotide polymorphisms (SNPs) discovered by the National Institute of Environmental Health Sciences (NIEHS) SNPs project via direct resequencing of n = 95 individuals. Applying standard nonparametric and parametric population genetic approaches, we replicate the signatures of a fixation bias in favor of G and C alleles when the ancestral base is assumed to be the base found in the chimpanzee outgroup. However, upon taking into account the probability of misidentifying the ancestral state of each SNP using a context-dependent mutation model, the corrected distribution of SNP frequencies for GC-content increasing SNPs are nearly indistinguishable from the patterns observed for other types of mutations, suggesting that the signature of fixation bias is a spurious artifact of the parsimony assumption.

  7. Coding DNA repeated throughout intergenic regions of the Arabidopsis thaliana genome: Evolutionary footprints of RNA silencing

    USDA-ARS's Scientific Manuscript database

    Pyknons are non-random sequence patterns significantly repeated throughout non-coding genomic DNA that also appear at least once among genes. They are interesting because they portend an unforeseen connection between coding and non-coding DNA. Pyknons have only been discovered in the human genome,...

  8. Evolutionary analysis of DNA-protein-coding regions based on a genetic code cube metric.

    PubMed

    Sanchez, Robersy

    2014-01-01

    The right estimation of the evolutionary distance between DNA or protein sequences is the cornerstone of the current phylogenetic analysis based on distance methods. Herein, it is demonstrated that the Manhattan distance (dw), weighted by the evolutionary importance of the nucleotide bases in the codon, is a naturally derived metric in the standard genetic code cube inserted into the three-dimensional Euclidean space. Based on the application of distance dw, a novel evolutionary model is proposed. This model includes insertion/deletion mutations that are very important for cancer studies, but usually discarded in classical evolutionary models. In this study, the new evolutionary model was applied to the phylogenetic analysis of the DNA protein-coding regions of 13 mammal mitochondrial genomes and of four cancer genetic-susceptibility genes (ATM, BRCA1, BRCA2 and p53) from nine mammals. The opossum (a marsupial) was used as an out-group species for both sets of sequences. The new evolutionary model yielded the correct topology, while the current models failed to separate the evolutionarily distant species of mouse and opossum.

  9. Length and GC content variability of introns among teleostean genomes in the light of the metabolic rate hypothesis.

    PubMed

    Chaurasia, Ankita; Tarallo, Andrea; Bernà, Luisa; Yagi, Mitsuharu; Agnisola, Claudio; D'Onofrio, Giuseppe

    2014-01-01

    A comparative analysis of five teleostean genomes, namely zebrafish, medaka, three-spine stickleback, fugu and pufferfish was performed with the aim to highlight the nature of the forces driving both length and base composition of introns (i.e., bpi and GCi). An inter-genome approach using orthologous intronic sequences was carried out, analyzing independently both variables in pairwise comparisons. An average length shortening of introns was observed at increasing average GCi values. The result was not affected by masking transposable and repetitive elements harbored in the intronic sequences. The routine metabolic rate (mass specific temperature-corrected using the Boltzmann's factor) was measured for each species. A significant correlation held between average differences of metabolic rate, length and GC content, while environmental temperature of fish habitat was not correlated with bpi and GCi. Analyzing the concomitant effect of both variables, i.e., bpi and GCi, at increasing genomic GC content, a decrease of bpi and an increase of GCi was observed for the significant majority of the intronic sequences (from ∼ 40% to ∼ 90%, in each pairwise comparison). The opposite event, concomitant increase of bpi and decrease of GCi, was counter selected (from <1% to ∼ 10%, in each pairwise comparison). The results further support the hypothesis that the metabolic rate plays a key role in shaping genome architecture and evolution of vertebrate genomes.

  10. Length and GC Content Variability of Introns among Teleostean Genomes in the Light of the Metabolic Rate Hypothesis

    PubMed Central

    Chaurasia, Ankita; Tarallo, Andrea; Bernà, Luisa; Yagi, Mitsuharu; Agnisola, Claudio; D’Onofrio, Giuseppe

    2014-01-01

    A comparative analysis of five teleostean genomes, namely zebrafish, medaka, three-spine stickleback, fugu and pufferfish was performed with the aim to highlight the nature of the forces driving both length and base composition of introns (i.e., bpi and GCi). An inter-genome approach using orthologous intronic sequences was carried out, analyzing independently both variables in pairwise comparisons. An average length shortening of introns was observed at increasing average GCi values. The result was not affected by masking transposable and repetitive elements harbored in the intronic sequences. The routine metabolic rate (mass specific temperature-corrected using the Boltzmann's factor) was measured for each species. A significant correlation held between average differences of metabolic rate, length and GC content, while environmental temperature of fish habitat was not correlated with bpi and GCi. Analyzing the concomitant effect of both variables, i.e., bpi and GCi, at increasing genomic GC content, a decrease of bpi and an increase of GCi was observed for the significant majority of the intronic sequences (from ∼40% to ∼90%, in each pairwise comparison). The opposite event, concomitant increase of bpi and decrease of GCi, was counter selected (from <1% to ∼10%, in each pairwise comparison). The results further support the hypothesis that the metabolic rate plays a key role in shaping genome architecture and evolution of vertebrate genomes. PMID:25093416

  11. Non-extensive trends in the size distribution of coding and non-coding DNA sequences in the human genome

    NASA Astrophysics Data System (ADS)

    Oikonomou, Th.; Provata, A.

    2006-03-01

    We study the primary DNA structure of four of the most completely sequenced human chromosomes (including chromosome 19 which is the most dense in coding), using non-extensive statistics. We show that the exponents governing the spatial decay of the coding size distributions vary between 5.2 ≤ r ≤ 5.7 for the short scales and 1.45 ≤ q ≤ 1.50 for the large scales. On the contrary, the exponents governing the spatial decay of the non-coding size distributions in these four chromosomes take the values 2.4 ≤ r ≤ 3.2 for the short scales and 1.50 ≤ q ≤ 1.72 for the large scales. These results, in particular the values of the tail exponent q, indicate the existence of correlations in the coding and non-coding size distributions with tendency for higher correlations in the non-coding DNA.

  12. In search of coding and non-coding regions of DNA sequences based on balanced estimation of diffusion entropy.

    PubMed

    Zhang, Jin; Zhang, Wenqing; Yang, Huijie

    2016-01-01

    Identification of coding regions in DNA sequences remains challenging. Various methods have been proposed, but these are limited by species-dependence and the need for adequate training sets. The elements in DNA coding regions are known to be distributed in a quasi-random way, while those in non-coding regions have typical similar structures. For short sequences, these statistical characteristics cannot be extracted correctly and cannot even be detected. This paper introduces a new way to solve the problem: balanced estimation of diffusion entropy (BEDE).

  13. A study of oligonucleotide occurrence distributions in DNA coding segments.

    PubMed

    Castrignanò, T; Colosimo, A; Morante, S; Parisi, V; Rossi, G C

    1997-02-21

    In this paper we present a general strategy designed to study the occurrence frequency distributions of oligonucleotides in DNA coding segments and to deal with the problem of detecting possible patterns of genomic compositional inhomogeneities and disuniformities. Identifying specific tendencies or peculiar deviations in the distributions of the effective occurrence frequencies of oligonucleotides, with respect to what can be a priori expected, is of the greatest importance in biology. Differences between expected and actual distributions may in fact suggest or confirm the existence of specific biological mechanisms related to them. Similarly, a marked deviation in the occurrence frequency of an oligonucleotide may suggest that it belongs to the class of so-called "DNA signal (target) sequences". The approach we have elaborated is innovative in various aspects. Firstly, the analysis of the genomic data is carried out in the light of the observation that the distribution of the four nucleotides along the coding regions of the genome is biased by the existence of a well-defined "reading frame". Secondly, the "experimental" numbers found by counting the occurrences of the various oligonucleotide sequences are appropriately corrected for the many kinds of mistakes and redundancies present in the available genetic Data Bases. A methodologically significant further improvement of our approach over the existing searching strategies is represented by the fact that, in order to decide whether or not the (corrected) "experimental" value of the occurrence frequency of a given oligonucleotide is within statistical expectations, a measure of the strength of the selective pressure, having acted on it in the course of the evolution, is assigned to the sequence, in a way that takes into account both the value of the "experimental" occurrence frequency of the sequence and the magnitude of the probability that this number might be the result of statistical fluctuations. If the

  14. Integrative RNA-seq and microarray data analysis reveals GC content and gene length biases in the psoriasis transcriptome

    PubMed Central

    Xing, Xianying; Voorhees, John J.; Elder, James T.; Johnston, Andrew; Gudjonsson, Johann E.

    2014-01-01

    Gene expression profiling of psoriasis has driven research advances and may soon provide the basis for clinical applications. For expression profiling studies, RNA-seq is now a competitive technology, but RNA-seq results may differ from those obtained by microarray. We therefore compared findings obtained by RNA-seq with those from eight microarray studies of psoriasis. RNA-seq and microarray datasets identified similar numbers of differentially expressed genes (DEGs), with certain genes uniquely identified by each technology. Correspondence between platforms and the balance of increased to decreased DEGs was influenced by mRNA abundance, GC content, and gene length. Weakly expressed genes, genes with low GC content, and long genes were all biased toward decreased expression in psoriasis lesions. The strength of these trends differed among array datasets, most likely due to variations in RNA quality. Gene length bias was by far the strongest trend and was evident in all datasets regardless of the expression profiling technology. The effect was due to differences between lesional and uninvolved skin with respect to the genome-wide correlation between gene length and gene expression, which was consistently more negative in psoriasis lesions. These findings demonstrate the complementary nature of RNA-seq and microarray technology and show that integrative analysis of both data types can provide a richer view of the transcriptome than strict reliance on a single method alone. Our results also highlight factors affecting correspondence between technologies, and we have established that gene length is a major determinant of differential expression in psoriasis lesions. PMID:24844236

  15. Selection of DNA binding sites for zinc fingers using rationally randomized DNA reveals coded interactions.

    PubMed Central

    Choo, Y; Klug, A

    1994-01-01

    In the preceding paper [Choo, Y. & Klug, A. (1994) Proc. Natl. Acad. Sci. USA 91, 11163-11167], we showed how selections from a library of zinc fingers displayed on phage yielded fingers able to bind to a number of DNA triplets. Here, we describe a technique to deal efficiently with the converse problem--namely, the selection of a DNA binding site for a given zinc finger. This is done by screening against libraries of DNA triplet binding sites randomized in two positions but having one base fixed in the third position. The technique is applied here to determine the specificity of fingers previously selected by phage display. We find that some of these fingers are able to specify a unique base in each position of the cognate triplet. This is further illustrated by examples of fingers which can discriminate between closely related triplets as measured by their respective equilibrium dissociation constants. Comparing the amino acid sequences of fingers which specify a particular base in a triplet, we infer that in most instances, sequence-specific binding of zinc fingers to DNA can be achieved by using a small set of amino acid-nucleotide base contacts amenable to a code. PMID:7972028
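
    The library design described above (one base fixed, the other two positions randomized) can be enumerated in a few lines. This is only a sketch of the combinatorics, assuming one library per (position, base) pair; the function names are illustrative and nothing here models the binding assay itself.

```python
from itertools import product

BASES = "ACGT"


def triplet_libraries():
    """Map (fixed_position, fixed_base) to the 16 triplets in that library.

    Assumption: each library fixes one base at one of the three positions
    and randomizes the other two, giving 3 x 4 = 12 libraries in total.
    """
    libraries = {}
    for pos in range(3):
        for base in BASES:
            libraries[(pos, base)] = [
                "".join(t) for t in product(BASES, repeat=3) if t[pos] == base
            ]
    return libraries


def preferred_bases(bound_libraries):
    """Collect, per position, the fixed bases of the libraries a finger bound.

    `bound_libraries` is a hypothetical screening result: an iterable of
    (position, base) keys for which binding was observed.
    """
    prefs = {0: set(), 1: set(), 2: set()}
    for pos, base in bound_libraries:
        prefs[pos].add(base)
    return prefs
```

    A finger that binds exactly one library per position would, under this reading, specify a unique base at each position of its cognate triplet.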

  16. Insights into corn genes derived from large-scale cDNA sequencing.

    PubMed

    Alexandrov, Nickolai N; Brover, Vyacheslav V; Freidin, Stanislav; Troukhan, Maxim E; Tatarinova, Tatiana V; Zhang, Hongyu; Swaller, Timothy J; Lu, Yu-Ping; Bouck, John; Flavell, Richard B; Feldmann, Kenneth A

    2009-01-01

    We present a large portion of the transcriptome of Zea mays, including ESTs representing 484,032 cDNA clones from 53 libraries and 36,565 fully sequenced cDNA clones, out of which 31,552 clones are non-redundant. These and other previously sequenced transcripts have been aligned with available genome sequences and have provided new insights into the characteristics of gene structures and promoters within this major crop species. We found that although the average number of introns per gene is about the same in corn and Arabidopsis, corn genes have more alternatively spliced isoforms. Examination of the nucleotide composition of coding regions reveals that corn genes, as well as genes of other Poaceae (Grass family), can be divided into two classes according to the GC content at the third position in the amino acid encoding codons. Many of the transcripts that have lower GC content at the third position have dicot homologs but the high GC content transcripts tend to be more specific to the grasses. The high GC content class is also enriched with intronless genes. Together this suggests that an identifiable class of genes in plants is associated with the Poaceae divergence. Furthermore, because many of these genes appear to be derived from ancestral genes that do not contain introns, this evolutionary divergence may be the result of horizontal gene transfer from species not only with different codon usage but possibly that did not have introns, perhaps outside of the plant kingdom. By comparing the cDNAs described herein with the non-redundant set of corn mRNAs in GenBank, we estimate that there are about 50,000 different protein coding genes in Zea. All of the sequence data from this study have been submitted to DDBJ/GenBank/EMBL under accession numbers EU940701-EU977132 (FLI cDNA) and FK944382-FL482108 (EST).
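
    For readers unfamiliar with the measure, GC content at the third codon position (often written GC3) can be computed as in the minimal sketch below. It assumes an in-frame coding sequence and ignores ambiguous bases; the function name `gc3` is illustrative, and no class threshold is implied because the abstract does not give one.

```python
def gc3(cds):
    """Fraction of G or C among the third-position bases of an in-frame CDS."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Every third base, starting from index 2, is a third codon position.
    third = [b for b in cds[2::3] if b in "ACGT"]  # skip ambiguous bases
    if not third:
        raise ValueError("no unambiguous third-position bases found")
    return sum(b in "GC" for b in third) / len(third)
```

    Computing this value for every transcript and plotting the resulting histogram is one simple way to look for the two-class pattern the authors report.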

  17. Dualism of gene GC content and CpG pattern in regard to expression in the human genome: magnitude versus breadth.

    PubMed

    Vinogradov, Alexander E

    2005-12-01

    In this article, I show that, in the human genome, the GC content in genes (but not the CpG island in the promoter) is related to the maximum level of gene expression among tissues, whereas the promoter CpG island and gene CpG level are more strongly related to the breadth of expression among tissues. The relevance of gene GC content to expression cannot be a consequence (i.e. a byproduct) of transcription because it does not correlate with expression in the germline. The variation of GC content and CpG level can determine the characteristics of gene expression in a synergistic interplay with transcription-factor-binding sites (mediated by chromatin condensation).

  18. Evaluation of DHPLC analysis in mutational scanning of Notch3, a gene with a high G-C content.

    PubMed

    Escary, J L; Cécillon, M; Maciazek, J; Lathrop, M; Tournier-Lasserve, E; Joutel, A

    2000-12-01

    Notch3 mutations cause CADASIL, an increasingly recognized cause of subcortical ischemic stroke and vascular dementia in human adults. In the absence of any specific diagnostic criteria, CADASIL diagnosis is based on mutational scanning of Notch3, which is a large gene composed of 33 exons with a high G-C content. In this study we examined the sensitivity of denaturing high performance liquid chromatography (DHPLC). First we established the theoretical optimal parameters, then we examined a large collection of amplicons in which we had previously identified distinct pathogenic mutations or polymorphisms. We further performed Notch3 mutational scanning in five patients suspected of CADASIL diagnosis in which previous scanning, including SSCP and heteroduplexes analysis, failed to detect any pathogenic mutation. DHPLC resolved 97% of mutations previously detected by sequencing and allowed identification of two novel pathogenic mutations: R607C and F984C. These data indicate that DHPLC is a sensitive screening method particularly suitable for epidemio-genetic screening of CADASIL.

  19. Differences in codon bias and GC content contribute to the balanced expression of TLR7 and TLR9

    PubMed Central

    Newman, Zachary R.; Young, Janet M.; Ingolia, Nicholas T.; Barton, Gregory M.

    2016-01-01

    The innate immune system detects diverse microbial species with a limited repertoire of immune receptors that recognize nucleic acids. The cost of this immune surveillance strategy is the potential for inappropriate recognition of self-derived nucleic acids and subsequent autoimmune disease. The relative expression of two closely related receptors, Toll-like receptor (TLR) 7 and TLR9, is balanced to allow recognition of microbial nucleic acids while limiting recognition of self-derived nucleic acids. Situations that tilt this balance toward TLR7 promote inappropriate responses, including autoimmunity; therefore, tight control of expression is critical for proper homeostasis. Here we report that differences in codon bias limit TLR7 expression relative to TLR9. Codon optimization of Tlr7 increases protein levels as well as responses to ligands, but, unexpectedly, these changes only modestly affect translation. Instead, we find that much of the benefit attributed to codon optimization is actually the result of enhanced transcription. Our findings, together with other recent examples, challenge the dogma that codon optimization primarily increases translation. We propose that suboptimal codon bias, which correlates with low guanine-cytosine (GC) content, limits transcription of certain genes. This mechanism may establish low levels of proteins whose overexpression leads to particularly deleterious effects, such as TLR7. PMID:26903634

  20. What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?

    NASA Astrophysics Data System (ADS)

    Liebovitch, Larry

    1998-03-01

    The longest term correlations in living systems are the information stored in DNA which reflects the evolutionary history of an organism. The 4 bases (A,T,G,C) encode sequences of amino acids as well as locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error check mechanisms. When a single strand of DNA is replicated the complementary base is inserted in the new strand. Sometimes the wrong base is inserted that sticks out disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes, that slide along the DNA, can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols and so the information is encoded in a DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error checking code, where some digits are functions of other digits to maintain the fidelity of transmitted information. Does DNA also utilize a DIGITAL error checking code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises the interesting mathematical problem: How does one determine whether some symbols in a sequence of symbols are a function of other symbols. It also bears on the issue of determining algorithmic complexity: What is the function that generates the shortest algorithm for reproducing the symbol sequence. The error checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test if some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C. We did not find

  1. Non-coding RNAs: an emerging player in DNA damage response.

    PubMed

    Zhang, Chunzhi; Peng, Guang

    2015-01-01

    Non-coding RNAs play a crucial role in maintaining genomic stability which is essential for cell survival and preventing tumorigenesis. Through an extensive crosstalk between non-coding RNAs and the canonical DNA damage response (DDR) signaling pathway, DDR-induced expression of non-coding RNAs can provide a regulatory mechanism to accurately control the expression of DNA damage responsive genes in a spatio-temporal manner. Mechanistically, DNA damage alters expression of a variety of non-coding RNAs at multiple levels including transcriptional regulation, post-transcriptional regulation, and RNA degradation. In parallel, non-coding RNAs can directly regulate cellular processes involved in DDR by altering expression of their targeting genes, with a particular emphasis on miRNAs and lncRNAs. MiRNAs are required for almost every aspect of cellular responses to DNA damage, including sensing DNA damage, transducing damage signals, repairing damaged DNA, activating cell cycle checkpoints, and inducing apoptosis. As for lncRNAs, they control transcription of DDR-relevant genes by four different regulatory models, including signal, decoy, guide, and scaffold. In addition, we also highlight potential clinical applications of non-coding RNAs as biomarkers and therapeutic targets for anti-cancer treatments using DNA-damaging agents including radiation and chemotherapy. Although tremendous advances have been made to elucidate the role of non-coding RNAs in genome maintenance, many key questions remain to be answered including mechanistically how the non-coding RNA pathway and the DNA damage response pathway are coordinated in response to genotoxic stress.

  2. Sequences encoding identical peptides for the analysis and manipulation of coding DNA

    PubMed Central

    Sánchez, Joaquín

    2013-01-01

    The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; the study of such sequences could directly reveal properties of coding DNA that are independent of peptide sequences. For practical purposes SEIP might also be manipulated for e.g. heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out if they could be used to create new coding sequences. We hence attempted replacement of human by E. coli codons via dicodon exchange but found that full replacement was not possible, which indicated robust species-specific dicodon tendencies. To test another form of codon replacement we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and we then re-constructed the GFP coding DNA with human tetra-peptide-coding sequences. Results provide proof-of-principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct in pieces a protein coding DNA with sequences from a different organism; the latter might be exploited in heterologous protein expression. PMID:23861567

  3. Sequences encoding identical peptides for the analysis and manipulation of coding DNA.

    PubMed

    Sánchez, Joaquín

    2013-01-01

    The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; the study of such sequences could directly reveal properties of coding DNA that are independent of peptide sequences. For practical purposes SEIP might also be manipulated for e.g. heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out if they could be used to create new coding sequences. We hence attempted replacement of human by E. coli codons via dicodon exchange but found that full replacement was not possible, which indicated robust species-specific dicodon tendencies. To test another form of codon replacement we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and we then re-constructed the GFP coding DNA with human tetra-peptide-coding sequences. Results provide proof-of-principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct in pieces a protein coding DNA with sequences from a different organism; the latter might be exploited in heterologous protein expression.

  4. Palindromic repetitive DNA elements with coding potential in Methanocaldococcus jannaschii.

    PubMed

    Suyama, Mikita; Lathe, Warren C; Bork, Peer

    2005-10-10

    We have identified 141 novel palindromic repetitive elements in the genome of euryarchaeon Methanocaldococcus jannaschii. The total length of these elements is 14.3kb, which corresponds to 0.9% of the total genomic sequence and 6.3% of all extragenic regions. The elements can be divided into three groups (MJRE1-3) based on the sequence similarity. The low sequence identity within each of the groups suggests rather old origin of these elements in M. jannaschii. Three MJRE2 elements were located within the protein coding regions without disrupting the coding potential of the host genes, indicating that insertion of repeats might be a widespread mechanism to enhance sequence diversity in coding regions.

  5. Fact or fiction: updates on how protein-coding genes might emerge de novo from previously non-coding DNA.

    PubMed

    Schmitz, Jonathan F; Bornberg-Bauer, Erich

    2017-01-01

    Over the last few years, there has been an increasing amount of evidence for the de novo emergence of protein-coding genes, i.e. out of non-coding DNA. Here, we review the current literature and summarize the state of the field. We focus specifically on open questions and challenges in the study of de novo protein-coding genes such as the identification and verification of de novo-emerged genes. The greatest obstacle to date is the lack of high-quality genomic data with very short divergence times which could help precisely pin down the location of origin of a de novo gene. We conclude that, while there is plenty of evidence from a genetics perspective, there is a lack of functional studies of bona fide de novo genes and almost no knowledge about protein structures and how they come about during the emergence of de novo protein-coding genes. We suggest that future studies should concentrate on the functional and structural characterization of de novo protein-coding genes as well as the detailed study of the emergence of functional de novo protein-coding genes.

  6. Fact or fiction: updates on how protein-coding genes might emerge de novo from previously non-coding DNA

    PubMed Central

    Schmitz, Jonathan F; Bornberg-Bauer, Erich

    2017-01-01

    Over the last few years, there has been an increasing amount of evidence for the de novo emergence of protein-coding genes, i.e. out of non-coding DNA. Here, we review the current literature and summarize the state of the field. We focus specifically on open questions and challenges in the study of de novo protein-coding genes such as the identification and verification of de novo-emerged genes. The greatest obstacle to date is the lack of high-quality genomic data with very short divergence times which could help precisely pin down the location of origin of a de novo gene. We conclude that, while there is plenty of evidence from a genetics perspective, there is a lack of functional studies of bona fide de novo genes and almost no knowledge about protein structures and how they come about during the emergence of de novo protein-coding genes. We suggest that future studies should concentrate on the functional and structural characterization of de novo protein-coding genes as well as the detailed study of the emergence of functional de novo protein-coding genes. PMID:28163910

  7. Cloning and expression of cDNA coding for bouganin.

    PubMed

    den Hartog, Marcel T; Lubelli, Chiara; Boon, Louis; Heerkens, Sijmie; Ortiz Buijsse, Antonio P; de Boer, Mark; Stirpe, Fiorenzo

    2002-03-01

    Bouganin is a ribosome-inactivating protein that recently was isolated from Bougainvillea spectabilis Willd. In this work, the cloning and expression of the cDNA encoding for bouganin is described. From the cDNA, the amino-acid sequence was deduced, which correlated with the primary sequence data obtained by amino-acid sequencing on the native protein. Bouganin is synthesized as a pro-peptide consisting of 305 amino acids, the first 26 of which act as a leader signal while the 29 C-terminal amino acids are cleaved during processing of the molecule. The mature protein consists of 250 amino acids. Using the cDNA sequence encoding the mature protein of 250 amino acids, a recombinant protein was expressed, purified and characterized. The recombinant molecule had similar activity in a cell-free protein synthesis assay and had comparable toxicity on living cells as compared to the isolated native bouganin.

  8. RNA-DNA sequence differences spell genetic code ambiguities

    PubMed Central

    Nielsen, Michael L.

    2011-01-01

    A recent paper in Science by Li et al. (2011) reports widespread sequence differences in the human transcriptome between RNAs and their encoding genes termed RNA-DNA differences (RDDs). The findings could add a new layer of complexity to gene expression but the study has been criticized. PMID:22567189

  9. TOWARDS A PROBABILISTIC RECOGNITION CODE FOR PROTEIN-DNA INTERACTIONS

    SciTech Connect

    P. BENOS; ET AL

    2000-09-01

    We are investigating the rules that govern protein-DNA interactions, using a statistical mechanics based formalism that is related to the Boltzmann Machine of the neural net literature. Our approach is data-driven, in which probabilistic algorithms are used to model protein-DNA interactions, given SELEX and phage data as input. Under the "one-to-one" model for interactions (i.e. one amino acid contacts one base), we can successfully identify the wild-type binding sites of EGR and MIG protein families. The predictions using our method are the same as or better than those of methods existing in the literature; however, our methodology offers the potential to capitalize in quantitative detail on more data as it becomes available.

  10. Substantial Regional Variation in Substitution Rates in the Human Genome: Importance of GC Content, Gene Density, and Telomere-Specific Effects

    NASA Astrophysics Data System (ADS)

    Arndt, Peter F.; Hwa, Terence; Petrov, Dmitri A.

    2005-06-01

    This study presents the first global, 1 Mbp level analysis of patterns of nucleotide substitutions along the human lineage. The study is based on the analysis of a large amount of repetitive elements deposited into the human genome since the mammalian radiation, yielding a number of results that would have been difficult to obtain using the more conventional comparative method of analysis. This analysis revealed substantial and consistent variability of rates of substitution, with the variability ranging up to 2-fold among different regions. The rates of substitutions of C or G nucleotides with A or T nucleotides vary much more sharply than the reverse rates suggesting that much of that variation is due to differences in mutation rates rather than in the probabilities of fixation of C/G vs. A/T nucleotides across the genome. For all types of substitution we observe substantially more hotspots than coldspots, with hotspots showing substantial clustering over tens of Mbp's. Our analysis revealed that GC-content of surrounding sequences is the best predictor of the rates of substitution. The pattern of substitution appears very different near telomeres compared to the rest of the genome and cannot be explained by the genome-wide correlations of the substitution rates with GC content or exon density. The telomere pattern of substitution is consistent with natural selection or biased gene conversion acting to increase the GC-content of the sequences that are within 10-15 Mbp away from the telomere.

  11. A novel Lie algebra of the genetic code over the Galois field of four DNA bases.

    PubMed

    Sánchez, Robersy; Grau, Ricardo; Morgado, Eberto

    2006-07-01

    Starting from the four DNA bases order in the Boolean lattice, a novel Lie Algebra of the genetic code is proposed. Here, the main partitions of the genetic code table were obtained as equivalent classes of quotient spaces of the genetic code vector space over the Galois field of the four DNA bases. The new algebraic structure shows strong connections among algebraic relationships, codon assignments and physicochemical properties of amino acids. Moreover, a distance defined between codons expresses a physicochemical meaning. It was also noticed that the distance between wild type and mutant codons tends to be small in mutational variants of four genes: human phenylalanine hydroxylase, human beta-globin, HIV-1 protease and HIV-1 reverse transcriptase. These results strongly suggest that deterministic rules in genetic code origin must be involved.

  12. Genetic analysis of an aphid endosymbiont DNA fragment homologous to the rnpA-rpmH-dnaA-dnaN-gyrB region of eubacteria.

    PubMed

    Lai, C Y; Baumann, P

    1992-04-15

    Buchnera aphidicola is a Gram-negative eubacterium with a DNA G+C content of 28-30 mol%. This organism is an obligate intracellular symbiont of aphids. To determine its similarity to or difference from other eubacteria, a 4.9-kb DNA fragment from B. aphidicola containing the gene homologous to Escherichia coli dnaA (a gene involved in the initiation of chromosome replication) was cloned into E. coli and sequenced. The order of genes on this fragment, 60K-10K-rnpA-rpmH-dnaA-dnaN-gyrB, was similar to that found in other eubacteria. The sole difference was the absence of recF between dnaN and gyrB. The deduced amino acid sequences of these proteins showed 41 to 83% identity with those of E. coli. Except for E. coli, in all the eubacteria so far examined, dnaA is preceded by multiple 9-nucleotide repeats known as DnaA boxes. No DnaA boxes were detected in the endosymbiont DNA. The possibility that this observation is a consequence of the low G+C content of this DNA fragment (14 mol% G+C) is unlikely since in Mycoplasma capricolum this fragment (19 mol% G+C) has eight DnaA boxes (Fujita et al., 1992). The presence of the sequence, GATC, recognized by the Dam methyl-transferase system, only within six regions coding for proteins suggests that methylation is not a factor in the regulation of the initiation of endosymbiont chromosome replication.

  13. Statistical analysis of nucleotide runs in coding and noncoding DNA sequences.

    PubMed

    Sprizhitsky, Yu A; Nechipurenko, Yu D; Alexandrov, A A; Volkenstein, M V

    1988-10-01

    A statistical analysis of the occurrence of particular nucleotide runs in DNA sequences of different species has been carried out. There are considerable differences of run distributions in DNA sequences of procaryotes, invertebrates and vertebrates. There is an abundance of short runs (1-2 nucleotides long) in the coding sequences and there is a deficiency of such runs in the noncoding regions. However, some interesting exceptions from this rule exist for the run distribution of adenine in procaryotes and for the arrangement of purine-pyrimidine runs in eucaryotes. The similarity in the distributions of such runs in the coding and noncoding regions may be due to some structural features of the DNA molecule as a whole. Runs of guanine (or cytosine) of three to six nucleotides occur predominantly in noncoding DNA regions in eucaryotes, especially in vertebrates.
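
    The run statistics described in this abstract can be illustrated with a minimal sketch. It is not the authors' method; the function names run_length_counts and purine_pyrimidine and the example sequence are hypothetical. It counts runs of identical nucleotides by length and recodes a sequence into purine (R) and pyrimidine (Y) classes so the same counting can be applied to purine-pyrimidine runs.

```python
from collections import Counter
from itertools import groupby


def run_length_counts(seq):
    """Return {symbol: Counter({run_length: number_of_runs})} for a string."""
    counts = {}
    # groupby yields maximal blocks of consecutive identical symbols.
    for symbol, block in groupby(seq.upper()):
        length = sum(1 for _ in block)
        counts.setdefault(symbol, Counter())[length] += 1
    return counts


def purine_pyrimidine(seq):
    """Recode DNA as R (purine: A, G) or Y (pyrimidine: C, T)."""
    return seq.upper().translate(str.maketrans("AGCT", "RRYY"))


# Hypothetical example sequence, for illustration only.
example = "AAGGGCTTTTAC"
runs = run_length_counts(example)
ry_runs = run_length_counts(purine_pyrimidine(example))
```

    Comparing the resulting length distributions between annotated coding and noncoding segments would be the next step; the sketch makes no claim about how the original study normalized its counts.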

  14. Differential DNA methylation profiles of coding and non-coding genes define hippocampal sclerosis in human temporal lobe epilepsy

    PubMed Central

    Miller-Delaney, Suzanne F.C.; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C.; Bray, Isabella M.; Reynolds, James P.; Gwinn, Ryder; Stallings, Raymond L.

    2015-01-01

    Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation-sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among the differentially methylated. Finally, a panel of 13 methylation-sensitive microRNAs was identified in temporal lobe epilepsy, including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and the differential methylation of long non-coding RNA was documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain.

  15. [Cloning and insertion mutagenesis of DNA fragment coding for the luminescent system of Photobacterium leiognathi].

    PubMed

    Ptitsyn, L R; Gurevich, V B; Barsanova, T G; Shenderov, A N; Khaĭkinson, M Ia

    1988-10-01

    Fragments of DNA, obtained from the luminescent bacterium Photobacterium leiognathi and inserted into the plasmid pBR322, were found to code for the luminescence expressed in E. coli cells. The genetic functions necessary for light production in E. coli are localized on a DNA fragment of about 7 kbp. Insertion mutagenesis was used to define the luminescence functions encoded by the hybrid plasmid.

  16. Coding and noncoding plastid DNA in palm systematics.

    PubMed

    Asmussen, C B; Chase, M W

    2001-06-01

    Plastid DNA sequences evolve slowly in palms but show that the family is monophyletic and highly divergent relative to other major monocot clades. It is therefore difficult to place the root within the palms because faster evolving, length-variable sequences cannot be aligned with outgroup monocots, and length-conserved regions have been thought to give too few characters to resolve basal nodes. To solve this problem, we combined 94 ingroup and 24 outgroup sequences from the length-conserved rbcL gene with ingroup and alignable outgroup sequences from noncoding rps16 intron and trnL-trnF regions. The separate rps16 intron and trnL-trnF region contained about the same number of variable sites (autapomorphies not included) as rbcL, but gave higher retention indices and more clades with bootstrap support. In general, the strict consensus tree based on combined rbcL, rps16 intron, and trnL-trnF data showed more resolution towards the base of the palm family than previous hypotheses of relationships of the Arecaceae. An important result was the position of subfamily Calamoideae as sister to the rest of the palms, but this received <50% bootstrap support. Another result of systematic significance was the indication that subfamily Phytelephantoideae is related to two tribes from subfamily Ceroxyloideae, Cyclospatheae and Ceroxyleae.

  17. DNA methylation patterns of protein-coding genes and long non-coding RNAs in males with schizophrenia.

    PubMed

    Liao, Qi; Wang, Yunliang; Cheng, Jia; Dai, Dongjun; Zhou, Xingyu; Zhang, Yuzheng; Li, Jinfeng; Yin, Honglei; Gao, Shugui; Duan, Shiwei

    2015-11-01

    Schizophrenia (SCZ) is one of the most complex mental illnesses affecting ~1% of the population worldwide. SCZ pathogenesis is considered to be a result of genetic as well as epigenetic alterations. Previous studies have aimed to identify the causative genes of SCZ. However, DNA methylation of long non-coding RNAs (lncRNAs) involved in SCZ has not been fully elucidated. In the present study, a comprehensive genome-wide analysis of DNA methylation was conducted using samples from two male patients with paranoid and undifferentiated SCZ, respectively. Methyl-CpG binding domain protein-enriched genome sequencing was used. In the two patients with paranoid and undifferentiated SCZ, 1,397 and 1,437 peaks were identified, respectively. Bioinformatic analysis demonstrated that peaks were enriched in protein-coding genes, which exhibited nervous system and brain functions. A number of these peaks in gene promoter regions may affect gene expression and, therefore, influence SCZ-associated pathways. Furthermore, 7 and 20 lncRNAs, respectively, in the Refseq database were hypermethylated. According to the lncRNA dataset in the NONCODE database, ~30% of intergenic peaks overlapped with novel lncRNA loci. The results of the present study demonstrated that aberrant hypermethylation of lncRNA genes may be an important epigenetic factor associated with SCZ. However, further studies using larger sample sizes are required.

  18. Synonymous codon bias and functional constraint on GC3-related DNA backbone dynamics in the prokaryotic nucleoid.

    PubMed

    Babbitt, Gregory A; Alawad, Mohammed A; Schulze, Katharina V; Hudson, André O

    2014-01-01

    While mRNA stability has been demonstrated to control rates of translation, generating both global and local synonymous codon biases in many unicellular organisms, this explanation cannot adequately explain why codon bias strongly tracks neighboring intergene GC content, suggesting that structural dynamics of DNA might also influence codon choice. Because minor groove width is highly governed by 3-base periodicity in GC, the existence of triplet-based codons might imply a functional role for the optimization of local DNA molecular dynamics via GC content at synonymous sites (≈GC3). We confirm a strong association between GC3-related intrinsic DNA flexibility and codon bias across 24 different prokaryotic multiple whole-genome alignments. We develop a novel test of natural selection targeting synonymous sites and demonstrate that GC3-related DNA backbone dynamics have been subject to moderate selective pressure, perhaps contributing to our observation that many genes possess extreme DNA backbone dynamics for their given protein space. This dual function of codons may impose universal functional constraints affecting the evolution of synonymous and non-synonymous sites. We propose that synonymous sites may have evolved as an 'accessory' during an early expansion of a primordial genetic code, allowing for multiplexed protein coding and structural dynamic information within the same molecular context.
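
    GC3, the quantity this abstract uses as a proxy for GC content at synonymous sites, is simple to compute for an in-frame coding sequence. The sketch below is illustrative only: the function name gc3 and the toy sequence are assumptions, and it does not reproduce the flexibility or selection analyses reported by the authors.

```python
def gc3(cds):
    """Fraction of G or C among third-codon positions.

    Assumes `cds` starts in frame at the first base of a codon;
    a trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # drop any incomplete final codon
    thirds = cds[2:usable:3]              # bases at positions 3, 6, 9, ...
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


# Toy in-frame sequence (hypothetical): codons ATG GCC AAA TAA.
toy_gc3 = gc3("ATGGCCAAATAA")
```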

  19. Synonymous codon bias and functional constraint on GC3-related DNA backbone dynamics in the prokaryotic nucleoid

    PubMed Central

    Babbitt, Gregory A.; Alawad, Mohammed A.; Schulze, Katharina V.; Hudson, André O.

    2014-01-01

    While mRNA stability has been demonstrated to control rates of translation, generating both global and local synonymous codon biases in many unicellular organisms, this explanation cannot adequately explain why codon bias strongly tracks neighboring intergene GC content, suggesting that structural dynamics of DNA might also influence codon choice. Because minor groove width is highly governed by 3-base periodicity in GC, the existence of triplet-based codons might imply a functional role for the optimization of local DNA molecular dynamics via GC content at synonymous sites (≈GC3). We confirm a strong association between GC3-related intrinsic DNA flexibility and codon bias across 24 different prokaryotic multiple whole-genome alignments. We develop a novel test of natural selection targeting synonymous sites and demonstrate that GC3-related DNA backbone dynamics have been subject to moderate selective pressure, perhaps contributing to our observation that many genes possess extreme DNA backbone dynamics for their given protein space. This dual function of codons may impose universal functional constraints affecting the evolution of synonymous and non-synonymous sites. We propose that synonymous sites may have evolved as an ‘accessory’ during an early expansion of a primordial genetic code, allowing for multiplexed protein coding and structural dynamic information within the same molecular context. PMID:25200075

  20. Diversity and Recombination of Dispersed Ribosomal DNA and Protein Coding Genes in Microsporidia

    PubMed Central

    Ironside, Joseph Edward

    2013-01-01

    Microsporidian strains are usually classified on the basis of their ribosomal DNA (rDNA) sequences. Although rDNA occurs as multiple copies, in most non-microsporidian species copies within a genome occur as tandem arrays and are homogenised by concerted evolution. In contrast, microsporidian rDNA units are dispersed throughout the genome in some species, and on this basis are predicted to undergo reduced concerted evolution. Furthermore many microsporidian species appear to be asexual and should therefore exhibit reduced genetic diversity due to a lack of recombination. Here, DNA sequences are compared between microsporidia with different life cycles in order to determine the effects of concerted evolution and sexual reproduction upon the diversity of rDNA and protein coding genes. Comparisons of cloned rDNA sequences between microsporidia of the genus Nosema with different life cycles provide evidence of intragenomic variability coupled with strong purifying selection. This suggests a birth and death process of evolution. However, some concerted evolution is suggested by clustering of rDNA sequences within species. Variability of protein-coding sequences indicates that considerable intergenomic variation also occurs between microsporidian cells within a single host. Patterns of variation in microsporidian DNA sequences indicate that additional diversity is generated by intragenomic and/or intergenomic recombination between sequence variants. The discovery of intragenomic variability coupled with strong purifying selection in microsporidian rRNA sequences supports the hypothesis that concerted evolution is reduced when copies of a gene are dispersed rather than repeated tandemly. The presence of intragenomic variability also renders the use of rDNA sequences for barcoding microsporidia questionable. Evidence of recombination in the single-copy genes of putatively asexual microsporidia suggests that these species may undergo cryptic sexual reproduction, a

  1. Diversity and recombination of dispersed ribosomal DNA and protein coding genes in microsporidia.

    PubMed

    Ironside, Joseph Edward

    2013-01-01

    Microsporidian strains are usually classified on the basis of their ribosomal DNA (rDNA) sequences. Although rDNA occurs as multiple copies, in most non-microsporidian species copies within a genome occur as tandem arrays and are homogenised by concerted evolution. In contrast, microsporidian rDNA units are dispersed throughout the genome in some species, and on this basis are predicted to undergo reduced concerted evolution. Furthermore many microsporidian species appear to be asexual and should therefore exhibit reduced genetic diversity due to a lack of recombination. Here, DNA sequences are compared between microsporidia with different life cycles in order to determine the effects of concerted evolution and sexual reproduction upon the diversity of rDNA and protein coding genes. Comparisons of cloned rDNA sequences between microsporidia of the genus Nosema with different life cycles provide evidence of intragenomic variability coupled with strong purifying selection. This suggests a birth and death process of evolution. However, some concerted evolution is suggested by clustering of rDNA sequences within species. Variability of protein-coding sequences indicates that considerable intergenomic variation also occurs between microsporidian cells within a single host. Patterns of variation in microsporidian DNA sequences indicate that additional diversity is generated by intragenomic and/or intergenomic recombination between sequence variants. The discovery of intragenomic variability coupled with strong purifying selection in microsporidian rRNA sequences supports the hypothesis that concerted evolution is reduced when copies of a gene are dispersed rather than repeated tandemly. The presence of intragenomic variability also renders the use of rDNA sequences for barcoding microsporidia questionable. Evidence of recombination in the single-copy genes of putatively asexual microsporidia suggests that these species may undergo cryptic sexual reproduction, a

  2. Setting standards for DNA banks: toward a model code of conduct.

    PubMed

    McEwen, J E; Reilly, P R

    1996-01-01

    As genomic research proliferates, DNA banking will become more common. In research, samples will be banked largely in an effort to find and clone genes that predispose to disease. Commercially oriented banks, those that offer services to families, may also become more common. These entities will hold sensitive information. DNA banking is not yet regulated. We argue here that new laws are not needed at this time to regulate DNA banking. We suggest an approach that relies on a professional code of conduct and draws on principles of disclosure inherent to the process used in obtaining informed consent. In addition to suggesting 12 specific recommendations for the code of conduct, we suggest that items should be included in depositor's agreements. We offer a rationale for our suggestions.

  3. Novel Bacterial Lipoprotein Structures Conserved in Low-GC Content Gram-positive Bacteria Are Recognized by Toll-like Receptor 2*

    PubMed Central

    Kurokawa, Kenji; Ryu, Kyoung-Hwa; Ichikawa, Rie; Masuda, Akiko; Kim, Min-Su; Lee, Hanna; Chae, Jun-Ho; Shimizu, Takashi; Saitoh, Tatsuya; Kuwano, Koichi; Akira, Shizuo; Dohmae, Naoshi; Nakayama, Hiroshi; Lee, Bok Luel

    2012-01-01

    Bacterial lipoproteins/lipopeptides inducing host innate immune responses are sensed by mammalian Toll-like receptor 2 (TLR2). These bacterial lipoproteins are structurally divided into two groups, diacylated or triacylated lipoproteins, by the absence or presence of an amide-linked fatty acid. The presence of diacylated lipoproteins has been predicted in low-GC content Gram-positive bacteria and mycoplasmas based on the absence of one modification enzyme in their genomes; however, we recently determined triacylated structures in low-GC Gram-positive Staphylococcus aureus, raising questions about the actual lipoprotein structure in other low-GC content Gram-positive bacteria. Here, through intensive MS analyses, we identified a novel and unique bacterial lipoprotein structure containing an N-acyl-S-monoacyl-glyceryl-cysteine (named the lyso structure) from low-GC Gram-positive Enterococcus faecalis, Bacillus cereus, Streptococcus sanguinis, and Lactobacillus bulgaricus. Two of the purified native lyso-form lipoproteins induced proinflammatory cytokine production from mice macrophages in a TLR2-dependent and TLR1-independent manner but with a different dependence on TLR6. Additionally, two other new lipoprotein structures were identified. One is the “N-acetyl” lipoprotein structure containing N-acetyl-S-diacyl-glyceryl-cysteine, which was found in five Gram-positive bacteria, including Bacillus subtilis. The N-acetyl lipoproteins induced the proinflammatory cytokines through the TLR2/6 heterodimer. The other was identified in a mycoplasma strain and is an unusual diacyl lipoprotein structure containing two amino acids before the lipid-modified cysteine residue. Taken together, our results suggest the existence of novel TLR2-stimulating lyso and N-acetyl forms of lipoproteins that are conserved in low-GC content Gram-positive bacteria and provide clear evidence for the presence of yet to be identified key enzymes involved in the bacterial lipoprotein biosynthesis

  4. Sigma: multiple alignment of weakly-conserved non-coding DNA sequence.

    PubMed

    Siddharthan, Rahul

    2006-03-16

    Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA. Comparative tests of sigma with five earlier algorithms on synthetic data generated to mimic real data show excellent performance, with Sigma balancing high "sensitivity" (more bases aligned) with effective filtering of "incorrect" alignments. With real data, while "correctness" can't be directly quantified for the alignment, running the PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior. By taking into account the peculiarities of non-coding DNA, Sigma fills a gap in the toolbox of bioinformatics.

  5. Differential DNA methylation profiles of coding and non-coding genes define hippocampal sclerosis in human temporal lobe epilepsy.

    PubMed

    Miller-Delaney, Suzanne F C; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C; Bray, Isabella M; Reynolds, James P; Gwinn, Ryder; Stallings, Raymond L; Henshall, David C

    2015-03-01

    Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation-sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among the differentially methylated. Finally, a panel of 13 methylation-sensitive microRNAs was identified in temporal lobe epilepsy, including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and the differential methylation of long non-coding RNA was documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain.

  6. A molecular bar-coded DNA repair resource for pooled toxicogenomic screens.

    PubMed

    Rooney, John P; Patil, Ashish; Zappala, Maria R; Conklin, Douglas S; Cunningham, Richard P; Begley, Thomas J

    2008-11-01

    DNA damage from exogenous and endogenous sources can promote mutations and cell death. Fortunately, cells contain DNA repair and damage signaling pathways to reduce the mutagenic and cytotoxic effects of DNA damage. The identification of specific DNA repair proteins and the coordination of DNA repair pathways after damage has been a central theme to the field of genetic toxicology and we have developed a tool for use in this area. We have produced 99 molecular bar-coded Escherichia coli gene-deletion mutants specific to DNA repair and damage signaling pathways, and each bar-coded mutant can be tracked in pooled format using bar-code specific microarrays. Our design adapted bar-codes developed for the Saccharomyces cerevisiae gene-deletion project, which allowed us to utilize an available microarray product for pooled gene-exposure studies. Microarray-based screens were used for en masse identification of individual mutants sensitive to methyl methanesulfonate (MMS). As expected, gene-deletion mutants specific to direct, base excision, and recombinational DNA repair pathways were identified as MMS-sensitive in our pooled assay, thus validating our resource. We have demonstrated that molecular bar-codes designed for S. cerevisiae are transferable to E. coli, and that they can be used with pre-existing microarrays to perform competitive growth experiments. Further, when comparing microarray to traditional plate-based screens both overlapping and distinct results were obtained, which is a novel technical finding, with discrepancies between the two approaches explained by differences in output measurements (DNA content versus cell mass). The microarray-based classification of Δtag and ΔdinG cells as depleted after MMS exposure, contrary to plate-based methods, led to the discovery that Δtag and ΔdinG cells show a filamentation phenotype after MMS exposure, thus accounting for the discrepancy. A novel biological finding is the observation that while

  7. Heterogeneous base distribution in mitochondrial DNA of Neurospora crassa.

    PubMed Central

    Terpstra, P; Holtrop, M; Kroon, A

    1977-01-01

    The mitochondrial DNA of Neurospora crassa has a heterogeneous intramolecular base distribution. A contiguous piece, representing at least 30% of the total genome, has a G+C content that is 6% lower than the overall G+C content of the DNA. The genes for both ribosomal RNAs are contained in the remaining, relatively G+C rich, part of the genome. PMID:141040
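
    Intramolecular heterogeneity of the kind reported here (a contiguous stretch whose G+C content sits below the molecule-wide value) can be screened for with a windowed comparison. The sketch below is an illustration under stated assumptions, not the experimental approach of the 1977 study; the function names, window size and toy sequence are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C (sequence must be non-empty)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc_deviation(seq, window, step):
    """List of (start, window GC minus overall GC) along the sequence.

    Negative values mark windows that are G+C poor relative to the whole
    molecule. Windows that would run past the end are skipped.
    """
    overall = gc_fraction(seq)
    return [
        (start, gc_fraction(seq[start:start + window]) - overall)
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (hypothetical): a G+C rich half followed by an A+T rich half.
toy = "GCGCGCGC" + "ATATATAT"
deviations = windowed_gc_deviation(toy, window=8, step=8)
```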

  8. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Technical Reports Server (NTRS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
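
    The two tests named in this abstract, a Zipf rank-frequency analysis of n-tuples and an n-gram entropy measurement, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the redundancy formula used here, 1 - H_n / (n log2 4), is an assumed normalization against the maximum entropy of a four-letter alphabet.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Counts of overlapping n-grams in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_table(seq, n):
    """(n-gram, count) pairs ordered from most to least frequent (rank 1 first)."""
    return ngram_counts(seq, n).most_common()


def ngram_entropy(seq, n):
    """Shannon entropy, in bits, of the overlapping n-gram distribution."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def ngram_redundancy(seq, n, alphabet_size=4):
    """Assumed definition: 1 - H_n / (n * log2(alphabet_size))."""
    return 1 - ngram_entropy(seq, n) / (n * math.log2(alphabet_size))
```

    On a log-log plot of count against rank, power-law (Zipf-like) behavior would appear as an approximately straight line; fitting that slope is left out of the sketch.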

  9. Correcting sequencing errors in DNA coding regions using a dynamic programming approach.

    PubMed

    Xu, Y; Mural, R J; Uberbacher, E C

    1995-04-01

    This paper presents an algorithm for detecting and 'correcting' sequencing errors that occur in DNA coding regions. The types of sequencing errors addressed are insertions and deletions (indels) of DNA bases. The goal is to provide a capability which makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. This would permit improved sequencing efficiency and reduce genome sequencing costs. The algorithm detects sequencing errors by discovering changes in the statistically preferred reading frame within a putative coding region and then inserts a number of 'neutral' bases at a perceived reading frame transition point to make the putative exon candidate frame consistent. We have implemented the algorithm as a front-end subsystem of the GRAIL DNA sequence analysis system to construct a version which is very error tolerant and also intend to use this as a testbed for further development of sequencing error-correction technology. Preliminary test results have shown the usefulness of this algorithm and also exhibited some of its weakness, providing possible directions for further improvement. On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels. The average distance between the position of an indel and the predicted one was 9.4 bases. With this subsystem in place, GRAIL correctly predicted 89% of the coding messages with 10% false message on the 'corrected' sequences, compared to 69% correctly predicted coding messages and 11% falsely predicted messages on the 'corrupted' sequences using standard GRAIL II method (version 1.2).(ABSTRACT TRUNCATED AT 250 WORDS)

  11. Correcting sequencing errors in DNA coding regions using a dynamic programming approach

    SciTech Connect

    Xu, Y.; Mural, R.J.; Uberbacher, E.C.

    1994-12-01

    This paper presents an algorithm for detecting and "correcting" sequencing errors that occur in DNA coding regions. The types of sequencing error addressed include insertions and deletions (indels) of DNA bases. The goal is to provide a capability which makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. The algorithm detects sequencing errors by discovering changes in the statistically preferred reading frame within a putative coding region and then inserts a number of "neutral" bases at a perceived reading frame transition point to make the putative exon candidate frame consistent. The authors have implemented the algorithm as a front-end subsystem of the GRAIL DNA sequence analysis system to construct a version which is very error tolerant and also intend to use this as a testbed for further development of sequencing error-correction technology. On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels. The average distance between the position of an indel and the predicted one was 9.4 bases. With this subsystem in place, GRAIL correctly predicted 89% of the coding messages with 10% false messages on the "corrected" sequences, compared to 69% correctly predicted coding messages and 11% falsely predicted messages on the "corrupted" sequences using the standard GRAIL II method. The method uses a dynamic programming algorithm, and runs in time and space linear in the size of the input sequence.

  12. Specificity-Determining DNA Triplet Code for Positioning of Human Preinitiation Complex

    NASA Astrophysics Data System (ADS)

    Goldshtein, Matan; Lukatsky, David B.

    2017-05-01

    The notion that transcription factors bind DNA only through specific, consensus binding sites has been recently questioned. In a pioneering study by Pugh and Venters no specific consensus motif for the positioning of the human pre-initiation complex (PIC) has been identified. Here, we reveal that nonconsensus, statistical, DNA triplet code provides specificity for the positioning of the human PIC. In particular, we reveal a highly non-random, statistical pattern of repetitive nucleotide triplets that correlates with the genome-wide binding preferences of PIC measured by Chip-exo. We analyze the triplet enrichment and depletion near the transcription start site (TSS) and identify triplets that have the strongest effect on PIC-DNA nonconsensus binding. Our results constitute a proof-of-concept for a new design principle for protein-DNA recognition in the human genome, which can lead to a better mechanistic understanding of transcriptional regulation.

  13. Role of GC-biased mutation pressure on synonymous codon choice in Micrococcus luteus, a bacterium with a high genomic GC-content.

    PubMed Central

    Ohama, T; Muto, A; Osawa, S

    1990-01-01

    The GC (G + C, or G or C)-contents of codon silent positions in all two-codon sets and three codons AUY/A (Ile), and in most of the family boxes of Micrococcus luteus (genomic GC-content: 74%) are 95% to 100% in both the highly and weakly expressed genes. In some family boxes, there is a decrease in NNC codons and an increase in NNG codons from the highly expressed to weakly expressed genes without apparent involvement of NNU and NNA codons. From these observations, we conclude that the selective use of synonymous codons in M. luteus may be largely determined by GC-biased mutation pressure and that in the highly expressed genes tRNAs would act as a weak selection pressure in some family boxes. Available data suggest that the effect of selection pressure by tRNAs on the synonymous codon choice becomes more apparent in the highly expressed genes in eubacteria with intermediate GC-contents such as Escherichia coli and Bacillus subtilis, and that the U/C ratio of the codon third positions in NNU/C-type two-codon sets in the weakly expressed genes would represent the approximate magnitude of directional mutation pressure throughout eubacteria. PMID:2326195
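
    Illustrative sketch (not taken from the paper): the GC-content of codon third positions, often written GC3, is a simple proxy for the silent-position GC-content discussed above. The function name is hypothetical, and the sketch assumes an in-frame coding sequence and counts every third position, without separating two-codon sets from family boxes as the authors do.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    thirds = cds[2:usable:3]           # every third base, starting at codon position 3
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)
```

    For example, gc3("ATGGCCAAA") looks at the bases G, C and A and returns 2/3.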

  14. A Conserved Structural Signature of the Homeobox Coding DNA in HOX genes.

    PubMed

    Fongang, Bernard; Kong, Fanping; Negi, Surendra; Braun, Werner; Kudlicki, Andrzej

    2016-10-14

    The homeobox encodes a DNA-binding domain found in transcription factors regulating key developmental processes. The most notable examples of homeobox containing genes are the Hox genes, arranged on chromosomes in the same order as their expression domains along the body axis. The mechanisms responsible for the synchronous regulation of Hox genes and the molecular function of their colinearity remain unknown. Here we report the discovery of a conserved structural signature of the 180-base pair DNA fragment comprising the homeobox. We demonstrate that the homeobox DNA has a characteristic 3-base-pair periodicity in the hydroxyl radical cleavage pattern. This periodic pattern is significant in most of the 39 mammalian Hox genes and in other homeobox-containing transcription factors. The signature is present in segmented bilaterian animals as evolutionarily distant as humans and flies. It remains conserved despite the fact that it would be disrupted by synonymous mutations, which raises the possibility of evolutionary selective pressure acting on the structure of the coding DNA. The homeobox coding DNA may therefore have a secondary function, possibly as a regulatory element. The existence of such element may have important consequences for understanding how these genes are regulated.

  15. Specificity-Determining DNA Triplet Code for Positioning of Human Preinitiation Complex.

    PubMed

    Goldshtein, Matan; Lukatsky, David B

    2017-05-23

    The notion that transcription factors bind DNA only through specific, consensus binding sites has been recently questioned. No specific consensus motif for the positioning of the human preinitiation complex (PIC) has been identified. Here, we reveal that nonconsensus, statistical, DNA triplet code provides specificity for the positioning of the human PIC. In particular, we reveal a highly nonrandom, statistical pattern of repetitive nucleotide triplets that correlates with the genomewide binding preferences of PIC measured by Chip-exo. We analyze the triplet enrichment and depletion near the transcription start site and identify triplets that have the strongest effect on PIC-DNA nonconsensus binding. Using statistical mechanics, a random-binder model without fitting parameters, with genomic DNA sequence being the only input, we further validate that the nonconsensus nucleotide triplet code constitutes a key signature providing PIC binding specificity in the human genome. Our results constitute a proof-of-concept for, to our knowledge, a new design principle for protein-DNA recognition in the human genome, which can lead to a better mechanistic understanding of transcriptional regulation. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  16. A Conserved Structural Signature of the Homeobox Coding DNA in HOX genes

    PubMed Central

    Fongang, Bernard; Kong, Fanping; Negi, Surendra; Braun, Werner; Kudlicki, Andrzej

    2016-01-01

    The homeobox encodes a DNA-binding domain found in transcription factors regulating key developmental processes. The most notable examples of homeobox containing genes are the Hox genes, arranged on chromosomes in the same order as their expression domains along the body axis. The mechanisms responsible for the synchronous regulation of Hox genes and the molecular function of their colinearity remain unknown. Here we report the discovery of a conserved structural signature of the 180-base pair DNA fragment comprising the homeobox. We demonstrate that the homeobox DNA has a characteristic 3-base-pair periodicity in the hydroxyl radical cleavage pattern. This periodic pattern is significant in most of the 39 mammalian Hox genes and in other homeobox-containing transcription factors. The signature is present in segmented bilaterian animals as evolutionarily distant as humans and flies. It remains conserved despite the fact that it would be disrupted by synonymous mutations, which raises the possibility of evolutionary selective pressure acting on the structure of the coding DNA. The homeobox coding DNA may therefore have a secondary function, possibly as a regulatory element. The existence of such element may have important consequences for understanding how these genes are regulated. PMID:27739488

  17. Differentiating the Protein Coding and Noncoding RNA Segments of DNA Using Shannon Entropy

    NASA Astrophysics Data System (ADS)

    Mazaheri, P.; Shirazi, A. H.; Saeedi, N.; Reza Jafari, G.; Sahimi, Muhammad

    The complexity of DNA sequences is evaluated in order to differentiate between protein-coding and noncoding RNA segments. The method is based on computing the Shannon entropy of the sequences. By comparing the entropy of the original sequence with that of its shuffled one, we identify the source of the difference between the two segments and their relative contributions to the sequence. To demonstrate the method, the DNA sequences of the bacterium Clostridium difficile 630 (G + C = 29.1%) and Bdellovibrio bacteriovorus (G + C = 50.6%) are analyzed, which are representatives of bacteria with unbalanced and balanced nucleotide content, respectively. It is shown that in both bacteria, regardless of nucleotide content, ΔrS — the relative difference of the two entropies — is significantly greater in protein-coding regions, when compared with noncoding RNA segments.
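
    Illustrative sketch (not taken from the paper): the comparison of a sequence's entropy with that of its shuffled version can be outlined as below. The block length k, the function names, and the normalization of the relative difference as (S_shuffled - S_original) / S_shuffled are assumptions; the abstract does not give the authors' exact definition.

```python
import random
from collections import Counter
from math import log2


def block_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy (in bits) of overlapping length-k blocks of seq."""
    blocks = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(blocks)
    return -sum((c / total) * log2(c / total) for c in Counter(blocks).values())


def relative_entropy_difference(seq: str, k: int = 3, seed: int = 0) -> float:
    """Assumed definition: (S_shuffled - S_original) / S_shuffled.

    Shuffling keeps the nucleotide composition but destroys the ordering,
    so the difference reflects structure beyond base composition.
    """
    rng = random.Random(seed)
    letters = list(seq)
    rng.shuffle(letters)
    s_original = block_entropy(seq, k)
    s_shuffled = block_entropy("".join(letters), k)
    if s_shuffled == 0:
        return 0.0  # e.g. a homopolymer: nothing to compare
    return (s_shuffled - s_original) / s_shuffled
```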

  18. Long non-coding RNA PARTICLE bridges histone and DNA methylation.

    PubMed

    O'Leary, Valerie Bríd; Hain, Sarah; Maugg, Doris; Smida, Jan; Azimzadeh, Omid; Tapio, Soile; Ovsepian, Saak Victor; Atkinson, Michael John

    2017-05-11

    PARTICLE (Gene PARTICL- 'Promoter of MAT2A-Antisense RadiaTion Induced Circulating LncRNA') expression is transiently elevated following low dose irradiation typically encountered in the workplace and from natural sources. This long non-coding RNA recruits epigenetic silencers for cis-acting repression of its neighbouring Methionine adenosyltransferase 2A gene. It now emerges that PARTICLE operates as a trans-acting mediator of DNA and histone lysine methylation. Chromatin immunoprecipitation sequencing (ChIP-seq) and immunological evidence established elevated PARTICLE expression linked to increased histone 3 lysine 27 trimethylation. Live-imaging of dbroccoli-PARTICLE revealed its dynamic association with DNA methyltransferase 1, which was confirmed by flow cytometry, immunoprecipitation and direct competitive binding interaction through electrophoretic mobility shift assay. Acting as a regulatory docking platform, the long non-coding RNA PARTICLE serves to interlink epigenetic modification machineries and represents a compelling innovative component necessary for gene silencing on a global scale.

  19. Free Energy Gap and Statistical Thermodynamic Fidelity of DNA Codes (Postprint)

    DTIC Science & Technology

    2007-01-01

    reverse-complement unless otherwise stated. For strand x, let Nx denote its complement. A (perfect) Watson-Crick duplex is the joining of complement...is possible for complementary sequences to form a non-perfectly aligned duplex, we will call any x W Nx duplex a Watson-Crick (WC) duplex. Two... CONTRACT NUMBER FA8750-07
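
    Illustrative sketch (not taken from the report): the excerpt above defines a Watson-Crick duplex in terms of a strand and its reverse complement. A minimal way to compute the reverse complement and test whether two strands form a perfectly aligned pair is shown below; the function names are hypothetical, and only the four unambiguous bases A, C, G, T are handled.

```python
# Translation table for the standard base pairing A-T and C-G.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Complement every base, then reverse, so the result reads 5' to 3'."""
    return strand.upper().translate(_COMPLEMENT)[::-1]


def is_perfect_wc_pair(x: str, y: str) -> bool:
    """True if y is exactly the reverse complement of x."""
    return reverse_complement(x) == y.upper()
```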

  20. DNA methylation of miRNA coding sequences putatively associated with childhood obesity.

    PubMed

    Mansego, M L; Garcia-Lacarte, M; Milagro, F I; Marti, A; Martinez, J A

    2017-02-01

    Epigenetic mechanisms may be involved in obesity onset and its consequences. The aim of the present study was to evaluate whether DNA methylation status in microRNA (miRNA) coding regions is associated with childhood obesity. DNA isolated from white blood cells of 24 children (identification sample: 12 obese and 12 non-obese) from the Grupo Navarro de Obesidad Infantil study was hybridized in a 450 K methylation microarray. Several CpGs whose DNA methylation levels were statistically different between obese and non-obese were validated by MassArray® in 95 children (validation sample) from the same study. Microarray analysis identified 16 differentially methylated CpGs between both groups (6 hypermethylated and 10 hypomethylated). DNA methylation levels in miR-1203, miR-412 and miR-216A coding regions significantly correlated with body mass index standard deviation score (BMI-SDS) and explained up to 40% of the variation of BMI-SDS. The network analysis identified 19 well-defined obesity-relevant biological pathways from the KEGG database. MassArray® validation identified three regions located in or near miR-1203, miR-412 and miR-216A coding regions differentially methylated between obese and non-obese children. The current work identified three CpG sites located in coding regions of three miRNAs (miR-1203, miR-412 and miR-216A) that were differentially methylated between obese and non-obese children, suggesting a role of miRNA epigenetic regulation in childhood obesity. © 2016 World Obesity Federation.

  1. Junk DNA and the long non-coding RNA twist in cancer genetics

    PubMed Central

    Ling, Hui; Vincent, Kimberly; Pichler, Martin; Fodde, Riccardo; Berindan-Neagoe, Ioana; Slack, Frank J.; Calin, George A

    2015-01-01

    The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions, and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNA in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNA, often via interaction with proteins, functions in specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function, and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single nucleotide polymorphism (SNP) located within lncRNAs may be functionally associated with the individual’s susceptibility to cancer. PMID:25619839

  2. HyDEn: a hybrid steganocryptographic approach for data encryption using randomized error-correcting DNA codes.

    PubMed

    Tulpan, Dan; Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge

    2013-01-01

    This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach.
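
    Illustrative sketch (not taken from the paper): error-correcting DNA code words rely on a minimum Hamming distance between any two words, so that a word with a small number of substitutions is still closest to its original. The toy codebook and function names below are hypothetical and are not HyDEn's actual quaternary codes.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(codebook) -> int:
    """Smallest pairwise Hamming distance in the codebook."""
    return min(hamming(a, b) for a, b in combinations(codebook, 2))


def decode(word: str, codebook) -> str:
    """Nearest-codeword decoding; corrects up to (d_min - 1) // 2 substitutions."""
    return min(codebook, key=lambda c: hamming(word, c))


# Hypothetical toy codebook with minimum distance 3 (corrects one substitution).
TOY_CODEBOOK = ["AAAAA", "CCCAA", "AACCC", "GGGGG"]
```

    With this toy codebook, the corrupted word "CACAA" is one substitution away from "CCCAA" and at least two away from every other word, so decode returns "CCCAA".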

  3. HyDEn: A Hybrid Steganocryptographic Approach for Data Encryption Using Randomized Error-Correcting DNA Codes

    PubMed Central

    Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge

    2013-01-01

    This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach. PMID:23984392

  4. A molecular code dictates sequence-specific DNA recognition by homeodomains.

    PubMed Central

    Damante, G; Pellizzari, L; Esposito, G; Fogolari, F; Viglino, P; Fabbro, D; Tell, G; Formisano, S; Di Lauro, R

    1996-01-01

    Most homeodomains bind to DNA sequences containing the motif 5'-TAAT-3'. The homeodomain of thyroid transcription factor 1 (TTF-1HD) binds to sequences containing a 5'-CAAG-3' core motif, delineating a new mechanism for differential DNA recognition by homeodomains. We investigated the molecular basis of the DNA binding specificity of TTF-1HD by both structural and functional approaches. As already suggested by the three-dimensional structure of TTF-1HD, the DNA binding specificities of the TTF-1, Antennapedia and Engrailed homeodomains, either wild-type or mutants, indicated that the amino acid residue in position 54 is involved in the recognition of the nucleotide at the 3' end of the core motif 5'-NAAN-3'. The nucleotide at the 5' position of this core sequence is recognized by the amino acids located in position 6, 7 and 8 of the TTF-1 and Antennapedia homeodomains. These data, together with previous suggestions on the role of amino acids in position 50, indicate that the DNA binding specificity of homeodomains can be determined by a combinatorial molecular code. We also show that some specific combinations of the key amino acid residues involved in DNA recognition do not follow a simple, additive rule. Images PMID:8890172

  5. A novel DNA sequence similarity calculation based on simplified pulse-coupled neural network and Huffman coding

    NASA Astrophysics Data System (ADS)

    Jin, Xin; Nie, Rencan; Zhou, Dongming; Yao, Shaowen; Chen, Yanyan; Yu, Jiefu; Wang, Quan

    2016-11-01

    A novel method for the calculation of DNA sequence similarity is proposed based on simplified pulse-coupled neural network (S-PCNN) and Huffman coding. In this study, we propose a coding method based on Huffman coding, where the triplet code was used as a code bit to transform DNA sequence into numerical sequence. The proposed method uses the firing characters of S-PCNN neurons in DNA sequence to extract features. Besides, the proposed method can deal with different lengths of DNA sequences. First, according to the characteristics of S-PCNN and the DNA primary sequence, the latter is encoded using Huffman coding method, and then using the former, the oscillation time sequence (OTS) of the encoded DNA sequence is extracted. Simultaneously, relevant features are obtained, and finally the similarities or dissimilarities of the DNA sequences are determined by Euclidean distance. In order to verify the accuracy of this method, different data sets were used for testing. The experimental results show that the proposed method is effective.
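
    Illustrative sketch (not taken from the paper): the first step described above, encoding a DNA sequence by Huffman coding over nucleotide triplets, can be outlined with a generic Huffman construction. This is the textbook algorithm, not necessarily the authors' variant; the function names are hypothetical, and the S-PCNN feature extraction is not covered.

```python
import heapq
from collections import Counter
from itertools import count


def triplets(seq: str) -> list:
    """Split seq into non-overlapping triplets, dropping a trailing partial one."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def huffman_code(symbols) -> dict:
    """Map each distinct symbol to a prefix-free bit string (frequent = short)."""
    freq = Counter(symbols)
    if not freq:
        raise ValueError("no symbols to encode")
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never compares the dicts
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in left.items()}
        merged.update({s: "1" + bits for s, bits in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def encode(seq: str) -> str:
    """Concatenate the Huffman bit strings of the triplets of seq."""
    parts = triplets(seq)
    code = huffman_code(parts)
    return "".join(code[t] for t in parts)
```

    Because the code depends on the triplet frequencies of each sequence, sequences of different lengths can still be mapped to comparable numerical strings, which is the property the abstract emphasizes.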

  6. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^(-10). We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
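
    Illustrative sketch (not taken from the paper): detrended fluctuation analysis can be outlined in pure Python as below. The purine/pyrimidine step rule, the linear (first-order) detrending, the box sizes and all function names are assumptions made for illustration. An exponent near 0.5 indicates no long-range correlation, and larger values indicate persistent correlations.

```python
from math import log, sqrt


def dna_walk(seq: str) -> list:
    """Cumulative walk: +1 for a pyrimidine (C/T), -1 for a purine (A/G)."""
    walk, position = [], 0
    for base in seq.upper():
        if base in "CT":
            position += 1
        elif base in "AG":
            position -= 1
        else:
            continue  # skip ambiguous symbols such as N
        walk.append(position)
    return walk


def _residual_ss(ys) -> float:
    """Sum of squared residuals after a least-squares line fit against 0..n-1."""
    n = len(ys)
    mx, my = (n - 1) / 2, sum(ys) / n
    sxx = sum((i - mx) ** 2 for i in range(n))
    sxy = sum((i - mx) * (v - my) for i, v in enumerate(ys))
    slope = sxy / sxx
    return sum((v - my - slope * (i - mx)) ** 2 for i, v in enumerate(ys))


def dfa_fluctuation(walk, box: int) -> float:
    """Root-mean-square deviation from the local linear trend in boxes of size box."""
    n_boxes = len(walk) // box
    total = sum(_residual_ss(walk[k * box:(k + 1) * box]) for k in range(n_boxes))
    return sqrt(total / (n_boxes * box))


def dfa_exponent(walk, boxes) -> float:
    """Slope of log F(box) against log(box), fitted by least squares."""
    xs = [log(b) for b in boxes]
    ys = [log(dfa_fluctuation(walk, b)) for b in boxes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)
```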

  7. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^(-10). We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.

  8. [Molecular evolution of MHC DQA genes. I. The maintenance of interallelic divergence and the influence of GC content on gene structure].

    PubMed

    Pan, X; Fu, J

    1997-06-01

    Analyses of the proportions of synonymous and missense nucleotide substitutions (PS and PN) in different exons, in antigen recognition sites (ARS) and in the non-ARS part of EN2 (NAEN2) of 23 alleles at MHC DQA loci in 7 mammal species gave the following findings. (1) PN was about twice PS in ARS among the alleles at DQA1 of any given species (i.e., 7 alleles at HLA-DQA1 or 8 alleles at IaAa), which accords with overdominant selection. (2) PS was more or less the same as PN in ARS among different loci (DQA1 or DQA2 in different species, or DQA1 and DQA2 in one species) or in NAEN2 of all compared pairs, which conforms to the expectation under neutral evolution. (3) In exon 4 and exon 3, not only was the substitution proportion extremely low, but PS was also much higher than PN (the PS/PN ratio was 19.5 among alleles at IaAa of mouse and 4 among alleles at different loci), which is clearly consistent with purifying selection. The analysis of GC content of MHC DQA showed that its peaks were in the regions corresponding to the middle parts of some domains, that the highest and most constant level was in exon 4, and that GC content in the third codon position (GC III content) is inversely associated with PS. These results indicate that specific mechanisms maintaining interallelic diversity, relevant to function, exist in given exons corresponding to some domains of the same MHC DQA locus, and that GC III content is an important factor in keeping the structure and function of the gene under selective constraint. The method for estimating the nucleotide substitution proportion was modified.

  9. Bio-bar-code functionalized magnetic nanoparticle label for ultrasensitive flow injection chemiluminescence detection of DNA hybridization.

    PubMed

    Bi, Sai; Zhou, Hong; Zhang, Shusheng

    2009-10-07

    A signal amplification strategy based on bio-bar-code functionalized magnetic nanoparticles as labels holds promise to improve the sensitivity and detection limit of the detection of DNA hybridization and single-nucleotide polymorphisms by flow injection chemiluminescence assays.

  10. Characterization of the cDNA and gene coding for the biotin synthase of Arabidopsis thaliana.

    PubMed Central

    Weaver, L M; Yu, F; Wurtele, E S; Nikolau, B J

    1996-01-01

    Biotin, an essential cofactor, is synthesized de novo only by plants and some microbes. An Arabidopsis thaliana expressed sequence tag that shows sequence similarity to the carboxyl end of biotin synthase from Escherichia coli was used to isolate a near-full-length cDNA. This cDNA was shown to code for the Arabidopsis biotin synthase by its ability to complement a bioB mutant of E. coli. Site-specific mutagenesis indicates that residue threonine-173, which is highly conserved in biotin synthases, is important for catalytic competence of the enzyme. The primary sequence of the Arabidopsis biotin synthase is most similar to biotin synthases from E. coli, Serratia marcescens, and Saccharomyces cerevisiae (about 50% sequence identity) and more distantly related to the Bacillus sphaericus enzyme (33% sequence identity). The primary sequence of the amino terminus of the Arabidopsis biotin synthase may represent an organelle-targeting transit peptide. The single Arabidopsis gene coding for biotin synthase, BIO2, was isolated and sequenced. The biotin synthase coding sequence is interrupted by five introns. The gene sequence upstream of the translation start site has several unusual features, including imperfect palindromes and polypyrimidine sequences, which may function in the transcriptional regulation of the BIO2 gene. PMID:8819873

  11. Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors

    PubMed Central

    Liu, Jiajian; Stormo, Gary D.

    2008-01-01

    Motivation: Modeling and identifying the DNA-protein recognition code is one of the most challenging problems in computational biology. Several quantitative methods have been developed to model DNA-protein interactions with specific focus on the C2H2 zinc-finger proteins, the largest transcription factor family in eukaryotic genomes. In many cases, they performed well, but overall the predictive accuracy of these methods is still limited. One of the major reasons is that all these methods used weight matrix models to represent DNA-protein interactions, assuming all base-amino acid contacts contribute independently to the total free energy of binding. Results: We present a context-dependent model for DNA–zinc-finger protein interactions that allows us to identify inter-positional dependencies in the DNA recognition code for C2H2 zinc-finger proteins. The degree of non-independence was detected by comparing the linear perceptron model with the non-linear neural net (NN) model for their predictions of DNA–zinc-finger protein interactions. This dependency is supported by the complex base-amino acid contacts observed in DNA–zinc-finger interactions from structural analyses. Using extensive published qualitative and quantitative experimental data, we demonstrated that the context-dependent model developed in this study can significantly improve predictions of DNA binding profiles and free energies of binding for both individual zinc fingers and proteins with multiple zinc fingers when compared with previous position-independent models. This approach can be extended to other protein families with complex base-amino acid residue interactions, which would help to further understand transcriptional regulation in eukaryotic genomes. Availability: The software is implemented as C programs and is available by request. http://ural.wustl.edu/softwares.html Contact: stormo@ural.wustl.edu PMID:18586699
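
    Illustrative sketch (not taken from the paper): the independence assumption criticized above is what makes a weight matrix score a plain sum over positions. The sketch below builds a log-odds matrix from aligned sites and scores a candidate site additively; the pseudocount, the uniform background and the function names are assumptions, and the authors' context-dependent neural net model is not reproduced.

```python
from math import log2


def build_pwm(sites, pseudocount: float = 1.0, background: float = 0.25) -> list:
    """Per-position log-odds scores from equal-length aligned binding sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, site: str) -> float:
    """Additive score: each position contributes independently of its neighbors."""
    return sum(column[base] for column, base in zip(pwm, site))
```

    Under this model, changing one base changes the score by the same amount whatever the neighboring bases are, which is exactly the behavior a context-dependent model relaxes.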

  12. DNA methylation patterns of protein coding genes and long noncoding RNAs in female schizophrenic patients.

    PubMed

    Liao, Qi; Wang, Yunliang; Cheng, Jia; Dai, Dongjun; Zhou, Xingyu; Zhang, Yuzheng; Gao, Shugui; Duan, Shiwei

    2015-02-01

    Schizophrenia (SCZ) is a complex mental disorder to which both genetic and epigenetic factors contribute. Long noncoding RNAs (lncRNAs) were recently found to play an important regulatory role in mental disorders. However, little was known about the DNA methylation of lncRNAs, although numerous SCZ studies have been performed on genetic polymorphisms or epigenetic marks in protein coding genes. We present a comprehensive genome-wide DNA methylation study of both protein coding genes and lncRNAs in female patients with paranoid and undifferentiated SCZ. Using methyl-CpG binding domain (MBD) protein-enriched genome sequencing (MBD-seq), 8,163 and 764 peaks were identified in paranoid and undifferentiated SCZ, respectively (p < 1 × 10^-5). Gene ontology analysis showed that the hypermethylated regions were enriched in the genes related to neuron system and brain for both paranoid and undifferentiated SCZ (p < 0.05). Among these peaks, 121 peaks were located in gene promoter regions that might affect gene expression and influence the SCZ related pathways. Interestingly, DNA methylation of 136 and 23 known lncRNAs in the RefSeq database was identified in paranoid and undifferentiated SCZ, respectively. In addition, ∼20% of intergenic peaks annotated based on RefSeq genes overlapped with lncRNAs in the UCSC and GENCODE databases. To make the results accessible to most biological researchers, we created an online database to display and visualize the information on DNA methylation peaks in both types of SCZ (http://www.bioinfo.org/scz/scz.htm). Our results showed that the aberrant DNA methylation of lncRNAs might be another important epigenetic factor for SCZ.

  13. DNA strand breaks induced by electrons simulated with Nanodosimetry Monte Carlo Simulation Code: NASIC.

    PubMed

    Li, Junli; Li, Chunyan; Qiu, Rui; Yan, Congchong; Xie, Wenzhang; Wu, Zhen; Zeng, Zhi; Tung, Chuanjong

    2015-09-01

    The method of Monte Carlo simulation is a powerful tool to investigate the details of radiation biological damage at the molecular level. In this paper, a Monte Carlo code called NASIC (Nanodosimetry Monte Carlo Simulation Code) was developed. It includes a physical module, a pre-chemical module, a chemical module, a geometric module and a DNA damage module. The physical module can simulate physical tracks of low-energy electrons in liquid water event-by-event. More than one set of inelastic cross sections was calculated by applying the dielectric function method of Emfietzoglou's optical-data treatments, with different optical data sets and dispersion models. In the pre-chemical module, the ionised and excited water molecules undergo dissociation processes. In the chemical module, the produced radiolytic chemical species diffuse and react. In the geometric module, an atomic model of 46 chromatin fibres in a spherical nucleus of a human lymphocyte was established. In the DNA damage module, the direct damages induced by the energy depositions of the electrons and the indirect damages induced by the radiolytic chemical species were calculated. The parameters should be adjusted so that the simulation results agree with the experimental results. In this paper, the influence of the inelastic cross sections and the vibrational excitation reaction on the parameters and on the DNA strand break yields was studied. Further work on NASIC is underway. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  14. DANIO-CODE: Toward an Encyclopedia of DNA Elements in Zebrafish.

    PubMed

    Tan, Haihan; Onichtchouk, Daria; Winata, Cecilia

    2016-02-01

    The zebrafish has emerged as a model organism for genomics studies. The symposium "Toward an encyclopedia of DNA elements in zebrafish" held in London in December 2014, was coorganized by Ferenc Müller and Fiona Wardle. This meeting is a follow-up of a similar previous workshop held 2 years earlier and represents a push toward the formalization of a community effort to annotate functional elements in the zebrafish genome. The meeting brought together zebrafish researchers, bioinformaticians, as well as members of established consortia, to exchange scientific findings and experience, as well as to discuss the initial steps toward the formation of a DANIO-CODE consortium. In this study, we provide the latest updates on the current progress of the consortium's efforts, opening up a broad invitation to researchers to join in and contribute to DANIO-CODE.

  15. DANIO-CODE: Toward an Encyclopedia of DNA Elements in Zebrafish

    PubMed Central

    2016-01-01

    The zebrafish has emerged as a model organism for genomics studies. The symposium “Toward an encyclopedia of DNA elements in zebrafish” held in London in December 2014, was coorganized by Ferenc Müller and Fiona Wardle. This meeting is a follow-up of a similar previous workshop held 2 years earlier and represents a push toward the formalization of a community effort to annotate functional elements in the zebrafish genome. The meeting brought together zebrafish researchers, bioinformaticians, as well as members of established consortia, to exchange scientific findings and experience, as well as to discuss the initial steps toward the formation of a DANIO-CODE consortium. In this study, we provide the latest updates on the current progress of the consortium's efforts, opening up a broad invitation to researchers to join in and contribute to DANIO-CODE. PMID:26671609

  16. Quartz crystal microbalance detection of DNA single-base mutation based on monobase-coded cadmium tellurium nanoprobe.

    PubMed

    Zhang, Yuqin; Lin, Fanbo; Zhang, Youyu; Li, Haitao; Zeng, Yue; Tang, Hao; Yao, Shouzhuo

    2011-01-01

    A new method for the detection of point mutations in DNA, based on monobase-coded cadmium tellurium nanoprobes and the quartz crystal microbalance (QCM) technique, is reported. A DNA QCM sensor for point mutations (a single-base A, T, C or G mutation, i.e., adenine, thymine, cytosine or guanine, in the DNA strand) was fabricated by immobilizing magnetic beads modified with single-base-mutation DNA onto the electrode surface using an external magnetic field near the electrode. The DNA-modified magnetic beads were obtained from the biotin-avidin affinity reaction of biotinylated DNA and streptavidin-functionalized core/shell Fe3O4/Au magnetic nanoparticles, followed by a DNA hybridization reaction. Single-base-coded CdTe nanoprobes (A-CdTe, T-CdTe, C-CdTe and G-CdTe) were used as the detection probes. The mutation site in DNA was distinguished by detecting the decrease in the resonance frequency of the piezoelectric quartz crystal when the coded nanoprobe was added to the test system. The proposed detection strategy for point mutations in DNA proved to be sensitive, simple, repeatable and low-cost; consequently, it has great potential for single nucleotide polymorphism (SNP) detection.

  17. A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

    PubMed

    Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong

    2012-01-01

    Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.

  18. Isolation of cDNA clones coding for the beta subunit of human beta-hexosaminidase.

    PubMed Central

    O'Dowd, B F; Quan, F; Willard, H F; Lamhonwah, A M; Korneluk, R G; Lowden, J A; Gravel, R A; Mahuran, D J

    1985-01-01

    The major forms of beta-hexosaminidase (2-acetamido-2-deoxy-beta-D-glucoside acetamidodeoxyglucohydrolase, EC 3.2.1.30) occur as multimers of alpha and beta chains--hexosaminidase A (alpha beta a beta b) and hexosaminidase B 2(beta a beta b). To facilitate the investigation of beta-chain biosynthesis and the nature of mutation in Sandhoff disease, a human hexosaminidase beta-chain cDNA clone was isolated. Hexosaminidase B (10 mg) was treated with CNBr, five peptide fragments were isolated by reverse-phase HPLC, and their amino acid sequences were determined. One of these contained a string of six amino acids from which an oligonucleotide probe was defined. The simian virus 40-transformed human fibroblast cDNA library of Okayama and Berg was screened by colony hybridization with the radiolabeled probe. Thirteen probe-binding clones were selected out of 50,000 clones screened. Four of these designated pHex were shown to be identical at their 3' ends by restriction enzyme mapping, differing only in their 5' extensions (1.4-1.7 kilobases). The nucleotide sequence of a 174-base-pair segment contained the deduced amino acid sequence of two of the five CNBr peptides, indicating that the pHex clones encode the beta subunit of hexosaminidase. In addition, pHex cDNA was found homologous to multiple bands in digests of genomic human DNA totaling 43 kilobases (kb), all of which were mapped to chromosome 5 in somatic cell hybrids, as expected of the HEXB gene. The pHex cDNA also hybridized to a 2.2-kilobase RNA that apparently codes for the pre-beta-polypeptide of hexosaminidase. This RNA species was absent in the fibroblasts of one of three patients with Sandhoff disease examined. We anticipate that these clones will be of value to diagnosis and carrier detection of Sandhoff disease in affected families. PMID:2579389

  19. Coding region SNP analysis to enhance dog mtDNA discrimination power in forensic casework.

    PubMed

    Verscheure, Sophie; Backeljau, Thierry; Desmyter, Stijn

    2015-01-01

    The high population frequencies of three control region haplotypes contribute to the low discrimination power of the dog mtDNA control region. They also diminish the evidential power of a match with one of these haplotypes in forensic casework. A mitochondrial genome study of 214 Belgian dogs suggested 26 polymorphic coding region sites that successfully resolved dogs with the three most frequent control region haplotypes. In this study, three SNP assays were developed to determine the identity of the 26 informative sites. The control region of 132 newly sampled dogs was sequenced and added to the study of 214 dogs. The assays were applied to 58 dogs of the haplotypes of interest, which confirmed their suitability for enhancing dog mtDNA discrimination power. In the Belgian population study of 346 dogs, the set of 26 sites divided the dogs into 25 clusters of mtGenome sequences with substantially lower population frequency estimates than their control region sequences. In case of a match with one of the three control region haplotypes, using these three SNP assays in conjunction with control region sequencing would augment the exclusion probability of dog mtDNA analysis from 92.9% to 97.0%. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  20. Functional Intersection of ATM and DNA-Dependent Protein Kinase Catalytic Subunit in Coding End Joining during V(D)J Recombination

    PubMed Central

    Lee, Baeck-Seung; Gapud, Eric J.; Zhang, Shichuan; Dorsett, Yair; Bredemeyer, Andrea; George, Rosmy; Callen, Elsa; Daniel, Jeremy A.; Osipovich, Oleg; Oltz, Eugene M.; Bassing, Craig H.; Nussenzweig, Andre; Lees-Miller, Susan; Hammel, Michal; Chen, Benjamin P. C.

    2013-01-01

    V(D)J recombination is initiated by the RAG endonuclease, which introduces DNA double-strand breaks (DSBs) at the border between two recombining gene segments, generating two hairpin-sealed coding ends and two blunt signal ends. ATM and DNA-dependent protein kinase catalytic subunit (DNA-PKcs) are serine-threonine kinases that orchestrate the cellular responses to DNA DSBs. During V(D)J recombination, ATM and DNA-PKcs have unique functions in the repair of coding DNA ends. ATM deficiency leads to instability of postcleavage complexes and the loss of coding ends from these complexes. DNA-PKcs deficiency leads to a nearly complete block in coding join formation, as DNA-PKcs is required to activate Artemis, the endonuclease that opens hairpin-sealed coding ends. In contrast to loss of DNA-PKcs protein, here we show that inhibition of DNA-PKcs kinase activity has no effect on coding join formation when ATM is present and its kinase activity is intact. The ability of ATM to compensate for DNA-PKcs kinase activity depends on the integrity of three threonines in DNA-PKcs that are phosphorylation targets of ATM, suggesting that ATM can modulate DNA-PKcs activity through direct phosphorylation of DNA-PKcs. Mutation of these threonine residues to alanine (DNA-PKcs3A) renders DNA-PKcs dependent on its intrinsic kinase activity during coding end joining, at a step downstream of opening hairpin-sealed coding ends. Thus, DNA-PKcs has critical functions in coding end joining beyond promoting Artemis endonuclease activity, and these functions can be regulated redundantly by the kinase activity of either ATM or DNA-PKcs. PMID:23836881

  1. Toward a code for the interactions of zinc fingers with DNA: selection of randomized fingers displayed on phage.

    PubMed Central

    Choo, Y; Klug, A

    1994-01-01

    We have used two selection techniques to study sequence-specific DNA recognition by the zinc finger, a small, modular DNA-binding minidomain. We have chosen zinc fingers because they bind as independent modules and so can be linked together in a peptide designed to bind a predetermined DNA site. In this paper, we describe how a library of zinc fingers displayed on the surface of bacteriophage enables selection of fingers capable of binding to given DNA triplets. The amino acid sequences of selected fingers which bind the same triplet are compared to examine how sequence-specific DNA recognition occurs. Our results can be rationalized in terms of coded interactions between zinc fingers and DNA, involving base contacts from a few alpha-helical positions. In the paper following this one, we describe a complementary technique which confirms the identity of amino acids capable of DNA sequence discrimination from these positions. PMID:7972027

  2. Low mitochondrial DNA variation among American alligators and a novel non-coding region in crocodilians.

    PubMed

    Glenn, Travis C; Staton, Joseph L; Vu, Alex T; Davis, Lisa M; Bremer, Jaime R Alvarado; Rhodes, Walter E; Brisbin, I Lehr; Sawyer, Roger H

    2002-12-15

    We analyzed 1317-1823 base pairs (bp) of mitochondrial DNA sequence beginning in the 5' end of cytochrome b (cyt b) and ending in the central domain of the control region for 25 American alligators (Alligator mississippiensis) and compared these to a homologous sequence from a Chinese alligator (A. sinensis). Both species share a non-coding spacer between cyt b and tRNA(Thr). Chinese alligator cyt b differs from that of the American alligator by 17.5% at the nucleotide level and 13.8% for inferred amino acids, which is consistent with their presumed ancient divergence. Only two cyt b haplotypes were detected among the 25 American alligators (693-1199 bp surveyed), with one haplotype shared among 24 individuals. One alligator from Mississippi differed from all other alligators by a single silent substitution. The control region contained only slightly more variation among the 25 American alligators, with two variable positions (624 bp surveyed), yielding three haplotypes containing 22, two, and one individuals, respectively. Previous genetic studies examining allozymes and the proportion of variable microsatellite DNA loci also found low levels of genetic diversity in American alligators. However, in contrast with allozymes, microsatellites, and morphology, the mtDNA data show no evidence of differentiation among populations from the extremes of the species range. These results suggest that American alligators underwent a severe population bottleneck in the late Pleistocene, resulting in nearly homogeneous mtDNA among all American alligators today. Copyright 2002 Wiley-Liss, Inc.
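
    Illustrative note: the haplotype counts quoted above (24 and 1 for cyt b; 22, 2 and 1 for the control region) can be turned into a single diversity figure. The sketch below uses Nei's unbiased haplotype diversity, h = n/(n-1) * (1 - sum of squared haplotype frequencies). This statistic is chosen here for illustration only; the abstract does not say that the authors reported it, and the function name is hypothetical.

```python
def haplotype_diversity(counts):
    """Nei's unbiased haplotype diversity from a list of haplotype counts.

    Assumption: `counts` holds the number of individuals carrying each
    distinct haplotype; at least two individuals are required.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


# Counts taken from the abstract (25 American alligators).
cyt_b = haplotype_diversity([24, 1])
control_region = haplotype_diversity([22, 2, 1])
print(f"cyt b h = {cyt_b:.3f}, control region h = {control_region:.3f}")
```

    Both values are far below 1 (the value approached when every individual carries a different haplotype), which is one way to express the low variation the abstract describes.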

  3. A molecular genetic analysis of Eragrostis tef (Zucc.) Trotter: non-coding regions of chloroplast DNA, 18S rDNA and the transcription factor VP1.

    PubMed

    Espelund, M; Bekele, E; Holst-Jensen, A; Jakobsen, K S; Nordal, I

    2000-01-01

    The non-coding chloroplast DNA sequences of the trnL (UAA) intron and the trnL-trnF (GAA) intergenic spacer (IGS), the coding sequences of nuclear 18S rDNA, and the transcription factor Vp1 of the cereal tef (Eragrostis tef (Zucc.) Trotter) were studied. No intraspecific variation was found among the 6 studied tef varieties. However, the study showed that Eragrostis tef has a number of unique traits compared to other grasses. Phylogenetic analysis of the chloroplast DNA gave three grass clades, joining Eragrostis with sorghum and maize in one. In the analysis of the 18S rDNA sequences, the three grass species were joined in a monophyletic trichotomy in the cladogram, in which maize is the most divergent, rice the least and tef intermediate. The Vp1 is highly conserved. The Vp1 phylogeny showed that the tef Vp1 sequence is the most divergent Vp1 sequence reported from a grass to date.

  4. DENV gene of bacteriophage T4 codes for both pyrimidine dimer-DNA glycosylase and apyrimidinic endonuclease activities

    SciTech Connect

    McMillan, S.; Edenberg, H.J.; Radany, E.H.; Friedberg, R.C.; Friedberg, E.C.

    1981-10-01

    Recent studies have shown that purified preparations of phage T4 UV DNA-incising activity (T4 UV endonuclease or endonuclease V of phage T4) contain a pyrimidine dimer-DNA glycosylase activity that catalyzes hydrolysis of the 5' glycosyl bond of dimerized pyrimidines in UV-irradiated DNA. Such enzyme preparations have also been shown to catalyze the hydrolysis of phosphodiester bonds in UV-irradiated DNA at a neutral pH, presumably reflecting the action of an apurinic/apyrimidinic endonuclease at the apyrimidinic sites created by the pyrimidine dimer-DNA glycosylase. In this study we found that preparations of T4 UV DNA-incising activity contained apurinic/apyrimidinic endonuclease activity that nicked depurinated form I simian virus 40 DNA. Apurinic/apyrimidinic endonuclease activity was also found in extracts of Escherichia coli infected with T4 denV+ phage. Extracts of cells infected with T4 denV mutants contained significantly lower levels of apurinic/apyrimidinic endonuclease activity; these levels were no greater than the levels present in extracts of uninfected cells. Furthermore, the addition of DNA containing UV-irradiated DNA and T4 enzyme resulted in competition for pyrimidine dimer-DNA glycosylase activity against the UV-irradiated DNA. On the basis of these results, we concluded that apurinic/apyrimidinic endonuclease activity is encoded by the denV gene of phage T4, the same gene that codes for pyrimidine dimer-DNA glycosylase activity.

  5. Comparison of Geant4-DNA simulation of S-values with other Monte Carlo codes

    NASA Astrophysics Data System (ADS)

    André, T.; Morini, F.; Karamitros, M.; Delorme, R.; Le Loirec, C.; Campos, L.; Champion, C.; Groetz, J.-E.; Fromm, M.; Bordage, M.-C.; Perrot, Y.; Barberet, Ph.; Bernal, M. A.; Brown, J. M. C.; Deleuze, M. S.; Francis, Z.; Ivanchenko, V.; Mascialino, B.; Zacharatou, C.; Bardiès, M.; Incerti, S.

    2014-01-01

    Monte Carlo simulations of S-values have been carried out with the Geant4-DNA extension of the Geant4 toolkit. The S-values have been simulated for monoenergetic electrons with energies ranging from 0.1 keV up to 20 keV, in liquid water spheres (for four radii, chosen between 10 nm and 1 μm), and for electrons emitted by five isotopes of iodine (131, 132, 133, 134 and 135), in liquid water spheres of varying radius (from 15 μm up to 250 μm). The results have been compared to those obtained from other Monte Carlo codes and from other published data. The Kolmogorov-Smirnov test confirmed the statistical compatibility of all simulation results.
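
    Illustrative note: the abstract does not describe how the Kolmogorov-Smirnov test was applied, so the following is only a minimal sketch of the two-sample KS statistic (the largest gap between two empirical cumulative distribution functions). The sample values are invented placeholders, not S-values from the paper, and the function name is hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    Both empirical CDFs are step functions that only change at observed
    values, so it is enough to evaluate them at every data point.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical outputs of two simulation codes for the same set-up.
code_one = [1.0, 2.0, 3.0, 4.0]
code_two = [1.5, 2.5, 3.5, 4.5]
print(ks_statistic(code_one, code_two))
```

    A small D indicates that the two sets of results are distributed similarly. Turning D into a p-value requires the KS distribution; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used for that step.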

  6. Humans and chimpanzees differ in their cellular response to DNA damage and non-coding sequence elements of DNA repair-associated genes.

    PubMed

    Weis, E; Galetzka, D; Herlyn, H; Schneider, E; Haaf, T

    2008-01-01

    Compared to humans, chimpanzees appear to be less susceptible to many types of cancer. Because DNA repair defects lead to accumulation of gene and chromosomal mutations, species differences in DNA repair are one plausible explanation. Here we analyzed the repair kinetics of human and chimpanzee cells after cisplatin treatment and irradiation. Dot blots for the quantification of single-stranded (ss) DNA repair intermediates revealed a biphasic response of human and chimpanzee lymphoblasts to cisplatin-induced damage. The early phase of DNA repair was identical in both species with a peak of ssDNA intermediates at 1 h after DNA damage induction. However, the late phase differed between species. Human cells showed a second peak of ssDNA intermediates at 6 h, chimpanzee cells at 5 h. One of four analyzed DNA repair-associated genes, UBE2A, was differentially expressed in human and chimpanzee cells at 5 h after cisplatin treatment. Immunofluorescent staining of gammaH2AX foci demonstrated equally high numbers of DNA strand breaks in human and chimpanzee cells at 30 min after irradiation and equally low numbers at 2 h. However, at 1 h chimpanzee cells had significantly fewer DNA breaks than human cells. Comparative sequence analyses of approximately 100 DNA repair-associated genes in human and chimpanzee revealed 13% and 32% genes, respectively, with evidence for an accelerated evolution in promoter regions and introns. This contrasts strikingly with the 3% of DNA repair-associated genes with positive selection in the coding sequence. Compared to the rhesus macaque as an outgroup, chimpanzees have a higher accelerated evolution in non-coding sequences than humans. The TRF1-interacting, ankyrin-related ADP-ribose polymerase (TNKS) gene showed an accelerated intraspecific evolution among humans. Our results are consistent with the view that chimpanzee cells repair different types of DNA damage faster than human cells, whereas the overall repair capacity is similar in

  7. cDNA sequence of human transforming gene hst and identification of the coding sequence required for transforming activity

    SciTech Connect

    Taira, M.; Yoshida, T.; Miyagawa, K.; Sakamoto, H.; Terada, M.; Sugimura, T.

    1987-05-01

    The hst gene was originally identified as a transforming gene in DNAs from human stomach cancers and from a noncancerous portion of stomach mucosa by DNA-mediated transfection assay using NIH3T3 cells. cDNA clones of hst were isolated from the cDNA library constructed from poly(A)+ RNA of a secondary transformant induced by the DNA from a stomach cancer. The sequence analysis of the hst cDNA revealed the presence of two open reading frames. When this cDNA was inserted into an expression vector containing the simian virus 40 promoter, it efficiently induced the transformation of NIH3T3 cells upon transfection. It was found that one of the reading frames, which coded for 206 amino acids, was responsible for the transforming activity.
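
    Illustrative note: identifying open reading frames in a cDNA, as mentioned above, can be sketched as a scan of the three forward reading frames for an ATG followed in-frame by a stop codon. This is a simplified, hypothetical example; the toy sequence is invented, it is not the hst cDNA, and the 1987 analysis was of course not performed with this code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, n_codons) for each ATG...stop ORF.

    Only the three forward frames are scanned. `start` is the index of
    the A in ATG, `end` is one past the stop codon, and `n_codons`
    counts the coding codons (the stop codon is excluded).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return orfs


# Invented toy sequence with two ORFs in different frames.
toy = "ATGAAATTTTAG" + "C" + "ATGCCCGGGTGA"
for start, end, n_codons in find_orfs(toy):
    print(start, end, n_codons, toy[start:end])
```

    An ORF coding for 206 amino acids, as in the abstract, would appear here with `n_codons == 206`.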

  8. Mechanism of ultraviolet-induced mutagenesis: the coding properties of ultraviolet-irradiated poly(dC) replicated by E. coli DNA polymerase I.

    PubMed Central

    Lecomte, P; Boiteux, S; Doubleday, O

    1981-01-01

    We have identified three lesions rather than cyclobutane dimers which alter the properties of UV-irradiated poly(dC) as a template for E. coli DNA polymerase I, and have characterised these lesions with respect to their coding properties, rates of formation and decay, and their sensitivity to uracil DNA glycosylase. Our results lead us to conclude that these lesions are (1) cytosine hydrates, which code for cytosine and to a lesser extent thymine, (2) uracil hydrates, which code for adenine and are not sensitive to uracil DNA glycosylase, and (3) uracils, which code for adenine and are removed by uracil DNA glycosylase. PMID:7024915

  9. Detection of coding microsatellite frameshift mutations in DNA mismatch repair-deficient mouse intestinal tumors.

    PubMed

    Woerner, Stefan M; Tosti, Elena; Yuan, Yan P; Kloor, Matthias; Bork, Peer; Edelmann, Winfried; Gebert, Johannes

    2015-11-01

    Different DNA mismatch repair (MMR)-deficient mouse strains have been developed as models for the inherited cancer-predisposing Lynch syndrome. It is completely unresolved whether coding mononucleotide repeat (cMNR) gene mutations in these mice can contribute to intestinal tumorigenesis and whether MMR-deficient mice are a suitable molecular model of human microsatellite instability (MSI)-associated intestinal tumorigenesis. A proof-of-principle study was performed to identify mouse cMNR-harboring genes affected by insertion/deletion mutations in MSI murine intestinal tumors. Bioinformatic algorithms were developed to establish a database of mouse cMNR-harboring genes. A panel of five mouse noncoding mononucleotide markers was used for MSI classification of intestinal matched normal/tumor tissues from MMR-deficient (Mlh1(-/-), Msh2(-/-), Msh2(LoxP/LoxP)) mice. cMNR frameshift mutations of candidate genes were determined by DNA fragment analysis. Murine MSI intestinal tumors but not normal tissues from MMR-deficient mice showed cMNR frameshift mutations in six candidate genes (Elavl3, Tmem107, Glis2, Sdccag1, Senp6, Rfc3). cMNRs of mouse Rfc3 and Elavl3 are conserved in type and length in their human orthologs that are known to be mutated in human MSI colorectal, endometrial and gastric cancer. We provide evidence for the utility of a mononucleotide marker panel for detection of MSI in murine tumors, the existence of cMNR instability in MSI murine tumors, the utility of mouse subspecies DNA for identification of polymorphic repeats, and repeat conservation among some orthologous human/mouse genes, two of them showing instability in human and mouse MSI intestinal tumors. MMR-deficient mice hence are a useful molecular model system for analyzing MSI intestinal carcinogenesis.
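
    Illustrative note: the abstract does not describe the bioinformatic algorithms used to build the cMNR database. As a rough sketch of the underlying idea, mononucleotide runs in a coding sequence can be located with a back-referencing regular expression. The minimum run length of 6 and the toy sequence are assumptions made for this example only, and the function name is hypothetical.

```python
import re


def find_mononucleotide_repeats(cds, min_length=6):
    """Return (start, base, length) for each single-base run in `cds`.

    Assumption: a run counts as a repeat when the same base occurs at
    least `min_length` times in a row; the threshold is arbitrary here.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (m.start(), m.group(1), m.end() - m.start())
        for m in pattern.finditer(cds.upper())
    ]


# Invented coding sequence containing an A8 and a T6 run.
toy_cds = "ATG" + "A" * 8 + "CCG" + "T" * 6 + "GGA"
print(find_mononucleotide_repeats(toy_cds))
```

    Inserting or deleting a number of bases that is not a multiple of three within such a run shifts the reading frame, which is why these repeats are informative in MMR-deficient tumors.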

  10. Joining mutants of RAG1 and RAG2 that demonstrate impaired interactions with the coding-end DNA.

    PubMed

    Nagawa, Fumikiyo; Hirose, Satoshi; Nishizumi, Hirofumi; Nishihara, Tadashi; Sakano, Hitoshi

    2004-09-10

    In V(D)J joining of antigen receptor genes, two recombination signal sequences (RSSs), 12- and 23-RSSs, form a complex with the protein products of recombination activating genes, RAG1 and RAG2. DNaseI footprinting demonstrates that the interaction of RAG proteins with substrate RSS DNA is not just limited to the signal region but involves the coding sequence as well. Joining mutants of RAG1 and RAG2 demonstrate impaired interactions with the coding region in both pre- and postcleavage type complexes. A possible role of this RAG coding region interaction is discussed in the context of V(D)J recombination.

  11. Analysis of phylogeny and codon usage bias and relationship of GC content, amino acid composition with expression of the structural nif genes.

    PubMed

    Mondal, Sunil Kanti; Kundu, Sudip; Das, Rabindranath; Roy, Sujit

    2016-08-01

    Bacteria and archaea have evolved the ability to fix atmospheric dinitrogen in the form of ammonia, catalyzed by the nitrogenase enzyme complex, which comprises three structural genes: nifK, nifD and nifH. The nifK and nifD genes encode the beta and alpha subunits, respectively, of component 1, while nifH encodes component 2 of nitrogenase. Phylogeny based on nifDHK has indicated that Cyanobacteria are closer to Proteobacteria alpha and gamma, but this is not supported by the tree based on 16S rRNA. The evolutionary ancestor for the different trees was also different. The GC1 and GC2% analysis showed more consistency than GC3%, which appeared to be low for Firmicutes, Cyanobacteria and Euryarchaeota while highest in Proteobacteria beta, and clearly showed the proportional effect on the codon usage, with a few exceptions. Few genes from Firmicutes, Euryarchaeota, Proteobacteria alpha and delta were found to be under mutational pressure. These nif genes with low and high GC3% from different classes of organisms showed similar expected number of codons. Distribution of the genes and codons, based on codon usage, demonstrated opposite patterns for different orientations of the mirror plane when compared with each other. Overall, our results provide a comprehensive analysis of the evolutionary relationship of the three structural nif genes, nifK, nifD and nifH, in the context of codon usage bias, GC content relationship and amino acid composition of the encoded proteins, and an exploration of a crucial statistical method for the analysis of positive data with non-constant variance to identify the shape factors of the codon adaptation index.
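
    Illustrative note: GC1, GC2 and GC3 denote the percentage of G or C bases at the first, second and third codon positions of a coding sequence. A minimal sketch of that calculation follows; the function name and the toy sequence are hypothetical, and trailing bases that do not form a complete codon are ignored by assumption.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3) in percent for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any incomplete final codon
    percentages = []
    for position in range(3):
        bases = cds[position:usable:3]
        gc_count = sum(base in "GC" for base in bases)
        percentages.append(100.0 * gc_count / len(bases) if bases else 0.0)
    return tuple(percentages)


# Invented four-codon sequence: ATG GCC AAG TAA
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCAAGTAA")
print(f"GC1={gc1:.1f}%  GC2={gc2:.1f}%  GC3={gc3:.1f}%")
```

    Because most third-position changes are synonymous, GC3 tends to vary more freely between lineages than GC1 and GC2, which is consistent with the lower consistency of GC3% noted in the abstract.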

  12. SV-Bay: structural variant detection in cancer genomes using a Bayesian approach with correction for GC-content and read mappability

    PubMed Central

    Iakovishina, Daria; Janoueix-Lerosey, Isabelle; Barillot, Emmanuel; Regnier, Mireille; Boeva, Valentina

    2016-01-01

    Motivation: Whole genome sequencing of paired-end reads can be applied to characterize the landscape of large somatic rearrangements of cancer genomes. Several methods for detecting structural variants with whole genome sequencing data have been developed. So far, none of these methods has combined information about abnormally mapped read pairs connecting rearranged regions and associated global copy number changes automatically inferred from the same sequencing data file. Our aim was to create a computational method that could use both types of information, i.e. normal and abnormal reads, and demonstrate that by doing so we can highly improve both sensitivity and specificity rates of structural variant prediction. Results: We developed a computational method, SV-Bay, to detect structural variants from whole genome sequencing mate-pair or paired-end data using a probabilistic Bayesian approach. This approach takes into account depth of coverage by normal reads and abnormalities in read pair mappings. To estimate the model likelihood, SV-Bay considers GC-content and read mappability of the genome, thus making important corrections to the expected read count. For the detection of somatic variants, SV-Bay makes use of a matched normal sample when it is available. We validated SV-Bay on simulated datasets and an experimental mate-pair dataset for the CLB-GA neuroblastoma cell line. The comparison of SV-Bay with several other methods for structural variant detection demonstrated that SV-Bay has better prediction accuracy both in terms of sensitivity and false-positive detection rate. Availability and implementation: https://github.com/InstitutCurie/SV-Bay Contact: valentina.boeva@inserm.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26740523

  13. The dnaN gene codes for the beta subunit of DNA polymerase III holoenzyme of Escherichia coli.

    PubMed

    Burgers, P M; Kornberg, A; Sakakibara, Y

    1981-09-01

    An Escherichia coli mutant, dnaN59, stops DNA synthesis promptly upon a shift to a high temperature; the wild-type dnaN gene carried in a transducing phage encodes a polypeptide of about 41,000 daltons [Sakakibara, Y. & Mizukami, T. (1980) Mol. Gen. Genet. 178, 541-553; Yuasa, S. & Sakakibara, Y. (1980) Mol. Gen. Genet. 180, 267-273]. We now find that the product of dnaN gene is the beta subunit of DNA polymerase III holoenzyme, the principal DNA synthetic multipolypeptide complex in E. coli. The conclusion is based on the following observations: (i) Extracts from dnaN59 cells were defective in phage phi X174 and G4 DNA synthesis after the mutant cells had been exposed to the increased temperature. (ii) The enzymatic defect was overcome by addition of purified beta subunit but not by other subunits of DNA polymerase III holoenzyme or by other replication proteins required for phi X174 DNA synthesis. (iii) Partially purified beta subunit from the dnaN mutant, unlike that from the wild type, was inactive in reconstituting the holoenzyme when mixed with the other purified subunits. (iv) Increased dosage of the dnaN gene provided by a plasmid carrying the gene raised cellular levels of the beta subunit 5- to 6-fold.

  14. The dnaN gene codes for the beta subunit of DNA polymerase III holoenzyme of Escherichia coli.

    PubMed Central

    Burgers, P M; Kornberg, A; Sakakibara, Y

    1981-01-01

    An Escherichia coli mutant, dnaN59, stops DNA synthesis promptly upon a shift to a high temperature; the wild-type dnaN gene carried in a transducing phage encodes a polypeptide of about 41,000 daltons [Sakakibara, Y. & Mizukami, T. (1980) Mol. Gen. Genet. 178, 541-553; Yuasa, S. & Sakakibara, Y. (1980) Mol. Gen. Genet. 180, 267-273]. We now find that the product of dnaN gene is the beta subunit of DNA polymerase III holoenzyme, the principal DNA synthetic multipolypeptide complex in E. coli. The conclusion is based on the following observations: (i) Extracts from dnaN59 cells were defective in phage phi X174 and G4 DNA synthesis after the mutant cells had been exposed to the increased temperature. (ii) The enzymatic defect was overcome by addition of purified beta subunit but not by other subunits of DNA polymerase III holoenzyme or by other replication proteins required for phi X174 DNA synthesis. (iii) Partially purified beta subunit from the dnaN mutant, unlike that from the wild type, was inactive in reconstituting the holoenzyme when mixed with the other purified subunits. (iv) Increased dosage of the dnaN gene provided by a plasmid carrying the gene raised cellular levels of the beta subunit 5- to 6-fold. PMID:6458041

  15. Widespread selection across coding and noncoding DNA in the pea aphid genome.

    PubMed

    Bickel, Ryan D; Dunham, Joseph P; Brisson, Jennifer A

    2013-06-21

    Genome-wide patterns of diversity and selection are critical measures for understanding how evolution has shaped the genome. Yet, these population genomic estimates are available for only a limited number of model organisms. Here we focus on the population genomics of the pea aphid (Acyrthosiphon pisum). The pea aphid is an emerging model system that exhibits a range of intriguing biological traits not present in classic model systems. We performed low-coverage genome resequencing of 21 clonal pea aphid lines collected from alfalfa host plants in North America to characterize genome-wide patterns of diversity and selection. We observed an excess of low-frequency polymorphisms throughout coding and noncoding DNA, which we suggest is the result of a founding event and subsequent population expansion in North America. Most gene regions showed lower levels of Tajima's D than synonymous sites, suggesting that the majority of the genome is not evolving neutrally but rather exhibits significant constraint. Furthermore, we used the pea aphid's unique manner of X-chromosome inheritance to assign genomic scaffolds to either autosomes or the X chromosome. Comparing autosomal vs. X-linked sequence variation, we discovered that autosomal genes show an excess of low frequency variants indicating that purifying selection acts more efficiently on the X chromosome. Overall, our results provide a critical first step in characterizing the genetic diversity and evolutionary pressures on an aphid genome.
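
    The comparison above relies on Tajima's D, which contrasts mean pairwise diversity with the number of segregating sites. The following is a minimal sketch of the standard statistic, assuming complete, aligned haplotype sequences; the function name `tajimas_d` and the toy inputs are illustrative and are not taken from the study, which worked from low-coverage resequencing data.

```python
from collections import Counter
from math import sqrt


def tajimas_d(sequences):
    """Return Tajima's D for aligned, equal-length sequences, or None if undefined."""
    n = len(sequences)
    if n < 4:
        return None  # the variance term is zero for fewer than four sequences
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    seg_sites = 0         # S: number of polymorphic columns
    pairwise_diffs = 0.0  # differences summed over all pairs of sequences
    for column in zip(*sequences):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            pairwise_diffs += (n * n - sum(c * c for c in counts.values())) / 2
    if seg_sites == 0:
        return None  # no polymorphism, D is undefined

    pi = pairwise_diffs / (n * (n - 1) / 2)  # mean pairwise differences

    # Constants from the standard definition of the statistic.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return None
    return (pi - seg_sites / a1) / sqrt(variance)
```

    An excess of low-frequency variants, as reported for the pea aphid, pushes D below zero; an excess of intermediate-frequency variants pushes it above zero.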

  16. Large-scale motif discovery using DNA Gray code and equiprobable oligomers

    PubMed Central

    Ichinose, Natsuhiro; Yada, Tetsushi; Gotoh, Osamu

    2012-01-01

    Motivation: How to find motifs from genome-scale functional sequences, such as all the promoters in a genome, is a challenging problem. Word-based methods count the occurrences of oligomers to detect excessively represented ones. This approach is known to be fast and accurate compared with other methods. However, two problems have hampered the application of such methods to large-scale data. One is the computational cost necessary for clustering similar oligomers, and the other is the bias in the frequency of fixed-length oligomers, which complicates the detection of significant words. Results: We introduce a method that uses a DNA Gray code and equiprobable oligomers, which solve the clustering problem and the oligomer bias, respectively. Our method can analyze 18 000 sequences of ~1 kbp long in 30 s. We also show that the accuracy of our method is superior to that of a leading method, especially for large-scale data and small fractions of motif-containing sequences. Availability: The online and stand-alone versions of the application, named Hegma, are available at our website: http://www.genome.ist.i.kyoto-u.ac.jp/~ichinose/hegma/ Contact: ichinose@i.kyoto-u.ac.jp; o.gotoh@i.kyoto-u.ac.jp PMID:22057160
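
    The abstract does not spell out how Hegma constructs its DNA Gray code. As a rough illustration of the general idea only, the sketch below builds a reflected 4-ary Gray code over the nucleotide alphabet, so that consecutive oligomers in the ordering differ at exactly one position. The function name `dna_gray_code` and the base order A, C, G, T are assumptions.

```python
def dna_gray_code(k, alphabet="ACGT"):
    """List all k-mers so that neighbouring entries differ at exactly one position."""
    if k == 0:
        return [""]
    shorter = dna_gray_code(k - 1, alphabet)
    words = []
    for i, base in enumerate(alphabet):
        # Reverse every second block so the suffix is unchanged across block borders.
        block = shorter if i % 2 == 0 else shorter[::-1]
        words.extend(base + suffix for suffix in block)
    return words


if __name__ == "__main__":
    print(dna_gray_code(2))
```

    Ordering oligomers this way places single-substitution neighbours next to each other, which is one plausible reason a Gray code helps with clustering similar words.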

  17. Large-scale motif discovery using DNA Gray code and equiprobable oligomers.

    PubMed

    Ichinose, Natsuhiro; Yada, Tetsushi; Gotoh, Osamu

    2012-01-01

    How to find motifs from genome-scale functional sequences, such as all the promoters in a genome, is a challenging problem. Word-based methods count the occurrences of oligomers to detect excessively represented ones. This approach is known to be fast and accurate compared with other methods. However, two problems have hampered the application of such methods to large-scale data. One is the computational cost necessary for clustering similar oligomers, and the other is the bias in the frequency of fixed-length oligomers, which complicates the detection of significant words. We introduce a method that uses a DNA Gray code and equiprobable oligomers, which solve the clustering problem and the oligomer bias, respectively. Our method can analyze 18 000 sequences of ~1 kbp long in 30 s. We also show that the accuracy of our method is superior to that of a leading method, especially for large-scale data and small fractions of motif-containing sequences. The online and stand-alone versions of the application, named Hegma, are available at our website: http://www.genome.ist.i.kyoto-u.ac.jp/~ichinose/hegma/ Contact: ichinose@i.kyoto-u.ac.jp; o.gotoh@i.kyoto-u.ac.jp

  18. DNA sequence-based "bar codes" for tracking the origins of expressed sequence tags from a maize cDNA library constructed using multiple mRNA sources.

    PubMed

    Qiu, Fang; Guo, Ling; Wen, Tsui-Jung; Liu, Feng; Ashlock, Daniel A; Schnable, Patrick S

    2003-10-01

    To enhance gene discovery, expressed sequence tag (EST) projects often make use of cDNA libraries produced using diverse mixtures of mRNAs. As such, expression data are lost because the origins of the resulting ESTs cannot be determined. Alternatively, multiple libraries can be prepared, each from a more restricted source of mRNAs. Although this approach allows the origins of ESTs to be determined, it requires the production of multiple libraries. A hybrid approach is reported here. A cDNA library was prepared using 21 different pools of maize (Zea mays) mRNAs. DNA sequence "bar codes" were added during first-strand cDNA synthesis to uniquely identify the mRNA source pool from which individual cDNAs were derived. Using a decoding algorithm that included error correction, it was possible to identify the source mRNA pool of more than 97% of the ESTs. The frequency at which a bar code is represented in an EST contig should be proportional to the abundance of the corresponding mRNA in the source pool. Consistent with this, all ESTs derived from several genes (zein and adh1) that are known to be exclusively expressed in kernels or preferentially expressed under anaerobic conditions, respectively, were exclusively tagged with bar codes associated with mRNA pools prepared from kernel and anaerobically treated seedlings, respectively. Hence, by allowing for the retention of expression data, the bar coding of cDNA libraries can enhance the value of EST projects.
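
    The decoding algorithm with error correction is not described in the abstract. One common way to get this behaviour is nearest-barcode assignment by Hamming distance, sketched below. The barcode sequences, pool names, and the `max_mismatches` threshold are hypothetical and are not the ones used for the maize library.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def decode_barcode(tag, barcodes, max_mismatches=1):
    """Return the pool whose barcode is uniquely closest to `tag`, else None.

    `barcodes` maps a pool name to its barcode sequence.
    """
    scored = sorted(
        (hamming(tag, barcode), pool)
        for pool, barcode in barcodes.items()
        if len(barcode) == len(tag)
    )
    if not scored or scored[0][0] > max_mismatches:
        return None  # too many errors to assign with confidence
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # tie between two pools, leave unassigned
    return scored[0][1]


# Hypothetical barcodes for three mRNA source pools.
POOLS = {"kernel": "ACGTAC", "seedling": "TGCATG", "root": "GGATCC"}
```

    With these toy barcodes, `decode_barcode("ACGTAA", POOLS)` still resolves to the kernel pool despite one sequencing error, while a tag that is far from every barcode is left unassigned.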

  19. URF6, Last Unidentified Reading Frame of Human mtDNA, Codes for an NADH Dehydrogenase Subunit

    NASA Astrophysics Data System (ADS)

    Chomyn, Anne; Cleeter, Michael W. J.; Ragan, C. Ian; Riley, Marcia; Doolittle, Russell F.; Attardi, Giuseppe

    1986-10-01

    The polypeptide encoded in URF6, the last unassigned reading frame of human mitochondrial DNA, has been identified with antibodies to peptides predicted from the DNA sequence. Antibodies prepared against highly purified respiratory chain NADH dehydrogenase from beef heart or against the cytoplasmically synthesized 49-kilodalton iron-sulfur subunit isolated from this enzyme complex, when added to a deoxycholate or a Triton X-100 mitochondrial lysate of HeLa cells, specifically precipitated the URF6 product together with the six other URF products previously identified as subunits of NADH dehydrogenase. These results strongly point to the URF6 product as being another subunit of this enzyme complex. Thus, almost 60% of the protein coding capacity of mammalian mitochondrial DNA is utilized for the assembly of the first enzyme complex of the respiratory chain. The absence of such information in yeast mitochondrial DNA dramatizes the variability in gene content of different mitochondrial genomes.

  20. Functional expression in primate cells of cloned DNA coding for the hemagglutinin surface glycoprotein of influenza virus.

    PubMed Central

    Sveda, M M; Lai, C J

    1981-01-01

    We have used simian virus 40 (SV40) DNA as a vector for expression of functional activity of a cloned influenza viral DNA segment in primate cells. Cloned full-length DNA sequences coding for the hemagglutinin of influenza A virus (Udorn/72/[H3N2]) were inserted into the late region of a viable deletion mutant of SV40, and the hybrid DNA was propagated in the presence of an early SV40 mutant (tsA28) helper. Infection of primate cells with the hybrid virus produced a polypeptide similar in molecular size to the hemagglutinin of influenza virus, as shown by immunoprecipitation and gel electrophoresis. The polypeptide was glycosylated, as shown by incorporation of radioactive sugars. The putative hemagglutinin exhibited functional activity, as shown by agglutination of erythrocytes. In addition, an indirect immunofluorescence assay showed that the hemagglutinin polypeptide of the hybrid virus could be detected on the surface of infected cells. PMID:6272305

  1. Phytoplasma plasmid DNA extraction.

    PubMed

    Andersen, Mark T; Liefting, Lia W

    2013-01-01

    Phytoplasma plasmids have generally been detected from DNA extracted from plants and insects using methods designed for the purification of total phytoplasma DNA. Methods include extraction from tissues that are high in phytoplasma titre, such as the phloem of plants, with the use of CsCl-bisbenzimide gradients that exploit the low G+C content of phytoplasma DNA. Many of the methods employed for phytoplasma purification have been described elsewhere in this book. Here we describe in detail two methods that are specifically aimed at isolating plasmid DNA.

  2. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    PubMed

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

  3. Natural selection on coding and noncoding DNA sequences is associated with virulence genes in a plant pathogenic fungus.

    PubMed

    Rech, Gabriel E; Sanz-Martín, José M; Anisimova, Maria; Sukno, Serenella A; Thon, Michael R

    2014-09-04

    Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5' untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen.

  4. Natural Selection on Coding and Noncoding DNA Sequences Is Associated with Virulence Genes in a Plant Pathogenic Fungus

    PubMed Central

    Rech, Gabriel E.; Sanz-Martín, José M.; Anisimova, Maria; Sukno, Serenella A.; Thon, Michael R.

    2014-01-01

    Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5′ untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen. PMID:25193312

  5. Signalign: An Ontology of DNA as Signal for Comparative Gene Structure Prediction Using Information-Coding-and-Processing Techniques.

    PubMed

    Yu, Ning; Guo, Xuan; Gu, Feng; Pan, Yi

    2016-03-01

    Conventional character-analysis-based techniques in genome analysis manifest three main shortcomings: inefficiency, inflexibility, and incompatibility. In our previous research, a general framework, called DNA As X, was proposed for character-analysis-free techniques to overcome these shortcomings, where X is an intermediate such as digit, code, signal, vector, tree, graph network, and so on. In this paper, we further implement an ontology of DNA As Signal by designing a tool named Signalign for comparative gene structure analysis, in which DNA sequences are converted into signal series, processed by a modified method of dynamic time warping, and measured by signal-to-noise ratio (SNR). The ontology of DNA As Signal integrates the principles and concepts of other disciplines, including information coding theory and signal processing, into sequence analysis and processing. Compared with conventional character-analysis-based methods, Signalign not only achieves equivalent or superior performance but also enriches the tools and the knowledge library of computational biology by extending the domain from character/string to diverse areas. The evaluation results validate the success of the character-analysis-free technique for improved performance in comparative gene structure prediction.

  6. The molecular cloning and characterisation of cDNA coding for the alpha subunit of the acetylcholine receptor.

    PubMed Central

    Sumikawa, K; Houghton, M; Smith, J C; Bell, L; Richards, B M; Barnard, E A

    1982-01-01

    A rare cDNA coding for most of the alpha subunit of the Torpedo nicotinic acetylcholine receptor has been cloned into bacteria. The use of a mismatched oligonucleotide primer of reverse transcriptase facilitated the design of an efficient, specific probe for recombinant bacteria. DNA sequence analysis has enabled the elucidation of a large part of the polypeptide primary sequence which is discussed in relation to its acetylcholine binding activity and the location of receptor within the plasma membrane. When used as a radioactive probe, the cloned cDNA binds specifically to a single Torpedo mRNA species of about 2350 nucleotides in length but fails to show significant cross-hybridisation with alpha subunit mRNA extracted from cat muscle. PMID:6183641

  7. Improved PCR Amplification of Broad Spectrum GC DNA Templates.

    PubMed

    Guido, Nicholas; Starostina, Elena; Leake, Devin; Saaem, Ishtiaq

    2016-01-01

    Many applications in molecular biology can benefit from improved PCR amplification of DNA segments containing a wide range of GC content. Conventional PCR amplification of DNA sequences with regions of GC less than 30%, or higher than 70%, is complex due to secondary structures that block the DNA polymerase as well as mispriming and mis-annealing of the DNA. This complexity will often generate incomplete or nonspecific products that hamper downstream applications. In this study, we address multiplexed PCR amplification of DNA segments containing a wide range of GC content. In order to mitigate amplification complications due to high or low GC regions, we tested a combination of different PCR cycling conditions and chemical additives. To assess the fate of specific oligonucleotide (oligo) species with varying GC content in a multiplexed PCR, we developed a novel method of sequence analysis. Here we show that subcycling during the amplification process significantly improved amplification of short template pools (~200 bp), particularly when the template contained a low percent of GC. Furthermore, the combination of subcycling and 7-deaza-dGTP achieved efficient amplification of short templates ranging from 10-90% GC composition. Moreover, we found that 7-deaza-dGTP improved the amplification of longer products (~1000 bp). These methods provide an updated approach for PCR amplification of DNA segments containing a broad range of GC content.
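
    The 30% and 70% GC thresholds mentioned above can be checked for a given template with a few lines of code. This is a minimal sketch; the function names and the class labels are illustrative, and ambiguous bases such as N are simply ignored.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases of `seq`."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def gc_class(seq, low=30.0, high=70.0):
    """Flag templates outside the range where conventional PCR is usually easy."""
    gc = gc_content(seq)
    if gc < low:
        return "low-GC"
    if gc > high:
        return "high-GC"
    return "moderate-GC"


if __name__ == "__main__":
    for template in ("AAAATTTTAG", "ATGCATGCAT", "GGGCCCGCGA"):
        print(template, round(gc_content(template), 1), gc_class(template))
```

    Templates flagged as low-GC or high-GC are the ones for which the abstract reports that subcycling, 7-deaza-dGTP, or both improved amplification.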

  8. Improved PCR Amplification of Broad Spectrum GC DNA Templates

    PubMed Central

    Guido, Nicholas; Starostina, Elena; Leake, Devin; Saaem, Ishtiaq

    2016-01-01

    Many applications in molecular biology can benefit from improved PCR amplification of DNA segments containing a wide range of GC content. Conventional PCR amplification of DNA sequences with regions of GC less than 30%, or higher than 70%, is complex due to secondary structures that block the DNA polymerase as well as mispriming and mis-annealing of the DNA. This complexity will often generate incomplete or nonspecific products that hamper downstream applications. In this study, we address multiplexed PCR amplification of DNA segments containing a wide range of GC content. In order to mitigate amplification complications due to high or low GC regions, we tested a combination of different PCR cycling conditions and chemical additives. To assess the fate of specific oligonucleotide (oligo) species with varying GC content in a multiplexed PCR, we developed a novel method of sequence analysis. Here we show that subcycling during the amplification process significantly improved amplification of short template pools (~200 bp), particularly when the template contained a low percent of GC. Furthermore, the combination of subcycling and 7-deaza-dGTP achieved efficient amplification of short templates ranging from 10–90% GC composition. Moreover, we found that 7-deaza-dGTP improved the amplification of longer products (~1000 bp). These methods provide an updated approach for PCR amplification of DNA segments containing a broad range of GC content. PMID:27271574

  9. Run-length encoding graphic rules, biochemically editable designs and steganographical numeric data embedment for DNA-based cryptographical coding system.

    PubMed

    Kawano, Tomonori

    2013-03-01

    There has been a wide variety of approaches for handling pieces of DNA as "unplugged" tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, watermarks and cryptography. In the present article, novel designs of artificial genes as the media for storing digitally compressed image data are proposed for bio-computing purposes, whereas natural genes principally encode proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given "passwords" and/or secret numbers using DNA sequences. The "passwords" of interest were decomposed into single letters and translated into font images coded on separate DNA chains, with both the coding regions, in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions, designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original "passwords." The latter processes require molecular biological tools for digestion and ligation of the fragmented DNA molecules, targeting the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interest over the image-coding DNA are also discussed.
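
    The specific run-length encoding rule and its mapping onto nucleotides are not given in the abstract. The sketch below shows only plain run-length encoding of one row of a black-and-white font image, to illustrate the compression step; the function names and the "0"/"1" pixel representation are assumptions.

```python
from itertools import groupby


def rle_encode(pixels):
    """Collapse a pixel string into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the pixel string."""
    return "".join(value * length for value, length in runs)


if __name__ == "__main__":
    row = "0001111011"
    runs = rle_encode(row)
    print(runs)
    print(rle_decode(runs) == row)
```

    Any scheme that writes such runs into DNA would additionally need a rule for turning run values and lengths into bases, which this sketch deliberately leaves out.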

  10. Long non-coding RNAs as novel expression signatures modulate DNA damage and repair in cadmium toxicology

    PubMed Central

    Zhou, Zhiheng; Liu, Haibai; Wang, Caixia; Lu, Qian; Huang, Qinhai; Zheng, Chanjiao; Lei, Yixiong

    2015-01-01

    Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in a variety of physiological and pathophysiological processes. Our study investigated whether lncRNAs as novel expression signatures are able to modulate DNA damage and repair in cadmium (Cd) toxicity. There were aberrant expression profiles of lncRNAs in 35th Cd-induced cells as compared to untreated 16HBE cells. siRNA-mediated knockdown of ENST00000414355 inhibited the growth of DNA-damaged cells and decreased the expression of DNA-damage related genes (ATM, ATR and ATRIP), while increasing the expression of DNA-repair related genes (DDB1, DDB2, OGG1, ERCC1, MSH2, RAD50, XRCC1 and BARD1). Cadmium increased ENST00000414355 expression in the lung of Cd-exposed rats in a dose-dependent manner. A significant positive correlation was observed between blood ENST00000414355 expression and urinary/blood Cd concentrations, and there were significant correlations of lncRNA-ENST00000414355 expression with the expression of target genes in the lung of Cd-exposed rats and the blood of Cd-exposed workers. These results indicate that some lncRNAs are aberrantly expressed in Cd-treated 16HBE cells. lncRNA-ENST00000414355 may serve as a signature for DNA damage and repair related to the epigenetic mechanisms underlying cadmium toxicity and become a novel biomarker of cadmium toxicity. PMID:26472689

  11. Long non-coding RNAs as novel expression signatures modulate DNA damage and repair in cadmium toxicology

    NASA Astrophysics Data System (ADS)

    Zhou, Zhiheng; Liu, Haibai; Wang, Caixia; Lu, Qian; Huang, Qinhai; Zheng, Chanjiao; Lei, Yixiong

    2015-10-01

    Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in a variety of physiological and pathophysiological processes. Our study investigated whether lncRNAs as novel expression signatures are able to modulate DNA damage and repair in cadmium (Cd) toxicity. There were aberrant expression profiles of lncRNAs in 35th Cd-induced cells as compared to untreated 16HBE cells. siRNA-mediated knockdown of ENST00000414355 inhibited the growth of DNA-damaged cells and decreased the expression of DNA-damage related genes (ATM, ATR and ATRIP), while increasing the expression of DNA-repair related genes (DDB1, DDB2, OGG1, ERCC1, MSH2, RAD50, XRCC1 and BARD1). Cadmium increased ENST00000414355 expression in the lung of Cd-exposed rats in a dose-dependent manner. A significant positive correlation was observed between blood ENST00000414355 expression and urinary/blood Cd concentrations, and there were significant correlations of lncRNA-ENST00000414355 expression with the expression of target genes in the lung of Cd-exposed rats and the blood of Cd-exposed workers. These results indicate that some lncRNAs are aberrantly expressed in Cd-treated 16HBE cells. lncRNA-ENST00000414355 may serve as a signature for DNA damage and repair related to the epigenetic mechanisms underlying cadmium toxicity and become a novel biomarker of cadmium toxicity.

  12. Breaking the code of DNA binding specificity of TAL-type III effectors.

    PubMed

    Boch, Jens; Scholze, Heidi; Schornack, Sebastian; Landgraf, Angelika; Hahn, Simone; Kay, Sabine; Lahaye, Thomas; Nickstadt, Anja; Bonas, Ulla

    2009-12-11

    The pathogenicity of many bacteria depends on the injection of effector proteins via type III secretion into eukaryotic cells in order to manipulate cellular processes. TAL (transcription activator-like) effectors from plant pathogenic Xanthomonas are important virulence factors that act as transcriptional activators in the plant cell nucleus, where they directly bind to DNA via a central domain of tandem repeats. Here, we show how target DNA specificity of TAL effectors is encoded. Two hypervariable amino acid residues in each repeat recognize one base pair in the target DNA. Recognition sequences of TAL effectors were predicted and experimentally confirmed. The modular protein architecture enabled the construction of artificial effectors with new specificities. Our study describes the functionality of a distinct type of DNA binding domain and allows the design of DNA binding domains for biotechnology.
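
    The one-repeat-to-one-base-pair rule lends itself to a lookup table. The abstract does not list the residue pairs, so the four assignments below (NI to A, HD to C, NG to T, NN to G or A) are taken from general knowledge of the TAL effector code rather than from this record, and the function name is illustrative. Repeats with other residue pairs are reported as N.

```python
# Two hypervariable residues per repeat mapped to the preferred target base.
# "R" is the IUPAC code for a purine (G or A).
RVD_TO_BASE = {
    "NI": "A",
    "HD": "C",
    "NG": "T",
    "NN": "R",
}


def predict_target(rvds):
    """Predict the DNA target of a TAL effector from its ordered repeat residues."""
    return "".join(RVD_TO_BASE.get(rvd.upper(), "N") for rvd in rvds)


if __name__ == "__main__":
    print(predict_target(["NI", "HD", "NG", "NN", "HD"]))
```

    Reading the table in the opposite direction, choosing one repeat per desired base, is the design principle behind the artificial effectors mentioned in the abstract.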

  13. Muscle coding sequences and their regulation during myogenesis: cloning of muscle actin cDNA probes.

    PubMed

    Minty, A; Caravatti, M; Robert, B; Cohen, A; Daubas, P; Weydert, A; Gros, F; Buckingham, M

    1981-01-01

    For a number of years our group has been mainly interested in the regulation of muscle gene expression during myogenesis. Using primary cultures and cell lines we have tried to find out whether the coding sequences for muscle proteins are already present in an unexpressed form or if there is a transcriptional switch at the onset of differentiation. Metabolic studies on pulse-labelled RNA, together with translation and molecular hybridization experiments have given a certain number of indications. More recently the development of genetic engineering techniques has made it possible to answer these questions directly with probes which are complementary to specific muscle coding sequences. We have identified a plasmid which contains a coding sequence for muscle actin. Other recombinant plasmids are being characterized. Such plasmids, used as probes, will permit us to study the organization and expression of the genes coding for the contractile proteins in muscle cells.

  14. Genome of Staphylococcal Phage K: a New Lineage of Myoviridae Infecting Gram-Positive Bacteria with a Low G+C Content

    PubMed Central

    O'Flaherty, S.; Coffey, A.; Edwards, R.; Meaney, W.; Fitzgerald, G. F.; Ross, R. P.

    2004-01-01

    Phage K is a polyvalent phage of the Myoviridae family which is active against a wide range of staphylococci. Phage genome sequencing revealed a linear DNA genome of 127,395 bp, which carries 118 putative open reading frames. The genome is organized in a modular form, encoding modules for lysis, structural proteins, DNA replication, and transcription. Interestingly, the structural module shows high homology to the structural module from Listeria phage A511, suggesting intergenus horizontal transfer. In addition, phage K exhibits the potential to encode proteins necessary for its own replisome, including DNA ligase, primase, helicase, polymerase, RNase H, and DNA binding proteins. Phage K has a complete absence of GATC sites, making it insensitive to restriction enzymes which cleave this sequence. Three introns (lys-I1, pol-I2, and pol-I3) encoding putative endonucleases were located in the genome. Two of these (pol-I2 and pol-I3) were found to interrupt the DNA polymerase gene, while the other (lys-I1) interrupts the lysin gene. Two of the introns encode putative proteins with homology to HNH endonucleases, whereas the other encodes a 270-amino-acid protein which contains two zinc fingers (CX2CX22CX2C and CX2CX23CX2C). The availability of the genome of this highly virulent phage, which is active against infective staphylococci, should provide new insights into the biology and evolution of large broad-spectrum polyvalent phages. PMID:15090528
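
    The reported absence of GATC sites is simple to verify on any genome sequence. The sketch below counts occurrences of a recognition site, overlapping matches included. The function names are illustrative and the example string is a toy sequence, not phage K. Because GATC is its own reverse complement, scanning one strand is enough for this site.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_sites(genome, site="GATC"):
    """Count occurrences of `site` in `genome` on the given strand."""
    genome, site = genome.upper(), site.upper()
    count = 0
    start = genome.find(site)
    while start != -1:
        count += 1
        start = genome.find(site, start + 1)  # step by one to allow overlaps
    return count


if __name__ == "__main__":
    toy = "AAGATCTTGATCGATC"
    print(count_sites(toy))
    print(reverse_complement("GATC") == "GATC")
```

    For a non-palindromic site, the count on the opposite strand can be obtained by passing `reverse_complement(site)` as the site.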

  15. Deciphering the Epigenetic Code: An Overview of DNA Methylation Analysis Methods

    PubMed Central

    Umer, Muhammad

    2013-01-01

    Significance: Methylation of cytosine in DNA is linked with gene regulation, and this has profound implications in development, normal biology, and disease conditions in many eukaryotic organisms. A wide range of methods and approaches exist for its identification, quantification, and mapping within the genome. While the earliest approaches were nonspecific and were at best useful for quantification of total methylated cytosines in the chunk of DNA, this field has seen considerable progress and development over the past decades. Recent Advances: Methods for DNA methylation analysis differ in their coverage and sensitivity, and the method of choice depends on the intended application and desired level of information. Potential results include global methyl cytosine content, degree of methylation at specific loci, or genome-wide methylation maps. Introduction of more advanced approaches to DNA methylation analysis, such as microarray platforms and massively parallel sequencing, has brought us closer to unveiling the whole methylome. Critical Issues: Sensitive quantification of DNA methylation from degraded and minute quantities of DNA and high-throughput DNA methylation mapping of single cells still remain a challenge. Future Directions: Developments in DNA sequencing technologies as well as the methods for identification and mapping of 5-hydroxymethylcytosine are expected to augment our current understanding of epigenomics. Here we present an overview of methodologies available for DNA methylation analysis with special focus on recent developments in genome-wide and high-throughput methods. While the application focus relates to cancer research, the methods are equally relevant to broader issues of epigenetics and redox science in this special forum. Antioxid. Redox Signal. 18, 1972–1986. PMID:23121567

  16. The vicilin gene family of pea (Pisum sativum L.): a complete cDNA coding sequence for preprovicilin.

    PubMed Central

    Lycett, G W; Delauney, A J; Gatehouse, J A; Gilroy, J; Croy, R R; Boulter, D

    1983-01-01

    A cDNA plasmid bank has been constructed using mRNA from developing pea seeds and three cDNAs coding for vicilin polypeptides have been selected. These cDNAs have been sequenced and between them cover the whole of the coding sequence plus part of the 5' and 3' untranslated regions. Comparison with amino acid sequence data from the protein indicates that vicilin is synthesised as preprovicilin with subsequent removal of a signal peptide and a C-terminal peptide as well as post translational endo-proteolytic cleavage. The cDNAs represent two different classes of vicilin genes whilst amino acid data show that there are at least three major classes of vicilin polypeptide. The vicilin sequences show extensive homology with conglycinin and phaseolin except in the regions of the internal proteolytic cleavages. The evolutionary significance of this relationship is discussed. PMID:6687941

  17. A framework for the DNA-protein recognition code of the probe helix in transcription factors: the chemical and stereochemical rules.

    PubMed

    Suzuki, M

    1994-04-15

    Understanding the general mechanisms of sequence specific DNA recognition by proteins is a major challenge in structural biology. The existence of a 'DNA recognition code' for proteins, by which certain amino acid residues on a protein surface confer specificity for certain DNA bases, has been the subject of much discussion. However, no simple code has yet been established. The principles of DNA recognition can be described at two levels. The 'chemical' rules describe the partnerships between amino acid side chains and DNA bases making favourable interactions in the major groove of DNA. Here I analyze the occurrence of nucleotide-amino acid contacts in previously determined crystal structures of DNA-protein complexes and find that simple rules pertain. I also describe 'stereochemical' rules for the probe helix type of DNA-binding motif found in certain transcription factors including leucine zipper and homeodomain proteins. These are a consequence of the binding geometry, and specify the amino acid and base positions used for the contacts, and the sizes of residues in the contact interface. The chemical rules can be generalized for any DNA-binding motif, while the stereochemical rules are specific to a particular DNA-binding motif. The recognition code for a particular binding motif can be described by combining the two sets of rules.

  18. Tumor regression induced by intratumoral injection of DNA coding for human interleukin 12 into melanoma metastases in gray horses.

    PubMed

    Heinzerling, L M; Feige, K; Rieder, S; Akens, M K; Dummer, R; Stranzinger, G; Moelling, K

    2001-01-01

    Preclinical studies investigating new therapeutic principles against melanoma are presently being carried out in mouse models; however, these are not optimal. Here we describe a novel animal model using gray horses. These animals spontaneously develop metastatic melanoma that resembles human disease and is thus highly relevant for preclinical studies testing new immunotherapy protocols. We found that injection of plasmid DNA coding for the human cytokine interleukin 12 into established metastases induced significant regression in all 12 treated lesions in a total of 7 horses. Complete disappearance was observed in one treated lesion, with no recurrence after 6 months. No adverse events have been observed in any of the animals during and after treatment. These results demonstrate the effectiveness and safety of interleukin 12 encoding plasmid DNA therapy against established metastatic disease in a large animal model and serve as a basis for a clinical trial.

  19. RGB colour coding of Y-shaped DNA for simultaneous tri-analyte solid phase hybridization detection.

    PubMed

    Krissanaprasit, Abhichart; Somasundrum, Mithran; Surareungchai, Werasak

    2011-01-15

    We present a new concept for tri-analyte DNA detection based on the idea of a Y-shaped capture probe which, after tri-target and fluorescently labeled reporter probe binding, becomes colour-coded to generate images in an RGB colour scheme. Hence, the RGB value of the resulting secondary pseudo-colour presented by the hybridized Y-DNA can be related to the ratio of the primary pseudo-colours present in its make-up, and thus to the ratio of the three target concentrations. As a proof of concept we detect sequences from the genes of the pathogenic bacterial strains Escherichia coli O157:H7, Vibrio cholerae and Salmonella enterica in a semi-quantitative manner across the range 20-167 nM. The assay was relatively quick, with a time from hybridization to completed data interpretation of approximately 4 h.

  20. Bio-bar-code dendrimer-like DNA as signal amplifier for cancerous cells assay using ruthenium nanoparticle-based ultrasensitive chemiluminescence detection.

    PubMed

    Bi, Sai; Hao, Shuangyuan; Li, Li; Zhang, Shusheng

    2010-09-07

    Bio-bar-code dendrimer-like DNA (bbc-DL-DNA) is employed as a label for the amplification assay of cancer cells in combination with the newly explored chemiluminescence (CL) system of luminol-H(2)O(2)-Ru(3+) and specificity of structure-switching aptamers selected by cell-based SELEX.

  1. Molecular cloning of a gene (polA) coding for an unusual DNA polymerase I from Treponema pallidum.

    PubMed

    Rodes, B; Liu, H; Johnson, S; George, R; Steiner, B

    2000-07-01

    The gene coding for the DNA polymerase I from Treponema pallidum, Nichols strain, was cloned and sequenced. Depending on which of the two alternative initiation codons was used, the protein was either 997 or 1015 amino acids long and the predicted protein had a molecular mass of either 112 or 114 kDa. Sequence comparisons with other polA genes showed that all three domains expected in the DNA polymerase I class of enzymes were present in the protein (5'-3' exonuclease, 3'-5' exonuclease and polymerase domains). Additionally, there were four unique insertions of 20-30 amino acids each, not seen in other DNA polymerase I enzymes. Two of the inserts were near the boundary of the two exonuclease domains and the other two interrupted the 3'-5' exonuclease domain which is involved in proofreading. The predicted amino-acid sequence had an exceptionally high content of cysteine (2.4% compared with <0.05% for most other sequenced DNA polymerase I enzymes). The polA gene was further cloned into pProEXHTa for expression and purification. The transformants expressed a protein of 115 kDa. Antibodies raised against synthetic peptide fragments of the putative DNA polymerase I recognised the 115-kDa band in Western blot analysis. No DNA synthesis activity could be demonstrated on a primed single-stranded template. Although significant quantities of the protein were produced in the host Escherichia coli carrying the plasmid, it was not capable of complementing a polA(-) mutant in the replication of a polA-dependent plasmid.

  2. A novel non-coding RNA lncRNA-JADE connects DNA damage signalling to histone H4 acetylation.

    PubMed

    Wan, Guohui; Hu, Xiaoxiao; Liu, Yunhua; Han, Cecil; Sood, Anil K; Calin, George A; Zhang, Xinna; Lu, Xiongbin

    2013-10-30

    A prompt and efficient DNA damage response (DDR) eliminates the detrimental effects of DNA lesions in eukaryotic cells. Basic and preclinical studies suggest that the DDR is one of the primary anti-cancer barriers during tumorigenesis. The DDR involves a complex network of processes that detect and repair DNA damage, in which long non-coding RNAs (lncRNAs), a new class of regulatory RNAs, may play an important role. In the current study, we identified a novel lncRNA, lncRNA-JADE, that is induced after DNA damage in an ataxia-telangiectasia mutated (ATM)-dependent manner. LncRNA-JADE transcriptionally activates Jade1, a key component in the HBO1 (human acetylase binding to ORC1) histone acetylation complex. Consequently, lncRNA-JADE induces histone H4 acetylation in the DDR. Markedly higher levels of lncRNA-JADE were observed in human breast tumours in comparison with normal breast tissues. Knockdown of lncRNA-JADE significantly inhibited breast tumour growth in vivo. On the basis of these results, we propose that lncRNA-JADE is a key functional link that connects the DDR to histone H4 acetylation, and that dysregulation of lncRNA-JADE may contribute to breast tumorigenesis.

  3. A novel non-coding RNA lncRNA-JADE connects DNA damage signalling to histone H4 acetylation

    PubMed Central

    Wan, Guohui; Hu, Xiaoxiao; Liu, Yunhua; Han, Cecil; Sood, Anil K; Calin, George A; Zhang, Xinna; Lu, Xiongbin

    2013-01-01

    A prompt and efficient DNA damage response (DDR) eliminates the detrimental effects of DNA lesions in eukaryotic cells. Basic and preclinical studies suggest that the DDR is one of the primary anti-cancer barriers during tumorigenesis. The DDR involves a complex network of processes that detect and repair DNA damage, in which long non-coding RNAs (lncRNAs), a new class of regulatory RNAs, may play an important role. In the current study, we identified a novel lncRNA, lncRNA-JADE, that is induced after DNA damage in an ataxia-telangiectasia mutated (ATM)-dependent manner. LncRNA-JADE transcriptionally activates Jade1, a key component in the HBO1 (human acetylase binding to ORC1) histone acetylation complex. Consequently, lncRNA-JADE induces histone H4 acetylation in the DDR. Markedly higher levels of lncRNA-JADE were observed in human breast tumours in comparison with normal breast tissues. Knockdown of lncRNA-JADE significantly inhibited breast tumour growth in vivo. On the basis of these results, we propose that lncRNA-JADE is a key functional link that connects the DDR to histone H4 acetylation, and that dysregulation of lncRNA-JADE may contribute to breast tumorigenesis. PMID:24097061

  4. Conservation of genetic information: a code for site-specific DNA recognition.

    PubMed Central

    Harris, L F; Sullivan, M R; Hickok, D F

    1993-01-01

    We present findings of genetic information conservation between the glucocorticoid response element (GRE) DNA and the cDNA encoding the glucocorticoid receptor (GR) DNA-binding domain (DBD). The regions of nucleotide sub-sequence similarity to the GRE in the GR DBD occur specifically at nucleotide sequences on the ends of exons 3, 4, and 5 at their splice junction sites. These sequences encode the DNA recognition helix on exon 3, a beta-strand on exon 4, and a putative alpha-helix on exon 5, respectively. The nucleotide sequence of exon 5 that encodes the putative alpha-helix located on the carboxyl terminus of the GR DBD shares sequence similarity with the flanking nucleotide regions of the GRE. We generated a computer model of the GR DBD using atomic coordinates derived from nuclear magnetic resonance spectroscopy to which we attached the exon 5-encoded putative alpha-helix. We docked this GR DBD structure at the 39-base-pair nucleotide sequence containing the GRE binding site and flanking nucleotides, which contained conserved genetic information. We observed that amino acids of the DNA recognition helix, the beta-strand, and the putative alpha-helix are spatially aligned with trinucleotides identical to their cognate codons within the GRE and its flanking nucleotides. PMID:8516297

  5. Characterization of EBV Promoters and Coding Regions by Sequencing PCR-Amplified DNA Fragments.

    PubMed

    Szenthe, Kalman; Bánáti, Ferenc

    2017-01-01

    DNA sequencing approaches originally developed in two directions, the chemical degradation method and the chain-termination method. The latter became more widespread, and a huge amount of sequencing data, including whole genome sequences, accumulated based on the use of capillary sequencer systems and the application of a modified chain-termination method, which proved to be relatively easy, fast, and reliable. In addition, relatively long sequences of up to 1000 bp could be obtained in a single read with high per-base accuracy. Although the recent appearance of next-generation DNA sequencing (NGS) technologies enabled high-throughput and low-cost analysis of DNA, the modified chain-termination methods are still often applied in research. In the following, we present the application of capillary sequencing to the sequence characterization of viral genomes in partial and whole genome sequencing, and demonstrate it on the BARF1 promoter of Epstein-Barr virus (EBV).

  6. DNA sequencing and bar-coding using solid-state nanopores.

    PubMed

    Atas, Evrim; Singer, Alon; Meller, Amit

    2012-12-01

    Nanopores have emerged as a prominent single-molecule analytic tool with particular promise for genomic applications. In this review, we discuss two potential applications of the nanopore sensors: First, we present a nanopore-based single-molecule DNA sequencing method that utilizes optical detection for massively parallel throughput. Second, we describe a method by which nanopores can be used as single-molecule genotyping tools. For DNA sequencing, the distinction among the four types of DNA nucleobases is achieved by employing a biochemical procedure for DNA expansion. In this approach, each nucleobase in each DNA strand is converted into one of four predefined unique 16-mers in a process that preserves the nucleobase sequence. The resulting converted strands are then hybridized to a library of four molecular beacons, each carrying a unique fluorophore tag, that are perfect complements to the 16-mers used for conversion. Solid-state nanopores are then used to sequentially remove these beacons, one after the other, leading to a series of photon bursts in four colors that can be optically detected. Single-molecule genotyping is achieved by tagging the DNA fragments with γ-modified synthetic peptide nucleic acid probes coupled to an electronic characterization of the complexes using solid-state nanopores. This method can be used to identify and differentiate genes with a high level of sequence similarity at the single-molecule level, but different pathology or response to treatment. We will illustrate this method by differentiating the pol gene for two highly similar human immunodeficiency virus subtypes, paving the way for a novel diagnostics platform for viral classification.
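The expansion step described in this abstract, in which each nucleobase is replaced by one of four predefined 16-mers while the base order is preserved, can be illustrated with a small Python sketch. The four 16-mer sequences below are hypothetical placeholders chosen only to be distinct; the abstract does not give the sequences actually used, and the biochemical conversion and optical readout are not modeled.

```python
# Hypothetical 16-mer codes, one per nucleobase (illustrative only;
# not the sequences used in the reviewed method).
CODE_16MERS = {
    "A": "AAGGAAGGAAGGAAGG",
    "C": "CCTTCCTTCCTTCCTT",
    "G": "GGAAGGAAGGAAGGAA",
    "T": "TTCCTTCCTTCCTTCC",
}
DECODE = {code: base for base, code in CODE_16MERS.items()}
K = 16  # length of each code word


def expand(seq):
    """Replace every base with its 16-mer, keeping the base order."""
    return "".join(CODE_16MERS[base] for base in seq.upper())


def read_back(expanded):
    """Recover the original bases by reading the expanded strand 16 nt at a time."""
    if len(expanded) % K != 0:
        raise ValueError("expanded length must be a multiple of 16")
    return "".join(DECODE[expanded[i:i + K]] for i in range(0, len(expanded), K))
```

Because every code word has the same length and the four words are distinct, reading the expanded strand in 16-nt steps recovers the original sequence, which mirrors the one-beacon-per-base readout the abstract describes.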

  7. Methods for sequencing GC-rich and CCT repeat DNA templates

    DOEpatents

    Robinson, Donna L.

    2007-02-20

    The present invention is directed to a PCR-based method of cycle sequencing DNA and other polynucleotide sequences having high CG content and regions of high GC content, and includes for example DNA strands with a high Cytosine and/or Guanosine content and repeated motifs such as CCT repeats.

  8. Coding of DNA samples and data in the pharmaceutical industry: current practices and future directions--perspective of the I-PWG.

    PubMed

    Franc, M A; Cohen, N; Warner, A W; Shaw, P M; Groenen, P; Snapir, A

    2011-04-01

    DNA samples collected in clinical trials and stored for future research are valuable to pharmaceutical drug development. Given the perceived higher risk associated with genetic research, industry has implemented complex coding methods for DNA. Following years of experience with these methods and with addressing questions from institutional review boards (IRBs), ethics committees (ECs) and health authorities, the industry has started reexamining the extent of the added value offered by these methods. With the goal of harmonization, the Industry Pharmacogenomics Working Group (I-PWG) conducted a survey to gain an understanding of company practices for DNA coding and to solicit opinions on their effectiveness at protecting privacy. The results of the survey and the limitations of the coding methods are described. The I-PWG recommends dialogue with key stakeholders regarding coding practices such that equal standards are applied to DNA and non-DNA samples. The I-PWG believes that industry standards for privacy protection should provide adequate safeguards for DNA and non-DNA samples/data and suggests a need for more universal standards for samples stored for future research.

  9. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

    PubMed Central

    Wright, Imogen A.; Travers, Simon A.

    2014-01-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
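The distinction the abstract draws between codon-sized indels and frameshift mutations comes down to whether a gap length is a multiple of three. The following Python sketch shows only that classification on an already gapped pairwise alignment; it is not RAMICS itself, which uses profile hidden Markov models, and the function name and input convention (two equal-length rows with `-` for gaps) are assumptions made for illustration.

```python
import re


def classify_gaps(aligned_ref, aligned_read):
    """Label each gap run in a pairwise alignment as codon-sized or frameshift.

    A gap run in the reference row is an insertion in the read; a gap run
    in the read row is a deletion. Returns sorted (column, kind, length, label).
    """
    if len(aligned_ref) != len(aligned_read):
        raise ValueError("alignment rows must have equal length")
    calls = []
    for kind, row in (("insertion", aligned_ref), ("deletion", aligned_read)):
        for match in re.finditer(r"-+", row):
            length = match.end() - match.start()
            # A gap that is a multiple of 3 keeps the reading frame intact.
            label = "codon-sized" if length % 3 == 0 else "frameshift"
            calls.append((match.start(), kind, length, label))
    return sorted(calls)
```

A three-base deletion is reported as codon-sized, while a single inserted base, such as the homopolymer errors the abstract mentions, is reported as a frameshift.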

  10. The non-coding B2 RNA binds to the DNA cleft and active site region of RNA polymerase II

    PubMed Central

    Ponicsan, Steven L.; Houel, Stephane; Old, William M.; Ahn, Natalie G.; Goodrich, James A.; Kugel, Jennifer F.

    2013-01-01

    The B2 family of short interspersed elements is transcribed into non-coding RNA by RNA polymerase III. The ~180 nt B2 RNA has been shown to potently repress mRNA transcription by binding tightly to RNA polymerase II (Pol II) and assembling with it into complexes on promoter DNA, where it keeps the polymerase from properly engaging the promoter DNA. Mammalian Pol II is a ~500 kD complex that contains 12 different protein subunits, providing many possible surfaces for interaction with B2 RNA. We found that the carboxy-terminal domain of the largest Pol II subunit was not required for B2 RNA to bind Pol II and repress transcription in vitro. To identify the surface on Pol II to which the minimal functional region of B2 RNA binds, we coupled multi-step affinity purification, reversible formaldehyde crosslinking, peptide sequencing by mass spectrometry, and analysis of peptide enrichment. The Pol II peptides most highly recovered after crosslinking to B2 RNA mapped to the DNA binding cleft and active site region of Pol II. These studies determine the location of a defined nucleic acid binding site on a large, native, multi-subunit complex and provide insight into the mechanism of transcriptional repression by B2 RNA. PMID:23416138

  11. Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.

    PubMed Central

    Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T

    1993-01-01

    A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). PMID:7689829
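The 3' untranslated length quoted in this abstract can be checked with simple arithmetic on the reported sizes. The Python sketch below assumes, for illustration only, that the 12 bp upstream of the ATG is the whole 5' untranslated region and that the poly(A) tail can be ignored; the abstract states neither.

```python
# Sizes reported in the abstract.
ORF_BP = 579          # open reading frame
ATG_OFFSET_BP = 12    # distance of the ATG from the 5' end of the cDNA clone
MRNA_BP = 2300        # Northern blot estimate (2.3 kb)

# The ORF length is a whole number of codons.
codons = ORF_BP // 3

# What is left of the mRNA after the upstream 12 bp and the ORF
# (assumption: no further 5' sequence, poly(A) tail ignored).
remaining_bp = MRNA_BP - ATG_OFFSET_BP - ORF_BP

print(f"{codons} codons, about {remaining_bp / 1000:.1f} kb downstream of the ORF")
```

Under these assumptions the remainder is roughly 1.7 kb, in line with the figure the authors deduce.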

  12. An Abundant Class of Non-coding DNA Can Prevent Stochastic Gene Silencing in the C. elegans Germline.

    PubMed

    Frøkjær-Jensen, Christian; Jain, Nimit; Hansen, Loren; Davis, M Wayne; Li, Yongbin; Zhao, Di; Rebora, Karine; Millet, Jonathan R M; Liu, Xiao; Kim, Stuart K; Dupuy, Denis; Jorgensen, Erik M; Fire, Andrew Z

    2016-07-14

    Cells benefit from silencing foreign genetic elements but must simultaneously avoid inactivating endogenous genes. Although chromatin modifications and RNAs contribute to maintenance of silenced states, the establishment of silenced regions will inevitably reflect underlying DNA sequence and/or structure. Here, we demonstrate that a pervasive non-coding DNA feature in Caenorhabditis elegans, characterized by 10-base pair periodic An/Tn-clusters (PATCs), can license transgenes for germline expression within repressive chromatin domains. Transgenes containing natural or synthetic PATCs are resistant to position effect variegation and stochastic silencing in the germline. Among endogenous genes, intron length and PATC-character undergo dramatic changes as orthologs move from active to repressive chromatin over evolutionary time, indicating a dynamic character to the An/Tn periodicity. We propose that PATCs form the basis of a cellular immune system, identifying certain endogenous genes in heterochromatic contexts as privileged while foreign DNA can be suppressed with no requirement for a cellular memory of prior exposure.

  13. Population dynamics coded in DNA: genetic traces of the expansion of modern humans

    NASA Astrophysics Data System (ADS)

    Kimmel, Marek

    1999-12-01

    It has been proposed that modern humans evolved from a small ancestral population, which appeared several hundred thousand years ago in Africa. Descendants of the founder group migrated to Europe and then to Asia, not mixing with the pre-existing local populations but replacing them. Two demographic elements are present in this “out of Africa” hypothesis: numerical growth of the modern humans and their migration into Eurasia. Did these processes leave an imprint in our DNA? To address this question, we use the classical Fisher-Wright-Moran model of population genetics, assuming variable population size and two models of mutation: the infinite-sites model and the stepwise-mutation model. We use the coalescence theory, which amounts to tracing the common ancestors of contemporary genes. We obtain mathematical formulae expressing the distribution of alleles given the time changes of population size. In the framework of the infinite-sites model, simulations indicate that the pattern of past population size change leaves its signature on the pattern of DNA polymorphism. Application of the theory to the published mitochondrial DNA sequences indicates that the current mitochondrial DNA sequence variation is not inconsistent with the logistic growth of the modern human population. In the framework of the stepwise-mutation model, we demonstrate that population bottleneck followed by growth in size causes an imbalance between allele-size variance and heterozygosity. We analyze a set of data on tetranucleotide repeats which reveals the existence of this imbalance. The pattern of imbalance is consistent with the bottleneck being most ancient in Africans, most recent in Asians and intermediate in Europeans. These findings are consistent with the “out of Africa” hypothesis, although by no means do they constitute its proof.

  14. Bar-coded, multiplexed sequencing of targeted DNA regions using the Illumina Genome Analyzer.

    PubMed

    Szelinger, Szabolcs; Kurdoglu, Ahmet; Craig, David W

    2011-01-01

    To date, genome-wide association (GWA) studies, in which thousands of markers throughout the genome are simultaneously genotyped, have identified hundreds of loci underlying disease susceptibility. These regions typically span 5-100 kb, and resequencing efforts to identify potential functional variants within these loci represent the next logical step in the genetic characterization pipeline. Next-generation DNA sequencing technologies are, in principle, well-suited for this task, yet despite the massive sequencing capability afforded by these platforms, the present-day reality is that it remains difficult, time-consuming, and expensive to resequence large numbers of samples across moderately sized genomic regions. To address this obstacle, we developed a generalized framework for multiplexed resequencing of targeted regions of the human genome on the Illumina Genome Analyzer using degenerate, indexed DNA sequence barcodes ligated to fragmented DNA prior to sequencing. Using this method, the DNA of multiple individuals can be simultaneously sequenced at several regions. We find that achieving adequate coverage is one of the most important factors in the design of an experiment, but other key considerations include whether the objective is to discover genetic variants for genotyping later by a separate method, to genotype all identified variants by sequencing, or to exhaustively identify all common and rare variants in the region. Given the massive bandwidth of next-generation sequencing technologies and their low inherent throughput in terms of sequencing arrays per week, multiplexed sequencing using the barcoding approach offers a clear mechanism for focusing bandwidth to a smaller region across many more individuals or samples.
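The barcoding idea in this abstract, in which reads from several individuals are pooled and later separated by their index sequence, can be sketched as a small demultiplexing routine in Python. This is a simplified illustration, not the authors' pipeline: it assumes the barcode sits at the start of each read, that all barcodes have the same length, and that matching is exact, whereas the abstract mentions degenerate barcodes. The barcode sequences and sample names in the usage note are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by their leading barcode and strip the barcode.

    Reads whose prefix matches no known barcode are kept unchanged
    under the key "unassigned".
    """
    lengths = {len(barcode) for barcode in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("all barcodes must have the same length")
    k = lengths.pop()

    bins = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get(read[:k])
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[k:])  # keep only the insert sequence
    return dict(bins)
```

For example, with the hypothetical map `{"ACGT": "sample_1", "TGCA": "sample_2"}`, a read `ACGTGGGG` is assigned to `sample_1` with insert `GGGG`. A production workflow would normally also tolerate sequencing errors in the barcode, which this sketch does not attempt.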

  15. The DNA sequence and biology of human chromosome 19

    SciTech Connect

    Grimwood, J; Gordon, L A; Olsen, A; Terry, A; Schmutz, J; Lamerdin, J; Hellsten, U; Goodstein, D; Couronne, O; Tran-Gyamfi, M

    2004-04-06

    Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high GC content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in Mendelian disorders, including familial hypercholesterolemia and insulin-resistant diabetes. Nearly one quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

  16. Fine-tuning the ubiquitin code at DNA double-strand breaks: deubiquitinating enzymes at work

    PubMed Central

    Citterio, Elisabetta

    2015-01-01

    Ubiquitination is a reversible protein modification broadly implicated in cellular functions. Signaling processes mediated by ubiquitin (ub) are crucial for the cellular response to DNA double-strand breaks (DSBs), one of the most dangerous types of DNA lesions. In particular, the DSB response critically relies on active ubiquitination by the RNF8 and RNF168 ub ligases at the chromatin, which is essential for proper DSB signaling and repair. How this pathway is fine-tuned and what the functional consequences are of its deregulation for genome integrity and tissue homeostasis are subject of intense investigation. One important regulatory mechanism is by reversal of substrate ubiquitination through the activity of specific deubiquitinating enzymes (DUBs), as supported by the implication of a growing number of DUBs in DNA damage response processes. Here, we discuss the current knowledge of how ub-mediated signaling at DSBs is controlled by DUBs, with main focus on DUBs targeting histone H2A and on their recent implication in stem cell biology and cancer. PMID:26442100

  17. Cloning and characterization of a cDNA coding for mouse placental alkaline phosphatase

    SciTech Connect

    Terao, M.; Mintz, B.

    1987-10-01

    Mouse alkaline phosphatase was partially purified from placenta. Data obtained by immunoblotting analysis suggested that the primary structure of this enzyme has a much greater homology to that of human and bovine liver ALPs than to the human placental isozyme. Therefore, a full-length cDNA encoding human liver-type ALP was used as a probe to isolate the mouse placental ALP cDNA. The cloned mouse cDNA is 2459 base pairs long and is composed of an open reading frame encoding a 524-amino acid polypeptide that contains a putative signal peptide of 17 amino acids. Homology at the amino acid level of the mouse placental ALP is 90% to the human liver isozyme but only 55% to the human placental counterpart. RNA blot hybridization results indicate that the mouse placental ALP is encoded by a gene identical to the gene expressed in mouse liver, kidney, and teratocarcinoma stem cells. This gene is therefore evolutionarily highly conserved in mouse and human.

  18. African swine fever virus ORF P1192R codes for a functional type II DNA topoisomerase.

    PubMed

    Coelho, João; Martins, Carlos; Ferreira, Fernando; Leitão, Alexandre

    2015-01-01

    Topoisomerases modulate the topological state of DNA during processes, such as replication and transcription, that cause overwinding and/or underwinding of the DNA. African swine fever virus (ASFV) is a nucleo-cytoplasmic double-stranded DNA virus shown to contain an ORF (P1192R) with homology to type II topoisomerases. Here we observed that pP1192R is highly conserved among ASFV isolates but dissimilar from other viral, prokaryotic or eukaryotic type II topoisomerases. In both ASFV/Ba71V-infected Vero cells and ASFV/L60-infected pig macrophages we detected pP1192R at intermediate and late phases of infection, cytoplasmically localized and accumulating in the viral factories. Finally, we used a Saccharomyces cerevisiae temperature-sensitive strain in order to demonstrate, through complementation and in vitro decatenation assays, the functionality of P1192R, which we further confirmed by mutating its predicted catalytic residue. Overall, this work strengthens the idea that P1192R constitutes a target for studying, and possibly controlling, ASFV transcription and replication.

  19. HGSA DNA day essay contest winner 60 years on: still coding for cutting-edge science.

    PubMed

    Yates, Patrick

    2013-08-01

    MESSAGE FROM THE EDUCATION COMMITTEE: In 2013, the Education Committee of the Human Genetics Society of Australasia (HGSA) established the DNA Day Essay Contest in Australia and New Zealand. The contest was first established by the American Society of Human Genetics in 2005 and the HGSA DNA Day Essay Contest is adapted from this contest via a collaborative partnership. The aim of the contest is to engage high school students with important concepts in genetics through literature research and reflection. As 2013 marks the 60th anniversary of the discovery of the double helix of DNA by James Watson and Francis Crick and the 10th anniversary of the first sequencing of the human genome, the essay topic was to choose either of these breakthroughs and explain its broader impact on biotechnology, human health and disease, or our understanding of basic genetics, such as genetic variation or gene expression. The contest attracted 87 entrants in 2013, with the winning essay authored by Patrick Yates, a Year 12 student from Melbourne High School. Further details about the contest including the names and schools of the other finalists can be found at http://www.hgsa-essay.net.au/. The Education Committee would like to thank all the 2013 applicants and encourage students to enter in 2014.

  20. Run-length encoding graphic rules, biochemically editable designs and steganographical numeric data embedment for DNA-based cryptographical coding system

    PubMed Central

    Kawano, Tomonori

    2013-01-01

    There have been a wide variety of approaches for handling pieces of DNA as “unplugged” tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, watermarks and cryptography. In the present article, novel designs of artificial genes as media for storing digitally compressed image data are proposed for bio-computing purposes, whereas natural genes principally encode proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given “passwords” and/or secret numbers using DNA sequences. The “passwords” of interest were decomposed into single letters and translated into font images coded on separate DNA chains, with both the coding regions, in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions, designed for biochemical editing and the remodeling processes that reveal the hidden orientation of the letters composing the original “passwords.” The latter processes require molecular biological tools for digestion and ligation of the fragmented DNA molecules, targeting the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interest over the image-coding DNA are also discussed. PMID:23750303
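
    As a rough illustration of run-length encoding in general, and not of the specific rule proposed in the article (which the abstract does not spell out), the sketch below compresses one row of a black-and-white font image into run lengths and writes them as a DNA string. The base-4 mapping, the fixed four-base run width, and all function names are assumptions made only for this example.

```python
# Hypothetical mapping: each base stores one base-4 digit (an assumption,
# not the encoding rule used in the article).
BASES = "ACGT"


def run_lengths(row):
    """Return (first_pixel, run_lengths) for a row of 0/1 pixels."""
    if not row:
        return 0, []
    runs, count = [], 1
    for prev, cur in zip(row, row[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return row[0], runs


def number_to_dna(n, width=4):
    """Write n as `width` base-4 digits, most significant digit first."""
    if not 0 <= n < 4 ** width:
        raise ValueError("run length does not fit in the chosen width")
    digits = []
    for _ in range(width):
        digits.append(BASES[n % 4])
        n //= 4
    return "".join(reversed(digits))


def dna_to_number(fragment):
    """Inverse of number_to_dna."""
    n = 0
    for base in fragment:
        n = n * 4 + BASES.index(base)
    return n


def encode_row(row, width=4):
    """First base records the starting pixel (A = 0, C = 1), then the runs."""
    first, runs = run_lengths(row)
    return BASES[first] + "".join(number_to_dna(r, width) for r in runs)


def decode_row(dna, width=4):
    """Rebuild the pixel row from a string produced by encode_row."""
    pixel = BASES.index(dna[0])
    row = []
    for i in range(1, len(dna), width):
        row.extend([pixel] * dna_to_number(dna[i:i + width]))
        pixel = 1 - pixel  # runs alternate between the two pixel values
    return row
```

    A fixed run width keeps decoding trivial at the cost of some density; a real design would also have to respect biochemical constraints that this sketch ignores.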

  1. Flanking sequence specificity determines coding microsatellite heteroduplex and mutation rates with defective DNA mismatch repair (MMR).

    PubMed

    Chung, H; Lopez, C G; Young, D J; Lai, J F; Holmstrom, J; Ream-Robinson, D; Cabrera, B L; Carethers, J M

    2010-04-15

    The activin type II receptor (ACVR2) contains two identical microsatellites in exons 3 and 10, but only the exon 10 microsatellite is frameshifted in mismatch repair (MMR)-defective colonic tumors. The reason for this selectivity is not known. We hypothesized that ACVR2 frameshifts were influenced by DNA sequences surrounding the microsatellite. We constructed plasmids in which exons 3 or 10 of ACVR2 were cloned +1 bp out of frame of enhanced green fluorescent protein (EGFP), allowing -1 bp frameshift to express EGFP. Plasmids were stably transfected into MMR-deficient cells, and subsequent non-fluorescent cells were sorted, cultured and harvested for mutation analysis. We swapped DNA sequences flanking the exon 3 and 10 microsatellites to test our hypothesis. Native ACVR2 exon 3 and 10 microsatellites underwent heteroduplex formation (A(7)/T(8)) in hMLH1(-/-) cells, but only exon 10 microsatellites fully mutated (A(7)/T(7)) in both hMLH1(-/-) and hMSH6(-/-) backgrounds, showing selectivity for exon 10 frameshifts and inability of exon 3 heteroduplexes to fully mutate. Substituting nucleotides flanking the exon 3 microsatellite for nucleotides flanking the exon 10 microsatellite significantly reduced heteroduplex and full mutation in hMLH1(-/-) cells. When the exon 3 microsatellite was flanked by nucleotides normally surrounding the exon 10 microsatellite, fully mutant exon 3 frameshifts appeared. Mutation selectivity for ACVR2 lies partly with flanking nucleotides surrounding each microsatellite.

  2. Isolation and identification of a cDNA clone coding for an HLA-DR transplantation antigen alpha-chain.

    PubMed

    Gustafsson, K; Bill, P; Larhammar, D; Wiman, K; Claesson, L; Schenning, L; Servenius, B; Sundelin, J; Rask, L; Peterson, P A

    1982-10-01

    Membrane-bound mRNA was isolated from Raji cells and enriched for message coding for the HLA-DR transplantation antigen alpha-chain by sucrose gradient centrifugation. Double-stranded cDNA was constructed from this mRNA fraction, ligated to plasmid pBR322, and cloned into Escherichia coli. By hybrid selection, a plasmid, pDR-alpha-1, able to hybridize with mRNA coding for the HLA-DR alpha-chain was identified. From the nucleotide sequence of one end of the insert an amino acid sequence was predicted which is identical to part of the amino-terminal sequence of an HLA-DR alpha-chain preparation isolated from Raji cells. This clearly shows that pDR-alpha-1 carries almost the complete message for an HLA-DR alpha-chain. From the nucleotide sequence of this plasmid it will be possible to predict the primary structure of an HLA-DR alpha-chain.

  3. Analysis of cDNA coding MHC class II beta chain of the chimpanzee (Pan troglodytes).

    PubMed

    Hatta, Yuki; Kanai, Tomoko; Matsumoto, Yoshitsugu; Kyuwa, Shigeru; Hayasaka, Ikuo; Yoshikawa, Yasuhiro

    2002-04-01

    The chimpanzee (Pan troglodytes, Patr) is the closest zoological living relative of humans and shares approximately 98.6% genetic homology to human beings. Although major histocompatibility complex (MHC) plays a critical role in T cell-mediated immune responses in vertebrates, the information on Patr MHC remains at a relatively poor level. Therefore, we attempted to isolate Patr MHC class II genes and determine their nucleotide sequences. The cDNAs encoding Patr MHC class II DP, DQ and DR beta chains were isolated from the cDNA library of a chimpanzee B lymphocyte cell line Bch261. As a result of screening, the clone 6-3-1 as a representative of Patr DP clone, clone 30-1 as a Patr DQ clone, and clones 4-7-1 and 55-1 having different sequences as Patr DR clones were detected. The clone 6-3-1 consisted of 1,062 nucleotides including an open reading frame (ORF) of 777 bp. In the same way, clone 30-1 consisted of 1,172 nucleotides including ORF of 786 bp, clones 4-7-1 and 55-1 consisted of 1,163 nucleotides including ORF of 801 bp. Except for five nucleotide changes, clones 4-7-1 and 55-1 were the same sequence. By comparison with the nucleotide sequences already reported on chimpanzee MHC class II beta 1 genes, clones 6-3-1, 30-1, 4-7-1 and 55-1 were classified as PatrDPB1*16, PatrDQB1*0302, PatrDRB1*0201 and PatrDRB1*0204, respectively. This is the first report to describe complete cDNA sequences of Patr DP and DQ molecules. The nucleotide sequence data of Patr MHC class II genes obtained in this study will be useful for the genotyping of Patr MHC class II genes in individual chimpanzees.

  4. A positive detecting code and its decoding algorithm for DNA library screening.

    PubMed

    Uehara, Hiroaki; Jimbo, Masakazu

    2009-01-01

    The study of gene functions requires high-quality DNA libraries. However, a large number of tests and screenings are necessary for compiling such libraries. We describe an algorithm for extracting as much information as possible from pooling experiments for library screening. Collections of clones are called pools, and a pooling experiment is a group test for detecting all positive clones. The probability of positiveness for each clone is estimated according to the outcomes of the pooling experiments. Clones with high chance of positiveness are subjected to confirmatory testing. In this paper, we introduce a new positive clone detecting algorithm, called the Bayesian network pool result decoder (BNPD). The performance of BNPD is compared, by simulation, with that of the Markov chain pool result decoder (MCPD) proposed by Knill et al. in 1996. Moreover, the combinatorial properties of pooling designs suitable for the proposed algorithm are discussed in conjunction with combinatorial designs and d-disjunct matrices. We also show the advantage of utilizing packing designs or BIB designs for the BNPD algorithm.
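
    The abstract does not give enough detail to reproduce BNPD, so the sketch below shows only a much simpler baseline for decoding pooling experiments, assuming error-free pool outcomes: every clone that appears in a negative pool is cleared, and a clone is confirmed positive when it is the only uncleared clone in some positive pool. The function name and data layout are hypothetical.

```python
def decode_pools(pools, outcomes):
    """Naive group-testing decoder (not BNPD or MCPD).

    pools    -- list of sets of clone identifiers
    outcomes -- list of booleans, True if the corresponding pool tested positive
    Returns (candidates, confirmed): clones not ruled out, and the subset
    that must be positive if all outcomes are error-free.
    """
    all_clones = set().union(*pools) if pools else set()

    # A negative pool clears every clone it contains.
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared |= pool
    candidates = all_clones - cleared

    # A positive pool with a single remaining candidate pins that clone down.
    confirmed = set()
    for pool, positive in zip(pools, outcomes):
        if positive:
            remaining = pool & candidates
            if len(remaining) == 1:
                confirmed |= remaining
    return candidates, confirmed
```

    Clones left in `candidates` but not in `confirmed` are the ones a probabilistic decoder would rank for confirmatory testing.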

  5. Variable continental distribution of polymorphisms in the coding regions of DNA-repair genes.

    PubMed

    Mathonnet, Géraldine; Labuda, Damian; Meloche, Caroline; Wambach, Tina; Krajinovic, Maja; Sinnett, Daniel

    2003-01-01

    DNA-repair pathways are critical for maintaining the integrity of the genetic material by protecting against mutations due to exposure-induced damage or replication errors. Polymorphisms in the corresponding genes may be relevant in genetic epidemiology by modifying individual cancer susceptibility or therapeutic response. We report data on the population distribution of potentially functional variants in XRCC1, APEX1, ERCC2, ERCC4, hMLH1, and hMSH3 genes among groups representing individuals of European, Middle Eastern, African, Southeast Asian and North American descent. The data indicate little interpopulation differentiation in some of these polymorphisms and typical FST values ranging from 10 to 17% at others. Low FST was observed in APEX1 and hMSH3 exon 23 in spite of their relatively high minor allele frequencies, which could suggest the effect of balancing selection. In XRCC1, hMSH3 exon 21 and hMLH1, Africa clusters either with Middle East and Europe or with Southeast Asia, which could be related to the demographic history of human populations, whereby human migrations and genetic drift rather than selection would account for the observed differences.

  6. Adaption of SYBR Green-based reagent kit for real-time PCR quantitation of GC-rich DNA.

    PubMed

    Chang, G J; Seyfert, H M; Shen, X Z

    2015-07-28

    In the mammalian genome, approximately 50% of all genes are controlled by promoters with high GC contents. Analyzing the epigenetic mechanisms regulating their expression is difficult. Hence, we examined a method for stable quantification of such GC-rich DNA sequences. Quantification of DNA during real-time PCR is often based on reagent kits containing the fluorescent dye SYBR Green. However, these ready-made kits may not be suitable for amplifying DNA samples with a high GC content (>70%). DNA segments with eccentric GC contents are frequently found in proximal promoter areas, and their quantification may be necessary in chromatin accessibility by real-time polymerase chain reaction or chromatin immunoprecipitation analyses of epigenetic mechanisms of gene regulation. We therefore optimized the SYBR Green I FastStart reaction system by supplementing the system with dimethyl sulfoxide, betaine, and increased DNA polymerase content. Here, we describe the development of the assay and demonstrate its effectiveness for two different DNA templates, showing that these modifications allow for the reliable amplification and quantification of DNA with GC contents exceeding 70% using the LightCycler instrument.
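
    For readers who want to check whether a template falls into the GC-rich range mentioned above, a minimal sketch of the GC-content calculation follows. The function names are hypothetical, the 70% default simply mirrors the threshold quoted in the abstract, and ambiguous bases such as N are ignored by assumption.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


def is_gc_rich(seq, threshold=0.70):
    """True if the GC fraction of seq is strictly above the threshold."""
    return gc_content(seq) > threshold
```

    For example, `gc_content("GGCCAT")` evaluates to 4/6, which is below the 70% cut-off.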

  7. Counterintuitive DNA Sequence Dependence in Supercoiling-Induced DNA Melting

    PubMed Central

    Vlijm, Rifka; v.d. Torre, Jaco; Dekker, Cees

    2015-01-01

    The metabolism of DNA in cells relies on the balance between hybridized double-stranded DNA (dsDNA) and local de-hybridized regions of ssDNA that provide access to binding proteins. Traditional melting experiments, in which short pieces of dsDNA are heated up until the point of melting into ssDNA, have determined that AT-rich sequences have a lower binding energy than GC-rich sequences. In cells, however, the double-stranded backbone of DNA is destabilized by negative supercoiling, and not by temperature. To investigate what the effect of GC content is on DNA melting induced by negative supercoiling, we studied DNA molecules with a GC content ranging from 38% to 77%, using single-molecule magnetic tweezer measurements in which the length of a single DNA molecule is measured as a function of applied stretching force and supercoiling density. At low force (<0.5pN), supercoiling results in twisting of the dsDNA backbone and loop formation (plectonemes), without inducing any DNA melting. This process was not influenced by the DNA sequence. When negative supercoiling is introduced at increasing force, local melting of DNA is introduced. We measured for the different DNA molecules a characteristic force Fchar, at which negative supercoiling induces local melting of the dsDNA. Surprisingly, GC-rich sequences melt at lower forces than AT-rich sequences: Fchar = 0.56pN for 77% GC but 0.73pN for 38% GC. An explanation for this counterintuitive effect is provided by the realization that supercoiling densities of a few percent only induce melting of a few percent of the base pairs. As a consequence, denaturation bubbles occur in local AT-rich regions and the sequence-dependent effect arises from an increased DNA bending/torsional energy associated with the plectonemes. This new insight indicates that an increased GC-content adjacent to AT-rich DNA regions will enhance local opening of the double-stranded DNA helix. PMID:26513573

  8. East Asian mtDNA haplogroup determination in Koreans: haplogroup-level coding region SNP analysis and subhaplogroup-level control region sequence analysis.

    PubMed

    Lee, Hwan Young; Yoo, Ji-Eun; Park, Myung Jin; Chung, Ukhee; Kim, Chong-Youl; Shin, Kyoung-Jin

    2006-11-01

    The present study analyzed 21 coding region SNP markers and one deletion motif for the determination of East Asian mitochondrial DNA (mtDNA) haplogroups by designing three multiplex systems which apply single base extension methods. Using two multiplex systems, all 593 Korean mtDNAs were allocated into 15 haplogroups: M, D, D4, D5, G, M7, M8, M9, M10, M11, R, R9, B, A, and N9. As the D4 haplotypes occurred most frequently in Koreans, the third multiplex system was used to further define D4 subhaplogroups: D4a, D4b, D4e, D4g, D4h, and D4j. This method allowed the complementation of coding region information with control region mutation motifs and the resultant findings also suggest reliable control region mutation motifs for the assignment of East Asian mtDNA haplogroups. These three multiplex systems produce good results in degraded samples as they contain small PCR products (101-154 bp) for single base extension reactions. SNP scoring was performed in 101 old skeletal remains using these three systems to prove their utility in degraded samples. The sequence analysis of mtDNA control region with high incidence of haplogroup-specific mutations and the selective scoring of highly informative coding region SNPs using the three multiplex systems are useful tools for most applications involving East Asian mtDNA haplogroup determination and haplogroup-directed stringent quality control.

  9. Genotyping human ancient mtDNA control and coding region polymorphisms with a multiplexed Single-Base-Extension assay: the singular maternal history of the Tyrolean Iceman.

    PubMed

    Endicott, Phillip; Sanchez, Juan J; Pichler, Irene; Brotherton, Paul; Brooks, Jerome; Egarter-Vigl, Eduard; Cooper, Alan; Pramstaller, Peter

    2009-06-19

    Progress in the field of human ancient DNA studies has been severely restricted due to the myriad sources of potential contamination, and because of the pronounced difficulty in identifying authentic results. Improving the robustness of human aDNA results is a necessary pre-requisite to vigorously testing hypotheses about human evolution in Europe, including possible admixture with Neanderthals. This study approaches the problem of distinguishing between authentic and contaminating sequences from common European mtDNA haplogroups by applying a multiplexed Single-Base-Extension assay, containing both control and coding region sites, to DNA extracted from the Tyrolean Iceman. The multiplex assay developed for this study was able to confirm that the Iceman's mtDNA belongs to a new European mtDNA clade with a very limited distribution amongst modern data sets. Controlled contamination experiments show that the correct results are returned by the multiplex assay even in the presence of substantial amounts of exogenous DNA. The overall level of discrimination achieved by targeting both control and coding region polymorphisms in a single reaction provides a methodology capable of dealing with most cases of homoplasy prevalent in European haplogroups. The new genotyping results for the Iceman confirm the extreme fallibility of human aDNA studies in general, even when authenticated by independent replication. The sensitivity and accuracy of the multiplex Single-Base-Extension methodology forms part of an emerging suite of alternative techniques for the accurate retrieval of ancient DNA sequences from both anatomically modern humans and Neanderthals. The contamination of laboratories remains a pressing concern in aDNA studies, both in the pre and post-PCR environments, and the adoption of a forensic style assessment of a priori risks would significantly improve the credibility of results.

  10. Role of conserved non-coding DNA elements in the Foxp3 gene in regulatory T-cell fate.

    PubMed

    Zheng, Ye; Josefowicz, Steven; Chaudhry, Ashutosh; Peng, Xiao P; Forbush, Katherine; Rudensky, Alexander Y

    2010-02-11

    Immune homeostasis is dependent on tight control over the size of a population of regulatory T (T(reg)) cells capable of suppressing over-exuberant immune responses. The T(reg) cell subset is comprised of cells that commit to the T(reg) lineage by upregulating the transcription factor Foxp3 either in the thymus (tT(reg)) or in the periphery (iT(reg)). Considering a central role for Foxp3 in T(reg) cell differentiation and function, we proposed that conserved non-coding DNA sequence (CNS) elements at the Foxp3 locus encode information defining the size, composition and stability of the T(reg) cell population. Here we describe the function of three Foxp3 CNS elements (CNS1-3) in T(reg) cell fate determination in mice. The pioneer element CNS3, which acts to potently increase the frequency of T(reg) cells generated in the thymus and the periphery, binds c-Rel in in vitro assays. In contrast, CNS1, which contains a TGF-beta-NFAT response element, is superfluous for tT(reg) cell differentiation, but has a prominent role in iT(reg) cell generation in gut-associated lymphoid tissues. CNS2, although dispensable for Foxp3 induction, is required for Foxp3 expression in the progeny of dividing T(reg) cells. Foxp3 binds to CNS2 in a Cbf-beta-Runx1 and CpG DNA demethylation-dependent manner, suggesting that Foxp3 recruitment to this 'cellular memory module' facilitates the heritable maintenance of the active state of the Foxp3 locus and, therefore, T(reg) lineage stability. Together, our studies demonstrate that the composition, size and maintenance of the T(reg) cell population are controlled by Foxp3 CNS elements engaged in response to distinct cell-extrinsic or -intrinsic cues.

  11. DNA-LCEB: a high-capacity and mutation-resistant DNA data-hiding approach by employing encryption, error correcting codes, and hybrid twofold and fourfold codon-based strategy for synonymous substitution in amino acids.

    PubMed

    Hafeez, Ibbad; Khan, Asifullah; Qadir, Abdul

    2014-11-01

    Data-hiding in deoxyribonucleic acid (DNA) sequences can be used to develop an organic memory and to track parent genes in an offspring as well as in genetically modified organisms. However, the main concerns regarding data-hiding in DNA sequences are the survival of the organism and successful extraction of the watermark from DNA. This implies that the organism should live and reproduce without any functional disorder even in the presence of the embedded data. Consequently, performing synonymous substitution in amino acids for watermarking becomes a primary option. In this regard, a hybrid watermark embedding strategy that employs synonymous substitution in both twofold and fourfold codons of amino acids is proposed. This work thus presents a high-capacity and mutation-resistant watermarking technique, DNA-LCEB, for hiding secret information in DNA of living organisms. By employing the different types of synonymous codons of amino acids, the data storage capacity has been significantly increased. It is further observed that the proposed DNA-LCEB employing a combination of synonymous substitution, lossless compression, encryption, and Bose-Chaudhuri-Hocquenghem coding is secure and performs better in terms of both capacity and robustness compared to existing DNA data-hiding schemes. The proposed DNA-LCEB is tested against different mutations, including silent, missense, and nonsense mutations, and provides substantial improvement in terms of mutation detection/correction rate and bits per nucleotide. A web application for DNA-LCEB is available at http://111.68.99.218/DNA-LCEB.
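
    To make the idea of synonymous substitution concrete, the sketch below hides two bits in the third position of each fourfold-degenerate codon of a coding sequence, using the standard genetic code. It is only an illustration of the general principle: it omits the twofold codons, compression, encryption and error-correcting code that DNA-LCEB combines, and the bit-to-base mapping and function names are assumptions.

```python
# Codon families of the standard genetic code in which any third base
# gives the same amino acid (Ala, Gly, Pro, Thr, Val, Arg, Leu, Ser).
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CG", "CT", "TC"}
BASES = "ACGT"  # hypothetical mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def _codons(cds):
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def embed_bits(cds, bits):
    """Return (watermarked_cds, bits_used); the protein is unchanged."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string must have even length")
    out, pos = [], 0
    for codon in _codons(cds):
        if codon[:2] in FOURFOLD_PREFIXES and pos < len(bits):
            codon = codon[:2] + BASES[int(bits[pos:pos + 2], 2)]
            pos += 2
        out.append(codon)
    return "".join(out), pos


def extract_bits(cds, n_bits):
    """Read back n_bits from the third positions of fourfold codons."""
    bits = []
    for codon in _codons(cds):
        if codon[:2] in FOURFOLD_PREFIXES and 2 * len(bits) < n_bits:
            bits.append(format(BASES.index(codon[2]), "02b"))
    return "".join(bits)
```

    The caller has to compare `bits_used` with the message length, since a short sequence may not contain enough fourfold codons to hold the whole payload.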

  12. DNA vaccine coding for the rhesus prostate specific antigen delivered by intradermal electroporation in patients with relapsed prostate cancer.

    PubMed

    Eriksson, Fredrik; Tötterman, Thomas; Maltais, Anna-Karin; Pisa, Pavel; Yachnin, Jeffrey

    2013-08-20

    We tested safety, clinical efficacy and immunogenicity of a DNA vaccine coding for rhesus prostate specific antigen (PSA) delivered by intradermal injection and skin electroporation. Fifteen patients with biochemical relapse of prostate cancer without macroscopic disease participated in this phase I study. Patients were started on a 1 month course of androgen deprivation therapy (ADT) prior to treatment. Vaccine doses ranged from 50 to 1,600 μg. Study subjects received five vaccinations at four week intervals. All patients have had at least one year of follow-up. No systemic toxicity was observed. Discomfort from electroporation did not require analgesia or topical anesthetic. No clinically significant changes in PSA kinetics were observed as all patients required antiandrogen therapy shortly after completion of the 5 months of vaccination due to rising PSA. Immunogenicity, as measured by T-cell reactivity to the modified PSA peptide and to a mix of overlapping PSA peptides representing the full length protein, was observed in some patients. All but one patient had pre-study PSA specific T-cell reactivity. ADT alone resulted in increases in T-cell reactivity in most patients. Intradermal vaccination with skin electroporation is easily performed with only minor discomfort for the patient. Patients with biochemical relapse of prostate cancer are a good model for testing immune therapies.

  13. Evolutionary Conservation of a Coding Function for D4Z4, the Tandem DNA Repeat Mutated in Facioscapulohumeral Muscular Dystrophy

    PubMed Central

    Clapp, Jannine; Mitchell, Laura M.; Bolland, Daniel J.; Fantes, Judy; Corcoran, Anne E.; Scotting, Paul J.; Armour, John A. L.; Hewitt, Jane E.

    2007-01-01

    Facioscapulohumeral muscular dystrophy (FSHD) is caused by deletions within the polymorphic DNA tandem array D4Z4. Each D4Z4 repeat unit has an open reading frame (ORF), termed “DUX4,” containing two homeobox sequences. Because there has been no evidence of a transcript from the array, these deletions are thought to cause FSHD by a position effect on other genes. Here, we identify D4Z4 homologues in the genomes of rodents, Afrotheria (superorder of elephants and related species), and other species and show that the DUX4 ORF is conserved. Phylogenetic analysis suggests that primate and Afrotherian D4Z4 arrays are orthologous and originated from a retrotransposed copy of an intron-containing DUX gene, DUXC. Reverse-transcriptase polymerase chain reaction and RNA fluorescence and tissue in situ hybridization data indicate transcription of the mouse array. Together with the conservation of the DUX4 ORF for >100 million years, this strongly supports a coding function for D4Z4 and necessitates re-examination of current models of the FSHD disease mechanism. PMID:17668377

  14. A non-coding plastid DNA phylogeny of Asian Begonia (Begoniaceae): evidence for morphological homoplasy and sectional polyphyly.

    PubMed

    Thomas, D C; Hughes, M; Phutthai, T; Rajbhandary, S; Rubite, R; Ardi, W H; Richardson, J E

    2011-09-01

    Maximum likelihood and Bayesian analyses of non-coding plastid DNA sequence data based on a broad sampling of all major Asian Begonia sections (ndhA intron, ndhF-rpl32 spacer, rpl32-trnL spacer, 3977 aligned characters, 84 species) were used to reconstruct the phylogeny of Asian Begonia and to test the monophyly of major Asian Begonia sections. Ovary and fruit characters which are crucial in current sectional circumscriptions were mapped on the phylogeny to assess their utility in infrageneric classifications. The results indicate that the strong systematic emphasis placed on single, homoplasious characters such as undivided placenta lamellae (section Reichenheimia) and fleshy pericarps (section Sphenanthera), and the recognition of sections primarily based on a suite of plesiomorphic characters including three-locular ovaries with axillary, bilamellate placentae and dry, dehiscent pericarps (section Diploclinium), has resulted in the circumscription of several polyphyletic sections. Moreover, sections Platycentrum and Petermannia were recovered as paraphyletic. Because of the homoplasy of systematically important characters, current classifications have a certain diagnostic, but only poor predictive value. The presented phylogeny provides for the first time a reasonably resolved and supported phylogenetic framework for Asian Begonia which has the power to inform future taxonomic, biogeographic and evolutionary studies.

  15. Genome defense against exogenous nucleic acids in eukaryotes by non-coding DNA occurs through CRISPR-like mechanisms in the cytosol and the bodyguard protection in the nucleus.

    PubMed

    Qiu, Guo-Hua

    2016-01-01

    In this review, the protective function of the abundant non-coding DNA in the eukaryotic genome is discussed from the perspective of genome defense against exogenous nucleic acids. Peripheral non-coding DNA has been proposed to act as a bodyguard that protects the genome and the central protein-coding sequences from ionizing radiation-induced DNA damage. In the proposed mechanism of protection, the radicals generated by water radiolysis in the cytosol and IR energy are absorbed, blocked and/or reduced by peripheral heterochromatin; then, the DNA damage sites in the heterochromatin are removed and expelled from the nucleus to the cytoplasm through nuclear pore complexes, most likely through the formation of extrachromosomal circular DNA. To strengthen this hypothesis, this review summarizes the experimental evidence supporting the protective function of non-coding DNA against exogenous nucleic acids. Based on these data, I hypothesize herein about the presence of an additional line of defense formed by small RNAs in the cytosol in addition to their bodyguard protection mechanism in the nucleus. Therefore, exogenous nucleic acids may be initially inactivated in the cytosol by small RNAs generated from non-coding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. Exogenous nucleic acids may enter the nucleus, where some are absorbed and/or blocked by heterochromatin and others integrate into chromosomes. The integrated fragments and the sites of DNA damage are removed by repetitive non-coding DNA elements in the heterochromatin and excluded from the nucleus. Therefore, the normal eukaryotic genome and the central protein-coding sequences are triply protected by non-coding DNA against invasion by exogenous nucleic acids. This review provides evidence supporting the protective role of non-coding DNA in genome defense.

  16. New Insights into the Lake Chad Basin Population Structure Revealed by High-Throughput Genotyping of Mitochondrial DNA Coding SNPs

    PubMed Central

    Černý, Viktor; Carracedo, Ángel

    2011-01-01

    Background: Located in the Sudan belt, the Chad Basin forms a remarkable ecosystem, where several unique agricultural and pastoral techniques have been developed. Both from an archaeological and a genetic point of view, this region has been interpreted to be the center of a bidirectional corridor connecting West and East Africa, as well as a meeting point for populations coming from North Africa through the Saharan desert. Methodology/Principal Findings: Samples from twelve ethnic groups from the Chad Basin (n = 542) have been high-throughput genotyped for 230 coding region mitochondrial DNA (mtDNA) Single Nucleotide Polymorphisms (mtSNPs) using Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight (MALDI-TOF) mass spectrometry. This set of mtSNPs allowed for much better phylogenetic resolution than previous studies of this geographic region, enabling new insights into its population history. Notable haplogroup (hg) heterogeneity has been observed in the Chad Basin mirroring the different demographic histories of these ethnic groups. As estimated using a Bayesian framework, nomadic populations showed negative growth which was not always correlated to their estimated effective population sizes. Nomads also showed lower diversity values than sedentary groups. Conclusions/Significance: Compared to sedentary population, nomads showed signals of stronger genetic drift occurring in their ancestral populations. These populations, however, retained more haplotype diversity in their hypervariable segments I (HVS-I), but not their mtSNPs, suggesting a more ancestral ethnogenesis. Whereas the nomadic population showed a higher Mediterranean influence signaled mainly by sub-lineages of M1, R0, U6, and U5, the other populations showed a more consistent sub-Saharan pattern. Although lifestyle may have an influence on diversity patterns and hg composition, analysis of molecular variance has not identified these differences. The present study indicates that analysis of mt

  17. Isolation and characterization of a cDNA clone for the complete protein coding region of the delta subunit of the mouse acetylcholine receptor.

    PubMed Central

    LaPolla, R J; Mayne, K M; Davidson, N

    1984-01-01

    A mouse cDNA clone has been isolated that contains the complete coding region of a protein highly homologous to the delta subunit of the Torpedo acetylcholine receptor (AcChoR). The cDNA library was constructed in the vector lambda 10 from membrane-associated poly(A)+ RNA from BC3H-1 mouse cells. Surprisingly, the delta clone was selected by hybridization with cDNA encoding the gamma subunit of the Torpedo AcChoR. The nucleotide sequence of the mouse cDNA clone contains an open reading frame of 520 amino acids. This amino acid sequence exhibits 59% and 50% sequence homology to the Torpedo AcChoR delta and gamma subunits, respectively. However, the mouse nucleotide sequence has several stretches of high homology with the Torpedo gamma subunit cDNA, but not with delta. The mouse protein has the same general structural features as do the Torpedo subunits. It is encoded by a 3.3-kilobase mRNA. There is probably only one, but at most two, chromosomal genes coding for this or closely related sequences. PMID:6096870

  18. The Arabidopsis HOMOLOGY-DEPENDENT GENE SILENCING1 Gene Codes for an S-Adenosyl-l-Homocysteine Hydrolase Required for DNA Methylation-Dependent Gene Silencing

    PubMed Central

    Rocha, Pedro S.C.F.; Sheikh, Mazhar; Melchiorre, Rosalba; Fagard, Mathilde; Boutet, Stéphanie; Loach, Rebecca; Moffatt, Barbara; Wagner, Conrad; Vaucheret, Hervé; Furner, Ian

    2005-01-01

    Genes introduced into higher plant genomes can become silent (gene silencing) and/or cause silencing of homologous genes at unlinked sites (homology-dependent gene silencing or HDG silencing). Mutations of the HOMOLOGY-DEPENDENT GENE SILENCING1 (HOG1) locus relieve transcriptional gene silencing and methylation-dependent HDG silencing and result in genome-wide demethylation. The hog1 mutant plants also grow slowly and have low fertility and reduced seed germination. Three independent mutants of HOG1 were each found to have point mutations at the 3′ end of a gene coding for S-adenosyl-l-homocysteine (SAH) hydrolase, and hog1-1 plants show reduced SAH hydrolase activity. A transposon (hog1-4) and a T-DNA tag (hog1-5) in the HOG1 gene each behaved as zygotic embryo lethal mutants and could not be made homozygous. The results suggest that the homozygous hog1 point mutants are leaky and result in genome demethylation and poor growth and that homozygous insertion mutations result in zygotic lethality. Complementation of the hog1-1 point mutation with a T-DNA containing the gene coding for SAH hydrolase restored gene silencing, HDG silencing, DNA methylation, fast growth, and normal seed viability. The same T-DNA also complemented the zygotic embryo lethal phenotype of the hog1-4 tagged mutant. A model relating the HOG1 gene, DNA methylation, and methylation-dependent HDG silencing is presented. PMID:15659630

  19. Detecting selection in the blue crab, Callinectes sapidus, using DNA sequence data from multiple nuclear protein-coding genes.

    PubMed

    Yednock, Bree K; Neigel, Joseph E

    2014-01-01

    The identification of genes involved in the adaptive evolution of non-model organisms with uncharacterized genomes constitutes a major challenge. This study employed a rigorous and targeted candidate gene approach to test for positive selection on protein-coding genes of the blue crab, Callinectes sapidus. Four genes with putative roles in physiological adaptation to environmental stress were chosen as candidates. A fifth gene not expected to play a role in environmental adaptation was used as a control. Large samples (n>800) of DNA sequences from C. sapidus were used in tests of selective neutrality based on sequence polymorphisms. In combination with these, sequences from the congener C. similis were used in neutrality tests based on interspecific divergence. In multiple tests, significant departures from neutral expectations that were indicative of positive selection were found for the candidate gene trehalose 6-phosphate synthase (tps). These departures could not be explained by any of the historical population expansion or bottleneck scenarios that were evaluated in coalescent simulations. Evidence was also found for balancing selection at ATP-synthase subunit 9 (atps) using a maximum likelihood version of the Hudson, Kreitman, and Aguadé test, and positive selection favoring amino acid replacements within ATP/ADP translocase (ant) was detected using the McDonald-Kreitman test. In contrast, test statistics for the control gene, ribosomal protein L12 (rpl), which presumably has experienced the same demographic effects as the candidate loci, were not significantly different from neutral expectations and could readily be explained by demographic effects. Together, these findings demonstrate the utility of the candidate gene approach for investigating adaptation at the molecular level in a marine invertebrate for which extensive genomic resources are not available.

  20. Detecting Selection in the Blue Crab, Callinectes sapidus, Using DNA Sequence Data from Multiple Nuclear Protein-Coding Genes

    PubMed Central

    Yednock, Bree K.; Neigel, Joseph E.

    2014-01-01

    The identification of genes involved in the adaptive evolution of non-model organisms with uncharacterized genomes constitutes a major challenge. This study employed a rigorous and targeted candidate gene approach to test for positive selection on protein-coding genes of the blue crab, Callinectes sapidus. Four genes with putative roles in physiological adaptation to environmental stress were chosen as candidates. A fifth gene not expected to play a role in environmental adaptation was used as a control. Large samples (n>800) of DNA sequences from C. sapidus were used in tests of selective neutrality based on sequence polymorphisms. In combination with these, sequences from the congener C. similis were used in neutrality tests based on interspecific divergence. In multiple tests, significant departures from neutral expectations that were indicative of positive selection were found for the candidate gene trehalose 6-phosphate synthase (tps). These departures could not be explained by any of the historical population expansion or bottleneck scenarios that were evaluated in coalescent simulations. Evidence was also found for balancing selection at ATP-synthase subunit 9 (atps) using a maximum likelihood version of the Hudson, Kreitman, and Aguadé test, and positive selection favoring amino acid replacements within ATP/ADP translocase (ant) was detected using the McDonald-Kreitman test. In contrast, test statistics for the control gene, ribosomal protein L12 (rpl), which presumably has experienced the same demographic effects as the candidate loci, were not significantly different from neutral expectations and could readily be explained by demographic effects. Together, these findings demonstrate the utility of the candidate gene approach for investigating adaptation at the molecular level in a marine invertebrate for which extensive genomic resources are not available. PMID:24896825
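
    The McDonald-Kreitman test named in this abstract compares nonsynonymous and synonymous changes that are fixed between species with those that are polymorphic within a species. The sketch below is a minimal illustration of that 2x2 comparison, not the authors' pipeline: the function names are hypothetical, the counts are illustrative and are not taken from the blue crab study, and a two-sided Fisher exact test is assumed as the significance test.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

def mcdonald_kreitman(dn, ds, pn, ps):
    """Return (neutrality index, p-value).

    dn, ds: fixed nonsynonymous / synonymous differences between species.
    pn, ps: nonsynonymous / synonymous polymorphisms within a species.
    """
    neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, fisher_exact_two_sided(dn, ds, pn, ps)

# Illustrative counts only (assumption): an excess of fixed replacements.
ni, p = mcdonald_kreitman(dn=7, ds=17, pn=2, ps=42)
print(f"NI = {ni:.3f}, p = {p:.4f}")
```

    A neutrality index below 1 together with a small p-value is the pattern usually read as an excess of fixed amino acid replacements, which is the kind of signal the abstract reports for ant.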

  1. Cellulases and coding sequences

    SciTech Connect

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-01-01

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  2. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-02-20

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  3. C.U.R.R.F. (Codon Usage regarding Restriction Finder): a free Java(®)-based tool to detect potential restriction sites in both coding and non-coding DNA sequences.

    PubMed

    Gatter, Michael; Gatter, Thomas; Matthäus, Falk

    2012-10-01

    The synthesis of complete genes is becoming a more and more popular approach in heterologous gene expression. Reasons for this are the decreasing prices and the numerous advantages in comparison to classic molecular cloning methods. Two of these advantages are the possibility to adapt the codon usage to the host organism and the option to introduce restriction enzyme target sites of choice. C.U.R.R.F. (Codon Usage regarding Restriction Finder) is a free Java(®)-based software program which is able to detect possible restriction sites in both coding and non-coding DNA sequences by introducing multiple silent or non-silent mutations, respectively. The deviation of an alternative sequence containing a desired restriction motif from the sequence with the optimal codon usage is considered during the search of potential restriction sites in coding DNA and mRNA sequences as well as protein sequences. C.U.R.R.F. is available at http://www.zvm.tu-dresden.de/die_tu_dresden/fakultaeten/fakultaet_mathematik_und_naturwissenschaften/fachrichtung_biologie/mikrobiologie/allgemeine_mikrobiologie/currf.
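
    The idea of finding potential restriction sites through silent mutations can be sketched in a few lines. This is not C.U.R.R.F.'s algorithm or API (the tool is Java-based and its internals are not described here); it is a hedged Python illustration that assumes the standard genetic code, an in-frame coding sequence, and hypothetical function names. It ignores codon usage, which the real tool takes into account.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(seq):
    """Translate an in-frame DNA sequence, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - len(seq) % 3, 3))

def silent_site_variants(cds, site):
    """List (position, variant) pairs where `site` can be created silently.

    Each variant encodes the same protein as `cds` but contains `site`
    starting at `position` (0-based). Hypothetical helper for illustration.
    """
    results = []
    n_codons = len(cds) // 3
    for pos in range(len(cds) - len(site) + 1):
        first, last = pos // 3, (pos + len(site) - 1) // 3
        if last >= n_codons:
            break  # the site would run into an incomplete codon
        new_codons = []
        for ci in range(first, last + 1):
            orig = cds[3 * ci:3 * ci + 3]
            # Codon positions that the site forces to a specific base.
            fixed = {k: site[3 * ci + k - pos]
                     for k in range(3) if pos <= 3 * ci + k < pos + len(site)}
            options = [c for c, aa in CODON_TABLE.items()
                       if aa == CODON_TABLE[orig]
                       and all(c[k] == b for k, b in fixed.items())]
            if not options:
                break  # no synonymous codon fits the site here
            # Prefer the synonymous codon with the fewest base changes.
            new_codons.append(min(options, key=lambda c: sum(x != y for x, y in zip(c, orig))))
        else:
            variant = cds[:3 * first] + "".join(new_codons) + cds[3 * last + 3:]
            if variant != cds:
                results.append((pos, variant))
    return results

# Toy example: create an EcoRI site (GAATTC) in a sequence encoding M-E-F-stop.
hits = silent_site_variants("ATGGAGTTTTAA", "GAATTC")
print(hits)
```

    Choosing the synonymous codon with the fewest base changes is only one possible ranking; a tool that weighs codon usage would score the candidates differently.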

  4. Study characterizes long non-coding RNA’s response to DNA damage in colon cancer cells | Center for Cancer Research

    Cancer.gov

    Researchers led by Ashish Lal, Ph.D., Investigator in the Genetics Branch, have shown that when the DNA in human colon cancer cells is damaged, a long non-coding RNA (lncRNA) regulates the expression of genes that halt growth, which allows the cells to repair the damage and promote survival. Their findings suggest an important pro-survival function of a lncRNA in cancer cells.

  5. A homologue of the nuclear coded 49 kd subunit of bovine mitochondrial NADH-ubiquinone reductase is coded in chloroplast DNA.

    PubMed Central

    Fearnley, I M; Runswick, M J; Walker, J E

    1989-01-01

    The mitochondrial NADH-ubiquinone reductase (complex I) is an assembly of approximately 26 different polypeptides. In vertebrates and invertebrates, seven of its subunits are the products of genes in the mitochondrial DNA, and homologues of these genes have been found previously in the chloroplast genomes of Marchantia polymorpha and Nicotiana tabacum, although their function in the chloroplast is unknown. The remainder of the subunits of the mitochondrial complex are nuclear gene products that are imported into the organelle, amongst them the 49 kd subunit, a component of the iron-sulphur subcomplex of the enzyme. In the present work, the N-terminal sequence of this protein has been determined, and this has been used to design two mixtures of synthetic oligonucleotides, each containing 32 different sequences 17 bases long. These mixtures have been used as hybridization probes to isolate cDNA clones from a bovine library. The DNA sequences of these clones have been determined and they encode the mature 49 kd protein, with the exception of amino acids 1 and 2. The protein sequence of 430 amino acids is closely related to those of proteins that are encoded in open reading frames (ORFs) present in the chloroplast genomes of M. polymorpha and N. tabacum. Only one cysteine is conserved and the sequences provide no indication that the 49 kd protein contains iron-sulphur centres. These ORFs are found in the single copy regions of chloroplast DNA in close proximity to four of the homologues of the mammalian mitochondrial genes that encode subunits of complex I. (ABSTRACT TRUNCATED AT 250 WORDS) Images PMID:2498081

  6. Characterization of a cDNA clone coding for a mouse 85 kDa heat shock protein from a 3-methylcholanthrene-induced tumor

    SciTech Connect

    Moore, S.K.; Robinson, E.A.; Ullrich, S.J.; Appella, E.

    1986-05-01

    Heat shock proteins (hsp) of approx. 85 kDa have been found associated with steroid hormone receptors and with the src oncogene product. Recently, the authors have shown that a mouse tumor-associated transplantation antigen from a 3-methylcholanthrene-induced tumor shares amino acid sequence homology with the 85 kDa hsp. Amino acid sequences of peptides from this antigen were used to synthesize oligonucleotide probes to screen a mouse cDNA library. A cDNA clone coding for an 85 kDa mouse hsp has been isolated and its sequence determined. Predicted amino acid sequences from this cDNA clone share significant homology to the published 90 kDa hsp of Saccharomyces cerevisiae and to the 83 kDa hsp of Drosophila melanogaster. In addition, the predicted amino acid sequence at the carboxyl terminus shares identity with that of the 70 kDa hsp from various species as well as that of the Escherichia coli dnaK gene product. Northern blot analysis indicates that the mouse 85 kDa hsp is coded for by a kb mRNA. The size of the mRNA is indistinguishable between normal and malignant cells.

  7. HEXIM1 and NEAT1 Long Non-coding RNA Form a Multi-subunit Complex that Regulates DNA-Mediated Innate Immune Response.

    PubMed

    Morchikh, Mehdi; Cribier, Alexandra; Raffel, Raoul; Amraoui, Sonia; Cau, Julien; Severac, Dany; Dubois, Emeric; Schwartz, Olivier; Bennasser, Yamina; Benkirane, Monsef

    2017-08-03

    The DNA-mediated innate immune response underpins anti-microbial defenses and certain autoimmune diseases. Here we used immunoprecipitation, mass spectrometry, and RNA sequencing to identify a ribonuclear complex built around HEXIM1 and the long non-coding RNA NEAT1 that we dubbed the HEXIM1-DNA-PK-paraspeckle components-ribonucleoprotein complex (HDP-RNP). The HDP-RNP contains DNA-PK subunits (DNAPKc, Ku70, and Ku80) and paraspeckle proteins (SFPQ, NONO, PSPC1, RBM14, and MATRIN3). We show that binding of HEXIM1 to NEAT1 is required for its assembly. We further demonstrate that the HDP-RNP is required for the innate immune response to foreign DNA, through the cGAS-STING-IRF3 pathway. The HDP-RNP interacts with cGAS and its partner PQBP1, and their interaction is remodeled by foreign DNA. Remodeling leads to the release of paraspeckle proteins, recruitment of STING, and activation of DNAPKc and IRF3. Our study establishes the HDP-RNP as a key nuclear regulator of DNA-mediated activation of innate immune response through the cGAS-STING pathway. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. Massively parallel sequencing of the entire control region and targeted coding region SNPs of degraded mtDNA using a simplified library preparation method.

    PubMed

    Lee, Eun Young; Lee, Hwan Young; Oh, Se Yoon; Jung, Sang-Eun; Yang, In Seok; Lee, Yang-Han; Yang, Woo Ick; Shin, Kyoung-Jin

    2016-05-01

    The application of next-generation sequencing (NGS) to forensic genetics is being explored by an increasing number of laboratories because of the potential of high-throughput sequencing for recovering genetic information from multiple markers and multiple individuals in a single run. A cumbersome and technically challenging library construction process is required for NGS. In this study, we propose a simplified library preparation method for mitochondrial DNA (mtDNA) analysis that involves two rounds of PCR amplification. In the first-round of multiplex PCR, six fragments covering the entire mtDNA control region and 22 fragments covering interspersed single nucleotide polymorphisms (SNPs) in the coding region that can be used to determine global haplogroups and East Asian haplogroups were amplified using template-specific primers with read sequences. In the following step, indices and platform-specific sequences for the MiSeq(®) system (Illumina) were added by PCR. The barcoded library produced using this simplified workflow was successfully sequenced on the MiSeq system using the MiSeq Reagent Nano Kit v2. A total of 0.4 GB of sequences, 80.6% with base quality of >Q30, were obtained from 12 degraded DNA samples and mapped to the revised Cambridge Reference Sequence (rCRS). A relatively even read count was obtained for all amplicons, with an average coverage of 5200 × and a less than three-fold read count difference between amplicons per sample. Control region sequences were successfully determined, and all samples were assigned to the relevant haplogroups. In addition, enhanced discrimination was observed by adding coding region SNPs to the control region in in silico analysis. Because the developed multiplex PCR system amplifies small-sized amplicons (<250 bp), NGS analysis using the library preparation method described here allows mtDNA analysis using highly degraded DNA samples. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  9. A gene for a Class II DNA photolyase from Oryza sativa: cloning of the cDNA by dilution-amplification.

    PubMed

    Hirouchi, T; Nakajima, S; Najrana, T; Tanaka, M; Matsunaga, T; Hidema, J; Teranishi, M; Fujino, T; Kumagai, T; Yamamoto, K

    2003-07-01

    Ultraviolet radiation induces the formation of two classes of photoproducts in DNA: the cyclobutane pyrimidine dimer (CPD) and the pyrimidine [6-4] pyrimidone photoproduct (6-4 product). Many organisms produce enzymes, termed photolyases, which specifically bind to these lesions and split them via a UV-A/blue light-dependent mechanism, thereby reversing the damage. These photolyases are specific for either CPDs or 6-4 products. Two classes of photolyases (class I and class II) repair CPDs. A gene that encodes a protein with class II CPD photolyase activity in vitro has been cloned from several plants including Arabidopsis thaliana, Cucumis sativus and Chlamydomonas reinhardtii. We report here the isolation of a homolog of this gene from rice (Oryza sativa), which was cloned on the basis of sequence similarity and PCR-based dilution-amplification. The cDNA comprises a very GC-rich (75%) 5′ region, while the 3′ portion has a GC content of 50%. This gene encodes a protein with CPD photolyase activity when expressed in E. coli. The CPD photolyase gene encodes at least two types of mRNA, formed by alternative splicing of exon 5. One of the mRNAs encodes an ORF for 506 amino acid residues, while the other is predicted to code for 364 amino acid residues. The two RNAs occur in about equal amounts in O. sativa cells.
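
    The GC percentages quoted for the two ends of this cDNA are plain base-composition ratios, (G + C) / (A + C + G + T). The following is a minimal sketch of that calculation; the function names, the region boundary, and the toy sequence are assumptions for illustration and do not come from the rice photolyase cDNA.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        return 0.0
    return sum(b in "GC" for b in counted) / len(counted)

def gc_by_region(seq: str, boundary: int) -> tuple:
    """GC content of the 5' part (before boundary) and the 3' part (from boundary on)."""
    return gc_content(seq[:boundary]), gc_content(seq[boundary:])

# Toy sequence (assumption): a GC-rich 5' half followed by a balanced 3' half.
toy = "GCGCGCAT" + "ATGCATGC"
five_prime, three_prime = gc_by_region(toy, 8)
print(f"5' region: {five_prime:.0%}, 3' region: {three_prime:.0%}")
```

    Ambiguous symbols such as N are excluded from the denominator here; counting them would lower both values.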

  10. A sandwich-hybridization assay for simultaneous determination of HIV and tuberculosis DNA targets based on signal amplification by quantum dots-PowerVision™ polymer coding nanotracers.

    PubMed

    Yan, Zhongdan; Gan, Ning; Zhang, Huairong; Wang, De; Qiao, Li; Cao, Yuting; Li, Tianhua; Hu, Futao

    2015-09-15

    A novel sandwich-hybridization assay for simultaneous electrochemical detection of multiple DNA targets related to human immunodeficiency virus (HIV) and tuberculosis (TB) was developed based on different quantum dots-PowerVision(TM) polymer nanotracers. The polymer nanotracers were fabricated by immobilizing SH-labeled oligonucleotides (s-HIV or s-TB), which can partially hybridize with the respective target DNA (HIV or TB), on gold nanoparticles (Au NPs), which were then modified with PowerVision(TM) (PV) polymer-encapsulated quantum dots (CdS or PbS) as signal tags. PV is a dendrimer enzyme-linked polymer that can immobilize abundant QDs to amplify the stripping voltammetry signals from the metal ions (Pb or Cd). The capture probes were prepared by immobilizing SH-labeled oligonucleotides complementary to HIV and TB DNA on magnetic Fe3O4@Au beads (GMPs). After sandwich hybridization, the polymer nanotracers together with the HIV and TB DNA targets were simultaneously introduced onto the surface of the GMPs. The two encoding metal ions (Cd(2+) and Pb(2+)) were then used to differentiate the two target DNAs by their distinct anodic stripping voltammetric peaks at -0.84 V (Cd) and -0.61 V (Pb). Because of the excellent signal amplification of the polymer nanotracers and the high specificity for the DNA targets, this assay could detect target DNA as low as 0.2 femtomolar and exhibited excellent selectivity with a dynamic range from 0.5 fM to 500 pM. These results demonstrate that this electrochemical coding assay has great potential for screening DNA from additional viruses by changing the probes.

  11. Identification of a cDNA clone that contains the complete coding sequence for a 140-kD rat NCAM polypeptide

    PubMed Central

    1987-01-01

    Neural cell adhesion molecules (NCAMs) are cell surface glycoproteins that appear to mediate cell-cell adhesion. In vertebrates NCAMs exist in at least three different polypeptide forms of apparent molecular masses 180, 140, and 120 kD. The 180- and 140-kD forms span the plasma membrane whereas the 120-kD form lacks a transmembrane region. In this study, we report the isolation of NCAM clones from an adult rat brain cDNA library. Sequence analysis indicated that the longest isolate, pR18, contains a 2,574 nucleotide open reading frame flanked by 208 bases of 5' and 409 bases of 3' untranslated sequence. The predicted polypeptide encoded by clone pR18 contains a single membrane-spanning region and a small cytoplasmic domain (120 amino acids), suggesting that it codes for a full-length 140-kD NCAM form. In Northern analysis, probes derived from 5' sequences of pR18, which presumably code for extracellular portions of the molecule, hybridized to five discrete mRNA size classes (7.4, 6.7, 5.2, 4.3, and 2.9 kb) in adult rat brain but not to liver or muscle RNA. However, the 5.2- and 2.9-kb mRNA size classes did not hybridize to either a large restriction fragment or three oligonucleotides derived from the putative transmembrane coding region and regions that lie 3' to it. The 3' probes did hybridize to the 7.4-, 6.7-, and 4.3-kb message size classes. These combined results indicate that clone pR18 is derived from either the 7.4-, 6.7-, or 4.3-kb adult rat brain RNA size class. Comparison with chicken and mouse NCAM cDNA sequences suggests that pR18 represents the amino acid coding region of the 6.7- or 4.3-kb mRNA. The isolation of pR18, the first cDNA that contains the complete coding sequence of an NCAM polypeptide, unambiguously demonstrates the predicted linear amino acid sequence of this probable rat 140-kD polypeptide. This cDNA also contains a 30-base pair segment not found in NCAM cDNAs isolated from other species. The significance of this segment and other

  12. The Use and Effectiveness of Triple Multiplex System for Coding Region Single Nucleotide Polymorphism in Mitochondrial DNA Typing of Archaeologically Obtained Human Skeletons from Premodern Joseon Tombs of Korea

    PubMed Central

    Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon

    2015-01-01

    A previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of analyses on coding region SNP markers and control region mutation motifs. In this study, we tested whether the same triple multiplex analysis for coding region SNPs could also be applied to ancient samples from East Asia as a complement to sequence analysis of the mtDNA control region. From the study of Joseon skeleton samples, we found that the mtDNA haplogroup determined by coding region SNP markers falls within the same haplogroup that sequence analysis of the control region can assign. Considering that ancient samples in previous studies produced a considerable number of errors in control region mtDNA sequencing, coding region SNP analysis can be used as a good complement to conventional haplogroup determination, especially for archaeological human bone samples buried underground over long periods. PMID:26345190

  13. The Use and Effectiveness of Triple Multiplex System for Coding Region Single Nucleotide Polymorphism in Mitochondrial DNA Typing of Archaeologically Obtained Human Skeletons from Premodern Joseon Tombs of Korea.

    PubMed

    Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon

    2015-01-01

    A previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of analyses on coding region SNP markers and control region mutation motifs. In this study, we tested whether the same triple multiplex analysis for coding region SNPs could also be applied to ancient samples from East Asia as a complement to sequence analysis of the mtDNA control region. From the study of Joseon skeleton samples, we found that the mtDNA haplogroup determined by coding region SNP markers falls within the same haplogroup that sequence analysis of the control region can assign. Considering that ancient samples in previous studies produced a considerable number of errors in control region mtDNA sequencing, coding region SNP analysis can be used as a good complement to conventional haplogroup determination, especially for archaeological human bone samples buried underground over long periods.

  14. Genetic Code Evolution Reveals the Neutral Emergence of Mutational Robustness, and Information as an Evolutionary Constraint

    PubMed Central

    Massey, Steven E.

    2015-01-01

    The standard genetic code (SGC) is central to molecular biology and its origin and evolution is a fundamental problem in evolutionary biology, the elucidation of which promises to reveal much about the origins of life. In addition, we propose that study of its origin can also reveal some fundamental and generalizable insights into mechanisms of molecular evolution, utilizing concepts from complexity theory. The first is that beneficial traits may arise by non-adaptive processes, via a process of “neutral emergence”. The structure of the SGC is optimized for the property of error minimization, which reduces the deleterious impact of point mutations. Via simulation, it can be shown that genetic codes with error minimization superior to the SGC can emerge in a neutral fashion simply by a process of genetic code expansion via tRNA and aminoacyl-tRNA synthetase duplication, whereby similar amino acids are added to codons related to that of the parent amino acid. This process of neutral emergence has implications beyond that of the genetic code, as it suggests that not all beneficial traits have arisen by the direct action of natural selection; we term these “pseudaptations”, and discuss a range of potential examples. Secondly, consideration of genetic code deviations (codon reassignments) reveals that these are mostly associated with a reduction in proteome size. This code malleability implies the existence of a proteomic constraint on the genetic code, proportional to the size of the proteome (P), and that its reduction in size leads to an “unfreezing” of the codon – amino acid mapping that defines the genetic code, consistent with Crick’s Frozen Accident theory. The concept of a proteomic constraint may be extended to propose a general informational constraint on genetic fidelity, which may be used to explain variously, differences in mutation rates in genomes with differing proteome sizes, differences in DNA repair capacity and genome GC content

  15. Genetic code evolution reveals the neutral emergence of mutational robustness, and information as an evolutionary constraint.

    PubMed

    Massey, Steven E

    2015-04-24

    The standard genetic code (SGC) is central to molecular biology and its origin and evolution is a fundamental problem in evolutionary biology, the elucidation of which promises to reveal much about the origins of life. In addition, we propose that study of its origin can also reveal some fundamental and generalizable insights into mechanisms of molecular evolution, utilizing concepts from complexity theory. The first is that beneficial traits may arise by non-adaptive processes, via a process of "neutral emergence". The structure of the SGC is optimized for the property of error minimization, which reduces the deleterious impact of point mutations. Via simulation, it can be shown that genetic codes with error minimization superior to the SGC can emerge in a neutral fashion simply by a process of genetic code expansion via tRNA and aminoacyl-tRNA synthetase duplication, whereby similar amino acids are added to codons related to that of the parent amino acid. This process of neutral emergence has implications beyond that of the genetic code, as it suggests that not all beneficial traits have arisen by the direct action of natural selection; we term these "pseudaptations", and discuss a range of potential examples. Secondly, consideration of genetic code deviations (codon reassignments) reveals that these are mostly associated with a reduction in proteome size. This code malleability implies the existence of a proteomic constraint on the genetic code, proportional to the size of the proteome (P), and that its reduction in size leads to an "unfreezing" of the codon - amino acid mapping that defines the genetic code, consistent with Crick's Frozen Accident theory. The concept of a proteomic constraint may be extended to propose a general informational constraint on genetic fidelity, which may be used to explain variously, differences in mutation rates in genomes with differing proteome sizes, differences in DNA repair capacity and genome GC content between organisms, a

  16. Mitochondrial DNA of Clathrina clathrus (Calcarea, Calcinea): six linear chromosomes, fragmented rRNAs, tRNA editing, and a novel genetic code.

    PubMed

    Lavrov, Dennis V; Pett, Walker; Voigt, Oliver; Wörheide, Gert; Forget, Lise; Lang, B Franz; Kayal, Ehsan

    2013-04-01

    Sponges (phylum Porifera) are a large and ancient group of morphologically simple but ecologically important aquatic animals. Although their body plan and lifestyle are relatively uniform, sponges show extensive molecular and genetic diversity. In particular, mitochondrial genomes from three of the four previously studied classes of Porifera (Demospongiae, Hexactinellida, and Homoscleromorpha) have distinct gene contents, genome organizations, and evolutionary rates. Here, we report the mitochondrial genome of Clathrina clathrus (Calcinea, Clathrinidae), a representative of the fourth poriferan class, the Calcarea, which proves to be the most unusual. Clathrina clathrus mitochondrial DNA (mtDNA) consists of six linear chromosomes 7.6-9.4 kb in size and encodes at least 37 genes: 13 protein-coding genes, 2 ribosomal RNAs (rRNAs), and 24 transfer RNAs (tRNAs). Protein genes include atp9, which has now been found in all major sponge lineages, but no atp8. Our analyses further reveal the presence of a novel genetic code that involves unique reassignments of the UAG codons from termination to tyrosine and of the CGN codons from arginine to glycine. Clathrina clathrus mitochondrial rRNAs are encoded in three (srRNA) and ≥6 (lrRNA) fragments distributed out of order and on several chromosomes. The encoded tRNAs contain multiple mismatches in the aminoacyl acceptor stems that are repaired posttranscriptionally by 3'-end RNA editing. Although our analysis does not resolve the phylogenetic position of calcareous sponges, likely due to their high rates of mitochondrial sequence evolution, it confirms mtDNA as a promising marker for population studies in this group. The combination of unusual mitochondrial features in C. clathrus redefines the extremes of mtDNA evolution in animals and further argues against the idea of a "typical animal mtDNA."
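
    The two codon reassignments described in this abstract (UAG from termination to tyrosine, CGN from arginine to glycine) can be illustrated by overriding entries in a codon table. The sketch below is an assumption-laden illustration rather than the full C. clathrus mitochondrial code: it starts from the standard genetic code and applies only the two changes named above, so any other differences from the standard code are not represented. Function names and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (DNA alphabet).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def with_reassignments(table):
    """Copy `table` and apply the two reassignments named in the abstract."""
    variant = dict(table)
    variant["TAG"] = "Y"            # UAG: termination -> tyrosine
    for n in BASES:
        variant["CG" + n] = "G"     # CGN: arginine -> glycine
    return variant

def translate(seq, table):
    """Translate an in-frame DNA sequence with the given codon table."""
    return "".join(table[seq[i:i + 3]] for i in range(0, len(seq) - len(seq) % 3, 3))

toy = "ATGTAGCGATAA"  # toy sequence (assumption), not from the sponge genome
print(translate(toy, STANDARD))                      # standard reading
print(translate(toy, with_reassignments(STANDARD)))  # with the two reassignments
```

    Under the standard table the toy sequence stops at the second codon, whereas the modified table reads through it, which shows why applying the wrong table truncates or mistranslates predicted proteins.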

  17. H3.3 demarcates GC-rich coding and subtelomeric regions and serves as potential memory mark for virulence gene expression in Plasmodium falciparum

    PubMed Central

    Fraschka, Sabine Anne-Kristin; Henderson, Rob Wilhelmus Maria; Bártfai, Richárd

    2016-01-01

    Histones, by packaging and organizing the DNA into chromatin, serve as essential building blocks for eukaryotic life. The basic structure of the chromatin is established by four canonical histones (H2A, H2B, H3 and H4), while histone variants are more commonly utilized to alter the properties of specific chromatin domains. H3.3, a variant of histone H3, was found to have diverse localization patterns and functions across species but has been rather poorly studied in protists. Here we present the first genome-wide analysis of H3.3 in the malaria-causing, apicomplexan parasite, P. falciparum, which revealed a complex occupancy profile consisting of conserved and parasite-specific features. In contrast to other histone variants, PfH3.3 primarily demarcates euchromatic coding and subtelomeric repetitive sequences. Stable occupancy of PfH3.3 in these regions is largely uncoupled from the transcriptional activity and appears to be primarily dependent on the GC-content of the underlying DNA. Importantly, PfH3.3 specifically marks the promoter region of an active and poised, but not inactive antigenic variation (var) gene, thereby potentially contributing to immune evasion. Collectively, our data suggest that PfH3.3, together with other histone variants, indexes the P. falciparum genome to functionally distinct domains and contributes to a key survival strategy of this deadly pathogen. PMID:27555062

  18. Structural and functional analysis of four non-coding Y RNAs from Chinese hamster cells: identification, molecular dynamics simulations and DNA replication initiation assays.

    PubMed

    de Lima Neto, Quirino Alves; Duarte Junior, Francisco Ferreira; Bueno, Paulo Sérgio Alves; Seixas, Flavio Augusto Vicente; Kowalski, Madzia Pauline; Kheir, Eyemen; Krude, Torsten; Fernandez, Maria Aparecida

    2016-01-05

    The genes coding for Y RNAs are evolutionarily conserved in vertebrates. These non-coding RNAs are essential for the initiation of chromosomal DNA replication in vertebrate cells. However thus far, no information is available about Y RNAs in Chinese hamster cells, which have already been used to detect replication origins and alternative DNA structures around these sites. Here, we report the gene sequences and predicted structural characteristics of the Chinese hamster Y RNAs, and analyze their ability to support the initiation of chromosomal DNA replication in vitro. We identified DNA sequences in the Chinese hamster genome of four Y RNAs (chY1, chY3, chY4 and chY5) with upstream promoter sequences, which are homologous to the four main types of vertebrate Y RNAs. The chY1, chY3 and chY5 genes were highly conserved with their vertebrate counterparts, whilst the chY4 gene showed a relatively high degree of diversification from the other vertebrate Y4 genes. Molecular dynamics simulations suggest that chY4 RNA is structurally stable despite its evolutionarily divergent predicted stem structure. Of the four Y RNA genes present in the hamster genome, we found that only the chY1 and chY3 RNA were strongly expressed in the Chinese hamster GMA32 cell line, while expression of the chY4 and chY5 RNA genes was five orders of magnitude lower, suggesting that they may in fact not be expressed. We synthesized all four chY RNAs and showed that any of these four could support the initiation of DNA replication in an established human cell-free system. These data therefore establish that non-coding chY RNAs are stable structures and can substitute for human Y RNAs in a reconstituted cell-free DNA replication initiation system. The pattern of Y RNA expression and functionality is consistent with Y RNAs of other rodents, including mouse and rat.

  19. TTS Mapping: integrative WEB tool for analysis of triplex formation target DNA Sequences, G-quadruplets and non-protein coding regulatory DNA elements in the human genome

    PubMed Central

    2009-01-01

    Background: DNA triplexes can naturally occur, co-localize and interact with many other regulatory DNA elements (e.g. G-quadruplex (G4) DNA motifs), specific DNA-binding proteins (e.g. transcription factors (TFs)), and micro-RNA (miRNA) precursors. Specific genome localizations of triplex target DNA sites (TTSs) may cause abnormalities in a double-helix DNA structure and can be directly involved in some human diseases. However, genome localization of specific TTSs, their interconnection with regulatory DNA elements and physiological roles in a cell are poorly defined. Therefore, it is important to identify a comprehensive and reliable catalogue of specific potential TTSs (pTTSs) and their co-localization patterns with other regulatory DNA elements in the human genome. Results: The "TTS mapping" database is a web-based search engine developed here, which is aimed to find and annotate pTTSs within a region of interest of the human genome. The engine provides descriptive statistics of pTTSs in a given region and its sequence context. Different annotation tracks of TTS-overlapping gene region(s), G4 motifs, CpG Island, miRNA precursors, miRNA targets, transcription factor binding sites (TFBSs), Single Nucleotide Polymorphisms (SNPs), small nucleolar RNAs (snoRNA), and repeat elements are also mapped based onto a sequence location provided by UCSC genome browser, G4 database http://www.quadruplex.org and several other datasets. The results pages provide links to UCSC genome browser annotation tracks and relative DBs. BLASTN program was included to check the uniqueness of a given pTTS in the human genome. Recombination- and mutation-prone genes (e.g. EVI-1, MYC) were found to be significantly enriched by TTSs and multiple co-occurring with our regulatory DNA elements. TTS mapping reveals that a high-complementary and evolutionarily conserved polypurine and polypyrimidine DNA sequence pair linked by a non-conserved short DNA sequence can form miR-483 transcribed from intron 2 of

  20. TTS mapping: integrative WEB tool for analysis of triplex formation target DNA sequences, G-quadruplets and non-protein coding regulatory DNA elements in the human genome.

    PubMed

    Jenjaroenpun, Piroon; Kuznetsov, Vladimir A

    2009-12-03

    DNA triplexes can naturally occur, co-localize and interact with many other regulatory DNA elements (e.g. G-quadruplex (G4) DNA motifs), specific DNA-binding proteins (e.g. transcription factors (TFs)), and micro-RNA (miRNA) precursors. Specific genome localizations of triplex target DNA sites (TTSs) may cause abnormalities in a double-helix DNA structure and can be directly involved in some human diseases. However, genome localization of specific TTSs, their interconnection with regulatory DNA elements and physiological roles in a cell are poorly defined. Therefore, it is important to identify a comprehensive and reliable catalogue of specific potential TTSs (pTTSs) and their co-localization patterns with other regulatory DNA elements in the human genome. The "TTS mapping" database is a web-based search engine developed here, which is aimed to find and annotate pTTSs within a region of interest of the human genome. The engine provides descriptive statistics of pTTSs in a given region and its sequence context. Different annotation tracks of TTS-overlapping gene region(s), G4 motifs, CpG Island, miRNA precursors, miRNA targets, transcription factor binding sites (TFBSs), Single Nucleotide Polymorphisms (SNPs), small nucleolar RNAs (snoRNA), and repeat elements are also mapped based onto a sequence location provided by UCSC genome browser, G4 database http://www.quadruplex.org and several other datasets. The results pages provide links to UCSC genome browser annotation tracks and relative DBs. BLASTN program was included to check the uniqueness of a given pTTS in the human genome. Recombination- and mutation-prone genes (e.g. EVI-1, MYC) were found to be significantly enriched by TTSs and multiple co-occurring with our regulatory DNA elements. TTS mapping reveals that a high-complementary and evolutionarily conserved polypurine and polypyrimidine DNA sequence pair linked by a non-conserved short DNA sequence can form miR-483 transcribed from intron 2 of IGF2 gene and bound
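
    The polypurine and polypyrimidine runs that the abstract above treats as triplex targets can be illustrated with a simple scan. The Python sketch below is not the TTS mapping engine's algorithm; the function name `find_homopurine_tracts` and the `min_len` threshold of 15 are assumptions chosen only for illustration.

```python
import re


def find_homopurine_tracts(seq, min_len=15):
    """Return (start, end, kind) for uninterrupted purine or pyrimidine runs.

    Coordinates are 0-based, end-exclusive. min_len is a hypothetical
    threshold, not a value taken from the TTS mapping paper.
    """
    seq = seq.upper()
    # [AG] = purines, [CT] = pyrimidines; each run must be at least min_len long.
    pattern = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (min_len, min_len))
    hits = []
    for match in pattern.finditer(seq):
        kind = "polypurine" if match.group()[0] in "AG" else "polypyrimidine"
        hits.append((match.start(), match.end(), kind))
    return hits


if __name__ == "__main__":
    demo = "TT" + "AGGAAGGAGAAGGAGA" + "TT"  # made-up sequence
    for start, end, kind in find_homopurine_tracts(demo):
        print(start, end, kind)
```

    A real triplex-target search would also consider mismatches, strand, and genomic uniqueness, which this sketch ignores.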

  1. DNA Repair Is Associated with Information Content in Bacteria, Archaea, and DNA Viruses.

    PubMed

    Acosta, Sharlene; Carela, Miguelina; Garcia-Gonzalez, Aurian; Gines, Mariela; Vicens, Luis; Cruet, Ricardo; Massey, Steven E

    2015-01-01

    The concept of a "proteomic constraint" proposes that DNA repair capacity is positively correlated with the information content of a genome, which can be approximated to the size of the proteome (P). This in turn implies that DNA repair genes are more likely to be present in genomes with larger values of P. This stands in contrast to the common assumption that informational genes have a core function and so are evenly distributed across organisms. We examined the presence/absence of 18 DNA repair genes in bacterial genomes. A positive relationship between gene presence and P was observed for 17 genes in the total dataset, and 16 genes when only nonintracellular bacteria were examined. A marked reduction of DNA repair genes was observed in intracellular bacteria, consistent with their reduced value of P. We also examined archaeal and DNA virus genomes, and show that the presence of DNA repair genes is likewise related to a larger value of P. In addition, the products of the bacterial genes mutY, vsr, and ndk, involved in the correction of GC/AT mutations, are strongly associated with reduced genome GC content. We therefore propose that a reduction in information content leads to a loss of DNA repair genes and indirectly to a reduction in genome GC content in bacteria by exposure to the underlying AT mutation bias. The reduction in P may also indirectly lead to the increase in substitution rates observed in intracellular bacteria via loss of DNA repair genes.

  2. Molecular cloning and expression in photosynthetic bacteria of a soybean cDNA coding for phytoene desaturase, an enzyme of the carotenoid biosynthesis pathway.

    PubMed Central

    Bartley, G E; Viitanen, P V; Pecker, I; Chamovitz, D; Hirschberg, J; Scolnik, P A

    1991-01-01

    Carotenoids are orange, yellow, or red photo-protective pigments present in all plastids. The first carotenoid of the pathway is phytoene, a colorless compound that is converted into colored carotenoids through a series of desaturation reactions. Genes coding for carotenoid desaturases have been cloned from microbes but not from plants. We report the cloning of a cDNA for pds1, a soybean (Glycine max) gene that, based on a complementation assay using the photosynthetic bacterium Rhodobacter capsulatus, codes for an enzyme that catalyzes the two desaturation reactions that convert phytoene into zeta-carotene, a yellow carotenoid. The 2281-base-pair cDNA clone analyzed contains an open reading frame with the capacity to code for a 572-residue protein of predicted Mr 63,851. Alignment of the deduced Pds1 peptide sequence with the sequences of fungal and bacterial carotenoid desaturases revealed conservation of several amino acid residues, including a dinucleotide-binding motif that could mediate binding to FAD. The Pds1 protein is synthesized in vitro as a precursor that, upon import into isolated chloroplasts, is processed to a smaller mature form. Hybridization of the pds1 cDNA to genomic blots indicated that this gene is a member of a low-copy-number gene family. One of these loci was genetically mapped using restriction fragment length polymorphisms between Glycine max and Glycine soja. We conclude that pds1 is a nuclear gene encoding a phytoene desaturase enzyme that, like its microbial counterparts, contains sequence motifs characteristic of flavoproteins. PMID:1862081

  3. Identification of an androgen-repressed mRNA in rat ventral prostate as coding for sulphated glycoprotein 2 by cDNA cloning and sequence analysis.

    PubMed Central

    Bettuzzi, S; Hiipakka, R A; Gilna, P; Liao, S T

    1989-01-01

    The concentrations of a small number of mRNAs in the rat ventral prostate increase after castration and then decrease upon androgen treatment. Since the repression of specific gene expression may be important in the regulation of organ growth, we have cloned a cDNA for an androgen-repressed mRNA, the concentration of which increased 17-fold 4 days after castration, and this increase was reversed rapidly by androgen treatment. By sequence analysis the androgen-repressed mRNA was identified as that coding for sulphated glycoprotein 2. PMID:2920020

  4. Cloning and characterization of a cDNA coding 3-hydroxy-3-methylglutary CoA reductase involved in glycyrrhizic acid biosynthesis in Glycyrrhiza uralensis.

    PubMed

    Liu, Ying; Xu, Qiao-Xian; Xi, Pei-Yu; Chen, Hong-Hao; Liu, Chun-Sheng

    2013-05-01

    The roots of Glycyrrhiza uralensis are widely used in Chinese medicine for clearing heat, detoxifying, relieving cough, dispelling sputum and tonifying the spleen and stomach. Glycyrrhiza uralensis owes these potent actions to its various active secondary metabolites, especially glycyrrhizic acid. In the present study, we cloned the cDNA coding for 3-hydroxy-3-methylglutaryl CoA reductase (HMGR), which is involved in glycyrrhizic acid biosynthesis in Glycyrrhiza uralensis. The corresponding cDNA was expressed in Escherichia coli as fusion proteins. Recombinant HMGR exhibited catalytic activity in the reduction of HMG-CoA to mevalonic acid (MVA), just as HMGR isolated from other species does. Because the HMGR gene is very important in the biosynthesis of glycyrrhizic acid in Glycyrrhiza uralensis, this work is significant for further studies concerned with strengthening the efficacy of Glycyrrhiza uralensis by increasing its glycyrrhizic acid content and with exploring the biosynthesis of glycyrrhizic acid in vitro.

  5. DNMT3B interacts with constitutive centromere protein CENP-C to modulate DNA methylation and the histone code at centromeric regions.

    PubMed

    Gopalakrishnan, Suhasni; Sullivan, Beth A; Trazzi, Stefania; Della Valle, Giuliano; Robertson, Keith D

    2009-09-01

    DNA methylation is an epigenetically imposed mark of transcriptional repression that is essential for maintenance of chromatin structure and genomic stability. Genome-wide methylation patterns are mediated by the combined action of three DNA methyltransferases: DNMT1, DNMT3A and DNMT3B. Compelling links exist between DNMT3B and chromosome stability as emphasized by the mitotic defects that are a hallmark of ICF syndrome, a disease arising from germline mutations in DNMT3B. Centromeric and pericentromeric regions are essential for chromosome condensation and the fidelity of segregation. Centromere regions contain distinct epigenetic marks, including dense DNA hypermethylation, yet the mechanisms by which DNA methylation is targeted to these regions remains largely unknown. In the present study, we used a yeast two-hybrid screen and identified a novel interaction between DNMT3B and constitutive centromere protein CENP-C. CENP-C is itself essential for mitosis. We confirm this interaction in mammalian cells and map the domains responsible. Using siRNA knock downs, bisulfite genomic sequencing and ChIP, we demonstrate for the first time that CENP-C recruits DNA methylation and DNMT3B to both centromeric and pericentromeric satellite repeats and that CENP-C and DNMT3B regulate the histone code in these regions, including marks characteristic of centromeric chromatin. Finally, we demonstrate that loss of CENP-C or DNMT3B leads to elevated chromosome misalignment and segregation defects during mitosis and increased transcription of centromeric repeats. Taken together, our data reveal a novel mechanism by which DNA methylation is targeted to discrete regions of the genome and contributes to chromosomal stability.

  6. Cloning and Stable Expression of cDNA Coding For Platelet Endothelial Cell Adhesion Molecule -1 (PECAM-1, CD31) in NIH-3T3 Cell Line.

    PubMed

    Salehi-Lalemarzi, Hamed; Shanehbandi, Dariush; Shafaghat, Farzaneh; Abbasi-Kenarsari, Hajar; Baradaran, Behzad; Movassaghpour, Ali Akbar; Kazemi, Tohid

    2015-06-01

    PECAM-1 (CD31) is a glycoprotein expressed on endothelial and bone marrow precursor cells. It plays important roles in angiogenesis, maintenance and integration of the cytoskeleton and direction of leukocytes to the site of inflammation. We aimed to clone the cDNA coding for human CD31 from KG1a for further subcloning and expression in NIH-3T3 mouse cell line. CD31 cDNA was cloned from KG1a cell line after total RNA extraction and cDNA synthesis. Pfu DNA polymerase-amplified specific band was ligated to pGEMT-easy vector and sub-cloned in pCMV6-Neo expression vector. After transfection of NIH-3T3 cells using 3 μg of recombinant construct and 6 μl of JetPEI transfection reagent, stable expression was obtained by selection of cells by G418 antibiotic and confirmed by surface flow cytometry. 2235 bp specific band was aligned completely to human CD31 reference sequence in NCBI database. Transient and stable expression of human CD31 on transfected NIH-3T3 mouse fibroblast cells was achieved (23% and 96%, respectively) as shown by flow cytometry. Due to murine origin of NIH-3T3 cell line, CD31-expressing NIH-3T3 cells could be useful as immunogen in production of diagnostic monoclonal antibodies against human CD31, with no need for purification of recombinant proteins.

  7. Cloning and expression of a cDNA coding for the human platelet-derived growth factor receptor: Evidence for more than one receptor class

    SciTech Connect

    Gronwald, R.G.K.; Grant, F.J.; Haldeman, B.A.; Hart, C.E.; O'Hara, P.J.; Hagen, F.S.; Ross, R.; Bowen-Pope, D.F.; Murray, M.J.

    1988-05-01

    The complete nucleotide sequence of a cDNA encoding the human platelet-derived growth factor (PDGF) receptor is presented. The cDNA contains an open reading frame that codes for a protein of 1106 amino acids. Comparison to the mouse PDGF receptor reveals an overall amino acid sequence identity of 86%. This sequence identity rises to 98% in the cytoplasmic split tyrosine kinase domain. RNA blot hybridization analysis of poly(A)+ RNA from human dermal fibroblasts detects a major and a minor transcript using the cDNA as a probe. Baby hamster kidney cells, transfected with an expression vector containing the receptor cDNA, express an ~190-kDa cell surface protein that is recognized by an anti-human PDGF receptor antibody. The recombinant PDGF receptor is functional in the transfected baby hamster kidney cells as demonstrated by ligand-induced phosphorylation of the receptor. Binding properties of the recombinant PDGF receptor were also assessed with pure preparations of BB and AB isoforms of PDGF. Unlike human dermal fibroblasts, which bind both isoforms with high affinity, the transfected baby hamster kidney cells bind only the BB isoform of PDGF with high affinity. This observation is consistent with the existence of more than one PDGF receptor class.

  8. Progressive multifocal leukoencephalopathy. Diagnosis by in situ hybridization with a biotinylated JC virus DNA probe using an automated Histomatic Code-On slide stainer.

    PubMed

    Hulette, C M; Downey, B T; Burger, P C

    1991-08-01

    The accurate surgical pathological diagnosis of progressive multifocal leukoencephalopathy (PML) depends on the demonstration of pathognomonic histological features in cerebral biopsy tissue. The diagnosis may be difficult, however, if only small tissue fragments are submitted from the center of a demyelinating lesion. Previous studies by other authors have established that in situ hybridization with a biotinylated JC virus DNA probe can be a valuable diagnostic adjunct because it identifies the virally infected cells with great specificity and does not depend on the larger specimen, which may be necessary for a firm histological diagnosis. To confirm and extend these findings, we have used a commercially available biotinylated JC virus DNA probe to demonstrate the presence of viral DNA in formalin-fixed, paraffin-embedded tissues from four open biopsies, four needle biopsies, and two autopsies of patients with PML. With the goal of making this procedure applicable to the general surgical pathology laboratory, this method was adapted to the Histomatic Code-On slide stainer. The Histomatic is a programmable, robotic instrument with walk-away capability for hybridization histochemistry. Operation of this instrument requires the same expertise as execution of immunocytochemistry. With the advent of commercially available JC virus DNA probes and an automated system for hybridization histochemistry, this technology for diagnosis of PML may enter the routine diagnostic surgical pathology laboratory.

  9. DNA.

    ERIC Educational Resources Information Center

    Felsenfeld, Gary

    1985-01-01

    Structural form, bonding scheme, and chromatin structure of and gene-modification experiments with deoxyribonucleic acid (DNA) are described. Indicates that DNA's double helix is variable and also flexible as it interacts with regulatory and other molecules to transfer hereditary messages. (DH)

  11. Temporal and spatial trends in prey composition of wahoo Acanthocybium solandri: a diet analysis from the central North Pacific Ocean using visual and DNA bar-coding techniques.

    PubMed

    Oyafuso, Z S; Toonen, R J; Franklin, E C

    2016-04-01

    A diet analysis was conducted on 444 wahoo Acanthocybium solandri caught in the central North Pacific Ocean longline fishery and a nearshore troll fishery surrounding the Hawaiian Islands from June to December 2014. In addition to traditional observational methods of stomach contents, a DNA bar-coding approach was integrated into the analysis by sequencing the cytochrome c oxidase subunit 1 (COI) region of the mtDNA genome to taxonomically identify individual prey items that could not be classified visually to species. For nearshore-caught A. solandri, juvenile pre-settlement reef fish species from various families dominated the prey composition during the summer months, followed primarily by Carangidae in autumn months. Gempylidae, Echeneidae and Scombridae were dominant prey taxa from the offshore fishery. Molidae was a common prey family found in stomachs collected north-east of the Hawaiian Archipelago while tetraodontiform reef fishes, known to have extended pelagic stages, were prominent prey items south-west of the Hawaiian Islands. The diet composition of A. solandri was indicative of an adaptive feeder and thus revealed dominant geographic and seasonal abundances of certain taxa from various ecosystems in the marine environment. The addition of molecular bar-coding to the traditional visual method of prey identifications allowed for a more comprehensive range of the prey field of A. solandri to be identified and should be used as a standard component in future diet studies.

  12. Characterization of Non-coding DNA Satellites Associated with Sweepoviruses (Genus Begomovirus, Geminiviridae) - Definition of a Distinct Class of Begomovirus-Associated Satellites.

    PubMed

    Lozano, Gloria; Trenado, Helena P; Fiallo-Olivé, Elvira; Chirinos, Dorys; Geraud-Pouey, Francis; Briddon, Rob W; Navas-Castillo, Jesús

    2016-01-01

    Begomoviruses (family Geminiviridae) are whitefly-transmitted, plant-infecting single-stranded DNA viruses that cause crop losses throughout the warmer parts of the World. Sweepoviruses are a phylogenetically distinct group of begomoviruses that infect plants of the family Convolvulaceae, including sweet potato (Ipomoea batatas). Two classes of subviral molecules are often associated with begomoviruses, particularly in the Old World; the betasatellites and the alphasatellites. An analysis of sweet potato and Ipomoea indica samples from Spain and Merremia dissecta samples from Venezuela identified small non-coding subviral molecules in association with several distinct sweepoviruses. The sequences of 18 clones were obtained and found to be structurally similar to tomato leaf curl virus-satellite (ToLCV-sat, the first DNA satellite identified in association with a begomovirus), with a region with significant sequence identity to the conserved region of betasatellites, an A-rich sequence, a predicted stem-loop structure containing the nonanucleotide TAATATTAC, and a second predicted stem-loop. These sweepovirus-associated satellites join an increasing number of ToLCV-sat-like non-coding satellites identified recently. Although sharing some features with betasatellites, evidence is provided to suggest that the ToLCV-sat-like satellites are distinct from betasatellites and should be considered a separate class of satellites, for which the collective name deltasatellites is proposed.

  13. Characterization of Non-coding DNA Satellites Associated with Sweepoviruses (Genus Begomovirus, Geminiviridae) – Definition of a Distinct Class of Begomovirus-Associated Satellites

    PubMed Central

    Lozano, Gloria; Trenado, Helena P.; Fiallo-Olivé, Elvira; Chirinos, Dorys; Geraud-Pouey, Francis; Briddon, Rob W.; Navas-Castillo, Jesús

    2016-01-01

    Begomoviruses (family Geminiviridae) are whitefly-transmitted, plant-infecting single-stranded DNA viruses that cause crop losses throughout the warmer parts of the World. Sweepoviruses are a phylogenetically distinct group of begomoviruses that infect plants of the family Convolvulaceae, including sweet potato (Ipomoea batatas). Two classes of subviral molecules are often associated with begomoviruses, particularly in the Old World; the betasatellites and the alphasatellites. An analysis of sweet potato and Ipomoea indica samples from Spain and Merremia dissecta samples from Venezuela identified small non-coding subviral molecules in association with several distinct sweepoviruses. The sequences of 18 clones were obtained and found to be structurally similar to tomato leaf curl virus-satellite (ToLCV-sat, the first DNA satellite identified in association with a begomovirus), with a region with significant sequence identity to the conserved region of betasatellites, an A-rich sequence, a predicted stem–loop structure containing the nonanucleotide TAATATTAC, and a second predicted stem–loop. These sweepovirus-associated satellites join an increasing number of ToLCV-sat-like non-coding satellites identified recently. Although sharing some features with betasatellites, evidence is provided to suggest that the ToLCV-sat-like satellites are distinct from betasatellites and should be considered a separate class of satellites, for which the collective name deltasatellites is proposed. PMID:26925037

  14. Isolation of cDNA clones coding for the alpha and beta chains of human propionyl-CoA carboxylase: chromosomal assignments and DNA polymorphisms associated with PCCA and PCCB genes.

    PubMed Central

    Lamhonwah, A M; Barankiewicz, T J; Willard, H F; Mahuran, D J; Quan, F; Gravel, R A

    1986-01-01

    Propionyl-CoA carboxylase [PCC, propanoyl-CoA:carbon-dioxide ligase (ADP-forming), EC 6.4.1.3] is a biotin-dependent enzyme involved in the degradation of branched-chain amino acids, fatty acids with odd-numbered chain lengths, and other metabolites. Inherited deficiency of the enzyme results in propionic acidemia, an autosomal recessive disorder showing considerable clinical heterogeneity. To facilitate investigations of enzyme structure and the nature of mutation in propionic acidemia, we have isolated cDNA clones coding for the alpha and beta polypeptides of human PCC. Sequences of two peptides derived from human liver PCC were used to specify oligonucleotide probes that were then used to screen a human fibroblast cDNA library. Two classes of cDNA clones were thus identified. One class contained the anticipated Ala-Met-Lys-Met sequence, corresponding to the biotin binding site found in several biotin-dependent carboxylases, thus confirming the alpha-chain assignment of these clones. In addition, they contained the deduced amino acid sequence of two of the sequenced peptides, including that of one of the oligonucleotide probes. The second class, coding for the beta polypeptide, contained the sequences of four peptides, including the sequence corresponding to the other oligonucleotide probe. Blot hybridization of RNA from normal human fibroblasts revealed a single mRNA species of 2.9 kilobases coding for the alpha polypeptide and two species of 4.5 and 2.0 kilobases detected for the beta polypeptide. By use of a panel of somatic mouse-human hybrids, the human gene encoding the alpha polypeptide (PCCA) was localized to chromosome 13, while the gene encoding the beta polypeptide (PCCB) was assigned to chromosome 3. Restriction fragment length polymorphisms were identified, at both PCCA and PCCB, that should prove useful to individual families at risk for propionic acidemia. PMID:3460076

  15. DNA base composition, nature of intracellular DNA, morphology, and classification of bacteriophages infecting Micrococcus luteus.

    PubMed

    Compton, S W; Mayo, J A; Ehrlich, M; Ackermann, H W; Tremblay, L; Cords, C E; Scaletti, J V

    1979-09-01

    Ten bacteriophages infecting Micrococcus luteus have been characterized. All phages contain double-stranded DNA, of 64.3–73.5 mol% guanine plus cytosine (GC). The DNA of phage N7 has the highest GC content reported for any bacterial virus. No unusual bases have been found. The intracellular replicating DNAs of six phages are covalently closed circular molecules. All 10 phages have isometric, probably icosahedral, heads and long, flexible, noncontractile tails and can be sorted into two morphological groups based on size and presence or absence of a collar. Host-range studies indicate six host-range groups.
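
    The mol% GC figure quoted in the abstract above is the share of guanine plus cytosine among all bases. As a minimal sketch of that calculation from sequence data (the abstract does not say how the original values were measured, and the function name `gc_content` and the example sequence are made up for illustration):

```python
def gc_content(seq):
    """Return GC content as mol% (0-100) of the unambiguous bases in seq."""
    # Keep only A, C, G, T; ambiguity codes such as N are skipped (an assumption).
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


if __name__ == "__main__":
    example = "GCGCGGCCATGCGCCGTA"  # made-up sequence, not phage data
    print(f"{gc_content(example):.1f} mol% GC")
```

    Skipping ambiguity codes keeps the denominator restricted to bases that were actually called; other conventions are possible.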

  16. Molecular cloning and expression in Escherichia coli of the cDNA coding for rat lipocortin I (calpactin II).

    PubMed

    Shimizu, Y; Takabayashi, E; Yano, S Y; Shimizu, N; Yamada, K; Gushima, H

    1988-05-15

    Lipocortins (LC) are a family of proteins that were initially described to be induced by glucocorticosteroids and to inhibit phospholipase A2 (PLA2). Using oligodeoxynucleotide probes corresponding to partial amino acid (aa) sequences of rat lipocortin I (LCI), we have isolated a cDNA clone for rat LCI from a cDNA library prepared from poly(A)+RNA of peritoneal cells of dexamethasone-treated rat. The cDNA insert (1355 bp) had an open reading frame of 1038 bp that encoded a 346-aa polypeptide (Mr 38,784). The nucleotide sequence and the amino acid sequence deduced from it showed high homology with the reported sequences of human LCI. A plasmid containing the trc promoter and cDNA sequence for 346 aa residues of the rat LCI was constructed and expressed in Escherichia coli. Antibody to human LCI crossreacted with the recombinant rat LCI, and the recombinant protein had characteristics of natural rat LCI including PLA2 inhibitory activity in vitro.
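
    The figures in the abstract above are mutually consistent: an open reading frame of 1038 bp read three nucleotides at a time gives 346 codons, matching the 346-aa polypeptide. The short Python check below is only an arithmetic sketch; the helper name `codons_in_orf` is hypothetical, and whether the reported 1038 bp includes a stop codon is not stated in the abstract.

```python
def codons_in_orf(orf_bp):
    """Number of complete codons in an ORF of orf_bp nucleotides."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3


if __name__ == "__main__":
    residues = codons_in_orf(1038)   # ORF length from the abstract
    mean_residue_mass = 38784 / 346  # Mr and residue count from the abstract
    print(residues, round(mean_residue_mass, 1))
```

    The resulting mean mass per residue, roughly 112 Da, is close to the commonly used average of about 110 Da per amino acid residue.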

  17. Comparative analyses of coding and noncoding DNA regions indicate that Acropora (Anthozoa: Scleractina) possesses a similar evolutionary tempo of nuclear vs. mitochondrial genomes as in plants.

    PubMed

    Chen, I-Ping; Tang, Chung-Yu; Chiou, Chih-Yung; Hsu, Jia-Ho; Wei, Nuwei Vivian; Wallace, Carden C; Muir, Paul; Wu, Henry; Chen, Chaolun Allen

    2009-01-01

    Evidence suggests that the mitochondrial (mt)DNA of anthozoans is evolving at a slower tempo than their nuclear DNA; however, parallel surveys of nuclear and mitochondrial variations and calibrated rates of both synonymous and nonsynonymous substitutions across taxa are needed in order to support this scenario. We examined species of the scleractinian coral genus Acropora, including previously unstudied species, for molecular variations in protein-coding genes and noncoding regions of both nuclear and mt genomes. DNA sequences of a calmodulin (CaM)-encoding gene region containing three exons, two introns and a 411-bp mt intergenic spacer (IGS) spanning the cytochrome b (cytb) and NADH 2 genes, were obtained from 49 Acropora species. The molecular evolutionary rates of coding and noncoding regions in nuclear and mt genomes were compared in conjunction with published data, including mt cytochrome b, the control region, and nuclear Pax-C introns. Direct sequencing of the mtIGS revealed an average interspecific variation comparable to that seen in published data for mt cytb. The average interspecific variation of the nuclear genome was two to five times greater than that of the mt genome. Based on the calibration of the closure of the Panama Isthmus (3.0 mya) and closure of the Tethys Seaway (12 mya), synonymous substitution rates ranged from 0.367% to 1.467% Ma⁻¹ for nuclear CaM, which is about 4.8 times faster than those of mt cytb (0.076-0.303% Ma⁻¹). This is similar to the findings in plant genomes that the nuclear genome is evolving at least five times faster than its mitochondrial counterpart.
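
    The "about 4.8 times faster" statement in the abstract above can be reproduced by dividing the nuclear CaM rates by the corresponding mt cytb rates. The Python sketch below uses only the four numbers quoted in the abstract; pairing the low rate with the low rate and the high with the high is an assumption about how the comparison was made.

```python
# Synonymous substitution rates in % per Ma, as quoted in the abstract (low, high).
nuclear_cam = (0.367, 1.467)
mt_cytb = (0.076, 0.303)

# Assumed pairing: low bound with low bound, high bound with high bound.
ratios = [nuc / mt for nuc, mt in zip(nuclear_cam, mt_cytb)]

if __name__ == "__main__":
    for label, ratio in zip(("low", "high"), ratios):
        print(f"{label} bound: nuclear/mt = {ratio:.2f}")
```

    Both bounds give a ratio between 4.8 and 4.9, in line with the figure stated in the abstract.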

  18. The non-coding B2 RNA binds to the DNA cleft and active-site region of RNA polymerase II.

    PubMed

    Ponicsan, Steven L; Houel, Stephane; Old, William M; Ahn, Natalie G; Goodrich, James A; Kugel, Jennifer F

    2013-10-09

    The B2 family of short interspersed elements is transcribed into non-coding RNA by RNA polymerase III. The ~180-nt B2 RNA has been shown to potently repress mRNA transcription by binding tightly to RNA polymerase II (Pol II) and assembling with it into complexes on promoter DNA, where it keeps the polymerase from properly engaging the promoter DNA. Mammalian Pol II is an ~500-kDa complex that contains 12 different protein subunits, providing many possible surfaces for interaction with B2 RNA. We found that the carboxy-terminal domain of the largest Pol II subunit was not required for B2 RNA to bind Pol II and repress transcription in vitro. To identify the surface on Pol II to which the minimal functional region of B2 RNA binds, we coupled multi-step affinity purification, reversible formaldehyde cross-linking, peptide sequencing by mass spectrometry, and analysis of peptide enrichment. The Pol II peptides most highly recovered after cross-linking to B2 RNA mapped to the DNA binding cleft and active-site region of Pol II. These studies determine the location of a defined nucleic acid binding site on a large, native, multi-subunit complex and provide insight into the mechanism of transcriptional repression by B2 RNA.

  19. Replication of a pathogenic non-coding RNA increases DNA methylation in plants associated with a bromodomain-containing viroid-binding protein

    PubMed Central

    Lv, Dian-Qiu; Liu, Shang-Wu; Zhao, Jian-Hua; Zhou, Bang-Jun; Wang, Shao-Peng; Guo, Hui-Shan; Fang, Yuan-Yuan

    2016-01-01

    Viroids are plant-pathogenic molecules made up of single-stranded circular non-coding RNAs. How replicating viroids interfere with host silencing remains largely unknown. In this study, we investigated the effects of a nuclear-replicating Potato spindle tuber viroid (PSTVd) on interference with plant RNA silencing. Using transient induction of silencing in GFP transgenic Nicotiana benthamiana plants (line 16c), we found that PSTVd replication accelerated GFP silencing and increased Virp1 mRNA, which encodes bromodomain-containing viroid-binding protein 1 and is required for PSTVd replication. DNA methylation was increased in the GFP transgene promoter of PSTVd-replicating plants, indicating involvement of transcriptional gene silencing. Consistently, accelerated GFP silencing and increased DNA methylation in the GFP transgene promoter were detected in plants transiently expressing Virp1. Virp1 mRNA was also increased upon PSTVd infection in natural host potato plants. Reduced transcript levels of certain endogenous genes were also consistent with increases in DNA methylation in related gene promoters in PSTVd-infected potato plants. Together, our data demonstrate that PSTVd replication interferes with the nuclear silencing pathway in the host plant, and this is at least partially attributable to Virp1. This study provides new insights into the plant-viroid interaction and into viroid pathogenicity through subversion of the plant cell silencing machinery. PMID:27767195

  20. Molecular cloning and expression of the cDNA coding for a new member of the S100 protein family from porcine cardiac muscle.

    PubMed

    Ohta, H; Sasaki, T; Naka, M; Hiraoka, O; Miyamoto, C; Furuichi, Y; Tanaka, T

    1991-12-16

    We isolated a new calcium-binding protein from porcine cardiac muscle by calcium-dependent hydrophobic and dye-affinity chromatography. It showed an apparent molecular weight of 11,000 on SDS-PAGE. Amino acid sequence determination revealed that the protein contained two calcium-binding domains of the EF-hand motif. The cDNA coding for this protein was cloned from a porcine lung cDNA library. Sequence analysis of the cloned cDNA showed that the protein was composed of 99 amino acid residues and its molecular weight was estimated to be 11,179. Immunological and functional characterization showed that the recombinant S100C protein expressed in Escherichia coli was identical to the natural protein. Homologies to calpactin light chain, S100 alpha and beta protein were 41.1%, 40.9% and 37.5%, respectively. The protein was expressed at high levels in lung and kidney, and at low levels in liver and brain. The tissue distribution was apparently different from that of the other members of the S100 protein family. These results indicate that this protein represents a new member of the S100 protein family, and thus we refer to it as S100C protein.

  1. The Stat3/GR interaction code: predictive value of direct/indirect DNA recruitment for transcription outcome.

    PubMed

    Langlais, David; Couture, Catherine; Balsalobre, Aurélio; Drouin, Jacques

    2012-07-13

    Transcription factor recruitment to genomic sites of action is primarily due to direct protein:DNA interactions. The subsequent recruitment of coregulatory complexes leads to either transcriptional activation or repression. In contrast to this canonical scheme, some transcription factors, such as the glucocorticoid receptor (GR), behave as transcriptional repressors when recruited to target genes through protein tethering. We have investigated the genome-wide prevalence of tethering between GR and Stat3 and found nonreciprocal interactions, namely that GR tethering to DNA-bound Stat3 results in transcriptional repression, whereas Stat3 tethering to GR results in synergism. Further, other schemes of GR and Stat3 corecruitment to regulatory modules result in transcriptional synergism, including neighboring and composite binding sites. The results indicate extensive transcriptional interactions between Stat3 and GR; further, they provide a genome-wide assessment of transcriptional regulation by tethering and a molecular basis for integration of signals mediated by GR and Stats in health and disease.

  2. Cloning and expression of a cDNA coding for the anticoagulant hirudin from the bloodsucking leech, Hirudo medicinalis.

    PubMed Central

    Harvey, R P; Degryse, E; Stefani, L; Schamber, F; Cazenave, J P; Courtney, M; Tolstoshev, P; Lecocq, J P

    1986-01-01

    Cloned cDNAs have been isolated that encode a variant of hirudin, a potent thrombin inhibitor that is secreted by the salivary glands of the medicinal leech, Hirudo medicinalis. This variant probably corresponds to a form that has been purified from leech heads but differs in amino acid sequence from the hirudin purified from whole leeches. There are at least three hirudin transcripts detectable in leech RNAs that are different in size, site of synthesis, inducibility by starvation, and relationship to hirudin activity. The new hirudin variant predicted by the cDNA and the heterodisperse transcription products suggest a hirudin protein family. The hirudin cDNA was expressed in Escherichia coli under the control of the bacteriophage lambda PL promoter. The recombinant product is biologically active, inhibiting the cleavage by thrombin of fibrinogen and a synthetic tripeptide substrate. PMID:3513162

  3. Effective Protective Immunity to Yersinia pestis Infection Conferred by DNA Vaccine Coding for Derivatives of the F1 Capsular Antigen

    PubMed Central

    Grosfeld, Haim; Cohen, Sara; Bino, Tamar; Flashner, Yehuda; Ber, Raphael; Mamroud, Emanuelle; Kronman, Chanoch; Shafferman, Avigdor; Velan, Baruch

    2003-01-01

    Three plasmids expressing derivatives of the Yersinia pestis capsular F1 antigen were evaluated for their potential as DNA vaccines. These included plasmids expressing the full-length F1, F1 devoid of its putative signal peptide (deF1), and F1 fused to the signal-bearing E3 polypeptide of Semliki Forest virus (E3/F1). Expression of these derivatives in transfected HEK293 cells revealed that deF1 is expressed in the cytosol, E3/F1 is targeted to the secretory cisternae, and the nonmodified F1 is rapidly eliminated from the cell. Intramuscular vaccination of mice with these plasmids revealed that the vector expressing deF1 was the most effective in eliciting anti-F1 antibodies. This response was not limited to specific mouse strains or to the mode of DNA administration, though gene gun-mediated vaccination was by far more effective than intramuscular needle injection. Vaccination of mice with deF1 DNA conferred protection against subcutaneous infection with the virulent Y. pestis Kimberley53 strain, even at challenge amounts as high as 4,000 50% lethal doses. Antibodies appear to play a major role in mediating this protection, as demonstrated by passive transfer of anti-deF1 DNA antiserum. Taken together, these observations indicate that a tailored genetic vaccine based on a bacterial protein can be used to confer protection against plague in mice without resorting to regimens involving the use of purified proteins. PMID:12496187

  4. Structural basis for the dual coding potential of 8-oxoguanosine by a high-fidelity DNA polymerase

    PubMed Central

    Brieba, Luis G; Eichman, Brandt F; Kokoska, Robert J; Doublié, Sylvie; Kunkel, Tom A; Ellenberger, Tom

    2004-01-01

    Accurate DNA replication involves polymerases with high nucleotide selectivity and proofreading activity. We show here why both fidelity mechanisms fail when normally accurate T7 DNA polymerase bypasses the common oxidative lesion 8-oxo-7,8-dihydro-2′-deoxyguanosine (8oG). The crystal structure of the polymerase with 8oG templating dC insertion shows that the O8 oxygen is tolerated by strong kinking of the DNA template. A model of a corresponding structure with dATP predicts steric and electrostatic clashes that would reduce but not eliminate insertion of dA. The structure of a postinsertional complex shows 8oG(syn)·dA(anti) in a Hoogsteen-like base pair at the 3′ terminus, and polymerase interactions with the minor groove surface of the mismatch that mimic those with undamaged, matched base pairs. This explains why translesion synthesis is permitted without proofreading of an 8oG·dA mismatch, thus providing insight into the high mutagenic potential of 8oG. PMID:15297882

  5. Cytoplasmic polymorphism and evolutionary history of plum cultivars: Insights from chloroplast DNA sequence variation of trnL-trnF spacer and aggregated trnL intron & trnL-trnF spacer.

    PubMed

    Mustapha, S B; Ben Tamarzizt, H; Baraket, G; Abdallah, D; Salhi-Hannachi, A

    2015-04-27

    We screened for polymorphisms of the non-coding region of plastid DNA in plum trees. Sequencing data from the trnL-trnF chloroplast region were used to reveal a pattern of diversity, establish phylogenetic relationships, and test the selection pressure or evolutionary demography scenario for plastome DNA. The size of the non-coding regions varied from 398 to 563 and 865 to 1084 base pairs for the trnL-trnF spacer and combined sequences, respectively. The average GC contents were 33.8 and 34.4% in the spacer and pooled sequences, respectively. Genetic distances calculated within the plums were 0.077 and 0.254, on average, for the trnL-trnF spacer and combined sequences, respectively. The neighbor-joining trees showed clustering relationships among cultivars that were independent of their geographic origins and designations. The neutrality tests and site-frequency spectra indicated that spacer and pooled sequences fit the neutral theory model at equilibrium between mutation and genetic drift and reject the hypothesis of a recent demographic expansion. The mismatch distribution shows variation patterns, thus providing evidence of an important genetic diversity explained by an excess of intermediate variants that occurred in the sequences analyzed. Further implications of the findings with regard to plum germplasm management and its utilization in breeding programs are also discussed.
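
    The average GC contents quoted in the record above are per-sequence GC percentages averaged over a set of sequences. The following is a minimal sketch of that calculation, not the authors' pipeline: the function names gc_content and mean_gc are hypothetical, the toy sequences are invented for illustration, and the choice to skip gaps and ambiguous bases is an assumption.

```python
from typing import Iterable


def gc_content(seq: str) -> float:
    """Return the GC percentage of a DNA sequence.

    Assumption: only unambiguous bases (A, C, G, T) are counted, so
    alignment gaps ('-') and ambiguity codes such as 'N' are ignored.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def mean_gc(seqs: Iterable[str]) -> float:
    """Return the unweighted mean of per-sequence GC percentages."""
    values = [gc_content(s) for s in seqs]
    if not values:
        raise ValueError("no sequences given")
    return sum(values) / len(values)


# Hypothetical toy sequences, not data from the study.
example = ["ATGCATAT", "AATTGGCA"]

if __name__ == "__main__":
    for s in example:
        print(s, gc_content(s))
    print("mean GC %:", mean_gc(example))
```

    An unweighted mean treats every sequence equally regardless of length; pooling all bases before dividing would instead weight longer sequences more heavily, and the record does not say which convention was used.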

  6. Interspecific comparison of the period gene of Drosophila reveals large blocks of non-conserved coding DNA.

    PubMed Central

    Colot, H V; Hall, J C; Rosbash, M

    1988-01-01

    We have cloned and sequenced the coding region of the period (per) gene from Drosophila pseudoobscura and D. virilis. A comparison with that of D. melanogaster reveals that the conceptual translation products consist of interspersed blocks of conserved and non-conserved amino acid sequence. The non-conserved portion, comprising approximately 33% of the protein sequence, includes the perfect Thr-Gly repeat of D. melanogaster, which is absent from the D. pseudoobscura and D. virilis proteins. Based on these observations and cross-species transformation experiments, we suggest that the interspecific variability in the per primary amino acid sequence contributes to the control of species-specific behaviors. PMID:3208754

  7. Sequence of a novel cytochrome CYP2B cDNA coding for a protein which is expressed in a sebaceous gland, but not in the liver.

    PubMed Central

    Friedberg, T; Grassow, M A; Bartlomowicz-Oesch, B; Siegert, P; Arand, M; Adesnik, M; Oesch, F

    1992-01-01

    The major phenobarbital-inducible rat hepatic cytochromes P-450, CYP2B1 and CYP2B2, are the paradigmatic members of a cytochrome P-450 gene subfamily that contains at least seven additional members. Specific oligonucleotide probes for these genomic members of the CYP2B subfamily were used to assess their tissue-specific expression. In Northern-blot analysis a probe specific to gene 4 (which is designated now as CYP2B12) hybridized to a single mRNA present in the preputial gland, an organ which is used as a model for sebaceous glands, but did not hybridize to mRNA isolated from the liver or from five other tissues of untreated or Aroclor 1254-treated rats. The cDNA sequence for the CYP2B12 RNA was determined from overlapping cDNA clones and contained a long open reading frame of 1476 bp. The nucleotide sequence of the CYP2B12 cDNA was 85% similar to the sequence of the CYP2B1 cDNA in its coding region and was different from any CYP2B cDNA characterized until now. The cDNA-derived primary structure of the CYP2B12 protein contains a signal sequence for its insertion into the endoplasmic reticulum and the putative haem-binding site characteristic of cytochromes P-450. A part of the potential haem pocket of CYP2B12 was identical with a similar structure in a bacterial protocatechuate dioxygenase. In immunoblot analysis of preputial-gland microsomes, antibodies against CYP2B1 recognized a single abundant protein with a lower apparent molecular mass than that of CYP2B1. Our results demonstrate that the CYP2B12 protein has the potential to be enzymically active and are the first demonstration that a member of the CYP2B subfamily is expressed exclusively and at high levels in an extrahepatic organ. PMID:1445240

  8. Molecular cloning of the cDNA coding for the (R)-(+)-mandelonitrile lyase of Prunus amygdalus: temporal and spatial expression patterns in flowers and mature seeds.

    PubMed

    Suelves, M; Puigdomènech, P

    1998-10-01

    A gene highly expressed in the floral organs of almond (Prunus amygdalus Batsch), and coding for the cyanogenic enzyme (R)-(+)-mandelonitrile lyase (EC 4.1.2.10), has been identified and the full-length cDNA sequenced. The temporal expression pattern in maturing seeds and during floral development was analyzed by RNA blot, and the highest mRNA levels were detected in floral tissues. The spatial mRNA accumulation pattern in almond flower buds was also analyzed by in-situ hybridization. The mRNA levels were compared during seed maturation and floral development in fruit and floral samples from cultivars classified as homozygous or heterozygous for the sweet-almond trait or homozygous for the bitter trait. No correlation was found between these characteristics and levels of mandelonitrile lyase mRNA, suggesting that the presence of this protein is not the limiting factor in the production of hydrogen cyanide.

  9. Rare Failures of DNA Bar Codes to Separate Morphologically Distinct Species in a Biodiversity Survey of Iberian Leaf Beetles

    PubMed Central

    Baselga, Andrés; Gómez-Rodríguez, Carola; Novoa, Francisco; Vogler, Alfried P.

    2013-01-01

    During a survey of genetic and species diversity patterns of leaf beetle (Coleoptera: Chrysomelidae) assemblages across the Iberian Peninsula we found a broad congruence between morphologically delimited species and variation in the cytochrome oxidase (cox1) gene. However, one species pair each in the genera Longitarsus Berthold and Pachybrachis Chevrolat was inseparable using molecular methods, whereas diagnostic morphological characters (including male or female genitalia) unequivocally separated the named species. Parsimony haplotype networks and maximum likelihood trees built from cox1 showed high genetic structure within each species pair, but no correlation with either the morphological types or the geographic distributions. This contrasted with all other analysed congeneric species, which were recovered as monophyletic. A limited number of specimens were sequenced for the nuclear 18S rRNA gene, which showed no or very limited variation within the species pair and no separation of morphological types. These results suggest that processes of lineage sorting for either group are lagging behind the clear morphological and presumably reproductive separation. In the Iberian chrysomelids, incongruence between DNA-based and morphological delimitations is a rare exception, but the discovery of these species pairs may be useful as an evolutionary model for studying the process of speciation in this ecological and geographical setting. In addition, the study of biodiversity patterns based on DNA requires an evolutionary understanding of these incongruences and their potential causes. PMID:24040352

  10. Application of DNA Bar Codes for Screening of Industrially Important Fungi: the Haplotype of Trichoderma harzianum Sensu Stricto Indicates Superior Chitinase Formation

    PubMed Central

    Nagy, Viviana; Seidl, Verena; Szakacs, George; Komoń-Zelazowska, Monika; Kubicek, Christian P.; Druzhinina, Irina S.

    2007-01-01

    Selection of suitable strains for biotechnological purposes is frequently a random process supported by high-throughput methods. Using chitinase production by Hypocrea lixii/Trichoderma harzianum as a model, we tested whether fungal strains with superior enzyme formation may be diagnosed by DNA bar codes. We analyzed sequences of two phylogenetic marker loci, internal transcribed spacer 1 (ITS1) and ITS2 of the rRNA-encoding gene cluster and the large intron of the elongation factor 1-alpha gene, tef1, from 50 isolates of H. lixii/T. harzianum, which were also tested to determine their ability to produce chitinases in solid-state fermentation (SSF). Statistically supported superior chitinase production was obtained for strains carrying one of the observed ITS1 and ITS2 and tef1 alleles corresponding to an allele of T. harzianum type strain CBS 226.95. A tef1-based DNA bar code tool, TrichoCHIT, for rapid identification of these strains was developed. The geographic origin of the strains was irrelevant for chitinase production. The improved chitinase production by strains containing this haplotype was not due to better growth on N-acetyl-β-d-glucosamine or glucosamine. Isoenzyme electrophoresis showed that neither the isoenzyme profile of N-acetyl-β-glucosaminidases or the endochitinases nor the intensity of staining of individual chitinase bands correlated with total chitinase in the culture filtrate. The superior chitinase producers did not exhibit similarly increased cellulase formation. Biolog Phenotype MicroArray analysis identified lack of N-acetyl-β-d-mannosamine utilization as a specific trait of strains with the chitinase-overproducing haplotype. This observation was used to develop a plate screening assay for rapid microbiological identification of the strains. The data illustrate that desired industrial properties may be an attribute of certain populations within a species, and screening procedures should thus include a balanced mixture of all

  11. Application of DNA bar codes for screening of industrially important fungi: the haplotype of Trichoderma harzianum sensu stricto indicates superior chitinase formation.

    PubMed

    Nagy, Viviana; Seidl, Verena; Szakacs, George; Komoń-Zelazowska, Monika; Kubicek, Christian P; Druzhinina, Irina S

    2007-11-01

    Selection of suitable strains for biotechnological purposes is frequently a random process supported by high-throughput methods. Using chitinase production by Hypocrea lixii/Trichoderma harzianum as a model, we tested whether fungal strains with superior enzyme formation may be diagnosed by DNA bar codes. We analyzed sequences of two phylogenetic marker loci, internal transcribed spacer 1 (ITS1) and ITS2 of the rRNA-encoding gene cluster and the large intron of the elongation factor 1-alpha gene, tef1, from 50 isolates of H. lixii/T. harzianum, which were also tested to determine their ability to produce chitinases in solid-state fermentation (SSF). Statistically supported superior chitinase production was obtained for strains carrying one of the observed ITS1 and ITS2 and tef1 alleles corresponding to an allele of T. harzianum type strain CBS 226.95. A tef1-based DNA bar code tool, TrichoCHIT, for rapid identification of these strains was developed. The geographic origin of the strains was irrelevant for chitinase production. The improved chitinase production by strains containing this haplotype was not due to better growth on N-acetyl-beta-D-glucosamine or glucosamine. Isoenzyme electrophoresis showed that neither the isoenzyme profile of N-acetyl-beta-glucosaminidases or the endochitinases nor the intensity of staining of individual chitinase bands correlated with total chitinase in the culture filtrate. The superior chitinase producers did not exhibit similarly increased cellulase formation. Biolog Phenotype MicroArray analysis identified lack of N-acetyl-beta-D-mannosamine utilization as a specific trait of strains with the chitinase-overproducing haplotype. This observation was used to develop a plate screening assay for rapid microbiological identification of the strains. The data illustrate that desired industrial properties may be an attribute of certain populations within a species, and screening procedures should thus include a balanced mixture of all

  12. Fast turnover of genome transcription across evolutionary time exposes entire non-coding DNA to de novo gene emergence.

    PubMed

    Neme, Rafik; Tautz, Diethard

    2016-02-02

    Deep sequencing analyses have shown that a large fraction of genomes is transcribed, but the significance of this transcription is much debated. Here, we characterize the phylogenetic turnover of poly-adenylated transcripts in a comprehensive sampling of taxa of the mouse (genus Mus), spanning a phylogenetic distance of 10 Myr. Using deep RNA sequencing we find that at a given sequencing depth transcriptome coverage becomes saturated within a taxon, but keeps extending when compared between taxa, even at this very shallow phylogenetic level. Our data show a high turnover of transcriptional states between taxa and that no major transcript-free islands exist across evolutionary time. This suggests that the entire genome can be transcribed into poly-adenylated RNA when viewed at an evolutionary time scale. We conclude that any part of the non-coding genome can potentially become subject to evolutionary functionalization via de novo gene evolution within relatively short evolutionary time spans.

  13. DNA

    ERIC Educational Resources Information Center

    Stent, Gunther S.

    1970-01-01

    This history of molecular genetics and its explanation of DNA begins with an analysis of the Golden Jubilee essay papers, 1955. The paper ends by stating that the higher nervous system is the one major frontier of biological inquiry which still offers some romance of research. (Author/VW)

  15. Isolation and functional characterization of a cDNA coding a hydroxycinnamoyltransferase involved in phenylpropanoid biosynthesis in Cynara cardunculus L

    PubMed Central

    Comino, Cinzia; Lanteri, Sergio; Portis, Ezio; Acquadro, Alberto; Romani, Annalisa; Hehn, Alain; Larbat, Romain; Bourgaud, Frédéric

    2007-01-01

    Background: Cynara cardunculus L. is an edible plant of pharmaceutical interest, in particular with respect to the polyphenolic content of its leaves. It includes three taxa: globe artichoke, cultivated cardoon, and wild cardoon. The dominating phenolics are the di-caffeoylquinic acids (such as cynarin), which are largely restricted to Cynara species, along with their precursor, chlorogenic acid (CGA). The aim of this study is to better understand CGA synthesis in this plant. Results: A gene sequence encoding a hydroxycinnamoyltransferase (HCT) involved in the synthesis of CGA was identified. The gene sequence was isolated using a PCR strategy with degenerate primers targeted to conserved regions of available orthologous HCT sequences. We have isolated a 717 bp cDNA which shares 84% amino acid identity and 92% similarity with a tobacco gene responsible for the biosynthesis of CGA from p-coumaroyl-CoA and quinic acid. In silico studies revealed the globe artichoke HCT sequence clustering with one of the main acyltransferase groups (i.e. anthranilate N-hydroxycinnamoyl/benzoyltransferase). Heterologous expression of the full length HCT (GenBank accession DQ104740) cDNA in E. coli demonstrated that the recombinant enzyme efficiently synthesizes both chlorogenic acid and p-coumaroyl quinate from quinic acid and caffeoyl-CoA or p-coumaroyl-CoA, respectively, confirming its identity as a hydroxycinnamoyl-CoA: quinate HCT. Variable levels of HCT expression were shown among wild and cultivated forms of C. cardunculus subspecies. The level of expression was correlated with CGA content. Conclusion: The data support the predicted involvement of the Cynara cardunculus HCT in the biosynthesis of CGA before and/or after the hydroxylation step of hydroxycinnamoyl esters. PMID:17374149

  16. Structure and expression of the gene coding for the alpha-subunit of DNA-dependent RNA polymerase from the chloroplast genome of Zea mays.

    PubMed Central

    Ruf, M; Kössel, H

    1988-01-01

    The rpoA gene coding for the alpha-subunit of DNA-dependent RNA polymerase located on the DNA of Zea mays chloroplasts has been characterized with respect to its position on the chloroplast genome and its nucleotide sequence. The amino acid sequence derived for a 39 Kd polypeptide shows strong homology with sequences derived from the rpoA genes of other chloroplast species and with the amino acid sequence of the alpha-subunit from E. coli RNA polymerase. Transcripts of the rpoA gene were identified by Northern hybridization and characterized by S1 mapping using total RNA isolated from maize chloroplasts. Antibodies raised against a synthetic C-terminal heptapeptide show cross reactivity with a 39 Kd polypeptide contained in the stroma fraction of maize chloroplasts. It is concluded that the rpoA gene is a functional gene and that therefore, at least the alpha-subunit of plastidic RNA polymerase, is expressed in chloroplasts. PMID:3399379

  17. Analysis of Argonaute 4-Associated Long Non-Coding RNA in Arabidopsis thaliana Sheds Novel Insights into Gene Regulation through RNA-Directed DNA Methylation.

    PubMed

    Au, Phil Chi Khang; Dennis, Elizabeth S; Wang, Ming-Bo

    2017-08-07

    RNA-directed DNA methylation (RdDM) is a plant-specific de novo DNA methylation mechanism that requires long noncoding RNA (lncRNA) as scaffold to define target genomic loci. While the role of RdDM in maintaining genome stability is well established, how it regulates protein-coding genes remains poorly understood and few RdDM target genes have been identified. In this study, we obtained sequences of RdDM-associated lncRNAs using nuclear RNA immunoprecipitation against ARGONAUTE 4 (AGO4), a key component of RdDM that binds specifically with the lncRNA. Comparison of these lncRNAs with gene expression data of RdDM mutants identified novel RdDM target genes. Surprisingly, a large proportion of these target genes were repressed in RdDM mutants suggesting that they are normally activated by RdDM. These RdDM-activated genes are more enriched for gene body lncRNA than the RdDM-repressed genes. Histone modification and RNA analyses of several RdDM-activated stress response genes detected increased levels of active histone mark and short RNA transcript in the lncRNA-overlapping gene body regions in the ago4 mutant despite the repressed expression of these genes. These results suggest that RdDM, or AGO4, may play a role in maintaining or activating stress response gene expression by directing gene body chromatin modification preventing cryptic transcription.

  18. Analysis of Argonaute 4-Associated Long Non-Coding RNA in Arabidopsis thaliana Sheds Novel Insights into Gene Regulation through RNA-Directed DNA Methylation

    PubMed Central

    Au, Phil Chi Khang; Dennis, Elizabeth S.; Wang, Ming-Bo

    2017-01-01

    RNA-directed DNA methylation (RdDM) is a plant-specific de novo DNA methylation mechanism that requires long noncoding RNA (lncRNA) as scaffold to define target genomic loci. While the role of RdDM in maintaining genome stability is well established, how it regulates protein-coding genes remains poorly understood and few RdDM target genes have been identified. In this study, we obtained sequences of RdDM-associated lncRNAs using nuclear RNA immunoprecipitation against ARGONAUTE 4 (AGO4), a key component of RdDM that binds specifically with the lncRNA. Comparison of these lncRNAs with gene expression data of RdDM mutants identified novel RdDM target genes. Surprisingly, a large proportion of these target genes were repressed in RdDM mutants suggesting that they are normally activated by RdDM. These RdDM-activated genes are more enriched for gene body lncRNA than the RdDM-repressed genes. Histone modification and RNA analyses of several RdDM-activated stress response genes detected increased levels of active histone mark and short RNA transcript in the lncRNA-overlapping gene body regions in the ago4 mutant despite the repressed expression of these genes. These results suggest that RdDM, or AGO4, may play a role in maintaining or activating stress response gene expression by directing gene body chromatin modification preventing cryptic transcription. PMID:28783101

  19. Characterization of DNA polymerase β splicing variants in gastric cancer: the most frequent exon 2-deleted isoform is a non coding RNA

    PubMed Central

    Simonelli, Valeria; D’Errico, Mariarosaria; Palli, Domenico; Prasad, Rajendra; Wilson, Samuel H.; Dogliotti, Eugenia

    2009-01-01

    DNA repair polymerase β (Pol β) gene variants are frequently associated with tumor tissues. In this study a search for Pol β mutants and splice variants was conducted in matched normal and tumor gastric tissues and blood samples from healthy donors. No tumor-associated mutations were found, whereas a variety of alternative Pol β splicing variants were detected with high frequency in all the specimens analysed. Quantitative PCR of the Pol β variant lacking exon 2 (Ex2Δ) and the isoforms with exon 11 skipping showed that these variants are neither tumor- nor tissue-specific and that their levels vary greatly among individuals. The most frequent Ex2Δ variant was further characterized. We clearly demonstrated that this variant does not encode protein, as shown by both western blotting and immunofluorescence analysis of human AGS cells expressing HA-tagged Ex2Δ. The lack of translation was confirmed by comparing the DNA gap-filling capacity and alkylation sensitivity of wild type and Pol β null murine fibroblasts expressing the human Ex2Δ variant. We showed that the Ex2Δ transcript is polyadenylated and that its half-life is significantly longer than that of the wild type mRNA, as inferred by treating AGS cells with actinomycin D. Moreover, we found that it localizes to polyribosomes, suggesting a role as a post-transcriptional regulator. This study identifies a new type of DNA repair variant that does not give rise to functional proteins but to non-coding RNAs that could either modulate target mRNAs or represent unproductive splicing events. PMID:19635489

  20. The landscape of DNA methylation-mediated regulation of long non-coding RNAs in breast cancer

    PubMed Central

    Li, Xuecang; Zhao, Ning; Wang, Yihan; Han, Xiaole; Ci, Ce; Zhang, Jian; Li, Meng; Zhang, Yan

    2017-01-01

    Although systematic studies have identified a host of long non-coding RNAs (lncRNAs) which are involved in breast cancer, the knowledge about the methylation-mediated dysregulation of those lncRNAs remains limited. Here, we integrated multi-omics data to analyze the methylated alteration of lncRNAs in breast invasive carcinoma (BRCA). We found that lncRNAs showed diverse methylation patterns on promoter regions in BRCA. LncRNAs were divided into two categories and four subcategories based on their promoter methylation patterns and expression levels between tumor and normal samples. Through cis-regulatory analysis and gene ontology network, abnormally methylated lncRNAs were identified to be associated with cancer regulation, proliferation or expression of transcription factors. Competing endogenous RNA network and functional enrichment analysis of abnormally methylated lncRNAs showed that lncRNAs with different methylation patterns were involved in several hallmarks and KEGG pathways of cancers significantly. Finally, survival analysis based on mRNA modules in networks revealed that lncRNAs silenced by high methylation were associated with prognosis significantly in BRCA. This study enhances the understanding of aberrantly methylated patterns of lncRNAs and provides a novel insight for identifying cancer biomarkers and potential therapeutic targets in breast cancer. PMID:28881636

  1. Lichenase and coding sequences

    SciTech Connect

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-15

    The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-β-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.

  2. Ribosomal DNA analysis of tsetse and non-tsetse transmitted Ethiopian Trypanosoma vivax strains in view of improved molecular diagnosis.

    PubMed

    Fikru, Regassa; Matetovici, Irina; Rogé, Stijn; Merga, Bekana; Goddeeris, Bruno Maria; Büscher, Philippe; Van Reet, Nick

    2016-04-15

    Animal trypanosomosis caused by Trypanosoma vivax (T. vivax) is a devastating disease causing serious economic losses. Most molecular diagnostics for T. vivax infection target the ribosomal DNA locus (rDNA) but are challenged by the heterogeneity among T. vivax strains. In this study, we investigated the rDNA heterogeneity of Ethiopian T. vivax strains in relation to their presence in tsetse-infested and tsetse-free areas and its effect on molecular diagnosis. We sequenced the rDNA loci of six Ethiopian (three from tsetse-infested and three from tsetse-free areas) and one Nigerian T. vivax strain. We analysed the obtained sequences in silico for primer-mismatches of some commonly used diagnostic PCR assays and for GC content. With these data, we selected some rDNA diagnostic PCR assays for evaluation of their diagnostic accuracy. Furthermore we constructed two phylogenetic networks based on sequences within the smaller subunit (SSU) of 18S and within the 5.8S and internal transcribed spacer 2 (ITS2) to assess the relatedness of Ethiopian T. vivax strains to strains from other African countries and from South America. In silico analysis of the rDNA sequence showed important mismatches of some published diagnostic PCR primers and high GC content of T. vivax rDNA. The evaluation of selected diagnostic PCR assays with specimens from cattle under natural T. vivax challenge showed that this high GC content interferes with the diagnostic accuracy of PCR, especially in cases of mixed infections with T. congolense. Adding betaine to the PCR reaction mixture can enhance the amplification of T. vivax rDNA but decreases the sensitivity for T. congolense and Trypanozoon. The networks illustrated that Ethiopian T. vivax strains are considerably heterogeneous and two strains (one from tsetse-infested and one from tsetse-free area) are more related to the West African and South American strains than to the East African strains. The rDNA locus sequence of six Ethiopian T. vivax

  3. Sequence analysis of coding DNA fragments of pfcrt and pfmdr-1 genes in Plasmodium falciparum isolates from Odisha, India.

    PubMed

    Sutar, Sasmita Kumari Das; Gupta, Bhavna; Ranjit, Manoranjan; Kar, Shantanu Kumar; Das, Aparup

    2011-02-01

    The global emergence and spread of malaria parasites resistant to antimalarial drugs is the major problem in malaria control. The genetic basis of the parasite's resistance to the antimalarial drug chloroquine (CQ) is well-documented, allowing for the analysis of field isolates of malaria parasites to address evolutionary questions concerning the origin and spread of CQ-resistance. Here, we present DNA sequence analyses of both the second exon of the Plasmodium falciparum CQ-resistance transporter (pfcrt) gene and the 5' end of the P. falciparum multidrug-resistance 1 (pfmdr-1) gene in 40 P. falciparum field isolates collected from eight different localities of Odisha, India. First, we genotyped the samples for the pfcrt K76T and pfmdr-1 N86Y mutations in these two genes, which are the mutations primarily implicated in CQ-resistance. We further analyzed amino acid changes in codons 72-76 of the pfcrt haplotypes. Interestingly, both the K76T and N86Y mutations were found to co-exist in 32 out of the total 40 isolates, which were of either the CVIET or SVMNT haplotype, while the remaining eight isolates were of the CVMNK haplotype. In total, eight nonsynonymous single nucleotide polymorphisms (SNPs) were observed, six in the pfcrt gene and two in the pfmdr-1 gene. One poorly studied SNP in the pfcrt gene (A97T) was found at a high frequency in many P. falciparum samples. Using population genetics to analyze these two gene fragments, we revealed comparatively higher nucleotide diversity in the pfcrt gene than in the pfmdr-1 gene. Furthermore, linkage disequilibrium was found to be tight between closely spaced SNPs of the pfcrt gene. Finally, both the pfcrt and the pfmdr-1 genes were found to evolve under the standard neutral model of molecular evolution.

  4. Testing the use of ITS rDNA and protein-coding genes in the generic and species delimitation of the lichen genus Usnea (Parmeliaceae, Ascomycota).

    PubMed

    Truong, Camille; Divakar, Pradeep K; Yahr, Rebecca; Crespo, Ana; Clerc, Philippe

    2013-08-01

    In lichen-forming fungi, traditional taxonomical concepts are frequently in conflict with molecular data, and identifying appropriate taxonomic characters to describe phylogenetic clades remains challenging in many groups. The selection of suitable markers for the reconstruction of solid phylogenetic hypotheses is therefore fundamental. The lichen genus Usnea is highly diverse, with more than 350 estimated species, distributed in polar, temperate and tropical regions. The phylogeny and classification of Usnea have been a matter of debate, given the lack of phenotypic characters to describe phylogenetic clades and the low degree of resolution of phylogenetic trees. In this study, we investigated the phylogenetic relationships of 52 Usnea species from across the genus, based on ITS rDNA, nuLSU, and two protein-coding genes RPB1 and MCM7. ITS comprised several highly variable regions, containing substantial genetic signal, but also susceptible to causing bias in the generation of the alignment. We compared several methods of alignment of ITS and found that a simultaneous optimization of alignment and phylogeny (using BAli-phy) improved significantly both the topology and the resolution of the phylogenetic tree. However the resolution was even better when using protein-coding genes, especially RPB1 although it is less variable. The phylogeny based on the concatenated dataset revealed that the genus Usnea is subdivided into four highly-supported clades, corresponding to the traditionally circumscribed subgenera Eumitria, Dolichousnea, Neuropogon and Usnea. However, characters that have been used to describe these clades are often homoplasious within the phylogeny and their parallel evolution is suggested. On the other hand, most of the species were reconstructed as monophyletic, indicating that combinations of phenotypic characters are suitable discriminators for delimitating species, but are inadequate to describe generic subdivisions.

  5. The effect of non-coding DNA variations on P53 and cMYC competitive inhibition at cis-overlapping motifs.

    PubMed

    Kin, Katherine; Chen, Xi; Gonzalez-Garay, Manuel; Fakhouri, Walid D

    2016-04-15

    Non-coding DNA variations play a critical role in increasing the risk for development of common complex diseases, and account for the majority of SNPs highly associated with cancer. However, it remains a challenge to identify etiologic variants and to predict their pathological effects on target gene expression for clinical purposes. Cis-overlapping motifs (COMs) are elements of enhancer regions that impact gene expression by enabling competitive binding and switching between transcription factors. Mutations within COMs are especially important when the involved transcription factors have opposing effects on gene regulation, like P53 tumor suppressor and cMYC proto-oncogene. In this study, genome-wide analysis of ChIP-seq data from human cancer and mouse embryonic cells identified a significant number of putative regulatory elements with signals for both P53 and cMYC. Each co-occupied element contains, on average, two COMs, and one common SNP every two COMs. Gene ontology of predicted target genes for COMs showed that the majority are involved in DNA damage, apoptosis, cell cycle regulation, and RNA processing. EMSA results showed that both cMYC and P53 bind to cis-overlapping motifs within a ChIP-seq co-occupied region in Chr12. In vitro functional analysis of selected co-occupied elements verified enhancer activity, and also showed that the occurrence of SNPs within three COMs significantly altered enhancer activity. We identified a list of COM-associated functional SNPs that are in close proximity to SNPs associated with common diseases in large population studies. These results suggest a potential molecular mechanism to identify etiologic regulatory mutations associated with common diseases.

  6. Cloning and characterization of a cDNA coding for Astacus embryonic astacin, a member of the astacin family of metalloproteases from the crayfish Astacus astacus.

    PubMed

    Geier, G; Zwilling, R

    1998-05-01

    The astacin family of zinc endopeptidases was named after the digestive enzyme astacin isolated from the crayfish Astacus astacus. Employing a reverse transcription/PCR strategy with degenerate oligonucleotide primers specific for two signature sequences of the astacin family, we have isolated a 1602-bp cDNA from embryos of developing A. astacus eggs, which was designated Astacus embryonic astacin (AEA). This cDNA was found to code for an astacin-like protease domain which accounts for the N-terminal half of the predicted protein. The C-terminal half mainly consists of two complement subcomponent C1r/C1s/embryonic sea urchin protein Uegf/bone morphogenetic protein 1 (CUB) domains. The metalloprotease domain displays an amino acid sequence identity of 42% with astacin. A higher sequence similarity was found to astacin family members that act as hatching enzymes in different species, e.g. chorioallantoic membrane protein 1 (CAM-1; from quail) and Xenopus hatching enzyme (formerly UVS.2), both of which show 54% identity, and high and low choriolytic enzymes (HCE and LCE) from the teleost Oryzias latipes (52% and 48% identity, respectively). A relationship to astacin-like hatching enzymes is further supported by a phylogenetic analysis of the protease domains. Expression of AEA mRNA in developing embryos was found to be restricted to unhatched juveniles (larvae) during the last 8 days before hatching. AEA transcripts could not be detected in various tissues of adult animals or in eggs and embryos from an earlier developmental stage. AEA expression starts about 8 days prior to hatching, followed by a strong (18-fold) induction with a maximum at day 4 before hatching. Newly hatched juveniles were found not to express the AEA mRNA.

  7. Isolation and sequencing of cDNA clones coding for the catalytic unit of glucose-6-phosphatase from two haplochromine cichlid fishes.

    PubMed

    Nagl, S; Mayer, W E; Klein, J

    1999-01-01

    Complementary DNA clones coding for the catalytic unit of the enzyme glucose-6-phosphatase (G6Pase) were obtained from Haplochromis nubilus and Haplochromis xenognathus, two cichlid fish species from Lake Victoria. The translated sequence of these two cDNAs identifies a polypeptide consisting of 352 amino acid residues and showing a 54.4% similarity to the human form of G6Pase. The amino acid sequences of the two fish species are identical. The comparison of the fish amino acid sequence with the corresponding sequences of rat, mouse, and human G6Pase revealed that the amino acid residues, which are involved in G6Pase catalysis in humans, are also conserved in fish G6Pase. Northern blot analysis showed that G6Pase is expressed at the same level in 6- and 10-day-old fish. A three base pair insertion/deletion polymorphism was found in the 3'-untranslated region of the fish G6Pase gene. The polymorphism will be a useful marker in a phylogenetic study of Lake Victoria cichlids.

  8. Clinical coding. Code breakers.

    PubMed

    Mathieson, Steve

    2005-02-24

    The advent of payment by results has seen the role of the clinical coder pushed to the fore in England. Examinations for a clinical coding qualification began in 1999. In 2004, approximately 200 people took the qualification. Trusts are attracting people to the role by offering training from scratch or through modern apprenticeships.

  9. A region of the polyoma virus genome between the replication origin and late protein coding sequences is required in cis for both early gene expression and viral DNA replication.

    PubMed Central

    Tyndall, C; La Mantia, G; Thacker, C M; Favaloro, J; Kamen, R

    1981-01-01

    Deletion mutants within the Py DNA region between the replication origin and the beginning of late protein coding sequences have been constructed and analysed for viability, early gene expression and viral DNA replication. Assay of replicative competence was facilitated by the use of Py transformed mouse cells (COP lines) which express functional large T-protein but contain no free viral DNA. Viable mutants defined three new nonessential regions of the genome. Certain deletions spanning the PvuII site at nt 5130 (67.4 mu) were unable to express early genes and had a cis-acting defect in DNA replication. Other mutants had intermediate phenotypes. Relevance of these results to eucaryotic "enhancer" elements is discussed. PMID:6275353

  10. DNA Dynamics.

    ERIC Educational Resources Information Center

    Warren, Michael D.

    1997-01-01

    Explains a method to enable students to understand DNA and protein synthesis using model-building and role-playing. Acquaints students with the triplet code and transcription. Includes copies of the charts used in this technique. (DDR)

  11. Single-molecule study of DNA polymerization activity of HIV-1 reverse transcriptase on DNA templates.

    PubMed

    Kim, Sangjin; Schroeder, Charles M; Xie, X Sunney

    2010-02-05

    HIV-1 RT (human immunodeficiency virus-1 reverse transcriptase) is a multifunctional polymerase responsible for reverse transcription of the HIV genome, including DNA replication on both RNA and DNA templates. During reverse transcription in vivo, HIV-1 RT replicates through various secondary structures on RNA and single-stranded DNA (ssDNA) templates without the need for a nucleic acid unwinding protein, such as a helicase. In order to understand the mechanism of polymerization through secondary structures, we investigated the DNA polymerization activity of HIV-1 RT on long ssDNA templates using a multiplexed single-molecule DNA flow-stretching assay. We observed that HIV-1 RT performs fast primer extension DNA synthesis on single-stranded regions of DNA (18.7 nt/s) and switches its activity to slow strand displacement synthesis at DNA hairpin locations (2.3 nt/s). Furthermore, we found that the rate of strand displacement synthesis is dependent on the GC content in hairpin stems and template stretching force. This indicates that the strand displacement synthesis occurs through a mechanism that is neither completely active nor passive: that is, the opening of the DNA hairpin is driven by a combination of free energy released during dNTP (deoxyribonucleotide triphosphate) hydrolysis and thermal fraying of base pairs. Our experimental observations provide new insight into the interchanging modes of DNA replication by HIV-1 RT on long ssDNA templates.

  12. The evolution of the mitochondrial genetic code in arthropods revisited.

    PubMed

    Abascal, Federico; Posada, David; Zardoya, Rafael

    2012-04-01

    A variant of the invertebrate mitochondrial genetic code was previously identified in arthropods (Abascal et al. 2006a, PLoS Biol 4:e127) in which, instead of translating the AGG codon as serine, as in other invertebrates, some arthropods translate AGG as lysine. Here, we revisit the evolution of the genetic code in arthropods taking into account that (1) the number of arthropod mitochondrial genomes sequenced has triplicated since the original findings were published; (2) the phylogeny of arthropods has been recently resolved with confidence for many groups; and (3) sophisticated probabilistic methods can be applied to analyze the evolution of the genetic code in arthropod mitochondria. According to our analyses, evolutionary shifts in the genetic code have been more common than previously inferred, with many taxonomic groups displaying two alternative codes. Ancestral character-state reconstruction using probabilistic methods confirmed that the arthropod ancestor most likely translated AGG as lysine. Point mutations at tRNA-Lys and tRNA-Ser correlated with the meaning of the AGG codon. In addition, we identified three variables (GC content, number of AGG codons, and taxonomic information) that best explain the use of each of the two alternative genetic codes.
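
    The AGG reassignment described in this record can be illustrated with a minimal sketch. It assumes a deliberately tiny codon table containing only the codons needed for the example; the names `BASE_CODE` and `translate` are hypothetical and are not taken from the cited work.

```python
# Minimal sketch: translate a coding sequence under two alternative
# readings of the AGG codon (serine vs. lysine).
# BASE_CODE is an illustrative subset, NOT a complete genetic code table.
BASE_CODE = {
    "ATG": "M",  # methionine
    "AAA": "K",  # lysine
    "TCT": "S",  # serine
    "AGA": "S",  # serine in the invertebrate mitochondrial code
    "AGG": "S",  # serine by default; lysine in the arthropod variant
}


def translate(cds: str, agg: str = "S") -> str:
    """Translate an in-frame sequence, reading AGG as the given amino acid.

    Codons missing from the illustrative table are reported as 'X'.
    """
    code = dict(BASE_CODE, AGG=agg)
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(code.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    seq = "ATGAGGAAA"
    print(translate(seq, agg="S"))  # AGG read as serine
    print(translate(seq, agg="K"))  # AGG read as lysine
```

    A real analysis would use a full translation table rather than this subset; the sketch only shows how a single codon reassignment changes the predicted protein.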

  13. RNA at 92 °C: the non-coding transcriptome of the hyperthermophilic archaeon Pyrococcus abyssi.

    PubMed

    Toffano-Nioche, Claire; Ott, Alban; Crozat, Estelle; Nguyen, An N; Zytnicki, Matthias; Leclerc, Fabrice; Forterre, Patrick; Bouloc, Philippe; Gautheret, Daniel

    2013-07-01

    The non-coding transcriptome of the hyperthermophilic archaeon Pyrococcus abyssi is investigated using the RNA-seq technology. A dedicated computational pipeline analyzes RNA-seq reads and prior genome annotation to identify small RNAs, untranslated regions of mRNAs, and cis-encoded antisense transcripts. Unlike other archaea, such as Sulfolobus and Halobacteriales, P. abyssi produces few leaderless mRNA transcripts. Antisense transcription is widespread (215 transcripts) and targets protein-coding genes that are less conserved than average genes. We identify at least three novel H/ACA-like guide RNAs among the newly characterized non-coding RNAs. Long 5' UTRs in mRNAs of ribosomal proteins and amino-acid biosynthesis genes strongly suggest the presence of cis-regulatory leaders in these mRNAs. We selected a high-interest subset of non-coding RNAs based on their strong promoters, high GC-content, phylogenetic conservation, or abundance. Some of the novel small RNAs and long 5' UTRs display high GC contents, suggesting unknown structural RNA functions. However, we were surprised to observe that most of the high-interest RNAs are AU-rich, which suggests an absence of stable secondary structure in the high-temperature environment of P. abyssi. Yet, these transcripts display other hallmarks of functionality, such as high expression or high conservation, which leads us to consider possible RNA functions that do not require extensive secondary structure.

  14. Systematic biases in DNA copy number originate from isolation procedures

    PubMed Central

    2013-01-01

    Background: The ability to accurately detect DNA copy number variation in both a sensitive and quantitative manner is important in many research areas. However, genome-wide DNA copy number analyses are complicated by variations in detection signal. Results: While GC content has been used to correct for this, here we show that coverage biases are tissue-specific and independent of the detection method as demonstrated by next-generation sequencing and array CGH. Moreover, we show that DNA isolation stringency affects the degree of equimolar coverage and that the observed biases coincide with chromatin characteristics like gene expression, genomic isochores, and replication timing. Conclusion: These results indicate that chromatin organization is a main determinant for differential DNA retrieval. These findings are highly relevant for germline and somatic DNA copy number variation analyses. PMID:23618369

  15. SINGLE-MOLECULE STUDY OF DNA POLYMERIZATION ACTIVITY OF HIV-1 REVERSE TRANSCRIPTASE ON DNA TEMPLATES

    PubMed Central

    Kim, Sangjin; Schroeder, Charles M.; Xie, X. Sunney

    2009-01-01

    Human Immunodeficiency Virus-1 reverse transcriptase (HIV-1 RT) is a multifunctional polymerase responsible for reverse transcription of the HIV genome, including DNA replication on both RNA and DNA templates. During reverse transcription in vivo, HIV-1 RT replicates through various secondary structures on RNA and single-stranded DNA templates without the need for a nucleic acid unwinding protein, such as a helicase. In order to understand the mechanism of polymerization through secondary structures, we investigated the DNA polymerization activity of HIV-1 RT on long single-stranded DNA templates using a multiplexed single-molecule DNA flow-stretching assay. We observed that HIV-1 RT performs fast primer extension DNA synthesis on single-stranded regions of DNA (18.7 nt/s) and switches its activity to slow strand displacement synthesis at DNA hairpin locations (2.3 nt/s). Furthermore, we found that the rate of strand displacement synthesis is dependent on the GC content in hairpin stems and template stretching force. This indicates that the strand displacement synthesis occurs through a mechanism that is neither completely active nor passive, i.e. the opening of the DNA hairpin is driven by a combination of free energy released during dNTP hydrolysis and thermal fraying of base pairs. Our experimental observations provide new insight into the interchanging modes of DNA replication by HIV-1 RT on long single-stranded DNA templates. PMID:19968999

  16. [A DNA study of rat liver oligonucleosomes enriched by transcriptionally active genes during induction due to the administration of an amino acid mixture].

    PubMed

    Vardevanian, P O; Davtian, A M; Tiratsuian, S G; Vardevanian, A O

    1990-01-01

    A highly active fraction of rat liver oligonucleosome DNA has been isolated and studied by means of thermal denaturation after induction by an amino acid mixture or hydrocortisone. A considerable redistribution of DNA content has been shown in sucrose gradient fractions during these forms of induction. Changes are revealed in the melting temperature and differential melting profile of DNA isolated from actively transcribed chromatin fractions. Analysis of melting profiles shows changes in the GC content of oligonucleosome DNA, suggesting that there are differences in activation between the two studied forms of induction.

  17. The Cipher Code of Simple Sequence Repeats in "Vampire Pathogens".

    PubMed

    Zou, Geng; Bello-Orti, Bernardo; Aragon, Virginia; Tucker, Alexander W; Luo, Rui; Ren, Pinxing; Bi, Dingren; Zhou, Rui; Jin, Hui

    2015-07-28

    Blood inside mammals is a forbidden area for the majority of prokaryotic microbes; however, red blood cells tropism microbes, like "vampire pathogens" (VP), succeed in matching scarce nutrients and surviving strong immunity reactions. Here, we found VP of Mycoplasma, Rhizobiales, and Rickettsiales showed significantly higher counts of (AG)n dimeric simple sequence repeats (Di-SSRs) in the genomes, coding and non-coding regions than non Vampire Pathogens (N_VP). Regression analysis indicated a significant correlation between GC content and the span of (AG)n-Di-SSR variation. Gene Ontology (GO) terms with abundance of (AG)3-Di-SSRs shared by the VP strains were associated with purine nucleotide metabolism (FDR < 0.01), indicating an adaptation to the limited availability of purine and nucleotide precursors in blood. Di-amino acids coded by (AG)n-Di-SSRs included all three six-fold code amino acids (Arg, Leu and Ser) and significantly higher counts of Di-amino acids coded by (AG)3, (GA)3, and (TC)3 in VP than N_VP. Furthermore, significant differences (P < 0.001) on the numbers of triplexes formed from (AG)n-Di-SSRs between VP and N_VP in Mycoplasma suggested the potential role of (AG)n-Di-SSRs in gene regulation.

  18. Study on sequences of ribosomal DNA internal transcribed spacers of clams belonging to the Veneridae family (Mollusca: Bivalvia).

    PubMed

    Cheng, Han-Liang; Xia, De-Quan; Wu, Ting-Ting; Meng, Xue-Ping; Ji, Hong-Ju; Dong, Zhi-Guo

    2006-08-01

    The first and second internal transcribed spacer (ITS1 and ITS2) regions of the ribosomal DNA from four species, Meretrix meretrix L., Cyclina sinensis G., Mercenaria mercenaria L., and Protothaca jedoensis L., belonging to the family Veneridae were amplified by PCR and sequenced. The size of the ITS1 PCR amplification product ranged from 663 bp to 978 bp, with GC contents ranging from 60.78% to 64.97%. The size of the ITS1 sequence ranged from 585 bp to 900 bp, which is the largest range reported thus far in bivalve species, with GC contents ranging from 61.03% to 65.62%. The size of the ITS2 PCR amplification product ranged from 513 bp to 644 bp, with GC contents ranging from 61.29% to 62.73%. The size of the ITS2 sequence ranged from 281 bp to 412 bp, with GC contents ranging from 65.21% to 67.87%. Extensive sequence variation and obvious length polymorphisms were noted for both regions in these species, and sequence similarity of ITS2 was higher than that of ITS1 across species. The complete sequences of 5.8S ribosomal RNA gene were obtained by assembling ITS1 and ITS2 sequences, and the sequence length in all species was 157 bp. The phylogenetic tree of Veneridae clams was reconstructed using ITS2-containing partial sequences of both 5.8S and 28S ribosomal DNA as markers and the corresponding sequence information in Arctica islandica as the outgroup. Tree topologies indicated that P. jedoensis shared a close relationship with M. mercenaria and C. sinensis, a distant relationship with other species.

  19. DNA nucleoside composition and methylation in several species of microalgae

    SciTech Connect

    Jarvis, E.E.; Dunahay, T.G.; Brown, L.M.

    1992-06-01

    Total DNA was isolated from 10 species of microalgae, including representatives of the Chlorophyceae (Chlorella ellipsoidea, Chlamydomonas reinhardtii, and Monoraphidium minutum), Bacillariophyceae (Cyclotella cryptica, Navicula saprophila, Nitzschia pusilla, and Phaeodactylum tricornutum), Charophyceae (Stichococcus sp.), Dinophyceae (Crypthecodinium cohnii), and Prasinophyceae (Tetraselmis suecica). Control samples of Escherichia coli and calf thymus DNA were also analyzed. The nucleoside base composition of each DNA sample was determined by reversed-phase high performance liquid chromatography. All samples contained 5-methyldeoxycytidine, although at widely varying levels. In M. minutum, about one-third of the cytidine residues were methylated. Restriction analysis supported this high degree of methylation in M. minutum and suggested that methylation is biased toward 5'-CG dinucleotides. The guanosine + cytosine (GC) contents of the green algae were, with the exception of Stichococcus sp., consistently higher than those of the diatoms. Monoraphidium minutum exhibited an extremely high GC content of 71%. Such a value is rare among eukaryotic organisms and might indicate an unusual codon usage. This work is important for developing strategies for transformation and gene cloning in these algae. 46 refs., 1 fig., 2 tabs.

  20. Superstatistical model of bacterial DNA architecture

    PubMed Central

    Bogachev, Mikhail I.; Markelov, Oleg A.; Kayumov, Airat R.; Bunde, Armin

    2017-01-01

    Understanding the physical principles that govern the complex DNA structural organization as well as its mechanical and thermodynamical properties is essential for the advancement in both life sciences and genetic engineering. Recently we have discovered that the complex DNA organization is explicitly reflected in the arrangement of nucleotides depicted by the universal power law tailed internucleotide interval distribution that is valid for complete genomes of various prokaryotic and eukaryotic organisms. Here we suggest a superstatistical model that represents a long DNA molecule by a series of consecutive ~150 bp DNA segments with the alternation of the local nucleotide composition between segments exhibiting long-range correlations. We show that the superstatistical model and the corresponding DNA generation algorithm explicitly reproduce the laws governing the empirical nucleotide arrangement properties of the DNA sequences for various global GC contents and optimal living temperatures. Finally, we discuss the relevance of our model in terms of the DNA mechanical properties. As an outlook, we focus on finding the DNA sequences that encode a given protein while simultaneously reproducing the nucleotide arrangement laws observed from empirical genomes, that may be of interest in the optimization of genetic engineering of long DNA molecules. PMID:28225058

  1. Superstatistical model of bacterial DNA architecture

    NASA Astrophysics Data System (ADS)

    Bogachev, Mikhail I.; Markelov, Oleg A.; Kayumov, Airat R.; Bunde, Armin

    2017-02-01

    Understanding the physical principles that govern the complex DNA structural organization as well as its mechanical and thermodynamical properties is essential for the advancement in both life sciences and genetic engineering. Recently we have discovered that the complex DNA organization is explicitly reflected in the arrangement of nucleotides depicted by the universal power law tailed internucleotide interval distribution that is valid for complete genomes of various prokaryotic and eukaryotic organisms. Here we suggest a superstatistical model that represents a long DNA molecule by a series of consecutive ~150 bp DNA segments with the alternation of the local nucleotide composition between segments exhibiting long-range correlations. We show that the superstatistical model and the corresponding DNA generation algorithm explicitly reproduce the laws governing the empirical nucleotide arrangement properties of the DNA sequences for various global GC contents and optimal living temperatures. Finally, we discuss the relevance of our model in terms of the DNA mechanical properties. As an outlook, we focus on finding the DNA sequences that encode a given protein while simultaneously reproducing the nucleotide arrangement laws observed from empirical genomes, that may be of interest in the optimization of genetic engineering of long DNA molecules.

  2. Determination of Trichuris muris from murid hosts and T. arvicolae (Nematoda) from arvicolid rodents by amplification and sequentiation of the ITS1-5.8S-ITS2 segment of the ribosomal DNA.

    PubMed

    Cutillas, C; Oliveros, R; de Rojas, M; Guevara, D C

    2002-06-01

    Trichuris muris has been isolated from murid hosts (Apodemus sylvaticus and Mus musculus) and Trichuris arvicolae from arvicolid rodents in Barcelona, Spain. Genomic DNA was isolated and the ITS1-5.8S-ITS2 segment from the ribosomal DNA (rDNA) was amplified and sequenced using polymerase chain reaction techniques. The ITS2 of both populations isolated from Apodemus and Mus was 382 nucleotides in length and had a GC content of about 60.73%, while the ITS2 of T. arvicolae was 442 nucleotides in length and had a GC content of about 59.8%. Furthermore, the ITS1 of Trichuris from murids was 448 nucleotides in length and had a GC content of about 56.47%, while that of T. arvicolae was 446 nucleotides in length and had a GC content of 57.62%. A total of 161 and 173 nucleotides were observed along the 5.8S gene of T. muris and T. arvicolae, respectively. This difference in nucleotides was due to the insertion of a DNA segment (transposon) in the 5.8S sequence of the latter species. Slight intraindividual and intraspecific variations were detected in the rDNA of both species. The presence of microsatellites was observed in all of the individuals assayed. Sequence analysis of the internal transcribed spacers and the 5.8S gene demonstrated no sequence differences between T. muris isolated from both of its murid hosts. Nevertheless, clear differences were detected between the ITS2, ITS1 and 5.8S gene of T. muris and T. arvicolae. This corroborates the existence of two separate Trichuris species in murid and arvicolid hosts. Furthermore, a phylogenetic analysis was carried out and endonuclease restriction maps were elaborated for both species.

  3. DNA barcoding Australia's fish species

    PubMed Central

    Ward, Robert D; Zemlak, Tyler S; Innes, Bronwyn H; Last, Peter R; Hebert, Paul D.N

    2005-01-01

    Two hundred and seven species of fish, mostly Australian marine fish, were sequenced (barcoded) for a 655 bp region of the mitochondrial cytochrome oxidase subunit I gene (cox1). Most species were represented by multiple specimens, and 754 sequences were generated. The GC content of the 143 species of teleosts was higher than the 61 species of sharks and rays (47.1% versus 42.2%), largely due to a higher GC content of codon position 3 in the former (41.1% versus 29.9%). Rays had higher GC than sharks (44.7% versus 41.0%), again largely due to higher GC in the 3rd codon position in the former (36.3% versus 26.8%). Average within-species, genus, family, order and class Kimura two parameter (K2P) distances were 0.39%, 9.93%, 15.46%, 22.18% and 23.27%, respectively. All species could be differentiated by their cox1 sequence, although single individuals of each of two species had haplotypes characteristic of a congener. Although DNA barcoding aims to develop species identification systems, some phylogenetic signal was apparent in the data. In the neighbour-joining tree for all 754 sequences, four major clusters were apparent: chimaerids, rays, sharks and teleosts. Species within genera invariably clustered, and generally so did genera within families. Three taxonomic groups—dogfishes of the genus Squalus, flatheads of the family Platycephalidae, and tunas of the genus Thunnus—were examined more closely. The clades revealed after bootstrapping generally corresponded well with expectations. Individuals from operational taxonomic units designated as Squalus species B through F formed individual clades, supporting morphological evidence for each of these being separate species. We conclude that cox1 sequencing, or ‘barcoding’, can be used to identify fish species. PMID:16214743
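
    The GC-content comparisons in this record, overall and at the third codon position, can be sketched as follows. This is a minimal illustration, assuming the input sequence is already in reading frame; the function names `gc_percent` and `gc_by_codon_position` are hypothetical and do not come from the cited study.

```python
# Minimal sketch: overall GC percentage and GC percentage per codon position.
# Assumes the sequence starts at codon position 1 (an assumption; real
# barcode fragments must first be placed in the correct reading frame).


def gc_percent(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases A, C, G and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else 0.0


def gc_by_codon_position(cds: str) -> dict:
    """GC percentage at codon positions 1, 2 and 3 of an in-frame sequence."""
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    return {pos + 1: gc_percent(cds[pos:usable:3]) for pos in range(3)}


if __name__ == "__main__":
    example = "ATGGCC"  # two codons: ATG, GCC
    print(gc_percent(example))
    print(gc_by_codon_position(example))
```

    Averaging the position-3 value across the sequences of a taxonomic group would give a figure comparable in kind to the third-position percentages quoted above, although the study's exact procedure is not described in this record.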

  4. A 269-amino-acid segment with a pseudo-leucine zipper and a helix-turn-helix motif codes for the sequence-specific DNA-binding domain of herpes simplex virus type 1 origin-binding protein.

    PubMed Central

    Deb, S; Deb, S P

    1991-01-01

    The UL9 gene of herpes simplex virus (HSV) codes for a DNA-binding protein (OBP) that interacts sequence specifically with the origin of replication. This protein is essential for HSV DNA replication in cultured cells. The UL9 gene was cloned into a plasmid vector downstream of the SP6 RNA polymerase promoter. By using in vitro transcription and translation systems, a full-length OBP was synthesized. This synthetic protein is recognized by an antiserum generated against the C-terminal decapeptide of OBP and is functionally active in binding to OriS sequence specifically. The in vitro-synthesized protein has sequence specificity for binding similar to that found for the in vivo-generated OBP. A total of 14 in-frame deletion and insertion mutants of the UL9 gene were generated and expressed in vitro. Using these deletion mutants, we determined that the 269-amino-acid stretch defined by amino acids 564 to 832 localizes the OriS-specific DNA-binding domain. The N-terminal boundary is between amino acids 565 and 596, while the C terminus lies between amino acids 833 and 805. This segment contains a helix-turn-helix moiety and a pseudo-leucine zipper, neither of which alone can support DNA binding. The other leucine zipper from amino acids 150 to 173 is not required for the in vitro sequence-specific DNA-binding activity of OBP. PMID:1851856

  5. Ethical coding.

    PubMed

    Resnik, Barry I

    2009-01-01

    It is ethical, legal, and proper for a dermatologist to maximize income through proper coding of patient encounters and procedures. The overzealous physician can misinterpret reimbursement requirements or receive bad advice from other physicians and cross the line from aggressive coding to coding fraud. Several of the more common problem areas are discussed.

  6. Uplink Coding

    NASA Technical Reports Server (NTRS)

    Pollara, Fabrizio; Hamkins, Jon; Dolinar, Sam; Andrews, Ken; Divsalar, Dariush

    2006-01-01

    This viewgraph presentation reviews uplink coding. The purpose and goals of the briefing are to: (1) show a plan for using uplink coding and describe its benefits; (2) define possible solutions and their applicability to different types of uplink, including emergency uplink; (3) concur with our conclusions so we can embark on a plan to use the proposed uplink system; (4) identify the need for the development of appropriate technology and its infusion in the DSN; and (5) gain advocacy to implement uplink coding in flight projects. Action Item EMB04-1-14: show a plan for using uplink coding, including showing where it is useful or not (include discussion of emergency uplink coding).

  7. A putative insect intracellular endosymbiont stem clade, within the Enterobacteriaceae, infered from phylogenetic analysis based on a heterogeneous model of DNA evolution.

    PubMed

    Charles, H; Heddi, A; Rahbe, Y

    2001-05-01

    Insect intracellular symbiotic bacteria (intracellular endosymbionts, or endocytobionts) were positioned within the gamma 3-Proteobacteria using a non-homogeneous model of DNA evolution, allowing for rate variability among sites, for GC content heterogeneity among sequences, and applied to a maximum likelihood framework. Most of them were found to be closely related within the Enterobacteriaceae family, located between Proteus and Yersinia. These results suggest that such a bacterial group might possess several traits allowing for insect infection and the stable establishment of symbiotic relationships and that this could represent a stem clade for numerous insect endocytobionts. Based on the estimations of the equilibrium GC content and branch lengths in the phylogenetic tree, we have made comparisons of the relative ages of these different symbioses.

  8. Cross-species analysis of genic GC3 content and DNA methylation patterns.

    PubMed

    Tatarinova, Tatiana; Elhaik, Eran; Pellegrini, Matteo

    2013-01-01

    The GC content in the third codon position (GC(3)) exhibits a unimodal distribution in many plant and animal genomes. Interestingly, grasses and homeotherm vertebrates exhibit a unique bimodal distribution. High GC(3) was previously found to be associated with variable expression, higher frequency of upstream TATA boxes, and an increase of GC(3) from 5' to 3'. Moreover, GC(3)-rich genes are predominant in certain gene classes and are enriched in CpG dinucleotides that are potential targets for methylation. Based on the GC(3) bimodal distribution we hypothesize that GC(3) has a regulatory role involving methylation and gene expression. To test that hypothesis, we selected diverse taxa (rice, thale cress, bee, and human) that varied in the modality of their GC(3) distribution and tested the association between GC(3), DNA methylation, and gene expression. We examine the relationship between cytosine methylation levels and GC(3), gene expression, genome signature, gene length, and other gene compositional features. We find a strong negative correlation (Pearson's correlation coefficient r = -0.67, P value < 0.0001) between GC(3) and genic CpG methylation. The comparison between 5'-3' gradients of CG(3)-skew and genic methylation for the taxa in the study suggests interplay between gene-body methylation and transcription-coupled cytosine deamination effect. Compositional features are correlated with methylation levels of genes in rice, thale cress, human, bee, and fruit fly (which acts as an unmethylated control). These patterns allow us to generate evolutionary hypotheses about the relationships between GC(3) and methylation and how these affect expression patterns. Specifically, we propose that the opposite effects of methylation and compositional gradients along coding regions of GC(3)-poor and GC(3)-rich genes are the products of several competing processes.

  9. Cross-Species Analysis of Genic GC3 Content and DNA Methylation Patterns

    PubMed Central

    Tatarinova, Tatiana; Elhaik, Eran; Pellegrini, Matteo

    2013-01-01

    The GC content in the third codon position (GC3) exhibits a unimodal distribution in many plant and animal genomes. Interestingly, grasses and homeotherm vertebrates exhibit a unique bimodal distribution. High GC3 was previously found to be associated with variable expression, higher frequency of upstream TATA boxes, and an increase of GC3 from 5′ to 3′. Moreover, GC3-rich genes are predominant in certain gene classes and are enriched in CpG dinucleotides that are potential targets for methylation. Based on the GC3 bimodal distribution we hypothesize that GC3 has a regulatory role involving methylation and gene expression. To test that hypothesis, we selected diverse taxa (rice, thale cress, bee, and human) that varied in the modality of their GC3 distribution and tested the association between GC3, DNA methylation, and gene expression. We examine the relationship between cytosine methylation levels and GC3, gene expression, genome signature, gene length, and other gene compositional features. We find a strong negative correlation (Pearson’s correlation coefficient r = −0.67, P value < 0.0001) between GC3 and genic CpG methylation. The comparison between 5′-3′ gradients of CG3-skew and genic methylation for the taxa in the study suggests interplay between gene-body methylation and transcription-coupled cytosine deamination effect. Compositional features are correlated with methylation levels of genes in rice, thale cress, human, bee, and fruit fly (which acts as an unmethylated control). These patterns allow us to generate evolutionary hypotheses about the relationships between GC3 and methylation and how these affect expression patterns. Specifically, we propose that the opposite effects of methylation and compositional gradients along coding regions of GC3-poor and GC3-rich genes are the products of several competing processes. PMID:23833164
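
    As a rough illustration of the two quantities this abstract relies on, the sketch below computes GC3 for an in-frame coding sequence and a Pearson correlation coefficient between two lists of per-gene values. The function names `gc3` and `pearson_r` are hypothetical, and the treatment of trailing partial codons and ambiguous bases is an assumption of the sketch rather than the authors' procedure.

```python
import math


def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame CDS.

    A trailing partial codon is ignored, and third positions that are
    not A, C, G or T are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3       # drop any incomplete final codon
    third = [b for b in cds[2:usable:3] if b in "ACGT"]
    if not third:
        raise ValueError("no usable third codon positions")
    return sum(b in "GC" for b in third) / len(third)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length value lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

    In an analysis of this kind, `gc3` would be applied to each gene and the resulting values correlated against per-gene methylation levels; a negative `pearson_r` would correspond to the direction of the association reported above.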

  10. Analysis of the complete DNA sequence of murine cytomegalovirus.

    PubMed Central

    Rawlinson, W D; Farrell, H E; Barrell, B G

    1996-01-01

    The complete DNA sequence of the Smith strain of murine cytomegalovirus (MCMV) was determined from virion DNA by using a whole-genome shotgun approach. The genome has an overall G+C content of 58.7%, consists of 230,278 bp, and is arranged as a single unique sequence with short (31-bp) terminal direct repeats and several short internal repeats. Significant similarity to the genome of the sequenced human cytomegalovirus (HCMV) strain AD169 is evident, particularly for 78 open reading frames encoded by the central part of the genome. There is a very similar distribution of G+C content across the two genomes. Sequences toward the ends of the MCMV genome encode tandem arrays of homologous glycoproteins (gps) arranged as two gene families. The left end encodes 15 gps that represent one family, and the right end encodes a different family of 11 gps. A homolog (m144) of cellular major histocompatibility complex (MHC) class I genes is located at the end of the genome opposite the HCMV MHC class I homolog (UL18). G protein-coupled receptor (GCR) homologs (M33 and M78) occur in positions congruent with two (UL33 and UL78) of the four putative HCMV GCR homologs. Counterparts of all of the known enzyme homologs in HCMV are present in the MCMV genome, including the phosphotransferase gene (M97), whose product phosphorylates ganciclovir in HCMV-infected cells, and the assembly protein (M80). PMID:8971012

  11. Time scale for cyclostome evolution inferred with a phylogenetic diagnosis of hagfish and lamprey cDNA sequences.

    PubMed

    Kuraku, Shigehiro; Kuratani, Shigeru

    2006-12-01

    The Cyclostomata consists of the two orders Myxiniformes (hagfishes) and Petromyzoniformes (lampreys), and its monophyly has been unequivocally supported by recent molecular phylogenetic studies. Under this updated vertebrate phylogeny, we performed in silico evolutionary analyses using currently available cDNA sequences of cyclostomes. We first calculated the GC-content at four-fold degenerate sites (GC(4)), which revealed that an extremely high GC-content is shared by all the lamprey species we surveyed, whereas no striking pattern in GC-content was observed in any of the hagfish species surveyed. We then estimated the timing of diversification in cyclostome evolution using nucleotide and amino acid sequences. We obtained divergence times of 470-390 million years ago (Mya) in the Ordovician-Silurian-Devonian Periods for the interordinal split between Myxiniformes and Petromyzoniformes; 90-60 Mya in the Cretaceous-Tertiary Periods for the split between the two hagfish subfamilies, Myxininae and Eptatretinae; 280-220 Mya in the Permian-Triassic Periods for the split between the two lamprey subfamilies, Geotriinae and Petromyzoninae; and 30-10 Mya in the Tertiary Period for the split between the two lamprey genera, Petromyzon and Lethenteron. This evolutionary configuration indicates that Myxiniformes and Petromyzoniformes diverged shortly after the common ancestor of cyclostomes split from the future gnathostome lineage. Our results also suggest that intra-subfamilial diversification in hagfish and lamprey lineages (especially those distributed in the northern hemisphere) occurred in the Cretaceous or Tertiary Periods.

  12. Sharing code.

    PubMed

    Kubilius, Jonas

    2014-01-01

    Sharing code is becoming increasingly important in the wake of Open Science. In this review I describe and compare two popular code-sharing utilities, GitHub and Open Science Framework (OSF). GitHub is a mature, industry-standard tool but lacks focus towards researchers. In comparison, OSF offers a one-stop solution for researchers but a lot of functionality is still under development. I conclude by listing alternative lesser-known tools for code and materials sharing.

  13. The information capacity of the genetic code: Is the natural code optimal?

    PubMed

    Kuruoglu, Ercan E; Arndt, Peter F

    2017-04-21

    We envision the molecular evolution process as an information transfer process and provide a quantitative measure for information preservation in terms of the channel capacity according to the channel coding theorem of Shannon. We calculate Information capacities of DNA on the nucleotide (for non-coding DNA) and the amino acid (for coding DNA) level using various substitution models. We extend our results on coding DNA to a discussion about the optimality of the natural codon-amino acid code. We provide the results of an adaptive search algorithm in the code domain and demonstrate the existence of a large number of genetic codes with higher information capacity. Our results support the hypothesis of an ancient extension from a 2-nucleotide codon to the current 3-nucleotide codon code to encode the various amino acids. Copyright © 2017 Elsevier Ltd. All rights reserved.
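
    To make the notion of channel capacity concrete, here is a minimal sketch for the simplest case only: a symmetric four-letter channel in which a nucleotide is replaced by each of the other three with equal probability (a Jukes-Cantor-like assumption chosen for illustration; the abstract refers to various substitution models that are not reproduced here). For a symmetric channel the capacity is log2(4) minus the entropy of one row of the transition matrix. The function name is hypothetical.

```python
import math


def symmetric_nucleotide_capacity(p_change):
    """Capacity in bits per site of a symmetric 4-letter channel.

    p_change is the total probability that a nucleotide is replaced,
    split equally among the three alternative nucleotides.
    """
    if not 0.0 <= p_change <= 1.0:
        raise ValueError("p_change must lie in [0, 1]")
    row = [1.0 - p_change] + [p_change / 3.0] * 3
    # Shannon entropy of one row of the transition matrix (0 * log 0 := 0).
    row_entropy = -sum(x * math.log2(x) for x in row if x > 0)
    return math.log2(4) - row_entropy
```

    With no substitutions the capacity is the full 2 bits per nucleotide, and it falls to zero when `p_change` reaches 0.75, where the output no longer carries any information about the input.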

  14. DNA polymorphism in morels: complete sequences of the internal transcribed spacer of genes coding for rRNA in Morchella esculenta (yellow morel) and Morchella conica (black morel).

    PubMed

    Wipf, D; Munch, J C; Botton, B; Buscot, F

    1996-09-01

    The internal transcribed spacer (ITS) of the gene coding for rRNA was sequenced in both directions with the gene walking technique in a black morel (Morchella conica) and a yellow morel (M. esculenta) to elucidate the ITS length discrepancy between the two species groups (750-bp ITS in black morels and 1,150-bp ITS in yellow morels).

  15. Isolation and expression of a novel chick G-protein cDNA coding for a G alpha i3 protein with a G alpha 0 N-terminus.

    PubMed Central

    Kilbourne, E J; Galper, J B

    1994-01-01

    We have cloned cDNAs coding for G-protein alpha subunits from a chick brain cDNA library. Based on sequence similarity to G-protein alpha subunits from other eukaryotes, one clone was designated G alpha i3. A second clone, G alpha i3-o, was identical to the G alpha i3 clone over 932 bases on the 3' end. The 5' end of G alpha i3-o, however, contained an alternative sequence in which the first 45 amino acids coded for are 100% identical to the conserved N-terminus of G alpha o from species such as rat, mouse, human, bovine and hamster. Both clones were found to be expressed in all tissues studied. The unusual alpha o-alpha i3-like G-protein chimera, G alpha i3-o, was found to be expressed at significantly lower levels than G alpha i3. In vitro transcription and translation of the G alpha i3-o cDNA clone gave a protein of approx. 41 kDa which stably bound guanosine 5'-[gamma-thio]triphosphate. G alpha i3-o appears to be the first G-protein alpha subunit cloned which contains ends that are homologous to two different alpha subunit isoforms, G alpha o and G alpha i3. PMID:8297335

  16. Isolation and characterization of cDNA clones for rat ribophorin I: complete coding sequence and in vitro synthesis and insertion of the encoded product into endoplasmic reticulum membranes

    PubMed Central

    1987-01-01

    Ribophorins I and II are two transmembrane glycoproteins that are characteristic of the rough endoplasmic reticulum and are thought to be part of the apparatus that affects the co-translational translocation of polypeptides synthesized on membrane-bound polysomes. A ribophorin I cDNA clone containing a 0.6-kb insert was isolated from a rat liver lambda gt11 cDNA library by immunoscreening with specific antibodies. This cDNA was used to isolate a clone (2.3 kb) from a rat brain lambda gt11 cDNA library that contains the entire ribophorin I coding sequence. SP6 RNA transcripts of the insert in this clone directed the in vitro synthesis of a polypeptide of the expected size that was immunoprecipitated with anti-ribophorin I antibodies. When synthesized in the presence of microsomes, this polypeptide, like the translation product of the natural ribophorin I mRNA, underwent membrane insertion, signal cleavage, and co-translational glycosylation. The complete amino acid sequence of the polypeptide encoded in the cDNA insert was derived from the nucleotide sequence and found to contain a segment that corresponds to a partial amino terminal sequence of ribophorin I that was obtained by Edman degradation. This confirmed the identity of the cDNA clone and established that ribophorin I contains 583 amino acids and is synthesized with a cleavable amino terminal insertion signal of 22 residues. Analysis of the amino acid sequence of ribophorin I suggested that the polypeptide has a simple transmembrane disposition with a rather hydrophilic carboxy terminal segment of 150 amino acids exposed on the cytoplasmic face of the membrane, and a luminal domain of 414 amino acids containing three potential N-glycosylation sites. Hybridization measurements using the cloned cDNA as a probe showed that ribophorin I mRNA levels increase fourfold 15 h after partial hepatectomy, in confirmation of measurements made by in vitro translation of liver mRNA. Southern blot analysis of rat genomic

  17. cDNA-based gene mapping and GC3 profiling in the soft-shelled turtle suggest a chromosomal size-dependent GC bias shared by sauropsids.

    PubMed

    Kuraku, Shigehiro; Ishijima, Junko; Nishida-Umehara, Chizuko; Agata, Kiyokazu; Kuratani, Shigeru; Matsuda, Yoichi

    2006-01-01

    Mammalian and avian genomes comprise several classes of chromosomal segments that vary dramatically in GC-content. Especially in chicken, microchromosomes exhibit a higher GC-content and a higher gene density than macrochromosomes. To understand the evolutionary history of the intra-genome GC heterogeneity in amniotes, it is necessary to examine the equivalence of this GC heterogeneity at the nucleotide level between these animals including reptiles, from which birds diverged. We isolated cDNAs for 39 protein-coding genes from the Chinese soft-shelled turtle, Pelodiscus sinensis, and performed chromosome mapping of 31 genes. The GC-content of exonic third positions (GC3) of P. sinensis genes showed a heterogeneous distribution, and exhibited a significant positive correlation with that of chicken and human orthologs, indicating that the last common ancestor of extant amniotes had already established a GC-compartmentalized genomic structure. Furthermore, chromosome mapping in P. sinensis revealed that microchromosomes tend to contain more GC-rich genes than GC-poor genes, as in chicken. These results illustrate two modes of genome evolution in amniotes: mammals elaborated the genomic configuration in which GC-rich and GC-poor regions coexist in individual chromosomes, whereas sauropsids (reptiles and birds) refined the chromosomal size-dependent GC compartmentalization in which GC-rich genomic fractions tend to be confined to microchromosomes.

  18. Lesch-Nyhan syndrome: mRNA expression of HPRT in patients with enzyme proven deficiency of HPRT and normal HPRT coding region of the DNA.

    PubMed

    Nguyen, Khue Vu; Naviaux, Robert K; Paik, Kacie K; Nyhan, William L

    2012-08-01

    Inherited mutation of the purine salvage enzyme, hypoxanthine guanine phosphoribosyltransferase (HPRT) gives rise to Lesch-Nyhan syndrome (LNS) or Lesch-Nyhan variants (LNV). We report a case of two LNS affected members of a family with deficiency of activity of HPRT in intact cultured fibroblasts in whom mutation could not be found in the HPRT coding sequence but there was markedly decreased HPRT expression of mRNA. Published by Elsevier Inc.

  19. Sharing code

    PubMed Central

    Kubilius, Jonas

    2014-01-01

    Sharing code is becoming increasingly important in the wake of Open Science. In this review I describe and compare two popular code-sharing utilities, GitHub and Open Science Framework (OSF). GitHub is a mature, industry-standard tool but lacks focus towards researchers. In comparison, OSF offers a one-stop solution for researchers but a lot of functionality is still under development. I conclude by listing alternative lesser-known tools for code and materials sharing. PMID:25165519

  20. Direct DNA amplification from crude clinical samples using a PCR enhancer cocktail and novel mutants of Taq.

    PubMed

    Zhang, Zhian; Kermekchiev, Milko B; Barnes, Wayne M

    2010-03-01

    PCR-based clinical and forensic tests often have low sensitivity or even false-negative results caused by potent PCR inhibitors found in blood and soil. It is widely accepted that purification of target DNA before PCR is necessary for successful amplification. In an attempt to overcome PCR inhibition, enhance PCR amplification, and simplify the PCR protocol, we demonstrate improved PCR-enhancing cocktails containing nonionic detergent, l-carnitine, d-(+)-trehalose, and heparin. These cocktails, in combination with two inhibitor-resistant Taq mutants, OmniTaq and Omni Klentaq, enabled efficient amplification of exogenous, endogenous, and high-GC content DNA targets directly from crude samples containing human plasma, serum, and whole blood without DNA purification. In the presence of these enhancer cocktails, the mutant enzymes were able to tolerate at least 25% plasma, serum, or whole blood and as high as 80% GC content templates in PCR reactions. These enhancer cocktails also improved the performance of the novel Taq mutants in real-time PCR amplification using crude samples, both in SYBR Green fluorescence detection and TaqMan assays. The novel enhancer mixes also facilitated DNA amplification from crude samples with various commercial Taq DNA polymerases.

  1. Direct measurement of sequence-dependent transition path times and conformational diffusion in DNA duplex formation.

    PubMed

    Neupane, Krishna; Wang, Feng; Woodside, Michael T

    2017-02-07

    The conformational diffusion coefficient, D, sets the timescale for microscopic structural changes during folding transitions in biomolecules like nucleic acids and proteins. D encodes significant information about the folding dynamics such as the roughness of the energy landscape governing the folding and the level of internal friction in the molecule, but it is challenging to measure. The most sensitive measure of D is the time required to cross the energy barrier that dominates folding kinetics, known as the transition path time. To investigate the sequence dependence of D in DNA duplex formation, we measured individual transition paths from equilibrium folding trajectories of single DNA hairpins held under tension in high-resolution optical tweezers. Studying hairpins with the same helix length but with G:C base-pair content varying from 0 to 100%, we determined both the average time to cross the transition paths, τtp, and the distribution of individual transit times, PTP(t). We then estimated D from both τtp and PTP(t) from theories assuming one-dimensional diffusive motion over a harmonic barrier. τtp decreased roughly linearly with the G:C content of the hairpin helix, being 50% longer for hairpins with only A:T base pairs than for those with only G:C base pairs. Conversely, D increased linearly with helix G:C content, roughly doubling as the G:C content increased from 0 to 100%. These results reveal that G:C base pairs form faster than A:T base pairs because of faster conformational diffusion, possibly reflecting lower torsional barriers, and demonstrate the power of transition path measurements for elucidating the microscopic determinants of folding.

  2. Genome size and DNA base composition of geophytes: the mirror of phenology and ecology?

    PubMed

    Veselý, Pavel; Bures, Petr; Smarda, Petr; Pavlícek, Tomás

    2012-01-01

    Genome size is known to affect various plant traits such as stomatal size, seed mass, and flower or shoot phenology. However, these associations are not well understood for species with very large genomes, which are largely represented by geophytic plants. No detailed associations are known between DNA base composition and genome size or species ecology. Genome sizes and GC contents were measured in 219 geophytes together with tentative morpho-anatomical and ecological traits. Increased genome size was associated with earliness of flowering and tendency to grow in humid conditions, and there was a positive correlation between an increase in stomatal size in species with extremely large genomes. Seed mass of geophytes was closely related to their ecology, but not to genomic parameters. Genomic DNA GC content showed a unimodal relationship with genome size but no relationship with species ecology. Evolution of genome size in geophytes is closely related to their ecology and phenology and is also associated with remarkable changes in DNA base composition. Although geophytism together with producing larger cells appears to be an advantageous strategy for fast development of an organism in seasonal habitats, the drought sensitivity of large stomata may restrict the occurrence of geophytes with very large genomes to regions not subject to water stress.

  3. Genome size and DNA base composition of geophytes: the mirror of phenology and ecology?

    PubMed Central

    Veselý, Pavel; Bureš, Petr; Šmarda, Petr; Pavlíček, Tomáš

    2012-01-01

    Background and Aims: Genome size is known to affect various plant traits such as stomatal size, seed mass, and flower or shoot phenology. However, these associations are not well understood for species with very large genomes, which are largely represented by geophytic plants. No detailed associations are known between DNA base composition and genome size or species ecology. Methods: Genome sizes and GC contents were measured in 219 geophytes together with tentative morpho-anatomical and ecological traits. Key Results: Increased genome size was associated with earliness of flowering and tendency to grow in humid conditions, and there was a positive correlation between an increase in stomatal size in species with extremely large genomes. Seed mass of geophytes was closely related to their ecology, but not to genomic parameters. Genomic DNA GC content showed a unimodal relationship with genome size but no relationship with species ecology. Conclusions: Evolution of genome size in geophytes is closely related to their ecology and phenology and is also associated with remarkable changes in DNA base composition. Although geophytism together with producing larger cells appears to be an advantageous strategy for fast development of an organism in seasonal habitats, the drought sensitivity of large stomata may restrict the occurrence of geophytes with very large genomes to regions not subject to water stress. PMID:22021815

  4. Terminal repetitive sequences in herpesvirus saimiri virion DNA.

    PubMed

    Bankier, A T; Dietrich, W; Baer, R; Barrell, B G; Colbère-Garapin, F; Fleckenstein, B; Bodemer, W

    1985-07-01

    The H-DNA repeat unit of Herpesvirus saimiri strain 11 was cloned in plasmid vector pAGO, and the nucleotide sequence was determined by the dideoxy chain termination method. One unit of repetitive DNA has 1,444 base pairs with 70.8% G+C content. The structural features of repeat DNA sequences at the termini of intact virion M-DNA (160 kilobases) and orientation of reiterated DNA were analyzed by radioactive end labeling of M-DNA, followed by cleavage of the end fragments with restriction endonucleases. The termini appeared to be blunt ended with a 5'-phosphate group, probably generated during encapsidation by cleavage in the immediate vicinity of the single ApaI recognition site in the H-DNA repeat unit. The sequence did not reveal sizeable open reading frames, the longest hypothetical peptide from H-DNA being 85 amino acids. There was no evidence for an mRNA promoter or terminator element, and H-DNA-specific transcription could not be found in productively infected cells.

  5. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage that the speech signal was corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence, end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term applicable to these techniques, often used interchangeably with speech coding, is voice coding. This term is more generic in the sense that the

  6. Molecular cloning of a cDNA coding biliary glycoprotein I: Primary structure of a glycoprotein immunologically crossreactive with carcinoembryonic antigen

    SciTech Connect

    Hinoda, Y.; Neumaier, M.; Hefta, S.A.; Drzeniek, Z.; Wagener, C.; Shively, L.; Hefta, L.J.F.; Shively, J.E.; Paxton, R.J.

    1988-09-01

    The authors have isolated and sequenced four overlapping cDNA clones from a normal adult human colon library, which together gave the entire nucleotide sequence for biliary glycoprotein I (BGPI). BGPI is a member of the carcinoembryonic antigen (CEA) gene family, which is a subfamily in the immunoglobulin gene superfamily. The deduced amino acid sequence of the combined clones for BGP I revealed a 34-residue leader sequence followed by a 108-residue N-terminal domain, a 178-residue immunoglobulin-like domain, a 108-residue region specific to BGP I, a 24-residue transmembrane domain, and a 35-residue cytoplasmic domain. The nucleotide sequence of BGP I exhibited greater than 80% identity with CEA and nonspecific crossreacting antigen (NCA) in the leader peptide, N-terminal domain, and immunoglobulin-like domain. They propose that BGP I diverged from NCA by acquiring an immunoglobulin-like domain substantially different from the domains found in NCA or CEA and also a new cytoplasmic domain. The latter feature should result in a substantially different membrane anchorage mechanism of BGP I compared to CEA, which lacks the cytoplasmic domain and is anchored via a phosphatidylinositol-glycan structure. Protein structural analysis of BGP I isolated from human bile revealed a blocked N terminus, 129 amino acids of internal sequence that are in agreement with the translated cDNA sequence, and five glycosylation sites in the peptides sequenced.

  7. The coding region of the UFGT gene is a source of diagnostic SNP markers that allow single-locus DNA genotyping for the assessment of cultivar identity and ancestry in grapevine (Vitis vinifera L.)

    PubMed Central

    2013-01-01

    Background: Vitis vinifera L. is one of society’s most important agricultural crops with a broad genetic variability. The difficulty in recognizing grapevine genotypes based on ampelographic traits and secondary metabolites prompted the development of molecular markers suitable for achieving variety genetic identification. Findings: Here, we propose a comparison between a multi-locus barcoding approach based on six chloroplast markers and a single-copy nuclear gene sequencing method using five coding regions combined with a character-based system with the aim of reconstructing cultivar-specific haplotypes and genotypes to be exploited for the molecular characterization of 157 V. vinifera accessions. The analysis of the chloroplast target regions proved the inadequacy of the DNA barcoding approach at the subspecies level, and hence further DNA genotyping analyses were targeted on the sequences of five nuclear single-copy genes amplified across all of the accessions. The sequencing of the coding region of the UFGT nuclear gene (UDP-glucose: flavonoid 3-O-glucosyltransferase, the key enzyme for the accumulation of anthocyanins in berry skins) enabled the discovery of discriminant SNPs (1/34 bp) and the reconstruction of 130 V. vinifera distinct genotypes. Most of the genotypes proved to be cultivar-specific, and only a few genotypes were shared by multiple, although closely related, cultivars. Conclusion: On the whole, this technique was successful for inferring SNP-based genotypes of grapevine accessions suitable for assessing the genetic identity and ancestry of international cultivars and also useful for corroborating some hypotheses regarding the origin of local varieties, suggesting several issues of misidentification (synonymy/homonymy). PMID:24298902

  8. Nature's Code

    NASA Astrophysics Data System (ADS)

    Hill, Vanessa J.; Rowlands, Peter

    2008-10-01

    We propose that the mathematical structures related to the `universal rewrite system' define a universal process applicable to Nature, which we may describe as `Nature's code'. We draw attention here to such concepts as 4 basic units, 64- and 20-unit structures, symmetry-breaking and 5-fold symmetry, chirality, double 3-dimensionality, the double helix, the Van der Waals force and the harmonic oscillator mechanism, and our explanation of how they necessarily lead to self-aggregation, complexity and emergence in higher-order systems. Biological concepts, such as translation, transcription, replication, the genetic code and the grouping of amino acids appear to be driven by fundamental processes of this kind, and it would seem that the Platonic solids, pentagonal symmetry and Fibonacci numbers have significant roles in organizing `Nature's code'.

  9. Show Code.

    PubMed

    Shalev, Daniel

    2017-01-01

    "Let's get one thing straight: there is no such thing as a show code," my attending asserted, pausing for effect. "You either try to resuscitate, or you don't. None of this halfway junk." He spoke so loudly that the two off-service consultants huddled at computers at the end of the unit looked up… We did four rounds of compressions and pushed epinephrine twice. It was not a long code. We did good, strong compressions and coded this man in earnest until the end. Toward the final round, though, as I stepped up to do compressions, my attending looked at me in a deep way. It was a look in between willing me as some object under his command and revealing to me everything that lay within his brash, confident surface but could not be spoken. © 2017 The Hastings Center.

  10. Epigenetic DNA-methylation regulation of genes coding for lipid raft-associated components: a role for raft proteins in cell transformation and cancer progression (review).

    PubMed

    Patra, Samir K; Bettuzzi, Saverio

    2007-06-01

    Metastatic progression is the cause of most cancer deaths. Host tumour cell separation (fission) is accompanied by simultaneous acquisition of migrating capability of cancer cells, remodeling of cellular architecture and effective 'homing' in body host environment. Cell remodeling involves cytoskeletal protein-protein and lipid-protein interaction together with altered signaling. Alteration of signaling in tumour cells may affect expression of many genes also by DNA-methylation/demethylation. This would alter the steady-state intracellular level of structural proteins or metabolic enzymes, and notably enzymes involved in the biosynthesis of lipids, affecting the composition of membranes. Lipid rafts are small, heterogeneous, highly dynamic, sterol- and sphingolipid-enriched domains that compartmentalize cellular processes. Small rafts can be stabilized to form larger platforms through protein-protein and protein-lipid interactions. Lipid rafts play an important role in intracellular protein transport, membrane fusion and trans-cytosis, also being platforms for cell surface antigens and adhesion molecules which are crucial for cell activation, polarization and signaling. Detachment of individual tumour cells from the host tumour lump requires lipid-protein-lipid raft (LPLR) reordering. Lipid rafts are also involved in angiogenesis and local invasion, which occurs within the host tumour vicinity by exchange of enzymes, cytokines and motility factors that modify the surrounding extracellular matrix (ECM). Many cell surface adhesion, ECM, and signaling proteins (such as E-cadherin, catenin, CD44, MMP-9 and caveolin-1) are known to be absent or reduced following gene promoter-CpG-island hypermethylation in mid-stage growing tumours, but re-expressed (by gene promoter-mCpG-DNA demethylation) in carcinomas such as metastasized lung, prostate and sarcomas. The recent research acquisitions on lipid rafts have tremendous implications in understanding the genetic and

  11. Phylogenetic footprinting of non-coding RNA: hammerhead ribozyme sequences in a satellite DNA family of Dolichopoda cave crickets (Orthoptera, Rhaphidophoridae)

    PubMed Central

    2010-01-01

    Background: The great variety in sequence, length, complexity, and abundance of satellite DNA has made it difficult to ascribe any function to this genome component. Recent studies have shown that satellite DNA can be transcribed and be involved in regulation of chromatin structure and gene expression. Some satellite DNAs, such as the pDo500 sequence family in Dolichopoda cave crickets, have a catalytic hammerhead (HH) ribozyme structure and activity embedded within each repeat. Results: We assessed the phylogenetic footprints of the HH ribozyme within the pDo500 sequences from 38 different populations representing 12 species of Dolichopoda. The HH region was significantly more conserved than the non-hammerhead (NHH) region of the pDo500 repeat. In addition, stems were more conserved than loops. In stems, several compensatory mutations were detected that maintain base pairing. The core region of the HH ribozyme was affected by very few nucleotide substitutions and the cleavage position was altered only once among 198 sequences. RNA folding of the HH sequences revealed that a potentially active HH ribozyme can be found in most of the Dolichopoda populations and species. Conclusions: The phylogenetic footprints suggest that the HH region of the pDo500 sequence family is selected for function in Dolichopoda cave crickets. However, the functional role of HH ribozymes in eukaryotic organisms is unclear. The possible functions have been related to trans cleavage of an RNA target by a ribonucleoprotein and regulation of gene expression. Whether the HH ribozyme in Dolichopoda is involved in similar functions remains to be investigated. Future studies need to demonstrate how the observed nucleotide changes and evolutionary constraint have affected the catalytic efficiency of the hammerhead. PMID:20047671

  12. QR Codes

    ERIC Educational Resources Information Center

    Lai, Hsin-Chih; Chang, Chun-Yen; Li, Wen-Shiane; Fan, Yu-Lin; Wu, Ying-Tien

    2013-01-01

    This study presents an m-learning method that incorporates Integrated Quick Response (QR) codes. This learning method not only achieves the objectives of outdoor education, but it also increases applications of Cognitive Theory of Multimedia Learning (CTML) (Mayer, 2001) in m-learning for practical use in a diverse range of outdoor locations. When…

  14. Uplink Coding

    NASA Technical Reports Server (NTRS)

    Andrews, Ken; Divsalar, Dariush; Dolinar, Sam; Moision, Bruce; Hamkins, Jon; Pollara, Fabrizio

    2007-01-01

    This slide presentation reviews the objectives, meeting goals and overall NASA goals for the NASA Data Standards Working Group. The presentation includes information on the technical progress surrounding the objective, short LDPC codes, and the general results on the Pu-Pw tradeoff.

  15. DNA methylation mediated up-regulation of TERRA non-coding RNA is coincident with elongated telomeres in the human placenta.

    PubMed

    Novakovic, Boris; Napier, Christine E; Vryer, Regan; Dimitriadis, Eva; Manuelpillai, Ursula; Sharkey, Andrew; Craig, Jeffrey M; Reddel, Roger R; Saffery, Richard

    2016-11-01

    What factors regulate elongated telomere length in the human placenta? Hypomethylation of TERRA promoters in the human placenta is associated with high TERRA expression, however, no clear mechanistic link between these phenomena and elongated telomere length in the human placenta was found. Human placenta tissue and trophoblasts show longer telomere lengths compared to gestational age-matched somatic cells. However, telomerase (hTERT) expression and activity in the placenta is low, suggesting a role for an alternative lengthening of telomeres (ALT). While ALT is observed in 10-15% of human cancers and in some mouse stem cells, ALT has never been reported in non-cancerous human tissues. Human term placental tissue and matched cord blood mononuclear cells (CBMCs) were collected as part of the Peri/Postnatal Epigenetic Twins study (PETS). In addition, first trimester placental villi, purified cytotrophoblasts, choriocarcinoma cell lines and a panel of ALT-positive cancer cell lines were tested. Telomere length was determined using the Terminal Restriction Fragment (TRF) assay and a relative quantitative PCR method. DNA methylation levels at several CpG rich subtelomeric TERRA promoters were determined using bisulfite conversion and the SEQUENOM EpiTYPER platform. Expression of TERRA and hTERT was determined using quantitative RT-PCR. ALT was assessed using the C-circle assay (CCA). The human placenta tissue and purified first trimester trophoblasts showed low subtelomeric (TERRA) DNA methylation compared to matched CBMCs and other somatic cells. Interestingly placental TERRA methylation was lower than ALT-cancer cell lines, previously reported to be hypomethylated at these loci. Low TERRA methylation was associated with higher expression of TERRA RNA in placenta compared to matched CBMCs. Detectable levels of C-circles were observed in first trimester placental villi, but not term placenta, suggesting that the ALT mechanism may be active in specific placental cells in

  16. Characterization of the Dominant and Rare Members of a Young Hawaiian Soil Bacterial Community with Small-Subunit Ribosomal DNA Amplified from DNA Fractionated on the Basis of Its Guanine and Cytosine Composition

    PubMed Central

    Nüsslein, Klaus; Tiedje, James M.

    1998-01-01

    The small-subunit ribosomal DNA (rDNA) diversity was found to be very high in a Hawaiian soil community that might be expected to have lower diversity than the communities in continental soils because the Hawaiian soil is geographically isolated and only 200 years old, is subjected to a constant climate, and harbors low plant diversity. Since an underlying community structure could not be revealed by analyzing the total eubacterial rDNA, we first fractionated the DNA on the basis of guanine-plus-cytosine (G+C) content by using bis-benzimidazole and equilibrium centrifugation and then analyzed the bacterial rDNA amplified from a fraction with a high biomass (63% G+C fraction) and a fraction with a low biomass (35% G+C fraction). The rDNA clone libraries were screened by amplified rDNA restriction analysis to determine phylotype distribution. The dominant biomass reflected by the 63% G+C fraction contained several dominant phylotypes, while the community members that were less successful (35% G+C fraction) did not show dominance but there was a very high diversity of phylotypes. Nucleotide sequence analysis revealed taxa belonging to the groups expected for the G+C contents used. The dominant phylotypes in the 63% G+C fraction were members of the Pseudomonas, Rhizobium-Agrobacterium, and Rhodospirillum assemblages, while all of the clones sequenced from the 35% G+C fraction were affiliated with several Clostridium assemblages. The two-step rDNA analysis used here uncovered more diversity than can be detected by direct rDNA analysis of total community DNA. The G+C separation step is also a way to detect some of the less dominant organisms in a community. PMID:9546163

  17. The Cipher Code of Simple Sequence Repeats in “Vampire Pathogens”

    PubMed Central

    Zou, Geng; Bello-Orti, Bernardo; Aragon, Virginia; Tucker, Alexander W.; Luo, Rui; Ren, Pinxing; Bi, Dingren; Zhou, Rui; Jin, Hui

    2015-01-01

    Blood inside mammals is a forbidden area for the majority of prokaryotic microbes; however, microbes with red blood cell tropism, like “vampire pathogens” (VP), succeed in coping with scarce nutrients and surviving strong immune reactions. Here, we found that VP of Mycoplasma, Rhizobiales, and Rickettsiales showed significantly higher counts of (AG)n dimeric simple sequence repeats (Di-SSRs) in the genomes, coding and non-coding regions than non-vampire pathogens (N_VP). Regression analysis indicated a significant correlation between GC content and the span of (AG)n-Di-SSR variation. Gene Ontology (GO) terms with abundance of (AG)3-Di-SSRs shared by the VP strains were associated with purine nucleotide metabolism (FDR < 0.01), indicating an adaptation to the limited availability of purine and nucleotide precursors in blood. Di-amino acids coded by (AG)n-Di-SSRs included all three six-fold degenerate amino acids (Arg, Leu and Ser), and counts of Di-amino acids coded by (AG)3, (GA)3, and (TC)3 were significantly higher in VP than in N_VP. Furthermore, significant differences (P < 0.001) in the numbers of triplexes formed from (AG)n-Di-SSRs between VP and N_VP in Mycoplasma suggested a potential role of (AG)n-Di-SSRs in gene regulation. PMID:26215592
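
    The counting of (AG)n dimeric repeats described in this abstract can be illustrated with a minimal sketch. The function name `count_di_ssr`, the default threshold of three repeats, and the toy sequence are assumptions made for illustration; the abstract does not describe the authors' actual pipeline.

```python
import re

def count_di_ssr(seq, unit="AG", min_repeats=3):
    """Return the length (in repeat units) of every maximal run of `unit`
    that is repeated at least `min_repeats` times in `seq`."""
    # Build a pattern such as (?:AG){3,} for the requested repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    return [len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper())]

# Toy sequence, invented for illustration: runs of 3, 5 and 2 AG units.
toy = "TTAGAGAGCCAGAGAGAGAGTTAGAG"
print(count_di_ssr(toy))  # runs shorter than min_repeats are ignored
```

    Counting runs per genome, or separately for coding and non-coding regions, would then only require calling the function on each region's sequence.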

  18. Deppdb--DNA electrostatic potential properties database: electrostatic properties of genome DNA.

    PubMed

    Osypov, Alexander A; Krutinin, Gleb G; Kamzolova, Svetlana G

    2010-06-01

    The electrostatic properties of genome DNA influence its interactions with different proteins, in particular, the regulation of transcription by RNA-polymerases. DEPPDB--DNA Electrostatic Potential Properties Database--was developed to hold and provide all available information on the electrostatic properties of genome DNA combined with its sequence and annotation of biological and structural properties of genome elements and whole genomes. Genomes in DEPPDB are organized on a taxonomical basis. Currently, the database contains all the completely sequenced bacterial and viral genomes according to NCBI RefSeq. General properties of the genome DNA electrostatic potential profile and principles of its formation are revealed. This potential correlates with the GC content but does not correspond to it exactly and strongly depends on both the sequence arrangement and its context (flanking regions). Analysis of the promoter regions for bacterial and viral RNA polymerases revealed a correspondence between the scale of these proteins' physical properties and electrostatic profile patterns. We also discovered a direct correlation between the potential value and the binding frequency of RNA polymerase to DNA, supporting the idea of the role of electrostatics in these interactions. This matches a pronounced tendency of the promoter regions to possess higher values of the electrostatic potential.

  19. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human

    PubMed Central

    Wu, Chengchao; Yao, Shixin; Li, Xinghao; Chen, Chujia; Hu, Xuehai

    2017-01-01

    DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation. PMID:28212312
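
    The 65 DNA composition features mentioned above are presumably the 64 trinucleotide frequencies plus GC content. A minimal sketch of extracting them from one sequence window follows; the function name `composition_features` and the handling of ambiguous bases are assumptions, not the authors' implementation.

```python
from itertools import product

# All 64 trinucleotides in a fixed order, so feature positions are stable.
TRINUCS = ["".join(p) for p in product("ACGT", repeat=3)]

def composition_features(window):
    """Return 65 values for a sequence window:
    64 overlapping-trinucleotide frequencies followed by the GC fraction."""
    window = window.upper()
    counts = dict.fromkeys(TRINUCS, 0)
    total = 0
    for i in range(len(window) - 2):
        tri = window[i:i + 3]
        if tri in counts:  # skip triplets containing N or other symbols
            counts[tri] += 1
            total += 1
    freqs = [counts[t] / total if total else 0.0 for t in TRINUCS]
    gc = (window.count("G") + window.count("C")) / len(window) if window else 0.0
    return freqs + [gc]

features = composition_features("ACGACG")
print(len(features))  # one feature vector per window
```

    In the setting the abstract describes, one such vector would be computed per window (for example 600 bp) and passed, together with the sequence complexity features, to a classifier.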

  20. Schrödinger's code-script: not a genetic cipher but a code of development.

    PubMed

    Walsby, A E; Hodge, M J S

    2017-06-01

    In his book What is Life? Erwin Schrödinger coined the term 'code-script', thought by some to be the first published suggestion of a hereditary code and perhaps a forerunner of the genetic code. The etymology of 'code' suggests three meanings relevant to 'code-script which we distinguish as 'cipher-code', 'word-code' and 'rule-code'. Cipher-codes and word-codes entail translation of one set of characters into another. The genetic code comprises not one but two cipher-codes: the first is the DNA 'base-pairing cipher'; the second is the 'nucleotide-amino-acid cipher', which involves the translation of DNA base sequences into amino-acid sequences. We suggest that Schrödinger's code-script is a form of 'rule-code', a set of rules that, like the 'highway code' or 'penal code', requires no translation of a message. Schrödinger first relates his code-script to chromosomal genes made of protein. Ignorant of its properties, however, he later abandons 'protein' and adopts in its place a hypothetical, isomeric 'aperiodic solid' whose atoms he imagines rearranged in countless different conformations, which together are responsible for the patterns of ontogenetic development. In an attempt to explain the large number of combinations required, Schrödinger referred to the Morse code (a cipher) but in doing so unwittingly misled readers into believing that he intended a cipher-code resembling the genetic code. We argue that the modern equivalent of Schrödinger's code-script is a rule-code of organismal development based largely on the synthesis, folding, properties and interactions of numerous proteins, each performing a specific task. Copyright © 2016. Published by Elsevier Ltd.
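
    The two cipher-codes distinguished in this abstract can be made concrete with a small sketch: a base-pairing cipher that maps each base to its partner, and a nucleotide-amino-acid cipher that maps codons to amino acids. The function names are illustrative, and the codon table is only a five-entry excerpt of the standard genetic code.

```python
# Base-pairing cipher: each DNA base is replaced by its Watson-Crick partner.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Nucleotide-amino-acid cipher: excerpt of the standard genetic code
# (coding-strand DNA codons; "*" marks a stop codon).
CODON_TABLE = {"ATG": "M", "AAA": "K", "TGG": "W", "GGC": "G", "TAA": "*"}

def complement_strand(dna):
    """Apply the base-pairing cipher position by position."""
    return "".join(COMPLEMENT[base] for base in dna)

def translate(dna):
    """Apply the codon cipher to consecutive, non-overlapping triplets."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

print(complement_strand("ATGC"))
print(translate("ATGAAATGGGGCTAA"))
```

    Both functions are pure symbol substitutions, which is what makes them ciphers in the authors' sense; a rule-code, by contrast, would not translate one message into another.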

  1. Breaking the DNA-binding code of Ralstonia solanacearum TAL effectors provides new possibilities to generate plant resistance genes against bacterial wilt disease.

    PubMed

    de Lange, Orlando; Schreiber, Tom; Schandry, Niklas; Radeck, Jara; Braun, Karl Heinz; Koszinowski, Julia; Heuer, Holger; Strauß, Annett; Lahaye, Thomas

    2013-08-01

    Ralstonia solanacearum is a devastating bacterial phytopathogen with a broad host range. Ralstonia solanacearum injected effector proteins (Rips) are key to the successful invasion of host plants. We have characterized Brg11(hrpB-regulated 11), the first identified member of a class of Rips with high sequence similarity to the transcription activator-like (TAL) effectors of Xanthomonas spp., collectively termed RipTALs. Fluorescence microscopy of in planta expressed RipTALs showed nuclear localization. Domain swaps between Brg11 and Xanthomonas TAL effector (TALE) AvrBs3 (avirulence protein triggering Bs3 resistance) showed the functional interchangeability of DNA-binding and transcriptional activation domains. PCR was used to determine the sequence of brg11 homologs from strains infecting phylogenetically diverse host plants. Brg11 localizes to the nucleus and activates promoters containing a matching effector-binding element (EBE). Brg11 and homologs preferentially activate promoters containing EBEs with a 5' terminal guanine, contrasting with the TALE preference for a 5' thymine. Brg11 and other RipTALs probably promote disease through the transcriptional activation of host genes. Brg11 and the majority of homologs identified in this study were shown to activate similar or identical target sequences, in contrast to TALEs, which generally show highly diverse target preferences. This information provides new options for the engineering of plants resistant to R. solanacearum. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.

  2. Isolation and characterization of an atypical LEA protein coding cDNA and its promoter from drought-tolerant plant Prosopis juliflora.

    PubMed

    George, Suja; Usha, B; Parida, Ajay

    2009-05-01

    Plant growth and productivity are adversely affected by various abiotic and biotic stress factors. Despite the wealth of information on abiotic stress and stress tolerance in plants, many aspects still remain unclear. Prosopis juliflora is a hardy plant reported to be tolerant to drought, salinity, extremes of soil pH, and heavy metal stress. In this paper, we report the isolation and characterization of the complementary DNA clone for an atypical late embryogenesis abundant (LEA) protein (Pj LEA3) and its putative promoter sequence from P. juliflora. Unlike typical LEA proteins, rich in glycine, Pj LEA3 has alanine as the most abundant amino acid followed by serine and shows an average negative hydropathy. Pj LEA3 is significantly different from other LEA proteins in the NCBI database and shows high similarity to indole-3 acetic-acid-induced protein ARG2 from Vigna radiata. Northern analysis for Pj LEA3 in P. juliflora leaves under 90 mM H2O2 stress revealed up-regulation of transcript at 24 and 48 h. A 1.5-kb fragment upstream the 5' UTR of this gene (putative promoter) was isolated and analyzed in silico. The possible reasons for changes in gene expression during stress in relation to the host plant's stress tolerance mechanisms are discussed.

  3. Double-coding nucleic acids: introduction of a nucleobase sequence in the major groove of the DNA duplex using double-headed nucleotides.

    PubMed

    Kumar, Pawan; Sorinas, Antoni Figueras; Nielsen, Lise J; Slot, Maria; Skytte, Kirstine; Nielsen, Annie S; Jensen, Michael D; Sharma, Pawan K; Vester, Birte; Petersen, Michael; Nielsen, Poul

    2014-09-05

    A series of double-headed nucleosides were synthesized using the Sonogashira cross-coupling reaction. In the reactions, additional nucleobases (thymine, cytosine, adenine, or guanine) were attached to the 5-position of 2'-deoxyuridine or 2'-deoxycytidine through a propyne linker. The modified nucleosides were incorporated into oligonucleotides, and these were combined in different duplexes that were analyzed by thermal denaturation studies. All of the monomers were well tolerated in the DNA duplexes and induced only small changes in the thermal stability. Consecutive incorporations of the monomers led to increases in duplex stability owing to increased stacking interactions. The modified nucleotide monomers maintained the Watson-Crick base pair fidelity. Stable duplexes were observed with heavily modified oligonucleotides featuring 14 consecutive incorporations of different double-headed nucleotide monomers. Thus, modified duplexes with an array of nucleobases on the exterior of the duplex were designed. Molecular dynamics simulations demonstrated that the additional nucleobases could expose their Watson-Crick and/or Hoogsteen faces for recognition in the major groove. This presentation of nucleobases may find applications in providing molecular information without unwinding the duplex.

  4. Sequencing of the coding exons of the LRP1 and LDLR genes on individual DNA samples reveals novel mutations in both genes.

    PubMed

    Van Leuven, F; Thiry, E; Lambrechts, M; Stas, L; Boon, T; Bruynseels, K; Muls, E; Descamps, O

    2001-02-15

    Five coding polymorphisms in the LRP1 gene, i.e. A217V, A775P, D2080N, D2632E and G4379S, were discovered by sequencing its 89 exons in three test-groups of 22 healthy individuals, 29 Alzheimer patients and 18 individuals with different clinical and molecularly uncharacterized lipid metabolism problems. No genetic defect was evident in the LRP1 gene of any of the Alzheimer's disease (AD) patients, further excluding LRP1 as a major genetic problem in AD. Lipoprotein receptor related protein (LRP) A217V (exon 6) was clearly present in all groups as a polymorphism, while D2632E was observed only once in a healthy volunteer. On the other hand, LRP1 alleles A775P, D2080N, and G4379S were encountered only in patients with FH or with undefined problems of lipid metabolism. This finding made it necessary to also analyze the LDL receptor (LDLR) gene, for which a method was devised to sequence the entire region comprising LDLR exons 2-18. The resulting sequence contig of 33567 nucleotides finally yielded an exact physical map that corrects published and listed LDLR gene maps in many positions. In addition, next to known mutations in LDLR that cause FH, four novel LDLR defects were defined, i.e. del e7-10, exon 9 mutation N407T, a 20 bp insertion in exon 4, and a double mutation C292W/K290R in exon 6. No evidence for pathology connected to the LRP1 'mutations' was obtained by subsequent screening for the five LRP1 variants in larger groups of 110 FH patients and 118 patients with molecularly undefined, clinical problems of cholesterol and/or lipid metabolism. In three individuals with a mutant LDLR gene a variant LRP1 allele was also present, but without direct, obvious clinical compound effects, indicating that the variant LRP1 alleles must, for the present, be considered polymorphisms.

  5. Comparison of the measured phase diagrams in the force-temperature plane for the unzipping of two different natural DNA sequences

    NASA Astrophysics Data System (ADS)

    Lee, C. H.; Danilowicz, C.; Coljee, V. W.; Prentiss, M.

    2006-03-01

    In this work, we consider the critical force required to unzip two different naturally occurring sequences of double-stranded DNA (dsDNA) at temperatures ranging from 20 °C to 50 °C, where one of the sequences has a 53% average guanine-cytosine (GC) content and the other has a 40% GC content. We demonstrate that the force required to separate the dsDNA of the 53% GC sequence into single-stranded DNA (ssDNA) is approximately 0.5 pN, or approximately 5% greater than the critical force required to unzip the 40% GC sequence at the same temperature. In the temperature range between 20 and 40 °C the measured critical forces correspond reasonably well to predictions based on a simple theoretical homopolymeric model, but at temperatures above 40 °C the measured critical forces are much smaller than the predicted forces. The correspondence between theory and experiment is not improved by using Monte Carlo simulations that consider the heteropolymeric nature of the sequences.

  6. A primer design strategy for PCR amplification of GC-rich DNA sequences.

    PubMed

    Li, Li-Yan; Li, Qiang; Yu, Yan-Hong; Zhong, Mei; Yang, Lei; Wu, Qing-Hong; Qiu, Yu-Rong; Luo, Shen-Qiu

    2011-06-01

    To establish a primer design method for amplification of GC-rich DNA sequences. A group of 15 pairs of primers with higher T(m) (>79.7°C) and lower ΔT(m) (<1°C) were designed to amplify GC-rich sequences (66.0%-84.0%). A statistical analysis of primer parameters and GC content of PCR products was performed and compared with the literature. Control experiments were conducted using shortened primers for GC-rich PCR amplifications, and a statistical analysis of shortened primer parameters and GC content of PCR products was performed and compared with that of the unshortened primers. A group of 26 pairs of primers were designed to test the applicability of this primer design strategy in amplifications of non-GC-rich sequences (35.2%-53.5%). All the DNA sequences in this study were successfully amplified. Statistical analyses show that T(m) and ΔT(m) were the main factors influencing amplification. This primer design strategy offered a perfect tool for amplification of GC-rich sequences. It proves that secondary structures cannot form at higher annealing temperatures (>65°C), and that this difficulty can be overcome easily by designing primers accordingly and using a higher annealing temperature. Crown Copyright © 2011. Published by Elsevier Inc. All rights reserved.
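
    The two primer parameters the abstract singles out, T(m) and ΔT(m), can be sketched as follows. The abstract does not say how T(m) was calculated; the basic GC-count approximation used here, the function names, and the example primers are all assumptions for illustration, so the values will not necessarily match the thresholds quoted above.

```python
def basic_tm(primer):
    """Approximate melting temperature in °C with a basic GC-count formula
    (an assumed method, not taken from the paper):
    Tm = 64.9 + 41 * (G + C - 16.4) / N, where N is the primer length."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(primer)

def delta_tm(forward, reverse):
    """Absolute Tm difference between the two primers of a pair."""
    return abs(basic_tm(forward) - basic_tm(reverse))

# Hypothetical GC-rich primer pair, invented for illustration only.
fwd = "GCGGCCGCTGGAGCCGGGCGCCGAGC"
rev = "CGCCGGGCTCCAGCGGCCGCGGTCGC"
print(round(basic_tm(fwd), 1), round(basic_tm(rev), 1), round(delta_tm(fwd, rev), 2))
```

    A design loop in the spirit of the abstract would extend or trim candidate primers until both T(m) values are high and ΔT(m) is small; nearest-neighbor thermodynamic models would give more realistic temperatures than this approximation.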

  7. ITS1: a DNA barcode better than ITS2 in eukaryotes?

    PubMed

    Wang, Xin-Cun; Liu, Chang; Huang, Liang; Bengtsson-Palme, Johan; Chen, Haimei; Zhang, Jian-Hui; Cai, Dayong; Li, Jian-Qin

    2015-05-01

    A DNA barcode is a short piece of DNA sequence used for species determination and discovery. The internal transcribed spacer (ITS/ITS2) region has been proposed as the standard DNA barcode for fungi and seed plants and has been widely used in DNA barcoding analyses for other biological groups, for example algae, protists and animals. The ITS region consists of both ITS1 and ITS2 regions. Here, a large-scale meta-analysis was carried out to compare ITS1 and ITS2 from three aspects: PCR amplification, DNA sequencing and species discrimination, in terms of the presence of DNA barcoding gaps, species discrimination efficiency, sequence length distribution, GC content distribution and primer universality. In total, 85 345 sequence pairs in 10 major groups of eukaryotes, including ascomycetes, basidiomycetes, liverworts, mosses, ferns, gymnosperms, monocotyledons, eudicotyledons, insects and fishes, covering 611 families, 3694 genera, and 19 060 species, were analysed. Using similarity-based methods, we calculated species discrimination efficiencies for ITS1 and ITS2 in all major groups, families and genera. Using Fisher's exact test, we found that ITS1 has significantly higher efficiencies than ITS2 in 17 of the 47 families and 20 of the 49 genera, which are sample-rich. By in silico PCR amplification evaluation, primer universality of the extensively applied ITS1 primers was found superior to that of ITS2 primers. Additionally, shorter length of amplification product and lower GC content was discovered to be two other advantages of ITS1 for sequencing. In summary, ITS1 represents a better DNA barcode than ITS2 for eukaryotic species.
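
    The per-family comparison of discrimination efficiency with Fisher's exact test can be sketched with the standard library alone. The function name and the counts below are hypothetical and are not data from the study; the two-sided p-value sums the probabilities of all tables at least as extreme as the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table whose probability does not exceed the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts for one family: species discriminated / not discriminated.
its1 = (45, 5)
its2 = (30, 20)
print(fisher_exact_two_sided(*its1, *its2))
```

    Repeating the test for each sample-rich family or genus would give the kind of tally the abstract reports, ideally with a correction for multiple testing.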

  8. A Method for the Annotation of Functional Similarities of Coding DNA Sequences: the Case of a Populated Cluster of Transmembrane Proteins.

    PubMed

    Fuertes, Miguel Angel; Rodrigo, José Ramón; Alonso, Carlos

    2017-01-01

    The analysis of a large number of human and mouse genes codifying for a populated cluster of transmembrane proteins revealed that some of the genes significantly vary in their primary nucleotide sequence inter-species and also intra-species. In spite of that divergence and of the fact that all these genes share a common parental function we asked the question of whether at DNA level they have some kind of common compositional structure, not evident from the analysis of their primary nucleotide sequence. To reveal the existence of gene clusters not based on primary sequence relationships we have analyzed 13574 human and 14047 mouse genes by the composon-clustering methodology. The data presented show that most of the genes from each one of the samples are distributed in 18 clusters sharing the common compositional features between the particular human and mouse clusters. It was observed, in addition, that between particular human and mouse clusters having similar composon-profiles large variations in gene population were detected as an indication that a significant amount of orthologs between both species differs in compositional features. A gene cluster containing exclusively genes codifying for transmembrane proteins, an important fraction of which belongs to the Rhodopsin G-protein coupled receptor superfamily, was also detected. This indicates that even though some of them display low sequence similarity, all of them, in both species, participate with similar compositional features in terms of composons. We conclude that in this family of transmembrane proteins in general and in the Rhodopsin G-protein coupled receptor in particular, the composon-clustering reveals the existence of a type of common compositional structure underlying the primary nucleotide sequence closely correlated to function.

  9. FY05 LDRD Final Report Investigation of AAA+ protein machines that participate in DNA replication, recombination, and in response to DNA damage LDRD Project Tracking Code: 04-LW-049

    SciTech Connect

    Sawicka, D; de Carvalho-Kavanagh, M S; Barsky, D; Venclovas, C

    2006-12-04

    The AAA+ proteins are remarkable macromolecules that are able to self-assemble into nanoscale machines. These protein machines play critical roles in many cellular processes, including the processes that manage a cell's genetic material, but the mechanism at the molecular level has remained elusive. We applied computational molecular modeling, combined with advanced sequence analysis and available biochemical and genetic data, to structurally characterize eukaryotic AAA+ proteins and the protein machines they form. With these models we have examined intermolecular interactions in three-dimensions (3D), including both interactions between the components of the AAA+ complexes and the interactions of these protein machines with their partners. These computational studies have provided new insights into the molecular structure and the mechanism of action for AAA+ protein machines, thereby facilitating a deeper understanding of processes involved in DNA metabolism.

  10. Eukaryotic transcriptomics in silico: optimizing cDNA-AFLP efficiency.

    PubMed

    Stölting, Kai N; Gort, Gerrit; Wüst, Christian; Wilson, Anthony B

    2009-11-30

    Complementary-DNA based amplified fragment length polymorphism (cDNA-AFLP) is a commonly used tool for assessing the genetic regulation of traits through the correlation of trait expression with cDNA expression profiles. In spite of the frequent application of this method, studies on the optimization of the cDNA-AFLP assay design are rare and have typically been taxonomically restricted. Here, we model cDNA-AFLPs on all 92 eukaryotic species for which cDNA pools are currently available, using all combinations of eight restriction enzymes standard in cDNA-AFLP screens. In silico simulations reveal that cDNA pool coverage is largely determined by the choice of individual restriction enzymes and that, through the choice of optimal enzyme combinations, coverage can be increased from <40% to 75% without changing the underlying experimental design. We find evidence of phylogenetic signal in the coverage data, which is largely mediated by organismal GC content. There is nonetheless a high degree of consistency in cDNA pool coverage for particular enzyme combinations, indicating that our recommendations should be applicable to most eukaryotic systems. We also explore the relationship between the average observed fragment number per selective AFLP-PCR reaction and the size of the underlying cDNA pool, and show how AFLP experiments can be used to estimate the number of genes expressed in a target tissue. The insights gained from in silico screening of cDNA-AFLPs from a broad sampling of eukaryotes provide a set of guidelines that should help to substantially increase the efficiency of future cDNA-AFLP experiments in eukaryotes. In silico simulations also suggest a novel use of cDNA-AFLP screens to determine the number of transcripts expressed in a target tissue, an application that should be invaluable as next-generation sequencing technologies are adapted for differential display.
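
    The idea of in silico cDNA pool coverage can be sketched roughly as follows. This is not the authors' pipeline. The two enzymes (EcoRI and MseI), the scorable fragment size range, and the rule that a transcript counts as covered when two adjacent cut sites come from different enzymes are all assumptions made for illustration; the abstract does not name the eight enzymes it tested.

```python
import re

# Recognition sites of two well-known enzymes, chosen only as an example
SITES = {"EcoRI": "GAATTC", "MseI": "TTAA"}


def cut_positions(seq, site):
    """Start index of every occurrence of a recognition site (overlaps included)."""
    return [m.start() for m in re.finditer("(?=" + site + ")", seq)]


def has_two_enzyme_fragment(seq, site_a, site_b, min_len=50, max_len=500):
    """True if two adjacent cut sites belong to different enzymes and lie
    min_len..max_len bases apart (an assumed scorable size window)."""
    cuts = sorted(
        [(p, "a") for p in cut_positions(seq, site_a)]
        + [(p, "b") for p in cut_positions(seq, site_b)]
    )
    for (p1, e1), (p2, e2) in zip(cuts, cuts[1:]):
        if e1 != e2 and min_len <= p2 - p1 <= max_len:
            return True
    return False


def coverage(cdna_pool, site_a, site_b, **kwargs):
    """Fraction of transcripts in the pool that yield at least one such fragment."""
    hits = sum(has_two_enzyme_fragment(s, site_a, site_b, **kwargs) for s in cdna_pool)
    return hits / len(cdna_pool)


# Toy pool: one transcript with both sites 66 bases apart, one with neither
pool = ["GAATTC" + "C" * 60 + "TTAA", "C" * 100]
print(coverage(pool, SITES["EcoRI"], SITES["MseI"]))
```

    Repeating the coverage call over every enzyme pair and every species' cDNA pool would give a coverage table of the kind the abstract describes.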

  11. 5-Hydroxymethyluracil in the DNA of a Dinoflagellate

    PubMed Central

    Rae, Peter M. M.

    1973-01-01

    During the characterization of DNA from the dinoflagellate Gyrodinium cohnii, a large discrepancy was detected between the estimation of guanine + cytosine content from the buoyant density of the DNA in CsCl (56.1% G+C) and from the midpoint (Tm) of its hyperchromicity induced by a thermal gradient (35.6% G+C). Composition analyses of 32P-labeled nucleotides revealed an actual G+C content of 41.3%, and the presence of an unusual nucleotide amounting to about 37% of the expected thymidylate in unfractionated DNA, a feature that can explain the aberrant behavior of the DNA. The chromatographic properties of the unusual base and UV spectral analyses of the base and its corresponding nucleotide are consistent with its identification as hydroxymethyluracil. This base is not uniformly interspersed with thymine in the DNA. About 10% of Gyrodinium DNA is contributed by a fraction with low hydroxymethyluracil content, which behaves anomalously in Ag+-Cs2SO4 density gradients but not in CsCl. PMID:4515611

  12. DNA rearrangements located over 100 kb 5' of the Steel (Sl)-coding region in Steel-panda and Steel-contrasted mice deregulate Sl expression and cause female sterility by disrupting ovarian follicle development.

    PubMed

    Bedell, M A; Brannan, C I; Evans, E P; Copeland, N G; Jenkins, N A; Donovan, P J

    1995-02-15

    The Steel (Sl) locus is essential for the development of germ cells, hematopoietic cells, and melanocytes and encodes a growth factor (Mgf) that is the ligand for c-kit, a receptor tyrosine kinase encoded by the W locus. We have identified the molecular and germ cell defects in two mutant Sl alleles, Steel-panda (Slpan) and Steel-contrasted (Slcon), that cause sterility only in females. Unexpectedly, both mutant alleles are shown to contain DNA rearrangements, located > 100 kb 5' of Mgf-coding sequences, that lead to tissue-specific effects on Mgf mRNA expression. In Slpan embryos, decreased Mgf mRNA expression in the gonads causes a reduced number of primordial germ cells in both sexes. However, Mgf expression and spermatogenesis in the postnatal mutant testis is normal, and spermatogonial proliferation compensates for deficiencies in germ cell numbers. In Slpan and Slcon homozygous females, decreased Mgf mRNA expression causes sterility by affecting the initiation and maintenance of ovarian follicle development. Thus, regulated expression of Mgf is required for multiple stages of embryonic and postnatal germ cell development. Surprisingly, other areas of the Slcon female reproductive tract displayed ectopic expression of Mgf mRNA. We propose that the Slpan and Slcon rearrangements alter Mgf mRNA abundance through position effects on expression that act at a distance from the Sl gene.

  13. GC-Rich Extracellular DNA Induces Oxidative Stress, Double-Strand DNA Breaks, and DNA Damage Response in Human Adipose-Derived Mesenchymal Stem Cells

    PubMed Central

    Smirnova, Tatiana; Kameneva, Larisa; Porokhovnik, Lev; Speranskij, Anatolij; Ershova, Elizaveta; Stukalov, Sergey; Izevskaya, Vera; Veiko, Natalia

    2015-01-01

    Background. Cell free DNA (cfDNA) circulates throughout the bloodstream of both healthy people and patients with various diseases. CfDNA is substantially enriched in its GC-content as compared with human genomic DNA. Principal Findings. Exposure of haMSCs to GC-DNA induces short-term oxidative stress (determined with H2DCFH-DA) and results in both single- and double-strand DNA breaks (comet assay and γH2AX foci). As a result, the expression of repair genes (BRCA1 (RT-PCR), PCNA (FACS)) and antiapoptotic genes (BCL2 (RT-PCR and FACS), BCL2A1, BCL2L1, BIRC3, and BIRC2 (RT-PCR)) increases significantly in the cells. Under the action of GC-DNA the potential of mitochondria was increased. Here we show that GC-rich extracellular DNA stimulates adipocyte differentiation of human adipose-derived mesenchymal stem cells (haMSCs). Exposure to GC-DNA leads to an increase in the level of PPARG2 and LPL RNA (RT-PCR), in the level of fatty acid binding protein FABP4 (FACS analysis) and in the level of fat (Oil Red O). Conclusions. GC-rich fragments in the pool of cfDNA can potentially induce oxidative stress and DNA damage response and affect the direction of mesenchymal stem cell differentiation in human adipose-derived mesenchymal stem cells. Such a response may be one of the causes of obesity or osteoporosis. PMID:26273425

  14. Structural evolution of nrDNA ITS in Pinaceae and its phylogenetic implications.

    PubMed

    Kan, Xian-Zhao; Wang, Shan-Shan; Ding, Xin; Wang, Xiao-Quan

    2007-08-01

    Nuclear ribosomal DNA (nrDNA) has been considered an important tool for inferring phylogenetic relationships at many taxonomic levels. In comparison with its fast concerted evolution in angiosperms, nrDNA is characterized by slow concerted evolution and substantial ITS region length variation in gymnosperms, particularly in Pinaceae. Here we studied structure characteristics, including subrepeat composition, size, GC content and secondary structure, of nrDNA ITS regions of all Pinaceae genera. The results showed that the ITS regions of all taxa studied contained subrepeat units, ranging from 2 to 9 in number, and these units could be divided into two types, longer subrepeat (LSR) without the motif (5'-GGCCACCCTAGTC) and shorter subrepeat (SSR) with the motif. Phylogenetic analyses indicate that the homology of some SSRs still can be recognized, providing important information for the evolutionary history of nrDNA ITS and phylogeny of Pinaceae. In particular, the adjacent tandem SSRs are not more closely related to one another than they are to remote SSRs in some genera, which may imply that multiple structure variations such as recombination have occurred in the ITS1 region of these groups. This study also found that GC content in the ITS1 region is relevant to its sequence length and subrepeat number, and could provide some phylogenetic information, especially supporting the close relationships among Picea, Pinus, and Cathaya. Moreover, several characteristics of the secondary structure of Pinaceae ITS1 were found as follows: (1) the structure is dominated by several extended hairpins; (2) the configuration complexity is positively correlated with subrepeat number; (3) paired subrepeats often partially overlap at the conserved motif (5'-GGCCACCCTAGTC), and form a long stem, while other subrepeats fold onto themselves, leaving part of the conserved motif exposed in hairpin loops.

  15. What Advances Are Being Made in DNA Sequencing?

    MedlinePlus

    ... of DNA building blocks (nucleotides) in an individual's genetic code, called DNA sequencing, has advanced the study of ... a breakthrough that helped scientists determine the human genetic code, but it is time-consuming and expensive. The ...

  16. Nucleotide sequence of the LuxC gene and the upstream DNA from the bioluminescent system of Vibrio harveyi.

    PubMed Central

    Miyamoto, C M; Graham, A F; Meighen, E A

    1988-01-01

    The nucleotide sequence of the luxC gene (1431 bp) and the upstream DNA (1049 bp) of the luminescent bacterium Vibrio harveyi has been determined. The luxC gene can be translated into a polypeptide of 55 kDa in excellent agreement with the molecular mass of the reductase polypeptide required for synthesis of the aldehyde substrate for the bioluminescent reaction. Analyses of codon usage showed a high frequency (1.9%) of the isoleucine codon, AUA, in the luxC gene compared to that found in Escherichia coli genes (0.2%) and its absence in the luxA, B and D genes. The low G/C content of the luxC gene and upstream DNA (38-39%) compared to that found in the other lux genes of V. harveyi (45%) was primarily due to a stretch of 500 nucleotides with only a 24% G/C content, extending from 200 bp inside lux C to 300 bp upstream. Moreover, an open reading frame did not extend for more than 48 codons between the luxC gene and 600 bp upstream at which point a gene transcribed in the opposite direction started. As the lux system in the luminescent bacterium, V. fischeri, contains a regulatory gene immediately upstream of luxC transcribed in the same direction, these results show that the organization and regulation of the lux genes have diverged in different luminescent bacteria. PMID:3347497

  17. Intra-genomic GC heterogeneity in sauropsids: evolutionary insights from cDNA mapping and GC3 profiling in snake

    PubMed Central

    2012-01-01

    Background: Extant sauropsids (reptiles and birds) are divided into two major lineages, the lineage of Testudines (turtles) and Archosauria (crocodilians and birds) and the lineage of Lepidosauria (tuatara, lizards, worm lizards and snakes). Karyotypes of these sauropsidan groups generally consist of macrochromosomes and microchromosomes. In chicken, microchromosomes exhibit a higher GC-content than macrochromosomes. To examine the pattern of intra-genomic GC heterogeneity in lepidosaurian genomes, we constructed a cytogenetic map of the Japanese four-striped rat snake (Elaphe quadrivirgata) with 183 cDNA clones by fluorescence in situ hybridization, and examined the correlation between the GC-content of exonic third codon positions (GC3) of the genes and the size of chromosomes on which the genes were localized. Results: Although GC3 distribution of snake genes was relatively homogeneous compared with those of the other amniotes, microchromosomal genes showed significantly higher GC3 than macrochromosomal genes as in chicken. Our snake cytogenetic map also identified several conserved segments between the snake macrochromosomes and the chicken microchromosomes. Cross-species comparisons revealed that GC3 of most snake orthologs in such macrochromosomal segments were GC-poor (GC3 < 50%) whereas those of chicken orthologs in microchromosomes were relatively GC-rich (GC3 ≥ 50%). Conclusion: Our results suggest that the chromosome size-dependent GC heterogeneity had already occurred before the lepidosaur-archosaur split, 275 million years ago. This character was probably present in the common ancestor of lepidosaurs but lost in the lineage leading to Anolis during the diversification of lepidosaurs. We also identified several genes whose GC-content might have been influenced by the size of the chromosomes on which they were harbored over the course of sauropsid evolution. PMID:23140509
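
    GC3, the GC-content at third codon positions used throughout this abstract, is simple to compute for an in-frame coding sequence. The sketch below is a minimal illustration. The 50% cut-off between GC-poor and GC-rich follows the abstract; the function names and the example sequence are hypothetical.

```python
def gc3(cds):
    """GC fraction at third codon positions of an in-frame coding sequence.

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third = cds[2:usable:3]  # bases at indices 2, 5, 8, ...
    if not third:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third) / len(third)


def classify(cds, threshold=0.5):
    """Label a gene GC-rich (GC3 >= threshold) or GC-poor (GC3 < threshold)."""
    return "GC-rich" if gc3(cds) >= threshold else "GC-poor"


# Hypothetical three-codon example: third positions are G, C and A
example = "ATGGCCAAA"
print(gc3(example), classify(example))
```

    Comparing gc3 values between genes mapped to microchromosomes and genes mapped to macrochromosomes is the kind of contrast the study reports.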

  18. Error-correction coding

    NASA Technical Reports Server (NTRS)

    Hinds, Erold W. (Principal Investigator)

    1996-01-01

    This report describes the progress made towards the completion of a specific task on error-correcting coding. The proposed research consisted of investigating the use of modulation block codes as the inner code of a concatenated coding system in order to improve the overall space link communications performance. The study proposed to identify and analyze candidate codes that will complement the performance of the overall coding system which uses the interleaved RS (255,223) code as the outer code.

  19. Sensitive Detection of Polyalanine Expansions in PHOX2B by Polymerase Chain Reaction Using Bisulfite-Converted DNA

    PubMed Central

    Horiuchi, Hidekazu; Sasaki, Ayako; Osawa, Motoki; Kijima, Kazuki; Ino, Yukiko; Matoba, Ryoji; Hayasaka, Kiyoshi

    2005-01-01

    Congenital central hypoventilation syndrome, also known as Ondine’s curse, is characterized by idiopathic abnormal control of respiration during sleep. Recent studies indicate that a polyalanine expansion of PHOX2B is relevant to the pathogenesis of this disorder. However, it is difficult to detect the repeated tract because its high GC content inhibits conventional polymerase chain reaction (PCR) amplification. Here, we describe a bisulfite treatment for DNA in which uracil is obtained by deamination of unmethylated cytosine residues. Deamination of DNA permitted direct PCR amplification that yielded a product of 123 bp for the common 20-residue repetitive tract with replacement of C with T by sequencing. It settled allele dropouts accompanied by insufficient amplification of expanded alleles. The defined procedure dramatically improved detection of expansions to 9 of 10 congenital central hypoventilation syndrome patients examined in a previous study. The chemical conversion of DNA before PCR amplification facilitates effective detection of GC-rich polyalanine tracts. PMID:16258163
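
    Why bisulfite conversion helps here can be seen in a minimal sketch: every unmethylated C reads as T after conversion and PCR, so a polyalanine tract (alanine codons are GCN) loses much of its GC content. The tract below is a made-up 20-alanine sequence, not the actual PHOX2B repeat, and the function names are hypothetical.

```python
def bisulfite_convert(seq, methylated=()):
    """Simulate bisulfite treatment followed by PCR on one strand.

    Unmethylated C reads as T; positions listed in `methylated`
    (0-based indices) are protected and stay C.
    """
    protected = set(methylated)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical tract: GCG GCC GCA GCT repeated five times = 20 alanine codons
tract = "GCGGCCGCAGCT" * 5
converted = bisulfite_convert(tract)
print(gc_content(tract), gc_content(converted))
```

    In this toy tract the GC fraction halves after conversion, which is consistent with the abstract's point that the converted template is easier to amplify.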

  20. An integrated, structure- and energy-based view of the genetic code

    PubMed Central

    Grosjean, Henri; Westhof, Eric

    2016-01-01

    The principles of mRNA decoding are conserved among all extant life forms. We present an integrative view of all the interaction networks between mRNA, tRNA and rRNA: the intrinsic stability of codon–anticodon duplex, the conformation of the anticodon hairpin, the presence of modified nucleotides, the occurrence of non-Watson–Crick pairs in the codon–anticodon helix and the interactions with bases of rRNA at the A-site decoding site. We derive a more information-rich, alternative representation of the genetic code, that is circular with an unsymmetrical distribution of codons leading to a clear segregation between GC-rich 4-codon boxes and AU-rich 2:2-codon and 3:1-codon boxes. All tRNA sequence variations can be visualized, within an internal structural and energy framework, for each organism, and each anticodon of the sense codons. The multiplicity and complexity of nucleotide modifications at positions 34 and 37 of the anticodon loop segregate meaningfully, and correlate well with the necessity to stabilize AU-rich codon–anticodon pairs and to avoid miscoding in split codon boxes. The evolution and expansion of the genetic code is viewed as being originally based on GC content with progressive introduction of A/U together with tRNA modifications. The representation we present should help the engineering of the genetic code to include non-natural amino acids. PMID:27448410

  1. Chilean Pitavia more closely related to Oceania and Old World Rutaceae than to Neotropical groups: evidence from two cpDNA non-coding regions, with a new subfamilial classification of the family

    PubMed Central

    Groppo, Milton; Kallunki, Jacquelyn A.; Pirani, José Rubens; Antonelli, Alexandre

    2012-01-01

    The position of the plant genus Pitavia within an infrafamilial phylogeny of Rutaceae (rue, or orange family) was investigated with the use of two non-coding regions from cpDNA, the trnL-trnF region and the rps16 intron. The only species of the genus, Pitavia punctata Molina, is restricted to the temperate forests of the Coastal Cordillera of Central-Southern Chile and threatened by loss of habitat. The genus traditionally has been treated as part of tribe Zanthoxyleae (subfamily Rutoideae) where it constitutes the monogeneric tribe Pitaviinae. This tribe and genus are characterized by fruits of 1 to 4 fleshy drupelets, unlike the dehiscent fruits typical of the subfamily. Fifty-five taxa of Rutaceae, representing 53 genera (nearly one-third of those in the family) and all subfamilies, tribes, and almost all subtribes of the family were included. Parsimony and Bayesian inference were used to infer the phylogeny; six taxa of Meliaceae, Sapindaceae, and Simaroubaceae, all members of Sapindales, were also used as out-groups. Results from both analyses were congruent and showed Pitavia as sister to Flindersia and Lunasia, both genera with species scattered through Australia, Philippines, Moluccas, New Guinea and the Malayan region, and phylogenetically far from other Neotropical Rutaceae, such as the Galipeinae (Galipeeae, Rutoideae) and Pteleinae (Toddalieae, former Toddalioideae). Additionally, a new circumscription of the subfamilies of Rutaceae is presented and discussed. Only two subfamilies (both monophyletic) are recognized: Cneoroideae (including Dictyolomatoideae, Spathelioideae, Cneoraceae, and Ptaeroxylaceae) and Rutoideae (including not only traditional Rutoideae but also Aurantioideae, Flindersioideae, and Toddalioideae). As a consequence, Aurantioideae (Citrus and allies) is reduced to tribal rank as Aurantieae. PMID:23717188

  2. TU-EF-304-10: Efficient Multiscale Simulation of the Proton Relative Biological Effectiveness (RBE) for DNA Double Strand Break (DSB) Induction and Bio-Effective Dose in the FLUKA Monte Carlo Radiation Transport Code

    SciTech Connect

    Moskvin, V; Tsiamas, P; Axente, M; Farr, J; Stewart, R

    2015-06-15

    Purpose: One of the more critical initiating events for reproductive cell death is the creation of a DNA double strand break (DSB). In this study, we present a computationally efficient way to determine spatial variations in the relative biological effectiveness (RBE) of proton therapy beams within the FLUKA Monte Carlo (MC) code. Methods: We used the independently tested Monte Carlo Damage Simulation (MCDS) developed by Stewart and colleagues (Radiat. Res. 176, 587–602, 2011) to estimate the RBE for DSB induction of monoenergetic protons, tritium, deuterium, helium-3, helium-4 ions and delta-electrons. The dose-weighted (RBE) coefficients were incorporated into FLUKA to determine the equivalent 60Co γ-ray dose for representative proton beams incident on cells in an aerobic and anoxic environment. Results: We found that the proton beam RBE for DSB induction at the tip of the Bragg peak, including primary and secondary particles, is close to 1.2. Furthermore, the RBE increases laterally to the beam axis at the area of Bragg peak. At the distal edge, the RBE is in the range from 1.3–1.4 for cells irradiated under aerobic conditions and may be as large as 1.5–1.8 for cells irradiated under anoxic conditions. Across the plateau region, the recorded RBE for DSB induction is 1.02 for aerobic cells and 1.05 for cells irradiated under anoxic conditions. The contribution to total effective dose from secondary heavy ions decreases with depth and is higher at shallow depths (e.g., at the surface of the skin). Conclusion: Multiscale simulation of the RBE for DSB induction provides useful insights into spatial variations in proton RBE within pristine Bragg peaks. This methodology is potentially useful for the biological optimization of proton therapy for the treatment of cancer. The study highlights the need to incorporate spatial variations in proton RBE into proton therapy treatment plans.

  3. Determination of DNA Content of Aquatic Bacteria by Flow Cytometry

    PubMed Central

    Button, D. K.; Robertson, Betsy R.

    2001-01-01

    The distribution of DNA among bacterioplankton and bacterial isolates was determined by flow cytometry of DAPI (4′,6′-diamidino-2-phenylindole)-stained organisms. Conditions were optimized to minimize error from nonspecific staining, AT bias, DNA packing, changes in ionic strength, and differences in cell permeability. The sensitivity was sufficient to characterize the small 1- to 2-Mb-genome organisms in freshwater and seawater, as well as low-DNA cells (“dims”). The dims could be formed from laboratory cultivars; their apparent DNA content was 0.1 Mb and similar to that of many particles in seawater. Preservation with formaldehyde stabilized samples until analysis. Further permeabilization with Triton X-100 facilitated the penetration of stain into stain-resistant lithotrophs. The amount of DNA per cell determined by flow cytometry agreed with mean values obtained from spectrophotometric analyses of cultures. Correction for the DNA AT bias of the stain was made for bacterial isolates with known G+C contents. The number of chromosome copies per cell was determined with pure cultures, which allowed growth rate analyses based on cell cycle theory. The chromosome ratio was empirically related to the rate of growth, and the rate of growth was related to nutrient concentration through specific affinity theory to obtain a probe for nutrient kinetics. The chromosome size of a Marinobacter arcticus isolate was determined to be 3.0 Mb by this method. In a typical seawater sample the distribution of bacterial DNA revealed two major populations based on DNA content that were not necessarily similar to populations determined by using other stains or protocols. A mean value of 2.5 fg of DNA cell−1 was obtained for a typical seawater sample, and 90% of the population contained more than 1.1 fg of DNA cell−1. PMID:11282616

  4. Homological stabilizer codes

    SciTech Connect

    Anderson, Jonas T.

    2013-03-15

    In this paper we define homological stabilizer codes on qubits which encompass codes such as Kitaev's toric code and the topological color codes. These codes are defined solely by the graphs they reside on. This feature allows us to use properties of topological graph theory to determine the graphs which are suitable as homological stabilizer codes. We then show that all toric codes are equivalent to homological stabilizer codes on 4-valent graphs. We show that the topological color codes and toric codes correspond to two distinct classes of graphs. We define the notion of label set equivalencies and show that under a small set of constraints the only homological stabilizer codes without local logical operators are equivalent to Kitaev's toric code or to the topological color codes. - Highlights: ► We show that Kitaev's toric codes are equivalent to homological stabilizer codes on 4-valent graphs. ► We show that toric codes and color codes correspond to homological stabilizer codes on distinct graphs. ► We find and classify all 2D homological stabilizer codes. ► We find optimal codes among the homological stabilizer codes.

  5. Statistical properties of DNA sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.

  7. Model Children's Code.

    ERIC Educational Resources Information Center

    New Mexico Univ., Albuquerque. American Indian Law Center.

    The Model Children's Code was developed to provide a legally correct model code that American Indian tribes can use to enact children's codes that fulfill their legal, cultural and economic needs. Code sections cover the court system, jurisdiction, juvenile offender procedures, minor-in-need-of-care, and termination. Almost every Code section is…

  8. Coding of Neuroinfectious Diseases.

    PubMed

    Barkley, Gregory L

    2015-12-01

    Accurate coding is an important function of neurologic practice. This contribution to Continuum is part of an ongoing series that presents helpful coding information along with examples related to the issue topic. Tips for diagnosis coding, Evaluation and Management coding, procedure coding, or a combination are presented, depending on which is most applicable to the subject area of the issue.

  9. Diagnostic Coding for Epilepsy.

    PubMed

    Williams, Korwyn; Nuwer, Marc R; Buchhalter, Jeffrey R

    2016-02-01

    Accurate coding is an important function of neurologic practice. This contribution to Continuum is part of an ongoing series that presents helpful coding information along with examples related to the issue topic. Tips for diagnosis coding, Evaluation and Management coding, procedure coding, or a combination are presented, depending on which is most applicable to the subject area of the issue.

  10. Phylogeny of genetic codes and punctuation codes within genetic codes.

    PubMed

    Seligmann, Hervé

    2015-03-01

    Punctuation codons (starts, stops) delimit genes, reflect translation apparatus properties. Most codon reassignments involve punctuation. Here two complementary approaches classify natural genetic codes: (A) properties of amino acids assigned to codons (classical phylogeny), coding stops as X (A1, antitermination/suppressor tRNAs insert unknown residues), or as gaps (A2, no translation, classical stop); and (B) considering only punctuation status (start, stop and other codons coded as -1, 0 and 1 (B1); 0, -1 and 1 (B2, reflects ribosomal translational dynamics); and 1, -1, and 0 (B3, starts/stops as opposites)). All methods separate most mitochondrial codes from most nuclear codes; Gracilibacteria consistently cluster with metazoan mitochondria; mitochondria co-hosted with chloroplasts cluster with nuclear codes. Method A1 clusters the euplotid nuclear code with metazoan mitochondria; A2 separates euplotids from mitochondria. Firmicute bacteria Mycoplasma/Spiroplasma and Protozoan (and lower metazoan) mitochondria share codon-amino acid assignments. A1 clusters them with mitochondria, they cluster with the standard genetic code under A2: constraints on amino acid ambiguity versus punctuation-signaling produced the mitochondrial versus bacterial versions of this genetic code. Punctuation analysis B2 converges best with classical phylogenetic analyses, stressing the need for a unified theory of genetic code punctuation accounting for ribosomal constraints.

  11. Direct Sequencing from the Minimal Number of DNA Molecules Needed to Fill a 454 Picotiterplate

    PubMed Central

    Martínez-Priego, Llúcia; D’Auria, Giussepe; Calafell, Francesc; Moya, Andrés

    2014-01-01

    The large amount of DNA needed to prepare a library in next generation sequencing protocols hinders direct sequencing of small DNA samples. This limitation is usually overcome by the enrichment of such samples with whole genome amplification (WGA), mostly by multiple displacement amplification (MDA) based on φ29 polymerase. However, this technique can be biased by the GC content of the sample and is prone to the development of chimeras as well as contamination during enrichment, which contributes to undesired noise during sequence data analysis, and also hampers the proper functional and/or taxonomic assignments. An alternative to MDA is direct DNA sequencing (DS), which represents the theoretical gold standard in genome sequencing. In this work, we explore the possibility of sequencing the genome of Escherichia coli from the minimum number of DNA molecules required for pyrosequencing, according to the notion of one-bead-one-molecule. Using an optimized protocol for DS, we constructed a shotgun library containing the minimum number of DNA molecules needed to fill a selected region of a picotiterplate. We gathered most of the reference genome extension with uniform coverage. We compared the DS method with MDA applied to the same amount of starting DNA. As expected, MDA yielded a sparse and biased read distribution, with a very high amount of unassigned and unspecific DNA amplifications. The optimized DS protocol allows unbiased sequencing to be performed from samples with a very small amount of DNA. PMID:24887077

  12. Genome-wide profiling of yeast DNA:RNA hybrid prone sites with DRIP-chip.

    PubMed

    Chan, Yujia A; Aristizabal, Maria J; Lu, Phoebe Y T; Luo, Zongli; Hamza, Akil; Kobor, Michael S; Stirling, Peter C; Hieter, Philip

    2014-04-01

    DNA:RNA hybrid formation is emerging as a significant cause of genome instability in biological systems ranging from bacteria to mammals. Here we describe the genome-wide distribution of DNA:RNA hybrid prone loci in Saccharomyces cerevisiae by DNA:RNA immunoprecipitation (DRIP) followed by hybridization on tiling microarray. These profiles show that DNA:RNA hybrids preferentially accumulated at rDNA, Ty1 and Ty2 transposons, telomeric repeat regions and a subset of open reading frames (ORFs). The latter are generally highly transcribed and have high GC content. Interestingly, significant DNA:RNA hybrid enrichment was also detected at genes associated with antisense transcripts. The expression of antisense-associated genes was also significantly altered upon overexpression of RNase H, which degrades the RNA in hybrids. Finally, we uncover mutant-specific differences in the DRIP profiles of a Sen1 helicase mutant, RNase H deletion mutant and Hpr1 THO complex mutant compared to wild type, suggesting different roles for these proteins in DNA:RNA hybrid biology. Our profiles of DNA:RNA hybrid prone loci provide a resource for understanding the properties of hybrid-forming regions in vivo, extend our knowledge of hybrid-mitigating enzymes, and contribute to models of antisense-mediated gene regulation. A summary of this paper was presented at the 26th International Conference on Yeast Genetics and Molecular Biology, August 2013.

  13. Assembly of large, high G+C bacterial DNA fragments in yeast.

    PubMed

    Noskov, Vladimir N; Karas, Bogumil J; Young, Lei; Chuang, Ray-Yuan; Gibson, Daniel G; Lin, Ying-Chi; Stam, Jason; Yonemoto, Isaac T; Suzuki, Yo; Andrews-Pfannkoch, Cynthia; Glass, John I; Smith, Hamilton O; Hutchison, Clyde A; Venter, J Craig; Weyman, Philip D

    2012-07-20

    The ability to assemble large pieces of prokaryotic DNA by yeast recombination has great application in synthetic biology, but cloning large pieces of high G+C prokaryotic DNA in yeast can be challenging. Additional considerations in cloning large pieces of high G+C DNA in yeast may be related to toxic genes, to the size of the DNA, or to the absence of yeast origins of replication within the sequence. As an example of our ability to clone high G+C DNA in yeast, we chose to work with Synechococcus elongatus PCC 7942, which has an average G+C content of 55%. We determined that no regions of the chromosome are toxic to yeast and that S. elongatus DNA fragments over ~200 kb are not stably maintained. DNA constructs with a total size under 200 kb could be readily assembled, even with 62 kb of overlapping sequence between pieces. Addition of yeast origins of replication throughout allowed us to increase the total size of DNA that could be assembled to at least 454 kb. Thus, cloning strategies utilizing yeast recombination with large, high G+C prokaryotic sequences should include yeast origins of replication as a part of the design process.

  14. To Code or Not To Code?

    ERIC Educational Resources Information Center

    Parkinson, Brian; Sandhu, Parveen; Lacorte, Manel; Gourlay, Lesley

    1998-01-01

    This article considers arguments for and against the use of coding systems in classroom-based language research and touches on some relevant considerations from ethnographic and conversational analysis approaches. The four authors each explain and elaborate on their practical decision to code or not to code events or utterances at a specific point…

  15. Biological basis of miRNA action when their targets are located in human protein coding region.

    PubMed

    Gu, Wanjun; Wang, Xiaofei; Zhai, Chuanying; Zhou, Tong; Xie, Xueying

    2013-01-01

    Recent analyses have revealed many functional microRNA (miRNA) targets in mammalian protein coding regions. However, the mechanisms that ensure miRNA function when their target sites are located in protein coding regions of mammalian mRNA transcripts are largely unknown. In this paper, we investigate some potential biological factors, such as target site accessibility and local translation efficiency. We computationally analyze these two factors using experimentally identified miRNA targets in human protein coding regions. We find that site accessibility is significantly increased in the miRNA target region to facilitate miRNA binding. At the same time, local translation efficiency is selectively decreased near the miRNA target region. GC-poor codons are preferred in the flanking region of miRNA target sites to ease access to the targets. Within-genome analysis shows substantial variation in site accessibility and local translation efficiency among different miRNA targets in the genome. Further analyses suggest that the target gene's GC content and conservation level could explain some of the differences in site accessibility. On the other hand, the target gene's functional importance and conservation level can affect local translation efficiency near the miRNA target region. We hence propose that both site accessibility and local translation efficiency are important in miRNA action when miRNA target sites are located in mammalian protein coding regions.

  16. Evaluation of a transposase protocol for rapid generation of shotgun high-throughput sequencing libraries from nanogram quantities of DNA.

    PubMed

    Marine, Rachel; Polson, Shawn W; Ravel, Jacques; Hatfull, Graham; Russell, Daniel; Sullivan, Matthew; Syed, Fraz; Dumas, Michael; Wommack, K Eric

    2011-11-01

    Construction of DNA fragment libraries for next-generation sequencing can prove challenging, especially for samples with low DNA yield. Protocols devised to circumvent the problems associated with low starting quantities of DNA can result in amplification biases that skew the distribution of genomes in metagenomic data. Moreover, sample throughput can be slow, as current library construction techniques are time-consuming. This study evaluated Nextera, a new transposon-based method that is designed for quick production of DNA fragment libraries from a small quantity of DNA. The sequence read distribution across nine phage genomes in a mock viral assemblage met predictions for six of the least-abundant phages; however, the rank order of the most abundant phages differed slightly from predictions. De novo genome assemblies from Nextera libraries provided long contigs spanning over half of the phage genome; in four cases where full-length genome sequences were available for comparison, consensus sequences were found to match over 99% of the genome with near-perfect identity. Analysis of areas of low and high sequence coverage within phage genomes indicated that GC content may influence coverage of sequences from Nextera libraries. Comparisons of phage genomes prepared using both Nextera and a standard 454 FLX Titanium library preparation protocol suggested that the coverage biases according to GC content observed within the Nextera libraries were largely attributable to bias in the Nextera protocol rather than to the 454 sequencing technology. Nevertheless, given suitable sequence coverage, the Nextera protocol produced high-quality data for genomic studies. For metagenomics analyses, effects of GC amplification bias would need to be considered; however, the library preparation standardization that Nextera provides should benefit comparative metagenomic analyses.

  17. Bare Code Reader

    NASA Astrophysics Data System (ADS)

    Clair, Jean J.

    1980-05-01

    The bar code system will be used in every market and supermarket. The code, which is standardised in the US and Europe (EAN code), gives information on price, storage and nature, and allows real-time management of the shop.

  18. Generalized concatenated quantum codes

    SciTech Connect

    Grassl, Markus; Shor, Peter; Smith, Graeme; Smolin, John; Zeng Bei

    2009-05-15

    We discuss the concept of generalized concatenated quantum codes. This generalized concatenation method provides a systematic way of constructing good quantum codes, both stabilizer codes and nonadditive codes. Using this method, we construct families of single-error-correcting nonadditive quantum codes, in both binary and nonbinary cases, which not only outperform any stabilizer codes for finite block length but also asymptotically meet the quantum Hamming bound for large block length.

  19. DNA Polymerases of Low-GC Gram-Positive Eubacteria: Identification of the Replication-Specific Enzyme Encoded by dnaE

    PubMed Central

    Barnes, Marjorie H.; Miller, Shelley D.; Brown, Neal C.

    2002-01-01

    dnaE, the gene encoding one of the two replication-specific DNA polymerases (Pols) of low-GC-content gram-positive bacteria (E. Dervyn et al., Science 294:1716-1719, 2001; R. Inoue et al., Mol. Genet. Genomics 266:564-571, 2001), was cloned from Bacillus subtilis, a model low-GC gram-positive organism. The gene was overexpressed in Escherichia coli. The purified recombinant product displayed inhibitor responses and physical, catalytic, and antigenic properties indistinguishable from those of the low-GC gram-positive-organism-specific enzyme previously named DNA Pol II after the polB-encoded DNA Pol II of E. coli. Whereas a polB-like gene is absent from low-GC gram-positive genomes and whereas the low-GC gram-positive DNA Pol II strongly conserves a dnaE-like, Pol III primary structure, it is proposed that it be renamed DNA polymerase III E (Pol III E) to accurately reflect its replicative function and its origin from dnaE. It is also proposed that DNA Pol III, the other replication-specific Pol of low-GC gram-positive organisms, be renamed DNA polymerase III C (Pol III C) to denote its origin from polC. By this revised nomenclature, the DNA Pols that are expressed constitutively in low-GC gram-positive bacteria would include DNA Pol I, the dispensable repair enzyme encoded by polA, and the two essential, replication-specific enzymes Pol III C and Pol III E, encoded, respectively, by polC and dnaE. PMID:12081953

  20. Genome size and metabolic intensity in tetrapods: a tale of two lines.

    PubMed

    Vinogradov, Alexander E; Anatskaya, Olga V

    2006-01-07

    We show the negative link between genome size and metabolic intensity in tetrapods, using the heart index (relative heart mass) as a unified indicator of metabolic intensity in poikilothermal and homeothermal animals. We found two separate regression lines of heart index on genome size for reptiles-birds and amphibians-mammals (the slope of regression is steeper in reptiles-birds). We also show a negative correlation between GC content and nucleosome formation potential in vertebrate DNA, and, consistent with this relationship, a positive correlation between genome GC content and nuclear size (independent of genome size). It is known that there are two separate regression lines of genome GC content on genome size for reptiles-birds and amphibians-mammals: reptiles-birds have the relatively higher GC content (for their genome sizes) compared to amphibians-mammals. Our results suggest uniting all these data into one concept. The slope of negative regression between GC content and nucleosome formation potential is steeper in exons than in non-coding DNA (where nucleosome formation potential is generally higher), which indicates a special role of non-coding DNA for orderly chromatin organization. The chromatin condensation and nuclear size are supposed to be key parameters that accommodate the effects of both genome size and GC content and connect them with metabolic intensity. Our data suggest that the reptilian-birds clade evolved special relationships among these parameters, whereas mammals preserved the amphibian-like relationships. Surprisingly, mammals, although acquiring a more complex general organization, seem to retain certain genome-related properties that are similar to amphibians. At the same time, the slope of regression between nucleosome formation potential and GC content is steeper in poikilothermal than in homeothermal genomes, which suggests that mammals and birds acquired certain common features of genomic organization.

  1. Pliable DNA Conformation of Response Elements Bound to Transcription Factor p63

    SciTech Connect

    Chen, Chen; Gorlatova, Natalia; Herzberg, Osnat

    2012-05-02

    We show that changes in the nucleotide sequence alter the DNA conformation in the crystal structures of p63 DNA-binding domain (p63DBD) bound to its response element. The conformation of a 22-bp canonical response element containing an AT spacer between the two half-sites is unaltered compared with that containing a TA spacer, exhibiting superhelical trajectory. In contrast, a GC spacer abolishes the DNA superhelical trajectory and exhibits less bent DNA, suggesting that increased GC content accompanies increased double helix rigidity. A 19-bp DNA, representing an AT-rich response element with overlapping half-sites, maintains superhelical trajectory and reveals two interacting p63DBD dimers crossing one another at 120°. p63DBD binding assays to response elements of increasing length complement the structural studies. We propose that DNA deformation may affect promoter activity, that the ability of p63DBD to bind to superhelical DNA suggests that it is capable of binding to nucleosomes, and that overlapping response elements may provide a mechanism to distinguish between p63 and p53 promoters.

  2. Accumulate repeat accumulate codes

    NASA Technical Reports Server (NTRS)

    Abbasfar, Aliazam; Divsalar, Dariush; Yao, Kung

    2004-01-01

    In this paper we propose an innovative channel coding scheme called 'Accumulate Repeat Accumulate codes' (ARA). This class of codes can be viewed as serial turbo-like codes, or as a subclass of Low Density Parity Check (LDPC) codes, thus belief propagation can be used for iterative decoding of ARA codes on a graph. The structure of the encoder for this class can be viewed as a precoded Repeat Accumulate (RA) code or as a precoded Irregular Repeat Accumulate (IRA) code, where simply an accumulator is chosen as a precoder. Thus ARA codes have a simple and very fast encoder structure when they represent LDPC codes. Based on density evolution for LDPC codes through some examples for ARA codes, we show that for maximum variable node degree 5 a minimum bit SNR as low as 0.08 dB from channel capacity for rate 1/2 can be achieved as the block size goes to infinity. Thus, based on a fixed low maximum variable node degree, its threshold outperforms not only the RA and IRA codes but also the best known LDPC codes with the same maximum node degree. Furthermore, by puncturing the accumulators any desired high rate codes close to code rate 1 can be obtained with thresholds that stay close to the channel capacity thresholds uniformly. Iterative decoding simulation results are provided. The ARA codes also have a projected graph or protograph representation that allows for high speed decoder implementation.
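
    The "accumulator as precoder in front of a repeat-accumulate code" idea can be illustrated with a toy encoder. This is a deliberately simplified sketch and not the construction from the paper: it precodes every information bit, uses a seeded pseudo-random permutation as a stand-in interleaver, and does no puncturing. All function names are hypothetical.

```python
import random

def accumulate(bits):
    """Running mod-2 sum of the input bits (a 1/(1+D) accumulator)."""
    out, state = [], 0
    for b in bits:
        state ^= b
        out.append(state)
    return out

def repeat(bits, q):
    """Repeat every bit q times in place."""
    return [b for b in bits for _ in range(q)]

def interleave(bits, seed=0):
    """Permute bits with a fixed, seeded pseudo-random order (assumption)."""
    order = list(range(len(bits)))
    random.Random(seed).shuffle(order)
    return [bits[i] for i in order]

def ara_encode(info_bits, q=3, seed=0):
    """Toy accumulate -> repeat -> interleave -> accumulate chain."""
    precoded = accumulate(info_bits)
    repeated = repeat(precoded, q)
    permuted = interleave(repeated, seed)
    return accumulate(permuted)

if __name__ == "__main__":
    print(ara_encode([1, 0, 1, 1], q=3))
```

    Every stage is linear over GF(2), so the toy encoder maps the XOR of two messages to the XOR of their codewords, which is the property a linear block code needs.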

  3. Coset Codes Viewed as Terminated Convolutional Codes

    NASA Technical Reports Server (NTRS)

    Fossorier, Marc P. C.; Lin, Shu

    1996-01-01

    In this paper, coset codes are considered as terminated convolutional codes. Based on this approach, three new general results are presented. First, it is shown that the iterative squaring construction can equivalently be defined from a convolutional code whose trellis terminates. This convolutional code determines a simple encoder for the coset code considered, and the state and branch labelings of the associated trellis diagram become straightforward. Also, from the generator matrix of the code in its convolutional code form, much information about the trade-off between the state connectivity and complexity at each section, and the parallel structure of the trellis, is directly available. Based on this generator matrix, it is shown that the parallel branches in the trellis diagram of the convolutional code represent the same coset code C_1, of smaller dimension and shorter length. Utilizing this fact, a two-stage optimum trellis decoding method is devised. The first stage decodes C_1, while the second stage decodes the associated convolutional code, using the branch metrics delivered by stage 1. Finally, a bidirectional decoding of each received block starting at both ends is presented. If about the same number of computations is required, this approach remains very attractive from a practical point of view as it roughly doubles the decoding speed. This fact is particularly interesting whenever the second half of the trellis is the mirror image of the first half, since the same decoder can be implemented for both parts.

  4. Concatenated Coding Using Trellis-Coded Modulation

    NASA Technical Reports Server (NTRS)

    Thompson, Michael W.

    1997-01-01

    In the late seventies and early eighties a technique known as Trellis Coded Modulation (TCM) was developed for providing spectrally efficient error correction coding. Instead of adding redundant information in the form of parity bits, redundancy is added at the modulation stage thereby increasing bandwidth efficiency. A digital communications system can be designed to use bandwidth-efficient multilevel/phase modulation such as Amplitude Shift Keying (ASK), Phase Shift Keying (PSK), Differential Phase Shift Keying (DPSK) or Quadrature Amplitude Modulation (QAM). Performance gain can be achieved by increasing the number of signals over the corresponding uncoded system to compensate for the redundancy introduced by the code. A considerable amount of research and development has been devoted toward developing good TCM codes for severely bandlimited applications. More recently, the use of TCM for satellite and deep space communications applications has received increased attention. This report describes the general approach of using a concatenated coding scheme that features TCM and RS coding. Results have indicated that substantial (6-10 dB) performance gains can be achieved with this approach with comparatively little bandwidth expansion. Since all of the bandwidth expansion is due to the RS code, we see that TCM-based concatenated coding results in roughly 10-50% bandwidth expansion compared to 70-150% expansion for similar concatenated schemes that use convolutional codes. We stress that combined coding and modulation optimization is important for achieving performance gains while maintaining spectral efficiency.

  5. Simple and efficient method for isolating cDNA fragments of lea3 genes with potential for wide application in the grasses (Poaceae).

    PubMed

    Yu, L; Wu, X; Tang, X; Yan, B

    2010-07-06

    cDNA fragments of lea3 genes with a high GC content (from 68 to 77%) were found in several Poaceae, including Sorghum vulgare, Saccharum officinarum, Oryza officinalis, Oryza meyeriana, Ampelocalamus calcareus, Cynodon dactylon, and Zizania latifolia. They were successfully isolated by means of optimal experimental parameters, which included dimethyl sulfoxide as additive and degenerate primers "AGETKAS" and "AGKDKTG", and their sequences were analyzed. Compared to the method of isolating genes by screening of a cDNA library using abscisic acid- and other stress-responsive cDNA clones, which is time-consuming and costly, this method is relatively easy and inexpensive. Using this new method, many new homologous lea3 genes were rapidly determined.

  6. Discussion on LDPC Codes and Uplink Coding

    NASA Technical Reports Server (NTRS)

    Andrews, Ken; Divsalar, Dariush; Dolinar, Sam; Moision, Bruce; Hamkins, Jon; Pollara, Fabrizio

    2007-01-01

    This slide presentation reviews the progress of the workgroup on Low-Density Parity-Check (LDPC) codes for space link coding. The workgroup is tasked with developing and recommending new error correcting codes for near-Earth, Lunar, and deep space applications. Included in the presentation is a summary of the technical progress of the workgroup. Charts that show the LDPC decoder sensitivity to symbol scaling errors are reviewed, as well as a chart showing the performance of several frame synchronizer algorithms compared to that of some good codes and LDPC decoder tests at ESTL. Also reviewed is a study on Coding, Modulation, and Link Protocol (CMLP), and the recommended codes. A design for the Pseudo-Randomizer with LDPC Decoder and CRC is also reviewed. A chart that summarizes the three proposed coding systems is also presented.

  8. Bar Codes for Libraries.

    ERIC Educational Resources Information Center

    Rahn, Erwin

    1984-01-01

    Discusses the evolution of standards for bar codes (series of printed lines and spaces that represent numbers, symbols, and/or letters of alphabet) and describes the two types most frequently adopted by libraries--Code-A-Bar and CODE 39. Format of the codes is illustrated. Six references and definitions of terminology are appended. (EJS)

  9. Manually operated coded switch

    DOEpatents

    Barnette, Jon H.

    1978-01-01

    The disclosure relates to a manually operated recodable coded switch in which a code may be inserted, tried and used to actuate a lever controlling an external device. After attempting a code, the switch's code wheels must be returned to their zero positions before another try is made.

  10. DNA structure and function.

    PubMed

    Travers, Andrew; Muskhelishvili, Georgi

    2015-06-01

    The proposal of a double-helical structure for DNA over 60 years ago provided an eminently satisfying explanation for the heritability of genetic information. But why is DNA, and not RNA, now the dominant biological information store? We argue that, in addition to its coding function, the ability of DNA, unlike RNA, to adopt a B-DNA structure confers advantages both for information accessibility and for packaging. The information encoded by DNA is both digital - the precise base specifying, for example, amino acid sequences - and analogue. The latter determines the sequence-dependent physicochemical properties of DNA, for example, its stiffness and susceptibility to strand separation. Most importantly, DNA chirality enables the formation of supercoiling under torsional stress. We review recent evidence suggesting that DNA supercoiling, particularly that generated by DNA translocases, is a major driver of gene regulation and patterns of chromosomal gene organization, and in its guise as a promoter of DNA packaging enables DNA to act as an energy store to facilitate the passage of translocating enzymes such as RNA polymerase.

  11. Improving the PCR protocol to amplify a repetitive DNA sequence.

    PubMed

    Riet, J; Ramos, L R V; Lewis, R V; Marins, L F

    2017-09-21

    Although PCR-based techniques have become an essential tool in the field of molecular and genetic research, the amplification of repetitive DNA sequences is limited. This is due to the truncated nature of the amplified sequences, which are also prone to errors during DNA polymerase-based amplification. The complex structure of repetitive DNA can form hairpin loops, which promote dissociation of the polymerase from the template, impairing complete amplification, and leading to the formation of incomplete fragments that serve as megaprimers. These megaprimers anneal with other sequences, generating unexpected fragments in each PCR cycle. Our gene model, MaSp1, is 1037-bp long, with 68% GC content, and its amino acid sequence is characterized by poly-alanine-glycine motifs, which represent the repetitive codon consensus. We describe the amplification of the MaSp1 gene through minor changes in the PCR program. The results show that a denaturation temperature of 98°C is the key determinant in the amplification of the MaSp1 partial gene sequence.

  12. Comparative DNA analysis of three South American marsupials.

    PubMed Central

    Heguy, A; Musto, H; Wettstein, R

    1982-01-01

    Published information on marsupial DNA is limited to a group of species belonging to only one genus. No previous reports have been written on South American species. In this paper we characterize the DNA of three out of the four marsupials found in Uruguay. Analytical and preparative ultracentrifugations in neutral CsCl gradients, including four intercalating agents, and in Cs2SO4 gradients in the presence of increasing amounts of Hg++ ion did not allow us to separate any satellite fraction. The buoyant density of the unique peak measured in CsCl gradients was in every case 1.697 g/cc with a G-C content of 37.7%. Digestion of total DNA with 11 restriction endonucleases produced a different pattern of bands for the three species, although some possible homologies could be established. Hybridization with 32P-rRNA of Southern blots of the gels containing digested DNAs demonstrated that the repeated sequences evidenced do not correspond to the ribosomal cistrons. PMID:6292862
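
    The abstract does not say how the G-C content was obtained from the buoyant density. One widely used linear calibration for neutral CsCl gradients is rho = 1.660 + 0.098 * GC (with GC as a fraction); the sketch below assumes that relation, and the constant and function names are hypothetical. Applied to 1.697 g/cc it gives a value close to, though not identical with, the 37.7% reported.

```python
# Assumed calibration (not stated in the abstract):
#   rho = 1.660 + 0.098 * GC
# rho is the buoyant density in g/cm^3 in neutral CsCl, GC a fraction in [0, 1].
RHO_AT_ZERO_GC = 1.660
RHO_SLOPE = 0.098

def gc_fraction_from_buoyant_density(rho):
    """Invert the assumed linear calibration to estimate the G+C fraction."""
    return (rho - RHO_AT_ZERO_GC) / RHO_SLOPE

def buoyant_density_from_gc_fraction(gc):
    """Forward direction of the same assumed calibration."""
    return RHO_AT_ZERO_GC + RHO_SLOPE * gc

if __name__ == "__main__":
    gc = gc_fraction_from_buoyant_density(1.697)
    print(f"Estimated G+C content: {100 * gc:.1f}%")
```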

  13. Investigating the dynamics of surface-immobilized DNA nanomachines

    PubMed Central

    Dunn, Katherine E.; Trefzer, Martin A.; Johnson, Steven; Tyrrell, Andy M.

    2016-01-01

    Surface-immobilization of molecules can have a profound influence on their structure, function and dynamics. Toehold-mediated strand displacement is often used in solution to drive synthetic nanomachines made from DNA, but the effects of surface-immobilization on the mechanism and kinetics of this reaction have not yet been fully elucidated. Here we show that the kinetics of strand displacement in surface-immobilized nanomachines are significantly different to those of the solution phase reaction, and we attribute this to the effects of intermolecular interactions within the DNA layer. We demonstrate that the dynamics of strand displacement can be manipulated by changing strand length, concentration and G/C content. By inserting mismatched bases it is also possible to tune the rates of the constituent displacement processes (toehold-binding and branch migration) independently, and information can be encoded in the time-dependence of the overall reaction. Our findings will facilitate the rational design of surface-immobilized dynamic DNA nanomachines, including computing devices and track-based motors. PMID:27387252

  14. Investigating the dynamics of surface-immobilized DNA nanomachines

    NASA Astrophysics Data System (ADS)

    Dunn, Katherine E.; Trefzer, Martin A.; Johnson, Steven; Tyrrell, Andy M.

    2016-07-01

    Surface-immobilization of molecules can have a profound influence on their structure, function and dynamics. Toehold-mediated strand displacement is often used in solution to drive synthetic nanomachines made from DNA, but the effects of surface-immobilization on the mechanism and kinetics of this reaction have not yet been fully elucidated. Here we show that the kinetics of strand displacement in surface-immobilized nanomachines are significantly different to those of the solution phase reaction, and we attribute this to the effects of intermolecular interactions within the DNA layer. We demonstrate that the dynamics of strand displacement can be manipulated by changing strand length, concentration and G/C content. By inserting mismatched bases it is also possible to tune the rates of the constituent displacement processes (toehold-binding and branch migration) independently, and information can be encoded in the time-dependence of the overall reaction. Our findings will facilitate the rational design of surface-immobilized dynamic DNA nanomachines, including computing devices and track-based motors.

  15. The Genomic Code for Nucleosome Positioning

    NASA Astrophysics Data System (ADS)

    Widom, Jonathan

    2008-03-01

    Eukaryotic genomes encode an additional layer of genetic information, superimposed on top of the regulatory and coding information, that controls the organization of the genomic DNA into arrays of nucleosomes. We have developed a partial ability to read this nucleosome positioning code and predict the in vivo locations of nucleosomes. Our results suggest that genomes utilize the nucleosome positioning code to facilitate specific chromosome functions including to delineate functional versus nonfunctional binding sites for key gene regulatory proteins, and to define the next higher level of chromosome structure itself.

  16. High resolution melting (HRM) analysis of DNA--its role and potential in food analysis.

    PubMed

    Druml, Barbara; Cichna-Markl, Margit

    2014-09-01

    DNA based methods play an increasing role in food safety control and food adulteration detection. Recent papers show that high resolution melting (HRM) analysis is an interesting approach. It involves amplification of the target of interest in the presence of a saturation dye by the polymerase chain reaction (PCR) and subsequent melting of the amplicons by gradually increasing the temperature. Since the melting profile depends on the GC content, length, sequence and strand complementarity of the product, HRM analysis is highly suitable for the detection of single-base variants and small insertions or deletions. The review gives an introduction into HRM analysis, covers important aspects in the development of an HRM analysis method and describes how HRM data are analysed and interpreted. Then we discuss the potential of HRM analysis based methods in food analysis, i.e. for the identification of closely related species and cultivars and the identification of pathogenic microorganisms.

  17. Large-scale oscillation of structure-related DNA sequence features in human chromosome 21

    NASA Astrophysics Data System (ADS)

    Li, Wentian; Miramontes, Pedro

    2006-08-01

    Human chromosome 21 is the only chromosome in the human genome that exhibits oscillation of the (G+C) content with a cycle length of hundreds of kilobases (kb) (500 kb near the right telomere). We aim at establishing the existence of a similar periodicity in structure-related sequence features in order to relate this (G+C)% oscillation to other biological phenomena. The following quantities are shown to oscillate with the same 500 kb periodicity in human chromosome 21: binding energy calculated by two sets of dinucleotide-based thermodynamic parameters, AA/TT and AAA/TTT bi- and tri-nucleotide density, 5'-TA-3' dinucleotide density, and signal for 10- or 11-base periodicity of AA/TT or AAA/TTT. These intrinsic quantities are related to structural features of the double helix of DNA molecules, such as base-pair binding, untwisting or unwinding, stiffness, and a putative tendency for nucleosome formation.
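
    A large-scale (G+C) oscillation of this kind is usually looked for by first computing (G+C) content in consecutive windows along the sequence. The following is a minimal sketch of that windowing step only, not the authors' method; the function names, the window and step sizes, and the toy sequence are all assumptions made for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Ambiguous symbols such as N are ignored; an empty window yields 0.0.
    return gc / total if total else 0.0


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (window start, GC fraction) for each full window along seq."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence, not chromosome 21 data.
    toy = "GGCC" * 5 + "ATAT" * 5
    for start, frac in windowed_gc(toy, window=10, step=10):
        print(start, round(frac, 2))
```

    For a real chromosome the window would be on the order of kilobases, and the resulting series could then be passed to a periodicity analysis of the reader's choice.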

  18. QR Codes 101

    ERIC Educational Resources Information Center

    Crompton, Helen; LaFrance, Jason; van 't Hooft, Mark

    2012-01-01

    A QR (quick-response) code is a two-dimensional scannable code, similar in function to a traditional bar code that one might find on a product at the supermarket. The main difference between the two is that, while a traditional bar code can hold a maximum of only 20 digits, a QR code can hold up to 7,089 characters, so it can contain much more…

  19. ARA type protograph codes

    NASA Technical Reports Server (NTRS)

    Divsalar, Dariush (Inventor); Abbasfar, Aliazam (Inventor); Jones, Christopher R. (Inventor); Dolinar, Samuel J. (Inventor); Thorpe, Jeremy C. (Inventor); Andrews, Kenneth S. (Inventor); Yao, Kung (Inventor)

    2008-01-01

    An apparatus and method for encoding low-density parity check codes. Together with a repeater, an interleaver and an accumulator, the apparatus comprises a precoder, thus forming accumulate-repeat-accumulate (ARA) codes. Protographs representing various types of ARA codes, including AR3A, AR4A and ARJA codes, are described. High performance is obtained when compared to the performance of current repeat-accumulate (RA) or irregular-repeat-accumulate (IRA) codes.

  20. FACIL: Fast and Accurate Genetic Code Inference and Logo.

    PubMed

    Dutilh, Bas E; Jurgelenaite, Rasa; Szklarczyk, Radek; van Hijum, Sacha A F T; Harhangi, Harry R; Schmid, Markus; de Wild, Bart; Françoijs, Kees-Jan; Stunnenberg, Hendrik G; Strous, Marc; Jetten, Mike S M; Op den Camp, Huub J M; Huynen, Martijn A

    2011-07-15

    The intensification of DNA sequencing will increasingly unveil uncharacterized species with potential alternative genetic codes. A total of 0.65% of the DNA sequences currently in Genbank encode their proteins with a variant genetic code, and these exceptions occur in many unrelated taxa. We introduce FACIL (Fast and Accurate genetic Code Inference and Logo), a fast and reliable tool to evaluate nucleic acid sequences for their genetic code that detects alternative codes even in species distantly related to known organisms. To illustrate this, we apply FACIL to a set of mitochondrial genomic contigs of Globobulimina pseudospinescens. This foraminifer does not have any sequenced close relative in the databases, yet we infer its alternative genetic code with high confidence values. Results are intuitively visualized in a Genetic Code Logo. FACIL is available as a web-based service at http://www.cmbi.ru.nl/FACIL/ and as a stand-alone program.

  1. Genomics dataset of unidentified disclosed isolates.

    PubMed

    Rekadwad, Bhagwan N

    2016-09-01

    Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and AT/GC content analysis of the DNA sequences was carried out. The QR codes are helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here provides new data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis.

  2. Draft Genome Sequence of Deinococcus sp. Strain RL Isolated from Sediments of a Hot Water Spring

    PubMed Central

    Mahato, Nitish Kumar; Tripathi, Charu; Verma, Helianthous; Singh, Neha

    2014-01-01

    Deinococcus sp. strain RL, a moderately thermophilic bacterium, was isolated from sediments of a hot water spring in Manikaran, India. Here, we report the draft genome (2.79 Mbp) of this strain, which contains 62 contigs and 2,614 coding DNA sequences, with an average G+C content of 69.4%. PMID:25035332

  3. Draft Genome Sequence of the Bacteriocinogenic Strain Enterococcus faecalis DBH18, Isolated from Mallard Ducks (Anas platyrhynchos)

    PubMed Central

    Arbulu, Sara; Jimenez, Juan J.; Borrero, Juan; Sánchez, Jorge; Frantzen, Cyril; Herranz, Carmen; Nes, Ingolf F.; Cintas, Luis M.; Diep, Dzung B.

    2016-01-01

    Here, we report the draft genome sequence of Enterococcus faecalis DBH18, a bacteriocinogenic lactic acid bacterium (LAB) isolated from mallard ducks (Anas platyrhynchos). The assembly contains 2,836,724 bp, with a G+C content of 37.6%. The genome is predicted to contain 2,654 coding DNA sequences (CDSs) and 50 RNAs. PMID:27417838

  4. Group-specific amplification of cDNA from DRB1 genes. Complete coding sequences of partially defined alleles and identification of the new alleles DRB1*040602, DRB1*111102, DRB1*080103, and DRB1*0113.

    PubMed

    Balas, Antonio; Vilches, Carlos; Rodríguez, Miguel A; Fernández, Begoña; Martinez, Maria Paz; de Pablo, Rosario; García-Sánchez, Félix; Vicario, Jose L

    2006-12-01

    We present here the complete coding sequences, previously unavailable, of the DRB1 alleles DRB1*030102, *0306, *040701, *0408, *1327, *1356, *1411, *1446, *1503, *1504, *0806, *0813, and *0818. For cDNA isolation, new group-specific primers located at the 5'UT and 3'UT regions were used to carry out allele-specific amplification and a convenient method for determining full-length sequences for DRB1 alleles. Complete coding sequencing of samples previously typed as DRB1*0406, DRB1*080101, and DRB1*1111 revealed new alleles with noncoding nucleotide changes at exons 1 and 3. In addition, we found a novel allele, DRB1*0113, whose second exon carries a sequence motif characteristic of DRB1*07 alleles. The predicted class II haplotypic associations of all alleles are reported and discussed.

  5. Sequence-dependent nanometer-scale conformational dynamics of individual RecBCD–DNA complexes

    PubMed Central

    Carter, Ashley R.; Seaberg, Maasa H.; Fan, Hsiu-Fang; Sun, Gang; Wilds, Christopher J.; Li, Hung-Wen; Perkins, Thomas T.

    2016-01-01

    RecBCD is a multifunctional enzyme that possesses both helicase and nuclease activities. To gain insight into the mechanism of its helicase function, RecBCD unwinding at low adenosine triphosphate (ATP) (2–4 μM) was measured using an optical-trapping assay featuring 1 base-pair (bp) precision. Instead of uniformly sized steps, we observed forward motion convolved with rapid, large-scale (∼4 bp) variations in DNA length. We interpret this motion as conformational dynamics of the RecBCD–DNA complex in an unwinding-competent state, arising, in part, by an enzyme-induced, back-and-forth motion relative to the dsDNA that opens and closes the duplex. Five observations support this interpretation. First, these dynamics were present in the absence of ATP. Second, the onset of the dynamics was coupled to RecBCD entering into an unwinding-competent state that required a sufficiently long 5′ strand to engage the RecD helicase. Third, the dynamics were modulated by the GC-content of the dsDNA. Fourth, the dynamics were suppressed by an engineered interstrand cross-link in the dsDNA that prevented unwinding. Finally, these dynamics were suppressed by binding of a specific non-hydrolyzable ATP analog. Collectively, these observations show that during unwinding, RecBCD binds to DNA in a dynamic mode that is modulated by the nucleotide state of the ATP-binding pocket. PMID:27220465

  6. Genome-wide quantitative assessment of variation in DNA methylation patterns

    PubMed Central

    Xie, Hehuang; Wang, Min; de Andrade, Alexandre; de F. Bonaldo, Maria; Galat, Vasil; Arndt, Kelly; Rajaram, Veena; Goldman, Stewart; Tomita, Tadanori; Soares, Marcelo B.

    2011-01-01

    Genomic DNA methylation contributes substantively to transcriptional regulations that underlie mammalian development and cellular differentiation. Much effort has been made to decipher the molecular mechanisms governing the establishment and maintenance of DNA methylation patterns. However, little is known about genome-wide variation of DNA methylation patterns. In this study, we introduced the concept of methylation entropy, a measure of the randomness of DNA methylation patterns in a cell population, and exploited it to assess the variability in DNA methylation patterns of Alu repeats and promoters. A few interesting observations were made: (i) within a cell population, methylation entropy varies among genomic loci; (ii) among cell populations, the methylation entropies of most genomic loci remain constant; (iii) compared to normal tissue controls, some tumors exhibit greater methylation entropies; (iv) Alu elements with high methylation entropy are associated with high GC content but depletion of CpG dinucleotides and (v) Alu elements in the intronic regions or far from CpG islands are associated with low methylation entropy. We further identified 12 putative allelic-specific methylated genomic loci, including four Alu elements and eight promoters. Lastly, using subcloned normal fibroblast cells, we demonstrated that the highly variable methylation patterns result from low fidelity of DNA methylation inheritance. PMID:21278160
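
    The abstract describes methylation entropy only as a measure of the randomness of methylation patterns in a cell population. One plausible reading, sketched below, is the Shannon entropy of the observed pattern frequencies at a locus. Dividing by the number of CpG sites, so that the value lies between 0 and 1, is an assumption made here for illustration and is not stated in the abstract; the function name and the pattern encoding are likewise hypothetical.

```python
from collections import Counter
from math import log2


def methylation_entropy(patterns: list[str]) -> float:
    """Shannon entropy (bits) of methylation patterns, divided by the number of CpG sites.

    Each pattern is a string such as "1010" (1 = methylated, 0 = unmethylated),
    one string per sequenced read covering the same CpG sites.
    """
    if not patterns:
        raise ValueError("need at least one pattern")
    n_sites = len(patterns[0])
    if n_sites == 0 or any(len(p) != n_sites for p in patterns):
        raise ValueError("all patterns must cover the same, non-zero number of CpG sites")
    total = len(patterns)
    counts = Counter(patterns)
    # Sum of p * log2(1/p) over the distinct patterns observed at this locus.
    entropy = sum((n / total) * log2(total / n) for n in counts.values())
    # Assumed normalization: entropy per CpG site.
    return entropy / n_sites
```

    Under this sketch a locus where every read carries the same pattern scores 0, and a locus where all possible patterns are equally frequent scores 1.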

  7. Determination of 5-methylcytosine from plant DNA by high-performance liquid chromatography.

    PubMed

    Wagner, I; Capesius, I

    1981-06-26

    The relative amounts of the five nucleosides (deoxycytidine, 5-methyldeoxycytidine, deoxyadenosine, deoxyguanosine and thymidine) in the DNA of nine plant species, one plant satellite DNA, and one animal species were determined by high performance liquid chromatography. The method allows the clean separation of the nucleosides from 10 microgram samples within 15 min. The following values for the proportion of methylated cytosines among all cytosines were obtained: Lobularia maritima 18.5%, Nicotiana tabacum 32.6%, Pisum sativum 23.2%, Rhinanthus minor 29.2%, Sinapis alba 12.2%, Vicia faba 30.5%, Viscum album 23.2%, Cymbidium pumilum 18.8%, Cymbidium pumilum AT-rich satellite DNA 15.8%, Triticum aestivum 22.4%. DNA of an animal, the gerbil, Meriones unguiculatus, had a methylation percentage of 3.1%. An estimate of the GC content based on the buoyant density of DNA tends to be lower than the actual value, whereas an estimate based on the melting temperature tends to be higher. This supports the finding by other authors that DNA methylation decreases the buoyant density and may increase the melting temperature at high m5C concentration.

  8. Efficient entropy coding for scalable video coding

    NASA Astrophysics Data System (ADS)

    Choi, Woong Il; Yang, Jungyoup; Jeon, Byeungwoo

    2005-10-01

    The standardization for the scalable extension of H.264 has called for additional functionality based on H.264 standard to support the combined spatio-temporal and SNR scalability. For the entropy coding of H.264 scalable extension, Context-based Adaptive Binary Arithmetic Coding (CABAC) scheme is considered so far. In this paper, we present a new context modeling scheme by using inter layer correlation between the syntax elements. As a result, it improves coding efficiency of entropy coding in H.264 scalable extension. In simulation results of applying the proposed scheme to encoding the syntax element mb_type, it is shown that improvement in coding efficiency of the proposed method is up to 16% in terms of bit saving due to estimation of more adequate probability model.

  9. CRITICA: coding region identification tool invoking comparative analysis

    NASA Technical Reports Server (NTRS)

    Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

    1999-01-01

    Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).

  11. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
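
    The two descriptors mentioned in this abstract, the base count (e.g. A76C158G121T74) and the codon table (e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0), can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the alphabetical A, C, G, T ordering is inferred from the examples in the abstract, and the input is assumed to be a reading frame starting at position 0.

```python
from collections import Counter
from itertools import product


def base_count_descriptor(seq: str) -> str:
    """Return a base-count string such as 'A6C0G1T2' for a DNA sequence."""
    counts = Counter(seq.upper())
    return "".join(f"{base}{counts[base]}" for base in "ACGT")


def codon_table_descriptor(seq: str) -> str:
    """Return a codon-count string '(AAA)n(AAC)n...(TTT)n' over all 64 codons."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    all_codons = ["".join(triplet) for triplet in product("ACGT", repeat=3)]
    return "".join(f"({codon}){counts[codon]}" for codon in all_codons)


if __name__ == "__main__":
    frame = "ATGAAATAA"  # hypothetical toy reading frame
    print(base_count_descriptor(frame))
    print(codon_table_descriptor(frame)[:24])
```

    Because both descriptors are plain strings, two database entries could be compared for equality without aligning the underlying sequences, which is the cross-referencing use the abstract proposes.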

  12. A mathematical formulation of DNA computation.

    PubMed

    Zhang, Mingjun; Cheng, Maggie X; Tarn, Tzyh-Jong

    2006-03-01

    DNA computation is the use of DNA molecules for information storage and processing. The task is accomplished by encoding and interpreting DNA molecules in suspended solutions before and after the complementary binding reactions. DNA computation is attractive due to its fast parallel information processing, remarkable energy efficiency, and high storage capacity. Challenges currently faced by DNA computation are: 1) lack of theoretical computational models for applications and 2) high error rate in implementation. This paper attempts to address these problems from mathematical modeling and genetic coding aspects. The first part of this paper presents a mathematical formulation of DNA computation. The model may serve as a theoretical framework for DNA computation. In the second part, a genetic code based DNA computation approach is presented to reduce the error rate in implementation, which has been a major concern for DNA computation. The method provides a promising alternative for reducing the error rate of DNA computation.

  13. EMdeCODE: a novel algorithm capable of reading words of epigenetic code to predict enhancers and retroviral integration sites and to identify H3R2me1 as a distinctive mark of coding versus non-coding genes

    PubMed Central

    Santoni, Federico Andrea

    2013-01-01

    The existence of some extra-genetic (epigenetic) codes has been postulated since the discovery of the primary genetic code. Evident effects of histone post-translational modifications or DNA methylation on the efficiency and the regulation of DNA processes support this postulation. EMdeCODE is an original algorithm that approximates the genomic distribution of given DNA features (e.g. promoter, enhancer, viral integration) by identifying relevant ChIPSeq profiles of post-translational histone marks or DNA binding proteins and combining them in a supermark. The EMdeCODE kernel is essentially a two-step procedure: (i) an expectation-maximization process calculates the mixture of epigenetic factors that maximizes the sensitivity (recall) of the association with the feature under study; (ii) the approximated density is then recursively trimmed with respect to a control dataset to increase the precision by reducing the number of false positives. EMdeCODE densities significantly improve the prediction of enhancer loci and retroviral integration sites with respect to previous methods. Importantly, it can also be used to extract distinctive factors between two arbitrary conditions. Indeed, EMdeCODE identifies unexpected epigenetic profiles specific for coding versus non-coding RNA, pointing towards a new role for H3R2me1 in coding regions. PMID:23234700

  14. What is Code Biology?

    PubMed

    Barbieri, Marcello

    2017-10-06

    Various independent discoveries have shown that many organic codes exist in living systems, and this implies that they came into being during the history of life and contributed to that history. The genetic code appeared in a population of primitive systems that has been referred to as the common ancestor, and it has been proposed that three distinct signal processing codes gave origin to the three primary kingdoms of Archaea, Bacteria and Eukarya. After the genetic code and the signal processing codes, on the other hand, only the ancestors of the eukaryotes continued to explore the coding space and gave origin to splicing codes, histone code, tubulin code, compartment codes and many others. A first theoretical consequence of this historical fact is the idea that the Eukarya became increasingly more complex because they maintained the potential to bring new organic codes into existence. A second theoretical consequence comes from the fact that the evolution of the individual rules of a code can take an extremely long time, but the origin of a new organic code corresponds to the appearance of a complete set of rules and from a geological point of view this amounts to a sudden event. The great discontinuities of the history of life, in other words, can be explained as the result of the appearance of new codes. A third theoretical consequence comes from the fact that the organic codes have been highly conserved in evolution, which shows that they are the great invariants of life, the sole entities that have gone intact through billions of years while everything else has changed. This tells us that the organic codes are fundamental components of life and their study - the new research field of Code Biology - is destined to become an increasingly relevant part of the life sciences. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. The consequences of base pair composition biases for regulatory network organization in prokaryotes.

    PubMed

    Cordero, Otto X; Hogeweg, Paulien

    2009-10-01

    Given the dramatic variation in guanine-cytosine (GC) content observed in prokaryotes, from approximately 20% to approximately 75% GC, one wonders if these extreme biases in base pair composition affect the evolution of transcription factor-binding sites (BS). This letter shows that, along the wide range of GC content variation in bacteria, bacterial BS keep a high frequency of AT bases, roughly independently of the background (BG) base pair composition of intergenic regions. As a result, the equilibrium base pair frequencies of BS depart the most from those of BG DNA in GC-rich genomes. This not only implies a higher specificity but also a higher coding barrier for BS in GC-rich genomes. In accordance, we observe that the average percentage of divergently transcribed regions increases with the GC content of the genome, suggesting the use of a more efficient coding strategy.

  16. DNA-based watermarks using the DNA-Crypt algorithm

    PubMed Central

    Heider, Dominik; Barnekow, Angelika

    2007-01-01

    Background: The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results: The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations, which can occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion: The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences similar to all steganographic algorithms. PMID:17535434

  17. DNA-based watermarks using the DNA-Crypt algorithm.

    PubMed

    Heider, Dominik; Barnekow, Angelika

    2007-05-29

    The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations, which can occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences similar to all steganographic algorithms.

  18. Revisiting the Physico-Chemical Hypothesis of Code Origin: An Analysis Based on Code-Sequence Coevolution in a Finite Population

    NASA Astrophysics Data System (ADS)

    Bandhu, Ashutosh Vishwa; Aggarwal, Neha; Sengupta, Supratim

    2013-12-01

    The origin of the genetic code marked a major transition from a plausible RNA world to the world of DNA and proteins and is an important milestone in our understanding of the origin of life. We examine the efficacy of the physico-chemical hypothesis of code origin by carrying out simulations of code-sequence coevolution in finite populations in stages, leading first to the emergence of ten amino acid code(s) and subsequently to 14 amino acid code(s). We explore two different scenarios of primordial code evolution. In one scenario, competition occurs between populations of equilibrated code-sequence sets, while in the other scenario, new codes compete with existing codes as they are gradually introduced into the population with a finite probability. In either case, we find that natural selection between competing codes distinguished by differences in the degree of physico-chemical optimization is unable to explain the structure of the standard genetic code. The code whose structure is most consistent with the standard genetic code is often not among the codes that have a high fixation probability. However, we find that the composition of the code population affects the code fixation probability. A physico-chemically optimized code gets fixed with a significantly higher probability if it competes against a set of randomly generated codes. Our results suggest that physico-chemical optimization may not be the sole driving force in ensuring the emergence of the standard genetic code.

  19. DIANE multiparticle transport code

    NASA Astrophysics Data System (ADS)

    Caillaud, M.; Lemaire, S.; Ménard, S.; Rathouit, P.; Ribes, J. C.; Riz, D.

    2014-06-01

    DIANE is the general Monte Carlo code developed at CEA-DAM. DIANE is a 3D multiparticle multigroup code. DIANE includes automated biasing techniques and is optimized for massive parallel calculations.

  20. QR Code Mania!

    ERIC Educational Resources Information Center

    Shumack, Kellie A.; Reilly, Erin; Chamberlain, Nik

    2013-01-01

    space, has error-correction capacity, and can be read from any direction. These codes are used in manufacturing, shipping, and marketing, as well as in education. QR codes can be created to produce…

  1. Honesty and Honor Codes.

    ERIC Educational Resources Information Center

    McCabe, Donald; Trevino, Linda Klebe

    2002-01-01

    Explores the rise in student cheating and evidence that students cheat less often at schools with an honor code. Discusses effective use of such codes and creation of a peer culture that condemns dishonesty. (EV)

  2. QR Code Mania!

    ERIC Educational Resources Information Center

    Shumack, Kellie A.; Reilly, Erin; Chamberlain, Nik

    2013-01-01

    space, has error-correction capacity, and can be read from any direction. These codes are used in manufacturing, shipping, and marketing, as well as in education. QR codes can be created to produce…

  3. Practices in Code Discoverability

    NASA Astrophysics Data System (ADS)

    Teuben, P.; Allen, A.; Nemiroff, R. J.; Shamir, L.

    2012-09-01

    Much of scientific progress now hinges on the reliability, falsifiability and reproducibility of computer source codes. Astrophysics in particular is a discipline that today leads other sciences in making useful scientific components freely available online, including data, abstracts, preprints, and fully published papers, yet even today many astrophysics source codes remain hidden from public view. We review the importance and history of source codes in astrophysics and previous efforts to develop ways in which information about astrophysics codes can be shared. We also discuss why some scientist coders resist sharing or publishing their codes, the reasons for and importance of overcoming this resistance, and alert the community to a reworking of one of the first attempts for sharing codes, the Astrophysics Source Code Library (ASCL). We discuss the implementation of the ASCL in an accompanying poster paper. We suggest that code could be given a similar level of referencing as data gets in repositories such as ADS.

  4. Identification and Phylogenetic analysis of thermophilic sulfate-reducing bacteria in oil field samples by 16S rDNA gene cloning and sequencing.

    PubMed

    Leu, J Y; McGovern-Traa, C P; Porter, A J; Harris, W J; Hamilton, W A

    1998-06-01

    Thermophilic sulfate-reducing bacteria (SRB) have been recognized as an important source of hydrogen sulfide (H2S) in hydrocarbon reservoirs and in production systems. Four thermophilic SRB enrichment cultures from three different oil field samples (sandstone core, drilling mud, and production water) were investigated using 16S rDNA sequence comparative analysis. In total, 15 different clones were identified. We found spore-forming, low G+C content, thermophilic, sulfate-reducing Desulfotomaculum-related sequences present in all oil field samples, and additionally a clone originating from sandstone core which was assigned to the mesophilic Desulfomicrobium group. Furthermore, three clones related to Gram-positive, non-sulfate-reducing Thermoanaerobacter species and four clones close to Clostridium thermocopriae were found in enrichment cultures from sandstone core and from production water, respectively. In addition, the deeply rooted lineage of two of the clones suggested previously undescribed, Gram-positive, low G+C content, thermophilic, obligately anaerobic bacteria present in production water. Such thermophilic, non-sulfate-reducing microorganisms may play an important ecological role alongside SRB in oil field environments.

  5. STEEP32 computer code

    NASA Technical Reports Server (NTRS)

    Goerke, W. S.

    1972-01-01

    A manual is presented as an aid in using the STEEP32 code. The code is the EXEC 8 version of the STEEP code (STEEP is an acronym for shock two-dimensional Eulerian elastic plastic). The major steps in a STEEP32 run are illustrated in a sample problem. There is a detailed discussion of the internal organization of the code, including a description of each subroutine.

  6. Exons, Introns, and DNA Thermodynamics

    NASA Astrophysics Data System (ADS)

    Carlon, Enrico; Malki, Mehdi Lejard; Blossey, Ralf

    2005-05-01

    The genes of eukaryotes are characterized by protein coding fragments, the exons, interrupted by introns, i.e., stretches of DNA which do not carry useful information for protein synthesis. We have analyzed the melting behavior of randomly selected human cDNA sequences obtained from genomic DNA by removing all introns. A clear correspondence is observed between exons and melting domains. This finding may provide new insights into the physical mechanisms underlying the evolution of genes.

  7. Universal Noiseless Coding Subroutines

    NASA Technical Reports Server (NTRS)

    Schlutsmeyer, A. P.; Rice, R. F.

    1986-01-01

    Software package consists of FORTRAN subroutines that perform universal noiseless coding and decoding of integer and binary data strings. Purpose of this type of coding to achieve data compression in sense that coded data represents original data perfectly (noiselessly) while taking fewer bits to do so. Routines universal because they apply to virtually any "real-world" data source.
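
    This record does not describe the algorithms inside the package. As a rough illustration of what noiseless (lossless) coding of integers means, the sketch below uses Golomb-Rice coding with a fixed parameter k. That choice, and the function names `rice_encode` and `rice_decode`, are assumptions made for illustration and are not taken from the FORTRAN subroutines.

```python
def rice_encode(values, k):
    """Encode non-negative integers as a bit string with Rice parameter k."""
    bits = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        bits.append("1" * q + "0")            # quotient in unary, ended by a 0
        if k:
            bits.append(format(r, f"0{k}b"))  # remainder in k binary digits
    return "".join(bits)


def rice_decode(bits, k, count):
    """Recover `count` integers from a bit string produced by rice_encode."""
    values, i = [], 0
    for _ in range(count):
        q = 0
        while bits[i] == "1":                 # read the unary quotient
            q += 1
            i += 1
        i += 1                                # skip the terminating 0
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        values.append((q << k) | r)
    return values
```

    Small values cost only a few bits each, and decoding returns the input exactly, which is the "noiseless" property the abstract refers to.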

  9. Morse Code Activity Packet.

    ERIC Educational Resources Information Center

    Clinton, Janeen S.

    This activity packet offers simple directions for setting up a Morse Code system appropriate to interfacing with any of several personal computer systems. Worksheets are also included to facilitate teaching Morse Code to persons with visual or other disabilities including blindness, as it is argued that the code is best learned auditorily. (PB)

  10. EMF wire code research

    SciTech Connect

    Jones, T.

    1993-11-01

    This paper examines the results of previous wire code research to determine the relationship among childhood cancer, wire codes, and electromagnetic fields. The paper suggests that, in the original Savitz study, biases toward producing a false positive association between high wire codes and childhood cancer were created by the selection procedure.

  11. Mapping Local Codes to Read Codes.

    PubMed

    Bonney, Wilfred; Galloway, James; Hall, Christopher; Ghattas, Mikhail; Tramma, Leandro; Nind, Thomas; Donnelly, Louise; Jefferson, Emily; Doney, Alexander

    2017-01-01

    Background & Objectives: Legacy laboratory test codes make it difficult to use clinical datasets for meaningful translational research, where populations are followed for disease risk and outcomes over many years. The Health Informatics Centre (HIC) at the University of Dundee hosts continuous biochemistry data from the clinical laboratories in Tayside and Fife dating back as far as 1987. However, the HIC-managed biochemistry dataset is coupled with incoherent sample types and unstandardised legacy local test codes, which increases the complexity of using the dataset for reasonable population health outcomes. The objective of this study was to map the legacy local test codes to the Scottish 5-byte Version 2 Read Codes using biochemistry data extracted from the repository of the Scottish Care Information (SCI) Store.

  12. Genetic coding and gene expression - new Quadruplet genetic coding model

    NASA Astrophysics Data System (ADS)

    Shankar Singh, Rama

    2012-07-01

    Successful demonstration of the human genome project has opened the door not only to developing personalized medicine and cures for genetic diseases, but may also answer the complex and difficult question of the origin of life. It may also lead to making the 21st century a century of biological sciences. Based on the central dogma of biology, genetic codons in conjunction with tRNA play a key role in translating the RNA bases to form the sequence of amino acids leading to a synthesized protein. This is the most critical step in synthesizing the right protein needed for personalized medicine and curing genetic diseases. So far, only triplet codons involving three bases of RNA, transcribed from DNA bases, have been used. Since this approach has several inconsistencies and limitations, even the promise of personalized medicine has not been realized. The new Quadruplet genetic coding model proposed and developed here involves all four RNA bases, which in conjunction with tRNA will synthesize the right protein. The transcription and translation process used will be the same, but the Quadruplet codons will help overcome most of the inconsistencies and limitations of the triplet codes. Details of this new Quadruplet genetic coding model and its subsequent potential applications, including relevance to the origin of life, will be presented.

  13. Diversity and distribution of single-stranded DNA phages in the North Atlantic Ocean

    PubMed Central

    Tucker, Kimberly P; Parsons, Rachel; Symonds, Erin M; Breitbart, Mya

    2011-01-01

    Knowledge of marine phages is highly biased toward double-stranded DNA (dsDNA) phages; however, recent metagenomic surveys have also identified single-stranded DNA (ssDNA) phages in the oceans. Here, we describe two complete ssDNA phage genomes that were reconstructed from a viral metagenome from 80 m depth at the Bermuda Atlantic Time-series Study (BATS) site in the northwestern Sargasso Sea and examine their spatial and temporal distributions. Both genomes (SARssφ1 and SARssφ2) exhibited similarity to known phages of the Microviridae family in terms of size, GC content, genome organization and protein sequence. PCR amplification of the replication initiation protein (Rep) gene revealed narrow and distinct depth distributions for the newly described ssDNA phages within the upper 200 m of the water column at the BATS site. Comparison of Rep gene sequences obtained from the BATS site over time revealed changes in the diversity of ssDNA phages over monthly time scales, although some nearly identical sequences were recovered from samples collected 4 years apart. Examination of ssDNA phage diversity along transects through the North Atlantic Ocean revealed a positive correlation between genetic distance and geographic distance between sampling sites. Together, the data suggest fundamental differences between the distribution of these ssDNA phages and the distribution of known marine dsDNA phages, possibly because of differences in host range, host distribution, virion stability, or viral evolution mechanisms and rates. Future work needs to elucidate the host ranges for oceanic ssDNA phages and determine their ecological roles in the marine ecosystem. PMID:21124487

  14. Improving the performance of true single molecule sequencing for ancient DNA

    PubMed Central

    2012-01-01

    Background: Second-generation sequencing technologies have revolutionized our ability to recover genetic information from the past, allowing the characterization of the first complete genomes from past individuals and extinct species. Recently, third generation Helicos sequencing platforms, which perform true Single-Molecule DNA Sequencing (tSMS), have shown great potential for sequencing DNA molecules from Pleistocene fossils. Here, we aim at improving even further the performance of tSMS for ancient DNA by testing two novel tSMS template preparation methods for Pleistocene bone fossils, namely oligonucleotide spiking and treatment with DNA phosphatase. Results: We found that a significantly larger fraction of the horse genome could be covered following oligonucleotide spiking, however not reproducibly and at the cost of extra post-sequencing filtering procedures and skewed %GC content. In contrast, we showed that treating ancient DNA extracts with DNA phosphatase improved the amount of endogenous sequence information recovered per sequencing channel by up to 3.3-fold, while still providing molecular signatures of endogenous ancient DNA damage, including cytosine deamination and fragmentation by depurination. Additionally, we confirmed the existence of molecular preservation niches in large bone crystals from which DNA could be preferentially extracted. Conclusions: We propose DNA phosphatase treatment as a mechanism to increase sequence coverage of ancient genomes when using Helicos tSMS as a sequencing platform. Together with mild denaturation temperatures that favor access to endogenous ancient templates over modern DNA contaminants, this simple preparation procedure can improve overall Helicos tSMS performance when damaged DNA templates are targeted. PMID:22574620

  15. Genome Calligrapher: A Web Tool for Refactoring Bacterial Genome Sequences for de Novo DNA Synthesis.

    PubMed

    Christen, Matthias; Deutsch, Samuel; Christen, Beat

    2015-08-21

    Recent advances in synthetic biology have resulted in an increasing demand for the de novo synthesis of large-scale DNA constructs. Any process improvement that enables fast and cost-effective streamlining of digitized genetic information into fabricable DNA sequences holds great promise to study, mine, and engineer genomes. Here, we present Genome Calligrapher, a computer-aided design web tool intended for whole genome refactoring of bacterial chromosomes for de novo DNA synthesis. By applying a neutral recoding algorithm, Genome Calligrapher optimizes GC content and removes obstructive DNA features known to interfere with the synthesis of double-stranded DNA and the higher order assembly into large DNA constructs. Subsequent bioinformatics analysis revealed that synthesis constraints are prevalent among bacterial genomes. However, a low level of codon replacement is sufficient for refactoring bacterial genomes into easy-to-synthesize DNA sequences. To test the algorithm, 168 kb of synthetic DNA comprising approximately 20 percent of the synthetic essential genome of the cell-cycle bacterium Caulobacter crescentus was streamlined and then ordered from a commercial supplier of low-cost de novo DNA synthesis. The successful assembly into eight 20 kb segments indicates that the Genome Calligrapher algorithm can be efficiently used to refactor difficult-to-synthesize DNA. Genome Calligrapher is broadly applicable to recode biosynthetic pathways, DNA sequences, and whole bacterial genomes, thus offering new opportunities to use synthetic biology tools to explore the functionality of microbial diversity. The Genome Calligrapher web tool can be accessed at https://christenlab.ethz.ch/GenomeCalligrapher.

  16. Software Certification - Coding, Code, and Coders

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus; Holzmann, Gerard J.

    2011-01-01

    We describe a certification approach for software development that has been adopted at our organization. JPL develops robotic spacecraft for the exploration of the solar system. The flight software that controls these spacecraft is considered to be mission critical. We argue that the goal of a software certification process cannot be the development of "perfect" software, i.e., software that can be formally proven to be correct under all imaginable and unimaginable circumstances. More realistically, the goal is to guarantee a software development process that is conducted by knowledgeable engineers, who follow generally accepted procedures to control known risks, while meeting agreed upon standards of workmanship. We target three specific issues that must be addressed in such a certification procedure: the coding process, the code that is developed, and the skills of the coders. The coding process is driven by standards (e.g., a coding standard) and tools. The code is mechanically checked against the standard with the help of state-of-the-art static source code analyzers. The coders, finally, are certified in on-site training courses that include formal exams.

  18. Facile, High Quality Sequencing of Bacterial Genomes from Small Amounts of DNA

    PubMed Central

    Vuyisich, Momchilo; Arefin, Ayesha; Davenport, Karen; Feng, Shihai; Gleasner, Cheryl; McMurry, Kim; Parson-Quintana, Beverly; Price, Jennifer; Scholz, Matthew; Chain, Patrick

    2014-01-01

    Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). There have been few studies to determine the effects of the input DNA amount or library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes. We have evaluated the utility of NEBNext Ultra for resequencing and de novo assembly of four bacterial genomes and compared its performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp.), which have the highest GC content and are the longest, we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used. PMID:25478564

  19. Experimental conditions improving in-solution target enrichment for ancient DNA.

    PubMed

    Cruz-Dávalos, Diana I; Llamas, Bastien; Gaunitz, Charleen; Fages, Antoine; Gamba, Cristina; Soubrier, Julien; Librado, Pablo; Seguin-Orlando, Andaine; Pruvost, Mélanie; Alfarhan, Ahmed H; Alquraishi, Saleh A; Al-Rasheid, Khaled A S; Scheu, Amelie; Beneke, Norbert; Ludwig, Arne; Cooper, Alan; Willerslev, Eske; Orlando, Ludovic

    2016-08-27

    High-throughput sequencing has dramatically fostered ancient DNA research in recent years. Shotgun sequencing, however, does not necessarily appear as the best-suited approach due to the extensive contamination of samples with exogenous environmental microbial DNA. DNA capture-enrichment methods represent cost-effective alternatives that increase the sequencing focus on the endogenous fraction, whether it is from mitochondrial or nuclear genomes, or parts thereof. Here, we explored experimental parameters that could impact the efficacy of MYbaits in-solution capture assays of ~5000 nuclear loci or the whole genome. We found that varying quantities of the starting probes had only moderate effect on capture outcomes. Starting DNA, probe tiling, the hybridization temperature and the proportion of endogenous DNA all affected the assay, however. Additionally, probe features such as their GC content, number of CpG dinucleotides, sequence complexity and entropy and self-annealing properties need to be carefully addressed during the design stage of the capture assay. The experimental conditions and probe molecular features identified in this study will improve the recovery of genetic information extracted from degraded and ancient remains.
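
    Several of the probe features named above can be computed directly from a probe sequence. The sketch below is a minimal illustration, not the study's pipeline: the function names are hypothetical, and because the record does not say how the authors defined entropy, the Shannon entropy of single-base frequencies is used here as an assumption.

```python
import math
from collections import Counter


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_count(seq):
    """Number of CpG dinucleotides (a C immediately followed by a G)."""
    return seq.upper().count("CG")


def shannon_entropy(seq):
    """Shannon entropy in bits of the single-base composition (assumed definition)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())
```

    A probe design step might, for example, discard candidates whose `gc_content` falls outside a chosen window; any such threshold would have to come from the study itself or from the assay vendor.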

  20. Structural diversity of supercoiled DNA

    PubMed Central

    Irobalieva, Rossitza N.; Fogg, Jonathan M.; Catanese, Daniel J.; Sutthibutpong, Thana; Chen, Muyuan; Barker, Anna K.; Ludtke, Steven J.; Harris, Sarah A.; Schmid, Michael F.; Chiu, Wah; Zechiedrich, Lynn

    2015-01-01

    By regulating access to the genetic code, DNA supercoiling strongly affects DNA metabolism. Despite its importance, however, much about supercoiled DNA (positively supercoiled DNA, in particular) remains unknown. Here we use electron cryo-tomography together with biochemical analyses to investigate structures of individual purified DNA minicircle topoisomers with defined degrees of supercoiling. Our results reveal that each topoisomer, negative or positive, adopts a unique and surprisingly wide distribution of three-dimensional conformations. Moreover, we uncover striking differences in how the topoisomers handle torsional stress. As negative supercoiling increases, bases are increasingly exposed. Beyond a sharp supercoiling threshold, we also detect exposed bases in positively supercoiled DNA. Molecular dynamics simulations independently confirm the conformational heterogeneity and provide atomistic insight into the flexibility of supercoiled DNA. Our integrated approach reveals the three-dimensional structures of DNA that are essential for its function. PMID:26455586

  1. Structural diversity of supercoiled DNA

    NASA Astrophysics Data System (ADS)

    Irobalieva, Rossitza N.; Fogg, Jonathan M.; Catanese, Daniel J.; Sutthibutpong, Thana; Chen, Muyuan; Barker, Anna K.; Ludtke, Steven J.; Harris, Sarah A.; Schmid, Michael F.; Chiu, Wah; Zechiedrich, Lynn

    2015-10-01

    By regulating access to the genetic code, DNA supercoiling strongly affects DNA metabolism. Despite its importance, however, much about supercoiled DNA (positively supercoiled DNA, in particular) remains unknown. Here we use electron cryo-tomography together with biochemical analyses to investigate structures of individual purified DNA minicircle topoisomers with defined degrees of supercoiling. Our results reveal that each topoisomer, negative or positive, adopts a unique and surprisingly wide distribution of three-dimensional conformations. Moreover, we uncover striking differences in how the topoisomers handle torsional stress. As negative supercoiling increases, bases are increasingly exposed. Beyond a sharp supercoiling threshold, we also detect exposed bases in positively supercoiled DNA. Molecular dynamics simulations independently confirm the conformational heterogeneity and provide atomistic insight into the flexibility of supercoiled DNA. Our integrated approach reveals the three-dimensional structures of DNA that are essential for its function.

  2. Tissue-Specific Evolution of Protein Coding Genes in Human and Mouse.

    PubMed

    Kryuchkova-Mostacci, Nadezda; Robinson-Rechavi, Marc

    2015-01-01

    Protein-coding genes evolve at different rates, and the influence of different parameters, from gene size to expression level, has been extensively studied. While in yeast gene expression level is the major causal factor of gene evolutionary rate, the situation is more complex in animals. Here we investigate these relations further, especially taking into account gene expression in different organs as well as indirect correlations between parameters. We used RNA-seq data from two large datasets, covering 22 mouse tissues and 27 human tissues. Over all tissues, evolutionary rate only correlates weakly with levels and breadth of expression. The strongest explanatory factors of purifying selection are GC content, expression in many developmental stages, and expression in brain tissues. While the main component of evolutionary rate is purifying selection, we also find tissue-specific patterns for sites under neutral evolution and for positive selection. We observe fast evolution of genes expressed in testis, but also in other tissues, notably liver, which are explained by weak purifying selection rather than by positive selection.

  3. Gene and genon concept: coding versus regulation

    PubMed Central

    2007-01-01

    We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. Rather, it is assembled by RNA processing, including differential splicing, from various

  4. Coding for Electronic Mail

    NASA Technical Reports Server (NTRS)

    Rice, R. F.; Lee, J. J.

    1986-01-01

    Scheme for coding facsimile messages promises to reduce data transmission requirements to one-tenth current level. Coding scheme paves way for true electronic mail in which handwritten, typed, or printed messages or diagrams sent virtually instantaneously - between buildings or between continents. Scheme, called Universal System for Efficient Electronic Mail (USEEM), uses unsupervised character recognition and adaptive noiseless coding of text. Image quality of resulting delivered messages improved over messages transmitted by conventional coding. Coding scheme compatible with direct-entry electronic mail as well as facsimile reproduction. Text transmitted in this scheme automatically translated to word-processor form.

  5. Evaluation of the Gibbs Free Energy Changes and Melting Temperatures of DNA/DNA Duplexes Using Hybridization Enthalpy Calculated by Molecular Dynamics Simulation.

    PubMed

    Lomzov, Alexander A; Vorobjev, Yury N; Pyshnyi, Dmitrii V

    2015-12-10

    A molecular dynamics simulation approach was applied for the prediction of the thermal stability of oligonucleotide duplexes. It was shown that the enthalpy of the DNA/DNA complex formation could be calculated using this approach. We have studied the influence of various simulation parameters on the secondary structure and the hybridization enthalpy value of the Dickerson-Drew dodecamer. The optimal simulation parameters for the most reliable prediction of the enthalpy values were determined. The thermodynamic parameters (enthalpy and entropy changes) of duplex formation were obtained experimentally for 305 oligonucleotides of various lengths and GC-content. The resulting database was studied with molecular dynamics (MD) simulation using the optimized simulation parameters. Gibbs free energy changes and the melting temperatures were evaluated using the experimental correlation between enthalpy and entropy changes of the duplex formation and the enthalpy values calculated by the MD simulation. The average errors in the predictions of enthalpy, the Gibbs free energy change, and the melting temperature of oligonucleotide complexes were 11%, 10%, and 4.4 °C, respectively. We have shown that molecular dynamics simulation makes it possible to calculate the thermal stability of native DNA/DNA complexes a priori with an unexpectedly high accuracy.
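
    The record does not give the formulas the authors used to go from enthalpy and entropy to free energy and melting temperature. A minimal sketch under the standard two-state duplex model is shown below; the model, the function names, and the numbers in the usage note are all assumptions for illustration, not values from the paper.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def gibbs_free_energy(dh, ds, temp_c=37.0):
    """Delta G = Delta H - T * Delta S, with dh in kcal/mol and ds in kcal/(mol*K)."""
    return dh - (temp_c + 273.15) * ds


def melting_temperature(dh, ds, c_total, self_complementary=False):
    """Two-state melting temperature in degrees Celsius.

    c_total is the total strand concentration in mol/L. For two different
    strands the concentration term is c_total / 4; for a self-complementary
    strand it is c_total.
    """
    factor = 1.0 if self_complementary else 4.0
    tm_kelvin = dh / (ds + R_KCAL * math.log(c_total / factor))
    return tm_kelvin - 273.15
```

    With made-up inputs such as `dh = -80.0` kcal/mol, `ds = -0.22` kcal/(mol*K), and `c_total = 1e-5` M, `melting_temperature` returns a value a little above 52 °C, and `gibbs_free_energy` at 37 °C is close to -11.8 kcal/mol.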

  6. XSOR codes users manual

    SciTech Connect

    Jow, Hong-Nian; Murfin, W.B.; Johnson, J.D.

    1993-11-01

    This report describes the source term estimation codes, XSORs. The codes are written for three pressurized water reactors (Surry, Sequoyah, and Zion) and two boiling water reactors (Peach Bottom and Grand Gulf). The ensemble of codes has been named "XSOR". The purpose of XSOR codes is to estimate the source terms which would be released to the atmosphere in severe accidents. A source term includes the release fractions of several radionuclide groups, the timing and duration of releases, the rates of energy release, and the elevation of releases. The codes have been developed by Sandia National Laboratories for the US Nuclear Regulatory Commission (NRC) in support of the NUREG-1150 program. The XSOR codes are fast running parametric codes and are used as surrogates for detailed mechanistic codes. The XSOR codes also provide the capability to explore the phenomena and their uncertainty which are not currently modeled by the mechanistic codes. The uncertainty distributions of input parameters may be used by an XSOR code to estimate the uncertainty of source terms.

  7. DLLExternalCode

    SciTech Connect

    Greg Flach, Frank Smith

    2014-05-14

    DLLExternalCode is a general dynamic-link library (DLL) interface for linking GoldSim (www.goldsim.com) with external codes. The overall concept is to use GoldSim as top level modeling software with interfaces to external codes for specific calculations. The DLLExternalCode DLL that performs the linking function is designed to take a list of code inputs from GoldSim, create an input file for the external application, run the external code, and return a list of outputs, read from files created by the external application, back to GoldSim. Instructions for creating the input file, running the external code, and reading the output are contained in an instructions file that is read and interpreted by the DLL.
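
    The write-inputs, run, read-outputs cycle described above can be sketched in a few lines. This is only an illustration of the pattern in Python, not the DLL's implementation or GoldSim's interface: the function name `run_external`, the file names, and the one-value-per-line file layout are all assumptions, and the "external code" is a stand-in script that doubles each input.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_external(inputs, command, workdir):
    """Write inputs to a file, run the external command, return parsed outputs."""
    workdir = Path(workdir)
    in_path = workdir / "inputs.txt"    # hypothetical file names
    out_path = workdir / "outputs.txt"
    in_path.write_text("\n".join(str(x) for x in inputs) + "\n")
    # The external program receives the input and output paths as arguments.
    subprocess.run(command + [str(in_path), str(out_path)], check=True)
    return [float(tok) for tok in out_path.read_text().split()]


# Stand-in for an external code: reads numbers, writes each one doubled.
DOUBLER = (
    "import sys; "
    "vals = [float(v) for v in open(sys.argv[1]).read().split()]; "
    "open(sys.argv[2], 'w').write('\\n'.join(str(2 * v) for v in vals))"
)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_external([1.0, 2.5], [sys.executable, "-c", DOUBLER], tmp))
```

    In the real tool the file layout and run command come from the instructions file mentioned in the abstract; here they are hard-coded to keep the sketch short.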

  8. Defeating the coding monsters.

    PubMed

    Colt, Ross

    2007-02-01

    Accuracy in coding is rapidly becoming a required skill for military health care providers. Clinic staffing, equipment purchase decisions, and even reimbursement will soon be based on the coding data that we provide. Learning the complicated myriad of rules to code accurately can seem overwhelming. However, the majority of clinic visits in a typical outpatient clinic generally fall into two major evaluation and management codes, 99213 and 99214. If health care providers can learn the rules required to code a 99214 visit, then this will provide a 90% solution that can enable them to accurately code the majority of their clinic visits. This article demonstrates a step-by-step method to code a 99214 visit, by viewing each of the three requirements as a monster to be defeated.

  9. Transcription of mitochondrial DNA.

    PubMed

    Tabak, H F; Grivell, L A; Borst, P

    1983-01-01

    While mitochondrial DNA (mtDNA) is the simplest DNA in nature, coding for rRNAs and tRNAs, results of DNA sequence and transcript analysis have demonstrated that both the synthesis and processing of mitochondrial RNAs involve remarkably intricate events. At one extreme, genes in animal mtDNAs are tightly packed, both DNA strands are completely transcribed (symmetric transcription), and the appearance of specific mRNAs is entirely dependent on processing at sites signalled by the sequences of the tRNAs, which abut virtually every gene. At the other extreme, gene organization in yeast (Saccharomyces) is anything but compact, with long stretches of AT-rich DNA interspaced between coding sequences and no obvious logic to the order of genes. Transcription is asymmetric and several RNAs are initiated de novo. Nevertheless, extensive RNA processing occurs due largely to the presence of split genes. RNA splicing is complex, is controlled by both mitochondrial and nuclear genes, and in some cases is accompanied by the formation of RNAs that behave as covalently closed circles. The present article reviews current knowledge of mitochondrial transcription and RNA processing in relation to possible mechanisms for the regulation of mitochondrial gene expression.

  10. The place of 'codes' in nonlinear neurodynamics.

    PubMed

    Freeman, Walter J

    2007-01-01

    A key problem in cognitive science is to explain the neural mechanisms of the rapid transposition between stimulus energy and abstract concept--between the specific and the generic--in both material and conceptual aspects, not between neural and psychic aspects. Three approaches by researchers to a solution in terms of neural codes are considered. Materialists seek rate and frequency codes in the interspike intervals of trains of action potentials induced by stimuli and carried by topologically organized axonal lines. Cognitivists refer to the symbol grounding problem and search for symbolic codes in firings of hierarchically organized feature-detector neurons of phonemes, lines, odorants, pressures, etc., that object-detector neurons bind into representations of probabilities of stimulus occurrence. Dynamicists seek neural correlates of stimuli and associated behaviors in spatial patterns of oscillatory fields of dendritic activity that self-organize and evolve as trajectories through high-dimensional brain state space; the codes are landscapes of chaotic attractors. Unlike codes in DNA and the periodic