2012-01-01
Background Detecting the borders between coding and non-coding regions is an essential step in genome annotation, and information entropy measures are useful for describing the signals in genome sequences. However, the accuracy of previous border-finding methods based on entropic segmentation still needs to be improved. Methods In this study, we first applied a new recursive entropic segmentation method to DNA sequences to obtain preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. Results Compared with previous segmentation methods, the experimental results on three bacterial genomes, Rickettsia prowazekii, Borrelia burgdorferi and E. coli, show that our approach improves the accuracy of finding the borders between coding and non-coding regions in DNA sequences. Conclusions This paper presents a new segmentation method for prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacterial genomes, compared with the A12_JR method, our method raised the accuracy of finding the borders between protein-coding and non-coding regions in DNA sequences. PMID:23282225
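As a rough illustration of the divergence named in this abstract, the sketch below scores every candidate cut of a sequence by the Jensen-Rényi divergence between its left and right parts and returns the best cut. It is a minimal sketch under stated assumptions, not the authors' method: it uses the plain 4-letter nucleotide alphabet instead of the 22-symbol alphabet, a single cut instead of recursive segmentation, an assumed order alpha = 0.5, and hypothetical function names.

```python
import math
from collections import Counter


def renyi_entropy(counts, alpha=0.5):
    """Rényi entropy (bits) of order alpha (alpha != 1) for a Counter of symbols."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    power_sum = sum((c / total) ** alpha for c in counts.values() if c > 0)
    return math.log2(power_sum) / (1.0 - alpha)


def jensen_renyi(left, right, alpha=0.5):
    """Entropy of the pooled segment minus the length-weighted segment entropies."""
    n_left, n_right = len(left), len(right)
    n = n_left + n_right
    c_left, c_right = Counter(left), Counter(right)
    mixed = renyi_entropy(c_left + c_right, alpha)
    weighted = (n_left / n) * renyi_entropy(c_left, alpha) \
        + (n_right / n) * renyi_entropy(c_right, alpha)
    return mixed - weighted


def best_cut(seq, alpha=0.5):
    """Return (position, divergence) of the single cut that maximizes the divergence."""
    scores = {i: jensen_renyi(seq[:i], seq[i:], alpha) for i in range(1, len(seq))}
    cut = max(scores, key=scores.get)
    return cut, scores[cut]


# Toy sequence with an obvious compositional border at position 20.
cut, score = best_cut("A" * 20 + "C" * 20)
```

A recursive variant would call `best_cut` again on each side of an accepted cut until the divergence falls below a significance threshold; that threshold is not specified here.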
Franc, M A; Cohen, N; Warner, A W; Shaw, P M; Groenen, P; Snapir, A
2011-04-01
DNA samples collected in clinical trials and stored for future research are valuable to pharmaceutical drug development. Given the perceived higher risk associated with genetic research, industry has implemented complex coding methods for DNA. Following years of experience with these methods and with addressing questions from institutional review boards (IRBs), ethics committees (ECs) and health authorities, the industry has started reexamining the extent of the added value offered by these methods. With the goal of harmonization, the Industry Pharmacogenomics Working Group (I-PWG) conducted a survey to gain an understanding of company practices for DNA coding and to solicit opinions on their effectiveness at protecting privacy. The results of the survey and the limitations of the coding methods are described. The I-PWG recommends dialogue with key stakeholders regarding coding practices such that equal standards are applied to DNA and non-DNA samples. The I-PWG believes that industry standards for privacy protection should provide adequate safeguards for DNA and non-DNA samples/data and suggests a need for more universal standards for samples stored for future research.
Functional interrogation of non-coding DNA through CRISPR genome editing
Canver, Matthew C.; Bauer, Daniel E.; Orkin, Stuart H.
2017-01-01
Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. PMID:28288828
Functional interrogation of non-coding DNA through CRISPR genome editing.
Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H
2017-05-15
Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.
Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge
2013-01-01
This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach. PMID:23984392
New t-gap insertion-deletion-like metrics for DNA hybridization thermodynamic modeling.
D'yachkov, Arkadii G; Macula, Anthony J; Pogozelski, Wendy K; Renz, Thomas E; Rykov, Vyacheslav V; Torney, David C
2006-05-01
We discuss the concept of t-gap block isomorphic subsequences and use it to describe new abstract string metrics that are similar to the Levenshtein insertion-deletion metric. Some of the metrics that we define can be used to model a thermodynamic distance function on single-stranded DNA sequences. Our model captures a key aspect of the nearest neighbor thermodynamic model for hybridized DNA duplexes. One version of our metric gives the maximum number of stacked pairs of hydrogen bonded nucleotide base pairs that can be present in any secondary structure in a hybridized DNA duplex without pseudoknots. Thermodynamic distance functions are important components in the construction of DNA codes, and DNA codes are important components in biomolecular computing, nanotechnology, and other biotechnical applications that employ DNA hybridization assays. We show how our new distances can be calculated by using a dynamic programming method, and we derive a Varshamov-Gilbert-like lower bound on the size of some codes using these distance functions as constraints. We also discuss software implementation of our DNA code design methods.
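For orientation, the classical insertion-deletion metric that this abstract uses as its point of comparison can be computed with a short dynamic program over the longest common subsequence (LCS). The sketch below shows only that classical metric; the t-gap block metrics themselves are not defined in the abstract and are not reproduced here, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(a, b):
    """Minimum number of single-symbol insertions and deletions turning a into b."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


d = indel_distance("ACGT", "AGT")  # one deletion (the C) suffices
```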
Hiding message into DNA sequence through DNA coding and chaotic maps.
Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman
2014-09-01
The paper proposes an improved reversible substitution method to hide data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance robustness and enlarge the hiding capacity: encoding the secret message by DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than the competing methods with respect to robustness and capacity.
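The first of the measures listed in this abstract, DNA coding of the secret message, can be sketched as a two-bits-per-base mapping. The particular assignment below (A=00, C=01, G=10, T=11) is an assumption chosen for illustration; the abstract does not state which coding rule the paper uses, and the encryption, chaotic-map and embedding steps are not shown.

```python
BASES = "ACGT"  # assumed rule: A=00, C=01, G=10, T=11


def dna_encode(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_decode(dna: str) -> bytes:
    """Inverse of dna_encode; expects a length that is a multiple of four."""
    if len(dna) % 4 != 0:
        raise ValueError("DNA string length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


encoded = dna_encode(b"A")  # 0x41 = 01 00 00 01
```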
Zhu, Debin; Tang, Yabing; Xing, Da; Chen, Wei R
2008-05-15
A bio bar code assay based on oligonucleotide-modified gold nanoparticles (Au-NPs) provides a PCR-free method for quantitative detection of nucleic acid targets. However, the current bio bar code assay requires lengthy experimental procedures including the preparation and release of bar code DNA probes from the target-nanoparticle complex and immobilization and hybridization of the probes for quantification. Herein, we report a novel PCR-free electrochemiluminescence (ECL)-based bio bar code assay for the quantitative detection of genetically modified organism (GMO) from raw materials. It consists of tris-(2,2'-bipyridyl) ruthenium (TBR)-labeled bar code DNA, nucleic acid hybridization using Au-NPs and biotin-labeled probes, and selective capture of the hybridization complex by streptavidin-coated paramagnetic beads. The detection of target DNA is realized by direct measurement of ECL emission of TBR. It can quantitatively detect target nucleic acids with high speed and sensitivity. This method can be used to quantitatively detect GMO fragments from real GMO products.
CRITICA: coding region identification tool invoking comparative analysis
NASA Technical Reports Server (NTRS)
Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)
1999-01-01
Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
DNA: Polymer and molecular code
NASA Astrophysics Data System (ADS)
Shivashankar, G. V.
1999-10-01
The thesis work focusses upon two aspects of DNA, the polymer and the molecular code. Our approach was to bring single molecule micromanipulation methods to the study of DNA. It included a home-built optical microscope combined with an atomic force microscope and an optical tweezer. This combined approach led to a novel method to graft a single DNA molecule onto a force cantilever using the optical tweezer and local heating. With this method, a force versus extension assay of double stranded DNA was realized. The resolution was about 10 picoN. To improve on this force measurement resolution, a simple light backscattering technique was developed and used to probe the DNA polymer flexibility and its fluctuations. It combined the optical tweezer to trap a DNA tethered bead and the laser backscattering to detect the bead's Brownian fluctuations. With this technique the resolution was about 0.1 picoN with a millisecond access time, and the whole entropic part of the DNA force-extension was measured. With this experimental strategy, we measured the polymerization of the protein RecA on an isolated double stranded DNA. We observed the progressive decoration of RecA on the λ DNA molecule, which results in the extension of λ, due to unwinding of the double helix. The dynamics of polymerization, the resulting change in the DNA entropic elasticity and the role of ATP hydrolysis were the main parts of the study. A simple model for RecA assembly on DNA was proposed. This work presents a first step in the study of genetic recombination. Recently we have started a study of equilibrium binding which utilizes fluorescence polarization methods to probe the polymerization of RecA on single stranded DNA. In addition to the study of material properties of DNA and DNA-RecA, we have developed experiments for which the code of the DNA is central. We studied one aspect of DNA as a molecular code, using different techniques. In particular the programmatic use of template specificity makes gene expression a prime example of a biological code. We developed a novel method of making DNA microarrays, the so-called DNA chip. Using the optical tweezer concept, we were able to pattern biomolecules on a solid substrate, developing a new type of sub-micron laser lithography. A laser beam is focused onto a thin gold film on a glass substrate. Laser ablation of gold results in local aggregation of nanometer scale beads conjugated with small DNA oligonucleotides, with sub-micron resolution. This leads to specific detection of cDNA and RNA molecules. We built a simple microarray fabrication and detection system in the laboratory, based on this method, to probe addressable pools (genes, proteins or antibodies). We have lately used molecular beacons (single stranded DNA with a stem-loop structure containing a fluorophore and quencher) for the direct detection of unlabelled mRNA. As a first step towards a study of the dynamics of the biological code, we have begun to examine the patterns of gene expression during virus (T7 phage) infection of E. coli bacteria.
NASA Astrophysics Data System (ADS)
Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.
2017-07-01
A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, using the four DNA base codes adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of a sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have now become advanced, high-throughput sequencing technologies. DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing produces ambiguous results, making it difficult to determine whether a given code is A, T, G, or C. To address this problem, in this study we introduce another representation of DNA codes, namely the Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probabilities that the bases A, T, G, C appear in Q, and PA + PT + PG + PC = 1. Using Quaternion representations, we construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix produces better, higher-quality match and mismatch scores between two DNA base codes. In the implementation, we applied the Needleman-Wunsch global sequence alignment algorithm in Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are DNA sequences of the Streptococcus pneumoniae families obtained from GenBank, while the target DNA sequence was received from our collaborator's database. We found that the Quaternion representations improve the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
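The dot-product idea in this abstract can be illustrated in a few lines: each base call is a probability vector Q = (PA, PT, PG, PC), and the score of two calls is their dot product, so an unambiguous match scores 1, an unambiguous mismatch 0, and an ambiguous call something in between. This is a minimal sketch; any rescaling or gap penalties the authors apply to build their final scoring matrix are not given in the abstract, and the helper names are hypothetical.

```python
ORDER = "ATGC"  # component order of Q = (PA, PT, PG, PC)


def base_vector(probabilities):
    """Build Q from a dict such as {"A": 0.5, "G": 0.5}; components must sum to 1."""
    vec = tuple(float(probabilities.get(base, 0.0)) for base in ORDER)
    if abs(sum(vec) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return vec


def dot_score(q1, q2):
    """Dot product of two probability vectors, used here as a match score."""
    return sum(x * y for x, y in zip(q1, q2))


A = base_vector({"A": 1.0})
G = base_vector({"G": 1.0})
R = base_vector({"A": 0.5, "G": 0.5})  # ambiguous call: A or G, equally likely

score_match = dot_score(A, A)
score_mismatch = dot_score(A, G)
score_ambiguous = dot_score(A, R)
```

Such scores could replace the fixed match/mismatch constants in a Needleman-Wunsch recurrence.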
Decoding DNA labels by melting curve analysis using real-time PCR.
Balog, József A; Fehér, Liliána Z; Puskás, László G
2017-12-01
Synthetic DNA has been used as an authentication code for a diverse number of applications. However, existing decoding approaches are based on either DNA sequencing or the determination of DNA length variations. Here, we present a simple alternative protocol for labeling different objects using a small number of short DNA sequences that differ in their melting points. Code amplification and decoding can be done in two steps using quantitative PCR (qPCR). To obtain a DNA barcode with high complexity, we defined 8 template groups, each having 4 different DNA templates, yielding 15^8 (>2.5 billion) combinations of different individual melting temperature (Tm) values and corresponding ID codes. The reproducibility and specificity of the decoding was confirmed by using the most complex template mixture, which had 32 different products in 8 groups with different Tm values. The industrial applicability of our protocol was also demonstrated by labeling a drone with an oil-based paint containing a predefined DNA code, which was then successfully decoded. The method presented here consists of a simple code system based on a small number of synthetic DNA sequences and a cost-effective, rapid decoding protocol using a few qPCR reactions, enabling a wide range of authentication applications.
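The combination count in this abstract (more than 2.5 billion from 8 groups of 4 templates) is consistent with 15 to the 8th power. One plausible reading, stated here as an assumption and not taken from the abstract, is that each group contributes any non-empty subset of its 4 templates, giving 2^4 - 1 = 15 states per group. The arithmetic can be checked directly:

```python
groups = 8
templates_per_group = 4

# Assumed interpretation: any non-empty subset of a group's templates is one state.
states_per_group = 2 ** templates_per_group - 1

total_codes = states_per_group ** groups
```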
Transformable Rhodobacter strains, method for producing transformable Rhodobacter strains
Laible, Philip D.; Hanson, Deborah K.
2018-05-08
The invention provides an organism for expressing foreign DNA, the organism engineered to accept standard DNA carriers. The genome of the organism codes for intracytoplasmic membranes and features an interruption in at least one of the genes coding for restriction enzymes. Further provided is a system for producing biological materials comprising: selecting a vehicle to carry DNA which codes for the biological materials; determining sites on the vehicle's DNA sequence susceptible to restriction enzyme cleavage; choosing an organism to accept the vehicle based on that organism not acting upon at least one of said vehicle's sites; engineering said vehicle to contain said DNA; thereby creating a synthetic vector; and causing the synthetic vector to enter the organism so as to cause expression of said DNA.
What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?
NASA Astrophysics Data System (ADS)
Liebovitch, Larry
1998-03-01
The longest term correlations in living systems are the information stored in DNA which reflects the evolutionary history of an organism. The 4 bases (A,T,G,C) encode sequences of amino acids as well as locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error check mechanisms. When a single strand of DNA is replicated the complementary base is inserted in the new strand. Sometimes the wrong base is inserted that sticks out disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes, that slide along the DNA, can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols and so the information is encoded in a DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error checking codes, where some digits are functions of other digits to maintain the fidelity of transmitted information. Does DNA also utilize a DIGITAL error checking code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises the interesting mathematical problem: How does one determine whether some symbols in a sequence of symbols are a function of other symbols? It also bears on the issue of determining algorithmic complexity: What is the function that generates the shortest algorithm for reproducing the symbol sequence? The error checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test if some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C. We did not find evidence for such error correcting codes in these genes. However, we analyzed only a small amount of DNA and if digital error correcting schemes are present in DNA, they may be more subtle than such simple linear block codes. The basic issue we raise here is how information is stored in DNA and an appreciation that digital symbol sequences, such as DNA, admit of interesting schemes to store and protect the fidelity of their information content. Liebovitch, Tao, Todorov, Levine. 1996. Biophys. J. 71:1539-1544. Supported by NIH grant EY6234.
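To make the phrase "some digits are functions of other digits" concrete, the sketch below uses the ISBN-10 check digit, one of the everyday codes the abstract mentions: the tenth character is a linear function (modulo 11) of the first nine. This illustrates the general idea of a linear check symbol only; it is not the modulus-4 Gaussian elimination test applied to DNA in the cited work, and the function names are hypothetical.

```python
def isbn10_check_digit(first_nine: str) -> str:
    """Check character for the first nine ISBN-10 digits (weights 10 down to 2, mod 11)."""
    total = sum(weight * int(d) for weight, d in zip(range(10, 1, -1), first_nine))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn10_is_valid(isbn: str) -> bool:
    """True if the last character equals the check digit implied by the first nine."""
    chars = isbn.replace("-", "")
    if len(chars) != 10 or not chars[:9].isdigit():
        return False
    return isbn10_check_digit(chars[:9]) == chars[9].upper()


check = isbn10_check_digit("030640615")
```

A single wrong digit changes the weighted sum modulo 11, so `isbn10_is_valid` rejects it; the question raised in the abstract is whether any base in a gene plays an analogous role.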
Yin, Changchuan
2015-04-01
To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal-to-noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction by a Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
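The baseline this abstract compares against, the 4D binary representation, can be sketched as follows: each base gets a 0/1 indicator sequence, the four power spectra are summed, and the 3-periodicity SNR is taken as the power at frequency N/3 divided by the mean power over the non-zero frequencies. The SNR definition used here is an assumption (definitions vary across papers), the GCC mapping and its simulated-annealing optimization are not shown, and the function names are hypothetical.

```python
import cmath


def total_power(seq, k):
    """Summed DFT power at frequency index k over the four base indicator sequences."""
    n = len(seq)
    power = 0.0
    for base in "ACGT":
        component = sum(
            cmath.exp(-2j * cmath.pi * k * i / n)
            for i, ch in enumerate(seq)
            if ch == base
        )
        power += abs(component) ** 2
    return power


def period3_snr(seq):
    """Power at N/3 relative to the mean power over k = 1..N-1 (assumed definition)."""
    seq = seq[: len(seq) - len(seq) % 3]  # make the length a multiple of 3
    n = len(seq)
    if n < 3:
        raise ValueError("sequence too short")
    peak = total_power(seq, n // 3)
    background = sum(total_power(seq, k) for k in range(1, n)) / (n - 1)
    return peak / background if background > 0 else 0.0


snr_periodic = period3_snr("ATG" * 30)   # strong period-3 structure
snr_dinucleotide = period3_snr("AC" * 45)  # period-2 structure, no period-3 peak
```

The direct sum costs O(N^2); for long sequences an FFT over sliding windows (the STFT mentioned in the abstract) would be the practical choice.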
Caruccio, Nicholas
2011-01-01
DNA library preparation is a common entry point and bottleneck for next-generation sequencing. Current methods generally consist of distinct steps that often involve significant sample loss and hands-on time: DNA fragmentation, end-polishing, and adaptor-ligation. In vitro transposition with Nextera™ Transposomes simultaneously fragments and covalently tags the target DNA, thereby combining these three distinct steps into a single reaction. Platform-specific sequencing adaptors can be added, and the sample can be enriched and bar-coded using limited-cycle PCR to prepare di-tagged DNA fragment libraries. Nextera technology offers a streamlined, efficient, and high-throughput method for generating bar-coded libraries compatible with multiple next-generation sequencing platforms.
Cellulases and coding sequences
Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong
2001-02-20
The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.
Cellulases and coding sequences
Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong
2001-01-01
The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.
2014-01-01
The linear algebraic concept of subspace plays a significant role in recent techniques of spectrum estimation. In this article, the authors have utilized the noise subspace concept for finding hidden periodicities in DNA sequences. With the vast growth of genomic sequences, the demand to accurately identify the protein-coding regions in DNA is rising. Several techniques of DNA feature extraction which involve various cross fields have come up in the recent past, among which the application of digital signal processing tools is of prime importance. It is known that coding segments have a 3-base periodicity, while non-coding regions do not have this unique feature. One of the most important spectrum analysis techniques based on the concept of subspace is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions, completely eliminating background noise. The proposed method has been compared with the existing sliding discrete Fourier transform (SDFT) method, popularly known as the modified periodogram method, on several genes from various organisms, and the results show that the proposed method is a better and more effective approach to gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish the superiority of the least-norm gene prediction method over the existing method. PMID:24386895
[DNA prints instead of plantar prints in neonatal identification].
Rodríguez-Alarcón Gómez, J; Martínez de Pancorbo Gómez, M; Santillana Ferrer, L; Castro Espido, A; Melchor Maros, J C; Linares Uribe, M A; Fernández-Llebrez del Rey, L; Aranguren Dúo, G
1996-06-22
The aim was to assess the possible usefulness of studying DNA in dried blood spots taken on filter paper blotters for newborn identification. The study set out to establish: 1. The validity of the method for analysis; 2. The validity of all stored samples (such as those kept in clinical records); 3. Guarantee of non-intrusion in the genetic code; 4. Acceptable price and execution time. Forty (40) anonymous 13-year-old samples from 20 subjects (2 per subject) were studied. DNA was extracted using Chelex resin, and the STRs ("short tandem repeats") of microsatellite DNA were studied using the polymerase chain reaction (PCR) method. Three non-coding DNA loci (CSF1PO, TPOX and TH01) were analyzed by multiplex amplification. It was possible to type 39 samples, making it possible to match the 20 cases (one by exclusion). The complete procedure yielded results within 24 hours in all cases. The estimated final cost was a fifth of that of conventional maternity/paternity tests. The study made matching possible in all 20 cases (directly in 19 cases). It was not necessary to study DNA coding areas. The validity of the method for analyzing samples stored for 13 years without any special care was also demonstrated. The technique was fast, producing results within 24 hours, and at reasonable cost.
Zhang, Yuqin; Lin, Fanbo; Zhang, Youyu; Li, Haitao; Zeng, Yue; Tang, Hao; Yao, Shouzhuo
2011-01-01
A new method for the detection of point mutations in DNA, based on monobase-coded cadmium telluride (CdTe) nanoprobes and the quartz crystal microbalance (QCM) technique, is reported. A QCM sensor for point mutations (single-base mutations to adenine, thymine, cytosine or guanine, namely A, T, C or G, in a DNA strand) was fabricated by immobilizing single-base mutation DNA-modified magnetic beads onto the electrode surface with an external magnetic field near the electrode. The DNA-modified magnetic beads were obtained from the biotin-avidin affinity reaction of biotinylated DNA and streptavidin-functionalized core/shell Fe3O4/Au magnetic nanoparticles, followed by a DNA hybridization reaction. Single-base coded CdTe nanoprobes (A-CdTe, T-CdTe, C-CdTe and G-CdTe, respectively) were used as the detection probes. The mutation site in DNA was distinguished by detecting the decrease in the resonance frequency of the piezoelectric quartz crystal when the coded nanoprobe was added to the test system. The proposed detection strategy for point mutations in DNA proved to be sensitive, simple, repeatable and low-cost; consequently, it has great potential for single nucleotide polymorphism (SNP) detection. 2011 © The Japan Society for Analytical Chemistry
Multiple tag labeling method for DNA sequencing
Mathies, Richard A.; Huang, Xiaohua C.; Quesada, Mark A.
1995-01-01
A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in said lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels.
Investigation of a Sybr-Green-Based Method to Validate DNA Sequences for DNA Computing
2005-05-01
OF A SYBR-GREEN-BASED METHOD TO VALIDATE DNA SEQUENCES FOR DNA COMPUTING 6. AUTHOR(S) Wendy Pogozelski, Salvatore Priore, Matthew Bernard ...simulated annealing. Biochemistry, 35, 14077-14089. 15 Pogozelski, W.K., Bernard , M.P. and Macula, A. (2004) DNA code validation using...and Clark, B.F.C. (eds) In RNA Biochemistry and Biotechnology, NATO ASI Series, Kluwer Academic Publishers. Zucker, M. and Stiegler , P. (1981
Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain
2011-01-01
cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which are a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.
Mas, Sergi; Crescenti, Anna; Gassó, Patricia; Vidal-Taboada, Jose M; Lafuente, Amalia
2007-08-01
As pharmacogenetic studies frequently require establishment of DNA banks containing large cohorts with multi-centric designs, inexpensive methods for collecting and storing high-quality DNA are needed. The aims of this study were two-fold: to compare the amount and quality of DNA obtained from two different DNA cards (IsoCode Cards or FTA Classic Cards, Whatman plc, Brentford, Middlesex, UK); and to evaluate the effects of time and storage temperature, as well as the influence of anticoagulant ethylenediaminetetraacetic acid on the DNA elution procedure. The samples were genotyped by several methods typically used in pharmacogenetic studies: multiplex PCR, PCR-restriction fragment length polymorphism, single nucleotide primer extension, and allelic discrimination assay. In addition, they were amplified by whole genome amplification to increase genomic DNA mass. Time, storage temperature and ethylenediaminetetraacetic acid had no significant effects on either DNA card. This study reveals the importance of drying blood spots prior to isolation to avoid haemoglobin interference. Moreover, our results demonstrate that re-isolation protocols could be applied to increase the amount of DNA recovered. The samples analysed were accurately genotyped with all the methods examined herein. In conclusion, our study shows that both DNA cards, IsoCode Cards and FTA Classic Cards, facilitate genetic and pharmacogenetic testing for routine clinical practice.
Multiple tag labeling method for DNA sequencing
Mathies, R.A.; Huang, X.C.; Quesada, M.A.
1995-07-25
A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.
Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong
2012-01-01
Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.
Recurrence time statistics: versatile tools for genomic DNA sequence analysis.
Cao, Yinhe; Tung, Wen-Wen; Gao, J B
2004-01-01
With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. Among the more important structures in a DNA sequence are repeat-related ones. Often they have to be masked before protein coding regions along a DNA sequence are identified or redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale on a PC.
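As a rough illustration of the recurrence-time idea in the abstract above, the sketch below records the gap between successive occurrences of each k-mer and builds a histogram of those gaps. It is a simplified reading of "recurrence time", not the authors' algorithm or their codon index; the function name `recurrence_times` and the toy sequence are assumptions made for this example.

```python
from collections import Counter


def recurrence_times(seq, k):
    """Histogram of gaps between successive occurrences of each k-mer.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    last_seen = {}      # k-mer -> index of its most recent occurrence
    gaps = Counter()    # recurrence time -> number of times it was observed
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            gaps[i - last_seen[word]] += 1
        last_seen[word] = i
    return gaps


# Toy sequence with an exact period of 3 (made up for this example).
toy = "ATG" * 10
hist = recurrence_times(toy, k=3)
print(hist.most_common(3))
```

In a perfectly periodic toy sequence every recurrence falls on the period, so the histogram has a single peak; in real genomic data, peaks at particular gaps would hint at repeats or periodicities worth inspecting.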
Lichenase and coding sequences
Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong
2000-08-15
The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-β-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.
Lee, Hwan Young; Yoo, Ji-Eun; Park, Myung Jin; Chung, Ukhee; Kim, Chong-Youl; Shin, Kyoung-Jin
2006-11-01
The present study analyzed 21 coding region SNP markers and one deletion motif for the determination of East Asian mitochondrial DNA (mtDNA) haplogroups by designing three multiplex systems which apply single base extension methods. Using two multiplex systems, all 593 Korean mtDNAs were allocated into 15 haplogroups: M, D, D4, D5, G, M7, M8, M9, M10, M11, R, R9, B, A, and N9. As the D4 haplotypes occurred most frequently in Koreans, the third multiplex system was used to further define D4 subhaplogroups: D4a, D4b, D4e, D4g, D4h, and D4j. This method allowed the complementation of coding region information with control region mutation motifs and the resultant findings also suggest reliable control region mutation motifs for the assignment of East Asian mtDNA haplogroups. These three multiplex systems produce good results in degraded samples as they contain small PCR products (101-154 bp) for single base extension reactions. SNP scoring was performed in 101 old skeletal remains using these three systems to prove their utility in degraded samples. The sequence analysis of mtDNA control region with high incidence of haplogroup-specific mutations and the selective scoring of highly informative coding region SNPs using the three multiplex systems are useful tools for most applications involving East Asian mtDNA haplogroup determination and haplogroup-directed stringent quality control.
ERIC Educational Resources Information Center
Warren, Michael D.
1997-01-01
Explains a method to enable students to understand DNA and protein synthesis using model-building and role-playing. Acquaints students with the triplet code and transcription. Includes copies of the charts used in this technique. (DDR)
Resurrection of DNA Function In Vivo from an Extinct Genome
Pask, Andrew J.; Behringer, Richard R.; Renfree, Marilyn B.
2008-01-01
There is a burgeoning repository of information available from ancient DNA that can be used to understand how genomes have evolved and to determine the genetic features that defined a particular species. To assess the functional consequences of changes to a genome, a variety of methods are needed to examine extinct DNA function. We isolated a transcriptional enhancer element from the genome of an extinct marsupial, the Tasmanian tiger (Thylacinus cynocephalus or thylacine), obtained from 100 year-old ethanol-fixed tissues from museum collections. We then examined the function of the enhancer in vivo. Using a transgenic approach, it was possible to resurrect DNA function in transgenic mice. The results demonstrate that the thylacine Col2A1 enhancer directed chondrocyte-specific expression in this extinct mammalian species in the same way as its orthologue does in mice. While other studies have examined extinct coding DNA function in vitro, this is the first example of the restoration of extinct non-coding DNA and examination of its function in vivo. Our method using transgenesis can be used to explore the function of regulatory and protein-coding sequences obtained from any extinct species in an in vivo model system, providing important insights into gene evolution and diversity. PMID:18493600
A Radiation Chemistry Code Based on the Green's Function of the Diffusion Equation
NASA Technical Reports Server (NTRS)
Plante, Ianik; Wu, Honglu
2014-01-01
Stochastic radiation track structure codes are of great interest for space radiation studies and hadron therapy in medicine. These codes are used for many purposes, notably for microdosimetry and DNA damage studies. In the last two decades, they were also used with the Independent Reaction Times (IRT) method in the simulation of chemical reactions, to calculate the yield of various radiolytic species produced during the radiolysis of water and in chemical dosimeters. Recently, we have developed a Green's function based code to simulate reversible chemical reactions with an intermediate state, which yielded results in excellent agreement with those obtained by using the IRT method. This code was also used to simulate the interaction of particles with membrane receptors. We are in the process of including this program for use with the Monte-Carlo track structure code Relativistic Ion Tracks (RITRACKS). This recent addition should greatly expand the capabilities of RITRACKS, notably to simulate DNA damage by both the direct and indirect effect.
ERIC Educational Resources Information Center
King, Angela G.
2004-01-01
Nanotechnology is employed by researchers at Northwestern University to develop a method of labeling disease markers present in blood with unique DNA tags they have dubbed "bio-bar-codes". The DNA-BCA assay involves the preparation of nanoparticle and magnetic microparticle probes and a nanoparticle-based PCR-less DNA amplification scheme.
Facile and High-Throughput Synthesis of Functional Microparticles with Quick Response Codes.
Ramirez, Lisa Marie S; He, Muhan; Mailloux, Shay; George, Justin; Wang, Jun
2016-06-01
Encoded microparticles are in high demand in multiplexed assays and labeling. However, the current methods for the synthesis and coding of microparticles either lack robustness and reliability, or possess limited coding capacity. Here, a massive coding of dissociated elements (MiCODE) technology based on innovation of a chemically reactive off-stoichiometry thiol-allyl photocurable polymer and standard lithography to produce a large number of quick response (QR) code microparticles is introduced. The coding process is performed by photobleaching the QR code patterns on microparticles when fluorophores are incorporated into the prepolymer formulation. The fabricated encoded microparticles can be released from a substrate without changing their features. Excess thiol functionality on the microparticle surface allows for grafting of amine groups and further DNA probes. A multiplexed assay is demonstrated using the DNA-grafted QR code microparticles. The MiCODE technology is further characterized by showing the incorporation of BODIPY-maleimide (BDP-M) and Nile Red fluorophores for coding and the use of microcontact printing for immobilizing DNA probes on microparticle surfaces. This versatile technology leverages mature lithography facilities for fabrication and thus is amenable to scale-up in the future, with potential applications in bioassays and in labeling consumer products. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Design pattern mining using distributed learning automata and DNA sequence alignment.
Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina
2014-01-01
Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. This paper describes a new method for pattern mining that isolates design patterns and the relationships between them, and a related tool, DLA-DNA, for all implemented patterns and all projects used for evaluation. Based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, DLA-DNA achieves acceptable precision and recall compared with the other evaluated tools. The proposed method mines structural design patterns in object oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other sections of the code for parameter passing and modular programming. The proposed model can detect design patterns, and the strengths of their relationships, better than the other available tools, namely Pinot, PTIDEJ and DPJF. The results demonstrate that whether the source code is built in a standard or non-standard way with respect to the design patterns, the results of the proposed method are close to those of DPJF and better than those of Pinot and PTIDEJ. The proposed model was tested on several source codes and compared with other related models and available tools; on average, the precision and recall of the proposed method are 20% and 9.6% higher than Pinot, 27% and 31% higher than PTIDEJ, and 3.3% and 2% higher than DPJF, respectively. The proposed method is organized in two steps: in the first step, elemental design patterns are identified; in the second step, they are composed to recognize actual design patterns.
Yu, Hong; Kong, Lingfeng; Li, Qi
2016-01-01
In this study, we evaluated the efficacy of 12 mitochondrial protein-coding genes from 238 mitochondrial genomes of 140 molluscan species as potential DNA barcodes for mollusks. Three barcoding methods (distance, monophyly and character-based methods) were used in species identification. The species recovery rates based on genetic distances for the 12 genes ranged from 70.83 to 83.33%. There were no significant differences in intra- or interspecific variability among the 12 genes. The monophyly and character-based methods provided higher resolution than the distance-based method in species delimitation. The character-based method showed some advantages especially in closely related taxa. The results suggested that, besides the standard COI barcode, the other 11 mitochondrial protein-coding genes could also potentially be used as molecular diagnostics for molluscan species discrimination. Our results also showed that the combination of mitochondrial genes did not enhance the efficacy for species identification and a single mitochondrial gene would be fully competent.
Novel numerical and graphical representation of DNA sequences and proteins.
Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D
2006-12-01
We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.
Discrete Ramanujan transform for distinguishing the protein coding regions from other regions.
Hua, Wei; Wang, Jiasong; Zhao, Jian
2014-01-01
Based on the study of the Ramanujan sum and Ramanujan coefficient, this paper introduces the concepts of the discrete Ramanujan transform and spectrum. Using the Voss numerical representation, one maps a symbolic DNA strand to a numerical DNA sequence and deduces the discrete Ramanujan spectrum of the numerical DNA sequence. It is well known that the discrete Fourier power spectrum of a protein coding sequence has an important feature of 3-base periodicity, which is widely used for DNA sequence analysis by the technique of discrete Fourier transform. The analysis is performed by testing the signal-to-noise ratio at frequency N/3 as a criterion, where N is the length of the sequence. The results presented in this paper show that the property of 3-base periodicity can only be identified as a prominent spike of the discrete Ramanujan spectrum at period 3 for the protein coding regions. The signal-to-noise ratio for the discrete Ramanujan spectrum is defined for numerical measurement. Therefore, the discrete Ramanujan spectrum and the signal-to-noise ratio of a DNA sequence can be used for distinguishing the protein coding regions from the noncoding regions. All the exon and intron sequences in whole chromosomes 1, 2, 3 and 4 of Caenorhabditis elegans have been tested, and the histograms and tables from the computational results illustrate the reliability of our method. In addition, our theoretical analysis concludes that the algorithm for calculating the discrete Ramanujan spectrum has lower computational complexity and higher computational accuracy. The computational experiments show that using the discrete Ramanujan spectrum to classify different DNA sequences is a fast and effective method. Copyright © 2014 Elsevier Ltd. All rights reserved.
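The Fourier baseline that the abstract above refers to can be sketched in a few lines: build the four Voss indicator sequences, sum their DFT power at frequency index N/3, and compare it with the mean power. This shows only the well-known DFT criterion, not the discrete Ramanujan transform itself, whose definition is not given in the abstract. The signal-to-noise definition used here (power at N/3 divided by the mean power over the nonzero frequencies) is one common convention and is an assumption, as are the function names.

```python
import cmath


def voss_indicators(seq):
    """Voss representation: one 0/1 indicator sequence per nucleotide."""
    return {b: [1 if c == b else 0 for c in seq] for b in "ACGT"}


def power_at(indicators, k, n):
    """DFT power at frequency index k, summed over the four indicators."""
    total = 0.0
    for x in indicators.values():
        coeff = sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                    for j in range(n))
        total += abs(coeff) ** 2
    return total


def period3_snr(seq):
    """Power at k = N/3 over the mean power for k = 1..N-1 (assumed SNR)."""
    n = len(seq) - len(seq) % 3           # trim so N is a multiple of 3
    ind = voss_indicators(seq[:n])
    spectrum = [power_at(ind, k, n) for k in range(1, n)]
    mean_power = sum(spectrum) / len(spectrum)
    return spectrum[n // 3 - 1] / mean_power


# Toy inputs (made up): one with period 3, one with period 4.
print(period3_snr("ATG" * 20))    # large: all power sits at N/3 and 2N/3
print(period3_snr("ACGT" * 15))   # near zero: no period-3 component
```

The direct O(N²) sum keeps the example dependency-free; for real chromosomes an FFT would be the practical choice.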
DNA barcode goes two-dimensions: DNA QR code web server.
Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin
2012-01-01
The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers, ITS2, rbcL, matK, psbA-trnH, and CO1, was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.
VaDiR: an integrated approach to Variant Detection in RNA.
Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy
2018-02-01
Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered, but its cost currently limits its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA (VaDiR) that integrates 3 variant callers, namely: SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
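The precision and recall trade-off described in the abstract above can be illustrated with plain set operations. The sketch assumes that "Tier 1" means variants reported by all three callers (an intersection); the abstract does not spell this out, and the union shown for contrast is not the paper's second tier. The variant tuples, the validated set, and the function names are all invented for the example and do not use any VaDiR, SNPiR, RVBoost, or MuTect2 interface.

```python
def tier1(snpir, rvboost, mutect2):
    """Variants reported by all three callers (assumed reading of Tier 1)."""
    return set(snpir) & set(rvboost) & set(mutect2)


def precision_recall(called, truth):
    """Precision and recall of a call set against a validated truth set."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical calls: (chromosome, position, ref allele, alt allele).
snpir = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")}
rvboost = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}
mutect2 = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr4", 400, "T", "C")}
dna_validated = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr4", 400, "T", "C")}

t1 = tier1(snpir, rvboost, mutect2)
union = snpir | rvboost | mutect2      # contrast case, not from the paper

print(precision_recall(t1, dna_validated))
print(precision_recall(union, dna_validated))
```

Requiring agreement between callers discards caller-specific false positives at the cost of missing some true variants, which is the general pattern the abstract reports.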
Samorì, Bruno; Zuccheri, Giampaolo
2005-02-11
The nanometer scale is a special place where all sciences meet and develop a particularly strong interdisciplinarity. While biology is a source of inspiration for nanoscientists, chemistry has a central role in turning inspirations and methods from biological systems to nanotechnological use. DNA is the biological molecule that most fascinates nanoscience and nanotechnology. Nature uses DNA not only as a repository of the genetic information, but also as a controller of the expression of the genes it contains. Thus, there are codes embedded in the DNA sequence that serve to control recognition processes on the atomic scale, such as the base pairing, and others that control processes taking place on the nanoscale. From the chemical point of view, DNA is the supramolecular building block with the highest informational content. Nanoscience therefore has the opportunity of using DNA molecules to increase the level of complexity and efficiency in self-assembling and self-directing processes.
DNA as a Binary Code: How the Physical Structure of Nucleotide Bases Carries Information
ERIC Educational Resources Information Center
McCallister, Gary
2005-01-01
The DNA triplet code also functions as a binary code. Because double-ring compounds cannot bind to double-ring compounds in the DNA code, the sequence of bases classified simply as purines or pyrimidines can encode for smaller groups of possible amino acids. This is an intuitive approach to teaching the DNA code. (Contains 6 figures.)
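The purine/pyrimidine view in the abstract above can be made concrete with a small sketch: adenine and guanine (double-ring purines) map to one bit, cytosine and thymine (single-ring pyrimidines) to the other. Which class gets 1 and which gets 0 is an arbitrary choice made here, and the function names are hypothetical.

```python
PURINES = {"A", "G"}        # double-ring bases
PYRIMIDINES = {"C", "T"}    # single-ring bases


def to_binary(seq):
    """Map purines to '1' and pyrimidines to '0' (assignment is arbitrary)."""
    bits = []
    for base in seq.upper():
        if base in PURINES:
            bits.append("1")
        elif base in PYRIMIDINES:
            bits.append("0")
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return "".join(bits)


def complement(seq):
    """Watson-Crick complement, base by base (not reversed)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in seq.upper())


strand = "ATGC"
print(to_binary(strand))              # 1010
print(to_binary(complement(strand)))  # 0101
```

Because a purine always pairs with a pyrimidine, the complementary strand's bit string is the bitwise inverse of the original, which is the structural constraint the abstract describes.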
Design Pattern Mining Using Distributed Learning Automata and DNA Sequence Alignment
Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina
2014-01-01
Context Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. Objective This paper describes a new method for pattern mining that isolates design patterns and the relationships between them, and a related tool, DLA-DNA, for all implemented patterns and all projects used for evaluation. Based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, DLA-DNA achieves acceptable precision and recall compared with the other evaluated tools. Method The proposed method mines structural design patterns in object oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other sections of the code for parameter passing and modular programming. The proposed model can detect design patterns, and the strengths of their relationships, better than the other available tools, namely Pinot, PTIDEJ and DPJF. Results The results demonstrate that whether the source code is built in a standard or non-standard way with respect to the design patterns, the results of the proposed method are close to those of DPJF and better than those of Pinot and PTIDEJ. The proposed model was tested on several source codes and compared with other related models and available tools; on average, the precision and recall of the proposed method are 20% and 9.6% higher than Pinot, 27% and 31% higher than PTIDEJ, and 3.3% and 2% higher than DPJF, respectively. Conclusion The proposed method is organized in two steps: in the first step, elemental design patterns are identified; in the second step, they are composed to recognize actual design patterns. PMID:25243670
DNA Barcode Goes Two-Dimensions: DNA QR Code Web Server
Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin
2012-01-01
The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, “DNA barcode” actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers, ITS2, rbcL, matK, psbA-trnH, and CO1, was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113
Müller, J-M V; Wissemann, J; Meli, M L; Dasen, G; Lutz, H; Heinzerling, L; Feige, K
2011-11-01
Whole blood pharmacokinetics of intratumourally injected naked plasmid DNA coding for equine Interleukin 12 (IL-12) was assessed as a means of in vivo gene transfer in the treatment of melanoma in grey horses. The expression of induced interferon gamma (IFN-g) was evaluated in order to determine the pharmacodynamic properties of in vivo gene transduction. Seven grey horses bearing melanoma were injected intratumourally with 250 µg naked plasmid DNA coding for IL-12. Peripheral blood and biopsies from the injection site were taken at 13 time points until day 14 post injection (p.i.). Samples were analysed using quantitative real-time PCR. Plasmid DNA was quantified in blood samples and mRNA expression for IFN-g in tissue samples. Plasmid DNA showed fast elimination kinetics with more than 99 % of the plasmid disappearing within 36 hours. IFN-g expression increased quickly after IL-12 plasmid injection, but varied between individual horses. Intratumoural injection of plasmid DNA is a feasible method for inducing transgene expression in vivo. Biological activity of the transgene IL-12 was confirmed by measuring expression of IFN-g.
Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics
NASA Technical Reports Server (NTRS)
Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.
1995-01-01
We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior, whereas the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
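The n-gram entropy measurement mentioned in the abstract above can be sketched as the Shannon entropy of the overlapping n-gram frequencies. The redundancy normalization used below, 1 − H(n) / (n · log2 4), is an assumed convention for a four-letter alphabet; the abstract does not state the exact formula, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngram_entropy(seq, n):
    """Shannon entropy (bits) of the overlapping n-gram distribution."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def ngram_redundancy(seq, n, alphabet_size=4):
    """1 - H(n) / (n * log2(alphabet_size)); assumed normalization."""
    return 1.0 - ngram_entropy(seq, n) / (n * math.log2(alphabet_size))


# Toy sequences (made up): balanced composition versus a pure repeat.
balanced = "ACGT" * 25
repeat = "A" * 100

print(ngram_entropy(balanced, 1), ngram_redundancy(balanced, 1))
print(ngram_entropy(repeat, 1), ngram_redundancy(repeat, 1))
print(ngram_redundancy(balanced, 2))
```

The balanced toy sequence has zero redundancy at n = 1 but substantial redundancy at n = 2, because only four of the sixteen possible dinucleotides ever occur; this is the kind of higher-order structure the n-gram measure is meant to expose.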
Recombinant pinoresinol/lariciresinol reductase, recombinant dirigent protein, and methods of use
Lewis, Norman G.; Davin, Laurence B.; Dinkova-Kostova, Albena T.; Fujita, Masayuki; Gang, David R.; Sarkanen, Simo; Ford, Joshua D.
2001-04-03
Dirigent proteins and pinoresinol/lariciresinol reductases have been isolated, together with cDNAs encoding dirigent proteins and pinoresinol/lariciresinol reductases. Accordingly, isolated DNA sequences are provided which code for the expression of dirigent proteins and pinoresinol/lariciresinol reductases. In other aspects, replicable recombinant cloning vehicles are provided which code for dirigent proteins or pinoresinol/lariciresinol reductases or for a base sequence sufficiently complementary to at least a portion of dirigent protein or pinoresinol/lariciresinol reductase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding dirigent protein or pinoresinol/lariciresinol reductase. Thus, systems and methods are provided for the recombinant expression of dirigent proteins and/or pinoresinol/lariciresinol reductases.
Superimposed Code Theoretic Analysis of DNA Codes and DNA Computing
2008-01-01
complements of one another and the DNA duplex formed is a Watson-Crick (WC) duplex. However, there are many instances when the formation of non-WC...that the user’s requirements for probe selection are met based on the Watson-Crick probe locality within a target. The second type, called...AFRL-RI-RS-TR-2007-288 Final Technical Report January 2008 SUPERIMPOSED CODE THEORETIC ANALYSIS OF DNA CODES AND DNA COMPUTING
Strand, Janne M; Scheffler, Katja; Bjørås, Magnar; Eide, Lars
2014-06-01
The cellular genomes are continuously damaged by reactive oxygen species (ROS) from aerobic processes. The impact of DNA damage depends on the specific site as well as the cellular state. The steady-state level of DNA damage is the net result of continuous formation and subsequent repair, but it is unknown to what extent heterogeneous damage distribution is caused by variations in formation or repair of DNA damage. Here, we used a restriction enzyme/qPCR based method to analyze DNA damage in promoter and coding regions of four nuclear genes: the two house-keeping genes Gapdh and Tbp, and the Ndufa9 and Ndufs2 genes encoding mitochondrial complex I subunits, as well as mt-Rnr1 encoded by mitochondrial DNA (mtDNA). The distribution of steady-state levels of damage varied in a site-specific manner. Oxidative stress induced damage in nDNA to a similar extent in promoter and coding regions, and more so in mtDNA. The subsequent removal of damage from nDNA was efficient and comparable, with recovery times depending on the initial damage load, while repair of mtDNA was delayed with a subsequently slower repair rate. The repair was furthermore found to be independent of transcription or the transcription-coupled repair factor CSB, but dependent on cellular ATP. Our results demonstrate that the capacity to repair DNA is sufficient to remove exogenously induced damage. Thus, we conclude that the heterogeneous steady-state level of DNA damage in promoters and coding regions is caused by site-specific DNA damage/modifications that take place under normal metabolism. Copyright © 2014 Elsevier B.V. All rights reserved.
Low-energy electron dose-point kernel simulations using new physics models implemented in Geant4-DNA
NASA Astrophysics Data System (ADS)
Bordes, Julien; Incerti, Sébastien; Lampe, Nathanael; Bardiès, Manuel; Bordage, Marie-Claude
2017-05-01
When low-energy electrons, such as Auger electrons, interact with liquid water, they induce highly localized ionizing energy depositions over ranges comparable to cell diameters. Monte Carlo track structure (MCTS) codes are suitable tools for performing dosimetry at this level. One of the main MCTS codes, Geant4-DNA, is equipped with only two sets of cross section models for low-energy electron interactions in liquid water ("option 2" and its improved version, "option 4"). To provide Geant4-DNA users with new alternative physics models, a set of cross sections, extracted from the CPA100 MCTS code, has been added to Geant4-DNA. This new version is hereafter referred to as "Geant4-DNA-CPA100". In this study, "Geant4-DNA-CPA100" was used to calculate low-energy electron dose-point kernels (DPKs) between 1 keV and 200 keV. Such kernels represent the radial energy deposited by an isotropic point source, a parameter that is useful for dosimetry calculations in nuclear medicine. In order to assess the influence of different physics models on DPK calculations, DPKs were calculated using the existing Geant4-DNA models ("option 2" and "option 4"), the newly integrated CPA100 models, and the PENELOPE Monte Carlo code used in step-by-step mode for monoenergetic electrons. Additionally, a comparison was performed of two sets of DPKs that were simulated with "Geant4-DNA-CPA100": the first set using Geant4's default settings, and the second using CPA100's original code default settings. A maximum difference of 9.4% was found between the Geant4-DNA-CPA100 and PENELOPE DPKs. Between the two existing Geant4-DNA models, slight differences were observed between 1 keV and 10 keV. It was highlighted that the DPKs simulated with the two existing Geant4-DNA models were always broader than those generated with "Geant4-DNA-CPA100". The discrepancies observed between the DPKs generated using the existing Geant4-DNA models and "Geant4-DNA-CPA100" were caused solely by their different cross sections. The different scoring and interpolation methods used in CPA100 and Geant4 to calculate DPKs showed differences close to 3.0% near the source.
Hu, Lin-Yong; Cui, Chen-Chen; Song, Yu-Jie; Wang, Xiang-Guo; Jin, Ya-Ping; Wang, Ai-Hua; Zhang, Yong
2012-07-01
cDNA is widely used in gene function elucidation and/or transgenics research but often suitable tissues or cells from which to isolate mRNA for reverse transcription are unavailable. Here, an alternative method for cDNA cloning is described and tested by cloning the cDNA of human LALBA (human alpha-lactalbumin) from genomic DNA. First, genomic DNA containing all of the coding exons was cloned from human peripheral blood and inserted into a eukaryotic expression vector. Next, by delivering the plasmids into either 293T or fibroblast cells, surrogate cells were constructed. Finally, the total RNA was extracted from the surrogate cells and cDNA was obtained by RT-PCR. The human LALBA cDNA that was obtained was compared with the corresponding mRNA published in GenBank. The comparison showed that the two sequences were identical. The novel method for cDNA cloning from surrogate eukaryotic cells described here uses well-established techniques that are feasible and simple to use. We anticipate that this alternative method will have widespread applications.
Recombinant Pinoresinol-Lariciresinol Reductase, Recombinant Dirigent Protein And Methods Of Use
Lewis, Norman G.; Davin, Laurence B.; Dinkova-Kostova, Albena T.; Fujita, Masayuki; Gang, David R.; Sarkanen, Simo; Ford, Joshua D.
2003-10-21
Dirigent proteins and pinoresinol/lariciresinol reductases have been isolated, together with cDNAs encoding dirigent proteins and pinoresinol/lariciresinol reductases. Accordingly, isolated DNA sequences are provided from source species Forsythia intermedia, Thuja plicata, Tsuga heterophylla, Eucommia ulmoides, Linum usitatissimum, and Schisandra chinensis, which code for the expression of dirigent proteins and pinoresinol/lariciresinol reductases. In other aspects, replicable recombinant cloning vehicles are provided which code for dirigent proteins or pinoresinol/lariciresinol reductases or for a base sequence sufficiently complementary to at least a portion of dirigent protein or pinoresinol/lariciresinol reductase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding dirigent protein or pinoresinol/lariciresinol reductase. Thus, systems and methods are provided for the recombinant expression of dirigent proteins and/or pinoresinol/lariciresinol reductases.
The cDNA-derived amino acid sequence of hemoglobin II from Lucina pectinata.
Torres-Mercado, Elineth; Renta, Jessicca Y; Rodríguez, Yolanda; López-Garriga, Juan; Cadilla, Carmen L
2003-11-01
Hemoglobin II from the clam Lucina pectinata is an oxygen-reactive protein with a unique structural organization in the heme pocket involving residues Gln65 (E7), Tyr30 (B10), Phe44 (CD1), and Phe69 (E11). We employed the reverse transcriptase-polymerase chain reaction (RT-PCR) and methods to synthesize various cDNA(HbII). An initial 300-bp cDNA clone was amplified from total RNA by RT-PCR using degenerate oligonucleotides. Gene-specific primers derived from the HbII-partial cDNA sequence were used to obtain the 5' and 3' ends of the cDNA by RACE. The length of the HbII cDNA, estimated from overlapping clones, was approximately 2114 bases. Northern blot analysis revealed that the mRNA size of HbII agrees with the estimated size using cDNA data. The coding region of the full-length HbII cDNA codes for 151 amino acids. The calculated molecular weight of HbII, including the heme group and acetylated N-terminal residue, is 17,654.07 Da.
Du, Ping; Li, Hongxia; Cao, Wei
2009-07-15
A novel and sensitive sandwich electrochemical biosensor based on the amplification of magnetic microbeads and Au nanoparticles (NPs) modified with bio bar code and PbS nanoparticles was constructed in the present work. In this method, the magnetic microspheres were coated with four layers of polyelectrolytes in order to increase the number of carboxyl groups on the surface of the magnetic microbeads, which enhanced the amount of capture DNA. The amino-functionalized capture DNA on the surface of the magnetic microbeads hybridized with one end of the target DNA, the other end of which was hybridized with a signal DNA probe labelled with Au NPs on the terminus. The Au NPs were modified with bio bar code and the PbS NPs were used as a marker for identifying the target oligonucleotide. The modification of the magnetic microbeads could immobilize more amino-group terminal capture DNA, and the bio bar code could increase the amount of Au NPs that combined with the target DNA. The detection of lead ions by anodic stripping voltammetry (ASV) further improved the sensitivity of the biosensor. As a result, the present DNA biosensor showed good selectivity and sensitivity through the combined amplification. Under the optimum conditions, the response was linear in the concentration of the target DNA from 2.0 x 10^-14 M to 1.0 x 10^-12 M, and a detection limit as low as 5.0 x 10^-15 M was obtained.
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis.
Buldyrev, S V; Goldberger, A L; Havlin, S; Mantegna, R N; Matsa, M E; Peng, C K; Simons, M; Stanley, H E
1995-05-01
An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05), which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
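The detrended fluctuation analysis named in this abstract can be illustrated with a short sketch. It is not the authors' code: the purine/pyrimidine step rule, the box sizes, and all function names are assumptions chosen for illustration. The sequence is turned into a mean-subtracted "DNA walk", a least-squares line is removed inside each box of size n, and the exponent alpha is the slope of log F(n) against log n. An uncorrelated sequence should give alpha close to 0.5, while long-range correlations push it higher.

```python
import math
import random

def dna_walk(seq):
    """Integrated, mean-subtracted walk: pyrimidines (C, T) step +1, purines (A, G) step -1.
    The step rule is an assumption; the abstract does not state one."""
    steps = [1 if b in "CT" else -1 for b in seq.upper() if b in "ACGT"]
    mean = sum(steps) / len(steps)
    walk, total = [], 0.0
    for s in steps:
        total += s - mean
        walk.append(total)
    return walk

def detrended_ss(y):
    """Sum of squared residuals after removing a least-squares line from y."""
    n = len(y)
    xm = (n - 1) / 2
    ym = sum(y) / n
    sxx = sum((i - xm) ** 2 for i in range(n))
    sxy = sum((i - xm) * (v - ym) for i, v in enumerate(y))
    slope = sxy / sxx
    return sum((v - ym - slope * (i - xm)) ** 2 for i, v in enumerate(y))

def fluctuation(walk, box):
    """Root-mean-square detrended fluctuation F(box) over non-overlapping boxes."""
    nboxes = len(walk) // box
    ss = sum(detrended_ss(walk[k * box:(k + 1) * box]) for k in range(nboxes))
    return math.sqrt(ss / (nboxes * box))

def dfa_exponent(seq, boxes=(8, 16, 32, 64, 128)):
    """Slope of log F(n) versus log n (box sizes are illustrative)."""
    walk = dna_walk(seq)
    xs = [math.log(n) for n in boxes]
    ys = [math.log(fluctuation(walk, n)) for n in boxes]
    xm, ym = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    return num / sum((x - xm) ** 2 for x in xs)

# Demo on a synthetic, uncorrelated sequence (not GenBank data).
random.seed(0)
demo = "".join(random.choice("ACGT") for _ in range(8192))
print("alpha for a random sequence:", round(dfa_exponent(demo), 2))
```

Detrending within each box is what lets the method tolerate the "patchiness" the abstract mentions: a local drift in nucleotide composition is absorbed by the fitted line rather than being counted as correlation.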
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis
NASA Technical Reports Server (NTRS)
Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.
1995-01-01
An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05), which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
Statistical properties of DNA sequences
NASA Technical Reports Server (NTRS)
Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.
1995-01-01
We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
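The Zipf approach mentioned in this abstract ranks "words" by frequency and examines the slope of log frequency against log rank. The sketch below is only an illustration under stated assumptions: treating overlapping n-mers as words, the default n = 3, and the function names are choices made here, not details given in the abstract.

```python
from collections import Counter
import math

def ngram_rank_frequency(seq, n=3):
    """Counts of overlapping n-mers, sorted from most to least frequent.
    Using overlapping n-mers as 'words' is an assumption for illustration."""
    words = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    return [count for _, count in Counter(words).most_common()]

def zipf_slope(counts):
    """Least-squares slope of log(count) against log(rank); -1 is classic Zipf behaviour."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    xm, ym = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    return num / sum((x - xm) ** 2 for x in xs)

# Toy usage on a made-up sequence; real analyses would use long GenBank records.
counts = ngram_rank_frequency("ATATATGCGCATATTTATAGCGCGATATAT", n=3)
print("rank-frequency slope:", zipf_slope(counts))
```

A steeper (more negative) slope means a few words dominate the text, which is the kind of language-like feature the abstract attributes to non-coding sequences.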
Ancient DNA sequence revealed by error-correcting codes.
Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo
2015-07-10
A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.
Ancient DNA sequence revealed by error-correcting codes
Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo
2015-01-01
A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228
McKernan, Kevin J.; Spangler, Jessica; Zhang, Lei; Tadigotla, Vasisht; McLaughlin, Stephen; Warner, Jason; Zare, Amir; Boles, Richard G.
2014-01-01
We have developed a PCR method, coined Déjà vu PCR, that utilizes six nucleotides in PCR with two methyl specific restriction enzymes that respectively digest these additional nucleotides. Use of this enzyme-and-nucleotide combination enables what we term a “DNA diode”, where DNA can advance in a laboratory in only one direction and cannot feedback into upstream assays. Here we describe aspects of this method that enable consecutive amplification with the introduction of a 5th and 6th base while simultaneously providing methylation dependent mitochondrial DNA enrichment. These additional nucleotides enable a novel DNA decontamination technique that generates ephemeral and easy to decontaminate DNA. PMID:24788618
Qiu, Guo-Hua
2016-01-01
In this review, the protective function of the abundant non-coding DNA in the eukaryotic genome is discussed from the perspective of genome defense against exogenous nucleic acids. Peripheral non-coding DNA has been proposed to act as a bodyguard that protects the genome and the central protein-coding sequences from ionizing radiation-induced DNA damage. In the proposed mechanism of protection, the radicals generated by water radiolysis in the cytosol and IR energy are absorbed, blocked and/or reduced by peripheral heterochromatin; then, the DNA damage sites in the heterochromatin are removed and expelled from the nucleus to the cytoplasm through nuclear pore complexes, most likely through the formation of extrachromosomal circular DNA. To strengthen this hypothesis, this review summarizes the experimental evidence supporting the protective function of non-coding DNA against exogenous nucleic acids. Based on these data, I hypothesize herein about the presence of an additional line of defense formed by small RNAs in the cytosol in addition to their bodyguard protection mechanism in the nucleus. Therefore, exogenous nucleic acids may be initially inactivated in the cytosol by small RNAs generated from non-coding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. Exogenous nucleic acids may enter the nucleus, where some are absorbed and/or blocked by heterochromatin and others integrate into chromosomes. The integrated fragments and the sites of DNA damage are removed by repetitive non-coding DNA elements in the heterochromatin and excluded from the nucleus. Therefore, the normal eukaryotic genome and the central protein-coding sequences are triply protected by non-coding DNA against invasion by exogenous nucleic acids. This review provides evidence supporting the protective role of non-coding DNA in genome defense. Copyright © 2016 Elsevier B.V. All rights reserved.
Zhu, Shiyou; Li, Wei; Liu, Jingze; Chen, Chen-Hao; Liao, Qi; Xu, Ping; Xu, Han; Xiao, Tengfei; Cao, Zhongzheng; Peng, Jingyu; Yuan, Pengfei; Brown, Myles; Liu, Xiaole Shirley; Wei, Wensheng
2017-01-01
CRISPR/Cas9 screens have been widely adopted to analyse coding gene functions, but high throughput screening of non-coding elements using this method is more challenging, because indels caused by a single cut in non-coding regions are unlikely to produce a functional knockout. A high-throughput method to produce deletions of non-coding DNA is needed. Herein, we report a high throughput genomic deletion strategy to screen for functional long non-coding RNAs (lncRNAs) that is based on a lentiviral paired-guide RNA (pgRNA) library. Applying our screening method, we identified 51 lncRNAs that can positively or negatively regulate human cancer cell growth. We individually validated 9 lncRNAs using CRISPR/Cas9-mediated genomic deletion and functional rescue, CRISPR activation or inhibition, and gene expression profiling. Our high-throughput pgRNA genome deletion method should enable rapid identification of functional mammalian non-coding elements. PMID:27798563
ANN modeling of DNA sequences: new strategies using DNA shape code.
Parbhane, R V; Tambe, S S; Kulkarni, B D
2000-09-01
Two new encoding strategies, namely, wedge and twist codes, which are based on the DNA helical parameters, are introduced to represent DNA sequences in artificial neural network (ANN)-based modeling of biological systems. The performance of the new coding strategies has been evaluated by conducting three case studies involving mapping (modeling) and classification applications of ANNs. The proposed coding schemes have been compared rigorously and shown to outperform the existing coding strategies especially in situations wherein limited data are available for building the ANN models.
Nanoparticle based bio-bar code technology for trace analysis of aflatoxin B1 in Chinese herbs.
Yu, Yu-Yan; Chen, Yuan-Yuan; Gao, Xuan; Liu, Yuan-Yuan; Zhang, Hong-Yan; Wang, Tong-Ying
2018-04-01
A novel and sensitive assay for aflatoxin B1 (AFB1) detection has been developed by using a bio-bar code assay (BCA). The method relies on polyclonal antibodies encoded with DNA-modified gold nanoparticles (NPs) and monoclonal antibody-modified magnetic microparticles (MMPs), and on subsequent detection of the amplified target in the form of a bio-bar code using a fluorescent quantitative polymerase chain reaction (FQ-PCR) detection method. First, NP probes encoded with DNA that was unique to AFB1 and MMP probes with monoclonal antibodies that bind AFB1 specifically were prepared. Then, the MMP-AFB1-NP sandwich compounds were acquired; dehybridization of the oligonucleotides on the nanoparticle surface allows the presence of AFB1 to be determined by identifying the oligonucleotide sequence released from the NP through FQ-PCR detection. The bio-bar code system for detecting AFB1 was established, and the sensitivity limit was about 10^-8 ng/mL. Compared with ELISA assays for detecting the same target, the results showed that AFB1 can be detected at low attomolar levels with the bio-bar-code amplification approach. This is also the first demonstration of a bio-bar code type assay for the detection of AFB1 in Chinese herbs. Copyright © 2017. Published by Elsevier B.V.
Extraction of High Quality DNA from Seized Moroccan Cannabis Resin (Hashish)
El Alaoui, Moulay Abdelaziz; Melloul, Marouane; Alaoui Amine, Sanaâ; Stambouli, Hamid; El Bouri, Aziz; Soulaymani, Abdelmajid; El Fahime, Elmostafa
2013-01-01
The extraction and purification of nucleic acids is the first step in most molecular biology analysis techniques. The objective of this work is to obtain highly purified nucleic acids derived from Cannabis sativa resin seizure in order to conduct a DNA typing method for the individualization of cannabis resin samples. To obtain highly purified nucleic acids from cannabis resin (Hashish) free from contaminants that cause inhibition of PCR reaction, we have tested two protocols: the CTAB protocol of Wagner and a CTAB protocol described by Somma (2004) adapted for difficult matrix. We obtained high quality genomic DNA from 8 cannabis resin seizures using the adapted protocol. DNA extracted by the Wagner CTAB protocol failed to give polymerase chain reaction (PCR) amplification of tetrahydrocannabinolic acid (THCA) synthase coding gene. However, the extracted DNA by the second protocol permits amplification of THCA synthase coding gene using different sets of primers as assessed by PCR. We describe here for the first time the possibility of DNA extraction from (Hashish) resin derived from Cannabis sativa. This allows the use of DNA molecular tests under special forensic circumstances. PMID:24124454
Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA
Djebali, Sarah; Delaplace, Franck; Crollius, Hugues Roest
2006-01-01
Background Accurate and automatic gene identification in eukaryotic genomic DNA is more than ever of crucial importance to efficiently exploit the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated in the EGASP project, where reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that replicating the accuracy of human annotators in an automatic method could be achieved by formalizing the rules and decisions that they use, in a mathematical formalism. Results We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNA, ESTs, protein alignments, exons) and relationships between them. Graphs are analyzed to process the information according to rules that replicate those used by human annotators. Simple individual starting objects given as input to Exogean are thus combined and synthesized into complex objects such as protein coding transcripts. Conclusion We show here, in the context of the EGASP project, that Exogean is currently the method that best reproduces protein coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement. PMID:16925841
DNA rearrangements directed by non-coding RNAs in ciliates
Mochizuki, Kazufumi
2013-01-01
Extensive programmed rearrangement of DNA, including DNA elimination, chromosome fragmentation, and DNA descrambling, takes place in the newly developed macronucleus during the sexual reproduction of ciliated protozoa. Recent studies have revealed that two distant classes of ciliates use distinct types of non-coding RNAs to regulate such DNA rearrangement events. DNA elimination in Tetrahymena is regulated by small non-coding RNAs that are produced and utilized in an RNAi-related process. It has been proposed that the small RNAs produced from the micronuclear genome are used to identify eliminated DNA sequences by whole-genome comparison between the parental macronucleus and the micronucleus. In contrast, DNA descrambling in Oxytricha is guided by long non-coding RNAs that are produced from the parental macronuclear genome. These long RNAs are proposed to act as templates for the direct descrambling events that occur in the developing macronucleus. Both cases provide useful examples to study epigenetic chromatin regulation by non-coding RNAs. PMID:21956937
DNA Multiple Sequence Alignment Guided by Protein Domains: The MSA-PAD 2.0 Method.
Balech, Bachir; Monaco, Alfonso; Perniola, Michele; Santamaria, Monica; Donvito, Giacinto; Vicario, Saverio; Maggi, Giorgio; Pesole, Graziano
2018-01-01
Multiple sequence alignment (MSA) is a fundamental component in many DNA sequence analyses including metagenomics studies and phylogeny inference. When guided by protein profiles, DNA multiple alignments assume a higher precision and robustness. Here we present details of the use of the upgraded version of MSA-PAD (2.0), which is a DNA multiple sequence alignment framework able to align DNA sequences coding for single/multiple protein domains guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called "Gene" and "Genome," accounting for coding domains order and genomic rearrangements, respectively. Novel options were added to the present version, where the MSA can be guided by protein profiles provided by the user. This allows MSA-PAD 2.0 to run faster and to add custom protein profiles sometimes not present in PFAM database according to the user's interest. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/ .
BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone.
Yang, Bite; Liu, Feng; Ren, Chao; Ouyang, Zhangyi; Xie, Ziwei; Bo, Xiaochen; Shu, Wenjie
2017-07-01
Enhancer elements are noncoding stretches of DNA that play key roles in controlling gene expression programmes. Despite major efforts to develop accurate enhancer prediction methods, identifying enhancer sequences continues to be a challenge in the annotation of mammalian genomes. One of the major issues is the lack of large, sufficiently comprehensive and experimentally validated enhancers for humans or other species. Thus, the development of computational methods based on limited experimentally validated enhancers and deciphering the transcriptional regulatory code encoded in the enhancer sequences is urgent. We present a deep-learning-based hybrid architecture, BiRen, which predicts enhancers using the DNA sequence alone. Our results demonstrate that BiRen can learn common enhancer patterns directly from the DNA sequence and exhibits superior accuracy, robustness and generalizability in enhancer prediction relative to other state-of-the-art enhancer predictors based on sequence characteristics. BiRen will enable researchers to acquire a deeper understanding of the regulatory code of enhancer sequences. The BiRen method can be freely accessed at https://github.com/wenjiegroup/BiRen. Contact: shuwj@bmi.ac.cn or boxc@bmi.ac.cn. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Introduction to the Natural Anticipator and the Artificial Anticipator
NASA Astrophysics Data System (ADS)
Dubois, Daniel M.
2010-11-01
This short communication deals with the introduction of the concept of anticipator, which is one who anticipates, in the framework of computing anticipatory systems. The definition of anticipation deals with the concept of program. Indeed, the word program comes from "pro-gram", meaning "to write before" by anticipation, and means a plan for the programming of a mechanism, or a sequence of coded instructions that can be inserted into a mechanism, or a sequence of coded instructions, as genes or behavioural responses, that is part of an organism. Any natural or artificial programs are thus related to anticipatory rewriting systems, as shown in this paper. All the cells in the body, and the neurons in the brain, are programmed by the anticipatory genetic code, DNA, in a low-level language with four signs. The programs in computers are also computing anticipatory systems. It will be shown, on the one hand, that the genetic code DNA is a natural anticipator. As demonstrated by Nobel laureate McClintock [8], genomes are programmed. The fundamental program deals with the DNA genetic code. The properties of DNA consist in self-replication and self-modification. The self-replicating process leads to reproduction of the species, while the self-modifying process leads to new species or to evolution and adaptation in existing ones. The genetic code DNA keeps its instructions in memory in the DNA coding molecule. The genetic code DNA is a rewriting system, from the DNA coding molecule to the DNA template molecule. The DNA template molecule is a rewriting system to the messenger RNA molecule. The information is not destroyed during the execution of the rewriting program. On the other hand, it will be demonstrated that the Turing machine is an artificial anticipator. The Turing machine is a rewriting system. The head reads and writes, modifying the content of the tape. The information is destroyed during the execution of the program. This is an irreversible process. The input data are lost.
Soares, Ricardo J; Maglieri, Giulia; Gutschner, Tony; Lund, Anders H; Nielsen, Boye S
2018-01-01
Abstract Deciphering the functions of long non-coding RNAs (lncRNAs) is facilitated by visualization of their subcellular localization using in situ hybridization (ISH) techniques. We evaluated four different ISH methods for detection of MALAT1 and CYTOR in cultured cells: a multiple probe detection approach with or without enzymatic signal amplification, a branched-DNA (bDNA) probe and an LNA-modified probe with enzymatic signal amplification. All four methods adequately stained MALAT1 in the nucleus in all of three cell lines investigated, HeLa, NHDF and T47D, and three of the methods detected the less expressed CYTOR. The sensitivity of the four ISH methods was evaluated by image analysis. In all three cell lines, the two methods involving enzymatic amplification gave the most intense MALAT1 signal, but the signal-to-background ratios were not different. CYTOR was best detected using the bDNA method. All four ISH methods showed significantly reduced MALAT1 signal in knock-out cells, and siRNA-induced knock-down of CYTOR resulted in significantly reduced CYTOR ISH signal, indicating good specificity of the probe designs and detection systems. Our data suggest that the ISH methods allow detection of both abundant and less abundantly expressed lncRNAs, although the latter required the use of the most specific and sensitive probe detection system. PMID:29059327
Li, Junli; Li, Chunyan; Qiu, Rui; Yan, Congchong; Xie, Wenzhang; Wu, Zhen; Zeng, Zhi; Tung, Chuanjong
2015-09-01
Monte Carlo simulation is a powerful tool for investigating the details of radiation-induced biological damage at the molecular level. In this paper, a Monte Carlo code called NASIC (Nanodosimetry Monte Carlo Simulation Code) was developed. It includes a physical module, a pre-chemical module, a chemical module, a geometric module and a DNA damage module. The physical module can simulate the physical tracks of low-energy electrons in liquid water event by event. More than one set of inelastic cross sections was calculated by applying the dielectric function method of Emfietzoglou's optical-data treatments, with different optical data sets and dispersion models. In the pre-chemical module, the ionised and excited water molecules undergo dissociation processes. In the chemical module, the produced radiolytic chemical species diffuse and react. In the geometric module, an atomic model of 46 chromatin fibres in a spherical nucleus of a human lymphocyte was established. In the DNA damage module, the direct damage induced by the energy depositions of the electrons and the indirect damage induced by the radiolytic chemical species were calculated. The parameters should be adjusted so that the simulation results agree with the experimental results. In this paper, the influence of the inelastic cross sections and the vibrational excitation reaction on the parameters and on the DNA strand break yields was studied. Further work on NASIC is underway. © The Author 2015. Published by Oxford University Press.
Converting Panax ginseng DNA and chemical fingerprints into two-dimensional barcode.
Cai, Yong; Li, Peng; Li, Xi-Wen; Zhao, Jing; Chen, Hai; Yang, Qing; Hu, Hao
2017-07-01
In this study, we investigated how to convert the Panax ginseng DNA sequence code and chemical fingerprints into a two-dimensional code. In order to improve the compression efficiency, the GATC2Bytes and digital merger compression algorithms are proposed. HPLC chemical fingerprint data of 10 groups of P. ginseng from Northeast China, and the internal transcribed spacer 2 (ITS2) sequence code as the DNA sequence code, were prepared for conversion. In order to convert such data into a two-dimensional code, the following six steps were performed: First, the chemical fingerprint characteristic data sets were obtained through the inflection filtering algorithm. Second, precompression processing of these data sets was undertaken. Third, precompression processing was undertaken with the P. ginseng DNA (ITS2) sequence codes. Fourth, the precompressed chemical fingerprint data and the DNA (ITS2) sequence code were combined in accordance with the set data format. Fifth, the combined data were compressed with Zlib, an open-source data compression library. Finally, the compressed data were used to generate a two-dimensional code called a quick response code (QR code). Through this conversion process, the number of bytes needed for storing P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can be greatly reduced. After GATC2Bytes algorithm processing, the ITS2 compression rate reaches 75%, and the chemical fingerprint compression rate exceeds 99.65% via filtration and digital merger compression algorithm processing. Therefore, the overall compression ratio exceeds 99.36%. The capacity of the resulting QR code is around 0.5k, which can easily and successfully be read and identified by any smartphone. P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can form a QR code after data processing, and therefore the QR code can be a suitable carrier of the authenticity and quality of P. ginseng information.
This study provides a theoretical basis for the development of a quality traceability system of traditional Chinese medicine based on a two-dimensional code.
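A 75% reduction for the ITS2 sequence is what one would expect if each base is stored in 2 bits instead of one 8-bit character. The sketch below illustrates that idea followed by a Zlib pass. It is an assumption-laden illustration, not the paper's GATC2Bytes algorithm: the base-to-bits mapping, the 2-byte length header and all function names are hypothetical, and the QR code generation step is omitted.

```python
import zlib

# Hypothetical 2-bit code per base; the paper's actual mapping is not given here.
BASES = "GATC"
CODE = {base: value for value, base in enumerate(BASES)}

def pack_bases(seq):
    """Pack an unambiguous DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)

def unpack_bases(data, length):
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])

def encode_payload(seq):
    """Prefix a 2-byte length header (an assumption), pack, then compress with zlib."""
    raw = len(seq).to_bytes(2, "big") + pack_bases(seq)
    return zlib.compress(raw, 9)

def decode_payload(blob):
    raw = zlib.decompress(blob)
    length = int.from_bytes(raw[:2], "big")
    return unpack_bases(raw[2:], length)

if __name__ == "__main__":
    seq = "GATCGATTACA"
    print(len(seq), "bases ->", len(pack_bases(seq)), "packed bytes")
    print(decode_payload(encode_payload(seq)) == seq)
```

Sequences containing ambiguity codes such as N would need a wider alphabet; this sketch simply raises a `KeyError` on them.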
King, Brian R; Aburdene, Maurice; Thompson, Alex; Warres, Zach
2014-01-01
Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae).
Pan, Hong-Chun; Fang, Hong-Yan; Li, Shi-Wei; Liu, Jun-Hong; Wang, Ying; Wang, An-Tai
2014-12-01
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae) is composed of two linear DNA molecules. Mitochondrial DNA (mtDNA) molecule 1 is 8010 bp long and contains six protein-coding genes, the large subunit rRNA, methionine and tryptophan tRNAs, two pseudogenes each consisting of a partial copy of COI, and terminal sequences at the two ends of the linear mtDNA, while mtDNA molecule 2 is 7576 bp long and contains seven protein-coding genes, the small subunit rRNA, a methionine tRNA, a pseudogene consisting of a partial copy of COI and terminal sequences at the two ends of the linear mtDNA. The COI gene begins with GTG as its start codon, whereas the other 12 protein-coding genes start with a typical ATG initiation codon. In addition, all protein-coding genes are terminated with TAA as the stop codon.
Kawano, Tomonori
2013-03-01
There have been a wide variety of approaches for handling the pieces of DNA as the "unplugged" tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, water marks and cryptography. In the present article, novel designs of artificial genes as the media for storing the digitally compressed data for images are proposed for bio-computing purpose while natural genes principally encode for proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given "passwords" and/or secret numbers using DNA sequences. The "passwords" of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original "passwords." The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting at the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interests over the image-coding DNA are also discussed.
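The abstract does not spell out its run-length encoding rule, so the following is only a generic sketch of run-length encoding applied to one row of a black-and-white font image. The row format (a string of "0" and "1" pixels) and the function names are assumptions made for illustration; how runs are then mapped onto nucleotides is left out because the source does not describe it.

```python
from itertools import groupby

def rle_encode(row):
    """Collapse a pixel row into (pixel, run_length) pairs."""
    return [(pixel, len(list(run))) for pixel, run in groupby(row)]

def rle_decode(pairs):
    """Expand (pixel, run_length) pairs back into the pixel row."""
    return "".join(pixel * count for pixel, count in pairs)

if __name__ == "__main__":
    row = "0001111011"
    encoded = rle_encode(row)
    print(encoded)
    print(rle_decode(encoded) == row)
```

Run-length encoding pays off when rows contain long uniform runs, which is typical of letter bitmaps with large background areas.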
A Rewritable, Random-Access DNA-Based Storage System.
Yazdi, S M Hossein Tabatabaei; Yuan, Yongbo; Ma, Jian; Zhao, Huimin; Milenkovic, Olgica
2015-09-18
We describe the first DNA-based storage architecture that enables random access to data blocks and rewriting of information stored at arbitrary locations within the blocks. The newly developed architecture overcomes drawbacks of existing read-only methods that require decoding the whole file in order to read one data fragment. Our system is based on new constrained coding techniques and accompanying DNA editing methods that ensure data reliability, specificity and sensitivity of access, and at the same time provide exceptionally high data storage capacity. As a proof of concept, we encoded parts of the Wikipedia pages of six universities in the USA, and selected and edited parts of the text written in DNA corresponding to three of these schools. The results suggest that DNA is a versatile media suitable for both ultrahigh density archival and rewritable storage applications.
Informational structure of genetic sequences and nature of gene splicing
NASA Astrophysics Data System (ADS)
Trifonov, E. N.
1991-10-01
Only about 1/20 of the DNA of higher organisms codes for proteins, by means of the classical triplet code. The rest of the DNA sequences is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, the sequence-specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and the necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications, is to resolve such conflicts. New data are presented strongly indicating that gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
An algebraic hypothesis about the primeval genetic code architecture.
Sánchez, Robersy; Grau, Ricardo
2009-09-01
A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G≡C and A=U and the non-specific base pairing of the hypothetical ancestral base D, used to define the sum and product operations, are enough features to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B(3))(N) of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
A deep learning method for lincRNA detection using auto-encoder algorithm.
Yu, Ning; Yu, Zeng; Pan, Yi
2017-12-06
The RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, the relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two pieces of evidence give the reasoning for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of the predictive deep neural network on the lincRNA data sets compared with a support vector machine and a traditional neural network. In addition, the method is validated on the newly discovered lincRNA data set, and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has an extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data.
Driven by those newly annotated lincRNA data, deep learning methods based on the auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. To our knowledge, this is the first application to adopt deep learning techniques for identifying lincRNA transcription sequences.
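The abstract mentions "different encoding schemes" for feeding DNA into the network without naming them. One widely used option, shown here purely as an assumed example and not as the scheme used by the authors, is one-hot encoding, where each base becomes a 4-element indicator vector and the sequence is flattened into a fixed-length input. The function names are hypothetical.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode each base as a 4-element indicator vector; unknown symbols map to all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def flatten(matrix):
    """Concatenate per-base vectors into one input vector for a dense layer."""
    return [value for row in matrix for value in row]

if __name__ == "__main__":
    encoded = one_hot("ACGTN")
    print(encoded)
    print(len(flatten(encoded)))
```

A window of L bases therefore yields an input vector of length 4L, which is the kind of fixed-size input an auto-encoder layer expects.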
Ahmad, Muneer; Jung, Low Tan; Bhuiyan, Al-Amin
2017-10-01
Digital signal processing techniques commonly employ fixed-length window filters to process the signal contents. DNA signals differ in characteristics from common digital signals since they carry nucleotides as contents. The nucleotides have a genetic code context and fuzzy behaviors due to their special structure and order in the DNA strand. Employing conventional fixed-length window filters for DNA signal processing produces spectral leakage and hence results in signal noise. A biological context-aware adaptive window filter is required to process DNA signals. This paper introduces a biologically inspired fuzzy adaptive window median filter (FAWMF), which computes the fuzzy membership strength of nucleotides in each slide of the window and filters nucleotides based on median filtering with a combination of s-shaped and z-shaped filters. Since coding regions cause 3-base periodicity through an unbalanced nucleotide distribution that produces a relatively high bias in nucleotide usage, this fundamental characteristic of nucleotides has been exploited in FAWMF to suppress the signal noise. Along with the adaptive response of FAWMF, a strong correlation between median nucleotides and the Π-shaped filter was observed, which produced enhanced discrimination between coding and non-coding regions compared with conventional fixed-length window filters. The proposed FAWMF attains a significant enhancement in coding region identification, i.e. 40% to 125%, as compared to other conventional window filters tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms. This study proves that conventional fixed-length window filters applied to DNA signals do not achieve significant results since the nucleotides carry genetic code context. The proposed FAWMF algorithm is adaptive and significantly outperforms these filters in processing DNA signal contents.
Applied to a variety of DNA datasets, the algorithm produced noteworthy discrimination between coding and non-coding regions compared with conventional fixed-window-length filters. Copyright © 2017 Elsevier B.V. All rights reserved.
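For orientation, the 3-base periodicity that such filters exploit can be measured with a plain fixed-length rectangular window, which is the kind of conventional baseline the abstract argues against. The sketch below is that baseline only, not FAWMF: it sums, over the four binary indicator sequences, the squared magnitude of the discrete Fourier transform at frequency 1/3. The function names, the window and step sizes, and the normalization by window length are assumptions made for this example.

```python
import cmath

def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the A, C, G, T indicator sequences."""
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # DFT coefficient at k = n/3 of the 0/1 indicator sequence for this base
        coeff = sum(cmath.exp(-2j * cmath.pi * pos / 3)
                    for pos, symbol in enumerate(seq) if symbol == base)
        total += abs(coeff) ** 2
    return total / n  # normalization by length is a choice made here

def sliding_period3(seq, window=351, step=3):
    """Period-3 power in each fixed-length rectangular window along the sequence."""
    return [period3_power(seq[start:start + window])
            for start in range(0, len(seq) - window + 1, step)]

if __name__ == "__main__":
    print(period3_power("ATG" * 30))   # strongly periodic toy sequence
    print(period3_power("A" * 90))     # no period-3 signal
```

Peaks in the output of `sliding_period3` along a genome are then candidate coding regions; the sharp edges of the rectangular window are one source of the spectral leakage mentioned in the abstract.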
Deciphering the Epigenetic Code: An Overview of DNA Methylation Analysis Methods
Umer, Muhammad
2013-01-01
Abstract Significance: Methylation of cytosine in DNA is linked with gene regulation, and this has profound implications in development, normal biology, and disease conditions in many eukaryotic organisms. A wide range of methods and approaches exist for its identification, quantification, and mapping within the genome. While the earliest approaches were nonspecific and were at best useful for quantification of total methylated cytosines in the chunk of DNA, this field has seen considerable progress and development over the past decades. Recent Advances: Methods for DNA methylation analysis differ in their coverage and sensitivity, and the method of choice depends on the intended application and desired level of information. Potential results include global methyl cytosine content, degree of methylation at specific loci, or genome-wide methylation maps. Introduction of more advanced approaches to DNA methylation analysis, such as microarray platforms and massively parallel sequencing, has brought us closer to unveiling the whole methylome. Critical Issues: Sensitive quantification of DNA methylation from degraded and minute quantities of DNA and high-throughput DNA methylation mapping of single cells still remain a challenge. Future Directions: Developments in DNA sequencing technologies as well as the methods for identification and mapping of 5-hydroxymethylcytosine are expected to augment our current understanding of epigenomics. Here we present an overview of methodologies available for DNA methylation analysis with special focus on recent developments in genome-wide and high-throughput methods. While the application focus relates to cancer research, the methods are equally relevant to broader issues of epigenetics and redox science in this special forum. Antioxid. Redox Signal. 18, 1972–1986. PMID:23121567
Kawano, Tomonori
2013-01-01
There have been a wide variety of approaches for handling the pieces of DNA as the “unplugged” tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, water marks and cryptography. In the present article, novel designs of artificial genes as the media for storing the digitally compressed data for images are proposed for bio-computing purpose while natural genes principally encode for proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given “passwords” and/or secret numbers using DNA sequences. The “passwords” of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original “passwords.” The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting at the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interests over the image-coding DNA are also discussed. PMID:23750303
Optimized pH method for DNA elution from buccal cells collected in Whatman FTA cards.
Lema, Carolina; Kohl-White, Kendra; Lewis, Laurie R; Dao, Dat D
2006-01-01
DNA is the most accessible biologic material for obtaining information from the human genome because of its molecular stability and its presence in every nucleated cell. Currently, single nucleotide polymorphism genotyping and DNA methylation are the main DNA-based approaches to deriving genomic and epigenomic disease biomarkers. Upon the discontinuation of the Schleicher & Schuell IsoCode product (Dassel, Germany), which was a treated paper system to elute DNA from several biologic sources for polymerase chain reaction (PCR) analysis, a high-yielding DNA elution method was imperative. We describe here an improved version of the not fully validated Whatman pH-based elution protocol. Our DNA elution procedure from buccal cells collected in Whatman FTA cards (Whatman Inc., Florham Park, NJ) yielded approximately 4 µg of DNA from a 6-mm FTA card punch and was successfully applied for HLA-DQB1 genotyping. The genotypes showed complete concordance with data obtained from blood of the same subjects. The high DNA yield achieved from buccal cells suggests a potential cost-effective tool for the development of genomic and epigenomic disease biomarkers.
Phylogenetic Network for European mtDNA
Finnilä, Saara; Lehtonen, Mervi S.; Majamaa, Kari
2001-01-01
The sequence in the first hypervariable segment (HVS-I) of the control region has been used as a source of evolutionary information in most phylogenetic analyses of mtDNA. Population genetic inference would benefit from a better understanding of the variation in the mtDNA coding region, but, thus far, complete mtDNA sequences have been rare. We determined the nucleotide sequence in the coding region of mtDNA from 121 Finns, by conformation-sensitive gel electrophoresis and subsequent sequencing and by direct sequencing of the D loop. Furthermore, 71 sequences from our previous reports were included, so that the samples represented all the mtDNA haplogroups present in the Finnish population. We found a total of 297 variable sites in the coding region, which allowed the compilation of unambiguous phylogenetic networks. The D loop harbored 104 variable sites, and, in most cases, these could be localized within the coding-region networks, without discrepancies. Interestingly, many homoplasies were detected in the coding region. Nucleotide variation in the rRNA and tRNA genes was 6%, and that in the third nucleotide positions of structural genes amounted to 22% of that in the HVS-I. The complete networks enabled the relationships between the mtDNA haplogroups to be analyzed. Phylogenetic networks based on the entire coding-region sequence in mtDNA provide a rich source for further population genetic studies, and complete sequences make it easier to differentiate between disease-causing mutations and rare polymorphisms. PMID:11349229
Phylogenic study of Lemnoideae (duckweeds) through complete chloroplast genomes for eight accessions
Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize
2017-01-01
Background Phylogenetic relationship within different genera of Lemnoideae, a kind of small aquatic monocotyledonous plants, was not well resolved, using either morphological characters or traditional markers. Given that rich genetic information in chloroplast genome makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. Methods DNAs were sequenced with next-generation sequencing. The duckweeds chloroplast genomes were indirectly filtered from the total DNA data, or directly obtained from chloroplast DNA data. To test the reliability of assembling the chloroplast genome based on the filtration of the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built on the basis of the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Results Eight complete duckweeds chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, and each contains 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The identity of L. punctata strain ZH0202 chloroplast genomes assembled through two methods was 100%, and their sequences and lengths were completely identical. The chloroplast genome comparison demonstrated that the differences in chloroplast genome sizes among the Lemnoideae primarily resulted from variation in non-coding regions, especially from repeat sequence variation. The phylogenetic analysis demonstrated that the different genera of Lemnoideae are derived from each other in the following order: Spirodela, Landoltia, Lemna, Wolffiella, and Wolffia. Discussion This study demonstrates potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. 
It also showed the possibility of using chloroplast DNA data to elucidate those phylogenies which were not yet solved well by traditional methods even in plants other than duckweeds. PMID:29302399
GATA: A graphic alignment tool for comparative sequence analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nix, David A.; Eisen, Michael B.
2005-01-01
Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
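To make the dot plot limitation concrete, here is a minimal word-match dot plot. It is a generic illustration, not part of GATA; the function names and the default word size are assumptions. An insertion in one sequence shows up only as a jump between diagonals, with no score or statistic attached to the match.

```python
def dot_plot(seq_a, seq_b, word=3):
    """Return the set of (i, j) where seq_a[i:i+word] equals seq_b[j:j+word]."""
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    hits = set()
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            hits.add((i, j))
    return hits

def render(hits, len_a, len_b, word=3):
    """Draw the hits as text: rows follow seq_a, columns follow seq_b."""
    rows = []
    for i in range(len_a - word + 1):
        rows.append("".join("*" if (i, j) in hits else "."
                            for j in range(len_b - word + 1)))
    return "\n".join(rows)

if __name__ == "__main__":
    a, b = "ACGTTT", "ACGATTT"  # b carries one inserted base
    hits = dot_plot(a, b)
    print(sorted(hits))
    print(render(hits, len(a), len(b)))
```

In this toy case the matches fall on two different diagonals (offsets 0 and 1), and it is up to the viewer to infer that a single inserted base separates them.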
Coutinho Moraes, Denise F; Still, David W; Lum, Michelle R; Hirsch, Ann M
2015-06-01
Herbal medicines and botanicals have long been used as sole or additional medical aids worldwide. Currently, billions of dollars are spent on botanicals and related products, but minimal regulation exists regarding their purity, integrity, and efficacy. Cases of adulteration and contamination have led to severe illness and even death in some cases. Identifying the plant material in botanicals and phytomedicines using organoleptic means or through microscopic observation of plant parts is not trivial, and plants are often misidentified. Recently, DNA-based methods have been applied to these products because DNA, unlike the chemical constituents of many active pharmaceutical agents, is not changed by growth conditions. In recent years, DNA barcoding methods, which are used to identify species diversity in the Tree of Life, have also been applied to botanicals and plant-derived dietary supplements. In this review, we recount the history of DNA-based methods for identification of botanicals and discuss some of the difficulties in defining a specific bar code or codes to use. In addition, we describe how next generation sequencing technologies have enabled new techniques that can be applied to identifying these products with greater authority and resolution. Lastly, we present case histories where dietary supplements, decoctions, and other products have been shown to contain materials other than the main ingredient stipulated on the label. We conclude that there is a fundamental need for greater quality control in this industry, which, if not self-imposed, may result from legislation. Georg Thieme Verlag KG Stuttgart · New York.
Nucleic acid molecules encoding isopentenyl monophosphate kinase, and methods of use
Croteau, Rodney B.; Lange, Bernd M.
2001-01-01
A cDNA encoding isopentenyl monophosphate kinase (IPK) from peppermint (Mentha x piperita) has been isolated and sequenced, and the corresponding amino acid sequence has been determined. Accordingly, an isolated DNA sequence (SEQ ID NO:1) is provided which codes for the expression of isopentenyl monophosphate kinase (SEQ ID NO:2), from peppermint (Mentha x piperita). In other aspects, replicable recombinant cloning vehicles are provided which code for isopentenyl monophosphate kinase, or for a base sequence sufficiently complementary to at least a portion of isopentenyl monophosphate kinase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding isopentenyl monophosphate kinase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant isopentenyl monophosphate kinase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant isopentenyl monophosphate kinase may be used to obtain expression or enhanced expression of isopentenyl monophosphate kinase in plants in order to enhance the production of isopentenyl monophosphate kinase, or isoprenoids derived therefrom, or may be otherwise employed for the regulation or expression of isopentenyl monophosphate kinase, or the production of its products.
... exons, the parts of DNA that code for proteins in the body. Researchers like this method because it is faster and cheaper. More still needs to be done before whole genome sequencing becomes a routine part of medical care. Many ...
Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon
2015-01-01
A previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of analyses of coding region SNP markers and control region mutation motifs. In this study, we tested whether the same triple multiplex analysis of coding region SNPs could also be applied to ancient samples from East Asia as a complement to sequence analysis of the mtDNA control region. In Joseon skeleton samples, the mtDNA haplogroup determined by coding region SNP markers fell within the same haplogroup as that assigned by sequence analysis of the control region. Considering that ancient samples in previous studies showed a considerable number of errors in control region mtDNA sequencing, coding region SNP analysis can be used as a good complement to conventional haplogroup determination, especially for archaeological human bone samples buried underground over long periods. PMID:26345190
Correlation approach to identify coding regions in DNA sequences
NASA Technical Reports Server (NTRS)
Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.
1994-01-01
Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
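The correlation property described in this abstract is often quantified with a "DNA walk" and the scaling exponent of its fluctuations. The following is a minimal sketch under stated assumptions, not the published algorithm: the purine/pyrimidine step rule, the choice of window lengths, and all function names are illustrative, and the windowed scan and decision thresholds of the original method are not given in the abstract and are not reproduced here. For an uncorrelated sequence the exponent should be near 0.5; long-range correlated regions would be expected to give larger values.

```python
import math
import random


def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C/T), -1 for anything else.

    Assumption: symbols other than A/C/G/T are treated like purines.
    """
    walk = [0]
    for base in seq.upper():
        step = 1 if base in "CT" else -1
        walk.append(walk[-1] + step)
    return walk


def fluctuation(walk, l):
    """Root-mean-square fluctuation F(l) of walk displacements over l steps."""
    deltas = [walk[i + l] - walk[i] for i in range(len(walk) - l)]
    mean = sum(deltas) / len(deltas)
    mean_sq = sum(d * d for d in deltas) / len(deltas)
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


def scaling_exponent(seq, lengths=(4, 8, 16, 32, 64)):
    """Least-squares slope of log F(l) against log l (F(l) must be > 0)."""
    walk = dna_walk(seq)
    xs = [math.log(l) for l in lengths]
    ys = [math.log(fluctuation(walk, l)) for l in lengths]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def random_sequence(n, seed=0):
    """Hypothetical helper: an uncorrelated test sequence of length n."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))
```

Applying `scaling_exponent` to successive windows of a long sequence and flagging windows whose exponent stays close to 0.5 is one way such a measure could be used to propose candidate coding regions.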
CORALINA: a universal method for the generation of gRNA libraries for CRISPR-based screening.
Köferle, Anna; Worf, Karolina; Breunig, Christopher; Baumann, Valentin; Herrero, Javier; Wiesbeck, Maximilian; Hutter, Lukas H; Götz, Magdalena; Fuchs, Christiane; Beck, Stephan; Stricker, Stefan H
2016-11-14
The bacterial CRISPR system is fast becoming the most popular genetic and epigenetic engineering tool due to its universal applicability and adaptability. The desire to deploy CRISPR-based methods in a large variety of species and contexts has created an urgent need for the development of easy, time- and cost-effective methods enabling large-scale screening approaches. Here we describe CORALINA (comprehensive gRNA library generation through controlled nuclease activity), a method for the generation of comprehensive gRNA libraries for CRISPR-based screens. CORALINA gRNA libraries can be derived from any source of DNA without the need of complex oligonucleotide synthesis. We show the utility of CORALINA for human and mouse genomic DNA, its reproducibility in covering the most relevant genomic features including regulatory, coding and non-coding sequences and confirm the functionality of CORALINA generated gRNAs. The simplicity and cost-effectiveness make CORALINA suitable for any experimental system. The unprecedented sequence complexities obtainable with CORALINA libraries are a necessary pre-requisite for less biased large scale genomic and epigenomic screens.
Vladimirov, N V; Likhoshvaĭ, V A; Matushkin, Iu G
2007-01-01
Gene expression is known to correlate with degree of codon bias in many unicellular organisms. However, such correlation is absent in some organisms. Recently we demonstrated that inverted complementary repeats within coding DNA sequence must be considered for proper estimation of translation efficiency, since they may form secondary structures that obstruct ribosome movement. We have developed a program for estimating the potential expression of coding DNA sequences in a defined unicellular organism using its genome sequence. The program computes an elongation efficiency index. Computation is based on estimation of coding DNA sequence elongation efficiency, taking into account three key factors: codon bias, average number of inverted complementary repeats, and free energy of potential stem-loop structures formed by the repeats. The influence of these factors on translation is numerically estimated. An optimal proportion of these factors is computed for each organism individually. Quantitative translational characteristics of 384 unicellular organisms (351 bacteria, 28 archaea, 5 eukaryota) have been computed using their annotated genomes from NCBI GenBank. Five potential evolutionary strategies of translational optimization have been determined among studied organisms. A considerable difference of preferred translational strategies between Bacteria and Archaea has been revealed. Significant correlations between elongation efficiency index and gene expression levels have been shown for two organisms (S. cerevisiae and H. pylori) using available microarray data. The proposed method allows numerical estimation of coding DNA sequence translation efficiency and optimization of the nucleotide composition of heterologous genes in unicellular organisms. http://www.mgs.bionet.nsc.ru/mgs/programs/eei-calculator/.
New Trends of Digital Data Storage in DNA
De Silva, Pavani Yashodha; Ganegoda, Gamage Upeksha
2016-01-01
With the exponential growth in the volume of information generated and the emerging need for data to be stored for prolonged periods of time, there emerges a need for a storage medium with high capacity, high storage density, and possibility to withstand extreme environmental conditions. DNA emerges as the prospective medium for data storage with its striking features. Diverse encoding models for reading and writing data onto DNA, codes for encrypting data which address issues of error generation, and approaches for developing codons and storage styles have been developed over the recent past. DNA has been identified as a potential medium for secret writing, which paves the way towards DNA cryptography and steganography. DNA utilized as an organic memory device along with big data storage and analytics in DNA has paved the way towards DNA computing for solving computational problems. This paper critically analyzes the various methods used for encoding and encrypting data onto DNA while identifying the advantages and capability of every scheme to overcome the drawbacks identified previously. Cryptography and steganography techniques have been analyzed in a critical approach while identifying the limitations of each method. This paper also identifies the advantages and limitations of DNA as a memory device and memory applications. PMID:27689089
1-deoxy-d-xylulose-5-phosphate reductoisomerases and method of use
Croteau, Rodney B.; Lange, Bernd M.
2001-01-01
The present invention relates to isolated DNA sequences which code for the expression of plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein, such as the sequence presented in SEQ ID NO:1 which encodes a 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein from peppermint (Mentha x piperita). Additionally, the present invention relates to isolated plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein. In other aspects, the present invention is directed to replicable recombinant cloning vehicles comprising a nucleic acid sequence which codes for a plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase, to modified host cells transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence of the invention.
1-deoxy-D-xylulose-5-phosphate reductoisomerases, and methods of use
Croteau, Rodney B.; Lange, Bernd M.
2002-07-16
The present invention relates to isolated DNA sequences which code for the expression of plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein, such as the sequence presented in SEQ ID NO:1 which encodes a 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein from peppermint (Mentha x piperita). Additionally, the present invention relates to isolated plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein. In other aspects, the present invention is directed to replicable recombinant cloning vehicles comprising a nucleic acid sequence which codes for a plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase, to modified host cells transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence of the invention.
Numerical classification of coding sequences
NASA Technical Reports Server (NTRS)
Collins, D. W.; Liu, C. C.; Jukes, T. H.
1992-01-01
DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
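The two descriptors proposed in this abstract can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the output format simply mirrors the examples quoted in the abstract (A76C158G121T74 and (AAA)0(AAC)1 ... (TTT)0), and the handling of ambiguous symbols and trailing partial codons (both ignored) is an assumption.

```python
from collections import Counter
from itertools import product


def base_count_descriptor(seq):
    """Return a base-count string such as 'A5C0G2T2' (A, C, G, T order)."""
    counts = Counter(seq.upper())
    return "".join(f"{base}{counts[base]}" for base in "ACGT")


def codon_table_descriptor(seq):
    """Return '(AAA)n(AAC)n...(TTT)n' for the reading frame starting at 0.

    Assumption: a trailing partial codon and any codon containing a
    symbol other than A/C/G/T are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    names = ["".join(triplet) for triplet in product("ACGT", repeat=3)]
    return "".join(f"({name}){counts[name]}" for name in names)
```

Because both strings depend only on the sequence itself, two database entries with identical descriptors are cheap to flag as candidate duplicates before any alignment is attempted.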
Genomics dataset of unidentified disclosed isolates.
Rekadwad, Bhagwan N
2016-09-01
Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was analyzed. The QR codes are helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from a restriction digestion study, which is helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here provides new data for exploration of unique DNA sequences for evaluation, identification, comparison and analysis.
Xia, Hui; Li, Lingling; Yin, Zhouyang; Hou, Xiandeng; Zhu, Jun-Jie
2015-01-14
A dual signal amplification strategy for electrochemiluminescence (ECL) aptasensor was designed based on biobar-coded gold nanoparticles (Au NPs) and DNAzyme. CdSeTe@ZnS quantum dots (QDs) were chosen as the ECL signal probes. To verify the proposed ultrasensitive ECL aptasensor for biomolecules, we detected thrombin (Tb) as a proof-of-principle analyte. The hairpin DNA designed for the recognition of protein consists of two parts: the sequences of the catalytic 8-17 DNAzyme and the thrombin aptamer. Only in the presence of thrombin could the hairpin DNA be opened, followed by a recycling cleavage of excess substrates by the catalytic core of the DNAzyme to induce the first-step amplification. One part of the fragments was captured to open the capture DNA modified on the Au electrode, which further connected with the prepared biobar-coded Au NPs-CdSeTe@ZnS QDs to get the final dual-amplified ECL signal. The limit of detection for Tb was 0.28 fM with excellent selectivity, and this proposed method possessed good performance in real sample analysis. This design introduces the new concept of dual-signal amplification by a biobar-coded system and DNAzyme recycling into ECL determination, and it could be extended to provide a highly sensitive platform for various target biomolecules.
Domínguez, Carmen M; Ramos, Daniel; Mingorance, Jesús; Fierro, José L G; Tamayo, Javier; Calleja, Montserrat
2018-01-02
Carbapenem-resistant Enterobacteriaceae have recently become an important cause of morbidity and mortality due to healthcare-associated infections. Most commonly used diagnostic methods are incompatible with fast and accurate directed therapy. We report here the direct identification of the bla OXA48 gene, which codes for the carbapenemase OXA-48, in lysate samples from Klebsiella pneumoniae. The method is PCR-free and label-free. It is based on the measurement of changes in the stiffness of DNA self-assembled monolayers anchored to microcantilevers that occur as a consequence of the hybridization. The stiffness of the DNA layer is measured through changes of the sensor resonance frequency upon hybridization and at varying relative humidity.
Evaluation of vector-primed cDNA library production from microgram quantities of total RNA.
Kuo, Jonathan; Inman, Jason; Brownstein, Michael; Usdin, Ted B
2004-12-15
cDNA sequences are important for defining the coding region of genes, and full-length cDNA clones have proven to be useful for investigation of the function of gene products. We produced cDNA libraries containing 3.5-5 × 10^5 primary transformants, starting with 5 µg of total RNA prepared from mouse pituitary, adrenal, thymus, and pineal tissue, using a vector-primed cDNA synthesis method. Of approximately 1000 clones sequenced, approximately 20% contained the full open reading frames (ORFs) of known transcripts, based on the presence of the initiating methionine residue codon. The libraries were complex, with 94, 91, 83 and 55% of the clones from the thymus, adrenal, pineal and pituitary libraries, respectively, represented only once. Twenty-five full-length clones, not yet represented in the Mammalian Gene Collection, were identified. Thus, we have produced useful cDNA libraries for the isolation of full-length cDNA clones that are not yet available in the public domain, and demonstrated the utility of a simple method for making high-quality libraries from small amounts of starting material.
Is a Genome a Codeword of an Error-Correcting Code?
Kleinschmidt, João H.; Silva-Filho, Márcio C.; Bim, Edson; Herai, Roberto H.; Yamagishi, Michel E. B.; Palazzo, Reginaldo
2012-01-01
Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction. PMID:22649495
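To make "being a codeword of a Hamming code" concrete, the sketch below uses the classic binary Hamming(7,4) code, where a word is a codeword exactly when its syndrome is zero and a single flipped bit can be located and corrected. This is an assumption-laden illustration, not the authors' construction: their codes are defined over an alphabet matching the four nucleotides, and the mapping from bases to code symbols is not given in the abstract. All function names are hypothetical.

```python
def syndrome(word):
    """XOR of the 1-based positions holding a 1 (word is a list of 7 bits).

    The parity-check matrix columns are the binary numbers 1..7, so a
    nonzero syndrome equals the position of a single-bit error.
    """
    s = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            s ^= position
    return s


def is_codeword(word):
    """A 7-bit word is a Hamming(7,4) codeword iff its syndrome is zero."""
    return syndrome(word) == 0


def encode(data):
    """Place 4 data bits at positions 3, 5, 6, 7; fill parity at 1, 2, 4."""
    word = [0] * 7
    word[2], word[4], word[5], word[6] = data
    s = syndrome(word)
    word[0] = s & 1          # parity bit at position 1
    word[1] = (s >> 1) & 1   # parity bit at position 2
    word[3] = (s >> 2) & 1   # parity bit at position 4
    return word


def correct(word):
    """Return a copy of word with at most one bit error repaired."""
    fixed = list(word)
    s = syndrome(fixed)
    if s:
        fixed[s - 1] ^= 1
    return fixed
```

Testing whether a biological sequence "is a codeword" then amounts to mapping it to code symbols and checking for a zero syndrome, which is the kind of membership test the abstract refers to.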
Organizational heterogeneity of vertebrate genomes.
Frenkel, Svetlana; Kirzhner, Valery; Korol, Abraham
2012-01-01
Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
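The idea of scoring fixed-length oligonucleotide occurrences can be illustrated with a simple k-mer frequency spectrum and a distance between two spectra. This is a simplified sketch, assuming exact-match counting and a total-variation distance; the Compositional Spectra method as published may use a different word set, mismatch tolerance, and distance measure, none of which are specified in the abstract. The function names are hypothetical.

```python
from collections import Counter

VALID = set("ACGT")


def kmer_spectrum(seq, k):
    """Relative frequency of every exact k-mer made only of A/C/G/T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def spectrum_distance(a, b):
    """Total-variation distance between two spectra (0 = same, 1 = disjoint)."""
    keys = set(a) | set(b)
    return sum(abs(a.get(key, 0.0) - b.get(key, 0.0)) for key in keys) / 2
```

Comparing the spectra of successive segments of one chromosome against each other, or against a reshuffled copy, gives a rough heterogeneity score in the spirit of the analysis described above.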
Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize; Zhao, Yun; Zhao, Hai
2017-01-01
Phylogenetic relationships among the genera of Lemnoideae, a group of small aquatic monocotyledonous plants, have not been well resolved using either morphological characters or traditional markers. Given that the rich genetic information in chloroplast genomes makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. DNAs were sequenced with next-generation sequencing. The duckweed chloroplast genomes were indirectly filtered from the total DNA data, or directly obtained from chloroplast DNA data. To test the reliability of assembling the chloroplast genome based on the filtration of the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built on the basis of the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Eight complete duckweed chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, and each contains 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The identity of L. punctata strain ZH0202 chloroplast genomes assembled through two methods was 100%, and their sequences and lengths were completely identical. The chloroplast genome comparison demonstrated that the differences in chloroplast genome sizes among the Lemnoideae primarily resulted from variation in non-coding regions, especially from repeat sequence variation. The phylogenetic analysis demonstrated that the different genera of Lemnoideae are derived from each other in the following order: Spirodela, Landoltia, Lemna, Wolffiella, and Wolffia. This study demonstrates the potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. It also shows the possibility of using chloroplast DNA data to elucidate phylogenies that have not yet been well resolved by traditional methods, even in plants other than duckweeds.
Studying Functions of All Yeast Genes Simultaneously
NASA Technical Reports Server (NTRS)
Stolc, Viktor; Eason, Robert G.; Poumand, Nader; Herman, Zelek S.; Davis, Ronald W.; Anthony, Kevin; Jejelowo, Olufisayo
2006-01-01
A method of studying the functions of all the genes of a given species of microorganism simultaneously has been developed in experiments on Saccharomyces cerevisiae (commonly known as baker's or brewer's yeast). It is already known that many yeast genes perform functions similar to those of corresponding human genes; therefore, by facilitating understanding of yeast genes, the method may ultimately also contribute to the knowledge needed to treat some diseases in humans. Because of the complexity of the method and the highly specialized nature of the underlying knowledge, it is possible to give only a brief and sketchy summary here. The method involves the use of unique synthetic deoxyribonucleic acid (DNA) sequences that are denoted as DNA bar codes because of their utility as molecular labels. The method also involves the disruption of gene functions through deletion of genes. Saccharomyces cerevisiae is a particularly powerful experimental system in that multiple deletion strains easily can be pooled for parallel growth assays. Individual deletion strains recently have been created for 5,918 open reading frames, representing nearly all of the estimated 6,000 genetic loci of Saccharomyces cerevisiae. Tagging of each deletion strain with one or two unique 20-nucleotide sequences enables identification of genes affected by specific growth conditions, without prior knowledge of gene functions. Hybridization of bar-code DNA to oligonucleotide arrays can be used to measure the growth rate of each strain over several cell-division generations. The growth rate thus measured serves as an index of the fitness of the strain.
Context influences on TALE–DNA binding revealed by quantitative profiling
Rogers, Julia M.; Barrera, Luis A.; Reyon, Deepak; Sander, Jeffry D.; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L.
2015-01-01
Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE–DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000–20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE–DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design. PMID:26067805
Context influences on TALE-DNA binding revealed by quantitative profiling.
Rogers, Julia M; Barrera, Luis A; Reyon, Deepak; Sander, Jeffry D; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L
2015-06-11
Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE-DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000-20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE-DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design.
On fuzzy semantic similarity measure for DNA coding.
Ahmad, Muneer; Jung, Low Tang; Bhuiyan, Md Al-Amin
2016-02-01
A coding measure scheme numerically translates the DNA sequence to a time domain signal for protein coding regions identification. A number of coding measure schemes based on numerology, geometry, fixed mapping, statistical characteristics and chemical attributes of nucleotides have been proposed in recent decades. Such coding measure schemes lack the biologically meaningful aspects of nucleotide data and hence do not significantly discriminate coding regions from non-coding regions. This paper presents a novel fuzzy semantic similarity measure (FSSM) coding scheme centering on FSSM codons' clustering and genetic code context of nucleotides. Certain natural characteristics of nucleotides i.e. appearance as a unique combination of triplets, preserving special structure and occurrence, and ability to own and share density distributions in codons have been exploited in FSSM. The nucleotides' fuzzy behaviors, semantic similarities and defuzzification based on the center of gravity of nucleotides revealed a strong correlation between nucleotides in codons. The proposed FSSM coding scheme attains a significant enhancement in coding regions identification i.e. 36-133% as compared to other existing coding measure schemes tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms. Copyright © 2015 Elsevier Ltd. All rights reserved.
Statistical approaches to account for false-positive errors in environmental DNA samples.
Lahoz-Monfort, José J; Guillera-Arroita, Gurutzeta; Tingley, Reid
2016-05-01
Environmental DNA (eDNA) sampling is prone to both false-positive and false-negative errors. We review statistical methods to account for such errors in the analysis of eDNA data and use simulations to compare the performance of different modelling approaches. Our simulations illustrate that even low false-positive rates can produce biased estimates of occupancy and detectability. We further show that removing or classifying single PCR detections in an ad hoc manner under the suspicion that such records represent false positives, as sometimes advocated in the eDNA literature, also results in biased estimation of occupancy, detectability and false-positive rates. We advocate alternative approaches to account for false-positive errors that rely on prior information, or the collection of ancillary detection data at a subset of sites using a sampling method that is not prone to false-positive errors. We illustrate the advantages of these approaches over ad hoc classifications of detections and provide practical advice and code for fitting these models in maximum likelihood and Bayesian frameworks. Given the severe bias induced by false-negative and false-positive errors, the methods presented here should be more routinely adopted in eDNA studies. © 2015 John Wiley & Sons Ltd.
The agents of natural genome editing.
Witzany, Guenther
2011-06-01
The DNA serves as a stable information storage medium and every protein which is needed by the cell is produced from this blueprint via an RNA intermediate code. More recently it was found that an abundance of various RNA elements cooperate in a variety of steps and substeps as regulatory and catalytic units with multiple competencies to act on RNA transcripts. Natural genome editing on one side is the competent agent-driven generation and integration of meaningful DNA nucleotide sequences into pre-existing genomic content arrangements, and the ability to (re-)combine and (re-)regulate them according to context-dependent (i.e. adaptational) purposes of the host organism. Natural genome editing on the other side designates the integration of all RNA activities acting on RNA transcripts without altering DNA-encoded genes. If we take the genetic code seriously as a natural code, there must be agents that are competent to act on this code because no natural code codes itself as no natural language speaks itself. As code editing agents, viral and subviral agents have been suggested because there are several indicators that demonstrate viruses competent in both RNA and DNA natural genome editing.
Barcoding of fresh water fishes from Pakistan.
Karim, Asma; Iqbal, Asad; Akhtar, Rehan; Rizwan, Muhammad; Amar, Ali; Qamar, Usman; Jahan, Shah
2016-07-01
DNA barcoding is a taxonomic method that uses small genetic markers in organisms' mitochondrial DNA (mtDNA) for identification of particular species. It uses sequence diversity in a 658-base pair fragment near the 5' end of the mitochondrial cytochrome c oxidase subunit 1 (CO1) gene as a tool for species identification. DNA barcoding is a more accurate and reliable method than morphological identification. It is equally useful in juvenile and adult stages of fishes. The present study was conducted to genetically identify three farm fish species of Pakistan (Cyprinus carpio, Cirrhinus mrigala, and Ctenopharyngodon idella). All of them belong to the family Cyprinidae. The CO1 gene was amplified. PCR products were sequenced and analyzed with bioinformatic software. Conspecific, congeneric, and confamilial K2P nucleotide divergence was estimated. From these findings, it was concluded that the CO1 gene sequence may serve as a milestone for the identification of related species at the molecular level.
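The K2P divergence reported in this abstract is the Kimura two-parameter distance, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of sites differing by a transition and by a transversion. The sketch below computes it for two pre-aligned sequences; the function name is hypothetical, and skipping gap or ambiguous positions is an assumption rather than a detail taken from the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Positions where either sequence has a symbol other than A/C/G/T
    (gaps, N) are skipped. Very divergent pairs can make the logarithm
    undefined, in which case math.log raises ValueError.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Averaging this distance over pairs within a species, within a genus, and within a family gives the conspecific, congeneric, and confamilial divergences that barcoding studies typically compare.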
DNA-based watermarks using the DNA-Crypt algorithm.
Heider, Dominik; Barnekow, Angelika
2007-05-29
The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in case of DNA-Crypt and the least significant bit in case of the image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations which can occur infrequently may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences similar to all steganographic algorithms.
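The "least significant base" principle can be illustrated with codons whose third position is fully redundant. In the standard genetic code, eight codon families (GCN, GGN, CCN, ACN, GTN, CTN, CGN, TCN) encode the same amino acid whatever the third base is, so that base can carry two bits without changing the protein. The sketch below is only an illustration of this principle under those assumptions; it is not the DNA-Crypt implementation, whose actual embedding rules, encryption layer, and mutation-correction codes are not described in enough detail in the abstract. All names are hypothetical.

```python
# First two bases of the fourfold-degenerate codon families
# (Ala, Gly, Pro, Thr, Val, Leu, Arg, Ser) in the standard genetic code.
FOURFOLD = {"GC", "GG", "CC", "AC", "GT", "CT", "CG", "TC"}
BASES = "ACGT"  # third-base index carries two bits: A=00, C=01, G=10, T=11


def embed(cds, bits):
    """Hide a bit string (even length, chars '0'/'1') in wobble positions."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    out, pos = [], 0
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD and pos < len(bits):
            codon = codon[:2] + BASES[int(bits[pos:pos + 2], 2)]
            pos += 2
        out.append(codon)
    if pos < len(bits):
        raise ValueError("not enough fourfold-degenerate codons")
    return "".join(out) + cds[usable:]


def extract(cds, n_bits):
    """Read n_bits back from the third base of fourfold-degenerate codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    chunks = []
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD and 2 * len(chunks) < n_bits:
            chunks.append(format(BASES.index(codon[2]), "02b"))
    return "".join(chunks)[:n_bits]
```

For example, embedding `"0110"` into `ATGGCTGGATCCTAA` changes only the third bases of the alanine and glycine codons, leaving the encoded peptide unchanged.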
DNA-based watermarks using the DNA-Crypt algorithm
Heider, Dominik; Barnekow, Angelika
2007-01-01
Background The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in case of DNA-Crypt and the least significant bit in case of the image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations which can occur infrequently may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences similar to all steganographic algorithms. PMID:17535434
Iyer, Lakshminarayan M; Abhiman, Saraswathi; Aravind, L
2008-10-04
Using sequence profile methods and structural comparisons we characterize a previously unknown family of nucleic acid polymerases in a group of mobile elements from genomes of diverse bacteria, an algal plastid and certain DNA viruses, including the recently reported Sputnik virus. Using contextual information from domain architectures and gene-neighborhoods we present evidence that they are likely to possess both primase and DNA polymerase activity, comparable to the previously reported prim-pol proteins. These newly identified polymerases help in defining the minimal functional core of superfamily A DNA polymerases and related RNA polymerases. Thus, they provide a framework to understand the emergence of both DNA and RNA polymerization activity in this class of enzymes. They also provide evidence that enigmatic DNA viruses, such as Sputnik, might have emerged from mobile elements coding these polymerases.
Iyer, Lakshminarayan M; Abhiman, Saraswathi; Aravind, L
2008-01-01
Using sequence profile methods and structural comparisons we characterize a previously unknown family of nucleic acid polymerases in a group of mobile elements from genomes of diverse bacteria, an algal plastid and certain DNA viruses, including the recently reported Sputnik virus. Using contextual information from domain architectures and gene-neighborhoods we present evidence that they are likely to possess both primase and DNA polymerase activity, comparable to the previously reported prim-pol proteins. These newly identified polymerases help in defining the minimal functional core of superfamily A DNA polymerases and related RNA polymerases. Thus, they provide a framework to understand the emergence of both DNA and RNA polymerization activity in this class of enzymes. They also provide evidence that enigmatic DNA viruses, such as Sputnik, might have emerged from mobile elements coding these polymerases. This article was reviewed by Eugene Koonin and Mark Ragan. PMID:18834537
Research on Image Encryption Based on DNA Sequence and Chaos Theory
NASA Astrophysics Data System (ADS)
Tian Zhang, Tian; Yan, Shan Jun; Gu, Cheng Yan; Ren, Ran; Liao, Kai Xin
2018-04-01
Nowadays encryption is a common technique to protect image data from unauthorized access. In recent years, many scientists have proposed various encryption algorithms based on DNA sequence to provide a new idea for the design of image encryption algorithm. Therefore, a new method of image encryption based on DNA computing technology is proposed in this paper, whose original image is encrypted by DNA coding and 1-D logistic chaotic mapping. First, the algorithm uses two modules as the encryption key. The first module uses the real DNA sequence, and the second module is made by one-dimensional logistic chaos mapping. Secondly, the algorithm uses DNA complementary rules to encode original image, and uses the key and DNA computing technology to compute each pixel value of the original image, so as to realize the encryption of the whole image. Simulation results show that the algorithm has good encryption effect and security.
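The combination of DNA coding and a 1-D logistic map described above can be illustrated with a small sketch. Everything specific here is an assumption rather than the authors' scheme: the base rule (A=00, C=01, G=10, T=11), the map parameters `x0` and `r`, the use of a base-wise XOR as the "DNA computing" step, and the omission of the real-DNA key module.

```python
BASES = "ACGT"  # assumed coding rule: A=00, C=01, G=10, T=11


def logistic_keystream(x0, r, n):
    """Iterate x -> r*x*(1-x) and quantise each state to one key byte."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(int(x * 256) % 256)
    return out


def byte_to_dna(value):
    """Encode one byte as four bases, most significant bit pair first."""
    return "".join(BASES[(value >> shift) & 3] for shift in (6, 4, 2, 0))


def dna_to_byte(strand):
    """Decode four bases back into one byte."""
    value = 0
    for base in strand:
        value = (value << 2) | BASES.index(base)
    return value


def dna_xor(a, b):
    """Base-wise XOR of two equal-length strands under the assumed rule."""
    return "".join(BASES[BASES.index(x) ^ BASES.index(y)] for x, y in zip(a, b))


def encrypt(pixels, x0=0.3141, r=3.99):
    """Encrypt a list of 8-bit pixel values into a list of 4-base strands."""
    key = logistic_keystream(x0, r, len(pixels))
    return [dna_xor(byte_to_dna(p), byte_to_dna(k)) for p, k in zip(pixels, key)]


def decrypt(cipher, x0=0.3141, r=3.99):
    """Invert encrypt() using the same (hypothetical) key parameters."""
    key = logistic_keystream(x0, r, len(cipher))
    return [dna_to_byte(dna_xor(c, byte_to_dna(k))) for c, k in zip(cipher, key)]


if __name__ == "__main__":
    image_row = [0, 27, 128, 255]
    print(encrypt(image_row))
    print(decrypt(encrypt(image_row)))
```

Because XOR is its own inverse, decryption reuses the same keystream; a sketch like this says nothing about the security of the published algorithm.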
Pan, Gaofeng; Jiang, Limin; Tang, Jijun; Guo, Fei
2018-02-08
DNA methylation is an important biochemical process, and it has a close connection with many types of cancer. Research about DNA methylation can help us to understand the regulation mechanism and epigenetic reprogramming. Therefore, it becomes very important to recognize the methylation sites in the DNA sequence. In the past several decades, many computational methods, especially machine learning methods, have been developed since high-throughput sequencing technology became widely used in research and industry. In order to accurately identify whether or not a nucleotide residue is methylated under the specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to do DNA methylation prediction. Five criteria, namely area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity, are used to evaluate the prediction results of our method. On the benchmark dataset, we could reach 0.8632 on AUC, 0.8017 on ACC, 0.5558 on MCC, and 0.7268 on SN. Additionally, the best results on two scBS-seq profiled mouse embryonic stem cells datasets were 0.8896 and 0.9511 by AUC, respectively. When compared with other outstanding methods, our method surpassed them on the accuracy of prediction. The improvement of AUC by our method compared to other methods was at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service, and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.
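Four of the evaluation criteria named in this abstract (ACC, SN, specificity and MCC) follow directly from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and are not the paper's data.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Return ACC, SN, SP and MCC from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(classification_metrics(tp=50, fp=10, tn=30, fn=10))
```

AUC is not included because it depends on the ranking of prediction scores rather than on a single confusion matrix.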
Tian, Ji-Yuan; Sun, Xiu-Qin; Chen, Xi-Guang
2008-05-01
Oral delivery of plasmid DNA (pDNA) is a desirable approach for fish immunization in intensive culture. However, its effectiveness is limited because of possible degradation of pDNA in the fish's digestive system. In this report, alginate microspheres loaded with pDNA coding for fish lymphocystis disease virus (LCDV) and green fluorescent protein were prepared with a modified water-in-oil (W/O) emulsification method. Yield, loading percent and encapsulation efficiency of alginate microspheres were 90.5%, 1.8% and 92.7%, respectively. The alginate microspheres had diameters of less than 10 μm, and their shape was spherical. As compared to sodium alginate, a remarkable increase of DNA-phosphodiester and DNA-phosphomonoester bonds was observed for alginate microspheres loaded with pDNA by Fourier transform infrared (FTIR) spectroscopic analysis. Agarose gel electrophoresis showed that a small amount of supercoiled pDNA was transformed to open circular and linear pDNA during encapsulation. The cumulative release of pDNA in alginate microspheres was
Applications of statistical physics and information theory to the analysis of DNA sequences
NASA Astrophysics Data System (ADS)
Grosse, Ivo
2000-10-01
DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
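The mutual information function studied in this thesis can be estimated from a sequence by counting pairs of nucleotides separated by a distance k. The sketch below uses the standard plug-in estimator I(k) = sum over a, b of p_ab(k) * log2(p_ab(k) / (p_a * p_b)); the function name is hypothetical, and no finite-sample bias correction is applied, which a real analysis would likely need.

```python
from collections import Counter
from math import log2


def mutual_information(seq, k):
    """Plug-in estimate (in bits) of the mutual information between
    symbols separated by distance k in the sequence."""
    n = len(seq) - k
    if k < 1 or n <= 0:
        raise ValueError("k must satisfy 1 <= k < len(seq)")
    pair_counts = Counter(zip(seq[:n], seq[k:]))
    left_counts = Counter(seq[:n])
    right_counts = Counter(seq[k:])
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = left_counts[a] / n
        p_b = right_counts[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    toy = "ATGGCC" * 40  # toy sequence, not real genomic data
    for k in range(1, 10):
        print(k, round(mutual_information(toy, k), 4))
```

Plotting I(k) against k for a coding region is one way to look for the period-three oscillation described above.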
Kress, W John; Erickson, David L
2007-06-06
A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.
Shoura, Massa J; Gabdank, Idan; Hansen, Loren; Merker, Jason; Gotlib, Jason; Levene, Stephen D; Fire, Andrew Z
2017-10-05
Investigations aimed at defining the 3D configuration of eukaryotic chromosomes have consistently encountered an endogenous population of chromosome-derived circular genomic DNA, referred to as extrachromosomal circular DNA (eccDNA). While the production, distribution, and activities of eccDNAs remain understudied, eccDNA formation from specific regions of the linear genome has profound consequences on the regulatory and coding capabilities for these regions. Here, we define eccDNA distributions in Caenorhabditis elegans and in three human cell types, utilizing a set of DNA topology-dependent approaches for enrichment and characterization. The use of parallel biophysical, enzymatic, and informatic approaches provides a comprehensive profiling of eccDNA robust to isolation and analysis methodology. Results in human and nematode systems provide quantitative analysis of the eccDNA loci at both unique and repetitive regions. Our studies converge on and support a consistent picture, in which endogenous genomic DNA circles are present in normal physiological states, and in which the circles come from both coding and noncoding genomic regions. Prominent among the coding regions generating DNA circles are several genes known to produce a diversity of protein isoforms, with mucin proteins and titin as specific examples. Copyright © 2017 Shoura et al.
Genomics dataset on unclassified published organism (patent US 7547531).
Khan Shawan, Mohammad Mahfuz Ali; Hasan, Md Ashraful; Hossain, Md Mozammel; Hasan, Md Mahmudul; Parvin, Afroza; Akter, Salina; Uddin, Kazi Rasel; Banik, Subrata; Morshed, Mahbubul; Rahman, Md Nazibur; Rahman, S M Badier
2016-12-01
Nucleotide (DNA) sequence analysis provides important clues regarding the characteristics and taxonomic position of an organism. DNA sequence analysis is therefore crucial for learning about the hierarchical classification of that organism. This dataset (patent US 7547531) was chosen to simplify the complex raw data buried in undisclosed DNA sequences, which helps to open doors for new collaborations. In this dataset, a total of 48 unidentified DNA sequences from patent US 7547531 were selected and their complete sequences were retrieved from the NCBI BioSample database. A quick response (QR) code for each of those DNA sequences was constructed with the DNA BarID tool. The QR code is useful for the identification and comparison of isolates with other organisms. The AT/GC content of the DNA sequences was determined using the ENDMEMO GC Content Calculator, which indicates their stability at different temperatures. The highest GC content was observed in GP445188 (62.5%), followed by GP445198 (61.8%) and GP445189 (59.44%), while the lowest was in GP445178 (24.39%). In addition, the New England BioLabs (NEB) database was used to identify the cleavage code indicating the 5', 3' and blunt ends, and the enzyme code indicating the methylation site of the DNA sequences is also shown. These data will be helpful for the construction of the organisms' hierarchical classification, determination of their phylogenetic and taxonomic position and revelation of their molecular characteristics.
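GC content, as reported for the sequences above, is the percentage of bases that are G or C. A minimal sketch follows; it assumes that only unambiguous A/C/G/T bases are counted, which may differ from how the ENDMEMO calculator treats ambiguity codes, and the function name is hypothetical.

```python
def gc_content(seq):
    """Return the GC percentage of a DNA sequence, ignoring non-ACGT symbols."""
    s = seq.upper()
    counted = sum(s.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (s.count("G") + s.count("C")) / counted


if __name__ == "__main__":
    print(round(gc_content("GGCCAT"), 2))
```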
Wavelet analysis of frequency chaos game signal: a time-frequency signature of the C. elegans DNA.
Messaoudi, Imen; Oueslati, Afef Elloumi; Lachiri, Zied
2014-12-01
Challenging tasks are encountered in the field of bioinformatics. The choice of the genomic sequence's mapping technique is one of the most fastidious tasks. A judicious choice helps in examining the distribution of periodic patterns that accord with the underlying structure of genomes. Despite that, searching for a coding technique that can highlight all the information contained in the DNA has not yet attracted the attention it deserves. In this paper, we propose a new mapping technique based on the chaos game theory that we call the frequency chaos game signal (FCGS). The particularity of the FCGS coding resides in exploiting the statistical properties of the genomic sequence itself. This may reflect important structural and organizational features of DNA. To prove the usefulness of the FCGS approach in the detection of different local periodic patterns, we use the wavelet analysis because it provides access to information that can be obscured by other time-frequency methods such as the Fourier analysis. Thus, we apply the continuous wavelet transform (CWT) with the complex Morlet wavelet as a mother wavelet function. Scalograms that relate to the organism Caenorhabditis elegans (C. elegans) exhibit a multitude of periodic organizations of specific DNA sequences.
Estimating haplotype frequencies by combining data from large DNA pools with database information.
Gasbarra, Dario; Kulathinal, Sangita; Pirinen, Matti; Sillanpää, Mikko J
2011-01-01
We assume that allele frequency data have been extracted from several large DNA pools, each containing genetic material of up to hundreds of sampled individuals. Our goal is to estimate the haplotype frequencies among the sampled individuals by combining the pooled allele frequency data with prior knowledge about the set of possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of considered loci exceed the limits of several earlier methods. In the example analyses, the proposed model clearly outperforms a deterministic greedy algorithm on real data from the HapMap database. With a small number of loci, the performance of the proposed method is similar to that of an EM-algorithm, which uses a multinormal approximation for the pooled allele frequencies, but which does not utilize prior information about the haplotypes. The method has been implemented using Matlab and the code is available upon request from the authors.
NASA Astrophysics Data System (ADS)
Xu, Kuipeng; Tang, Xianghai; Bi, Guiqi; Cao, Min; Wang, Lu; Mao, Yunxiang
2017-08-01
Pyropia species grow in the intertidal zone and are cold-water adapted. To date, most of the information about the whole plastid and mitochondrial genomes (ptDNA and mtDNA) of this genus is limited to Northern Hemisphere species. Here, we report the sequencing of the ptDNA and mtDNA of the Antarctic red alga Pyropia endiviifolia using the Illumina platform. The plastid genome (195 784 bp, 33.28% GC content) contains 210 protein-coding genes, 37 tRNA genes and 6 rRNA genes. The mitochondrial genome (34 603 bp, 30.5% GC content) contains 26 protein-coding genes, 25 tRNA genes and 2 rRNA genes. Our results suggest that the organellar genomes of Py. endiviifolia have a compact organization. Although the collinearity of these genomes is conserved compared with other Pyropia species, the genome sizes show significant differences, mainly because of the different copy numbers of rDNA operons in the ptDNA and group II introns in the mtDNA. The other Pyropia species have 2–3 distinct intronic ORFs in their cox 1 genes, but Py. endiviifolia has no introns in its cox 1 gene. This has led to a smaller mtDNA than in other Pyropia species. The phylogenetic relationships within Pyropia were examined using concatenated gene sets from most of the available organellar genomes with both the maximum likelihood and Bayesian methods. The analysis revealed a sister taxa affiliation between the Antarctic species Py. endiviifolia and the North American species Py. kanakaensis.
Strand-specific transcriptome profiling with directly labeled RNA on genomic tiling microarrays
2011-01-01
Background With lower manufacturing cost, high spot density, and flexible probe design, genomic tiling microarrays are ideal for comprehensive transcriptome studies. Typically, transcriptome profiling using microarrays involves reverse transcription, which converts RNA to cDNA. The cDNA is then labeled and hybridized to the probes on the arrays, thus the RNA signals are detected indirectly. Reverse transcription is known to generate artifactual cDNA, in particular the synthesis of second-strand cDNA, leading to false discovery of antisense RNA. To address this issue, we have developed an effective method using RNA that is directly labeled, thus by-passing the cDNA generation. This paper describes this method and its application to the mapping of transcriptome profiles. Results RNA extracted from laboratory cultures of Porphyromonas gingivalis was fluorescently labeled with an alkylation reagent and hybridized directly to probes on genomic tiling microarrays specifically designed for this periodontal pathogen. The generated transcriptome profile was strand-specific and produced signals close to background level in most antisense regions of the genome. In contrast, high levels of signal were detected in the antisense regions when the hybridization was done with cDNA. Five antisense areas were tested with independent strand-specific RT-PCR and none to negligible amplification was detected, indicating that the strong antisense cDNA signals were experimental artifacts. Conclusions An efficient method was developed for mapping transcriptome profiles specific to both coding strands of a bacterial genome. This method chemically labels and uses extracted RNA directly in microarray hybridization. The generated transcriptome profile was free of cDNA artifactual signals. 
In addition, this method requires fewer processing steps and is potentially more sensitive in detecting small amount of RNA compared to conventional end-labeling methods due to the incorporation of more fluorescent molecules per RNA fragment. PMID:21235785
Guo, Y C; Wang, H; Wu, H P; Zhang, M Q
2015-12-21
To address the large mean square error (MSE) and slow convergence of the constant modulus algorithm (CMA) in equalizing multi-modulus signals, a multi-modulus algorithm (MMA) based on global artificial fish swarm (GAFS) intelligent optimization of DNA encoding sequences (GAFS-DNA-MMA) was proposed. To improve the convergence rate and reduce the MSE, the proposed algorithm adopted an encoding method based on DNA nucleotide chains to represent candidate solutions to the problem. Furthermore, the GAFS algorithm, with its fast convergence and global search ability, was used to find the best sequence. The real and imaginary parts of the initial optimal weight vector of MMA were obtained through DNA coding of the best sequence. The simulation results show that the proposed algorithm has a faster convergence speed and smaller MSE in comparison with the CMA, the MMA, and the AFS-DNA-MMA.
Chen, Haimei; Zhang, Jianhui; Yuan, George; Liu, Chang
2014-01-01
Salvia miltiorrhiza is one of the most widely used medicinal plants. As a first step to develop a chloroplast-based genetic engineering method for the over-production of active components from S. miltiorrhiza, we have analyzed the genome, transcriptome, and base modifications of the S. miltiorrhiza chloroplast. Total genomic DNA and RNA were extracted from fresh leaves and then subjected to strand-specific RNA-Seq and Single-Molecule Real-Time (SMRT) sequencing analyses. Mapping the RNA-Seq reads to the genome assembly allowed us to determine the relative expression levels of 80 protein-coding genes. In addition, we identified 19 polycistronic transcription units and 136 putative antisense and intergenic noncoding RNA (ncRNA) genes. Comparison of the abundance of protein-coding transcripts (cRNA) with and without overlapping antisense ncRNAs (asRNA) suggests that the presence of asRNA is associated with increased cRNA abundance (p<0.05). Using the SMRT Portal software (v1.3.2), 2687 potential DNA modification sites and two potential DNA modification motifs were predicted. The two motifs include a TATA box–like motif (CPGDMM1, “TATANNNATNA”), and an unknown motif (CPGDMM2 “WNYANTGAW”). Specifically, 35 of the 97 CPGDMM1 motifs (36.1%) and 91 of the 369 CPGDMM2 motifs (24.7%) were found to be significantly modified (p<0.01). Analysis of genes downstream of the CPGDMM1 motif revealed a significantly increased abundance of ncRNA genes that are less than 400 bp away from the significantly modified CPGDMM1 motif (p<0.01). Taken together, the present study revealed a complex interplay among DNA modifications, ncRNA and cRNA expression in the chloroplast genome. PMID:24914614
Visualization of yeast chromosomal DNA
NASA Technical Reports Server (NTRS)
Lubega, Seth
1990-01-01
The DNA molecule is the most significant life molecule since it encodes the blueprint for other structural and functional molecules of all living organisms. Agarose gel electrophoresis is now being widely used to separate the DNA of viruses, bacteria, and lower eukaryotes. The task was undertaken of reviewing the existing methods of DNA fractionation and microscopic visualization of individual chromosomal DNA molecules by gel electrophoresis as a basis for a proposed study to investigate the feasibility of separating DNA molecules in free fluids as an alternative to gel electrophoresis. Various techniques were studied. On the molecular level, agarose gel electrophoresis is being widely used to separate chromosomal DNA according to molecular weight. Carl and Olson separated and characterized the entire karyotype of a lab strain of Saccharomyces cerevisiae. Smith et al. and Schwartz and Koval independently reported the visualization of individual DNA molecules migrating through agarose gel matrix during electrophoresis. The techniques used by these researchers are being reviewed in the lab as a basis for the proposed studies.
Kress, W. John; Erickson, David L.
2007-01-01
Background A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Methodology/Principal Findings Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. Conclusions/Significance A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination. PMID:17551588
BioPartsBuilder: a synthetic biology tool for combinatorial assembly of biological parts.
Yang, Kun; Stracquadanio, Giovanni; Luo, Jingchuan; Boeke, Jef D; Bader, Joel S
2016-03-15
Combinatorial assembly of DNA elements is an efficient method for building large-scale synthetic pathways from standardized, reusable components. These methods are particularly useful because they enable assembly of multiple DNA fragments in one reaction, at the cost of requiring that each fragment satisfies design constraints. We developed BioPartsBuilder as a biologist-friendly web tool to design biological parts that are compatible with DNA combinatorial assembly methods, such as Golden Gate and related methods. It retrieves biological sequences, enforces compliance with assembly design standards and provides a fabrication plan for each fragment. BioPartsBuilder is accessible at http://public.biopartsbuilder.org and an Amazon Web Services image is available from the AWS Market Place (AMI ID: ami-508acf38). Source code is released under the MIT license, and available for download at https://github.com/baderzone/biopartsbuilder. Contact: joel.bader@jhu.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Brain cDNA clone for human cholinesterase
DOE Office of Scientific and Technical Information (OSTI.GOV)
McTiernan, C.; Adkins, S.; Chatonnet, A.
1987-10-01
A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum. The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes, coded for cholinesterase.
GUI to Facilitate Research on Biological Damage from Radiation
NASA Technical Reports Server (NTRS)
Cucinotta, Frances A.; Ponomarev, Artem Lvovich
2010-01-01
A graphical-user-interface (GUI) computer program has been developed to facilitate research on the damage caused by highly energetic particles and photons impinging on living organisms. The program brings together, into one computational workspace, computer codes that have been developed over the years, plus codes that will be developed during the foreseeable future, to address diverse aspects of radiation damage. These include codes that implement radiation-track models, codes for biophysical models of breakage of deoxyribonucleic acid (DNA) by radiation, pattern-recognition programs for extracting quantitative information from biological assays, and image-processing programs that aid visualization of DNA breaks. The radiation-track models are based on transport models of interactions of radiation with matter and solution of the Boltzmann transport equation by use of both theoretical and numerical models. The biophysical models of breakage of DNA by radiation include biopolymer coarse-grained and atomistic models of DNA, stochastic- process models of deposition of energy, and Markov-based probabilistic models of placement of double-strand breaks in DNA. The program is designed for use in the NT, 95, 98, 2000, ME, and XP variants of the Windows operating system.
Identification of three novel NHS mutations in families with Nance-Horan syndrome
Wu, Junhua; Brooks, Simon P.; Hardcastle, Alison J.; Lewis, Richard Alan; Stambolian, Dwight
2007-01-01
Purpose Nance-Horan Syndrome (NHS) is an infrequent and often overlooked X-linked disorder characterized by dense congenital cataracts, microphthalmia, and dental abnormalities. The syndrome is caused by mutations in the NHS gene, whose function is not known. The purpose of this study was to identify the frequency and distribution of NHS gene mutations and compare genotype with Nance-Horan phenotype in five North American NHS families. Methods Genomic DNA was isolated from white blood cells from NHS patients and family members. The NHS gene coding region and its splice site donor and acceptor regions were amplified from genomic DNA by PCR, and the amplicons were sequenced directly. Results We identified three unique NHS coding region mutations in these NHS families. Conclusions This report extends the number of unique identified NHS mutations to 14. PMID:17417607
Henrich, Oliver; Gutiérrez Fosado, Yair Augusto; Curk, Tine; Ouldridge, Thomas E
2018-05-10
During the last decade coarse-grained nucleotide models have emerged that allow us to study DNA and RNA on unprecedented time and length scales. Among them is oxDNA, a coarse-grained, sequence-specific model that captures the hybridisation transition of DNA and many structural properties of single- and double-stranded DNA. oxDNA was previously only available as standalone software, but has now been implemented into the popular LAMMPS molecular dynamics code. This article describes the new implementation and analyses its parallel performance. Practical applications are presented that focus on single-stranded DNA, an area of research which has been so far under-investigated. The LAMMPS implementation of oxDNA lowers the entry barrier for using the oxDNA model significantly, facilitates future code development and interfacing with existing LAMMPS functionality as well as other coarse-grained and atomistic DNA models.
A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.
Mehmood, Tahir; Bohlin, Jon; Snipen, Lars
2015-01-01
The upstream region of coding genes is important for several reasons, for instance for locating transcription factor binding sites and initiation start sites in genomic DNA. Motivated by a recently conducted study in which a multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequence from background upstream sequence. The upstream sequences of conserved coding genes over genomes were considered in the analysis, where conserved coding genes were found by using the pan-genomics concept for each considered prokaryotic species. PLS uses a position specific scoring matrix (PSSM) to study the characteristics of the upstream region. Results obtained by the PLS based method were compared with the Gini importance of random forest (RF) and with support vector machine (SVM), which is a widely used method for sequence classification. The upstream sequence classification performance was evaluated by using cross validation, and the suggested approach identifies the prokaryotic upstream region significantly better than RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.
Toren, Dmitri; Barzilay, Thomer; Tacutu, Robi; Lehmann, Gilad; Muradian, Khachik K; Fraifeld, Vadim E
2016-01-04
Mitochondria are the only organelles in the animal cells that have their own genome. Due to a key role in energy production, generation of damaging factors (ROS, heat), and apoptosis, mitochondria and mtDNA in particular have long been considered one of the major players in the mechanisms of aging, longevity and age-related diseases. The rapidly increasing number of species with fully sequenced mtDNA, together with accumulated data on longevity records, provides a new fascinating basis for comparative analysis of the links between mtDNA features and animal longevity. To facilitate such analyses and to support the scientific community in carrying these out, we developed the MitoAge database containing calculated mtDNA compositional features of the entire mitochondrial genome, mtDNA coding (tRNA, rRNA, protein-coding genes) and non-coding (D-loop) regions, and codon usage/amino acids frequency for each protein-coding gene. MitoAge includes 922 species with fully sequenced mtDNA and maximum lifespan records. The database is available through the MitoAge website (www.mitoage.org or www.mitoage.info), which provides the necessary tools for searching, browsing, comparing and downloading the data sets of interest for selected taxonomic groups across the Kingdom Animalia. The MitoAge website assists in statistical analysis of different features of the mtDNA and their correlative links to longevity. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Novel variants of the 5S rRNA genes in Eruca sativa.
Singh, K; Bhatia, S; Lakshmikumaran, M
1994-02-01
The 5S ribosomal RNA (rRNA) genes of Eruca sativa were cloned and characterized. They are organized into clusters of tandemly repeated units. Each repeat unit consists of a 119-bp coding region followed by a noncoding spacer region that separates it from the coding region of the next repeat unit. Our study reports novel gene variants of the 5S rRNA genes in plants. Two families of the 5S rDNA, the 0.5-kb size family and the 1-kb size family, coexist in the E. sativa genome. The 0.5-kb size family consists of the 5S rRNA genes (S4) that have coding regions similar to those of other reported plant 5S rDNA sequences, whereas the 1-kb size family consists of the 5S rRNA gene variants (S1) that exist as 1-kb BamHI tandem repeats. S1 is made up of two variant units (V1 and V2) of 5S rDNA where the BamHI site between the two units is mutated. Sequence heterogeneity among S4, V1, and V2 units exists throughout the sequence and is not limited to the noncoding spacer region only. The coding regions of V1 and V2 show approximately 20% dissimilarity to the coding regions of S4 and other reported plant 5S rDNA sequences. Such a large variation in the coding regions of the 5S rDNA units within the same plant species has been observed for the first time. Restriction site variation is observed between the two size classes of 5S rDNA in E. sativa.(ABSTRACT TRUNCATED AT 250 WORDS)
Plappert-Helbig, Ulla; Junker-Walker, Ursula; Martus, Hans-Joerg
2015-07-01
As a part of the Japanese Center for the Validation of Alternative Methods (JaCVAM)-initiative international validation study of the in vivo rat alkaline comet assay (comet assay), we examined methyl methanesulfonate (MMS), 2,6-diaminotoluene, and 5-fluorouracil under coded test conditions. Rats were treated orally with the maximum tolerated dose (MTD) and two additional descending doses of the respective compounds. In the MMS-treated groups, liver and stomach showed significantly elevated DNA damage at each dose level and a significant dose-response relationship. 2,6-diaminotoluene induced significantly elevated DNA damage in the liver at each dose and a statistically significant dose-response relationship, whereas no DNA damage was observed in the stomach. 5-fluorouracil did not induce DNA damage in either liver or stomach. Copyright © 2015 Elsevier B.V. All rights reserved.
Croteau, Rodney Bruce; Crock, John E.
2005-01-25
A cDNA encoding (E)-β-farnesene synthase from peppermint (Mentha piperita) has been isolated and sequenced, and the corresponding amino acid sequence has been determined. Accordingly, an isolated DNA sequence (SEQ ID NO:1) is provided which codes for the expression of (E)-β-farnesene synthase (SEQ ID NO:2), from peppermint (Mentha piperita). In other aspects, replicable recombinant cloning vehicles are provided which code for (E)-β-farnesene synthase, or for a base sequence sufficiently complementary to at least a portion of (E)-β-farnesene synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding (E)-β-farnesene synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant (E)-β-farnesene synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant (E)-β-farnesene synthase may be used to obtain expression or enhanced expression of (E)-β-farnesene synthase in plants in order to enhance the production of (E)-β-farnesene, or may be otherwise employed for the regulation or expression of (E)-β-farnesene synthase, or the production of its product.
Ultra-low background DNA cloning system.
Goto, Kenta; Nagano, Yukio
2013-01-01
Yeast-based in vivo cloning is useful for cloning DNA fragments into plasmid vectors and is based on the ability of yeast to recombine the DNA fragments by homologous recombination. Although this method is efficient, it produces some by-products. We have developed an "ultra-low background DNA cloning system" on the basis of yeast-based in vivo cloning, by almost completely eliminating the generation of by-products and applying the method to commonly used Escherichia coli vectors, particularly those lacking yeast replication origins and carrying an ampicillin resistance gene (Amp(r)). First, we constructed a conversion cassette containing the DNA sequences in the following order: an Amp(r) 5' UTR (untranslated region) and coding region, an autonomous replication sequence and a centromere sequence from yeast, a TRP1 yeast selectable marker, and an Amp(r) 3' UTR. This cassette allowed conversion of the Amp(r)-containing vector into the yeast/E. coli shuttle vector through use of the Amp(r) sequence by homologous recombination. Furthermore, simultaneous transformation of the desired DNA fragment into yeast allowed cloning of this DNA fragment into the same vector. We rescued the plasmid vectors from all yeast transformants, and by-products containing the E. coli replication origin disappeared. Next, the rescued vectors were transformed into E. coli and the by-products containing the yeast replication origin disappeared. Thus, our method used yeast- and E. coli-specific "origins of replication" to eliminate the generation of by-products. Finally, we successfully cloned the DNA fragment into the vector with almost 100% efficiency.
Multiple Site-Directed and Saturation Mutagenesis by the Patch Cloning Method.
Taniguchi, Naohiro; Murakami, Hiroshi
2017-01-01
Constructing protein-coding genes with desired mutations is a basic step for protein engineering. Herein, we describe a multiple site-directed and saturation mutagenesis method, termed MUPAC. This method has been used to introduce multiple site-directed mutations in the green fluorescent protein gene and in the moloney murine leukemia virus reverse transcriptase gene. Moreover, this method was also successfully used to introduce randomized codons at five desired positions in the green fluorescent protein gene, and for simple DNA assembly for cloning.
Colour-barcoded magnetic microparticles for multiplexed bioassays.
Lee, Howon; Kim, Junhoi; Kim, Hyoki; Kim, Jiyun; Kwon, Sunghoon
2010-09-01
Encoded particles have a demonstrated value for multiplexed high-throughput bioassays such as drug discovery and clinical diagnostics. In diverse samples, the ability to use a large number of distinct identification codes on assay particles is important to increase throughput. Proper handling schemes are also needed to read out these codes on free-floating probe microparticles. Here we create vivid, free-floating structural coloured particles with multi-axis rotational control using a colour-tunable magnetic material and a new printing method. Our colour-barcoded magnetic microparticles offer a coding capacity easily into the billions with distinct magnetic handling capabilities including active positioning for code readouts and active stirring for improved reaction kinetics in microscale environments. A DNA hybridization assay is done using the colour-barcoded magnetic microparticles to demonstrate multiplexing capabilities.
Imaging The Genetic Code of a Virus
NASA Astrophysics Data System (ADS)
Graham, Jenna; Link, Justin
2013-03-01
Atomic Force Microscopy (AFM) has allowed scientists to explore physical characteristics of nano-scale materials. However, the challenges that come with such an investigation are rarely expressed. In this research project a method was developed to image the well-studied DNA of the virus lambda phage. Through testing and integrating several sample preparations described in literature, a quality image of lambda phage DNA can be obtained. In our experiment, we developed a technique using the Veeco Autoprobe CP AFM and mica substrate with an appropriate absorption buffer of HEPES and NiCl2. This presentation will focus on the development of a procedure to image lambda phage DNA at Xavier University. The John A. Hauck Foundation and Xavier University
DNABIT Compress - Genome compression algorithm.
Rajarajeswari, Pothuraju; Apparao, Allam
2011-01-22
Data compression is concerned with how information is organized in data. Efficient storage means removal of redundancy from the data being stored in the DNA molecule. Data compression algorithms remove redundancy and are used to understand biologically important molecules. We present a compression algorithm, "DNABIT Compress", for DNA sequences, based on a novel scheme of assigning binary bits to smaller segments of DNA bases in order to compress both repetitive and non-repetitive DNA sequences. Our proposed algorithm achieves the best compression ratio for DNA sequences for larger genomes. Significantly better compression results show that the "DNABIT Compress" algorithm is the best among the remaining compression algorithms. While achieving the best compression ratios for DNA sequences (genomes), our new DNABIT Compress algorithm significantly improves on the running time of all previous DNA compression programs. Assigning binary bits (a unique bit code) to fragments of DNA sequence (exact repeats, reverse repeats) is also a unique concept introduced in this algorithm for the first time in DNA compression. The proposed algorithm achieves a compression ratio as low as 1.58 bits/base, whereas the best existing methods could not achieve a ratio below 1.72 bits/base.
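For context on the bits/base figures quoted above, the following is a minimal sketch of the naive fixed-width baseline, in which every base costs exactly 2 bits. It is not the DNABIT Compress algorithm, whose repeat-aware bit codes are not specified in this abstract; the names BASE_CODES, pack_2bit and unpack_2bit are hypothetical and chosen only for illustration.

```python
# Naive 2-bit packing of DNA bases (baseline only, NOT DNABIT Compress).
# The code table below is an arbitrary assumption for illustration.
BASE_CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index in this string equals the 2-bit code


def pack_2bit(seq):
    """Pack a DNA string into bytes; returns (number_of_bases, packed_bytes)."""
    bits = 0
    for base in seq:
        bits = (bits << 2) | BASE_CODES[base]
    n_bytes = (2 * len(seq) + 7) // 8  # round up to whole bytes
    return len(seq), bits.to_bytes(n_bytes, "big")


def unpack_2bit(length, data):
    """Inverse of pack_2bit: recover the DNA string from packed bytes."""
    bits = int.from_bytes(data, "big")
    bases = []
    for i in range(length):
        shift = 2 * (length - 1 - i)  # first base sits in the highest bits
        bases.append(BASES[(bits >> shift) & 0b11])
    return "".join(bases)


if __name__ == "__main__":
    n, packed = pack_2bit("GATTACA")
    print(n, packed.hex(), unpack_2bit(n, packed))
    print("bits per base:", 8 * len(packed) / n)
```

Any scheme that reports fewer than 2 bits/base, such as the 1.58 and 1.72 figures in the abstract, must exploit structure in the sequence (for example repeats) that this fixed-width coding ignores.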
Liberek, K; Osipiuk, J; Zylicz, M; Ang, D; Skorko, J; Georgopoulos, C
1990-02-25
The process of initiation of lambda DNA replication requires the assembly of the proper nucleoprotein complex at the origin of replication, ori lambda. The complex is composed of both phage and host-coded proteins. The lambda O initiator protein binds specifically to ori lambda. The lambda P initiator protein binds to both lambda O and the host-coded dnaB helicase, giving rise to an ori lambda DNA.lambda O.lambda P.dnaB structure. The dnaK and dnaJ heat shock proteins have been shown capable of dissociating this complex. The thus freed dnaB helicase unwinds the duplex DNA template at the replication fork. In this report, through cross-linking, size chromatography, and protein affinity chromatography, we document some of the protein-protein interactions occurring at ori lambda. Our results show that the dnaK protein specifically interacts with both lambda O and lambda P, and that the dnaJ protein specifically interacts with the dnaB helicase.
Fortin, Connor H; Schulze, Katharina V; Babbitt, Gregory A
2015-01-01
It is now widely accepted that DNA sequences defining DNA-protein interactions functionally depend upon local biophysical features of the DNA backbone that are important in defining sites of binding interaction in the genome (e.g. DNA shape, charge and intrinsic dynamics). However, these physical features of the DNA polymer are not directly apparent when analyzing and viewing Shannon information content calculated at single nucleobases in a traditional sequence logo plot. Thus, sequence logo plots are severely limited in that they convey no explicit information regarding the structural dynamics of the DNA backbone, a feature often critical to binding specificity. We present TRX-LOGOS, an R software package and Perl wrapper code that interfaces the JASPAR database for computational regulatory genomics. TRX-LOGOS extends the traditional sequence logo plot to include Shannon information content calculated with regard to the dinucleotide-based BI-BII conformation shifts in phosphate linkages on the DNA backbone, thereby adding a visual measure of intrinsic DNA flexibility that can be critical for many DNA-protein interactions. TRX-LOGOS is available as an R graphics module offered both at SourceForge and as a download supplement at this journal. To demonstrate the general utility of TRX logo plots, we first calculated the information content for 416 Saccharomyces cerevisiae transcription factor binding sites functionally confirmed in the Yeastract database and matched to previously published yeast genomic alignments. We discovered that flanking regions contain significantly more information content at phosphate linkages than can be observed at nucleobases. We also examined broader transcription factor classifications defined by the JASPAR database, and discovered that many general signatures of transcription factor binding are locally more information rich at the level of DNA backbone dynamics than nucleobase sequence.
We used TRX-logos in combination with MEGA 6.0 software for molecular evolutionary genetics analysis to visually compare the human Forkhead box/FOX protein evolution to its binding site evolution. We also compared the DNA binding signatures of human TP53 tumor suppressor determined by two different laboratory methods (SELEX and ChIP-seq). Further analysis of the entire yeast genome, center aligned at the start codon, also revealed a distinct sequence-independent 3 bp periodic pattern in information content, present only in coding regions, and perhaps indicative of the non-random organization of the genetic code. TRX-LOGOS is useful in any situation in which important information content in DNA can be better visualized at the positions of phosphate linkages (i.e. dinucleotides) where the dynamic properties of the DNA backbone function to facilitate DNA-protein interaction.
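The quantity plotted in a traditional sequence logo, to which the abstract above contrasts its backbone-based measure, can be sketched as follows. This is a minimal illustration of per-position Shannon information content at nucleobases (2 bits minus the column entropy), assuming equal-length aligned sites and no small-sample correction; it does not reproduce the dinucleotide BI-BII calculation of TRX-LOGOS, and the function name column_information is hypothetical.

```python
import math
from collections import Counter


def column_information(sites):
    """Information content (bits) at each position of aligned DNA sites.

    Computed as log2(4) - H, where H is the Shannon entropy of the base
    frequencies in that column. Assumes all sites have equal length.
    """
    length = len(sites[0])
    info = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        total = sum(counts.values())
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
        info.append(2.0 - entropy)
    return info


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    toy_sites = ["ACGT", "AGGT", "ATGA", "AAGC"]
    for pos, bits in enumerate(column_information(toy_sites)):
        print(pos, round(bits, 3))
```

A fully conserved column scores 2 bits and a column with all four bases equally frequent scores 0 bits, which is the scale on which logo letter heights are drawn.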
Scaling features of noncoding DNA
NASA Technical Reports Server (NTRS)
Stanley, H. E.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.
1999-01-01
We review evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene, and utilize this fact to build a Coding Sequence Finder Algorithm, which uses statistical ideas to locate the coding regions of an unknown DNA sequence. Finally, we describe briefly some recent work adapting to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function, and reporting that noncoding regions in eukaryotes display a larger redundancy than coding regions. Specifically, we consider the possibility that this result is solely a consequence of nucleotide concentration differences as first noted by Bonhoeffer and his collaborators. We find that cytosine-guanine (CG) concentration does have a strong "background" effect on redundancy. However, we find that for the purine-pyrimidine binary mapping rule, which is not affected by the difference in CG concentration, the Shannon redundancy for the set of analyzed sequences is larger for noncoding regions compared to coding regions.
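As a rough illustration of the Shannon redundancy mentioned above, the sketch below maps a sequence to the purine-pyrimidine binary alphabet and computes a redundancy of the form R = 1 - H_n / (n log2 k) from overlapping n-letter words, where k is the alphabet size. The exact definition and word handling used by the authors are not given in this abstract, so this formula, the overlapping-window choice, and the names to_purine_pyrimidine and redundancy are assumptions.

```python
import math
from collections import Counter


def to_purine_pyrimidine(seq):
    """Binary mapping: purines (A, G) -> 'R', pyrimidines (C, T) -> 'Y'."""
    return "".join("R" if base in "AG" else "Y" for base in seq.upper())


def redundancy(symbols, n, alphabet_size):
    """Assumed redundancy R = 1 - H_n / (n * log2(alphabet_size)).

    H_n is the Shannon entropy (bits) of overlapping n-letter words.
    """
    words = [symbols[i:i + n] for i in range(len(symbols) - n + 1)]
    counts = Counter(words)
    total = len(words)
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )
    return 1.0 - entropy / (n * math.log2(alphabet_size))


if __name__ == "__main__":
    binary = to_purine_pyrimidine("AGGCTTAGCATCGGA")  # toy sequence
    print(binary, redundancy(binary, n=2, alphabet_size=2))
```

Because the R/Y mapping pools C with T and G with A, the resulting word statistics do not depend directly on CG concentration, which is the property the abstract relies on when comparing coding and noncoding regions.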
Privacy rules for DNA databanks. Protecting coded 'future diaries'.
Annas, G J
1993-11-17
In privacy terms, genetic information is like medical information. But the information contained in the DNA molecule itself is more sensitive because it contains an individual's probabilistic "future diary," is written in a code that has only partially been broken, and contains information about an individual's parents, siblings, and children. Current rules for protecting the privacy of medical information cannot protect either genetic information or identifiable DNA samples stored in DNA databanks. A review of the legal and public policy rationales for protecting genetic privacy suggests that specific enforceable privacy rules for DNA databanks are needed. Four preliminary rules are proposed to govern the creation of DNA databanks, the collection of DNA samples for storage, limits on the use of information derived from the samples, and continuing obligations to those whose DNA samples are in the databanks.
Samuels, David C.; Boys, Richard J.; Henderson, Daniel A.; Chinnery, Patrick F.
2003-01-01
We applied a hidden Markov model segmentation method to the human mitochondrial genome to identify patterns in the sequence, to compare these patterns to the gene structure of mtDNA and to see whether these patterns reveal additional characteristics important for our understanding of genome evolution, structure and function. Our analysis identified three segmentation categories based upon the sequence transition probabilities. Category 2 segments corresponded to the tRNA and rRNA genes, with a greater strand-symmetry in these segments. Category 1 and 3 segments covered the protein-coding genes and almost all of the non-coding D-loop. Compared to category 1, the mtDNA segments assigned to category 3 had much lower guanine abundance. A comparison to two independent databases of mitochondrial mutations and polymorphisms showed that the high substitution rate of guanine in human mtDNA is largest in the category 3 segments. Analysis of synonymous mutations showed the same pattern. This suggests that this heterogeneity in the mutation rate is partly independent of respiratory chain function and is a direct property of the genome sequence itself. This has important implications for our understanding of mtDNA evolution and its use as a ‘molecular clock’ to determine the rate of population and species divergence. PMID:14530452
snpAD: An ancient DNA genotype caller.
Prüfer, Kay
2018-06-21
The study of ancient genomes can elucidate the evolutionary past. However, analyses are complicated by base-modifications in ancient DNA molecules that result in errors in DNA sequences. These errors are particularly common near the ends of sequences and pose a challenge for genotype calling. I describe an iterative method that estimates genotype frequencies and errors along sequences to allow for accurate genotype calling from ancient sequences. The implementation of this method, called snpAD, performs well on high-coverage ancient data, as shown by simulations and by subsampling the data of a high-coverage Neandertal genome. Although estimates for low-coverage genomes are less accurate, I am able to derive approximate estimates of heterozygosity from several low-coverage Neandertals. These estimates show that low heterozygosity, compared to modern humans, was common among Neandertals. The C++ code of snpAD is freely available at http://bioinf.eva.mpg.de/snpAD/. Supplementary data are available at Bioinformatics online.
Track structure in radiation biology: theory and applications.
Nikjoo, H; Uehara, S; Wilson, W E; Hoshi, M; Goodhead, D T
1998-04-01
A brief review is presented of the basic concepts in track structure, and the relative merits of various theoretical approaches adopted in Monte-Carlo track-structure codes are examined. In the second part of the paper, a formal cluster analysis is introduced to calculate cluster-distance distributions. Total experimental ionization cross-sections were least-square fitted and compared with the calculation by various theoretical methods. The Monte-Carlo track-structure code Kurbuc was used to examine and compare the spectrum of the secondary electrons generated by using functions given by Born-Bethe, Jain-Khare, Gryzinsky, Kim-Rudd, Mott and Vriens' theories. The cluster analysis in track structure was carried out using the k-means method and Hartigan algorithm. Data are presented on experimental and calculated total ionization cross-sections: inverse mean free path (IMFP) as a function of electron energy used in Monte-Carlo track-structure codes; the spectrum of secondary electrons generated by different functions for 500 eV primary electrons; cluster analysis for 4 MeV and 20 MeV alpha-particles in terms of the frequency of total cluster energy to the root-mean-square (rms) radius of the cluster and differential distance distributions for a pair of clusters; and finally relative frequency distribution for energy deposited in DNA, single-strand break and double-strand breaks for 10 MeV/u protons, alpha-particles and carbon ions. There are a number of Monte-Carlo track-structure codes that have been developed independently and the bench-marking presented in this paper allows a better choice of the theoretical method adopted in a track-structure code to be made. A systematic bench-marking of cross-sections and spectra of the secondary electrons shows differences between the codes at atomic level, but such differences are not significant in biophysical modelling at the macromolecular level.
Clustered-damage evaluation shows that a substantial proportion of dose (~30%) is deposited by low-energy electrons; the majority of DNA damage lesions are of simple type; the complexity of damage increases with increased LET, while the total yield of strand breaks remains constant; and at high LET values nearly 70% of all double-strand breaks are of complex type.
NASA Astrophysics Data System (ADS)
Vargas, E. L.; Rivas, D. A.; Duot, A. C.; Hovey, R. T.; Andrianarijaona, V. M.
2015-03-01
DNA replication is the basis for all biological reproduction. A strand of DNA will "unzip" and bind with a complementary strand, creating two identical strands. In this study, we are considering how this process is affected by Interatomic Coulombic Decay (ICD), specifically how ICD affects the individual coding proteins' ability to hold together. ICD mainly deals with how the electron returns to its original state after excitation and how this affects its immediate atomic environment, sometimes affecting the connectivity between interaction sites on proteins involved in the DNA coding process. Biological heredity is fundamentally controlled by DNA and its replication, and therefore it affects every living thing. The small size of the proteins (within the range of nanometers) makes them good candidates for research at this scale. Understanding how ICD affects DNA molecules can give us invaluable insight into the human genetic code and the processes behind cell mutations that can lead to cancer. Authors wish to give special thanks to Pacific Union College Student Senate in Angwin, California, for their financial support.
RPS8—a New Informative DNA Marker for Phylogeny of Babesia and Theileria Parasites in China
Tian, Zhan-Cheng; Liu, Guang-Yuan; Yin, Hong; Luo, Jian-Xun; Guan, Gui-Quan; Luo, Jin; Xie, Jun-Ren; Shen, Hui; Tian, Mei-Yuan; Zheng, Jin-feng; Yuan, Xiao-song; Wang, Fang-fang
2013-01-01
Piroplasmosis is a serious debilitating and sometimes fatal disease. Phylogenetic relationships within piroplasmida are complex and remain unclear. We compared the intron–exon structure and DNA sequences of the RPS8 gene from Babesia and Theileria spp. isolates in China. Similar to 18S rDNA, the 40S ribosomal protein S8 gene, RPS8, including both coding and non-coding regions is a useful and novel genetic marker for defining species boundaries and for inferring phylogenies because it tends to have little intra-specific variation but considerable inter-specific difference. However, more samples are needed to verify the usefulness of the RPS8 (coding and non-coding regions) gene as a marker for the phylogenetic position and detection of most Babesia and Theileria species, particularly for some closely related species. PMID:24244571
Transcription and DNA Damage: Holding Hands or Crossing Swords?
D'Alessandro, Giuseppina; d'Adda di Fagagna, Fabrizio
2017-10-27
Transcription has classically been considered a potential threat to genome integrity. Collision between transcription and DNA replication machinery, and retention of DNA:RNA hybrids, may result in genome instability. On the other hand, it has been proposed that active genes repair faster and preferentially via homologous recombination. Moreover, while canonical transcription is inhibited in the proximity of DNA double-strand breaks, a growing body of evidence supports active non-canonical transcription at DNA damage sites. Small non-coding RNAs accumulate at DNA double-strand break sites in mammals and other organisms, and are involved in DNA damage signaling and repair. Furthermore, RNA binding proteins are recruited to DNA damage sites and participate in the DNA damage response. Here, we discuss the impact of transcription on genome stability, the role of RNA binding proteins at DNA damage sites, and the function of small non-coding RNAs generated upon damage in the signaling and repair of DNA lesions. Copyright © 2016 Elsevier Ltd. All rights reserved.
Niarchos, Athanasios; Siora, Anastasia; Konstantinou, Evangelia; Kalampoki, Vasiliki; Lagoumintzis, George; Poulas, Konstantinos
2017-01-01
During the last few decades, the recombinant protein expression finds more and more applications. The cloning of protein-coding genes into expression vectors is required to be directional for proper expression, and versatile in order to facilitate gene insertion in multiple different vectors for expression tests. In this study, the TA-GC cloning method is proposed, as a new, simple and efficient method for the directional cloning of protein-coding genes in expression vectors. The presented method features several advantages over existing methods, which tend to be relatively more labour intensive, inflexible or expensive. The proposed method relies on the complementarity between single A- and G-overhangs of the protein-coding gene, obtained after a short incubation with T4 DNA polymerase, and T and C overhangs of the novel vector pET-BccI, created after digestion with the restriction endonuclease BccI. The novel protein-expression vector pET-BccI also facilitates the screening of transformed colonies for recombinant transformants. Evaluation experiments of the proposed TA-GC cloning method showed that 81% of the transformed colonies contained recombinant pET-BccI plasmids, and 98% of the recombinant colonies expressed the desired protein. This demonstrates that TA-GC cloning could be a valuable method for cloning protein-coding genes in expression vectors. PMID:29091919
Seligmann, Hervé
2013-05-07
GenBank's EST database includes RNAs that exactly match human mitochondrial sequences, assuming systematic asymmetric nucleotide exchange-transcription along the exchange rules A→G→C→U/T→A (12 ESTs), A→U/T→C→G→A (4 ESTs), C→G→U/T→C (3 ESTs), and A→C→G→U/T→A (1 EST); no RNAs correspond to other potential asymmetric exchange rules. Hypothetical polypeptides translated from nucleotide-exchanged human mitochondrial protein coding genes align with numerous GenBank proteins, and their predicted secondary structures resemble those of their putative GenBank homologues. Two independent methods designed to detect overlapping genes (one based on nucleotide contents analyses in relation to replicative deamination gradients at third codon positions, the other on circular code analyses of codon contents based on frame redundancy) confirm nucleotide-exchange-encrypted overlapping genes. The methods converge on which genes are most probably active and which are not, for each of the various exchange rules. Mean EST lengths produced by different nucleotide exchanges are proportional to (a) extents that various bioinformatics analyses confirm the protein coding status of putative overlapping genes; (b) known kinetic chemistry parameters of the corresponding nucleotide substitutions by the human mitochondrial DNA polymerase gamma (nucleotide DNA misinsertion rates); (c) stop codon densities in predicted overlapping genes (stop codon readthrough and exchanging polymerization regulate gene expression by counterbalancing each other). Numerous rarely expressed proteins seem encoded within regular mitochondrial genes through asymmetric nucleotide exchange, avoiding lengthening genomes. Intersecting evidence between several independent approaches confirms the working hypothesis status of gene encryption by systematic nucleotide exchanges. Copyright © 2013 Elsevier Ltd. All rights reserved.
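The cyclic exchange rules listed above can be read as letter substitutions: under A→G→C→U/T→A, every A is replaced by G, every G by C, every C by T and every T by A. The sketch below applies such a rule to a DNA string. The rule syntax ("A>G>C>T>A"), the use of T rather than U, and the names exchange_map and apply_exchange are assumptions made for illustration; the abstract does not describe the software actually used.

```python
def exchange_map(rule):
    """Build a substitution table from a cyclic rule such as 'A>G>C>T>A'.

    Bases that do not appear in the rule are left unchanged, so a 3-cycle
    like 'C>G>T>C' keeps A as A.
    """
    steps = rule.split(">")
    mapping = {base: base for base in "ACGT"}
    for source, target in zip(steps, steps[1:]):
        mapping[source] = target
    return mapping


def apply_exchange(seq, rule):
    """Return the sequence obtained by applying the exchange rule to seq."""
    mapping = exchange_map(rule)
    return "".join(mapping[base] for base in seq)


if __name__ == "__main__":
    toy = "ATGACCGT"  # invented toy sequence, not mitochondrial data
    for rule in ("A>G>C>T>A", "A>T>C>G>A", "C>G>T>C", "A>C>G>T>A"):
        print(rule, apply_exchange(toy, rule))
```

Comparing the exchanged sequence, rather than the original, against an EST is one way to picture what "matching assuming systematic nucleotide exchange" means in the abstract.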
Transcriptional mapping of the ribosomal RNA region of mouse L-cell mitochondrial DNA.
Nagley, P; Clayton, D A
1980-01-01
The map positions in mouse mitochondrial DNA of the two ribosomal RNA genes and adjacent genes coding several small transcripts have been determined precisely by application of a procedure in which DNA-RNA hybrids have been subjected to digestion by S1 nuclease under conditions of varying severity. Digestion of the DNA-RNA hybrids with S1 nuclease yielded a series of species which were shown to contain ribosomal RNA molecules together with adjacent transcripts hybridized conjointly to a continuous segment of mitochondrial DNA. There is one small transcript about 60 bases long whose gene adjoins the sequences coding the 5'-end of the small ribosomal RNA (950 bases) and which lies approximately 200 nucleotides from the D-loop origin of heavy strand mitochondrial DNA synthesis. An 80-base transcript lies between the small and large ribosomal RNA genes, and genes for two further short transcripts (each about 80 bases in length) abut the sequences coding the 3'-end of the large ribosomal RNA (approximately 1500 bases). The ability to isolate a discrete DNA-RNA hybrid species approximately 2700 base pairs in length containing all these transcripts suggests that there can be few nucleotides in this region of mouse mitochondrial DNA which are not represented as stable RNA species. PMID:6253898
Statistical and linguistic features of DNA sequences
NASA Technical Reports Server (NTRS)
Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.
1995-01-01
We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
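The Detrended Fluctuation Analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual purine/pyrimidine "DNA walk" (pyrimidine = +1, purine = -1), linear detrending in non-overlapping boxes, and a hand-picked set of box sizes; the function names and the random test sequence are illustrative only and are not taken from the abstract.

```python
import math
import random

def dna_walk(seq):
    """Map a DNA string to +1/-1 steps (assumed rule: pyrimidine = +1, purine = -1)."""
    steps = {"C": 1, "T": 1, "A": -1, "G": -1}
    return [steps[b] for b in seq.upper() if b in steps]

def _residual_ss(ys):
    """Sum of squared residuals after a least-squares line fit of ys against 0..n-1."""
    n = len(ys)
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in enumerate(ys))

def dfa_fluctuation(steps, box):
    """RMS fluctuation F(box) of the integrated walk after per-box linear detrending."""
    mean = sum(steps) / len(steps)
    profile, total = [], 0.0
    for s in steps:
        total += s - mean          # integrate the mean-subtracted steps
        profile.append(total)
    n_boxes = len(profile) // box  # trailing bases that do not fill a box are ignored
    ss = sum(_residual_ss(profile[b * box:(b + 1) * box]) for b in range(n_boxes))
    return math.sqrt(ss / (n_boxes * box))

def dfa_exponent(steps, boxes):
    """Slope of log F(n) versus log n, estimated by least squares over the box sizes."""
    xs = [math.log(n) for n in boxes]
    ys = [math.log(dfa_fluctuation(steps, n)) for n in boxes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

# Hypothetical input: an uncorrelated random sequence, for which the exponent
# is expected to lie near 0.5; long-range correlated sequences give larger values.
random.seed(0)
random_seq = "".join(random.choice("ACGT") for _ in range(4000))
alpha = dfa_exponent(dna_walk(random_seq), [8, 16, 32, 64, 128])
print(round(alpha, 2))
```

An exponent clearly above 0.5 over a wide range of box sizes is the kind of signature the abstract describes as a long-range power-law correlation.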
Gene and genon concept: coding versus regulation
2007-01-01
We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. 
Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon. PMID:18087760
Tramontano, A; Macchiato, M F
1986-01-01
An algorithm to determine the probability that a reading frame codifies for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand. PMID:3753761
GRID-seq reveals the global RNA-chromatin interactome
Li, Xiao; Zhou, Bing; Chen, Liang; Gou, Lan-Tao; Li, Hairi; Fu, Xiang-Dong
2017-01-01
Higher eukaryotic genomes are bound by a large number of coding and non-coding RNAs, but approaches to comprehensively map the identity and binding sites of these RNAs are lacking. Here we report a method to in situ capture global RNA interactions with DNA by deep sequencing (GRID-seq), which enables the comprehensive identification of the entire repertoire of chromatin-interacting RNAs and their respective binding sites. In human, mouse and Drosophila cells, we detected a large set of tissue-specific coding and non-coding RNAs that are bound to active promoters and enhancers, especially super-enhancers. Assuming that most mRNA-chromatin interactions indicate the physical proximity of a promoter and an enhancer, we constructed a three-dimensional global connectivity map of promoters and enhancers, revealing transcription activity-linked genomic interactions in the nucleus. PMID:28922346
Improved detection of DNA-binding proteins via compression technology on PSSM information.
Wang, Yubo; Ding, Yijie; Guo, Fei; Wei, Leyi; Tang, Jijun
2017-01-01
Since the importance of DNA-binding proteins in multiple biomolecular functions has been recognized, an increasing number of researchers are attempting to identify DNA-binding proteins. In recent years, machine learning methods have become increasingly compelling as the volume of protein sequence data soars, because of their favorable speed and accuracy. In this paper, we extract three features from the protein sequence, namely NMBAC (Normalized Moreau-Broto Autocorrelation), PSSM-DWT (Position-specific scoring matrix-Discrete Wavelet Transform), and PSSM-DCT (Position-specific scoring matrix-Discrete Cosine Transform). We also employ a feature selection algorithm on these feature vectors. Then, these features are fed into an SVM (support vector machine) model, trained as the classifier, to predict DNA-binding proteins. We use three datasets, namely PDB1075, PDB594 and PDB186, to evaluate the performance of our approach. The PDB1075 and PDB594 datasets are employed for the jackknife test and the PDB186 dataset is used for the independent test. Our method achieves the best accuracy in the jackknife test, from 79.20% to 86.23% and 80.5% to 86.20% on the PDB1075 and PDB594 datasets, respectively. In the independent test, the accuracy of our method comes to 76.3%. The performance in the independent test also shows that our method can be used effectively for DNA-binding protein prediction. The data and source code are at https://doi.org/10.6084/m9.figshare.5104084.
Evolutional dynamics of 45S and 5S ribosomal DNA in ancient allohexaploid Atropa belladonna.
Volkov, Roman A; Panchuk, Irina I; Borisjuk, Nikolai V; Hosiawa-Baranska, Marta; Maluszynska, Jolanta; Hemleben, Vera
2017-01-23
Polyploid hybrids represent a rich natural resource to study molecular evolution of plant genes and genomes. Here, we applied a combination of karyological and molecular methods to investigate chromosomal structure, molecular organization and evolution of ribosomal DNA (rDNA) in nightshade, Atropa belladonna (fam. Solanaceae), one of the oldest known allohexaploids among flowering plants. Because of their abundance and specific molecular organization (evolutionarily conserved coding regions linked to variable intergenic spacers, IGS), 45S and 5S rDNA are widely used in plant taxonomic and evolutionary studies. Molecular cloning and nucleotide sequencing of A. belladonna 45S rDNA repeats revealed a general structure characteristic of other Solanaceae species, and a very high sequence similarity of two length variants, with the only difference in number of short IGS subrepeats. These results, combined with the detection of three pairs of 45S rDNA loci on separate chromosomes, presumably inherited from both tetraploid and diploid ancestor species, exemplify intensive sequence homogenization that led to substitution/elimination of rDNA repeats of one parent. Chromosome silver-staining revealed that only four out of six 45S rDNA sites are frequently transcriptionally active, demonstrating nucleolar dominance. For 5S rDNA, three size variants of repeats were detected, with the major class represented by repeats containing all functional IGS elements required for transcription, the intermediate size repeats containing partially deleted IGS sequences, and the short 5S repeats containing severe defects both in the IGS and coding sequences. While shorter variants demonstrate an increased rate of base substitution, probably in their transition into pseudogenes, the functional 5S rDNA variants are nearly identical at the sequence level, pointing to their origin from a single parental species. 
Localization of the 5S rDNA genes on two chromosome pairs further supports uniparental inheritance from the tetraploid progenitor. The obtained molecular, cytogenetic and phylogenetic data demonstrate complex evolutionary dynamics of rDNA loci in the allohexaploid species Atropa belladonna. The high level of sequence unification revealed in 45S and 5S rDNA loci of this ancient hybrid species has seemingly been achieved by different molecular mechanisms.
Exploring the read-write genome: mobile DNA and mammalian adaptation.
Shapiro, James A
2017-02-01
The read-write genome idea predicts that mobile DNA elements will act in evolution to generate adaptive changes in organismal DNA. This prediction was examined in the context of mammalian adaptations involving regulatory non-coding RNAs, viviparous reproduction, early embryonic and stem cell development, the nervous system, and innate immunity. The evidence shows that mobile elements have played specific and sometimes major roles in mammalian adaptive evolution by generating regulatory sites in the DNA and providing interaction motifs in non-coding RNA. Endogenous retroviruses and retrotransposons have been the predominant mobile elements in mammalian adaptive evolution, with the notable exception of bats, where DNA transposons are the major agents of RW genome inscriptions. A few examples of independent but convergent exaptation of mobile DNA elements for similar regulatory rewiring functions are noted.
Kazakoff, Stephen H.; Imelfort, Michael; Edwards, David; Koehorst, Jasper; Biswas, Bandana; Batley, Jacqueline; Scott, Paul T.; Gresshoff, Peter M.
2012-01-01
Pongamia pinnata (syn. Millettia pinnata) is a novel, fast-growing arboreal legume that bears prolific quantities of oil-rich seeds suitable for the production of biodiesel and aviation biofuel. Here, we have used Illumina® ‘Second Generation DNA Sequencing (2GS)’ and a new short-read de novo assembler, SaSSY, to assemble and annotate the Pongamia chloroplast (152,968 bp; cpDNA) and mitochondrial (425,718 bp; mtDNA) genomes. We also show that SaSSY can be used to accurately assemble 2GS data, by re-assembling the Lotus japonicus cpDNA and in the process assemble its mtDNA (380,861 bp). The Pongamia cpDNA contains 77 unique protein-coding genes and is almost 60% gene-dense. It contains a 50 kb inversion common to other legumes, as well as a novel 6.5 kb inversion that is responsible for the non-disruptive, re-orientation of five protein-coding genes. Additionally, two copies of an inverted repeat firmly place the species outside the subclade of the Fabaceae lacking the inverted repeat. The Pongamia and L. japonicus mtDNA contain just 33 and 31 unique protein-coding genes, respectively, and like other angiosperm mtDNA, have expanded intergenic and multiple repeat regions. Through comparative analysis with Vigna radiata we measured the average synonymous and non-synonymous divergence of all three legume mitochondrial (1.59% and 2.40%, respectively) and chloroplast (8.37% and 8.99%, respectively) protein-coding genes. Finally, we explored the relatedness of Pongamia within the Fabaceae and showed the utility of the organellar genome sequences by mapping transcriptomic data to identify up- and down-regulated stress-responsive gene candidates and confirm in silico predicted RNA editing sites. PMID:23272141
Geranyl diphosphate synthase from mint
Croteau, Rodney Bruce; Wildung, Mark Raymond; Burke, Charles Cullen; Gershenzon, Jonathan
1999-01-01
A cDNA encoding geranyl diphosphate synthase from peppermint has been isolated and sequenced, and the corresponding amino acid sequence has been determined. Accordingly, an isolated DNA sequence (SEQ ID No:1) is provided which codes for the expression of geranyl diphosphate synthase (SEQ ID No:2) from peppermint (Mentha piperita). In other aspects, replicable recombinant cloning vehicles are provided which code for geranyl diphosphate synthase or for a base sequence sufficiently complementary to at least a portion of the geranyl diphosphate synthase DNA or RNA to enable hybridization therewith (e.g., antisense geranyl diphosphate synthase RNA or fragments of complementary geranyl diphosphate synthase DNA which are useful as polymerase chain reaction primers or as probes for geranyl diphosphate synthase or related genes). In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding geranyl diphosphate synthase. Thus, systems and methods are provided for the recombinant expression of geranyl diphosphate synthase that may be used to facilitate the production, isolation and purification of significant quantities of recombinant geranyl diphosphate synthase for subsequent use, to obtain expression or enhanced expression of geranyl diphosphate synthase in plants in order to enhance the production of monoterpenoids, to produce geranyl diphosphate in cancerous cells as a precursor to monoterpenoids having anti-cancer properties or may be otherwise employed for the regulation or expression of geranyl diphosphate synthase or the production of geranyl diphosphate.
Geranyl diphosphate synthase from mint
Croteau, R.B.; Wildung, M.R.; Burke, C.C.; Gershenzon, J.
1999-03-02
A cDNA encoding geranyl diphosphate synthase from peppermint has been isolated and sequenced, and the corresponding amino acid sequence has been determined. Accordingly, an isolated DNA sequence (SEQ ID No:1) is provided which codes for the expression of geranyl diphosphate synthase (SEQ ID No:2) from peppermint (Mentha piperita). In other aspects, replicable recombinant cloning vehicles are provided which code for geranyl diphosphate synthase or for a base sequence sufficiently complementary to at least a portion of the geranyl diphosphate synthase DNA or RNA to enable hybridization therewith (e.g., antisense geranyl diphosphate synthase RNA or fragments of complementary geranyl diphosphate synthase DNA which are useful as polymerase chain reaction primers or as probes for geranyl diphosphate synthase or related genes). In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding geranyl diphosphate synthase. Thus, systems and methods are provided for the recombinant expression of geranyl diphosphate synthase that may be used to facilitate the production, isolation and purification of significant quantities of recombinant geranyl diphosphate synthase for subsequent use, to obtain expression or enhanced expression of geranyl diphosphate synthase in plants in order to enhance the production of monoterpenoids, to produce geranyl diphosphate in cancerous cells as a precursor to monoterpenoids having anti-cancer properties or may be otherwise employed for the regulation or expression of geranyl diphosphate synthase or the production of geranyl diphosphate. 5 figs.
Geranyl diphosphate synthase large subunit, and methods of use
Croteau, Rodney B.; Burke, Charles C.; Wildung, Mark R.
2001-10-16
A cDNA encoding geranyl diphosphate synthase large subunit from peppermint has been isolated and sequenced, and the corresponding amino acid sequence has been determined. Replicable recombinant cloning vehicles are provided which code for geranyl diphosphate synthase large subunit. In another aspect, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding geranyl diphosphate synthase large subunit. In yet another aspect, the present invention provides isolated, recombinant geranyl diphosphate synthase protein comprising an isolated, recombinant geranyl diphosphate synthase large subunit protein and an isolated, recombinant geranyl diphosphate synthase small subunit protein. Thus, systems and methods are provided for the recombinant expression of geranyl diphosphate synthase.
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.
Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro
2010-05-07
Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
[Long non-coding RNAs in plants].
Xiaoqing, Huang; Dandan, Li; Juan, Wu
2015-04-01
Long non-coding RNAs (lncRNAs), which are longer than 200 nucleotides, exist widely in organisms and function in a variety of biological processes. Currently, most lncRNAs found in plants are transcribed by RNA polymerase Ⅱ and mediate gene expression through multiple mechanisms, such as target mimicry, transcription interference, histone methylation and DNA methylation. As regulators, they play important roles in flowering, male sterility, nutrition metabolism, biotic and abiotic stress and other biological processes in plants. In this review, we summarize the databases, prediction methods, and possible functions of plant lncRNAs discovered in recent years.
G = MAT: linking transcription factor expression and DNA binding data.
Tretyakov, Konstantin; Laur, Sven; Vilo, Jaak
2011-01-31
Transcription factors are proteins that bind to motifs on the DNA and thus affect gene expression regulation. The qualitative description of the corresponding processes is therefore important for a better understanding of essential biological mechanisms. However, wet lab experiments targeted at the discovery of the regulatory interplay between transcription factors and binding sites are expensive. We propose a new, purely computational method for finding putative associations between transcription factors and motifs. This method is based on a linear model that combines sequence information with expression data. We present various methods for model parameter estimation and show, via experiments on simulated data, that these methods are reliable. Finally, we examine the performance of this model on biological data and conclude that it can indeed be used to discover meaningful associations. The developed software is available as a web tool and Scilab source code at http://biit.cs.ut.ee/gmat/.
G = MAT: Linking Transcription Factor Expression and DNA Binding Data
Tretyakov, Konstantin; Laur, Sven; Vilo, Jaak
2011-01-01
Transcription factors are proteins that bind to motifs on the DNA and thus affect gene expression regulation. The qualitative description of the corresponding processes is therefore important for a better understanding of essential biological mechanisms. However, wet lab experiments targeted at the discovery of the regulatory interplay between transcription factors and binding sites are expensive. We propose a new, purely computational method for finding putative associations between transcription factors and motifs. This method is based on a linear model that combines sequence information with expression data. We present various methods for model parameter estimation and show, via experiments on simulated data, that these methods are reliable. Finally, we examine the performance of this model on biological data and conclude that it can indeed be used to discover meaningful associations. The developed software is available as a web tool and Scilab source code at http://biit.cs.ut.ee/gmat/. PMID:21297945
DNABIT Compress – Genome compression algorithm
Rajarajeswari, Pothuraju; Apparao, Allam
2011-01-01
Data compression is concerned with how information is organized in data. Efficient storage means removal of redundancy from the data being stored in the DNA molecule. Data compression algorithms remove redundancy and are used to understand biologically important molecules. We present a compression algorithm, “DNABIT Compress”, for DNA sequences, based on a novel scheme of assigning binary bits to smaller segments of DNA bases to compress both repetitive and non-repetitive DNA sequences. Our proposed algorithm achieves the best compression ratio for DNA sequences of larger genomes. Significantly better compression results show that the “DNABIT Compress” algorithm is the best among the compared compression algorithms. While achieving the best compression ratios for DNA sequences (genomes), our new DNABIT Compress algorithm significantly improves on the running time of all previous DNA compression programs. Assigning binary bits (a unique BIT CODE) to exact-repeat and reverse-repeat fragments of a DNA sequence is also a concept introduced in this algorithm for the first time in DNA compression. The proposed algorithm achieves a compression ratio as low as 1.58 bits/base, whereas the existing best methods could not achieve a ratio below 1.72 bits/base. PMID:21383923
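For context on the bits-per-base figures quoted above, the following sketch shows the plain fixed-width baseline of 2 bits per base. It is not the DNABIT Compress algorithm, whose bit-code assignment is not detailed in the abstract; the `pack` and `unpack` names and the code table are assumptions made for illustration.

```python
# Assumed fixed 2-bit code table; DNABIT Compress itself uses a different,
# repeat-aware bit assignment that this baseline does not attempt to reproduce.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"

def pack(seq):
    """Pack an ACGT string into bytes at 2 bits per base; returns (data, length)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out), len(seq)

def unpack(data, length):
    """Invert pack(): decode four bases per byte, then trim the padding."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])

data, n = pack("ACGTAC")
print(data.hex(), unpack(data, n))  # prints "1b10 ACGTAC"
```

Any scheme reporting fewer than 2 bits per base, such as the 1.58 and 1.72 figures in the abstract, must exploit redundancy (for example repeats) beyond this fixed-width encoding.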
USDA-ARS's Scientific Manuscript database
Single-nucleotide Polymorphism (SNP) markers are by far the most common form of DNA polymorphism in a genome. The objectives of this study were to discover SNPs in common bean comparing sequences from coding and non-coding regions obtained from Genbank and genomic DNA and to compare sequencing resu...
Szabóová, Dana; Bielik, Peter; Poláková, Silvia; Šoltys, Katarína; Jatzová, Katarína; Szemes, Tomáš
2017-01-01
Saccharomyces yeasts are widely used to test ecological and evolutionary hypotheses. A large number of nuclear genomic DNA sequences are available, but mitochondrial genomic data are insufficient. We completed mitochondrial DNA (mtDNA) sequencing from Illumina MiSeq reads for all Saccharomyces species. All are circularly mapped molecules decreasing in size with phylogenetic distance from Saccharomyces cerevisiae but with similar gene content including regulatory and selfish elements like origins of replication, introns, free-standing open reading frames or GC clusters. Their most profound feature is species-specific alteration in gene order. The genetic code slightly differs from the well-established yeast mitochondrial code, as GUG is used rarely as the translation start and CGA and CGC code for arginine. The multilocus phylogeny, inferred from mtDNA, does not correlate with the trees derived from nuclear genes. mtDNA data demonstrate that Saccharomyces cariocanus should be assigned as a separate species and Saccharomyces bayanus CBS 380T should not be considered as a distinct species due to mtDNA nearly identical to Saccharomyces uvarum mtDNA. Apparently, comparison of mtDNAs should not be neglected in genomic studies as it is an important tool to understand the origin and evolutionary history of some yeast species. PMID:28992063
The Purine Bias of Coding Sequences is Determined by Physicochemical Constraints on Proteins.
Ponce de Leon, Miguel; de Miranda, Antonio Basilio; Alvarez-Valin, Fernando; Carels, Nicolas
2014-01-01
For this report, we analyzed protein secondary structures in relation to the statistics of three nucleotide codon positions. The purpose of this investigation was to find which properties of the ribosome, tRNA or protein level, could explain the purine bias (Rrr) as it is observed in coding DNA. We found that the Rrr pattern is the consequence of a regularity (the codon structure) resulting from physicochemical constraints on proteins and thermodynamic constraints on ribosomal machinery. The physicochemical constraints on proteins mainly come from the hydropathy and molecular weight (MW) of secondary structures as well as the energy cost of amino acid synthesis. These constraints appear through a network of statistical correlations, such as (i) the cost of amino acid synthesis, which is in favor of a higher level of guanine in the first codon position, (ii) the constructive contribution of hydropathy alternation in proteins, (iii) the spatial organization of secondary structure in proteins according to solvent accessibility, (iv) the spatial organization of secondary structure according to amino acid hydropathy, (v) the statistical correlation of MW with protein secondary structures and their overall hydropathy, (vi) the statistical correlation of thymine in the second codon position with hydropathy and the energy cost of amino acid synthesis, and (vii) the statistical correlation of adenine in the second codon position with amino acid complexity and the MW of secondary protein structures. Amino acid physicochemical properties and functional constraints on proteins constitute a code that is translated into a purine bias within the coding DNA via tRNAs. In that sense, the Rrr pattern within coding DNA is the effect of information transfer on nucleotide composition from protein to DNA by selection according to the codon positions. 
Thus, coding DNA structure and ribosomal machinery co-evolved to minimize the energy cost of protein coding given the functional constraints on proteins.
Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity
NASA Astrophysics Data System (ADS)
Mukherjee, Shashi Bajaj; Sen, Pradip Kumar
2010-10-01
Studying periodic patterns would be expected to be a standard line of attack for recognizing DNA sequences in gene identification and similar problems, yet surprisingly little significant work has been done in this direction. This paper studies the statistical properties of DNA sequences of complete genomes using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings, and standard Fourier techniques are applied to study the periodicity. Distinct statistical behaviour of the periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between these parts. Here, DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
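A minimal sketch of the mapping-plus-Fourier idea described above: each nucleotide is mapped to a binary indicator sequence (one possible mapping, chosen here as an assumption, since the abstract does not say which mappings were used), and the Fourier power is evaluated at a single period. The function name and the toy sequences are hypothetical.

```python
import cmath


def spectral_power(seq, period):
    """Fourier power at frequency 1/period, summed over the four
    binary indicator sequences (one per nucleotide) and divided by length."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of the indicator sequence for this base
        coeff = sum(cmath.exp(-2j * cmath.pi * k / period)
                    for k, s in enumerate(seq) if s == base)
        total += abs(coeff) ** 2
    return total / n


periodic = "GCA" * 30   # toy sequence with a perfect 3-base repeat
uniform = "A" * 90      # toy sequence with no 3-base structure

p3_periodic = spectral_power(periodic, 3)
p3_uniform = spectral_power(uniform, 3)
```

A strong peak at period 3 is the classic signal of codon structure; in practice the power would be computed in sliding windows and compared against a background level rather than read off a single value.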
Physics behind the mechanical nucleosome positioning code
NASA Astrophysics Data System (ADS)
Zuiddam, Martijn; Everaers, Ralf; Schiessel, Helmut
2017-11-01
The positions along DNA molecules of nucleosomes, the most abundant DNA-protein complexes in cells, are influenced by the sequence-dependent DNA mechanics and geometry. This leads to the "nucleosome positioning code", a preference of nucleosomes for certain sequence motifs. Here we introduce a simplified model of the nucleosome where a coarse-grained DNA molecule is frozen into an idealized superhelical shape. We calculate the exact sequence preferences of our nucleosome model and find it to reproduce qualitatively all the main features known to influence nucleosome positions. Moreover, using well-controlled approximations to this model allows us to come to a detailed understanding of the physics behind the sequence preferences of nucleosomes.
LaPolla, R J; Mayne, K M; Davidson, N
1984-01-01
A mouse cDNA clone has been isolated that contains the complete coding region of a protein highly homologous to the delta subunit of the Torpedo acetylcholine receptor (AcChoR). The cDNA library was constructed in the vector lambda 10 from membrane-associated poly(A)+ RNA from BC3H-1 mouse cells. Surprisingly, the delta clone was selected by hybridization with cDNA encoding the gamma subunit of the Torpedo AcChoR. The nucleotide sequence of the mouse cDNA clone contains an open reading frame of 520 amino acids. This amino acid sequence exhibits 59% and 50% sequence homology to the Torpedo AcChoR delta and gamma subunits, respectively. However, the mouse nucleotide sequence has several stretches of high homology with the Torpedo gamma subunit cDNA, but not with delta. The mouse protein has the same general structural features as do the Torpedo subunits. It is encoded by a 3.3-kilobase mRNA. There is probably only one, but at most two, chromosomal genes coding for this or closely related sequences. PMID:6096870
Fukuda, Tomoyuki; Ohta, Kunihiro; Ohya, Yoshikazu
2006-06-01
VMA1-derived endonuclease (VDE), a homing endonuclease in Saccharomyces cerevisiae, is encoded by the mobile intein-coding sequence within the nuclear VMA1 gene. VDE recognizes and cleaves DNA at the 31-bp VDE recognition sequence (VRS) in the VMA1 gene lacking the intein-coding sequence during meiosis to insert a copy of the intein-coding sequence at the cleaved site. The mechanism underlying the meiosis specificity of VMA1 intein-coding sequence homing remains unclear. We studied various factors that might influence the cleavage activity in vivo and found that VDE binding to the VRS can be detected only when DNA cleavage by VDE takes place, implying that meiosis-specific DNA cleavage is regulated by the accessibility of VDE to its target site. As a possible candidate for the determinant of this accessibility, we analyzed chromatin structure around the VRS and revealed that local chromatin structure near the VRS is altered during meiosis. Although the meiotic chromatin alteration exhibits correlations with DNA binding and cleavage by VDE at the VMA1 locus, such a chromatin alteration is not necessarily observed when the VRS is embedded in ectopic gene loci. This suggests that nucleosome positioning or occupancy around the VRS by itself is not the sole mechanism for the regulation of meiosis-specific DNA cleavage by VDE and that other mechanisms are involved in the regulation. PMID:16757746
Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing
Just, Rebecca S.; Irwin, Jodi A.; Parson, Walther
2015-01-01
Long an important and useful tool in forensic genetic investigations, mitochondrial DNA (mtDNA) typing continues to mature. Research in the last few years has demonstrated both that data from the entire molecule will have practical benefits in forensic DNA casework, and that massively parallel sequencing (MPS) methods will make full mitochondrial genome (mtGenome) sequencing of forensic specimens feasible and cost-effective. A spate of recent studies has employed these new technologies to assess intraindividual mtDNA variation. However, in several instances, contamination and other sources of mixed mtDNA data have been erroneously identified as heteroplasmy. Well vetted mtGenome datasets based on both Sanger and MPS sequences have found authentic point heteroplasmy in approximately 25% of individuals when minor component detection thresholds are in the range of 10–20%, along with positional distribution patterns in the coding region that differ from patterns of point heteroplasmy in the well-studied control region. A few recent studies that examined very low-level heteroplasmy are concordant with these observations when the data are examined at a common level of resolution. In this review we provide an overview of considerations related to the use of MPS technologies to detect mtDNA heteroplasmy. In addition, we examine published reports on point heteroplasmy to characterize features of the data that will assist in the evaluation of future mtGenome data developed by any typing method. PMID:26009256
CAPRRESI: Chimera Assembly by Plasmid Recovery and Restriction Enzyme Site Insertion.
Santillán, Orlando; Ramírez-Romero, Miguel A; Dávila, Guillermo
2017-06-25
Here, we present chimera assembly by plasmid recovery and restriction enzyme site insertion (CAPRRESI). CAPRRESI benefits from many strengths of the original plasmid recovery method and introduces restriction enzyme digestion to ease DNA ligation reactions (required for chimera assembly). For this protocol, users clone wildtype genes into the same plasmid (pUC18 or pUC19). After the in silico selection of amino acid sequence regions where chimeras should be assembled, users obtain all the synonymous DNA sequences that encode them. Ad hoc Perl scripts enable users to determine all synonymous DNA sequences. After this step, another Perl script searches for restriction enzyme sites on all synonymous DNA sequences. This in silico analysis is also performed using the ampicillin resistance gene (ampR) found on pUC18/19 plasmids. Users design oligonucleotides inside synonymous regions to disrupt wildtype and ampR genes by PCR. After obtaining and purifying complementary DNA fragments, restriction enzyme digestion is accomplished. Chimera assembly is achieved by ligating appropriate complementary DNA fragments. pUC18/19 vectors are selected for CAPRRESI because they offer technical advantages, such as small size (2,686 base pairs), high copy number, advantageous sequencing reaction features, and commercial availability. The usage of restriction enzymes for chimera assembly eliminates the need for DNA polymerases yielding blunt-ended products. CAPRRESI is a fast and low-cost method for fusing protein-coding genes.
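The in silico step described above (enumerate every synonymous DNA sequence for a short amino acid stretch, then look for restriction sites in them) can be sketched as follows. This is an illustrative Python version and not the authors' Perl scripts; the function names are hypothetical, the standard genetic code is assumed, and the two-residue peptide is a toy input. EcoRI (GAATTC) is used only as an example site.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = {}  # amino acid (one-letter code) -> list of codons
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)


def synonymous_sequences(peptide):
    """All DNA sequences that encode `peptide` (grows exponentially with length)."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


def sequences_with_site(peptide, site):
    """Synonymous sequences that contain the restriction site `site`."""
    return [s for s in synonymous_sequences(peptide) if site in s]


# Toy example: which encodings of Glu-Phe contain an EcoRI site (GAATTC)?
all_ef = synonymous_sequences("EF")
hits = sequences_with_site("EF", "GAATTC")
```

Because the number of synonymous sequences multiplies with every residue, a practical search would be restricted to short windows of the selected region.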
Signatures of DNA Methylation across Insects Suggest Reduced DNA Methylation Levels in Holometabola
Provataris, Panagiotis; Meusemann, Karen; Niehuis, Oliver; Grath, Sonja; Misof, Bernhard
2018-01-01
It has been experimentally shown that DNA methylation is involved in the regulation of gene expression and the silencing of transposable element activity in eukaryotes. The variable levels of DNA methylation among different insect species indicate an evolutionarily flexible role of DNA methylation in insects, which due to a lack of comparative data is not yet well-substantiated. Here, we use computational methods to trace signatures of DNA methylation across insects by analyzing transcriptomic and genomic sequence data from all currently recognized insect orders. We conclude that: 1) a functional methylation system relying exclusively on DNA methyltransferase 1 is widespread across insects; 2) DNA methylation has potentially been lost or extremely reduced in species belonging to springtails (Collembola), flies and relatives (Diptera), and twisted-winged parasites (Strepsiptera); 3) holometabolous insects display signs of reduced DNA methylation levels in protein-coding sequences compared with hemimetabolous insects; 4) evolutionarily conserved insect genes associated with housekeeping functions tend to display signs of heavier DNA methylation in comparison to the genomic/transcriptomic background. With this comparative study, we provide the much-needed basis for experimental and detailed comparative analyses required to gain a deeper understanding of the evolution and function of DNA methylation in insects. PMID:29697817
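The abstract above does not say which sequence signature was computed. One widely used sequence-based proxy for historical DNA methylation is the normalized CpG content (CpG observed/expected), because methylated cytosines in CpG dinucleotides tend to mutate over evolutionary time and leave a CpG deficit. The sketch below assumes that measure purely for illustration; the function name and test sequences are hypothetical.

```python
def cpg_obs_exp(seq):
    """Normalized CpG content: observed CpG count divided by the count
    expected from the C and G frequencies, i.e. n_CG * L / (n_C * n_G)."""
    seq = seq.upper()
    length = len(seq)
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if n_c == 0 or n_g == 0:
        raise ValueError("sequence contains no C or no G")
    return n_cg * length / (n_c * n_g)


ratio_balanced = cpg_obs_exp("CCGG")   # CpG present at the expected level
ratio_depleted = cpg_obs_exp("CATG")   # C and G present, but no CpG
```

Values well below 1 across a gene set are usually read as a sign of CpG depletion, and values near 1 as a sign of little or no depletion.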
Cloning and expression of a cDNA coding for catalase from zebrafish (Danio rerio).
Ken, C F; Lin, C T; Wu, J L; Shaw, J F
2000-06-01
A full-length complementary DNA (cDNA) clone encoding a catalase was amplified by the rapid amplification of cDNA ends-polymerase chain reaction (RACE-PCR) technique from zebrafish (Danio rerio) mRNA. Nucleotide sequence analysis of this cDNA clone revealed that it comprised a complete open reading frame coding for 526 amino acid residues and that it had a molecular mass of 59 654 Da. The deduced amino acid sequence showed high similarity with the sequences of catalase from swine (86.9%), mouse (85.8%), rat (85%), human (83.7%), fruit fly (75.6%), nematode (71.1%), and yeast (58.6%). The amino acid residues for secondary structures are apparently conserved as they are present in other mammal species. Furthermore, the coding region of zebrafish catalase was introduced into an expression vector, pET-20b(+), and transformed into Escherichia coli expression host BL21(DE3)pLysS. A 60-kDa active catalase protein was expressed and detected by Coomassie blue staining as well as activity staining on polyacrylamide gel following electrophoresis.
Croteau, Rodney Bruce; Wildung, Mark Raymond; Crock, John E.
1999-01-01
A cDNA encoding (E)-β-farnesene synthase from peppermint (Mentha piperita) has been isolated and sequenced, and the corresponding amino acid sequence has been determined. Accordingly, an isolated DNA sequence (SEQ ID NO:1) is provided which codes for the expression of (E)-β-farnesene synthase (SEQ ID NO:2), from peppermint (Mentha piperita). In other aspects, replicable recombinant cloning vehicles are provided which code for (E)-β-farnesene synthase, or for a base sequence sufficiently complementary to at least a portion of (E)-β-farnesene synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding (E)-β-farnesene synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant (E)-β-farnesene synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant (E)-β-farnesene synthase may be used to obtain expression or enhanced expression of (E)-β-farnesene synthase in plants in order to enhance the production of (E)-β-farnesene, or may be otherwise employed for the regulation or expression of (E)-β-farnesene synthase, or the production of its product.
Noy, Agnes; Pérez, Alberto; Laughton, Charles A.; Orozco, Modesto
2007-01-01
We explore here the possibility of determining theoretically the free energy change associated with large conformational transitions in DNA, like the solvent-induced B⇔A conformational change. We find that a combination of targeted molecular dynamics (tMD) and the weighted histogram analysis method (WHAM) can be used to trace this transition in both water and ethanol/water mixture. The pathway of the transition in the A→B direction mirrors the B→A pathway, and is dominated by two processes that occur somewhat independently: local changes in sugar puckering and global rearrangements (particularly twist and roll) in the structure. The B→A transition is found to be a quasi-harmonic process, which follows closely the first spontaneous deformation mode of B-DNA, showing that a physiologically-relevant deformation is encoded in the flexibility pattern of DNA. PMID:17459891
DNA methylation of miRNA coding sequences putatively associated with childhood obesity.
Mansego, M L; Garcia-Lacarte, M; Milagro, F I; Marti, A; Martinez, J A
2017-02-01
Epigenetic mechanisms may be involved in obesity onset and its consequences. The aim of the present study was to evaluate whether DNA methylation status in microRNA (miRNA) coding regions is associated with childhood obesity. DNA isolated from white blood cells of 24 children (identification sample: 12 obese and 12 non-obese) from the Grupo Navarro de Obesidad Infantil study was hybridized in a 450 K methylation microarray. Several CpGs whose DNA methylation levels were statistically different between obese and non-obese were validated by MassArray® in 95 children (validation sample) from the same study. Microarray analysis identified 16 differentially methylated CpGs between both groups (6 hypermethylated and 10 hypomethylated). DNA methylation levels in miR-1203, miR-412 and miR-216A coding regions significantly correlated with body mass index standard deviation score (BMI-SDS) and explained up to 40% of the variation of BMI-SDS. The network analysis identified 19 well-defined obesity-relevant biological pathways from the KEGG database. MassArray® validation identified three regions located in or near miR-1203, miR-412 and miR-216A coding regions differentially methylated between obese and non-obese children. The current work identified three CpG sites located in coding regions of three miRNAs (miR-1203, miR-412 and miR-216A) that were differentially methylated between obese and non-obese children, suggesting a role of miRNA epigenetic regulation in childhood obesity. © 2016 World Obesity Federation.
Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly
Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka
2010-01-01
Background Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. Methodology We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. Conclusions The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877
Buccal DNA collection: comparison of buccal swabs with FTA cards.
Milne, Elizabeth; van Bockxmeer, Frank M; Robertson, Laila; Brisbane, Joanna M; Ashton, Lesley J; Scott, Rodney J; Armstrong, Bruce K
2006-04-01
Collection and analysis of DNA, most commonly from blood or buccal cells, is becoming more common in epidemiologic studies. Buccal samples, which are painless to take and relatively easily collected, are often the preferred source. There are several buccal cell collection methods: swabs, brushes, mouthwash, and treated cards, such as FTA or IsoCode cards. Few studies have systematically compared methods of buccal cell collection with respect to DNA yield and amplification success under similar conditions. We compared buccal DNA collection and amplification using buccal swabs and FTA cards in 122 control subjects from our Australian case-control study of childhood acute lymphoblastic leukaemia. Buccal DNA was quantified using a real-time PCR for beta-actin and genotyped at the loci of three polymorphisms (MTHFR 677C>T, ACE I/D, and XPD 1012G>A). PCR was successful with DNA from buccal swabs for 62% to 89% of subjects and from FTA cards for 83% to 100% of subjects, depending on the locus. The matched pair odds ratios (95% confidence interval) comparing success of FTA cards with buccal swabs are as follows: MTHFR 677C>T using PCR-RFLP, 12.5 (11.6-13.5) and using real-time PCR, 130.0 (113.1-152.8); ACE I/D using PCR-amplified fragment length polymorphism, 3.36 (3.2-3.5); XPD 1012G>A using real-time PCR, 150.0 (132.7-172.3). FTA cards are a robust DNA collection method and generally produce DNA suitable for PCR more reliably than buccal swabs. There are, however, technical challenges in handling discs punched from FTA cards that intending users should be aware of.
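For readers unfamiliar with matched-pair odds ratios: in a paired design only the discordant pairs carry information, and the odds ratio is the number of pairs in which one method succeeded and the other failed, divided by the number of pairs with the opposite outcome. The sketch below uses the usual large-sample interval on the log scale. The counts are made up for illustration and are not the study's data, and the function name is hypothetical.

```python
import math


def matched_pair_odds_ratio(b, c, z=1.96):
    """Matched-pair odds ratio b/c with an approximate confidence interval.

    b: pairs where method 1 succeeded and method 2 failed
    c: pairs where method 1 failed and method 2 succeeded
    z: normal quantile (1.96 for a 95% interval)
    """
    if b <= 0 or c <= 0:
        raise ValueError("both discordant counts must be positive")
    odds_ratio = b / c
    se_log = math.sqrt(1 / b + 1 / c)  # standard error of log(odds ratio)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical discordant counts: 25 subjects with card success / swab failure,
# 2 subjects with card failure / swab success.
odds_ratio, lower, upper = matched_pair_odds_ratio(b=25, c=2)
```

Pairs in which both methods succeeded or both failed do not enter the estimate at all, which is why small discordant counts give wide intervals.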
Enyeart, Peter J; Mohr, Georg; Ellington, Andrew D; Lambowitz, Alan M
2014-01-13
Mobile group II introns are bacterial retrotransposons that combine the activities of an autocatalytic intron RNA (a ribozyme) and an intron-encoded reverse transcriptase to insert site-specifically into DNA. They recognize DNA target sites largely by base pairing of sequences within the intron RNA and achieve high DNA target specificity by using the ribozyme active site to couple correct base pairing to RNA-catalyzed intron integration. Algorithms have been developed to program the DNA target site specificity of several mobile group II introns, allowing them to be made into 'targetrons.' Targetrons function for gene targeting in a wide variety of bacteria and typically integrate at efficiencies high enough to be screened easily by colony PCR, without the need for selectable markers. Targetrons have found wide application in microbiological research, enabling gene targeting and genetic engineering of bacteria that had been intractable to other methods. Recently, a thermostable targetron has been developed for use in bacterial thermophiles, and new methods have been developed for using targetrons to position recombinase recognition sites, enabling large-scale genome-editing operations, such as deletions, inversions, insertions, and 'cut-and-pastes' (that is, translocation of large DNA segments), in a wide range of bacteria at high efficiency. Using targetrons in eukaryotes presents challenges due to the difficulties of nuclear localization and sub-optimal magnesium concentrations, although supplementation with magnesium can increase integration efficiency, and directed evolution is being employed to overcome these barriers. Finally, spurred by new methods for expressing group II intron reverse transcriptases that yield large amounts of highly active protein, thermostable group II intron reverse transcriptases from bacterial thermophiles are being used as research tools for a variety of applications, including qRT-PCR and next-generation RNA sequencing (RNA-seq). 
The high processivity and fidelity of group II intron reverse transcriptases along with their novel template-switching activity, which can directly link RNA-seq adaptor sequences to cDNAs during reverse transcription, open new approaches for RNA-seq and the identification and profiling of non-coding RNAs, with potentially wide applications in research and biotechnology.
NASA Astrophysics Data System (ADS)
Ignat, V.
2016-08-01
Advanced industrial countries are affected by technology theft. German industry loses more than 50 billion euros annually. The main causes are industrial espionage and the fraudulent copying of patents and industrial products. Many Asian countries profit from this, saving up to 65% of production costs. Most affected are small and medium-sized enterprises, which do not have sufficient economic power to assert themselves against some powerful countries. International organizations, such as Interpol and the World Customs Organization (WCO), work together to combat international economic crime. Protection can be achieved by registering patents or by specific technical methods for recognizing product originality. More suitable protection has been developed, such as holograms, magnetic stripes, barcodes, CE marking, digital watermarks, DNA or nanotechnologies, security labels, radio frequency identification, micro color codes, matrix codes, and cryptographic encodings. The automotive industry has developed the method “Manufactures against Product Piracy”. A sticker on the package identifies original products and carries a verifiable Data Matrix barcode. The code can be read with a smartphone camera. The smartphone is connected via the Internet to a database, where the identification numbers of the original parts are stored.
Pilotte, Nils; Papaiakovou, Marina; Grant, Jessica R; Bierwert, Lou Ann; Llewellyn, Stacey; McCarthy, James S; Williams, Steven A
2016-03-01
The soil transmitted helminths are a group of parasitic worms responsible for extensive morbidity in many of the world's most economically depressed locations. With growing emphasis on disease mapping and eradication, the availability of accurate and cost-effective diagnostic measures is of paramount importance to global control and elimination efforts. While real-time PCR-based molecular detection assays have shown great promise, to date, these assays have utilized sub-optimal targets. By performing next-generation sequencing-based repeat analyses, we have identified high copy-number, non-coding DNA sequences from a series of soil transmitted pathogens. We have used these repetitive DNA elements as targets in the development of novel, multi-parallel, PCR-based diagnostic assays. Utilizing next-generation sequencing and the Galaxy-based RepeatExplorer web server, we performed repeat DNA analysis on five species of soil transmitted helminths (Necator americanus, Ancylostoma duodenale, Trichuris trichiura, Ascaris lumbricoides, and Strongyloides stercoralis). Employing high copy-number, non-coding repeat DNA sequences as targets, novel real-time PCR assays were designed, and assays were tested against established molecular detection methods. Each assay provided consistent detection of genomic DNA at quantities of 2 fg or less, demonstrated species-specificity, and showed an improved limit of detection over the existing, proven PCR-based assay. The utilization of next-generation sequencing-based repeat DNA analysis methodologies for the identification of molecular diagnostic targets has the ability to improve assay species-specificity and limits of detection. By exploiting such high copy-number repeat sequences, the assays described here will facilitate soil transmitted helminth diagnostic efforts. We recommend similar analyses when designing PCR-based diagnostic tests for the detection of other eukaryotic pathogens.
Shafaghat, Farzaneh; Abbasi-Kenarsari, Hajar; Majidi, Jafar; Movassaghpour, Ali Akbar; Shanehbandi, Dariush; Kazemi, Tohid
2015-01-01
Purpose: Transmembrane CD34 glycoprotein is the most important marker for identification, isolation and enumeration of hematopoietic stem cells (HSCs). We aimed in this study to clone the cDNA coding for human CD34 from KG1a cell line and stably express in mouse fibroblast cell line NIH-3T3. Such artificial cell line could be useful as proper immunogen for production of mouse monoclonal antibodies. Methods: CD34 cDNA was cloned from KG1a cell line after total RNA extraction and cDNA synthesis. Pfu DNA polymerase-amplified specific band was ligated to pGEMT-easy TA-cloning vector and sub-cloned in pCMV6-Neo expression vector. After transfection of NIH-3T3 cells using 3 μg of recombinant construct and 6 μl of JetPEI transfection reagent, stable expression was obtained by selection of cells by G418 antibiotic and confirmed by surface flow cytometry. Results: 1158 bp specific band was aligned completely to reference sequence in NCBI database corresponding to long isoform of human CD34. Transient and stable expression of human CD34 on transfected NIH-3T3 mouse fibroblast cells was achieved (25% and 95%, respectively) as shown by flow cytometry. Conclusion: Cloning and stable expression of human CD34 cDNA was successfully performed and validated by standard flow cytometric analysis. Due to murine origin of NIH-3T3 cell line, CD34-expressing NIH-3T3 cells could be useful as immunogen in production of diagnostic monoclonal antibodies against human CD34. This approach could bypass the need for purification of recombinant proteins produced in eukaryotic expression systems. PMID:25789221
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, B; Georgia Institute of Technology, Atlanta, GA; Wang, C
Purpose: To correlate the damage produced by particles of different types and qualities to cell survival on the basis of nanodosimetric analysis and advanced DNA structures in the cell nucleus. Methods: A Monte Carlo code was developed to simulate subnuclear DNA chromatin fibers (CFs) of 30nm utilizing a mean-free-path approach common to radiation transport. The cell nucleus was modeled as a spherical region containing 6000 chromatin-dense domains (CDs) of 400nm diameter, with additional CFs modeled in a sparser interchromatin region. The Geant4-DNA code was utilized to produce a particle track database representing various particles at different energies and dose quantities. These tracks were used to stochastically position the DNA structures based on their mean free path to interaction with CFs. Excitation and ionization events intersecting CFs were analyzed using the DBSCAN clustering algorithm for assessment of the likelihood of producing DSBs. Simulated DSBs were then assessed based on their proximity to one another for a probability of inducing cell death. Results: Variations in energy deposition to chromatin fibers match expectations based on differences in particle track structure. The quality of damage to CFs based on different particle types indicates more severe damage by high-LET radiation than low-LET radiation of identical particles. In addition, the model indicates more severe damage by protons than by alpha particles of the same LET, which is consistent with differences in their track structure. Cell survival curves have been produced showing the L-Q behavior of sparsely ionizing radiation. Conclusion: Initial results indicate the feasibility of producing cell survival curves based on the Monte Carlo cell nucleus method. Accurate correlation between simulated DNA damage to cell survival on the basis of nanodosimetric analysis can provide insight into the biological responses to various radiation types. 
Current efforts are directed at producing cell survival curves for high-LET radiation.
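The clustering step mentioned in the Methods can be illustrated with a compact, pure-Python DBSCAN that groups event positions by density. This is a sketch under stated assumptions and not the authors' code: the coordinates, the neighbourhood radius `eps` and the minimum cluster size `min_pts` are invented for the example, and the abstract does not report the values actually used.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: return one label per point, a cluster id (0, 1, ...)
    or -1 for noise. A point is a core point if at least `min_pts` points
    (itself included) lie within distance `eps`."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise reached from a core point: border point
            if labels[j] is not None:
                continue                # already assigned, do not expand again
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_pts:  # j is itself a core point: keep growing
                queue.extend(nearby)
    return labels


# Hypothetical event positions (arbitrary length units): two tight groups and one stray event.
events = [(0, 0, 0), (1, 0, 0), (0, 1, 0),
          (50, 50, 50), (51, 50, 50),
          (200, 0, 0)]
labels = dbscan(events, eps=2.0, min_pts=2)
```

In a damage model of this kind, each resulting cluster would then be scored separately (for example by the number and type of events it contains) to decide whether it counts as a candidate double-strand break; that scoring rule is outside this sketch.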
Mitochondrial sequence analysis for forensic identification using pyrosequencing technology.
Andréasson, H; Asp, A; Alderborn, A; Gyllensten, U; Allen, M
2002-01-01
Over recent years, requests for mtDNA analysis in the field of forensic medicine have notably increased, and the results of such analyses have proved to be very useful in forensic cases where nuclear DNA analysis cannot be performed. Traditionally, mtDNA has been analyzed by DNA sequencing of the two hypervariable regions, HVI and HVII, in the D-loop. DNA sequence analysis using the conventional Sanger sequencing is very robust but time consuming and labor intensive. By contrast, mtDNA analysis based on the pyrosequencing technology provides fast and accurate results from the human mtDNA present in many types of evidence materials in forensic casework. The assay has been developed to determine polymorphic sites in the mitochondrial D-loop as well as the coding region to further increase the discrimination power of mtDNA analysis. The pyrosequencing technology for analysis of mtDNA polymorphisms has been tested with regard to sensitivity, reproducibility, and success rate when applied to control samples and actual casework materials. The results show that the method is very accurate and sensitive; the results are easily interpreted and provide a high success rate on casework samples. The panel of pyrosequencing reactions for the mtDNA polymorphisms were chosen to result in an optimal discrimination power in relation to the number of bases determined.
Wada, Kunio; Fukuyama, Tomoki; Nakashima, Nobuaki; Matsumoto, Kyomu
2015-07-01
As part of the Japanese Center for the Validation of Alternative Methods (JaCVAM) international validation study of in vivo rat alkaline comet assays, we examined cadmium chloride, chloroform, and D,L-menthol under blind conditions as coded chemicals in the liver and stomach of Sprague-Dawley rats after 3 days of administration. Cadmium chloride showed equivocal responses in the liver and stomach, supporting previous reports of its poor mutagenic potential and non-carcinogenic effects in these organs. Treatment with chloroform, which is a non-genotoxic carcinogen, did not induce DNA damage in the liver or stomach. Some histopathological changes, such as necrosis and degeneration, were observed in the liver; however, they did not affect the comet assay results. D,L-Menthol, a non-genotoxic non-carcinogen, did not induce liver or stomach DNA damage. These results indicate that the comet assay can reflect genotoxic properties under blind conditions. Copyright © 2015 Elsevier B.V. All rights reserved.
Liu, Huitao; Cui, Peng; Zhan, Kehui; Lin, Qiang; Zhuo, Guoyin; Guo, Xiaoli; Ding, Feng; Yang, Wenlong; Liu, Dongcheng; Hu, Songnian; Yu, Jun; Zhang, Aimin
2011-03-29
Plant mitochondria, semiautonomous organelles that function as manufacturers of cellular ATP, have their own genome that has a slow rate of evolution and rapid rearrangement. Cytoplasmic male sterility (CMS), a common phenotype in higher plants, is closely associated with rearrangements in mitochondrial DNA (mtDNA), and is widely used to produce F1 hybrid seeds in a variety of valuable crop species. Novel chimeric genes deduced from mtDNA rearrangements causing CMS have been identified in several plants, such as rice, sunflower, pepper, and rapeseed, but there are very few reports about mtDNA rearrangements in wheat. In the present work, we describe the mitochondrial genome of a wheat K-type CMS line and compare it with its maintainer line. The complete mtDNA sequence of a wheat K-type (with cytoplasm of Aegilops kotschyi) CMS line, Ks3, was assembled into a master circle (MC) molecule of 647,559 bp and found to harbor 34 known protein-coding genes, three rRNAs (18 S, 26 S, and 5 S rRNAs), and 16 different tRNAs. Compared to our previously published sequence of a K-type maintainer line, Km3, we detected Ks3-specific mtDNA (> 100 bp, 11.38%) and repeats (> 100 bp, 29 units) as well as genes that are unique to each line: rpl5 was missing in Ks3 and trnH was absent from Km3. We also defined 32 single nucleotide polymorphisms (SNPs) in 13 protein-coding, albeit functionally irrelevant, genes, and predicted 22 unique ORFs in Ks3, representing potential candidates for K-type CMS. All these sequence variations are candidates for involvement in CMS. A comparative analysis of the mtDNA of several angiosperms, including those from Ks3, Km3, rice, maize, Arabidopsis thaliana, and rapeseed, showed that non-coding sequences of higher plants had mostly divergent multiple reorganizations during the mtDNA evolution of higher plants. 
The complete mitochondrial genome of the wheat K-type CMS line Ks3 is very different from that of its maintainer line Km3, especially in non-coding sequences. Sequence rearrangement has produced novel chimeric ORFs, which may be candidate genes for CMS. Comparative analysis of several angiosperm mtDNAs indicated that non-coding sequences are the most frequently reorganized during mtDNA evolution in higher plants.
Interlaboratory Comparison of Methods Determining the Botanical Composition of Animal Feed.
Braglia, Luca; Morello, Laura; Gavazzi, Floriana; Gianì, Silvia; Mastromauro, Francesco; Breviario, Diego; Cardoso, Hélia Guerra; Valadas, Vera; Campos, Maria Doroteia
2018-01-01
A consortium of European enterprises and research institutions has been engaged in the Feed-Code Project with the aim of addressing the requirements stated in European Union Regulation No. 767/2009, concerning market placement and use of feed of known and ascertained botanical composition. Accordingly, an interlaboratory trial was set up to compare the performance of different assays based either on optical microscope or DNA analysis for the qualitative and quantitative identification of the composition of compound animal feeds. A tubulin-based polymorphism method, on which the Feed-Code platform was developed, provided the most accurate results. The present study highlights the need for the performance of ring trials for the determination of the botanical composition of animal feeds and raises an alarm on the actual status of analytical inaccuracy.
Cioffi, Anna Valentina; Ferrara, Diana; Cubellis, Maria Vittoria; Aniello, Francesco; Corrado, Marcella; Liguori, Francesca; Amoroso, Alessandro; Fucci, Laura; Branno, Margherita
2002-08-01
Analysis of the genome structure of the Paracentrotus lividus (sea urchin) DNA methyltransferase (DNA MTase) gene showed the presence of an open reading frame, named METEX, in intron 7 of the gene. METEX expression is developmentally regulated, showing no correlation with DNA MTase expression. In fact, DNA MTase transcripts are present at high concentrations in the early developmental stages, while METEX is expressed at late stages of development. Two METEX cDNA clones (Met1 and Met2) that are different in the 3' end have been isolated in a cDNA library screening. The putative translated protein from Met2 cDNA clone showed similarity with Escherichia coli endonuclease III on the basis of sequence and predictive three-dimensional structure. The protein, overexpressed in E. coli and purified, had functional properties similar to the endonuclease specific for apurinic/apyrimidinic (AP) sites on the basis of the lyase activity. Therefore the open reading frame, present in intron 7 of the P. lividus DNA MTase gene, codes for a functional AP endonuclease designated SuAP1.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prody, C.A.; Zevin-Sonkin, D.; Gnatt, A.
1987-06-01
To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase and Torpedo electric organ true acetylcholinesterase. Using these probes, the authors isolated several cDNA clones from λgt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A)+ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments of the total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These findings demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species.
Zhao, A; Guo, A; Liu, Z; Pape, L
1997-01-01
The coding sequences for a Schizosaccharomyces pombe sequence-specific DNA binding protein, Reb1p, have been cloned. The predicted S. pombe Reb1p is 24-29% identical to mouse TTF-1 (transcription termination factor-1) and Saccharomyces cerevisiae REB1 protein, both of which direct termination of RNA polymerase I catalyzed transcripts. The S. pombe Reb1 cDNA encodes a predicted polypeptide of 504 amino acids with a predicted molecular weight of 58.4 kDa. The S. pombe Reb1p is unusual in that the bipartite DNA binding motif identified originally in S. cerevisiae and Kluyveromyces lactis REB1 proteins is uninterrupted and thus S. pombe Reb1p may contain the smallest natural REB1 homologous DNA binding domain. Its genomic coding sequences were shown to be interrupted by two introns. A recombinant histidine-tagged Reb1 protein bearing the rDNA binding domain has two homologous, sequence-specific binding sites in the S. pombe rDNA intergenic spacer, located between 289 and 480 nt downstream of the end of the approximately 25S rRNA coding sequences. Each binding site is 13-14 bp downstream of two of the three proposed in vivo termination sites. The core of this 17 bp site, AGGTAAGGGTAATGCAC, is specifically protected by Reb1p in footprinting analysis. PMID:9016645
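The 17 bp core site quoted above can be located in a sequence by a plain string search on both strands. The following is a minimal sketch, not the authors' footprinting procedure; the function names and the toy spacer sequence are hypothetical and used only for illustration.

```python
# Core of the Reb1p binding site as quoted in the abstract (17 bp).
SITE = "AGGTAAGGGTAATGCAC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase/lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq, site=SITE):
    """Return sorted (0-based start, strand) pairs for exact matches of `site`.

    A "-" hit means the reverse complement of `site` was found at that
    position on the given (forward) strand.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", site), ("-", reverse_complement(site))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Toy sequence (hypothetical, not the real S. pombe spacer).
toy = "TTT" + SITE + "CCCC" + reverse_complement(SITE) + "GG"
print(find_sites(toy))
```

An exact match is the simplest possible criterion; real binding-site searches usually tolerate mismatches, which this sketch does not attempt.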
Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun
2017-01-03
Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.
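As a rough illustration of how molecular barcodes can suppress amplification and sequencing errors, the sketch below collapses reads that share a barcode into one consensus base by majority vote and then estimates the allele fraction over barcodes rather than raw reads. This is a deliberately simplified stand-in, not smCounter's Bayesian model; the function names, the thresholds, and the toy reads are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def barcode_consensus(reads, min_reads=2, min_agreement=0.8):
    """Collapse (barcode, base) observations at one position.

    Returns {barcode: consensus_base}. Barcodes with fewer than
    `min_reads` reads, or whose most common base falls below
    `min_agreement`, are dropped (thresholds are illustrative).
    """
    by_barcode = defaultdict(list)
    for barcode, base in reads:
        by_barcode[barcode].append(base)

    consensus = {}
    for barcode, bases in by_barcode.items():
        if len(bases) < min_reads:
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[barcode] = base
    return consensus


def allele_fraction(consensus, alt):
    """Fraction of consensus molecules that carry the `alt` base."""
    if not consensus:
        return 0.0
    return sum(1 for base in consensus.values() if base == alt) / len(consensus)


# Toy data: barcode AAA has one sequencing error, GGG is ambiguous,
# TTT is a singleton.
reads = (
    [("AAA", "T")] * 5 + [("AAA", "C")]
    + [("CCC", "C")] * 3
    + [("GGG", "T"), ("GGG", "C")]
    + [("TTT", "C")]
)
calls = barcode_consensus(reads)
print(calls, allele_fraction(calls, "T"))
```

Counting molecules instead of reads is the key design choice: a single error copied many times by PCR still contributes at most one barcode family.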
A Novel Cassette Method for Probe Evaluation in the Designed Biochips
Zinkevich, Vitaly; Sapojnikova, Nelly; Mitchell, Julian; Kartvelishvili, Tamar; Asatiani, Nino; Alkhalil, Samia; Bogdarina, Irina; Al-Humam, Abdulmohsen A.
2014-01-01
A critical step in biochip design is the selection of probes with identical hybridisation characteristics. In this article we describe a novel method for evaluating DNA hybridisation probes, allowing the fine-tuning of biochips, that uses cassettes with multiple probes. Each cassette contains probes in equimolar proportions so that their hybridisation performance can be assessed in a single reaction. The model used to demonstrate this method was a series of probes developed to detect TORCH pathogens. DNA probes were designed for Toxoplasma gondii, Chlamydia trachomatis, Rubella, Cytomegalovirus, and Herpes virus and these were used to construct the DNA cassettes. Five cassettes were constructed to detect TORCH pathogens using a variety of genes coding for membrane proteins, viral matrix protein, an early expressed viral protein, viral DNA polymerase and the repetitive gene B1 of Toxoplasma gondii. All of these probes, except that for the B1 gene, exhibited similar profiles under the same hybridisation conditions. The failure of the B1 gene probe to hybridise was not due to a position effect, and this indicated that the probe was unsuitable for inclusion in the biochip. The redesigned probe for the B1 gene exhibited identical hybridisation properties to the other probes, suitable for inclusion in a biochip. PMID:24897111
Synthetic Genome Recoding: New genetic codes for new features
Kuo, James; Stirling, Finn; Lau, Yu Heng; Shulgina, Yekaterina; Way, Jeffrey C.; Silver, Pamela A.
2018-01-01
Full genome recoding, or rewriting codon meaning, through chemical synthesis of entire bacterial chromosomes has become feasible in the past several years. Recoding an organism can impart new properties including non-natural amino acid incorporation, virus resistance, and biocontainment. The estimated cost of construction that includes DNA synthesis, assembly by recombination, and troubleshooting, is now comparable to costs of early stage development of drugs or other high-tech products. Here we discuss several recently published assembly methods and provide some thoughts on the future, including how synthetic efforts might benefit from analysis of natural recoding processes and organisms that use alternative genetic codes. PMID:28983660
Adachi, Noboru; Umetsu, Kazuo; Shojo, Hideki
2014-01-01
Mitochondrial DNA (mtDNA) is widely used for DNA analysis of highly degraded samples because of its polymorphic nature and high number of copies in a cell. However, as endogenous mtDNA in deteriorated samples is scarce and highly fragmented, it is not easy to obtain reliable data. In the current study, we report the risks of direct sequencing mtDNA in highly degraded material, and suggest a strategy to ensure the quality of sequencing data. It was observed that direct sequencing data of the hypervariable segment (HVS) 1 by using primer sets that generate an amplicon of 407 bp (long-primer sets) was different from results obtained by using newly designed primer sets that produce an amplicon of 120-139 bp (mini-primer sets). The data aligned with the results of mini-primer sets analysis in an amplicon length-dependent manner; the shorter the amplicon, the more evident the endogenous sequence became. Coding region analysis using multiplex amplified product-length polymorphisms revealed the incongruence of single nucleotide polymorphisms between the coding region and HVS 1 caused by contamination with exogenous mtDNA. Although the sequencing data obtained using long-primer sets turned out to be erroneous, it was unambiguous and reproducible. These findings suggest that PCR primers that produce amplicons shorter than those currently recognized should be used for mtDNA analysis in highly degraded samples. Haplogroup motif analysis of the coding region and HVS should also be performed to improve the reliability of forensic mtDNA data. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Kondoh, H; Paul, B R; Howe, M M
1980-09-01
A general method for constructing lambda specialized transducing phages is described. The method, which is potentially applicable to any gene of Escherichia coli, is based on using Mu DNA homology to direct the integration of a lambda pMu phage near the genes whose transduction is desired. With this method we isolated a lambda transducing phage carrying all 10 genes in the che gene cluster (map location, 41.5 to 42.5 min). The products of the cheA and tar genes were identified by using transducing phages with amber mutations in these genes. It was established that tar codes for methyl-accepting chemotaxis protein II (molecular weight, 62,000) and that cheA codes for two polypeptides (molecular weights, 76,000 and 66,000). Possible origins of the two cheA polypeptides are discussed.
Kotoula, Vassiliki; Lyberopoulou, Aggeliki; Papadopoulou, Kyriaki; Charalambous, Elpida; Alexopoulou, Zoi; Gakou, Chryssa; Lakis, Sotiris; Tsolaki, Eleftheria; Lilakos, Konstantinos; Fountzilas, George
2015-01-01
Background—Aim Massively parallel sequencing (MPS) holds promise for expanding cancer translational research and diagnostics. As yet, it has been applied on paraffin DNA (FFPE) with commercially available highly multiplexed gene panels (100s of DNA targets), while custom panels of low multiplexing are used for re-sequencing. Here, we evaluated the performance of two highly multiplexed custom panels on FFPE DNA. Methods Two custom multiplex amplification panels (B, 373 amplicons; T, 286 amplicons) were coupled with semiconductor sequencing on DNA samples from FFPE breast tumors and matched peripheral blood samples (n samples: 316; n libraries: 332). The two panels shared 37% DNA targets (common or shifted amplicons). Panel performance was evaluated in paired sample groups and quartets of libraries, where possible. Results Amplicon read ratios yielded similar patterns per gene with the same panel in FFPE and blood samples; however, performance of common amplicons differed between panels (p<0.001). FFPE genotypes were compared for 1267 coding and non-coding variant replicates, 999 out of which (78.8%) were concordant in different paired sample combinations. Variant frequency was highly reproducible (Spearman’s rho 0.959). Repeatedly discordant variants were of high coverage / low frequency (p<0.001). Genotype concordance was (a) high, for intra-run duplicates with the same panel (mean±SD: 97.2±4.7, 95%CI: 94.8–99.7, p<0.001); (b) modest, when the same DNA was analyzed with different panels (mean±SD: 81.1±20.3, 95%CI: 66.1–95.1, p = 0.004); and (c) low, when different DNA samples from the same tumor were compared with the same panel (mean±SD: 59.9±24.0; 95%CI: 43.3–76.5; p = 0.282). Low coverage / low frequency variants were validated with Sanger sequencing even in samples with unfavourable DNA quality. Conclusions Custom MPS may yield novel information on genomic alterations, provided that data evaluation is adjusted to tumor tissue FFPE DNA. 
To this scope, eligibility of all amplicons along with variant coverage and frequency need to be assessed. PMID:26039550
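The concordance figures reported above are simple proportions of matching genotype calls. A minimal sketch of that arithmetic follows; the function name, the variant keys, and the toy genotypes are hypothetical, and only the 999 of 1267 figure comes from the abstract.

```python
def concordance(calls_a, calls_b):
    """Compare genotype calls from two libraries of the same sample.

    Both arguments map a variant identifier to a genotype string.
    Only variants present in both libraries are compared. Returns
    (matching, compared, percent_concordant).
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no shared variant positions")
    matches = sum(1 for key in shared if calls_a[key] == calls_b[key])
    return matches, len(shared), 100.0 * matches / len(shared)


# Toy replicate libraries (hypothetical identifiers and genotypes).
lib1 = {"chr1:100": "A/G", "chr1:200": "C/C", "chr2:50": "T/T"}
lib2 = {"chr1:100": "A/G", "chr1:200": "C/T", "chr3:5": "G/G"}
print(concordance(lib1, lib2))

# Overall figure from the abstract: 999 concordant of 1267 replicates.
overall = round(100.0 * 999 / 1267, 1)
print(overall)  # 78.8
```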
Avatar DNA Nanohybrid System in Chip-on-a-Phone
NASA Astrophysics Data System (ADS)
Park, Dae-Hwan; Han, Chang Jo; Shul, Yong-Gun; Choy, Jin-Ho
2014-05-01
Long admired for informational role and recognition function in multidisciplinary science, DNA nanohybrids have been emerging as ideal materials for molecular nanotechnology and genetic information code. Here, we designed an optical machine-readable DNA icon on microarray, Avatar DNA, for automatic identification and data capture such as Quick Response and ColorZip codes. Avatar icon is made of telepathic DNA-DNA hybrids inscribed on chips, which can be identified by camera of smartphone with application software. Information encoded in base-sequences can be accessed by connecting an off-line icon to an on-line web-server network to provide message, index, or URL from database library. Avatar DNA is then converged with nano-bio-info-cogno science: each building block stands for inorganic nanosheets, nucleotides, digits, and pixels. This convergence could address item-level identification that strengthens supply-chain security for drug counterfeits. It can, therefore, provide molecular-level vision through mobile network to coordinate and integrate data management channels for visual detection and recording.
Avatar DNA Nanohybrid System in Chip-on-a-Phone
Park, Dae-Hwan; Han, Chang Jo; Shul, Yong-Gun; Choy, Jin-Ho
2014-01-01
Long admired for informational role and recognition function in multidisciplinary science, DNA nanohybrids have been emerging as ideal materials for molecular nanotechnology and genetic information code. Here, we designed an optical machine-readable DNA icon on microarray, Avatar DNA, for automatic identification and data capture such as Quick Response and ColorZip codes. Avatar icon is made of telepathic DNA-DNA hybrids inscribed on chips, which can be identified by camera of smartphone with application software. Information encoded in base-sequences can be accessed by connecting an off-line icon to an on-line web-server network to provide message, index, or URL from database library. Avatar DNA is then converged with nano-bio-info-cogno science: each building block stands for inorganic nanosheets, nucleotides, digits, and pixels. This convergence could address item-level identification that strengthens supply-chain security for drug counterfeits. It can, therefore, provide molecular-level vision through mobile network to coordinate and integrate data management channels for visual detection and recording. PMID:24824876
Lectin cDNA and transgenic plants derived therefrom
Raikhel, Natasha V.
2000-10-03
Transgenic plants containing cDNA encoding Gramineae lectin are described. The plants preferably contain cDNA coding for barley lectin and store the lectin in the leaves. The transgenic plants, particularly the leaves exhibit insecticidal and fungicidal properties.
The PARTRAC code: Status and recent developments
NASA Astrophysics Data System (ADS)
Friedland, Werner; Kundrat, Pavel
Biophysical modeling is of particular value for predictions of radiation effects due to manned space missions. PARTRAC is an established tool for Monte Carlo-based simulations of radiation track structures, damage induction in cellular DNA and its repair [1]. Dedicated modules describe interactions of ionizing particles with the traversed medium, the production and reactions of reactive species, and score DNA damage determined by overlapping track structures with multi-scale chromatin models. The DNA repair module describes the repair of DNA double-strand breaks (DSB) via the non-homologous end-joining pathway; the code explicitly simulates the spatial mobility of individual DNA ends in parallel with their processing by major repair enzymes [2]. To simulate the yields and kinetics of radiation-induced chromosome aberrations, the repair module has been extended by tracking the information on the chromosome origin of ligated fragments as well as the presence of centromeres [3]. PARTRAC calculations have been benchmarked against experimental data on various biological endpoints induced by photon and ion irradiation. The calculated DNA fragment distributions after photon and ion irradiation reproduce corresponding experimental data and their dose- and LET-dependence. However, in particular for high-LET radiation many short DNA fragments are predicted below the detection limits of the measurements, so that the experiments significantly underestimate DSB yields by high-LET radiation [4]. The DNA repair module correctly describes the LET-dependent repair kinetics after 60Co gamma-rays and different N-ion radiation qualities [2]. First calculations on the induction of chromosome aberrations have overestimated the absolute yields of dicentrics, but correctly reproduced their relative dose-dependence and the difference between gamma- and alpha particle irradiation [3].
Recent developments of the PARTRAC code include a model of hetero- vs euchromatin structures to enable accounting for variations in DNA damage yields, complexity and repair between these regions. Second, the applicability of the code to low-energy ions has been extended to full stopping by using a modified Barkas scaling of proton cross sections for ions heavier than helium. Third, ongoing studies aim at hitherto unprecedented benchmarking of the code against experiments with sub-µm focused bunches of low-LET ions mimicking single high-LET ion tracks [5] which separate effects of damage clustering on a sub-µm scale from DNA damage complexity on a nanometer scale. Fourth, motivated by implications for the involvement of mitochondria in intercellular signaling and radiation-induced bystander effects, ongoing work extends the range of PARTRAC DNA models to radiation effects on mitochondrial DNA. The contribution will discuss the PARTRAC modules, benchmarks to experimental data, recent and ongoing developments of the code, with special attention to its implications and potential applications in radiation protection and space research. Acknowledgement. This work was partially funded by the EU (Contract FP7-249689 ‘DoReMi’). References 1. Friedland et al., Mutat. Res. 711, 28 (2011) 2. Friedland et al., Int. J. Radiat. Biol. 88, 129 (2012) 3. Friedland et al., Mutat. Res. 756, 213 (2013) 4. Alloni et al., Radiat. Res. 179, 690 (2013) 5. Schmid et al., Phys. Med. Biol. 57, 5889 (2012)
Computation of the Genetic Code
NASA Astrophysics Data System (ADS)
Kozlov, Nicolay N.; Kozlova, Olga N.
2018-03-01
One of the problems in developing a mathematical theory of the genetic code (a summary is presented in [1], a detailed account in [2]) is the problem of calculating the genetic code. No similar problem is known elsewhere, and it could only have been posed in the 21st century. This work is devoted to one approach to solving it. For the first time, it provides a detailed description of the method of calculating the genetic code, the idea of which was published earlier [3]; the choice of one of the most important sets for the calculation was based on [4]. This set of amino acids corresponds to a complete set of representations of the overlapping triple genes belonging to the same DNA strand. A separate issue was the initial point that triggers the iterative search over all codes consistent with the initial data. Mathematical analysis has shown that this set contains some ambiguities, which were found thanks to our proposed compressed representation of the set. As a result, the developed method of calculation was limited to two main stages of research, where the first stage only the of the area were used in the calculations. The proposed approach will significantly reduce the amount of computation at each step in this complex discrete structure.
N6-methyladenine: a conserved and dynamic DNA mark
O’Brown, Zach Klapholz; Greer, Eric Lieberman
2017-01-01
Chromatin, consisting of deoxyribonucleic acid (DNA) wrapped around histone proteins, facilitates DNA compaction and allows identical DNA code to confer many different cellular phenotypes. This biological versatility is accomplished in large part by post-translational modifications to histones and chemical modifications to DNA. These modifications direct the cellular machinery to expand or compact specific chromatin regions, and mark regions of the DNA as important for cellular functions. While each of the four bases that make up DNA can be modified (Iyer et al. 2011), this chapter will focus on methylation of the 6th position on adenines (6mA), as this modification has been poorly characterized in recently evolved eukaryotes but shows promise as a new conserved layer of epigenetic regulation. 6mA was previously thought to be restricted to unicellular organisms, but recent work has revealed its presence in more recently evolved metazoa. Here, we will briefly describe the history of 6mA, examine its evolutionary conservation, and evaluate the current methods for detecting 6mA. We will discuss the enzymes that bind and regulate this mark and finally examine known and potential functions of 6mA in eukaryotes. PMID:27826841
NASA Technical Reports Server (NTRS)
Chang, Dong Kyung; Metzgar, David; Wills, Christopher; Boland, C. Richard
2003-01-01
All "minor" components of the human DNA mismatch repair (MMR) system (MSH3, MSH6, PMS2, and the recently discovered MLH3) contain mononucleotide microsatellites in their coding sequences. This intriguing finding contrasts with the situation found in the major components of the DNA MMR system (MSH2 and MLH1) and, in fact, most human genes. Although eukaryotic genomes are rich in microsatellites, non-triplet microsatellites are rare in coding regions. The recurring presence of exonal mononucleotide repeat sequences within a single family of human genes would therefore be considered exceptional.
Delimitation of essential genes of cassava latent virus DNA 2.
Etessami, P; Callis, R; Ellwood, S; Stanley, J
1988-01-01
Insertion and deletion mutagenesis of both extended open reading frames (ORFs) of cassava latent virus DNA 2 destroys infectivity. Infectivity is restored by coinoculating constructs that contain single mutations within different ORFs. Although frequent intermolecular recombination produces dominant parental-type virus, mutants can be retained within the virus population, indicating that they are competent for replication and suggesting that rescue can occur by complementation of trans-acting gene products. By cloning specific fragments into DNA 1 coat protein deletion vectors we have delimited the DNA 2 coding regions and provide substantive evidence that both are essential for virus infection. Although a DNA 2 component is unique to whitefly-transmitted geminiviruses, the results demonstrate that neither coding region is involved solely in insect transmission. The requirement for a bipartite genome for whitefly-transmitted geminiviruses is discussed. PMID:3387209
A discriminatory function for prediction of protein-DNA interactions based on alpha shape modeling.
Zhou, Weiqiang; Yan, Hong
2010-10-15
Protein-DNA interaction has significant importance in many biological processes. However, the underlying principle of the molecular recognition process is still largely unknown. As more high-resolution 3D structures of protein-DNA complexes are becoming available, the surface characteristics of the complex become an important research topic. In our work, we apply an alpha shape model to represent the surface structure of the protein-DNA complex and develop an interface-atom curvature-dependent conditional probability discriminatory function for the prediction of protein-DNA interaction. The interface-atom curvature-dependent formalism captures atomic interaction details better than the atomic distance-based method. The proposed method provides good performance in discriminating the native structures from the docking decoy sets, and outperforms the distance-dependent formalism in terms of the z-score. Computer experiment results show that the curvature-dependent formalism with the optimal parameters can achieve a native z-score of -8.17 in discriminating the native structure from the highest surface-complementarity scored decoy set and a native z-score of -7.38 in discriminating the native structure from the lowest RMSD decoy set. The interface-atom curvature-dependent formalism can also be used to predict apo versions of DNA-binding proteins. These results suggest that the interface-atom curvature-dependent formalism has a good prediction capability for protein-DNA interactions. The code and data sets are available for download at http://www.hy8.com/bioinformatics.htm. Contact: kenandzhou@hotmail.com.
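A native z-score of the kind reported above is conventionally the native structure's score expressed in standard deviations from the mean score of the decoy set, so more negative values mean better separation. The sketch below assumes that conventional definition and a population standard deviation; the abstract does not spell out either choice, and the function name and toy scores are hypothetical.

```python
from statistics import mean, pstdev


def native_z_score(native_score, decoy_scores):
    """Z-score of the native structure relative to a set of decoys.

    Assumes lower scores are better and uses the population standard
    deviation of the decoy scores (an assumption, not taken from the
    abstract).
    """
    spread = pstdev(decoy_scores)
    if spread == 0:
        raise ValueError("decoy scores have zero variance")
    return (native_score - mean(decoy_scores)) / spread


# Toy scores (hypothetical): the native scores far below the decoys.
z = native_z_score(-10.0, [-2.0, 0.0, 2.0])
print(round(z, 3))
```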
Alvarado, David M; Yang, Ping; Druley, Todd E; Lovett, Michael; Gurnett, Christina A
2014-06-01
Despite declining sequencing costs, few methods are available for cost-effective single-nucleotide polymorphism (SNP), insertion/deletion (INDEL) and copy number variation (CNV) discovery in a single assay. Commercially available methods require a high investment to a specific region and are only cost-effective for large samples. Here, we introduce a novel, flexible approach for multiplexed targeted sequencing and CNV analysis of large genomic regions called multiplexed direct genomic selection (MDiGS). MDiGS combines biotinylated bacterial artificial chromosome (BAC) capture and multiplexed pooled capture for SNP/INDEL and CNV detection of 96 multiplexed samples on a single MiSeq run. MDiGS is advantageous over other methods for CNV detection because pooled sample capture and hybridization to large contiguous BAC baits reduces sample and probe hybridization variability inherent in other methods. We performed MDiGS capture for three chromosomal regions consisting of ∼ 550 kb of coding and non-coding sequence with DNA from 253 patients with congenital lower limb disorders. PITX1 nonsense and HOXC11 S191F missense mutations were identified that segregate in clubfoot families. Using a novel pooled-capture reference strategy, we identified recurrent chromosome chr17q23.1q23.2 duplications and small HOXC 5' cluster deletions (51 kb and 12 kb). Given the current interest in coding and non-coding variants in human disease, MDiGS fulfills a niche for comprehensive and low-cost evaluation of CNVs, coding, and non-coding variants across candidate regions of interest. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Systematic analysis and evolution of 5S ribosomal DNA in metazoans.
Vierna, J; Wehner, S; Höner zu Siederdissen, C; Martínez-Lage, A; Marz, M
2013-11-01
Several studies on 5S ribosomal DNA (5S rDNA) have been focused on a subset of the following features in mostly one organism: number of copies, pseudogenes, secondary structure, promoter and terminator characteristics, genomic arrangements, types of non-transcribed spacers and evolution. In this work, we systematically analyzed 5S rDNA sequence diversity in available metazoan genomes, and showed organism-specific and evolutionary-conserved features. Putatively functional sequences (12,766) from 97 organisms allowed us to identify general features of this multigene family in animals. Interestingly, we show that each mammal species has a highly conserved (housekeeping) 5S rRNA type and many variable ones. The genomic organization of 5S rDNA is still under debate. Here, we report the occurrence of several paralog 5S rRNA sequences in 58 of the examined species, and a flexible genome organization of 5S rDNA in animals. We found heterogeneous 5S rDNA clusters in several species, supporting the hypothesis of an exchange of 5S rDNA from one locus to another. A rather high degree of variation of upstream, internal and downstream putative regulatory regions appears to characterize metazoan 5S rDNA. We systematically studied the internal promoters and described three different types of termination signals, as well as variable distances between the coding region and the typical termination signal. Finally, we present a statistical method for detection of linkage among noncoding RNA (ncRNA) gene families. This method showed no evolutionary-conserved linkage among 5S rDNAs and any other ncRNA genes within Metazoa, even though we found 5S rDNA to be linked to various ncRNAs in several clades.
Systematic analysis and evolution of 5S ribosomal DNA in metazoans
Vierna, J; Wehner, S; Höner zu Siederdissen, C; Martínez-Lage, A; Marz, M
2013-01-01
Several studies on 5S ribosomal DNA (5S rDNA) have been focused on a subset of the following features in mostly one organism: number of copies, pseudogenes, secondary structure, promoter and terminator characteristics, genomic arrangements, types of non-transcribed spacers and evolution. In this work, we systematically analyzed 5S rDNA sequence diversity in available metazoan genomes, and showed organism-specific and evolutionary-conserved features. Putatively functional sequences (12 766) from 97 organisms allowed us to identify general features of this multigene family in animals. Interestingly, we show that each mammal species has a highly conserved (housekeeping) 5S rRNA type and many variable ones. The genomic organization of 5S rDNA is still under debate. Here, we report the occurrence of several paralog 5S rRNA sequences in 58 of the examined species, and a flexible genome organization of 5S rDNA in animals. We found heterogeneous 5S rDNA clusters in several species, supporting the hypothesis of an exchange of 5S rDNA from one locus to another. A rather high degree of variation of upstream, internal and downstream putative regulatory regions appears to characterize metazoan 5S rDNA. We systematically studied the internal promoters and described three different types of termination signals, as well as variable distances between the coding region and the typical termination signal. Finally, we present a statistical method for detection of linkage among noncoding RNA (ncRNA) gene families. This method showed no evolutionary-conserved linkage among 5S rDNAs and any other ncRNA genes within Metazoa, even though we found 5S rDNA to be linked to various ncRNAs in several clades. PMID:23838690
DNA fingerprinting of Chinese melon provides evidentiary support of seed quality appraisal.
Gao, Peng; Ma, Hongyan; Luan, Feishi; Song, Haibin
2012-01-01
Melon, Cucumis melo L., is an important vegetable crop worldwide. At present, homonyms and synonyms occur in the melon seed markets of China, which could cause variety authenticity issues affecting melon breeding, production, marketing and other aspects. Molecular markers, especially microsatellites or simple sequence repeats (SSRs), are playing increasingly important roles in cultivar identification. The aim of this study was to construct a DNA fingerprinting database of major melon cultivars, which could support the establishment of a technical standard system for purity and authenticity identification of melon seeds. In this study, to develop the core set of SSR markers, 470 polymorphic SSRs were selected as candidate markers from 1219 SSRs using 20 representative melon varieties (lines). Eighteen SSR markers, evenly distributed across the genome and with the highest polymorphism information content (PIC), were identified as the core marker set for melon DNA fingerprinting analysis. Fingerprint codes for 471 melon varieties (lines) were established. Fifty-one materials were classified into 17 groups based on sharing the same fingerprint code, and field trait surveys showed that plants in the same group were synonyms, having the same or similar field characters. Furthermore, DNA fingerprinting quick response (QR) codes of 471 melon varieties (lines) were constructed. Owing to its fast readability and large storage capacity, QR coding of melon DNA fingerprints favors convenient reading and commercial applications.
DNA Fingerprinting of Chinese Melon Provides Evidentiary Support of Seed Quality Appraisal
Gao, Peng; Ma, Hongyan; Luan, Feishi; Song, Haibin
2012-01-01
Melon, Cucumis melo L., is an important vegetable crop worldwide. At present, homonyms and synonyms occur in the melon seed markets of China, which could cause variety authenticity issues affecting melon breeding, production, marketing and other aspects. Molecular markers, especially microsatellites or simple sequence repeats (SSRs), are playing increasingly important roles in cultivar identification. The aim of this study was to construct a DNA fingerprinting database of major melon cultivars, which could support the establishment of a technical standard system for purity and authenticity identification of melon seeds. In this study, to develop the core set of SSR markers, 470 polymorphic SSRs were selected as candidate markers from 1219 SSRs using 20 representative melon varieties (lines). Eighteen SSR markers, evenly distributed across the genome and with the highest polymorphism information content (PIC), were identified as the core marker set for melon DNA fingerprinting analysis. Fingerprint codes for 471 melon varieties (lines) were established. Fifty-one materials were classified into 17 groups based on sharing the same fingerprint code, and field trait surveys showed that plants in the same group were synonyms, having the same or similar field characters. Furthermore, DNA fingerprinting quick response (QR) codes of 471 melon varieties (lines) were constructed. Owing to its fast readability and large storage capacity, QR coding of melon DNA fingerprints favors convenient reading and commercial applications. PMID:23285039
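The abstract above ranks SSR markers by polymorphism information content (PIC) but does not state how PIC was computed. The following is a minimal sketch assuming the commonly used definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, where p_i are the allele frequencies at one marker. The function name `pic` and the example allele counts are illustrative and do not come from the study.

```python
from itertools import combinations


def pic(allele_counts):
    """Polymorphism information content for one marker.

    allele_counts: observed counts (or frequencies) of each allele.
    Assumes the common definition:
        PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    total = float(sum(allele_counts))
    if total <= 0:
        raise ValueError("allele counts must sum to a positive number")
    p = [c / total for c in allele_counts]
    # Probability-like term for identical alleles.
    homozygosity = sum(x * x for x in p)
    # Correction for uninformative pairings of heterozygous genotypes.
    cross_term = sum(2.0 * a * a * b * b for a, b in combinations(p, 2))
    return 1.0 - homozygosity - cross_term


if __name__ == "__main__":
    # Hypothetical marker with four equally frequent alleles.
    print(round(pic([10, 10, 10, 10]), 6))
```

Under this definition a monomorphic marker scores 0, and the score rises as alleles become more numerous and more evenly distributed, which is why high-PIC markers are preferred for a core fingerprinting set.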
Multiple pathogen biomarker detection using an encoded bead array in droplet PCR.
Periyannan Rajeswari, Prem Kumar; Soderberg, Lovisa M; Yacoub, Alia; Leijon, Mikael; Andersson Svahn, Helene; Joensson, Haakan N
2017-08-01
We present a droplet PCR workflow for detection of multiple pathogen DNA biomarkers using fluorescent color-coded Luminex® beads. This strategy enables encoding of multiple singleplex droplet PCRs using a commercially available bead set of several hundred distinguishable fluorescence codes. This workflow provides scalability beyond the limited number offered by fluorescent detection probes such as TaqMan probes, commonly used in current multiplex droplet PCRs. The workflow was validated for three different Luminex bead sets coupled to target specific capture oligos to detect hybridization of three microorganisms infecting poultry: avian influenza, infectious laryngotracheitis virus and Campylobacter jejuni. In this assay, the target DNA was amplified with fluorescently labeled primers by PCR in parallel in monodisperse picoliter droplets, to avoid amplification bias. The color codes of the Luminex detection beads allowed concurrent and accurate classification of the different bead sets used in this assay. The hybridization assay detected target DNA of all three microorganisms with high specificity, from samples with average target concentration of a single DNA template molecule per droplet. This workflow demonstrates the possibility of increasing the droplet PCR assay detection panel to detect large numbers of targets in parallel, utilizing the scalability offered by the color-coded Luminex detection beads. Copyright © 2017. Published by Elsevier B.V.
The histone codes for meiosis.
Wang, Lina; Xu, Zhiliang; Khawar, Muhammad Babar; Liu, Chao; Li, Wei
2017-09-01
Meiosis is a specialized process that produces haploid gametes from diploid cells by a single round of DNA replication followed by two successive cell divisions. It involves many special events, such as programmed DNA double-strand break (DSB) formation, homologous recombination, crossover formation and resolution. These events are associated with dynamically regulated chromosomal structures; the dynamic transcriptional regulation and chromatin remodeling involved are mainly modulated by histone modifications, termed 'histone codes'. The purpose of this review is to summarize the histone codes that are required for meiosis during spermatogenesis and oogenesis, involving meiosis resumption, meiotic asymmetric division and other cellular processes. We not only systematically review the functional roles of histone codes in meiosis but also discuss future trends and perspectives in this field. © 2017 Society for Reproduction and Fertility.
Rapid screening for nuclear genes mutations in isolated respiratory chain complex I defects.
Pagniez-Mammeri, Hélène; Lombes, Anne; Brivet, Michèle; Ogier-de Baulny, Hélène; Landrieu, Pierre; Legrand, Alain; Slama, Abdelhamid
2009-04-01
Complex I or reduced nicotinamide adenine dinucleotide (NADH): ubiquinone oxidoreductase deficiency is the most common cause of respiratory chain defects. Molecular bases of complex I deficiencies are rarely identified because of the dual genetic origin of this multi-enzymatic complex (nuclear DNA and mitochondrial DNA) and the lack of phenotype-genotype correlation. We used a rapid method to screen patients with isolated complex I deficiencies for nuclear gene mutations by Surveyor nuclease digestion of cDNAs. Eight complex I nuclear genes, among the most frequently mutated (NDUFS1, NDUFS2, NDUFS3, NDUFS4, NDUFS7, NDUFS8, NDUFV1 and NDUFV2), were studied in 22 cDNA fragments spanning their coding sequences in 8 patients with a biochemically proven complex I deficiency. Single nucleotide polymorphisms and missense mutations were detected in 18.7% of the cDNA fragments by Surveyor nuclease treatment. Molecular defects were detected in 3 patients. Surveyor nuclease screening is a reliable method for genotyping nuclear complex I deficiencies, is easy to interpret, and limits the number of sequence reactions. Its use will enhance the possibility of prenatal diagnosis and contribute to a better understanding of complex I molecular defects.
Röper, Andrea; Reichert, Walter; Mattern, Rainer
2007-01-01
In the field of forensic DNA typing, the analysis of Short Tandem Repeats (STRs) can fail in cases of degraded DNA. The typing of coding region Single Nucleotide Polymorphisms (SNPs) of the mitochondrial genome provides an approach to acquire additional information. In the examined case of aggravated theft, SNP typing excluded both suspects as the source of the analyzed hair left at the crime scene. This conclusion was not possible after STR typing alone. SNP typing of the trace on the torch light left at the crime scene increased the likelihood that suspect no. 2 was the origin of this trace, a finding already indicated by STR analysis. Suspect no. 1 was excluded as the origin of this trace by SNP typing, which was also indicated by STR analysis. A limiting factor for the analysis of SNPs is the maternal inheritance of mitochondrial DNA: individualisation is not possible. In conclusion, for traces that cause problems with conventional STR typing, the supplementary analysis of coding region SNPs from the mitochondrial genome is very reasonable and greatly contributes to the refinement of analysis methods in the field of forensic genetics.
SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments
Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic
2001-01-01
Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202
Publishing large DNA sequence data in reduced spaces and lasting formats, in paper or PDF.
Aguiar, Alexandre Pires
2013-02-04
Scientific publications carry a practical moral duty: they must last. Along that line of thinking, some methods are proposed to allow economically and structurally viable publication of DNA sequence data of any size in printed matter and PDFs. The proposal is primarily aimed at helping to preserve information for the future, while allowing authors to avoid splitting information and storing supplementary material ex situ, that is, on servers outside the publication proper. The technique may also help to solve the impasse between the ICZN Code requirement that a new nomen be associated with diagnostic characters for the taxon and the phylogenetic definition of taxa, based on cladograms only: sequence data are characters, and can now be easily and comfortably included in taxonomic publications, with direct textual reference to their diagnostic sections. The compression level achieved allows the inclusion of all wanted DNA or RNA sequences in the same printed matter or PDF publications where the sequences are cited and discussed. Reduced font sizes, invisible fonts, and original 2D black & white and color barcodes are illustrated and briefly discussed. The level of data compression achieved can allow each full page of sequence data, or about 5000 characters, to be precisely coded into a color barcode as small as a square of 1.5 mm. A practical example is provided with Taeniogonalos woodorum Smith (Hymenoptera, Trigonalidae). Free software to generate publishable barcodes from txt or FASTA files is provided at www.systaxon.ufes.br/dna.
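To give a feel for why a page of about 5000 sequence characters can fit in a very small coded area, the sketch below packs an unambiguous DNA string at 2 bits per base. This is only an illustration of the information content involved; it is not the barcode scheme described in the abstract, and the names `pack_dna` and `unpack_dna` are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_dna(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    seq = seq.upper()
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]  # KeyError on ambiguous bases
        # Left-align a short final chunk so decoding is uniform.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_dna(data, length):
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


if __name__ == "__main__":
    page = "ACGT" * 1250  # about one page: 5000 characters
    packed = pack_dna(page)
    print(len(page), "bases ->", len(packed), "bytes")
    assert unpack_dna(packed, len(page)) == page
```

At 2 bits per base, 5000 characters need 1250 bytes before any further compression; the original length has to be stored alongside the bytes because the last byte may be padded.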
Antimicrobial peptide evolution in the Asiatic honey bee Apis cerana.
Xu, Peng; Shi, Min; Chen, Xue-Xin
2009-01-01
The Asiatic honeybee, Apis cerana Fabricius, is an important honeybee species in Asian countries. It is still found in the wild, but is also one of the few bee species that can be domesticated. It has acquired some genetic advantages and significantly different biological characteristics compared with other Apis species. However, it has been less studied, and over the past two decades, has become a threatened species in China. We designed primers for the sequences of the four antimicrobial peptide cDNA gene families (abaecin, defensin, apidaecin, and hymenoptaecin) of the Western honeybee, Apis mellifera L. and identified all the antimicrobial peptide cDNA genes in the Asiatic honeybee for the first time. All the sequences were amplified by reverse transcriptase-polymerase chain reaction (RT-PCR). In all, 29 different defensin cDNA genes coding 7 different defensin peptides, 11 different abaecin cDNA genes coding 2 different abaecin peptides, 13 different apidaecin cDNA genes coding 4 apidaecin peptides and 34 different hymenoptaecin cDNA genes coding 13 different hymenoptaecin peptides were cloned and identified from the Asiatic honeybee adult workers. Detailed comparison of these four antimicrobial peptide gene families with those of the Western honeybee revealed that there are many similarities in the quantity and amino acid components of peptides in the abaecin, defensin and apidaecin families, while many more hymenoptaecin peptides are found in the Asiatic honeybee than those in the Western honeybee (13 versus 1). The results indicated that the Asiatic honeybee adult generated more variable antimicrobial peptides, especially hymenoptaecin peptides than the Western honeybee when stimulated by pathogens or injury. This suggests that, compared to the Western honeybee that has a longer history of domestication, selection on the Asiatic honeybee has favored the generation of more variable antimicrobial peptides as protection against pathogens.
Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P
1988-02-01
Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators.
Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P
1988-01-01
Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators. PMID:3257578
A Binary-Encounter-Bethe Approach to Simulate DNA Damage by the Direct Effect
NASA Technical Reports Server (NTRS)
Plante, Ianik; Cucinotta, Francis A.
2013-01-01
DNA damage is of crucial importance in understanding the effects of ionizing radiation. The main mechanisms of DNA damage are the direct effect of radiation (e.g. direct ionization) and the indirect effect (e.g. damage by ·OH radicals created by the radiolysis of water). Despite years of research in this area, many questions on the formation of DNA damage remain. To refine existing DNA damage models, an approach based on the Binary-Encounter-Bethe (BEB) model was developed [1]. This model calculates differential cross sections for ionization of the molecular orbitals of the DNA bases, sugars and phosphates using the electron binding energy, the mean kinetic energy and the occupancy number of the orbital. This cross section has an analytic form which is quite convenient to use and allows the sampling of the energy loss occurring during an ionization event. To simulate the radiation track structure, the code RITRACKS developed at the NASA Johnson Space Center is used [2]. This code calculates all the energy deposition events and the formation of the radiolytic species by the ion and the secondary electrons as well. We have also developed a technique to use the integrated BEB cross section for the bases, sugar and phosphates in the radiation transport code RITRACKS. These techniques should allow the simulation of DNA damage by ionizing radiation, and understanding of the formation of double-strand breaks caused by clustered damage in different conditions.
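The abstract notes that the BEB cross section has a convenient analytic form depending on the orbital binding energy B, the mean orbital kinetic energy U and the occupancy number N. As a minimal sketch, the code below evaluates the standard integrated BEB expression for electron-impact ionization of a single orbital, sigma = S / (t + u + 1) * [ (ln t / 2)(1 - 1/t^2) + 1 - 1/t - ln t / (t + 1) ], with t = T/B, u = U/B and S = 4 pi a0^2 N (R/B)^2. It is an assumption that this is the integrated form used by the authors; the function name is hypothetical, the code is not taken from RITRACKS, and the hydrogen-atom values in the usage example are for illustration only, not DNA orbital data.

```python
import math

BOHR_RADIUS_ANGSTROM = 0.529177  # a0
RYDBERG_EV = 13.6057             # R


def beb_total_cross_section(T, B, U, N):
    """Integrated BEB ionization cross section for one orbital, in A^2.

    T: incident electron kinetic energy (eV)
    B: orbital binding energy (eV)
    U: mean orbital kinetic energy (eV)
    N: orbital occupancy number
    """
    if T <= B:
        return 0.0  # below threshold, ionization is not possible
    t = T / B
    u = U / B
    S = 4.0 * math.pi * BOHR_RADIUS_ANGSTROM ** 2 * N * (RYDBERG_EV / B) ** 2
    ln_t = math.log(t)
    bracket = 0.5 * ln_t * (1.0 - 1.0 / t ** 2) + 1.0 - 1.0 / t - ln_t / (t + 1.0)
    return S / (t + u + 1.0) * bracket


if __name__ == "__main__":
    # Illustrative case: hydrogen 1s orbital (B = U = 13.6057 eV, N = 1).
    for energy in (20.0, 50.0, 100.0, 1000.0):
        sigma = beb_total_cross_section(energy, RYDBERG_EV, RYDBERG_EV, 1)
        print(f"T = {energy:7.1f} eV  sigma = {sigma:.4f} A^2")
```

For a molecule, the total ionization cross section would be obtained by summing this quantity over all occupied orbitals, each with its own B, U and N.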
Prody, C A; Zevin-Sonkin, D; Gnatt, A; Goldberg, O; Soreq, H
1987-01-01
To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase (BtChoEase; EC 3.1.1.8) and Torpedo electric organ "true" acetylcholinesterase (AcChoEase; EC 3.1.1.7). Using these probes, we isolated several cDNA clones from lambda gt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A)+ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments of the total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These findings demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species. PMID:3035536
Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao
2018-01-01
Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNA-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients' clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to complete the identification and visualization of differentially methylated lncRNAs on local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. PMID:29069510
Su, Huei-Jiun; Hu, Jer-Ming
2012-01-01
Background and Aims The holoparasitic flowering plant Balanophora displays extreme floral reduction and was previously found to have enormous rate acceleration in the nuclear 18S rDNA region. So far, it remains unclear whether non-ribosomal, protein-coding genes of Balanophora also evolve in an accelerated fashion and whether the genes with high substitution rates retain their functionality. To tackle these issues, six different genes were sequenced from two Balanophora species and their rate variation and expression patterns were examined. Methods Sequences including nuclear PI, euAP3, TM6, LFY and RPB2 and mitochondrial matR were determined from two Balanophora spp. and compared with selected hemiparasitic species of Santalales and autotrophic core eudicots. Gene expression was detected for the six protein-coding genes and the expression patterns of the three B-class genes (PI, AP3 and TM6) were further examined across different organs of B. laxiflora using RT-PCR. Key Results Balanophora mitochondrial matR is highly accelerated in both nonsynonymous (dN) and synonymous (dS) substitution rates, whereas the rate variation of nuclear genes LFY, PI, euAP3, TM6 and RPB2 are less dramatic. Significant dS increases were detected in Balanophora PI, TM6, RPB2 and dN accelerations in euAP3. All of the protein-coding genes are expressed in inflorescences, indicative of their functionality. PI is restrictively expressed in tepals, synandria and floral bracts, whereas AP3 and TM6 are widely expressed in both male and female inflorescences. Conclusions Despite the observation that rates of sequence evolution are generally higher in Balanophora than in hemiparasitic species of Santalales and autotrophic core eudicots, the five nuclear protein-coding genes are functional and are evolving at a much slower rate than 18S rDNA. 
The mechanism or mechanisms responsible for rapid sequence evolution and concomitant rate acceleration for 18S rDNA and matR are currently not well understood and require further study in Balanophora and other holoparasites. PMID:23041381
Position specific variation in the rate of evolution in transcription factor binding sites
Moses, Alan M; Chiang, Derek Y; Kellis, Manolis; Lander, Eric S; Eisen, Michael B
2003-01-01
Background The binding sites of sequence specific transcription factors are an important and relatively well-understood class of functional non-coding DNAs. Although a wide variety of experimental and computational methods have been developed to characterize transcription factor binding sites, they remain difficult to identify. Comparison of non-coding DNA from related species has shown considerable promise in identifying these functional non-coding sequences, even though relatively little is known about their evolution. Results Here we analyse the genome sequences of the budding yeasts Saccharomyces cerevisiae, S. bayanus, S. paradoxus and S. mikatae to study the evolution of transcription factor binding sites. As expected, we find that both experimentally characterized and computationally predicted binding sites evolve slower than surrounding sequence, consistent with the hypothesis that they are under purifying selection. We also observe position-specific variation in the rate of evolution within binding sites. We find that the position-specific rate of evolution is positively correlated with degeneracy among binding sites within S. cerevisiae. We test theoretical predictions for the rate of evolution at positions where the base frequencies deviate from background due to purifying selection and find reasonable agreement with the observed rates of evolution. Finally, we show how the evolutionary characteristics of real binding motifs can be used to distinguish them from artefacts of computational motif finding algorithms. Conclusion As has been observed for protein sequences, the rate of evolution in transcription factor binding sites varies with position, suggesting that some regions are under stronger functional constraint than others. This variation likely reflects the varying importance of different positions in the formation of the protein-DNA complex. 
The characterization of the pattern of evolution in known binding sites will likely contribute to the effective use of comparative sequence data in the identification of transcription factor binding sites and is an important step toward understanding the evolution of functional non-coding DNA. PMID:12946282
Chromatin accessibility prediction via a hybrid deep convolutional neural network.
Liu, Qiao; Xia, Fei; Yin, Qijin; Jiang, Rui
2018-03-01
A majority of known genetic variants associated with human-inherited diseases lie in non-coding regions that lack adequate interpretation, making it indispensable to systematically discover functional sites at the whole genome level and precisely decipher their implications in a comprehensive manner. Although computational approaches have been complementing high-throughput biological experiments towards the annotation of the human genome, it still remains a big challenge to accurately annotate regulatory elements in the context of a specific cell type via automatic learning of the DNA sequence code from large-scale sequencing data. Indeed, the development of an accurate and interpretable model to learn the DNA sequence signature and further enable the identification of causative genetic variants has become essential in both genomic and genetic studies. We proposed Deopen, a hybrid framework mainly based on a deep convolutional neural network, to automatically learn the regulatory code of DNA sequences and predict chromatin accessibility. In a series of comparisons with existing methods, we show the superior performance of our model not only in the classification of accessible regions against background sequences sampled at random, but also in the regression of DNase-seq signals. In addition, we visualize the convolutional kernels and show that the identified sequence signatures match known motifs. We finally demonstrate the sensitivity of our model in finding causative noncoding variants in the analysis of a breast cancer dataset. We expect to see wide applications of Deopen with either public or in-house chromatin accessibility data in the annotation of the human genome and the identification of non-coding variants associated with diseases. Deopen is freely available at https://github.com/kimmo1019/Deopen. ruijiang@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. 
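The abstract describes convolutional kernels that learn sequence signatures resembling known motifs. The sketch below illustrates the two basic operations such a model rests on: one-hot encoding of a DNA sequence and sliding a single kernel along it to obtain a score at each position. It is a plain-Python illustration under simple assumptions (A/C/G/T channel order, an all-zero row for any other symbol, a hand-written kernel instead of a learned one) and is not code from Deopen.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Symbols other than A/C/G/T (e.g. N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def scan(encoded, kernel):
    """Slide a (width x 4) kernel over an encoded sequence.

    Returns one score per valid start position, i.e. the output of a
    single 1D convolution filter without padding, bias or activation.
    """
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for k in range(width):
            for channel in range(4):
                score += encoded[start + k][channel] * kernel[k][channel]
        scores.append(score)
    return scores


if __name__ == "__main__":
    sequence = "TTACGT"
    # Hypothetical kernel that responds to the exact 3-mer "ACG".
    kernel = one_hot("ACG")
    print(scan(one_hot(sequence), kernel))
```

In a trained network the kernel weights are real-valued and learned from data, so a kernel behaves like a position weight matrix; comparing learned kernels with known motifs is the kind of visualization the abstract refers to.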
Longkumer, Toshisangba; Kamireddy, Swetha; Muthyala, Venkateswar Reddy; Akbarpasha, Shaikh; Pitchika, Gopi Krishna; Kodetham, Gopinath; Ayaluru, Murali; Siddavattam, Dayananda
2013-01-01
While analyzing plasmids of Acinetobacter sp. DS002 we have detected a circular DNA molecule pTS236, which upon further investigation is identified as the genome of a phage. The phage genome has shown sequence similarity to the recently discovered Sphinx 2.36 DNA sequence co-purified with the Transmissible Spongiform Encephalopathy (TSE) particles isolated from infected brain samples collected from diverse geographical regions. As in Sphinx 2.36, the phage genome also codes for three proteins. One of them codes for RepA and is shown to be involved in replication of pTS236 through rolling circle (RC) mode. The other two translationally coupled ORFs, orf106 and orf96, code for coat proteins of the phage. Although an orf96 homologue was not previously reported in Sphinx 2.36, a closer examination of DNA sequence of Sphinx 2.36 revealed its presence downstream of orf106 homologue. TEM images and infection assays revealed existence of phage AbDs1 in Acinetobacter sp. DS002.
Longkumer, Toshisangba; Kamireddy, Swetha; Muthyala, Venkateswar Reddy; Akbarpasha, Shaikh; Pitchika, Gopi Krishna; Kodetham, Gopinath; Ayaluru, Murali; Siddavattam, Dayananda
2013-01-01
While analyzing plasmids of Acinetobacter sp. DS002 we have detected a circular DNA molecule pTS236, which upon further investigation is identified as the genome of a phage. The phage genome has shown sequence similarity to the recently discovered Sphinx 2.36 DNA sequence co-purified with the Transmissible Spongiform Encephalopathy (TSE) particles isolated from infected brain samples collected from diverse geographical regions. As in Sphinx 2.36, the phage genome also codes for three proteins. One of them codes for RepA and is shown to be involved in replication of pTS236 through rolling circle (RC) mode. The other two translationally coupled ORFs, orf106 and orf96, code for coat proteins of the phage. Although an orf96 homologue was not previously reported in Sphinx 2.36, a closer examination of DNA sequence of Sphinx 2.36 revealed its presence downstream of orf106 homologue. TEM images and infection assays revealed existence of phage AbDs1 in Acinetobacter sp. DS002. PMID:23867905
NASA Astrophysics Data System (ADS)
Dumitrica, Traian; Hourahine, Ben; Aradi, Balint; Frauenheim, Thomas
We discuss the coupling of objective boundary conditions into the SCC density functional-based tight binding code DFTB+. The implementation is enabled by a generalization of the classical Ewald method to the helical case, specifically by Ewald-like formulas that do not rely on a unit cell with translational symmetry. The robustness of the method in addressing complex hetero-nuclear nano- and bio-fibrous systems is demonstrated with illustrative simulations on a helical boron nitride nanotube, a screw-dislocated zinc oxide nanowire, and an ideal double-strand DNA. Work supported by NSF CMMI 1332228.
Reicher, S; Seroussi, E; Weller, J I; Rosov, A; Gootwine, E
2012-07-01
Polymorphisms in mitochondrial DNA (mtDNA) protein- and tRNA-coding genes were shown to be associated with various diseases in humans as well as with production and reproduction traits in livestock. Alignment of full-length mitochondrial sequences from the 5 known ovine haplogroups: HA (n = 3), HB (n = 5), HC (n = 3), HD (n = 2), and HE (n = 2; GenBank accession nos. HE577847-50 and 11 published complete ovine mitochondrial sequences) revealed sequence variation in 10 out of the 13 protein-coding mtDNA sequences. Twenty-six of the 245 variable sites found in the protein-coding sequences represent non-synonymous mutations. Sequence variation was also observed in 8 out of the 22 tRNA mtDNA sequences. On the basis of the mtDNA control region and cytochrome b partial sequences along with information on maternal lineages within an Afec-Assaf flock, 1,126 Afec-Assaf ewes were assigned to mitochondrial haplogroups HA, HB, and HC, with frequencies of 0.43, 0.43, and 0.14, respectively. Analysis of birth weight and growth rate records of lambs (n = 1,286) and productivity from 4,993 lambing records revealed no association between mitochondrial haplogroup affiliation and female longevity, lamb perinatal survival rate, birth weight, and daily growth rate of lambs up to 150 d, which averaged 1,664 d, 88.3%, 4.5 kg, and 320 g/d, respectively. However, significant (P < 0.0001) differences among the haplogroups were found for prolificacy of ewes, with prolificacies (mean ± SE) of 2.14 ± 0.04, 2.25 ± 0.04, and 2.30 ± 0.06 lambs born/ewe lambing for the HA, HB, and HC haplogroups, respectively. Our results highlight the ovine mitogenome genetic variation in protein- and tRNA-coding genes and suggest that sequence variation in ovine mtDNA is associated with variation in ewe prolificacy.
Fan, SiGang; Hu, ChaoQun; Wen, Jing; Zhang, LvPing
2011-05-01
The complete mitochondrial DNA sequence contains useful information for phylogenetic analyses of metazoa. In this study, the complete mitochondrial DNA sequence of the sea cucumber Stichopus horrens (Holothuroidea: Stichopodidae: Stichopus) is presented. The complete sequence was determined using normal and long PCRs. The mitochondrial genome of Stichopus horrens is a circular molecule 16,257 bp long, composed of 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes. Most of these genes are coded on the heavy strand except for one protein-coding gene (nad6) and five tRNA genes (tRNA-Ser(UCN), tRNA-Gln, tRNA-Ala, tRNA-Val, tRNA-Asp), which are coded on the light strand. The composition of the heavy strand is 30.8% A, 23.7% C, 16.2% G, and 29.3% T bases (AT skew = 0.025; GC skew = -0.188). A non-coding region of 675 bp was identified as a putative control region because of its location and AT richness. The intergenic spacers range from 1 to 50 bp in size, totaling 227 bp. A total of 25 overlapping nucleotides, ranging from 1 to 10 bp in size, exist among 11 genes. All 13 protein-coding genes are initiated with an ATG. The TAA codon is used as the stop codon in all the protein-coding genes except nad3 and nad4, which use TAG as their termination codon. The most frequently used amino acids are Leu (16.29%), Ser (10.34%) and Phe (8.37%). All of the tRNA genes have the potential to fold into typical cloverleaf secondary structures. We also compared the order of the genes in the mitochondrial DNA from the five holothurians that are now available and found a novel gene arrangement in the mitochondrial DNA of Stichopus horrens.
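The skew values quoted in this abstract can be checked from the reported base composition. The sketch below assumes the conventional definitions, AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C); the abstract does not state which formula the authors used, and the function and variable names are illustrative only.

```python
def skew(x: float, y: float) -> float:
    """Return (x - y) / (x + y), the conventional strand-asymmetry measure."""
    return (x - y) / (x + y)


# Heavy-strand base composition in percent, as reported in the abstract.
composition = {"A": 30.8, "C": 23.7, "G": 16.2, "T": 29.3}

at_skew = skew(composition["A"], composition["T"])
gc_skew = skew(composition["G"], composition["C"])

print(f"AT skew = {at_skew:.3f}")
print(f"GC skew = {gc_skew:.3f}")
```

Under that assumption the rounded results agree with the published figures (0.025 and -0.188).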
Noncoding sequence classification based on wavelet transform analysis: part I
NASA Astrophysics Data System (ADS)
Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.
2017-09-01
DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature because of their much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among cells. However, noncoding sequences do not exhibit periodicities that correlate with their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences by specific function. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.
Shannon Entropy of the Canonical Genetic Code
NASA Astrophysics Data System (ADS)
Nemzer, Louis
The probability that a non-synonymous point mutation in DNA will adversely affect the functionality of the resultant protein is greatly reduced if the substitution is conservative. In that case, the amino acid coded by the mutated codon has similar physico-chemical properties to the original. Many simplified alphabets, which group the 20 common amino acids into families, have been proposed. To evaluate these schema objectively, we introduce a novel, quantitative method based on the inherent redundancy in the canonical genetic code. By calculating the Shannon information entropy carried by 1- or 2-bit messages, groupings that best leverage the robustness of the code are identified. The relative importance of properties related to protein folding - like hydropathy and size - and function, including side-chain acidity, can also be estimated. In addition, this approach allows us to quantify the average information value of nucleotide codon positions, and explore the physiological basis for distinguishing between transition and transversion mutations. Supported by NSU PFRDG Grant #335347.
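The redundancy this abstract refers to can be made concrete with a small calculation. The following is only a sketch and not the author's method (which evaluates amino acid groupings via 1- or 2-bit messages): it assumes all 64 codons are equally likely and computes the Shannon entropy of the resulting amino acid (plus stop) distribution in the standard genetic code. The names `DEGENERACY` and `shannon_entropy` are illustrative.

```python
import math

# Number of codons assigned to each amino acid (and to "Stop")
# in the standard genetic code.
DEGENERACY = {
    "Leu": 6, "Ser": 6, "Arg": 6,
    "Ala": 4, "Gly": 4, "Pro": 4, "Thr": 4, "Val": 4,
    "Ile": 3, "Stop": 3,
    "Asn": 2, "Asp": 2, "Cys": 2, "Gln": 2, "Glu": 2,
    "His": 2, "Lys": 2, "Phe": 2, "Tyr": 2,
    "Met": 1, "Trp": 1,
}


def shannon_entropy(counts) -> float:
    """Shannon entropy in bits of the distribution proportional to counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


total_codons = sum(DEGENERACY.values())              # 64 codons
codon_bits = math.log2(total_codons)                 # 6 bits per codon
meaning_bits = shannon_entropy(DEGENERACY.values())  # bits per decoded symbol
redundancy_bits = codon_bits - meaning_bits          # bits absorbed by synonymy

print(f"codon capacity : {codon_bits:.3f} bits")
print(f"decoded entropy: {meaning_bits:.3f} bits")
print(f"redundancy     : {redundancy_bits:.3f} bits")
```

With uniform codon usage the decoded entropy is roughly 4.2 bits, so close to 1.8 of the 6 bits per codon are spent on synonymous redundancy; real codon usage is not uniform, so these figures are illustrative.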
2011-01-01
Background In recent years, phylogeographic studies have produced detailed knowledge on the worldwide distribution of mitochondrial DNA (mtDNA) variants, linking specific clades of the mtDNA phylogeny with certain geographic areas. However, a multiplex genotyping system for the detection of the mtDNA haplogroups of major continental distribution that would be desirable for efficient DNA-based bio-geographic ancestry testing in various applications is still missing. Results Three multiplex genotyping assays, based on single-base primer extension technology, were developed targeting a total of 36 coding-region mtDNA variants that together differentiate 43 matrilineal haplo-/paragroups. These include the major diagnostic haplogroups for Africa, Western Eurasia, Eastern Eurasia and Native America. The assays show high sensitivity with respect to the amount of template DNA: successful amplification could still be obtained when using as little as 4 pg of genomic DNA and the technology is suitable for medium-throughput analyses. Conclusions We introduce an efficient and sensitive multiplex genotyping system for bio-geographic ancestry inference from mtDNA that provides resolution on the continental level. The method can be applied in forensics, to aid tracing unknown suspects, as well as in population studies, genealogy and personal ancestry testing. For more complete inferences of overall bio-geographic ancestry from DNA, the mtDNA system provided here can be combined with multiplex systems for suitable autosomal and, in the case of males, Y-chromosomal ancestry-sensitive DNA markers. PMID:21429198
Artificial Intelligence, DNA Mimicry, and Human Health.
Stefano, George B; Kream, Richard M
2017-08-14
The molecular evolution of genomic DNA across diverse plant and animal phyla involved dynamic registrations of sequence modifications to maintain existential homeostasis to increasingly complex patterns of environmental stressors. As an essential corollary, driver effects of positive evolutionary pressure are hypothesized to effect concerted modifications of genomic DNA sequences to meet expanded platforms of regulatory controls for successful implementation of advanced physiological requirements. It is also clearly apparent that preservation of updated registries of advantageous modifications of genomic DNA sequences requires coordinate expansion of convergent cellular proofreading/error correction mechanisms that are encoded by reciprocally modified genomic DNA. Computational expansion of operationally defined DNA memory extends to coordinate modification of coding and previously under-emphasized noncoding regions that now appear to represent essential reservoirs of untapped genetic information amenable to evolutionary driven recruitment into the realm of biologically active domains. Additionally, expansion of DNA memory potential via chemical modification and activation of noncoding sequences is targeted to vertical augmentation and integration of an expanded cadre of transcriptional and epigenetic regulatory factors affecting linear coding of protein amino acid sequences within open reading frames.
The C terminus of Ku80 activates the DNA-dependent protein kinase catalytic subunit.
Singleton, B K; Torres-Arzayus, M I; Rottinghaus, S T; Taccioli, G E; Jeggo, P A
1999-05-01
Ku is a heterodimeric protein with double-stranded DNA end-binding activity that operates in the process of nonhomologous end joining. Ku is thought to target the DNA-dependent protein kinase (DNA-PK) complex to the DNA and, when DNA bound, can interact with and activate the DNA-PK catalytic subunit (DNA-PKcs). We have carried out a 3' deletion analysis of Ku80, the larger subunit of Ku, and shown that the C-terminal 178 amino acid residues are dispensable for DNA end-binding activity but are required for efficient interaction of Ku with DNA-PKcs. Cells expressing Ku80 proteins that lack the terminal 178 residues have low DNA-PK activity, are radiation sensitive, and can recombine the signal junctions but not the coding junctions during V(D)J recombination. These cells have therefore acquired the phenotype of mouse SCID cells despite expressing DNA-PKcs protein, suggesting that an interaction between DNA-PKcs and Ku, involving the C-terminal region of Ku80, is required for DNA double-strand break rejoining and coding but not signal joint formation. To gain further insight into important domains in Ku80, we report a point mutational change in Ku80 in the defective xrs-2 cell line. This residue is conserved among species and lies outside of the previously reported Ku70-Ku80 interaction domain. The mutational change nonetheless abrogates the Ku70-Ku80 interaction and DNA end-binding activity.
Liu, Baodong; Liu, Xiaoling; Lai, Weiyi; Wang, Hailin
2017-06-06
DNA N6-methyl-2'-deoxyadenosine (6mdA) is an epigenetic modification in both eukaryotes and bacteria. Here we exploited the stable isotope-labeled deoxynucleoside [15N5]-2'-deoxyadenosine ([15N5]-dA) as an initiation tracer and, for the first time, developed a metabolically differential tracing code for monitoring DNA 6mdA in human cells. We demonstrate that the initiation tracer [15N5]-dA undergoes a specific and efficient adenine deamination reaction leading to the loss of the exocyclic amine 15N, and further utilizes the purine salvage pathway to generate mainly [15N4]-dA and [15N4]-2'-deoxyguanosine ([15N4]-dG) in mammalian genomes. However, [15N5]-dA is largely retained in the genomes of mycoplasmas, which are often found in cultured cells and experimental animals. Consequently, the methylation of dA generates 6mdA with a consistent coding pattern, with a predominance of [15N4]-6mdA. Therefore, mammalian DNA 6mdA can potentially be discriminated from that generated by infecting mycoplasmas. Collectively, we show a promising approach for identifying authentic DNA 6mdA in human cells and for determining whether the human cells are contaminated with mycoplasmas.
Living Organisms Author Their Read-Write Genomes in Evolution.
Shapiro, James A
2017-12-06
Evolutionary variations generating phenotypic adaptations and novel taxa resulted from complex cellular activities altering genome content and expression: (i) Symbiogenetic cell mergers producing the mitochondrion-bearing ancestor of eukaryotes and chloroplast-bearing ancestors of photosynthetic eukaryotes; (ii) interspecific hybridizations and genome doublings generating new species and adaptive radiations of higher plants and animals; and, (iii) interspecific horizontal DNA transfer encoding virtually all of the cellular functions between organisms and their viruses in all domains of life. Consequently, assuming that evolutionary processes occur in isolated genomes of individual species has become an unrealistic abstraction. Adaptive variations also involved natural genetic engineering of mobile DNA elements to rewire regulatory networks. In the most highly evolved organisms, biological complexity scales with "non-coding" DNA content more closely than with protein-coding capacity. Coincidentally, we have learned how so-called "non-coding" RNAs that are rich in repetitive mobile DNA sequences are key regulators of complex phenotypes. Both biotic and abiotic ecological challenges serve as triggers for episodes of elevated genome change. The intersections of cell activities, biosphere interactions, horizontal DNA transfers, and non-random Read-Write genome modifications by natural genetic engineering provide a rich molecular and biological foundation for understanding how ecological disruptions can stimulate productive, often abrupt, evolutionary transformations.
Structural features based genome-wide characterization and prediction of nucleosome organization
2012-01-01
Background Nucleosome distribution along chromatin dictates genomic DNA accessibility and thus profoundly influences gene expression. However, the underlying mechanism of nucleosome formation remains elusive. Here, taking a structural perspective, we systematically explored nucleosome formation potential of genomic sequences and the effect on chromatin organization and gene expression in S. cerevisiae. Results We analyzed twelve structural features related to flexibility, curvature and energy of DNA sequences. The results showed that some structural features such as DNA denaturation, DNA-bending stiffness, Stacking energy, Z-DNA, Propeller twist and free energy, were highly correlated with in vitro and in vivo nucleosome occupancy. Specifically, they can be classified into two classes, one positively and the other negatively correlated with nucleosome occupancy. These two kinds of structural features facilitated nucleosome binding in centromere regions and repressed nucleosome formation in the promoter regions of protein-coding genes to mediate transcriptional regulation. Based on these analyses, we integrated all twelve structural features in a model to predict more accurately nucleosome occupancy in vivo than the existing methods that mainly depend on sequence compositional features. Furthermore, we developed a novel approach, named DLaNe, that located nucleosomes by detecting peaks of structural profiles, and built a meta predictor to integrate information from different structural features. As a comparison, we also constructed a hidden Markov model (HMM) to locate nucleosomes based on the profiles of these structural features. The result showed that the meta DLaNe and HMM-based method performed better than the existing methods, demonstrating the power of these structural features in predicting nucleosome positions. 
Conclusions Our analysis revealed that DNA structures significantly contribute to nucleosome organization and influence chromatin structure and gene expression regulation. The results indicated that our proposed methods are effective in predicting nucleosome occupancy and positions and that these structural features are highly predictive of nucleosome organization. The implementation of our DLaNe method based on structural features is available online. PMID:22449207
Swimley, Michelle S.; Taylor, Amber W.; Dawson, Erica D.
2011-01-01
Abstract Shiga toxin–producing Escherichia coli O157 is a leading cause of foodborne illness worldwide. To evaluate better methods to rapidly detect and genotype E. coli O157 strains, the present study evaluated the use of ampliPHOX, a novel colorimetric detection method based on photopolymerization, for pathogen identification with DNA microarrays. A low-density DNA oligonucleotide microarray was designed to target stx1 and stx2 genes encoding Shiga toxin production, the eae gene coding for adherence membrane protein, and the per gene encoding the O157-antigen perosamine synthetase. Results from the validation experiments demonstrated that the use of ampliPHOX allowed the accurate genotyping of the tested E. coli strains, and positive hybridization signals were observed for only probes targeting virulence genes present in the reference strains. Quantification showed that the average signal-to-noise ratio values ranged from 47.73 ± 7.12 to 76.71 ± 8.33, whereas average signal-to-noise ratio values below 2.5 were determined for probes where no polymer was formed due to lack of specific hybridization. Sensitivity tests demonstrated that the sensitivity threshold for E. coli O157 detection was 100–1000 CFU/mL. Thus, the use of DNA microarrays in combination with photopolymerization allowed the rapid and accurate genotyping of E. coli O157 strains. PMID:21288130
[Current situation and prospect of breast cancer liquid biopsy].
Zhou, B; Xin, L; Xu, L; Ye, J M; Liu, Y H
2018-02-01
Liquid biopsy is a diagnostic approach based on analyzing body fluid samples. Peripheral blood is the most common sample; urine, saliva, pleural effusion and ascites are also used. At present, liquid biopsy is mainly used in neoplasm diagnosis and treatment. Compared with traditional tissue biopsy, liquid biopsy is minimally invasive, convenient to sample and easy to repeat. Liquid biopsy mainly includes detection of circulating tumor cells and circulating tumor DNA (ctDNA). Detection of ctDNA requires sensitive and accurate methods, and advances in next-generation sequencing (NGS) and digital PCR have promoted ctDNA research. In 2016, Nature published the results of a whole-genome sequencing study of breast cancer. The study found 1,628 mutations in 93 protein-coding genes that may be driver mutations of breast cancer, and its results provided a new platform for breast cancer ctDNA studies. In recent years, many studies have used ctDNA detection to monitor therapeutic effect and guide treatment. NGS is a promising technique for accessing genetic information and guiding targeted therapy. It must be emphasized that ctDNA detection using NGS is still at the research stage. It is important to standardize ctDNA detection techniques and to perform prospective clinical studies. The time is not yet ripe for using ctDNA detection to guide large-scale clinical practice in breast cancer.
Free Energy Gap and Statistical Thermodynamic Fidelity of DNA Codes
2007-10-01
reverse-complement unless otherwise stated. For strand x, let Nx denote its complement. A (perfect) Watson-Crick duplex is the joining of complement...is possible for complementary sequences to form a non-perfectly aligned duplex, we will call any x W Nx duplex a Watson-Crick (WC) duplex. Two...
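The excerpt above defines a strand's complement in terms of the reverse-complement. As a minimal sketch of that operation, assuming plain A/C/G/T strings written 5' to 3' (the function names are illustrative and do not come from the report):

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the reverse-complement of a DNA strand written 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def forms_perfect_wc_duplex(x: str, y: str) -> bool:
    """True if y is exactly the reverse-complement of x."""
    return reverse_complement(x) == y.upper()


print(reverse_complement("AACG"))               # CGTT
print(forms_perfect_wc_duplex("AACG", "CGTT"))  # True
```

This covers only the perfectly aligned case; the non-perfectly aligned duplexes mentioned in the excerpt would need an alignment or thermodynamic model that the excerpt does not specify.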
Free Energy Gap and Statistical Thermodynamic Fidelity of DNA Codes (Postprint)
2007-01-01
reverse-complement unless otherwise stated. For strand x, let Nx denote its complement. A (perfect) Watson-Crick duplex is the joining of complement...is possible for complementary sequences to form a non-perfectly aligned duplex, we will call any x W Nx duplex a Watson-Crick (WC) duplex. Two...
Superimposed Code Theoretic Analysis of DNA Codes and DNA Computing
2010-03-01
because only certain collections (partitioned by font type) of sequences are allowed to be in each position (e.g., Arial = position 0, Comic ...rigidity of short oligos and the shape of the polar charge. Oligo movement was modeled by a Brownian motion three-dimensional random walk. The one...temperature, kB is the Boltzmann constant...the viscosity of the medium. The random walk motion is modeled by assuming the oligo is on a three-dimensional lattice and may
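The excerpt breaks off before stating the step rule for the lattice walk. As a hedged illustration only, the sketch below assumes the simplest choice: at each time step the oligo moves one lattice unit along one of the six axis directions with equal probability. The function name, the step rule and the seed parameter are assumptions, not details from the report.

```python
import random

# The six nearest-neighbour moves on a simple cubic lattice (assumed step rule).
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def lattice_walk(n_steps: int, seed=None):
    """Return the visited positions of a 3D lattice random walk from the origin."""
    rng = random.Random(seed)
    x = y = z = 0
    path = [(x, y, z)]
    for _ in range(n_steps):
        dx, dy, dz = rng.choice(STEPS)
        x, y, z = x + dx, y + dy, z + dz
        path.append((x, y, z))
    return path


def squared_displacement(path) -> int:
    """Squared distance between the last and first positions of a path."""
    (x0, y0, z0), (x1, y1, z1) = path[0], path[-1]
    return (x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2


walk = lattice_walk(1000, seed=1)
print(len(walk), squared_displacement(walk))
```

For this unbiased walk the mean squared displacement averaged over many runs grows linearly with the number of steps, which is the diffusive behaviour a Brownian model is meant to capture.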
Researchers led by Ashish Lal, Ph.D., Investigator in the Genetics Branch, have shown that when the DNA in human colon cancer cells is damaged, a long non-coding RNA (lncRNA) regulates the expression of genes that halt growth, which allows the cells to repair the damage and promote survival. Their findings suggest an important pro-survival function of a lncRNA in cancer
Junk DNA and the long non-coding RNA twist in cancer genetics
Ling, Hui; Vincent, Kimberly; Pichler, Martin; Fodde, Riccardo; Berindan-Neagoe, Ioana; Slack, Frank J.; Calin, George A
2015-01-01
The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions, and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNA in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNA, often via interaction with proteins, functions in specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function, and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single nucleotide polymorphism (SNP) located within lncRNAs may be functionally associated with the individual’s susceptibility to cancer. PMID:25619839
AP1 Keeps Chromatin Poised for Action | Center for Cancer Research
The human genome harbors gene-encoding DNA, the blueprint for building proteins that regulate cellular function. Embedded across the genome, in non-coding regions, are DNA elements to which regulatory factors bind. The interaction of regulatory factors with DNA at these sites modifies gene expression to modulate cell activity. In cells, DNA exists in a complex with proteins
Public Perceptions and Expectations of the Forensic Use of DNA: Results of a Preliminary Study
ERIC Educational Resources Information Center
Curtis, Cate
2009-01-01
The forensic use of Deoxyribonucleic Acid (DNA) is demonstrating significant success as a crime-solving tool. However, numerous concerns have been raised regarding the potential for DNA use to contravene cultural, ethical, and legal codes. In this article the expectations and level of knowledge of the New Zealand public of the DNA data-bank and…
Evaluation of the phospholamban gene in purebred large-breed dogs with dilated cardiomyopathy.
Stabej, Polona; Leegwater, Peter A; Stokhof, Arnold A; Domanjko-Petric, Aleksandra; van Oost, Bernard A
2005-03-01
To evaluate the role of the phospholamban gene in purebred large-breed dogs with dilated cardiomyopathy (DCM). 6 dogs with DCM, including 2 Doberman Pinschers, 2 Newfoundlands, and 2 Great Danes. All dogs had clinical signs of congestive heart failure, and a diagnosis of DCM was made on the basis of echocardiographic findings. Blood samples were collected from each dog, and genomic DNA was isolated by a salt extraction method. Specific oligonucleotides were designed to amplify the promoter, exon 1, the 5'-part of exon 2 including the complete coding region, and part of intron 1 of the canine phospholamban gene via polymerase chain reaction procedures. These regions were screened for mutations in DNA obtained from the 6 dogs with DCM. No mutations were identified in the promoter, 5' untranslated region, part of intron 1, part of the 3' untranslated region, and the complete coding region of the phospholamban gene in dogs with DCM. Results indicate that mutations in the phospholamban gene are not a frequent cause of DCM in Doberman Pinschers, Newfoundlands, and Great Danes.
Xiao, W; Rank, G H
1989-03-15
The yeast SMR1 gene was used as a dominant resistance-selectable marker for industrial yeast transformation and for targeting integration of an economically important gene at the homologous ILV2 locus. A MEL1 gene, which codes for alpha-galactosidase, was inserted into a dispensable upstream region of SMR1 in vitro; different treatments of the plasmid (pWX813) prior to transformation resulted in 3' end, 5' end and replacement integrations that exhibited distinct integrant structures. One-step replacement within a nonessential region of the host genome generated a stable integration of MEL1 devoid of bacterial plasmid DNA. Using this method, we have constructed several alpha-galactosidase positive industrial Saccharomyces strains. Our study provides a general method for stable gene transfer in most industrial Saccharomyces yeasts, including those used in the baking, brewing (ale and lager), distilling, wine and sake industries, with solely nucleotide sequences of interest. The absence of bacterial DNA in the integrant structure facilitates the commercial application of recombinant DNA technology in the food and beverage industry.
Vander Lugt correlation of DNA sequence data
NASA Astrophysics Data System (ADS)
Christens-Barry, William A.; Hawk, James F.; Martin, James C.
1990-12-01
DNA, the molecule containing the genetic code of an organism, is a linear chain of subunits. It is the sequence of subunits, of which there are four kinds, that constitutes the unique blueprint of an individual. This sequence is the focus of a large number of analyses performed by an army of geneticists, biologists, and computer scientists. Most of these analyses entail searches for specific subsequences within the larger set of sequence data. Thus, most analyses are essentially pattern recognition or correlation tasks. Yet such analyses have special features that influence the strategy and methods of an optical pattern recognition approach. While the serial processing employed in digital electronic computers remains the main engine of sequence analyses, there is no fundamental reason that more efficient parallel methods cannot be used. We describe an approach using optical pattern recognition (OPR) techniques based on matched spatial filtering. This allows parallel comparison of large blocks of sequence data. In this study we have simulated a Vander Lugt architecture implementing our approach. Searches for specific target sequence strings within a block of DNA sequence from the ColE1 plasmid are performed.
Vaginal DNA vaccination against infectious diseases transmitted through the vagina.
Kanazawa, Takanori; Takashima, Yuuki; Okada, Hiroaki
2012-06-01
There is an urgent need for the development of vaccines against genital virus infections that are transmitted through heterosexual intercourse, including the HIV and HPV. In general, the surface of female genital mucosa, including vaginal mucosa, is the most common site of initiation of these infections. Thus, it is becoming clear that successful vaccines must induce both cellular and humoral immune responses in both the local genital tract and systemically. We believe that a strong vaginal immune response could be obtained by inducing strong gene expression of antigen-coding DNA in the local targeted tissue. In order to improve transfection efficiency in the vagina, it is important that methods allowing breakthrough of the various barriers, such as the epithelial layer, cellular and nuclear membrane, are developed. Therefore, systems providing less invasive and more effective delivery into the subepithelial layer are required. In this review, we will introduce our studies into efficient vaginal DNA vaccination methods, focusing on the effects of the menstrual cycle, utilization of the combination of functional peptides, and use of a needle-free injector.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Iovannisci, D.; Brown, C.; Winn-Deen, E.
1994-09-01
The cloning and sequencing of the gene associated with cystic fibrosis (CF) now provides the opportunity for earlier detection and carrier screening through DNA-based detection schemes. To date, over 300 mutations have been reported to the CF Consortium; however, only 30 mutations have been observed frequently enough worldwide to warrant routine screening. Many of these mutations are not available as cloned material or as established tissue culture cell lines to aid in the development of DNA-based detection assays. We have therefore cloned the 30 most frequently reported mutations, plus the mutation R347H due to its association with male infertility (31 mutations, total). Two approaches were employed: direct PCR amplification, where mutations were available from patient sources, and site-directed PCR mutagenesis of normal genomic DNA to generate the remaining mutations. After amplification, products were cloned into a sequencing vector, bacterial transformants were screened by a novel method (PCR/oligonucleotide ligation assay/sequence-coded separation), and plasmid DNA sequences were determined by automated fluorescent methods on the Applied Biosystems 373A. Mixing of the clones allows the construction of artificial genotypes useful as positive control material for assay validation. A second round of mutagenesis, resulting in the construction of plasmids bearing multiple mutations, will be evaluated for their utility as reagent control materials in kit development.
Sato, Toshinori; Nakata, Mitsuhiro; Yang, Zhihong; Torizuka, Yu; Kishimoto, Satoko; Ishihara, Masayuki
2017-08-01
Lyophilization is an effective method for preserving nonviral gene vectors. To improve the stability and transgene expression of lyophilized plasmid DNA (pDNA) complexes, we coated the surfaces of pDNA/chitosan complexes with hyaluronic acid (HA) of varying molecular masses. The transgene expression of pDNA/chitosan/HA ternary complexes was characterized in vitro and in vivo. pDNA complexes were lyophilized overnight and the resultant products with spongy, porous consistencies were stored at -30, 4 or 25°C for 2 weeks. Rehydrated complexes were characterized using gel retardation assays, aiming to confirm complex formation, measure particle size and evaluate zeta potential, as well as conduct luciferase gene reporter assays. The anti-tumor effects of pDNA ternary complexes were evaluated using suicide gene (pTK) coding thymidine kinase in Huh7-implanted mice. Transfection efficiencies of pDNA/chitosan/HA ternary complexes were dependent on the average molecular masses of HA. The coating of pDNA/chitosan complexes with HA maintained the cellular transfection efficiencies of lyophilized pDNA ternary complexes. Furthermore, intratumoral injection of lyophilized, rehydrated pDNA ternary complexes into tumor-bearing mice showed a significant suppression of tumor growth. The coating of pDNA/chitosan complexes with high-molecular-weight HA augmented the stability and cellular transfection ability of the complexes after lyophilization-rehydration. Copyright © 2017 John Wiley & Sons, Ltd.
2013-01-01
Background Vitis vinifera L. is one of society’s most important agricultural crops with a broad genetic variability. The difficulty in recognizing grapevine genotypes based on ampelographic traits and secondary metabolites prompted the development of molecular markers suitable for achieving variety genetic identification. Findings Here, we propose a comparison between a multi-locus barcoding approach based on six chloroplast markers and a single-copy nuclear gene sequencing method using five coding regions combined with a character-based system with the aim of reconstructing cultivar-specific haplotypes and genotypes to be exploited for the molecular characterization of 157 V. vinifera accessions. The analysis of the chloroplast target regions proved the inadequacy of the DNA barcoding approach at the subspecies level, and hence further DNA genotyping analyses were targeted on the sequences of five nuclear single-copy genes amplified across all of the accessions. The sequencing of the coding region of the UFGT nuclear gene (UDP-glucose: flavonoid 3-O-glucosyltransferase, the key enzyme for the accumulation of anthocyanins in berry skins) enabled the discovery of discriminant SNPs (1/34 bp) and the reconstruction of 130 V. vinifera distinct genotypes. Most of the genotypes proved to be cultivar-specific, and only a few genotypes were shared by more than one cultivar, all of them closely related. Conclusion On the whole, this technique was successful for inferring SNP-based genotypes of grapevine accessions suitable for assessing the genetic identity and ancestry of international cultivars and also useful for corroborating some hypotheses regarding the origin of local varieties, suggesting several issues of misidentification (synonymy/homonymy). PMID:24298902
Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao; Ning, Shangwei; Jin, Lianhong; Li, Xia
2018-01-04
Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNAs-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to complete the differentially methylated lncRNAs identification and visualization with local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Database extraction strategies for low-template evidence.
Bleka, Øyvind; Dørum, Guro; Haned, Hinda; Gill, Peter
2014-03-01
Often in forensic cases, the profile of at least one of the contributors to a DNA evidence sample is unknown and a database search is needed to discover possible perpetrators. In this article we consider two types of search strategies to extract suspects from a database using methods based on probability arguments. The performance of the proposed match scores is demonstrated by carrying out a study of each match score relative to the level of allele drop-out in the crime sample, simulating low-template DNA. The efficiency was measured by random man simulation and we compared the performance using the SGM Plus kit and the ESX 17 kit for the Norwegian population, demonstrating that the latter has greatly enhanced power to discover perpetrators of crime in large national DNA databases. The code for the database extraction strategies will be prepared for release in the R-package forensim. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Peng, Rui; Zeng, Bo; Meng, Xiuxiang; Yue, Bisong; Zhang, Zhihe; Zou, Fangdong
2007-08-01
The complete mitochondrial genome sequence of the giant panda, Ailuropoda melanoleuca, was determined by the long and accurate polymerase chain reaction (LA-PCR) with conserved primers and primer walking sequence methods. The complete mitochondrial DNA is 16,805 nucleotides in length and contains two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one control region. The total length of the 13 protein-coding genes is longer than the American black bear, brown bear and polar bear by 3 amino acids at the end of ND5 gene. The codon usage also followed the typical vertebrate pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 5 (ND5) gene. The molecular phylogenetic analysis was performed on the sequences of 12 concatenated heavy-strand encoded protein-coding genes, and suggested that the giant panda is most closely related to bears.
Phylogeographic Differentiation of Mitochondrial DNA in Han Chinese
Yao, Yong-Gang; Kong, Qing-Peng; Bandelt, Hans-Jürgen; Kivisild, Toomas; Zhang, Ya-Ping
2002-01-01
To characterize the mitochondrial DNA (mtDNA) variation in Han Chinese from several provinces of China, we have sequenced the two hypervariable segments of the control region and the segment spanning nucleotide positions 10171–10659 of the coding region, and we have identified a number of specific coding-region mutations by direct sequencing or restriction-fragment–length–polymorphism tests. This allows us to define new haplogroups (clades of the mtDNA phylogeny) and to dissect the Han mtDNA pool on a phylogenetic basis, which is a prerequisite for any fine-grained phylogeographic analysis, the interpretation of ancient mtDNA, or future complete mtDNA sequencing efforts. Some of the haplogroups under study differ considerably in frequencies across different provinces. The southernmost provinces show more pronounced contrasts in their regional Han mtDNA pools than the central and northern provinces. These and other features of the geographical distribution of the mtDNA haplogroups observed in the Han Chinese make an initial Paleolithic colonization from south to north plausible but would suggest subsequent migration events in China that mainly proceeded from north to south and east to west. Lumping together all regional Han mtDNA pools into one fictive general mtDNA pool or choosing one or two regional Han populations to represent all Han Chinese is inappropriate for prehistoric considerations as well as for forensic purposes or medical disease studies. PMID:11836649
DOE Office of Scientific and Technical Information (OSTI.GOV)
Helfenbein, Kevin G.; Brown, Wesley M.; Boore, Jeffrey L.
We have sequenced the complete mitochondrial DNA (mtDNA) of the articulate brachiopod Terebratalia transversa. The circular genome is 14,291 bp in size, relatively small compared to other published metazoan mtDNAs. The 37 genes commonly found in animal mtDNA are present; the size decrease is due to the truncation of several tRNA, rRNA, and protein genes, to some nucleotide overlaps, and to a paucity of non-coding nucleotides. Although the gene arrangement differs radically from those reported for other metazoans, some gene junctions are shared with two other articulate brachiopods, Laqueus rubellus and Terebratulina retusa. All genes in the T. transversa mtDNA, unlike those in most metazoan mtDNAs reported, are encoded by the same strand. The A+T content (59.1 percent) is low for a metazoan mtDNA, and there is a high propensity for homopolymer runs and a strong base-compositional strand bias. The coding strand is quite G+T-rich, a skew that is shared by the confamilial (laqueid) species L. rubellus, but opposite to that found in T. retusa, a cancellothyridid. These compositional skews are strongly reflected in the codon usage patterns and the amino acid compositions of the mitochondrial proteins, with markedly different usage observed between T. retusa and the two laqueids. This observation, plus the similarity of the laqueid non-coding regions to the reverse complement of the non-coding region of the cancellothyridid, suggest that an inversion that resulted in a reversal in the direction of first-strand replication has occurred in one of the two lineages. In addition to the presence of one non-coding region in T. transversa that is comparable to those in the other brachiopod mtDNAs, there are two others with the potential to form secondary structures; one or both of these may be involved in the process of transcript cleavage.
Cartwright, Joseph F; Anderson, Karin; Longworth, Joseph; Lobb, Philip; James, David C
2018-06-01
High-fidelity replication of biologic-encoding recombinant DNA sequences by engineered mammalian cell cultures is an essential pre-requisite for the development of stable cell lines for the production of biotherapeutics. However, immortalized mammalian cells characteristically exhibit an increased point mutation frequency compared to mammalian cells in vivo, both across their genomes and at specific loci (hotspots). Thus unforeseen mutations in recombinant DNA sequences can arise and be maintained within producer cell populations. These may both affect the stability of recombinant gene expression and give rise to protein sequence variants with variable bioactivity and immunogenicity. Rigorous quantitative assessment of recombinant DNA integrity should therefore form part of the cell line development process and be an essential quality assurance metric for instances where synthetic/multi-component assemblies are utilized to engineer mammalian cells, such as the assessment of recombinant DNA fidelity or the mutability of single-site integration target loci. Based on Pacific Biosciences (Menlo Park, CA) single molecule real-time (SMRT™) circular consensus sequencing (CCS) technology we developed a rDNA sequence analysis tool to process the multi-parallel sequencing of ∼40,000 single recombinant DNA molecules. After statistical filtering of raw sequencing data, we show that this analytical method is capable of detecting single point mutations in rDNA to a minimum single mutation frequency of 0.0042% (<1/24,000 bases). Using a stable CHO transfectant pool harboring a randomly integrated 5 kB plasmid construct encoding GFP we found that 28% of recombinant plasmid copies contained at least one low frequency (<0.3%) point mutation. These mutations were predominantly found in GC base pairs (85%), and there was no positional bias in mutation across the plasmid sequence. There was no discernible difference between the mutation frequencies of coding and non-coding DNA.
The putative ratio of non-synonymous and synonymous changes within the open reading frames (ORFs) in the plasmid sequence indicates that natural selection does not impact upon the prevalence of these mutations. Here we have demonstrated the abundance of mutations that fall outside of the reported range of detection of next generation sequencing (NGS) and second generation sequencing (SGS) platforms, providing a methodology capable of being utilized in cell line development platforms to identify the fidelity of recombinant genes throughout the production process. © 2018 Wiley Periodicals, Inc.
2015-01-01
Conformational polymorphism of DNA is a major causative factor behind several incurable trinucleotide repeat expansion disorders that arise from overexpansion of trinucleotide repeats located in coding/non-coding regions of specific genes. Hairpin DNA structures that are formed due to overexpansion of CAG repeat lead to Huntington’s disorder and spinocerebellar ataxias. Nonetheless, DNA hairpin stem structure that generally embraces B-form with canonical base pairs is poorly understood in the context of periodic noncanonical A…A mismatch as found in CAG repeat overexpansion. Molecular dynamics simulations on DNA hairpin stems containing A…A mismatches in a CAG repeat overexpansion show that A…A dictates local Z-form irrespective of starting glycosyl conformation, in sharp contrast to canonical DNA duplex. Transition from B-to-Z is due to the mechanistic effect that originates from its pronounced nonisostericity with flanking canonical base pairs facilitated by base extrusion, backbone and/or base flipping. Based on these structural insights we envisage that such an unusual DNA structure of the CAG hairpin stem may have a role in disease pathogenesis. As this is the first study that delineates the influence of a single A…A mismatch in reversing DNA helicity, it would further have an impact on understanding DNA mismatch repair. PMID:25876062
A Tandemly Arranged Pattern of Two 5S rDNA Arrays in Amolops mantzorum (Anura, Ranidae).
Liu, Ting; Song, Menghuan; Xia, Yun; Zeng, Xiaomao
2017-01-01
In an attempt to extend the knowledge of the 5S rDNA organization in anurans, the 5S rDNA sequences of Amolops mantzorum were isolated, characterized, and mapped by FISH. Two forms of 5S rDNA, type I (209 bp) and type II (about 870 bp), were found in specimens investigated from various populations. Both of them contained a 118-bp coding sequence, and they were readily differentiated by their non-transcribed spacer (NTS) sizes and compositions. Four probes (the 5S rDNA coding sequences, the type I NTS, the type II NTS, and the entire type II 5S rDNA sequences) were respectively labeled with TAMRA or digoxigenin to hybridize with mitotic chromosomes for samples of all localities. It turned out that all probes showed the same signals that appeared in every centromeric region and in the telomeric regions of chromosome 5, without differences within or between populations. Both type I and type II of the 5S rDNA arrays were therefore arranged in tandem, in contrast to other frogs or fishes recorded to date. More interestingly, all the probes detected centromeric regions in all karyotypes, suggesting the presence of a satellite DNA family derived from 5S rDNA. © 2017 S. Karger AG, Basel.
Linear and Nonlinear Statistical Characterization of DNA
NASA Astrophysics Data System (ADS)
Norio Oiwa, Nestor; Goldman, Carla; Glazier, James
2002-03-01
We find spatial order in the distribution of protein-coding (including RNAs) and control segments of GenBank genomic sequences, irrespective of ATCG content. This is achieved by correlations, histograms, fractal dimensions and singularity spectra. Estimates of these quantities in complete nuclear genomes indicate that coding sequences are long-range correlated and that their disposition is self-similar (multifractal) for eukaryotes. These characteristics are absent in prokaryotes, where there are few noncoding sequences, suggesting that `junk' DNA plays a relevant role in genome structure and function. Concerning the genetic message of ATCG sequences, we build a random walk (Levy flight), using DNA symmetry arguments, where we associate A, T, C and G with left, right, down and up steps, respectively. Nonlinear analysis of mitochondrial DNA walks reveals a multifractal pattern based on palindromic sequences, which fold in hairpins and loops.
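The base-to-step assignment described in this abstract (A, T, C and G as left, right, down and up) can be illustrated with a minimal Python sketch. This is not the authors' code: the unit step length, the handling of ambiguous symbols, and the function name `dna_walk` are assumptions made for illustration, and the sketch does not reproduce the Levy-flight or multifractal analysis.

```python
# Step assignment taken from the abstract: A=left, T=right, C=down, G=up.
# Unit step length is an assumption; the abstract does not state it.
STEPS = {"A": (-1, 0), "T": (1, 0), "C": (0, -1), "G": (0, 1)}


def dna_walk(sequence):
    """Return the list of (x, y) positions visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        if base not in STEPS:
            continue  # assumption: silently skip ambiguous symbols such as N
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    for position in dna_walk("ATCGGA"):
        print(position)
```

The resulting list of coordinates is the raw trajectory; any correlation or fractal-dimension estimate would be computed on top of it.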
Silencing of the pentose phosphate pathway genes influences DNA replication in human fibroblasts.
Fornalewicz, Karolina; Wieczorek, Aneta; Węgrzyn, Grzegorz; Łyżeń, Robert
2017-11-30
Previous reports and our recently published data indicated that some enzymes of glycolysis and the tricarboxylic acid cycle can affect the genome replication process by changing either the efficiency or timing of DNA synthesis in human normal cells. Both these pathways are connected with the pentose phosphate pathway (PPP pathway). The PPP pathway supports cell growth by generating energy and precursors for nucleotides and amino acids. Therefore, we asked if silencing of genes coding for enzymes involved in the pentose phosphate pathway may also affect the control of DNA replication in human fibroblasts. Particular genes coding for PPP pathway enzymes were partially silenced with specific siRNAs. Such cells remained viable. We found that silencing of the H6PD, PRPS1 and RPE genes caused less efficient entry into the S phase and a decrease in the efficiency of DNA synthesis. On the other hand, in cells treated with siRNA against the G6PD, RBKS and TALDO genes, the fraction of cells entering the S phase was increased. However, only in the case of G6PD and TALDO was the ratio of BrdU incorporation to DNA significantly changed. The presented results together with our previously published studies illustrate the complexity of the influence of genes coding for central carbon metabolism on the control of DNA replication in human fibroblasts, and indicate which of them are especially important in this process. Copyright © 2017 Elsevier B.V. All rights reserved.
Rectified factor networks for biclustering of omics data.
Clevert, Djork-Arné; Unterthiner, Thomas; Povysil, Gundula; Hochreiter, Sepp
2017-07-15
Biclustering has become a major tool for analyzing large datasets given as matrix of samples times features and has been successfully applied in life sciences and e-commerce for drug design and recommender systems, respectively. Factor Analysis for Bicluster Acquisition (FABIA), one of the most successful biclustering methods, is a generative model that represents each bicluster by two sparse membership vectors: one for the samples and one for the features. However, FABIA is restricted to about 20 code units because of the high computational complexity of computing the posterior. Furthermore, code units are sometimes insufficiently decorrelated and sample membership is difficult to determine. We propose to use the recently introduced unsupervised Deep Learning approach Rectified Factor Networks (RFNs) to overcome the drawbacks of existing biclustering methods. RFNs efficiently construct very sparse, non-linear, high-dimensional representations of the input via their posterior means. RFN learning is a generalized alternating minimization algorithm based on the posterior regularization method which enforces non-negative and normalized posterior means. Each code unit represents a bicluster, where samples for which the code unit is active belong to the bicluster and features that have activating weights to the code unit belong to the bicluster. On 400 benchmark datasets and on three gene expression datasets with known clusters, RFN outperformed 13 other biclustering methods including FABIA. On data of the 1000 Genomes Project, RFN could identify DNA segments which indicate that interbreeding with other hominins had already started before ancestors of modern humans left Africa. https://github.com/bioinf-jku/librfn. djork-arne.clevert@bayer.com or hochreit@bioinf.jku.at. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Superimposed Code Theoretic Analysis of Deoxyribonucleic Acid (DNA) Codes and DNA Computing
2010-01-01
partitioned by font type) of sequences are allowed to be in each position (e.g., Arial = position 0, Comic = position 1, etc. ) and within each collection...movement was modeled by a Brownian motion 3 dimensional random walk. The one dimensional diffusion coefficient D for the ellipsoid shape with 3...temperature, kB is Boltzmann’s constant, and η is the viscosity of the medium. The random walk motion is modeled by assuming the oligo is on a three
Researchers led by Ashish Lal, Ph.D., Investigator in the Genetics Branch, have shown that when the DNA in human colon cancer cells is damaged, a long non-coding RNA (lncRNA) regulates the expression of genes that halt growth, which allows the cells to repair the damage and promote survival. Their findings suggest an important pro-survival function of a lncRNA in cancer cells.
Shin, Kayeong; Choi, Jaeyeong; Kim, Yeoju; Lee, Yoonjeong; Kim, Joohoon; Lee, Seungho; Chung, Hoeil
2018-06-29
We propose a new analytical scheme in which field-flow fractionation (FFF)-based separation of target-specific polystyrene (PS) particle probes of different sizes is combined with amplified surface-enhanced Raman scattering (SERS) tagging for the simultaneous and sensitive detection of multiple microRNAs (miRNAs). For multiplexed detection, PS particles of three different diameters (15, 10, 5 μm) were used for the size-coding, and a probe single stranded DNA (ssDNA) complementary to a target miRNA was conjugated on an intended PS particle. After binding of a target miRNA on the PS probe, a polyadenylation reaction was executed to generate a long tail composed of adenine (A) serving as a binding site for thymine (T) conjugated Au nanoparticles (T-AuNPs) to increase SERS intensity. The three size-coded PS probes bound with T-AuNPs were then separated in a FFF channel. With the observation of extinction-based fractograms, separation of the three size-coded PS probes was clearly confirmed, thereby enabling the simultaneous measurement of three miRNAs. Raman intensities of FFF fractions collected at the peak maximum of the 15, 10 and 5 μm PS probes varied fairly quantitatively with the change of miRNA concentrations, and the reproducibility of measurement was acceptable. The proposed method is potentially useful for simultaneous detection of multiple miRNAs with high sensitivity. Copyright © 2018 Elsevier B.V. All rights reserved.
Goremykin, Vadim V; Lockhart, Peter J; Viola, Roberto; Velasco, Riccardo
2012-08-01
Mitochondrial genomes of spermatophytes are the largest of all organellar genomes. Their large size has been attributed to various factors; however, the relative contribution of these factors to mitochondrial DNA (mtDNA) expansion remains undetermined. We estimated their relative contribution in Malus domestica (apple). The mitochondrial genome of apple has a size of 396 947 bp and a one to nine ratio of coding to non-coding DNA, close to the corresponding average values for angiosperms. We determined that 71.5% of the apple mtDNA sequence was highly similar to sequences of its nuclear DNA. Using nuclear gene exons, nuclear transposable elements and chloroplast DNA as markers of promiscuous DNA content in mtDNA, we estimated that approximately 20% of the apple mtDNA consisted of DNA sequences imported from other cell compartments, mostly from the nucleus. Similar marker-based estimates of promiscuous DNA content in the mitochondrial genomes of other species ranged between 21.2 and 25.3% of the total mtDNA length for grape, between 23.1 and 38.6% for rice, and between 47.1 and 78.4% for maize. All these estimates are conservative, because they underestimate the import of non-functional DNA. We propose that the import of promiscuous DNA is a core mechanism for mtDNA size expansion in seed plants. In apple, maize and grape this mechanism contributed far more to genome expansion than did homologous recombination. In rice the estimated contribution of both mechanisms was found to be similar. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.
Tamori, Akihiro; Yamanishi, Yoshihiro; Kawashima, Shuichi; Kanehisa, Minoru; Enomoto, Masaru; Tanaka, Hiromu; Kubo, Shoji; Shiomi, Susumu; Nishiguchi, Shuhei
2005-08-15
Integration of hepatitis B virus (HBV) DNA into the human genome is one of the most important steps in HBV-related carcinogenesis. This study attempted to find the link between HBV DNA, the adjoining cellular sequence, and altered gene expression in hepatocellular carcinoma (HCC) with integrated HBV DNA. We examined 15 cases of HCC infected with HBV by cassette ligation-mediated PCR. The human DNA adjacent to the integrated HBV DNA was sequenced. Protein coding sequences were searched for in the human sequence. In five cases with HBV DNA integration, from which good quality RNA was extracted, gene expression was examined by cDNA microarray analysis. The human DNA sequence successive to integrated HBV DNA was determined in the 15 HCCs. Eight protein-coding regions were involved: ras-responsive element binding protein 1, calmodulin 1, mixed lineage leukemia 2 (MLL2), FLJ333655, LOC220272, LOC255345, LOC220220, and LOC168991. The MLL2 gene was expressed in three cases with HBV DNA integrated into exon 3 of MLL2 and in one case with HBV DNA integrated into intron 3 of MLL2. Gene expression analysis suggested that two HCCs with HBV integrated into MLL2 had similar patterns of gene expression compared with three HCCs with HBV integrated into other loci of human chromosomes. HBV DNA was integrated at random sites of human DNA, and the MLL2 gene was one of the targets for integration. Our results suggest that HBV DNA might modulate human genes near integration sites, followed by integration site-specific expression of such genes during hepatocarcinogenesis.
Microbeads display of proteins using emulsion PCR and cell-free protein synthesis.
Gan, Rui; Yamanaka, Yumiko; Kojima, Takaaki; Nakano, Hideo
2008-01-01
We developed a method for coupling protein to its coding DNA on magnetic microbeads using emulsion PCR and cell-free protein synthesis in emulsion. A PCR mixture containing streptavidin-coated microbeads was compartmentalized by water-in-oil (w/o) emulsion with estimated 0.5 template molecules per droplet. The template molecules were amplified and immobilized on beads via bead-linked reverse primers and biotinylated forward primers. After amplification, the templates were sequentially labeled with streptavidin and biotinylated anti-glutathione S-transferase (GST) antibody. The pool of beads was then subjected to cell-free protein synthesis compartmentalized in another w/o emulsion, in which templates were coupled to their coding proteins. We mixed two types of DNA templates of Histidine6 tag (His6)-fused and FLAG tag-fused GST in a ratio of 1:1,000 (His6: FLAG) for use as a model DNA library. After incubation with fluorescein isothiocyanate (FITC)-labeled anti-His6 (C-term) antibody, the beads with the His6 gene were enriched 917-fold in a single-round screening by using flow cytometry. A library with a theoretical diversity of 10^6 was constructed by randomizing the middle four residues of the His6 tag. After a two-round screening, the randomized sequences were substantially converged to peptide-encoding sequences recognized by the anti-His6 antibody.
[DNA barcoding and its utility in commonly-used medicinal snakes].
Huang, Yong; Zhang, Yue-yun; Zhao, Cheng-jian; Xu, Yong-li; Gu, Ying-le; Huang, Wen-qi; Lin, Kui; Li, Li
2015-03-01
Accurate identification of traditional Chinese medicine materials is crucial for their research, production and application. DNA barcoding based on the mitochondrial gene coding for cytochrome c oxidase subunit I (COI) is increasingly used for the identification of traditional Chinese medicine. Using universal barcoding primers for sequencing, we assessed the feasibility of the DNA barcoding method for identifying commonly used medicinal snakes (a total of 109 samples belonging to 19 species, 15 genera and 6 families). Phylogenetic trees were constructed using the neighbor-joining method. The results indicated that the mean content of G + C (46.5%) was lower than that of A + T (53.5%). As calculated by the Kimura 2-parameter model, the mean intraspecific genetic distance of Trimeresurus albolabris, Ptyas dhumnades and Lycodon rufozonatus was greater than 2%. Further phylogenetic results suggested that the identification of one sample of T. albolabris was erroneous. The identification of some samples of P. dhumnades was also incorrect; namely, samples that were actually P. korros had been identified as P. dhumnades. The factors influencing the intraspecific genetic distance of L. rufozonatus need to be studied further. Therefore, DNA barcoding for the identification of medicinal snakes is feasible and greatly complements the morphological classification method. Further study of its use in the identification of traditional Chinese medicine is needed.
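As a minimal sketch of the distance measure named in this abstract, the Kimura 2-parameter distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions. The Python below assumes pre-aligned sequences of equal length and skips any site containing a symbol other than A, C, G or T; the function name `k2p_distance` is hypothetical and this is not the software used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent
    # for the model (argument <= 0); that case is left to the caller.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    d = k2p_distance("ACGTACGTAC", "ACGTACGTAT")
    print(f"K2P distance: {d:.4f}; exceeds 2%: {d > 0.02}")
```

A pairwise value above 0.02 corresponds to the 2% intraspecific threshold mentioned in the abstract, although the study reports means over many sample pairs rather than single comparisons.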
Nakamura, Mikiko; Suzuki, Ayako; Akada, Junko; Tomiyoshi, Keisuke; Hoshida, Hisashi; Akada, Rinji
2015-12-01
Mammalian gene expression constructs are generally prepared in a plasmid vector, in which a promoter and terminator are located upstream and downstream of a protein-coding sequence, respectively. In this study, we found that front terminator constructs-DNA constructs containing a terminator upstream of a promoter rather than downstream of a coding region-could sufficiently express proteins as a result of end joining of the introduced DNA fragment. By taking advantage of front terminator constructs, FLAG substitutions, and deletions were generated using mutagenesis primers to identify amino acids specifically recognized by commercial FLAG antibodies. A minimal epitope sequence for polyclonal FLAG antibody recognition was also identified. In addition, we analyzed the sequence of a C-terminal Ser-Lys-Leu peroxisome localization signal, and identified the key residues necessary for peroxisome targeting. Moreover, front terminator constructs of hepatitis B surface antigen were used for deletion analysis, leading to the identification of regions required for the particle formation. Collectively, these results indicate that front terminator constructs allow for easy manipulations of C-terminal protein-coding sequences, and suggest that direct gene expression with PCR-amplified DNA is useful for high-throughput protein analysis in mammalian cells.
Multimodal biometric digital watermarking on immigrant visas for homeland security
NASA Astrophysics Data System (ADS)
Sasi, Sreela; Tamhane, Kirti C.; Rajappa, Mahesh B.
2004-08-01
Passengers with immigrant visas are a major concern to international airports because of the various fraud operations that have been identified. To curb tampering with genuine visas, visas should contain human identification information. A biometric characteristic is a common and reliable way to authenticate the identity of an individual [1]. A Multimodal Biometric Human Identification System (MBHIS) that integrates the iris code, DNA fingerprint, and passport number on the visa photograph using a digital watermarking scheme is presented. Digital watermarking is well suited for any system requiring high security [2]. Ophthalmologists [3], [4], [5] suggested that the iris scan is an accurate and nonintrusive optical fingerprint. A DNA sequence can be used as a genetic barcode [6], [7]. When a visa is issued at a US consulate, the DNA sequence isolated from saliva, the iris code and the passport number shall be digitally watermarked in the visa photograph. This information is also recorded in the 'immigrant database'. A 'forward watermarking phase' combines a 2-D DWT transformed digital photograph with the personal identification information. A 'detection phase' extracts the watermarked information from this visa photograph at the port of entry, from which the iris code can be used for identification and the DNA biometric for authentication if an anomaly arises.
Sequence Polishing Library (SPL) v10.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oberortner, Ernst
The Sequence Polishing Library (SPL) is a suite of software tools for automating "Design for Synthesis and Assembly" workflows. Specifically: The SPL "Converter" tool converts files among the following sequence data exchange formats: CSV, FASTA, GenBank, and Synthetic Biology Open Language (SBOL). The SPL "Juggler" tool optimizes the codon usage of DNA coding sequences according to an optimization strategy, a user-specific codon usage table and genetic code. In addition, the SPL "Juggler" can translate amino acid sequences into DNA sequences. The SPL "Polisher" verifies DNA sequences against DNA synthesis constraints, such as GC content, repeating k-mers, and restriction sites. In case of violations, the "Polisher" reports the violations in a comprehensive manner. The "Polisher" tool can also modify the violating regions according to an optimization strategy, a user-specific codon usage table and genetic code. The SPL "Partitioner" decomposes large DNA sequences into smaller building blocks with partial overlaps that enable an efficient assembly. The "Partitioner" enables the user to configure the characteristics of the overlaps, such as length, GC content, or melting temperature, which are mostly determined by the utilized assembly protocol.
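The kinds of checks the "Polisher" is described as performing (GC content, repeating k-mers, restriction sites) can be illustrated with a minimal Python sketch. This is not the SPL API: the function names, the default thresholds, and the choice of the EcoRI site GAATTC as the forbidden site are all assumptions made for illustration.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def repeated_kmers(seq, k):
    """Return the set of k-mers that occur more than once in seq."""
    seq = seq.upper()
    seen, repeats = set(), set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            repeats.add(kmer)
        seen.add(kmer)
    return repeats


def check_constraints(seq, gc_min=0.3, gc_max=0.7, k=8,
                      forbidden_sites=("GAATTC",)):
    """Report violations as a list of strings (hypothetical thresholds)."""
    violations = []
    gc = gc_content(seq)
    if not gc_min <= gc <= gc_max:
        violations.append(f"GC content {gc:.2f} outside [{gc_min}, {gc_max}]")
    for kmer in sorted(repeated_kmers(seq, k)):
        violations.append(f"repeated {k}-mer {kmer}")
    for site in forbidden_sites:
        if site in seq.upper():
            violations.append(f"restriction site {site} present")
    return violations


print(check_constraints("ATGCGAATTCGC"))
```

A real tool would also scan the reverse strand and use the constraints of the chosen synthesis vendor; the sketch only shows the shape of the check.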
Ozawa, Tatsuhiko; Kondo, Masato; Isobe, Masaharu
2004-01-01
The 3' rapid amplification of cDNA ends (3' RACE) is widely used to isolate the cDNA of unknown 3' flanking sequences. However, the conventional 3' RACE often fails to amplify cDNA from a large transcript if there is a long distance between the 5' gene-specific primer and poly(A) stretch, since the conventional 3' RACE utilizes 3' oligo-dT-containing primer complementary to the poly(A) tail of mRNA at the first strand cDNA synthesis. To overcome this problem, we have developed an improved 3' RACE method suitable for the isolation of cDNA derived from very large transcripts. By using the oligonucleotide-containing random 9mer together with the GC-rich sequence for the suppression PCR technology at the first strand of cDNA synthesis, we have been able to amplify the cDNA from a very large transcript, such as the microtubule-actin crosslinking factor 1 (MACF1) gene, which codes a transcript of 20 kb in size. When there is no splicing variant, our highly specific amplification allows us to perform the direct sequencing of 3' RACE products without requiring cloning in bacterial hosts. Thus, this stepwise 3' RACE walking will help rapid characterization of the 3' structure of a gene, even when it encodes a very large transcript.
Wills, Peter R
2016-03-13
This article reviews contributions to this theme issue covering the topic 'DNA as information' in relation to the structure of DNA, the measure of its information content, the role and meaning of information in biology and the origin of genetic coding as a transition from uninformed to meaningful computational processes in physical systems. © 2016 The Author(s).
28 CFR 812.3 - Coordination with the Federal Bureau of Prisons.
Code of Federal Regulations, 2010 CFR
2010-07-01
... THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION § 812.3 Coordination with the Federal... documentation regarding the collection of a DNA sample when the Federal Bureau of Prisons releases an inmate to... documentation regarding the collection of a DNA sample from a District of Columbia Code offender when CSOSA...
28 CFR 812.3 - Coordination with the Federal Bureau of Prisons.
Code of Federal Regulations, 2011 CFR
2011-07-01
... THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION § 812.3 Coordination with the Federal... documentation regarding the collection of a DNA sample when the Federal Bureau of Prisons releases an inmate to... documentation regarding the collection of a DNA sample from a District of Columbia Code offender when CSOSA...
28 CFR 812.3 - Coordination with the Federal Bureau of Prisons.
Code of Federal Regulations, 2012 CFR
2012-07-01
... THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION § 812.3 Coordination with the Federal... documentation regarding the collection of a DNA sample when the Federal Bureau of Prisons releases an inmate to... documentation regarding the collection of a DNA sample from a District of Columbia Code offender when CSOSA...
28 CFR 812.3 - Coordination with the Federal Bureau of Prisons.
Code of Federal Regulations, 2014 CFR
2014-07-01
... THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION § 812.3 Coordination with the Federal... documentation regarding the collection of a DNA sample when the Federal Bureau of Prisons releases an inmate to... documentation regarding the collection of a DNA sample from a District of Columbia Code offender when CSOSA...
28 CFR 812.3 - Coordination with the Federal Bureau of Prisons.
Code of Federal Regulations, 2013 CFR
2013-07-01
... THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION § 812.3 Coordination with the Federal... documentation regarding the collection of a DNA sample when the Federal Bureau of Prisons releases an inmate to... documentation regarding the collection of a DNA sample from a District of Columbia Code offender when CSOSA...
NASA Astrophysics Data System (ADS)
Mackiewicz, P.; Gierlik, A.; Kowalczuk, M.; Szczepanik, D.; Dudek, M. R.; Cebrat, S.
1999-12-01
We have analysed protein coding and intergenic sequences in the Borrelia burgdorferi (the Lyme disease bacterium) genome using different kinds of DNA walks. Genes occupying the leading strand of DNA have significantly different nucleotide composition from genes occupying the lagging strand. Nucleotide compositional bias of the two DNA strands reflects the amino acid composition of proteins. 96% of genes coding for ribosomal proteins lie on the leading DNA strand, which suggests that the positions of these as well as other genes are non-random. In the B. burgdorferi genome, the asymmetry in intergenic DNA sequences is lower than the asymmetry in the third positions in codons. All these characteristics of the B. burgdorferi genome suggest that both replication-associated mutational pressure and recombination mechanisms have established the specific structure of the genome and now any recombination leading to inversion of a gene with respect to the direction of replication is forbidden. This property of the genome allows us to assume that it is in a steady state, which enables us to fix some parameters for simulations of DNA evolution.
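The abstract does not say which DNA walks were used. As a minimal sketch of the general idea, the Python below implements one common variant, assumed here for illustration: the walker steps up on G, down on C, and stays level on A or T, so a sustained slope indicates a G-over-C (or C-over-G) strand bias.

```python
def dna_walk(seq, up="G", down="C"):
    """Cumulative walk along seq: +1 for `up`, -1 for `down`, 0 otherwise."""
    position = 0
    walk = []
    for base in seq.upper():
        if base == up:
            position += 1
        elif base == down:
            position -= 1
        walk.append(position)
    return walk


# Toy sequence, not real genome data.
print(dna_walk("GGCAT"))
```

Swapping the `up` and `down` bases (for example A against T) gives the other single-pair walks.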
Cloning and sequence analysis of Hemonchus contortus HC58cDNA.
Muleke, Charles I; Ruofeng, Yan; Lixin, Xu; Xinwen, Bo; Xiangrui, Li
2007-06-01
The complete coding sequence of Hemonchus contortus HC58cDNA was generated by rapid amplification of cDNA ends and polymerase chain reaction using primers based on the 5' and 3' ends of the parasite mRNA, accession no. AF305964. The HC58cDNA gene was 851 bp long, with an open reading frame of 717 bp encoding a precursor of 239 amino acids, an approximately 27 kDa protein. Analysis of the amino acid sequence revealed conserved residues of cysteine, histidine, asparagine, occluding loop pattern, hemoglobinase motif and glutamine of the oxyanion hole characteristic of cathepsin B like proteases (CBL). Comparison of the predicted amino acid sequences showed the protein shared 33.5-58.7% identity to cathepsin B homologues in the papain clan CA family (family C1). Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the CBL, suggesting that HC58cDNA was a member of the papain family.
URF6, Last Unidentified Reading Frame of Human mtDNA, Codes for an NADH Dehydrogenase Subunit
NASA Astrophysics Data System (ADS)
Chomyn, Anne; Cleeter, Michael W. J.; Ragan, C. Ian; Riley, Marcia; Doolittle, Russell F.; Attardi, Giuseppe
1986-10-01
The polypeptide encoded in URF6, the last unassigned reading frame of human mitochondrial DNA, has been identified with antibodies to peptides predicted from the DNA sequence. Antibodies prepared against highly purified respiratory chain NADH dehydrogenase from beef heart or against the cytoplasmically synthesized 49-kilodalton iron-sulfur subunit isolated from this enzyme complex, when added to a deoxycholate or a Triton X-100 mitochondrial lysate of HeLa cells, specifically precipitated the URF6 product together with the six other URF products previously identified as subunits of NADH dehydrogenase. These results strongly point to the URF6 product as being another subunit of this enzyme complex. Thus, almost 60% of the protein coding capacity of mammalian mitochondrial DNA is utilized for the assembly of the first enzyme complex of the respiratory chain. The absence of such information in yeast mitochondrial DNA dramatizes the variability in gene content of different mitochondrial genomes.
Boeri, Eduardo J.; Wanke, María M.; Madariaga, María J.; Teijeiro, María L.; Elena, Sebastian A.; Trangoni, Marcos D.
2018-01-01
Aim: This study aimed to compare the sensitivity (S), specificity (Sp), and positive likelihood ratios (LR+) of four polymerase chain reaction (PCR) assays for the detection of Brucella spp. in canine clinical samples. Materials and Methods: A total of 595 samples of whole blood, urine, and genital fluids were evaluated between October 2014 and November 2016. To compare PCR assays, the gold standard was defined using a combination of different serological and microbiological tests. Bacterial isolation from urine and blood cultures was carried out. Serological methods such as rapid slide agglutination test, indirect enzyme-linked immunosorbent assay, agar gel immunodiffusion test, and buffered plate antigen test were performed. Four genes were evaluated: (i) The gene coding for the BCSP31 protein, (ii) the ribosomal gene coding for the 16S-23S intergenic spacer region, (iii) the gene coding for porins omp2a/omp2b, and (iv) the gene coding for the insertion sequence IS711. Results: The results obtained were as follows: (1) For the primers that amplify the gene coding for the BCSP31 protein: S: 45.64% (confidence interval [CI] 39.81-51.46), Sp: 95.62% (CI 93.13-98.12), and LR+: 10.43 (CI 6.04-18); (2) for the primers that amplify the ribosomal gene of the 16S-23S rDNA intergenic spacer region: S: 69.80% (CI 64.42-75.18), Sp: 95.62% (CI 93.13-98.12), and LR+: 11.52 (CI 7.31-18.13); (3) for the primers that amplify the omp2a and omp2b genes: S: 39.26% (CI 33.55-44.97), Sp: 97.31% (CI 95.30-99.32), and LR+: 14.58 (CI 7.25-29.29); and (4) for the primers that amplify the insertion sequence IS711: S: 22.82% (CI 17.89-27.75), Sp: 99.66% (CI 98.84-100), and LR+: 67.77 (CI 9.47-484.89). Conclusion: We concluded that the gene coding for the 16S-23S rDNA intergenic spacer region was the one that best detected Brucella spp. in canine clinical samples. PMID:29657404
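For readers unfamiliar with the positive likelihood ratio, the standard definition is LR+ = S / (1 - Sp). The Python sketch below applies it to the rounded BCSP31 percentages quoted above; the function name is hypothetical, and the abstract's own values were presumably computed from unrounded counts, so small differences from the reported figures are to be expected.

```python
def positive_likelihood_ratio(sensitivity_pct, specificity_pct):
    """LR+ = sensitivity / (1 - specificity), with inputs in percent."""
    false_positive_rate = 1.0 - specificity_pct / 100.0
    if false_positive_rate <= 0.0:
        raise ValueError("LR+ is undefined when specificity is 100%")
    return (sensitivity_pct / 100.0) / false_positive_rate


# BCSP31 assay, using the rounded S and Sp from the abstract.
print(round(positive_likelihood_ratio(45.64, 95.62), 2))
```

With these rounded inputs the result is about 10.4, close to the reported 10.43.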
Magro, Massimiliano; Martinello, Tiziana; Bonaiuto, Emanuela; Gomiero, Chiara; Baratella, Davide; Zoppellaro, Giorgio; Cozza, Giorgio; Patruno, Marco; Zboril, Radek; Vianello, Fabio
2017-11-01
In contrast to common coated iron oxide nanoparticles, novel naked surface active maghemite nanoparticles (SAMNs) can covalently bind DNA. Plasmid (pDNA) harboring the coding gene for GFP was directly chemisorbed onto SAMNs, leading to a novel DNA nanovector (SAMN@pDNA). The spontaneous internalization of SAMN@pDNA into cells was compared with an extensively studied fluorescent SAMN derivative (SAMN@RITC). Moreover, the transfection efficiency of SAMN@pDNA was evaluated and explained by a computational model. SAMN@pDNA was prepared and characterized by spectroscopic and computational methods, and molecular dynamic simulation. The size and hydrodynamic properties of SAMN@pDNA and SAMN@RITC were studied by transmission electron microscopy, light scattering and zeta-potential. The two nanomaterials were tested by confocal scanning microscopy on equine peripheral blood-derived mesenchymal stem cells (ePB-MSCs) and GFP expression by SAMN@pDNA was determined. Nanomaterials characterized by similar hydrodynamic properties were successfully internalized and stored into mesenchymal stem cells. Transfection by SAMN@pDNA occurred and GFP expression was higher than with the lipofectamine procedure, even in the absence of an external magnetic field. A computational model clarified that transfection efficiency can be ascribed to DNA availability inside cells. Direct covalent binding of DNA on naked magnetic nanoparticles led to an extremely robust gene delivery tool. Hydrodynamic and chemical-physical properties of SAMN@pDNA were responsible for the successful uptake by cells and for the efficiency of GFP gene transfection. SAMNs are characterized by colloidal stability, excellent cell uptake, persistence in the host cells, low toxicity and are proposed as novel intelligent DNA nanovectors for efficient cell transfection. Copyright © 2017 Elsevier B.V. All rights reserved.
Michel, Christian J
2017-04-18
In 1996, a set X of 20 trinucleotides was identified in genes of both prokaryotes and eukaryotes which has on average the highest occurrence in reading frame compared to its two shifted frames. Furthermore, this set X has an interesting mathematical property as X is a maximal C3 self-complementary trinucleotide circular code. In 2015, by quantifying the inspection approach used in 1996, the circular code X was confirmed in the genes of bacteria and eukaryotes and was also identified in the genes of plasmids and viruses. The method was based on the preferential occurrence of trinucleotides among the three frames at the gene population level. We extend here this definition at the gene level. This new statistical approach considers all the genes, i.e., of large and small lengths, with the same weight for searching the circular code X. As a consequence, the concept of circular code, in particular the reading frame retrieval, is directly associated to each gene. At the gene level, the circular code X is strengthened in the genes of bacteria, eukaryotes, plasmids, and viruses, and is now also identified in the genes of archaea. The genes of mitochondria and chloroplasts contain a subset of the circular code X. Finally, by studying viral genes, the circular code X was found in DNA genomes, RNA genomes, double-stranded genomes, and single-stranded genomes.
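Two of the properties mentioned above, self-complementarity and preferential occurrence in the reading frame, are easy to illustrate in Python. In the sketch below the 20 trinucleotides of X are given as they are commonly listed in the circular-code literature; the abstract does not list them, so the set should be checked against the original papers before being relied on. The function names and the toy sequence are assumptions, and the per-frame count is a simplification of the paper's statistical method.

```python
# The set X as commonly listed in the literature (verify before reuse).
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(trinucleotide):
    """Reverse complement of a DNA word, e.g. AAC -> GTT."""
    return trinucleotide.translate(COMPLEMENT)[::-1]


def is_self_complementary(code):
    """True if the reverse complement of every word is also in the code."""
    return all(reverse_complement(t) in code for t in code)


def frame_counts(seq, code):
    """Number of trinucleotides belonging to `code` in frames 0, 1 and 2."""
    counts = []
    for frame in range(3):
        words = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(word in code for word in words))
    return counts


print(is_self_complementary(X))
print(frame_counts("GAAGACGTT", X))  # toy sequence built from words of X
```

The sketch does not test the circularity or C3 properties themselves, which require checking that no concatenation of words can be read in two frames.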
Suárez Villagrán, Martha Y.; Miller, John H.
2015-01-01
We report on a new technique, computational DNA hole spectroscopy, which creates spectra of electron hole probabilities vs. nucleotide position. A hole is a site of positive charge created when an electron is removed. Peaks in the hole spectrum depict sites where holes tend to localize and potentially trigger a base pair mismatch during replication. Our studies of mitochondrial DNA reveal a correlation between L-strand hole spectrum peaks and spikes in the human mutation spectrum. Importantly, we also find that hole peak positions that do not coincide with large variant frequencies often coincide with disease-implicated mutations and/or (for coding DNA) encoded conserved amino acids. This enables combining hole spectra with variant data to identify critical base pairs and potential disease ‘driver’ mutations. Such integration of DNA hole and variance spectra could ultimately prove invaluable for pinpointing critical regions of the vast non-protein-coding genome. An observed asymmetry in correlations, between the spectrum of human mtDNA variations and the L- and H-strand hole spectra, is attributed to asymmetric DNA replication processes that occur for the leading and lagging strands. PMID:26310834
Villagrán, Martha Y Suárez; Miller, John H
2015-08-27
We report on a new technique, computational DNA hole spectroscopy, which creates spectra of electron hole probabilities vs. nucleotide position. A hole is a site of positive charge created when an electron is removed. Peaks in the hole spectrum depict sites where holes tend to localize and potentially trigger a base pair mismatch during replication. Our studies of mitochondrial DNA reveal a correlation between L-strand hole spectrum peaks and spikes in the human mutation spectrum. Importantly, we also find that hole peak positions that do not coincide with large variant frequencies often coincide with disease-implicated mutations and/or (for coding DNA) encoded conserved amino acids. This enables combining hole spectra with variant data to identify critical base pairs and potential disease 'driver' mutations. Such integration of DNA hole and variance spectra could ultimately prove invaluable for pinpointing critical regions of the vast non-protein-coding genome. An observed asymmetry in correlations, between the spectrum of human mtDNA variations and the L- and H-strand hole spectra, is attributed to asymmetric DNA replication processes that occur for the leading and lagging strands.
Morchikh, Mehdi; Cribier, Alexandra; Raffel, Raoul; Amraoui, Sonia; Cau, Julien; Severac, Dany; Dubois, Emeric; Schwartz, Olivier; Bennasser, Yamina; Benkirane, Monsef
2017-08-03
The DNA-mediated innate immune response underpins anti-microbial defenses and certain autoimmune diseases. Here we used immunoprecipitation, mass spectrometry, and RNA sequencing to identify a ribonuclear complex built around HEXIM1 and the long non-coding RNA NEAT1 that we dubbed the HEXIM1-DNA-PK-paraspeckle components-ribonucleoprotein complex (HDP-RNP). The HDP-RNP contains DNA-PK subunits (DNAPKc, Ku70, and Ku80) and paraspeckle proteins (SFPQ, NONO, PSPC1, RBM14, and MATRIN3). We show that binding of HEXIM1 to NEAT1 is required for its assembly. We further demonstrate that the HDP-RNP is required for the innate immune response to foreign DNA, through the cGAS-STING-IRF3 pathway. The HDP-RNP interacts with cGAS and its partner PQBP1, and their interaction is remodeled by foreign DNA. Remodeling leads to the release of paraspeckle proteins, recruitment of STING, and activation of DNAPKc and IRF3. Our study establishes the HDP-RNP as a key nuclear regulator of DNA-mediated activation of innate immune response through the cGAS-STING pathway. Copyright © 2017 Elsevier Inc. All rights reserved.
Does CTCF mediate between nuclear organization and gene expression?
Ohlsson, Rolf; Lobanenkov, Victor; Klenova, Elena
2010-01-01
The multifunctional zinc-finger protein CCCTC-binding factor (CTCF) is a very strong candidate for the role of coordinating the expression level of coding sequences with their three-dimensional position in the nucleus, apparently responding to a "code" in the DNA itself. Dynamic interactions between chromatin fibers in the context of nuclear architecture have been implicated in various aspects of genome functions. However, the molecular basis of these interactions still remains elusive and is a subject of intense debate. Here we discuss the nature of CTCF-DNA interactions, the CTCF-binding specificity to its binding sites and the relationship between CTCF and chromatin, and we examine data linking CTCF with gene regulation in the three-dimensional nuclear space. We discuss why these features render CTCF a very strong candidate for the role and propose a unifying model, the "CTCF code," explaining the mechanistic basis of how the information encrypted in DNA may be interpreted by CTCF into diverse nuclear functions.
2013-01-01
Background Significant efforts have been made to address the problem of identifying short genes in prokaryotic genomes. However, most known methods are not effective in detecting short genes. Because of the limited information contained in short DNA sequences, it is very difficult to accurately distinguish between protein coding and non-coding sequences in prokaryotic genomes. We have developed a new Iteratively Adaptive Sparse Partial Least Squares (IASPLS) algorithm as the classifier to improve the accuracy of the identification process. Results For testing, we chose the short coding and non-coding sequences from seven prokaryotic organisms. We used seven feature sets (including GC content, Z-curve, etc.) of short genes. In comparison with the GeneMarkS, Metagene, Orphelia, and Heuristic Approaches methods, our model achieved the best prediction performance in identification of short prokaryotic genes. Even when we focused on the very short length group, [60–100 nt), our model provided sensitivity as high as 83.44% and specificity as high as 92.8%. These values are two or three times higher than those of three of the other methods, while Metagene fails to recognize genes in this length range. The experiments also showed that IASPLS can improve the identification accuracy in comparison with other widely used classifiers, i.e. Logistic, Random Forest (RF) and K nearest neighbors (KNN). Accuracy with IASPLS improved by 5.90% or more in comparison with the other methods. In addition to the improvements in accuracy, IASPLS required ten times less computer time than KNN or RF. Conclusions We conclude that our method is preferable for application as an automated method of short gene classification. Its linearity and easily optimized parameters make it practicable for predicting short genes of newly-sequenced or under-studied species.
Reviewers This article was reviewed by Alexey Kondrashov, Rajeev Azad (nominated by Dr J.Peter Gogarten) and Yuriy Fofanov (nominated by Dr Janet Siefert). PMID:24067167
Croteau, Rodney Bruce; Wildung, Mark Raymond; Lange, Bernd Markus; McCaskill, David G.
2001-01-01
cDNAs encoding 1-deoxyxylulose-5-phosphate synthase from peppermint (Mentha piperita) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7) are provided which code for the expression of 1-deoxyxylulose-5-phosphate synthase from plants. In another aspect the present invention provides for isolated, recombinant DXPS proteins, such as the proteins having the sequences set forth in SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8. In other aspects, replicable recombinant cloning vehicles are provided which code for plant 1-deoxyxylulose-5-phosphate synthases, or for a base sequence sufficiently complementary to at least a portion of 1-deoxyxylulose-5-phosphate synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding a plant 1-deoxyxylulose-5-phosphate synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant 1-deoxyxylulose-5-phosphate synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant 1-deoxyxylulose-5-phosphate synthase may be used to obtain expression or enhanced expression of 1-deoxyxylulose-5-phosphate synthase in plants in order to enhance the production of 1-deoxyxylulose-5-phosphate, or its derivatives such as isopentenyl diphosphate (IPP), or may be otherwise employed for the regulation or expression of 1-deoxyxylulose-5-phosphate synthase, or the production of its products.
Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.
Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W
2018-05-31
In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Developments in high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.
Corbi, N; Libri, V; Fanciulli, M; Tinsley, J M; Davies, K E; Passananti, C
2000-06-01
Up-regulation of utrophin gene expression is recognized as a plausible therapeutic approach in the treatment of Duchenne muscular dystrophy (DMD). We have designed and engineered new zinc finger-based transcription factors capable of binding and activating transcription from the promoter of the dystrophin-related gene, utrophin. Using the recognition 'code' that proposes specific rules between zinc finger primary structure and potential DNA binding sites, we engineered a new gene named 'Jazz' that encodes for a three-zinc finger peptide. Jazz belongs to the Cys2-His2 zinc finger type and was engineered to target the nine base pair DNA sequence: 5'-GCT-GCT-GCG-3', present in the promoter region of both the human and mouse utrophin gene. The entire zinc finger alpha-helix region, containing the amino acid positions that are crucial for DNA binding, was specifically chosen on the basis of the contacts more frequently represented in the available list of the 'code'. Here we demonstrate that Jazz protein binds specifically to the double-stranded DNA target, with a dissociation constant of about 32 nM. Band shift and super-shift experiments confirmed the high affinity and specificity of Jazz protein for its DNA target. Moreover, we show that chimeric proteins, named Gal4-Jazz and Sp1-Jazz, are able to drive the transcription of a test gene from the human utrophin promoter.
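To give a feel for what a dissociation constant of about 32 nM means, the Python sketch below evaluates the textbook single-site binding isotherm, fraction bound = [P] / (Kd + [P]). It assumes simple 1:1 binding with the DNA probe at a concentration well below Kd, so that free protein approximates total protein; the abstract does not state how the constant was fitted, and the function name is hypothetical.

```python
def fraction_bound(protein_nM, kd_nM=32.0):
    """Fraction of DNA target occupied under simple 1:1 binding."""
    return protein_nM / (kd_nM + protein_nM)


# Protein concentrations in nM, chosen only for illustration.
for concentration in (3.2, 32.0, 320.0):
    print(concentration, round(fraction_bound(concentration), 3))
```

By construction, half of the target is occupied when the protein concentration equals Kd.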
Herrnstadt, Corinna; Elson, Joanna L; Fahy, Eoin; Preston, Gwen; Turnbull, Douglass M; Anderson, Christen; Ghosh, Soumitra S; Olefsky, Jerrold M; Beal, M Flint; Davis, Robert E; Howell, Neil
2002-05-01
The evolution of the human mitochondrial genome is characterized by the emergence of ethnically distinct lineages or haplogroups. Nine European, seven Asian (including Native American), and three African mitochondrial DNA (mtDNA) haplogroups have been identified previously on the basis of the presence or absence of a relatively small number of restriction-enzyme recognition sites or on the basis of nucleotide sequences of the D-loop region. We have used reduced-median-network approaches to analyze 560 complete European, Asian, and African mtDNA coding-region sequences from unrelated individuals to develop a more complete understanding of sequence diversity both within and between haplogroups. A total of 497 haplogroup-associated polymorphisms were identified, 323 (65%) of which were associated with one haplogroup and 174 (35%) of which were associated with two or more haplogroups. Approximately one-half of these polymorphisms are reported for the first time here. Our results confirm and substantially extend the phylogenetic relationships among mitochondrial genomes described elsewhere from the major human ethnic groups. Another important result is that there were numerous instances both of parallel mutations at the same site and of reversion (i.e., homoplasy). It is likely that homoplasy in the coding region will confound evolutionary analysis of small sequence sets. By a linkage-disequilibrium approach, additional evidence for the absence of human mtDNA recombination is presented here.
Implementation of new physics models for low energy electrons in liquid water in Geant4-DNA.
Bordage, M C; Bordes, J; Edel, S; Terrissol, M; Franceries, X; Bardiès, M; Lampe, N; Incerti, S
2016-12-01
A new alternative set of elastic and inelastic cross sections has been added to the very low energy extension of the Geant4 Monte Carlo simulation toolkit, Geant4-DNA, for the simulation of electron interactions in liquid water. These cross sections have been obtained from the CPA100 Monte Carlo track structure code, which has been a reference in the microdosimetry community for many years. They are compared to the default Geant4-DNA cross sections and show better agreement with published data. In order to verify the correct implementation of the CPA100 cross section models in Geant4-DNA, simulations of the number of interactions and ranges were performed using Geant4-DNA with this new set of models, and the results were compared with corresponding results from the original CPA100 code. Good agreement is observed between the implementations, with relative differences lower than 1% regardless of the incident electron energy. Useful quantities related to the deposited energy at the scale of the cell or the organ of interest for internal dosimetry, like dose point kernels, are also calculated using these new physics models. They are compared with results obtained using the well-known Penelope Monte Carlo code. Copyright © 2016 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
Nedelcu, Aurora M.; Lee, Robert W.; Lemieux, Claude; Gray, Michael W.; Burger, Gertraud
2000-01-01
Two distinct mitochondrial genome types have been described among the green algal lineages investigated to date: a reduced–derived, Chlamydomonas-like type and an ancestral, Prototheca-like type. To determine if this unexpected dichotomy is real or is due to insufficient or biased sampling and to define trends in the evolution of the green algal mitochondrial genome, we sequenced and analyzed the mitochondrial DNA (mtDNA) of Scenedesmus obliquus. This genome is 42,919 bp in size and encodes 42 conserved genes (i.e., large and small subunit rRNA genes, 27 tRNA and 13 respiratory protein-coding genes), four additional free-standing open reading frames with no known homologs, and an intronic reading frame with endonuclease/maturase similarity. No 5S rRNA or ribosomal protein-coding genes have been identified in Scenedesmus mtDNA. The standard protein-coding genes feature a deviant genetic code characterized by the use of UAG (normally a stop codon) to specify leucine, and the unprecedented use of UCA (normally a serine codon) as a signal for termination of translation. The mitochondrial genome of Scenedesmus combines features of both green algal mitochondrial genome types: the presence of a more complex set of protein-coding and tRNA genes is shared with the ancestral type, whereas the lack of 5S rRNA and ribosomal protein-coding genes as well as the presence of fragmented and scrambled rRNA genes are shared with the reduced–derived type of mitochondrial genome organization. Furthermore, the gene content and the fragmentation pattern of the rRNA genes suggest that this genome represents an intermediate stage in the evolutionary process of mitochondrial genome streamlining in green algae. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF204057.] PMID:10854413
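The effect of the deviant code described above can be sketched in Python by starting from the standard codon table and overriding the two codons the abstract names: UAG (TAG in DNA) read as leucine and UCA (TCA) read as a stop. All other codons are assumed to keep their standard meaning, and the function and variable names, like the toy sequence, are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids in TCAG codon order ('*' = stop).
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Overrides taken from the abstract; the rest is assumed standard.
SCENEDESMUS_MT_CODE = dict(STANDARD_CODE, TAG="L", TCA="*")


def translate(dna, code):
    """Translate dna in frame 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = code[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


toy = "ATGTAGGCTTCAGGG"  # toy sequence, not from AF204057
print(translate(toy, STANDARD_CODE))
print(translate(toy, SCENEDESMUS_MT_CODE))
```

Under the standard code the toy sequence terminates at TAG, whereas under the deviant code it reads through TAG and terminates at TCA.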
DeWitt, D L; Smith, W L
1988-01-01
Prostaglandin G/H synthase (8,11,14-icosatrienoate, hydrogen-donor:oxygen oxidoreductase, EC 1.14.99.1) catalyzes the first step in the formation of prostaglandins and thromboxanes, the conversion of arachidonic acid to prostaglandin endoperoxides G and H. This enzyme is the site of action of nonsteroidal anti-inflammatory drugs. We have isolated a 2.7-kilobase complementary DNA (cDNA) encompassing the entire coding region of prostaglandin G/H synthase from sheep vesicular glands. This cDNA, cloned from a lambda gt 10 library prepared from poly(A)+ RNA of vesicular glands, hybridizes with a single 2.75-kilobase mRNA species. The cDNA clone was selected using oligonucleotide probes modeled from amino acid sequences of tryptic peptides prepared from the purified enzyme. The full-length cDNA encodes a protein of 600 amino acids, including a signal sequence of 24 amino acids. Identification of the cDNA as coding for prostaglandin G/H synthase is based on comparison of amino acid sequences of seven peptides comprising 103 amino acids with the amino acid sequence deduced from the nucleotide sequence of the cDNA. The molecular weight of the unglycosylated enzyme lacking the signal peptide is 65,621. The synthase is a glycoprotein, and there are three potential sites for N-glycosylation, two of them in the amino-terminal half of the molecule. The serine reported to be acetylated by aspirin is at position 530, near the carboxyl terminus. There is no significant similarity between the sequence of the synthase and that of any other protein in amino acid or nucleotide sequence libraries, and a heme binding site(s) is not apparent from the amino acid sequence. The availability of a full-length cDNA clone coding for prostaglandin G/H synthase should facilitate studies of the regulation of expression of this enzyme and the structural features important for catalysis and for interaction with anti-inflammatory drugs. PMID:3125548
Efficient gene knock-out and knock-in with transgenic Cas9 in Drosophila.
Xue, Zhaoyu; Ren, Mengda; Wu, Menghua; Dai, Junbiao; Rong, Yikang S; Gao, Guanjun
2014-03-21
Bacterial Cas9 nuclease induces site-specific DNA breaks using small gRNA as guides. Cas9 has been successfully introduced into Drosophila for genome editing. Here, we improve the versatility of this method by developing a transgenic system that expresses Cas9 in the Drosophila germline. Using this system, we induced inheritable knock-out mutations by injecting only the gRNA into embryos, achieved highly efficient mutagenesis by expressing gRNA from the promoter of a novel non-coding RNA gene, and recovered homologous recombination-based knock-in of a fluorescent marker at a rate of 4.5% by co-injecting gRNA with a circular DNA donor. Copyright © 2014 Xue et al.
NASA Technical Reports Server (NTRS)
Plante, I; Wu, H
2014-01-01
The code RITRACKS (Relativistic Ion Tracks) has been developed over the last few years at the NASA Johnson Space Center to simulate the effects of ionizing radiation at the microscopic scale and to understand the effects of space radiation at the biological level. The fundamental part of this code is the stochastic simulation of the radiation track structure of heavy ions, an important component of space radiation. The code can calculate many relevant quantities, such as the radial dose and voxel dose, and may also be used to calculate the dose in spherical and cylindrical targets of various sizes. Recently, we have incorporated DNA structure and damage simulations at the molecular scale in RITRACKS. The direct effect of radiation is simulated by introducing a slight modification of the existing particle transport algorithms, using the Binary-Encounter-Bethe model of ionization cross sections for each molecular orbital of DNA. The simulation of radiation chemistry is done by a step-by-step diffusion-reaction program based on the Green's functions of the diffusion equation. This approach is also used to simulate the indirect effect of ionizing radiation on DNA. The software can be installed independently on PCs and tablets using the Windows operating system and does not require any coding from the user. It includes a graphical user interface (GUI) and a 3D OpenGL visualization interface. The calculations are executed simultaneously (in parallel) on multiple CPUs. The main features of the software will be presented.
Smurf2 Regulates DNA Repair and Packaging to Prevent Tumors | Center for Cancer Research
The blueprint for all of a cell’s functions is written in the genetic code of DNA sequences as well as in the landscape of DNA and histone modifications. DNA is wrapped around histones to package it into chromatin, which is stored in the nucleus. It is important to maintain the integrity of the chromatin structure to ensure that the cell continues to behave appropriately.
New Modeling Approaches to Study DNA Damage by the Direct and Indirect Effects of Ionizing Radiation
NASA Technical Reports Server (NTRS)
Plante, Ianik; Cucinotta, Francis A.
2012-01-01
DNA is damaged both by the direct and indirect effects of radiation. In the direct effect, the DNA itself is ionized, whereas the indirect effect involves the radiolysis of the water molecules surrounding the DNA and the subsequent reaction of the DNA with radical products. While this problem has been studied for many years, many unknowns still exist. To study this problem, we have developed the computer code RITRACKS [1], which simulates the radiation track structure for heavy ions and electrons, calculating all energy deposition events and the coordinates of all species produced by the water radiolysis. In this work, we plan to simulate DNA damage by using the crystal structure of a nucleosome and calculations performed by RITRACKS. The energy deposition events are used to calculate the dose deposited in nanovolumes [2] and therefore can be used to simulate the direct effect of the radiation. Using the positions of the radiolytic species with a radiation chemistry code [3] it will be possible to simulate DNA damage by indirect effect. The simulation results can be compared with results from previous calculations such as the frequencies of simple and complex strand breaks [4] and with newer experimental data using surrogate markers of DNA double-strand breaks such as γ-H2AX foci [5].
Seligmann, Hervé
2013-03-01
Usual DNA→RNA transcription exchanges T→U. Assuming different systematic symmetric nucleotide exchanges during translation, some GenBank RNAs match exactly human mitochondrial sequences (exchange rules listed in decreasing transcript frequencies): C↔U, A↔U, A↔U+C↔G (two nucleotide pairs exchanged), G↔U, A↔G, C↔G, none for A↔C, A↔G+C↔U, and A↔C+G↔U. Most unusual transcripts involve exchanging uracil. Independent measures of rates of rare replicational enzymatic DNA nucleotide misinsertions predict frequencies of RNA transcripts systematically exchanging the corresponding misinserted nucleotides. Exchange transcripts self-hybridize less than other gene regions, self-hybridization increases with length, suggesting endoribonuclease-limited elongation. Blast detects stop codon depleted putative protein coding overlapping genes within exchange-transcribed mitochondrial genes. These align with existing GenBank proteins (mainly metazoan origins, prokaryotic and viral origins underrepresented). These GenBank proteins frequently interact with RNA/DNA, are membrane transporters, or are typical of mitochondrial metabolism. Nucleotide exchange transcript frequencies increase with overlapping gene densities and stop densities, indicating finely tuned counterbalancing regulation of expression of systematic symmetric nucleotide exchange-encrypted proteins. Such expression necessitates combined activities of suppressor tRNAs matching stops, and nucleotide exchange transcription. Two independent properties confirm predicted exchanged overlap coding genes: discrepancy of third codon nucleotide contents from replicational deamination gradients, and codon usage according to circular code predictions. Predictions from both properties converge, especially for frequent nucleotide exchange types. Nucleotide exchanging transcription apparently increases coding densities of protein coding genes without lengthening genomes, revealing unsuspected functional DNA coding potential. 
Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
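The nine systematic symmetric exchange rules listed in this abstract (six single-pair and three double-pair exchanges) can be sketched as simple character substitutions. This is only an illustration of what such an exchange does to a sequence; the function name and test string are hypothetical, and no claim is made about how the author searched GenBank.

```python
from itertools import combinations

BASES = "ACGU"


def exchange(rna, pairs):
    """Apply symmetric nucleotide exchanges, e.g. pairs=["CU"] swaps C and U."""
    src = "".join(a + b for a, b in pairs)
    dst = "".join(b + a for a, b in pairs)
    return rna.translate(str.maketrans(src, dst))


# Six exchanges of one nucleotide pair, three exchanges of two pairs at once.
single = [[a + b] for a, b in combinations(BASES, 2)]
double = [["AU", "CG"], ["AG", "CU"], ["AC", "GU"]]
ALL_EXCHANGES = single + double

for rule in ALL_EXCHANGES:
    label = "+".join(f"{p[0]}<->{p[1]}" for p in rule)
    print(label, exchange("AUGC", rule))
```

Each rule is its own inverse, so applying the same exchange twice restores the original sequence. Note that A↔U+C↔G yields the complement without reversal, which differs from an ordinary antisense transcript.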
Giardina, P; Cannio, R; Martirani, L; Marzullo, L; Palmieri, G; Sannia, G
1995-01-01
The gene (pox1) encoding a phenol oxidase from Pleurotus ostreatus, a lignin-degrading basidiomycete, was cloned and sequenced, and the corresponding pox1 cDNA was also synthesized and sequenced. The isolated gene consists of 2,592 bp, with the coding sequence being interrupted by 19 introns and flanked by an upstream region in which putative CAAT and TATA consensus sequences could be identified at positions -174 and -84, respectively. The isolation of a second cDNA (pox2 cDNA), showing 84% similarity, and of the corresponding truncated genomic clones demonstrated the existence of a multigene family coding for isoforms of laccase in P. ostreatus. PCR amplifications of specific regions on the DNA of isolated monokaryons proved that the two genes are not allelic forms. The POX1 amino acid sequence deduced was compared with those of other known laccases from different fungi. PMID:7793961
Gemini surfactants mediate efficient mitochondrial gene delivery and expression.
Cardoso, Ana M; Morais, Catarina M; Cruz, A Rita; Cardoso, Ana L; Silva, Sandra G; do Vale, M Luísa; Marques, Eduardo F; Pedroso de Lima, Maria C; Jurado, Amália S
2015-03-02
Gene delivery targeting mitochondria has the potential to transform the therapeutic landscape of mitochondrial genetic diseases. Taking advantage of the nonuniversal genetic code used by mitochondria, a plasmid DNA construct able to be specifically expressed in these organelles was designed by including a codon, which codes for an amino acid only if read by the mitochondrial ribosomes. In the present work, gemini surfactants were shown to successfully deliver plasmid DNA to mitochondria. Gemini surfactant-based DNA complexes were taken up by cells through a variety of routes, including endocytic pathways, and showed propensity for inducing membrane destabilization under acidic conditions, thus facilitating cytoplasmic release of DNA. Furthermore, the complexes interacted extensively with lipid membrane models mimicking the composition of the mitochondrial membrane, which predicts a favored interaction of the complexes with mitochondria in the intracellular environment. This work unravels new possibilities for gene therapy toward mitochondrial diseases.
Image Encryption Algorithm Based on Hyperchaotic Maps and Nucleotide Sequences Database
2017-01-01
Image encryption technology is one of the main means to ensure the safety of image information. Using the characteristics of chaos, such as randomness, regularity, ergodicity, and initial value sensitiveness, combined with the unique space conformation of DNA molecules and their unique information storage and processing ability, an efficient method for image encryption based on the chaos theory and a DNA sequence database is proposed. In this paper, digital image encryption employs a process of transforming the image pixel gray value by using chaotic sequence scrambling image pixel location and establishing superchaotic mapping, which maps quaternary sequences and DNA sequences, and by combining with the logic of the transformation between DNA sequences. The bases are replaced under the displaced rules by using DNA coding in a certain number of iterations that are based on the enhanced quaternary hyperchaotic sequence; the sequence is generated by Chen chaos. The cipher feedback mode and chaos iteration are employed in the encryption process to enhance the confusion and diffusion properties of the algorithm. Theoretical analysis and experimental results show that the proposed scheme not only demonstrates excellent encryption but also effectively resists chosen-plaintext attack, statistical attack, and differential attack. PMID:28392799
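The "DNA coding" step mentioned above is commonly realized by mapping each 2-bit group of a pixel value to one base. The sketch below assumes one such rule (00→A, 01→C, 10→G, 11→T); the abstract does not state which rule the authors use, and the chaotic scrambling, database lookup, and cipher feedback stages are not reproduced here.

```python
# Assumed coding rule; complementary bases (A/T, C/G) get complementary bits.
BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def pixel_to_dna(value):
    """Encode one 8-bit gray value as four bases, most significant bits first."""
    if not 0 <= value <= 255:
        raise ValueError("gray value must fit in 8 bits")
    return "".join(BASE_FOR_BITS[(value >> shift) & 0b11] for shift in (6, 4, 2, 0))


def dna_to_pixel(bases):
    """Decode four bases back into the 8-bit gray value."""
    value = 0
    for base in bases:
        value = (value << 2) | BITS_FOR_BASE[base]
    return value


print(pixel_to_dna(27))      # 27 is 00 01 10 11 in binary
print(dna_to_pixel("ACGT"))
```

On its own this mapping provides no security, since it is a fixed and invertible re-labelling; in schemes of this kind the confusion and diffusion come from the key-dependent steps applied to the base sequence.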
Pérez-Quintero, Alvaro L.; Rodriguez-R, Luis M.; Dereeper, Alexis; López, Camilo; Koebnik, Ralf; Szurek, Boris; Cunnac, Sebastien
2013-01-01
Transcription Activators-Like Effectors (TALEs) belong to a family of virulence proteins from the Xanthomonas genus of bacterial plant pathogens that are translocated into the plant cell. In the nucleus, TALEs act as transcription factors inducing the expression of susceptibility genes. A code for TALE-DNA binding specificity and high-resolution three-dimensional structures of TALE-DNA complexes were recently reported. Accurate prediction of TAL Effector Binding Elements (EBEs) is essential to elucidate the biological functions of the many sequenced TALEs as well as for robust design of artificial TALE DNA-binding domains in biotechnological applications. In this work a program with improved EBE prediction performances was developed using an updated specificity matrix and a position weight correction function to account for the matching pattern observed in a validation set of TALE-DNA interactions. To gain a systems perspective on the large TALE repertoires from X. oryzae strains, this program was used to predict rice gene targets for 99 sequenced family members. Integrating predictions and available expression data in a TALE-gene network revealed multiple candidate transcriptional targets for many TALEs as well as several possible instances of functional convergence among TALEs. PMID:23869221
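As a rough illustration of scoring candidate EBEs with a specificity matrix and a position weight, the sketch below slides a TALE's repeat-variable diresidue (RVD) sequence along a promoter. All probabilities and the linear weight function are hypothetical placeholders chosen for the example; they are not the updated matrix or the correction function developed in the paper.

```python
import math

# Hypothetical RVD-to-nucleotide probabilities (illustrative values only).
SPECIFICITY = {
    "NI": {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    "HD": {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    "NG": {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    "NN": {"A": 0.35, "C": 0.05, "G": 0.55, "T": 0.05},
}


def position_weight(i, n):
    """Assumed weighting: positions near the 5' end count more (1.0 down to 0.5)."""
    return 1.0 - 0.5 * i / max(n - 1, 1)


def score_site(rvds, site):
    """Weighted negative log-likelihood of a site; lower means a better match."""
    if len(rvds) != len(site):
        raise ValueError("site length must equal the number of RVDs")
    n = len(rvds)
    return sum(
        position_weight(i, n) * -math.log(SPECIFICITY[rvd][base])
        for i, (rvd, base) in enumerate(zip(rvds, site))
    )


def best_sites(rvds, promoter, top=3):
    """Return the `top` lowest-scoring (score, start) windows in the promoter."""
    n = len(rvds)
    scored = [
        (score_site(rvds, promoter[i:i + n]), i)
        for i in range(len(promoter) - n + 1)
    ]
    return sorted(scored)[:top]


print(best_sites(["NI", "HD", "NG", "NN"], "TTACTGTT"))
```

In this toy promoter the window starting at index 2 (ACTG) matches every RVD's preferred base and therefore receives the lowest score.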
Biology of Symbioses between Marine Invertebrates and Intracellular Bacteria
1991-01-21
bisphosphate carboxylase (RubisCO) from symbiotic bacteria of various origins, b) To continue methods development for 16S rRNA sequencing from symbionts in...frozen and badly preserved specimens, and c) To use these new techniques to sequence 16S DNA from a variety of symbionts a) RubisCO We have cloned the...gene coding for RubisCO from the sulfur-oxidizing symbiont of the gastropod Alviniconcha hessleri. Nucleotide sequence analysis of the cloned fragment
Kumar, P V; Sharma, S K; Rishi, N; Ghosh, D K; Baranwal, V K
Management of viral diseases relies on definite and sensitive detection methods. Citrus yellow mosaic virus (CYMV), a double-stranded DNA virus of the genus Badnavirus, causes yellow mosaic disease in citrus plants. CYMV is transmitted through budwood and requires a robust and simplified indexing protocol for budwood certification programmes. The present study reports the development and standardization of an isothermal recombinase polymerase amplification (RPA) assay as a sensitive, rapid, easy, and cost-effective method for detection and diagnosis of CYMV. Two different oligonucleotide primer sets were designed from the ORF III (coding for polyprotein) and ORF II (coding for virion-associated protein) regions of CYMV to perform amplification assays. Comparative evaluation of RPA, PCR and immuno-capture recombinase polymerase amplification (IC-RPA) based assays was done using purified DNA and plant crude sap. CYMV infection was efficiently detected from the crude sap in RPA and IC-RPA assays. The primer set used in RPA was specific and did not show any cross-amplification with banana streak MY virus (BSMYV), another Badnavirus species. The results from the present study indicated that the RPA assay can be used easily in routine indexing of citrus planting material. To the best of our knowledge, this is the first report on the development of a rapid and simplified isothermal detection assay for CYMV, which can be utilized as an effective technique in quarantine and budwood certification processes.
Leber Hereditary Optic Neuropathy: Exemplar of an mtDNA Disease.
Wallace, Douglas C; Lott, Marie T
2017-01-01
The report in 1988 that Leber Hereditary Optic Neuropathy (LHON) was the product of mitochondrial DNA (mtDNA) mutations provided the first demonstration of the clinical relevance of inherited mtDNA variation. LHON studies demonstrated the medical importance of several features of mtDNA: its coding for the most important energy genes, its maternal inheritance, its high mutation rate, its presence in hundreds to thousands of copies per cell, its quantitative segregation of biallelic genotypes during both mitosis and meiosis, its preferential effect on the most energetic tissues including the eye and brain, its wide range of functional polymorphisms that predispose to common diseases, and its accumulation of mutations within somatic tissues providing the aging clock. These features of mtDNA genetics, in combination with the genetics of the 1-2000 nuclear DNA (nDNA) coded mitochondrial genes, are not only explaining the genetics of LHON but also providing a model for understanding the complexity of many common diseases. With the maturation of LHON biology and genetics, novel animal models for complex disease have been developed and new therapeutic targets and strategies envisioned, both pharmacological and genetic. Multiple somatic gene therapy approaches are being developed for LHON which are applicable to other mtDNA diseases. Moreover, the unique cytoplasmic genetics of the mtDNA has permitted the first successful human germline gene therapy via spindle nDNA transfer from mtDNA mutant oocytes to enucleated normal mtDNA oocytes. Such LHON lessons are actively being applied to common ophthalmological diseases like glaucoma and neurological diseases like Parkinsonism.
Screening for Protein-DNA Interactions by Automatable DNA-Protein Interaction ELISA
Schüssler, Axel; Kolukisaoglu, H. Üner; Koch, Grit; Wallmeroth, Niklas; Hecker, Andreas; Thurow, Kerstin; Zell, Andreas; Harter, Klaus; Wanke, Dierk
2013-01-01
DNA-binding proteins (DBPs), such as transcription factors, constitute about 10% of the protein-coding genes in eukaryotic genomes and play pivotal roles in the regulation of chromatin structure and gene expression by binding to short stretches of DNA. Despite their number and importance, only for a minor portion of DBPs the binding sequence had been disclosed. Methods that allow the de novo identification of DNA-binding motifs of known DBPs, such as protein binding microarray technology or SELEX, are not yet suited for high-throughput and automation. To close this gap, we report an automatable DNA-protein-interaction (DPI)-ELISA screen of an optimized double-stranded DNA (dsDNA) probe library that allows the high-throughput identification of hexanucleotide DNA-binding motifs. In contrast to other methods, this DPI-ELISA screen can be performed manually or with standard laboratory automation. Furthermore, output evaluation does not require extensive computational analysis to derive a binding consensus. We could show that the DPI-ELISA screen disclosed the full spectrum of binding preferences for a given DBP. As an example, AtWRKY11 was used to demonstrate that the automated DPI-ELISA screen revealed the entire range of in vitro binding preferences. In addition, protein extracts of AtbZIP63 and the DNA-binding domain of AtWRKY33 were analyzed, which led to a refinement of their known DNA-binding consensi. Finally, we performed a DPI-ELISA screen to disclose the DNA-binding consensus of a yet uncharacterized putative DBP, AtTIFY1. A palindromic TGATCA-consensus was uncovered and we could show that the GATC-core is compulsory for AtTIFY1 binding. This specific interaction between AtTIFY1 and its DNA-binding motif was confirmed by in vivo plant one-hybrid assays in protoplasts. 
Thus, the value and applicability of the DPI-ELISA screen for de novo binding site identification of DBPs, also under automatized conditions, is a promising approach for a deeper understanding of gene regulation in any organism of choice. PMID:24146751
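The palindromic nature of the TGATCA consensus and its GATC core reported above can be checked with a short reverse-complement test. The sketch also enumerates the full hexanucleotide space for scale; this enumeration is an illustration only and does not reproduce the optimized dsDNA probe library used in the study.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """A motif is palindromic if it reads the same on both strands."""
    return seq == reverse_complement(seq)


# All possible hexanucleotides and the subset that is palindromic.
hexamers = ["".join(p) for p in product("ACGT", repeat=6)]
palindromes = [h for h in hexamers if is_palindromic(h)]

print(is_palindromic("TGATCA"), is_palindromic("GATC"))
print(len(hexamers), len(palindromes))
```

A palindromic hexamer is fully determined by its first three bases, so only a small fraction of the hexamer space has this property; such motifs present the same sequence on both strands of a double-stranded probe.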
Genes and Pathways Involved in Adult Onset Disorders Featuring Muscle Mitochondrial DNA Instability
Ahmed, Naghia; Ronchi, Dario; Comi, Giacomo Pietro
2015-01-01
Replication and maintenance of mtDNA entirely relies on a set of proteins encoded by the nuclear genome, which include members of the core replicative machinery, proteins involved in the homeostasis of mitochondrial dNTPs pools or deputed to the control of mitochondrial dynamics and morphology. Mutations in their coding genes have been observed in familial and sporadic forms of pediatric and adult-onset clinical phenotypes featuring mtDNA instability. The list of defects involved in these disorders has recently expanded, including mutations in the exo-/endo-nuclease flap-processing proteins MGME1 and DNA2, supporting the notion that an enzymatic DNA repair system actively takes place in mitochondria. The results obtained in the last few years acknowledge the contribution of next-generation sequencing methods in the identification of new disease loci in small groups of patients and even single probands. Although heterogeneous, these genes can be conveniently classified according to the pathway to which they belong. The definition of the molecular and biochemical features of these pathways might be helpful for fundamental knowledge of these disorders, to accelerate genetic diagnosis of patients and the development of rational therapies. In this review, we discuss the molecular findings disclosed in adult patients with muscle pathology hallmarked by mtDNA instability. PMID:26251896
Ramirez, Lisa Marie S; He, Muhan; Mailloux, Shay; George, Justin; Wang, Jun
2016-06-01
Microparticles carrying quick response (QR) barcodes are fabricated by J. Wang and co-workers on page 3259, using a massive coding of dissociated elements (MiCODE) technology. Each microparticle can bear a special custom-designed QR code that enables encryption or tagging with unlimited multiplexity, and the QR code can be easily read by cellphone applications. The utility of MiCODE particles in multiplexed DNA detection and microtagging for anti-counterfeiting is explored. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Réfega, Susana; Girard-Misguich, Fabienne; Bourdieu, Christiane; Péry, Pierre; Labbé, Marie
2003-04-02
Specific antibodies were produced ex vivo from intestinal culture of Eimeria tenella infected chickens. The specificity of these intestinal antibodies was tested against different parasite stages. These antibodies were used to immunoscreen first generation schizont and sporozoite cDNA libraries permitting the identification of new E. tenella antigens. We obtained a total of 119 cDNA clones which were subjected to sequence analysis. The sequences coding for the proteins inducing local immune responses were compared with nucleotide or protein databases and with expressed sequence tags (ESTs) databases. We identified new Eimeria genes coding for heat shock proteins, a ribosomal protein, a pyruvate kinase and a pyridoxine kinase. Specific features of other sequences are discussed.
Balintová, Jana; Plucnara, Medard; Vidláková, Pavlína; Pohl, Radek; Havran, Luděk; Fojta, Miroslav; Hocek, Michal
2013-09-16
Benzofurazane has been attached to nucleosides and dNTPs, either directly or through an acetylene linker, as a new redox label for electrochemical analysis of nucleotide sequences. Primer extension incorporation of the benzofurazane-modified dNTPs by polymerases has been developed for the construction of labeled oligonucleotide probes. In combination with nitrophenyl and aminophenyl labels, we have successfully developed a three-potential coding of DNA bases and have explored the relevant electrochemical potentials. The combination of benzofurazane and nitrophenyl reducible labels has proved to be excellent for ratiometric analysis of nucleotide sequences and is suitable for bioanalytical applications. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
28 CFR Appendix A to Part 812 - Qualifying District of Columbia Code Offenses
Code of Federal Regulations, 2011 CFR
2011-07-01
... FOR THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION Pt. 812, App. A Appendix A to Part... Columbia, the DNA Sample Collection Act of 2001 identifies the criminal offenses listed in Table 1 of this appendix as “qualifying District of Columbia offenses” for the purposes of the DNA Analysis Backlog...
28 CFR Appendix A to Part 812 - Qualifying District of Columbia Code Offenses
Code of Federal Regulations, 2010 CFR
2010-07-01
... FOR THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION Pt. 812, App. A Appendix A to Part... Columbia, the DNA Sample Collection Act of 2001 identifies the criminal offenses listed in Table 1 of this appendix as “qualifying District of Columbia offenses” for the purposes of the DNA Analysis Backlog...
28 CFR Appendix A to Part 812 - Qualifying District of Columbia Code Offenses
Code of Federal Regulations, 2012 CFR
2012-07-01
... FOR THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION Pt. 812, App. A Appendix A to Part... Columbia, the DNA Sample Collection Act of 2001 identifies the criminal offenses listed in Table 1 of this appendix as “qualifying District of Columbia offenses” for the purposes of the DNA Analysis Backlog...
28 CFR Appendix A to Part 812 - Qualifying District of Columbia Code Offenses
Code of Federal Regulations, 2014 CFR
2014-07-01
... FOR THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION Pt. 812, App. A Appendix A to Part... Columbia, the DNA Sample Collection Act of 2001 identifies the criminal offenses listed in Table 1 of this appendix as “qualifying District of Columbia offenses” for the purposes of the DNA Analysis Backlog...
28 CFR Appendix A to Part 812 - Qualifying District of Columbia Code Offenses
Code of Federal Regulations, 2013 CFR
2013-07-01
... FOR THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION Pt. 812, App. A Appendix A to Part... Columbia, the DNA Sample Collection Act of 2001 identifies the criminal offenses listed in Table 1 of this appendix as “qualifying District of Columbia offenses” for the purposes of the DNA Analysis Backlog...
Castro-Chavez, Fernando
2014-01-01
Objective The objective of this article is to demonstrate that the genetic code can be studied and represented in a 3-D Sphered Cube for bioinformatics and for education by using the graphical help of the ancient “Book of Changes” or I Ching for the comparison, pair by pair, of the three basic characteristics of nucleotides: H-bonds, molecular structure, and their tautomerism. Methods The source of natural biodiversity is the high plasticity of the genetic code, analyzable with a reverse engineering of its 2-D and 3-D representations (here illustrated), but also through the classical 64-hexagrams of the ancient I Ching, as if they were the 64-codons or words of the genetic code. Results In this article, the four elements of the Yin/Yang were found by correlating the 3×2=6 sets of Cartesian comparisons of the mentioned properties of nucleic acids, to the directionality of their resulting blocks of codons grouped according to their resulting amino acids and/or functions, integrating a 384-codon Sphered Cube whose function is illustrated by comparing six brain peptides and a promoter of osteoblasts from Humans versus Neanderthal, as well as to Negadi’s work on the importance of the number 384 within the genetic code. 
Conclusions Starting with the codon/anticodon correlation of Nirenberg, published in full here for the first time, and by studying the genetic code and its 3-D display, the buffers of reiteration within codons codifying for the same amino acid displayed the two long (binary number one) and older Yin/Yang arrows that travel in opposite directions, mimicking the parental DNA strands, while annealing to the two younger and broken (binary number zero) Yin/Yang arrows, mimicking the new DNA strands; the graphic analysis of the genetic code and its plasticity was helpful to compare compatible sequences (human compatible to human versus Neanderthal compatible to Neanderthal), while further exploring the wondrous biodiversity of nature for educational purposes. PMID:25340175
Arduino-based automation of a DNA extraction system.
Kim, Kyung-Won; Lee, Mi-So; Ryu, Mun-Ho; Kim, Jong-Won
2015-01-01
There have been many studies to detect infectious diseases with the molecular genetic method. This study presents an automation process for a DNA extraction system based on microfluidics and magnetic bead, which is part of a portable molecular genetic test system. This DNA extraction system consists of a cartridge with chambers, syringes, four linear stepper actuators, and a rotary stepper actuator. The actuators provide a sequence of steps in the DNA extraction process, such as transporting, mixing, and washing for the gene specimen, magnetic bead, and reagent solutions. The proposed automation system consists of a PC-based host application and an Arduino-based controller. The host application compiles a G code sequence file and interfaces with the controller to execute the compiled sequence. The controller executes stepper motor axis motion, time delay, and input-output manipulation. It drives the stepper motor with an open library, which provides a smooth linear acceleration profile. The controller also provides a homing sequence to establish the motor's reference position, and hard limit checking to prevent any over-travelling. The proposed system was implemented and its functionality was investigated, especially regarding positioning accuracy and velocity profile.
Wildman, Derek E.; Uddin, Monica; Liu, Guozhen; Grossman, Lawrence I.; Goodman, Morris
2003-01-01
What do functionally important DNA sites, those scrutinized and shaped by natural selection, tell us about the place of humans in evolution? Here we compare ≈90 kb of coding DNA nucleotide sequence from 97 human genes to their sequenced chimpanzee counterparts and to available sequenced gorilla, orangutan, and Old World monkey counterparts, and, on a more limited basis, to mouse. The nonsynonymous changes (functionally important), like synonymous changes (functionally much less important), show chimpanzees and humans to be most closely related, sharing 99.4% identity at nonsynonymous sites and 98.4% at synonymous sites. On a time scale, the coding DNA divergencies separate the human–chimpanzee clade from the gorilla clade at between 6 and 7 million years ago and place the most recent common ancestor of humans and chimpanzees at between 5 and 6 million years ago. The evolutionary rate of coding DNA in the catarrhine clade (Old World monkey and ape, including human) is much slower than in the lineage to mouse. Among the genes examined, 30 show evidence of positive selection during descent of catarrhines. Nonsynonymous substitutions by themselves, in this subset of positively selected genes, group humans and chimpanzees closest to each other and have chimpanzees diverge about as much from the common human–chimpanzee ancestor as humans do. This functional DNA evidence supports two previously offered taxonomic proposals: family Hominidae should include all extant apes; and genus Homo should include three extant species and two subgenera, Homo (Homo) sapiens (humankind), Homo (Pan) troglodytes (common chimpanzee), and Homo (Pan) paniscus (bonobo chimpanzee). PMID:12766228
Wildman, Derek E; Uddin, Monica; Liu, Guozhen; Grossman, Lawrence I; Goodman, Morris
2003-06-10
What do functionally important DNA sites, those scrutinized and shaped by natural selection, tell us about the place of humans in evolution? Here we compare approximately 90 kb of coding DNA nucleotide sequence from 97 human genes to their sequenced chimpanzee counterparts and to available sequenced gorilla, orangutan, and Old World monkey counterparts, and, on a more limited basis, to mouse. The nonsynonymous changes (functionally important), like synonymous changes (functionally much less important), show chimpanzees and humans to be most closely related, sharing 99.4% identity at nonsynonymous sites and 98.4% at synonymous sites. On a time scale, the coding DNA divergencies separate the human-chimpanzee clade from the gorilla clade at between 6 and 7 million years ago and place the most recent common ancestor of humans and chimpanzees at between 5 and 6 million years ago. The evolutionary rate of coding DNA in the catarrhine clade (Old World monkey and ape, including human) is much slower than in the lineage to mouse. Among the genes examined, 30 show evidence of positive selection during descent of catarrhines. Nonsynonymous substitutions by themselves, in this subset of positively selected genes, group humans and chimpanzees closest to each other and have chimpanzees diverge about as much from the common human-chimpanzee ancestor as humans do. This functional DNA evidence supports two previously offered taxonomic proposals: family Hominidae should include all extant apes; and genus Homo should include three extant species and two subgenera, Homo (Homo) sapiens (humankind), Homo (Pan) troglodytes (common chimpanzee), and Homo (Pan) paniscus (bonobo chimpanzee).
Changes in mitochondrial genetic codes as phylogenetic characters: Two examples from the flatworms
Telford, Maximilian J.; Herniou, Elisabeth A.; Russell, Robert B.; Littlewood, D. Timothy J.
2000-01-01
Shared molecular genetic characteristics other than DNA and protein sequences can provide excellent sources of phylogenetic information, particularly if they are complex and rare and are consequently unlikely to have arisen by chance convergence. We have used two such characters, arising from changes in mitochondrial genetic code, to define a clade within the Platyhelminthes (flatworms), the Rhabditophora. We have sampled 10 distinct classes within the Rhabditophora and find that all have the codon AAA coding for the amino acid Asn rather than the usual Lys and AUA for Ile rather than the usual Met. We find no evidence to support claims that the codon UAA codes for Tyr in the Platyhelminthes rather than the standard stop codon. The Rhabditophora are a very diverse group comprising the majority of the free-living turbellarian taxa and the parasitic Neodermata. In contrast, three other classes of turbellarian flatworm, the Acoela, Nemertodermatida, and Catenulida, have the standard invertebrate assignments for these codons and so are convincingly excluded from the rhabditophoran clade. We have developed a rapid computerized method for analyzing genetic codes and demonstrate the wide phylogenetic distribution of the standard invertebrate code as well as confirming already known metazoan deviations from it (ascidian, vertebrate, echinoderm/hemichordate). PMID:11027335
2010-01-01
Research in plant molecular biology involves DNA purification on a daily basis. Although different commercial kits enable convenient extraction of high-quality DNA from E. coli cells, PCR and agarose gel samples as well as plant tissues, each kit is designed for a particular type of DNA extraction work, and the cost of purchasing these kits over a long run can be considerable. Furthermore, a simple method for the isolation of binary plasmid from Agrobacterium tumefaciens cells with satisfactory yield is lacking. Here we describe an easy protocol using homemade silicon dioxide matrix and seven simple solutions for DNA extraction from E. coli and A. tumefaciens cells, PCR and restriction digests, agarose gel slices, and plant tissues. Compared with the commercial kits, this protocol allows rapid DNA purification from diverse sources with comparable yield and purity at negligible cost. Following this protocol, we have demonstrated: (1) DNA fragments as small as a MYC-epitope tag coding sequence can be successfully recovered from an agarose gel slice; (2) Miniprep DNA from E. coli can be eluted with as little as 5 μl water, leading to high DNA concentrations (>1 μg/μl) for efficient biolistic bombardment of Arabidopsis seedlings, polyethylene glycol (PEG)-mediated Arabidopsis protoplast transfection and maize protoplast electroporation; (3) Binary plasmid DNA prepared from A. tumefaciens is suitable for verification by restriction analysis without the need for large scale propagation; (4) High-quality genomic DNA is readily isolated from several plant species including Arabidopsis, tobacco and maize. Thus, the silicon dioxide matrix-based DNA purification protocol offers an easy, efficient and economical way to extract DNA for various purposes in plant research. PMID:20180960
Divergent genome evolution caused by regional variation in DNA gain and loss between human and mouse
Kortschak, R. Daniel
2018-01-01
The forces driving the accumulation and removal of non-coding DNA and ultimately the evolution of genome size in complex organisms are intimately linked to genome structure and organisation. Our analysis provides a novel method for capturing the regional variation of lineage-specific DNA gain and loss events in their respective genomic contexts. To further understand this connection we used comparative genomics to identify genome-wide individual DNA gain and loss events in the human and mouse genomes. Focusing on the distribution of DNA gains and losses, relationships to important structural features and potential impact on biological processes, we found that in autosomes, DNA gains and losses both followed separate lineage-specific accumulation patterns. However, in both species chromosome X was particularly enriched for DNA gain, consistent with its high L1 retrotransposon content required for X inactivation. We found that DNA loss was associated with gene-rich open chromatin regions and DNA gain events with gene-poor closed chromatin regions. Additionally, we found that DNA loss events tended to be smaller than DNA gain events suggesting that they were able to accumulate in gene-rich open chromatin regions due to their reduced capacity to interrupt gene regulatory architecture. GO term enrichment showed that mouse loss hotspots were strongly enriched for terms related to developmental processes. However, these genes were also located in regions with a high density of conserved elements, suggesting that despite high levels of DNA loss, gene regulatory architecture remained conserved. This is consistent with a model in which DNA gain and loss results in turnover or “churning” in regulatory element dense regions of open chromatin, where interruption of regulatory elements is selected against. PMID:29677183
Neugebauer, Tomasz; Bordeleau, Eric; Burrus, Vincent; Brzezinski, Ryszard
2015-01-01
Data visualization methods are necessary during the exploration and analysis activities of an increasingly data-intensive scientific process. There are few existing visualization methods for raw nucleotide sequences of a whole genome or chromosome. Software for data visualization should allow the researchers to create accessible data visualization interfaces that can be exported and shared with others on the web. Herein, novel software developed for generating DNA data visualization interfaces is described. The software converts DNA data sets into images that are further processed as multi-scale images to be accessed through a web-based interface that supports zooming, panning and sequence fragment selection. Nucleotide composition frequencies and GC skew of a selected sequence segment can be obtained through the interface. The software was used to generate DNA data visualization of human and bacterial chromosomes. Examples of visually detectable features such as short and long direct repeats, long terminal repeats, mobile genetic elements, heterochromatic segments in microbial and human chromosomes, are presented. The software and its source code are available for download and further development. The visualization interfaces generated with the software allow for the immediate identification and observation of several types of sequence patterns in genomes of various sizes and origins. The visualization interfaces generated with the software are readily accessible through a web browser. This software is a useful research and teaching tool for genetics and structural genomics.
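As a rough illustration of the per-segment statistics mentioned above, the sketch below computes nucleotide composition frequencies and GC skew, taken here as (G - C) / (G + C), for a selected fragment. It is not the described software; the function names are hypothetical and the example fragment is made up.

```python
from collections import Counter

def composition(seq):
    """Return the frequency of each of A, C, G, T among the unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"} if total else {}

def gc_skew(seq):
    """GC skew (G - C) / (G + C); defined as 0.0 when the segment has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0

fragment = "GGGCATAT"  # made-up fragment standing in for a user selection
print(composition(fragment))
print(gc_skew(fragment))
```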
Bubbles and denaturation in DNA
NASA Astrophysics Data System (ADS)
van Erp, T. S.; Cuesta-López, S.; Peyrard, M.
2006-08-01
The local opening of DNA is an intriguing phenomenon from a statistical-physics point of view, but is also essential for its biological function. For instance, the transcription and replication of our genetic code cannot take place without the unwinding of the DNA double helix. Although these biological processes are driven by proteins, there might well be a relation between these biological openings and the spontaneous bubble formation due to thermal fluctuations. Mesoscopic models, like the Peyrard-Bishop-Dauxois (PBD) model, have fairly accurately reproduced some experimental denaturation curves and the sharp phase transition in the thermodynamic limit. It is, hence, tempting to see whether these models could be used to predict the biological activity of DNA. In a previous study, we introduced a method that yields very accurate results on this subject, which showed that some previous claims in this direction, based on molecular-dynamics studies, were premature. This could either imply that the present PBD model should be improved or that biological activity can only be predicted in a more complex framework that involves interactions with proteins and superhelical stresses. In this article, we give a detailed description of the statistical method introduced before. Moreover, for several DNA sequences, we give a thorough analysis of the bubble statistics as a function of position and bubble size and the so-called l-denaturation curves that can be measured experimentally. These show that some important experimental observations are missing in the present model. We discuss how the present model could be improved.
Detection of Merkel Cell Polyomavirus DNA in Serum Samples of Healthy Blood Donors
Mazzoni, Elisa; Rotondo, John C.; Marracino, Luisa; Selvatici, Rita; Bononi, Ilaria; Torreggiani, Elena; Touzé, Antoine; Martini, Fernanda; Tognon, Mauro G.
2017-01-01
Merkel cell polyomavirus (MCPyV) has been detected in 80% of Merkel cell carcinomas (MCC). In the host, the MCPyV reservoir remains elusive. MCPyV DNA sequences were revealed in blood donor buffy coats. In this study, MCPyV DNA sequences were investigated in the sera (n = 190) of healthy blood donors. Two MCPyV DNA sequences, coding for the viral oncoprotein large T antigen (LT), were investigated using polymerase chain reaction (PCR) methods and DNA sequencing. Circulating MCPyV sequences were detected in sera with a prevalence of 2.6% (5/190), at low-DNA viral load, which is in the range of 1–4 and 1–5 copies/μl by real-time PCR and droplet digital PCR, respectively. DNA sequencing carried out in the five MCPyV-positive samples indicated that the two MCPyV LT sequences which were analyzed belong to the MKL-1 strain. Circulating MCPyV LT sequences are present in blood donor sera. MCPyV-positive samples from blood donors could represent a potential vehicle for MCPyV infection in receivers, whereas an increase in viral load may occur with multiple blood transfusions. In certain patient conditions, such as immune-depression/suppression, additional disease or old age, transfusion of MCPyV-positive samples could be an additional risk factor for MCC onset. PMID:29238698
Greenberg, Jay R.; Perry, Robert P.
1971-01-01
The relationship of the DNA sequences from which polyribosomal messenger RNA (mRNA) and heterogeneous nuclear RNA (NRNA) of mouse L cells are transcribed was investigated by means of hybridization kinetics and thermal denaturation of the hybrids. Hybridization was performed in formamide solutions at DNA excess. Under these conditions most of the hybridizing mRNA and NRNA react at values of D0t (DNA concentration multiplied by time) expected for RNA transcribed from the nonrepeated or rarely repeated fraction of the genome. However, a fraction of both mRNA and NRNA hybridize at values of D0t about 10,000 times lower, and therefore must be transcribed from highly redundant DNA sequences. The fraction of NRNA hybridizing to highly repeated sequences is about 1.7 times greater than the corresponding fraction of mRNA. The hybrids formed by the rapidly reacting fractions of both NRNA and mRNA melt over a narrow temperature range with a midpoint about 11°C below that of native L cell DNA. This indicates that these hybrids consist of partially complementary sequences with approximately 11% mismatching of bases. Hybrids formed by the slowly reacting fraction of NRNA melt within 4°–6°C of native DNA, indicating very little, if any, mismatching of bases. Hybrids of the slowly reacting components of mRNA, formed under conditions of sufficiently low RNA input, have a high thermal stability, similar to that observed for hybrids of the slowly reacting NRNA component. However, when higher inputs of mRNA are used, hybrids are formed which have a strikingly lower thermal stability. This observation can be explained by assuming that there is sufficient similarity among the relatively rare DNA sequences coding for mRNA so that under hybridization conditions, in which these DNA sequences are not truly in excess, reversible hybrids exhibiting a considerable amount of mispairing are formed. The fact that a comparable phenomenon has not been observed for NRNA may mean that there is less similarity among the relatively rare DNA sequences coding for NRNA than there is among the rare sequences coding for mRNA. PMID:4999767
Nicolas, Laura; Cols, Montserrat; Choi, Jee Eun; Chaudhuri, Jayanta; Vuong, Bao
2018-01-01
Adaptive immune responses require the generation of a diverse repertoire of immunoglobulins (Igs) that can recognize and neutralize a seemingly infinite number of antigens. V(D)J recombination creates the primary Ig repertoire, which subsequently is modified by somatic hypermutation (SHM) and class switch recombination (CSR). SHM promotes Ig affinity maturation whereas CSR alters the effector function of the Ig. Both SHM and CSR require activation-induced cytidine deaminase (AID) to produce dU:dG mismatches in the Ig locus that are transformed into untemplated mutations in variable coding segments during SHM or DNA double-strand breaks (DSBs) in switch regions during CSR. Within the Ig locus, DNA repair pathways are diverted from their canonical role in maintaining genomic integrity to permit AID-directed mutation and deletion of gene coding segments. Recently identified proteins, genes, and regulatory networks have provided new insights into the temporally and spatially coordinated molecular interactions that control the formation and repair of DSBs within the Ig locus. Unravelling the genetic program that allows B cells to selectively alter the Ig coding regions while protecting non-Ig genes from DNA damage advances our understanding of the molecular processes that maintain genomic integrity as well as humoral immunity. PMID:29744038
Xiao, P; Niu, L L; Zhao, Q J; Chen, X Y; Wang, L J; Li, L; Zhang, H P; Guo, J Z; Xu, H Y; Zhong, T
2017-11-16
The origins and phylogeny of different sheep breeds have been widely studied using polymorphisms within the mitochondrial hypervariable region. However, little is known about the mitochondrial DNA (mtDNA) content and phylogeny based on mtDNA protein-coding genes. In this study, we assessed the phylogeny and copy number of the mtDNA in eight indigenous (population size, n=184) and three introduced (n=66) sheep breeds in China based on five mitochondrial coding genes (COX1, COX2, ATP8, ATP6 and COX3). The mean haplotype and nucleotide diversities were 0.944 and 0.00322, respectively. We identified a correlation between the lineage distribution and the genetic distance, whereby Valley-type Tibetan sheep had a closer genetic relationship with introduced breeds (Dorper, Poll Dorset and Suffolk) than with other indigenous breeds. Similarly, the median-joining profile of haplotypes revealed the distribution of clusters according to genetic differences. Moreover, copy number analysis based on the five mitochondrial coding genes was affected by the genetic distance combined with genetic phylogeny; we also identified obvious non-synonymous mutations in ATP6 between the different levels of copy number expression. These results imply that differences in mitogenomic compositions resulting from geographical separation lead to differences in mitochondrial function.
Clerc-Blain, Jessica L E; Starr, Julian R; Bull, Roger D; Saarela, Jeffery M
2010-01-01
Previous research on barcoding sedges (Carex) suggested that basic searches within a global barcoding database would probably not resolve more than 60% of the world's some 2000 species. In this study, we take an alternative approach and explore the performance of plant DNA barcoding in the Carex lineage from an explicitly regional perspective. We characterize the utility of a subset of the proposed protein-coding and noncoding plastid barcoding regions (matK, rpoB, rpoC1, rbcL, atpF-atpH, psbK-psbI) for distinguishing species of Carex and Kobresia in the Canadian Arctic Archipelago, a clearly defined eco-geographical region representing 1% of the Earth's landmass. Our results show that matK resolves the greatest number of species of any single-locus (95%), and when combined in a two-locus barcode, it provides 100% species resolution in all but one combination (matK + atpFH) during unweighted pair-group method with arithmetic mean averages (UPGMA) analyses. Noncoding regions were equally or more variable than matK, but as single markers they resolve substantially fewer taxa than matK alone. When difficulties with sequencing and alignment due to microstructural variation in noncoding regions are also considered, our results support other studies in suggesting that protein-coding regions are more practical as barcoding markers. Plastid DNA barcodes are an effective identification tool for species of Carex and Kobresia in the Canadian Arctic Archipelago, a region where the number of co-existing closely related species is limited. We suggest that if a regional approach to plant DNA barcoding was applied on a global scale, it could provide a solution to the generally poor species resolution seen in previous barcoding studies. © 2009 Blackwell Publishing Ltd.
Harnessing epigenome modifications for better crops
USDA-ARS's Scientific Manuscript database
Chemical DNA modifications such as methylation influence translation of the DNA code to specific genetic outcomes. While such modifications can be heritable, others are transient, and their overall contribution to plant genetic diversity remains intriguing but uncertain. The focus of this article is...
Sun, Jiajie; Gao, Yuan; Liu, Dong; Ma, Wei; Xue, Jing; Zhang, Chunlei; Lan, Xianyong; Lei, Chuzhao; Chen, Hong
2012-06-01
The insulin-induced gene 1 (INSIG1) gene encodes a protein that blocks proteolytic activation of sterol regulatory element binding proteins, which are transcription factors that activate genes that regulate cholesterol, fatty acid, and glucose metabolism. However, similar research for the bovine INSIG1 gene is lacking. Therefore, in this study, polymorphisms of the bovine INSIG1 gene were detected in 643 individuals from four cattle breeds by DNA pooling, forced PCR-RFLP, PCR-SSCP, and DNA sequencing methods. Only 10 novel SNPs were identified, which included four mutations in the coding region and the others in the introns. In Nanyang individuals, seven common haplotypes were identified based on four coding region SNPs. The haplotype GACT, with a frequency of 75.4%, was the most prevalent haplotype, and the SNPs formed two linkage disequilibrium blocks with strong multi-allelic D' (D' = 1). Additionally, association analysis between mutations of the bovine INSIG1 gene and growth traits in Nanyang cattle at 6, 12, 18, and 24 months old was performed, and the results indicated that the polymorphisms were not significantly associated with body mass.
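For readers unfamiliar with the D' statistic cited above, the following sketch computes the simpler pairwise D' between two biallelic SNPs from phased haplotypes. It is not the multi-allelic D' used in the study, the function name `d_prime` is hypothetical, and the haplotype strings are invented toy data rather than the Nanyang results.

```python
def d_prime(haplotypes, i, j):
    """Pairwise |D'| between biallelic sites i and j of phased haplotype strings."""
    n = len(haplotypes)
    a, b = haplotypes[0][i], haplotypes[0][j]  # reference alleles (arbitrary choice)
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0

# Toy data: only two haplotypes are present, so the two sites are in complete LD.
linked = ["GA", "GA", "GA", "AT"]
# Toy data: all four haplotypes occur equally often, so the sites are independent.
unlinked = ["GA", "GT", "AA", "AT"]
print(d_prime(linked, 0, 1), d_prime(unlinked, 0, 1))
```

A value of 1 means that at least one of the four possible two-site haplotypes is absent from the sample.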
DNA Mapping Made Simple: An Intellectual Activity about the Genetic Modification of Organisms
ERIC Educational Resources Information Center
Marques, Miguel; Arrabaca, Joao; Chagas, Isabel
2004-01-01
Since the discovery of the DNA double helix (in 1953 by Watson and Crick), technologies have been developed that allow scientists to manipulate the genome of bacteria to produce human hormones, as well as the genome of crop plants to achieve high yield and enhanced flavor. The universality of the genetic code has allowed DNA isolated from a…
Vlahovicek, K; Munteanu, M G; Pongor, S
1999-01-01
Bending is a local conformational micropolymorphism of DNA in which the original B-DNA structure is only distorted but not extensively modified. Bending can be predicted by simple static geometry models as well as by a recently developed elastic model that incorporates sequence-dependent anisotropic bendability (SDAB). The SDAB model qualitatively explains phenomena including affinity of protein binding, kinking, as well as sequence-dependent vibrational properties of DNA. The vibrational properties of DNA segments can be studied by finite element analysis of a model subjected to an initial bending moment. The frequency spectrum is obtained by applying Fourier analysis to the displacement values in the time domain. This analysis shows that the spectrum of the bending vibrations depends quite sensitively on the sequence; for example, the spectrum of a curved sequence is characteristically different from the spectrum of straight sequence motifs of identical basepair composition. Curvature distributions are genome-specific, and pronounced differences are found between protein-coding and regulatory regions; that is, sites of extreme curvature and/or bendability are less frequent in protein-coding regions. A WWW server is set up for the prediction of curvature and generation of 3D models from DNA sequences (http://www.icgeb.trieste.it/dna).
Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin
ERIC Educational Resources Information Center
Offner, Susan
2010-01-01
The beta hemoglobin protein is identical in humans and chimpanzees. In this tutorial, students see that even though the proteins are identical, the genes that code for them are not. There are many more differences in the introns than in the exons, which indicates that coding regions of DNA are more highly conserved than non-coding regions.
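The exon-versus-intron comparison in this tutorial could be sketched as follows, assuming the two gene sequences have already been aligned to equal length without gaps. The function name, region coordinates and sequences are all hypothetical toy values, not the actual human or chimpanzee beta hemoglobin genes.

```python
def difference_rates(seq_a, seq_b, regions):
    """Fraction of mismatched positions per (name, start, end) region, end exclusive."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    rates = {}
    for name, start, end in regions:
        mismatches = sum(x != y for x, y in zip(seq_a[start:end], seq_b[start:end]))
        rates[name] = mismatches / (end - start)
    return rates

# Invented 16-base toy alignment: first 8 bases "exon", last 8 bases "intron".
toy_a = "ATGGTGCATCGTAAGT"
toy_b = "ATGGTGCATCGTCAGA"
toy_regions = [("exon", 0, 8), ("intron", 8, 16)]
print(difference_rates(toy_a, toy_b, toy_regions))
```

With real data, the region coordinates would come from the exon annotations in the sequence records.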
NASA Technical Reports Server (NTRS)
Plante, Ianik; Ponomarev, Artem L.; Wu, Honglu; Blattnig, Steve; George, Kerry
2014-01-01
The formation of DNA double-strand breaks (DSBs) and chromosome aberrations is an important consequence of ionizing radiation. To simulate DNA double-strand breaks and the formation of chromosome aberrations, we have recently merged the codes RITRACKS (Relativistic Ion Tracks) and NASARTI (NASA Radiation Track Image). The program RITRACKS is a stochastic code developed to simulate detailed event-by-event radiation track structure [1]. This code is used to calculate the dose in voxels of 20 nm, in a volume containing simulated chromosomes [2]. The number of tracks in the volume is calculated for each simulation by sampling a Poisson distribution, with the distribution parameter obtained from the irradiation dose, ion type and energy. The program NASARTI generates the chromosomes present in a cell nucleus by random walks of 20 nm, corresponding to the size of the dose voxels [3]. The generated chromosomes are located within domains which may intertwine [4]. Each segment of the random walks corresponds to approx. 2,000 DNA base pairs. NASARTI uses the pre-calculated dose at each voxel to calculate the probability of DNA damage at each random walk segment. Using the location of double-strand breaks, possible rejoining between damaged segments is evaluated. This yields various types of chromosome aberrations, including deletions, inversions, exchanges, etc. By performing the calculations using various types of radiation, it will be possible to obtain relative biological effectiveness (RBE) values for several types of chromosome aberrations.
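Two of the steps described above, sampling the number of tracks from a Poisson distribution and building a chromosome as a random walk of 20 nm segments, can be sketched in a few lines. This is not RITRACKS or NASARTI code. The mean track number is an arbitrary placeholder (the abstract derives it from dose, ion type and energy, which is not reproduced here), the walk is unconstrained rather than confined to a domain, and both function names are hypothetical.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method (small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def random_walk(n_steps, step_nm, rng):
    """3D walk with fixed step length and uniformly random direction per step."""
    x = y = z = 0.0
    path = [(x, y, z)]
    for _ in range(n_steps):
        cos_theta = rng.uniform(-1.0, 1.0)       # uniform direction on the sphere
        phi = rng.uniform(0.0, 2.0 * math.pi)
        sin_theta = math.sqrt(1.0 - cos_theta * cos_theta)
        x += step_nm * sin_theta * math.cos(phi)
        y += step_nm * sin_theta * math.sin(phi)
        z += step_nm * cos_theta
        path.append((x, y, z))
    return path

rng = random.Random(0)
n_tracks = sample_poisson(3.0, rng)   # 3.0 is a placeholder mean, not a real dose
path = random_walk(5, 20.0, rng)      # five 20 nm segments
print(n_tracks, len(path))
```

In the abstract each 20 nm segment stands for roughly 2,000 base pairs, so a walk of n segments would represent about 2,000 n base pairs.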
A genetic investigation of Korean mummies from the Joseon Dynasty.
Kim, Na Young; Lee, Hwan Young; Park, Myung Jin; Yang, Woo Ick; Shin, Kyoung-Jin
2011-01-01
Two Korean mummies (Danwoong-mirra and Yoon-mirra) found in medieval tombs in the central region of the Korean peninsula were genetically investigated by analysis of mitochondrial DNA (mtDNA), Y-chromosomal short tandem repeat (Y-STR) and the ABO gene. Danwoong-mirra is a male child mummy and Yoon-mirra is a pregnant female mummy, dating back about 550 and 450 years, respectively. DNA was extracted from soft tissues or bones. mtDNA, Y-STR and the ABO gene were amplified using a small size amplicon strategy and were analyzed according to the criteria of ancient DNA analysis to ensure that authentic DNA typing results were obtained from these ancient samples. Analysis of mtDNA hypervariable region sequence and coding region single nucleotide polymorphism (SNP) information revealed that Danwoong-mirra and Yoon-mirra belong to the East Asian mtDNA haplogroups D4 and M7c, respectively. The Y-STRs were analyzed in the male child mummy (Danwoong-mirra) using the AmpFlSTR® Yfiler PCR Amplification Kit and an in-house Y-miniplex plus system, and could be characterized in 4 loci with small amplicon size. The analysis of ABO gene SNPs using multiplex single base extension methods revealed that the ABO blood types of Danwoong-mirra and Yoon-mirra are AO01 and AB, respectively. The small size amplicon strategy and the authentication process in the present study will be effectively applicable to future genetic analyses of various forensic and ancient samples.
Method for imaging informational biological molecules on a semiconductor substrate
NASA Technical Reports Server (NTRS)
Coles, L. Stephen (Inventor)
1994-01-01
Imaging biological molecules such as DNA at rates several times faster than conventional imaging techniques is carried out using a patterned silicon wafer having nano-machined grooves which hold individual molecular strands and periodically spaced unique bar codes permitting repeatably locating all images. The strands are coaxed into the grooves preferably using gravity and pulsed electric fields which induce electric charge attraction to the molecular strands in the bottom surfaces of the grooves. Differential imaging removes substrate artifacts.
Do repetitive DNA sequences have biological functions?
NASA Astrophysics Data System (ADS)
John, Maliyakal E.; Knöchel, Walter
1983-05-01
By DNA reassociation kinetics it is known that the eucaryotic genome consists of non-repetitive DNA, middle-repetitive DNA and highly repetitive DNA. Whereas the majority of protein-coding genes is located on non-repetitive DNA, repetitive DNA forms a constitutive part of eucaryotic DNA and its amount in most cases equals or even substantially exceeds that of non-repetitive DNA. During the past years a large body of data on repetitive DNA has accumulated and these have prompted speculations ranging from specific roles in the regulation of gene expression to that of a selfish entity with inconsequential functions. The following article summarizes recent findings on structural, transcriptional and evolutionary aspects and, although by no means being proven, some possible biological functions are discussed.
Mitochondrial DNA haplogroup phylogeny of the dog: Proposal for a cladistic nomenclature.
Fregel, Rosa; Suárez, Nicolás M; Betancor, Eva; González, Ana M; Cabrera, Vicente M; Pestano, José
2015-05-01
Canis lupus familiaris mitochondrial DNA analysis has increased in recent years, not only for the purpose of deciphering dog domestication but also for forensic genetic studies or breed characterization. The resultant accumulation of data has increased the need for a normalized and phylogenetic-based nomenclature like those provided for human maternal lineages. Although a standardized classification has been proposed, haplotype names within clades have been assigned gradually without considering the evolutionary history of dog mtDNA. Moreover, this classification is based only on the D-loop region, proven to be insufficient for phylogenetic purposes due to its high number of recurrent mutations and the lack of relevant information present in the coding region. In this study, we design 1) a refined mtDNA cladistic nomenclature from a phylogenetic tree based on complete sequences, classifying dog maternal lineages into haplogroups defined by specific diagnostic mutations, and 2) a coding region SNP analysis that allows a more accurate classification into haplogroups when combined with D-loop sequencing, thus improving the phylogenetic information obtained in dog mitochondrial DNA studies. Copyright © 2015 Elsevier B.V. All rights reserved.
Buard, Jérôme; Rivals, Eric; Dunoyer de Segonzac, Denis; Garres, Charlotte; Caminade, Pierre; de Massy, Bernard; Boursot, Pierre
2014-01-01
In humans and mice, meiotic recombination events cluster into narrow hotspots whose genomic positions are defined by the PRDM9 protein via its DNA binding domain constituted of an array of zinc fingers (ZnFs). High polymorphism and rapid divergence of the Prdm9 gene ZnF domain appear to involve positive selection at DNA-recognition amino-acid positions, but the nature of the underlying evolutionary pressures remains a puzzle. Here we explore the variability of the Prdm9 ZnF array in wild mice, and uncover a high allelic diversity of both ZnF copy number and identity with the characterization of 113 alleles. We analyze features of the diversity of ZnF identity which is mostly due to non-synonymous changes at codons −1, 3 and 6 of each ZnF, corresponding to amino-acids involved in DNA binding. Using methods adapted to the minisatellite structure of the ZnF array, we infer a phylogenetic tree of these alleles. We find the sister species Mus spicilegus and M. macedonicus as well as the three house mouse (Mus musculus) subspecies to be polyphyletic. However some sublineages have expanded independently in Mus musculus musculus and M. m. domesticus, the latter further showing phylogeographic substructure. Compared to random genomic regions and non-coding minisatellites, none of these patterns appears exceptional. In silico prediction of DNA binding sites for each allele, overlap of their alignments to the genome and relative coverage of the different families of interspersed repeated elements suggest a large diversity between PRDM9 variants with a potential for highly divergent distributions of recombination events in the genome with little correlation to evolutionary distance. By compiling PRDM9 ZnF protein sequences in Primates, Muridae and Equids, we find different diversity patterns among the three amino-acids most critical for the DNA-recognition function, suggesting different diversification timescales. PMID:24454780
Xie, Qing; Shen, Kang-Ning; Hao, Xiuying; Nam, Phan Nhut; Ngoc Hieu, Bui Thi; Chen, Ching-Hung; Zhu, Changqing; Lin, Yen-Chang; Hsiao, Chung-Der
2017-03-01
We decoded the complete chloroplast DNA (cpDNA) sequence of the Tianshan Snow Lotus (Saussurea involucrata), a famous traditional Chinese medicinal plant of the family Asteraceae, by using next-generation sequencing technology. The genome consists of 152 490 bp containing a pair of inverted repeats (IRs) of 25 202 bp, which was separated by a large single-copy region and a small single-copy region of 83 446 bp and 18 639 bp, respectively. The genic regions account for 57.7% of whole cpDNA, and the GC content of the cpDNA was 37.7%. The S. involucrata cpDNA encodes 114 unigenes (82 protein-coding genes, 4 rRNA genes, and 28 tRNA genes). There are eight protein-coding genes (atpF, ndhA, ndhB, rpl2, rpoC1, rps16, clpP, and ycf3) and five tRNA genes (trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) containing introns. A phylogenetic analysis of the 11 complete cpDNA from Asteraceae showed that S. involucrata is closely related to Centaurea diffusa (Diffuse Knapweed). The complete cpDNA of S. involucrata provides essential and important DNA molecular data for further phylogenetic and evolutionary analysis for Asteraceae.
Michel, Christian J.
2017-01-01
In 1996, a set X of 20 trinucleotides was identified in genes of both prokaryotes and eukaryotes which has on average the highest occurrence in reading frame compared to its two shifted frames. Furthermore, this set X has an interesting mathematical property as X is a maximal C3 self-complementary trinucleotide circular code. In 2015, by quantifying the inspection approach used in 1996, the circular code X was confirmed in the genes of bacteria and eukaryotes and was also identified in the genes of plasmids and viruses. The method was based on the preferential occurrence of trinucleotides among the three frames at the gene population level. We extend here this definition at the gene level. This new statistical approach considers all the genes, i.e., of large and small lengths, with the same weight for searching the circular code X. As a consequence, the concept of circular code, in particular the reading frame retrieval, is directly associated to each gene. At the gene level, the circular code X is strengthened in the genes of bacteria, eukaryotes, plasmids, and viruses, and is now also identified in the genes of archaea. The genes of mitochondria and chloroplasts contain a subset of the circular code X. Finally, by studying viral genes, the circular code X was found in DNA genomes, RNA genomes, double-stranded genomes, and single-stranded genomes. PMID:28420220
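The gene-level idea, asking for each gene in which of the three frames a trinucleotide occurs most often, can be illustrated with a simple counting sketch. It is not the statistical method of the paper and does not encode the set X itself. The function names are hypothetical and the example gene is a made-up repeat.

```python
from collections import Counter

def frame_counts(gene):
    """Count trinucleotides in frame 0 (reading frame) and shifted frames 1 and 2."""
    gene = gene.upper()
    return [
        Counter(gene[i:i + 3] for i in range(frame, len(gene) - 2, 3))
        for frame in range(3)
    ]

def preferred_frame(gene, trinucleotide):
    """Return per-frame counts and the single frame with the highest count, if any."""
    per_frame = [counts[trinucleotide] for counts in frame_counts(gene)]
    best = max(per_frame)
    frame = per_frame.index(best) if per_frame.count(best) == 1 else None
    return per_frame, frame

toy_gene = "GAAGAAGAAGAA"  # invented sequence, not a real gene
print(preferred_frame(toy_gene, "GAA"))
```

Because the preference is decided within each gene, long and short genes contribute one decision each, which matches the equal weighting of genes described above.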
Population-specific variation in haplotype composition and heterozygosity at the POLB locus.
Yamtich, Jennifer; Speed, William C; Straka, Eva; Kidd, Judith R; Sweasy, Joann B; Kidd, Kenneth K
2009-05-01
DNA polymerase beta plays a central role in base excision repair (BER), which removes large numbers of endogenous DNA lesions from each cell on a daily basis. Little is currently known about germline polymorphisms within the POLB locus, making it difficult to study the association of variants at this locus with human diseases such as cancer. Yet, approximately thirty percent of human tumor types show variants of DNA polymerase beta. We have assessed the global frequency distributions of coding and common non-coding SNPs in and flanking the POLB gene for a total of 14 sites typed in approximately 2400 individuals from anthropologically defined human populations worldwide. We have found a marked difference between haplotype frequencies in African populations and in non-African populations.
Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E
2013-08-15
Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.
Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi
2016-06-15
Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After sequence reads are obtained, the short reads are mapped to a reference genome, and specific mapping patterns, called read mapping profiles, can be detected; these are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods in correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing, in separating them from degradation patterns, and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work are available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502.
Contact: yasu@bio.keio.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition
NASA Astrophysics Data System (ADS)
Štambuk, Nikola
The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.
Shen, Kang-Ning; Chen, Ching-Hung; Hsiao, Chung-Der
2016-05-01
In this study, the complete mitogenome sequence of the hornlip mullet Plicomugil labiosus (Teleostei: Mugilidae) was determined by next-generation sequencing. The assembled mitogenome, consisting of 16,829 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop, 1057 bp in length, is located between tRNA-Pro and tRNA-Phe. The overall base composition of P. labiosus is 28.0% A, 29.3% C, 15.5% G and 27.2% T. The complete mitogenome may provide essential DNA molecular data for further population, phylogenetic and evolutionary analyses of Mugilidae.
Shen, Kang-Ning; Tsai, Shiou-Yi; Chen, Ching-Hung; Hsiao, Chung-Der; Durand, Jean-Dominique
2016-11-01
In this study, the complete mitogenome sequence of the largescale mullet (Teleostei: Mugilidae) was determined by next-generation sequencing. The assembled mitogenome, consisting of 16,832 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and a non-coding control region (D-loop). The D-loop, which has a length of 1094 bp, is located between tRNA-Pro and tRNA-Phe. The overall base composition of the largescale mullet is 27.8% A, 30.1% C, 16.2% G, and 25.9% T. The complete mitogenome may provide essential DNA molecular data for further phylogenetic and evolutionary analyses of Mugilidae.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shin, J; Park, S; Jeong, J
Purpose: In particle therapy and radiobiology, the investigation of mechanisms leading to the death of target cancer cells induced by ionising radiation is an active field of research. Recently, several studies based on Monte Carlo simulation codes have been initiated in order to simulate physical interactions of ionising particles at cellular scale and in DNA. Geant4-DNA is one of them; it is an extension of the general-purpose Geant4 Monte Carlo simulation toolkit for the simulation of physical interactions at sub-micrometre scale. In this study, we present Geant4-DNA Monte Carlo simulations for the prediction of DNA strand breakage using a geometrical model of DNA structure. Methods: For the simulation of DNA strand breakage, we developed a specific DNA geometrical structure. This structure consists of DNA components, such as the deoxynucleotide pairs, the DNA double helix, the nucleosomes and the chromatin fibre. Each component is made of water because the cross-section models currently available in Geant4-DNA for protons apply to liquid water only. Also, at the macroscopic scale, protons were generated with various energies available for proton therapy at the National Cancer Center, obtained using validated proton beam simulations developed in previous studies. These multi-scale simulations were combined for the validation of Geant4-DNA in radiobiology. Results: In the double helix structure, the energy deposited in a strand made it possible to determine direct DNA damage from physical interaction. In other words, the amount of dose and frequency of damage in microscopic geometries was related to direct radiobiological effect. Conclusion: In this report, we calculated the frequency of DNA strand breakage using Geant4-DNA physics processes for liquid water. This study is ongoing in order to develop geometries which use realistic DNA material instead of liquid water. This will be tested as soon as cross sections for DNA material become available in Geant4-DNA.
How-To-Do-It: Recombinant DNA Made Easy: I. "Jumping Genes."
ERIC Educational Resources Information Center
Thomson, Robert G.
1988-01-01
Presents as part I of a two-part series a study involving the intercellular transfer of bacterial DNA that codes for the resistance to antibiotics. Demonstrates to students that such transfers occur. Discusses laboratory procedures, materials and results. (CW)
Kowalski, Madzia P.; Baylis, Howard A.; Krude, Torsten
2015-01-01
ABSTRACT Stem bulge RNAs (sbRNAs) are a family of small non-coding stem-loop RNAs present in Caenorhabditis elegans and other nematodes, the function of which is unknown. Here, we report the first functional characterisation of nematode sbRNAs. We demonstrate that sbRNAs from a range of nematode species are able to reconstitute the initiation of chromosomal DNA replication in the presence of replication proteins in vitro, and that conserved nucleotide sequence motifs are essential for this function. By functionally inactivating sbRNAs with antisense morpholino oligonucleotides, we show that sbRNAs are required for S phase progression, early embryonic development and the viability of C. elegans in vivo. Thus, we demonstrate a new and essential role for sbRNAs during the early development of C. elegans. sbRNAs show limited nucleotide sequence similarity to vertebrate Y RNAs, which are also essential for the initiation of DNA replication. Our results therefore establish that the essential function of small non-coding stem-loop RNAs during DNA replication extends beyond vertebrates. PMID:25908866
Synthesis of a multi-functional DNA nanosphere barcode system for direct cell detection.
Han, Sangwoo; Lee, Jae Sung; Lee, Jong Bum
2017-09-28
Nucleic acid-based technologies have been applied to numerous biomedical applications. As a novel material for target detection, DNA has been used to construct a barcode system with a range of structures. This paper reports multi-functionalized DNA nanospheres (DNANSs) by rolling circle amplification (RCA) with several functionalized nucleotides. DNANSs with a barcode system were designed to exhibit fluorescence for coding enhanced signals and contain biotin for more functionalities, including targeting through the biotin-streptavidin (biotin-STA) interaction. Functionalized deoxynucleotide triphosphates (dNTPs) were mixed in the RCA process and functional moieties can be expressed on the DNANSs. The anti-epidermal growth factor receptor antibodies (anti-EGFR Abs) can be conjugated on DNANSs for targeting cancer cells specifically. As a proof of concept, the potential of the multi-functional DNANS barcode was demonstrated by direct cell detection as a simple detection method. The DNANS barcode provides a new route for the simple and rapid selective recognition of cancer cells.
Precone, Vincenza; Del Monaco, Valentina; Esposito, Maria Valeria; De Palma, Fatima Domenica Elisa; Ruocco, Anna; D'Argenio, Valeria
2015-01-01
Next-generation sequencing (NGS) technologies have greatly impacted on every field of molecular research mainly because they reduce costs and increase throughput of DNA sequencing. These features, together with the technology's flexibility, have opened the way to a variety of applications including the study of the molecular basis of human diseases. Several analytical approaches have been developed to selectively enrich regions of interest from the whole genome in order to identify germinal and/or somatic sequence variants and to study DNA methylation. These approaches are now widely used in research, and they are already being used in routine molecular diagnostics. However, some issues are still controversial, namely, standardization of methods, data analysis and storage, and ethical aspects. Besides providing an overview of the NGS-based approaches most frequently used to study the molecular basis of human diseases at DNA level, we discuss the principal challenges and applications of NGS in the field of human genomics. PMID:26665001
Botero, Adriana; Kapeller, Irit; Cooper, Crystal; Clode, Peta L; Shlomai, Joseph; Thompson, R C Andrew
2018-05-17
Kinetoplast DNA (kDNA) is the mitochondrial genome of trypanosomatids. It consists of a few dozen maxicircles and several thousand minicircles, all catenated topologically to form a two-dimensional DNA network. Minicircles are heterogeneous in size and sequence among species. They present one or several conserved regions that contain three highly conserved sequence blocks. CSB-1 (10 bp sequence) and CSB-2 (8 bp sequence) present lower interspecies homology, while CSB-3 (12 bp sequence) or the Universal Minicircle Sequence is conserved within most trypanosomatids. The Universal Minicircle Sequence is located at the replication origin of the minicircles, and is the binding site for the UMS binding protein, a protein involved in trypanosomatid survival and virulence. Here, we describe the structure and organisation of the kDNA of Trypanosoma copemani, a parasite that has been shown to infect mammalian cells and has been associated with the drastic decline of the endangered Australian marsupial, the woylie (Bettongia penicillata). Deep genomic sequencing showed that T. copemani presents two classes of minicircles that share sequence identity and organisation in the conserved sequence blocks with those of Trypanosoma cruzi and Trypanosoma lewisi. A 19,257 bp partial region of the maxicircle of T. copemani that contained the entire coding region was obtained. Comparative analysis of the T. copemani entire maxicircle coding region with the coding regions of T. cruzi and T. lewisi showed they share 71.05% and 71.28% identity, respectively. The shared features in the maxicircle/minicircle organisation and sequence between T. copemani and T. cruzi/T. lewisi suggest similarities in their process of kDNA replication, and are of significance in understanding the evolution of Australian trypanosomes. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.
Mitochondrial DNA of Vitis vinifera and the issue of rampant horizontal gene transfer.
Goremykin, Vadim V; Salamini, Francesco; Velasco, Riccardo; Viola, Roberto
2009-01-01
The mitochondrial genome of grape (Vitis vinifera), the largest organelle genome sequenced so far, is presented. The genome is 773,279 nt long and has the highest coding capacity among known angiosperm mitochondrial DNAs (mtDNAs). The proportion of promiscuous DNA of plastid origin in the genome is also the largest ever reported for an angiosperm mtDNA, both in absolute and relative terms. In all, 42.4% of chloroplast genome of Vitis has been incorporated into its mitochondrial genome. In order to test if horizontal gene transfer (HGT) has also contributed to the gene content of the grape mtDNA, we built phylogenetic trees with the coding sequences of mitochondrial genes of grape and their homologs from plant mitochondrial genomes. Many incongruent gene tree topologies were obtained. However, the extent of incongruence between these gene trees is not significantly greater than that observed among optimal trees for chloroplast genes, the common ancestry of which has never been in doubt. In both cases, we attribute this incongruence to artifacts of tree reconstruction, insufficient numbers of characters, and gene paralogy. This finding leads us to question the recent phylogenetic interpretation of Bergthorsson et al. (2003, 2004) and Richardson and Palmer (2007) that rampant HGT into the mtDNA of Amborella best explains phylogenetic incongruence between mitochondrial gene trees for angiosperms. The only evidence for HGT into the Vitis mtDNA found involves fragments of two coding sequences stemming from two closteroviruses that cause the leaf roll disease of this plant. We also report that analysis of sequences shared by both chloroplast and mitochondrial genomes provides evidence for a previously unknown gene transfer route from the mitochondrion to the chloroplast.
Cheng, Rubin; Zheng, Xiaodong; Ma, Yuanyuan; Li, Qi
2013-01-01
In the present study, we determined the complete mitochondrial DNA (mtDNA) sequences of two species of Cistopus, namely C. chinensis and C. taiwanicus, and conducted a comparative mt genome analysis across the class Cephalopoda. The mtDNA lengths of C. chinensis and C. taiwanicus are 15706 and 15793 nucleotides, with AT contents of 76.21% and 76.5%, respectively. The sequence identity of mtDNA between C. chinensis and C. taiwanicus was 88%, suggesting a close relationship. Compared with C. taiwanicus and other octopods, C. chinensis encoded two additional tRNA genes, showing a novel gene arrangement. In addition, an unusual 23 poly (A) signal structure is found in the ATP8 coding region of C. chinensis. The entire genome and each protein-coding gene of the two Cistopus species displayed notable levels of AT and GC skews. Based on sliding window analysis among Octopodiformes, ND1 and ND5 were considered to be more reliable molecular beacons. Phylogenetic analyses based on the 13 protein-coding genes revealed that C. chinensis and C. taiwanicus form a monophyletic group with high statistical support, consistent with previous studies based on morphological characteristics. Our results also indicated that the phylogenetic position of the genus Cistopus is closer to Octopus than to Amphioctopus and Callistoctopus. The complete mtDNA sequences of C. chinensis and C. taiwanicus represent the first whole mt genomes in the genus Cistopus. These novel mtDNA data will be important in refining the phylogenetic relationships within Octopodiformes and enriching the resource of markers for systematic, population genetic and evolutionary biological studies of Cephalopoda. PMID:24358345
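The AT content and skew statistics quoted above can be sketched as follows, assuming the conventional definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C); the abstract does not spell out its formulas, and the function name is hypothetical.

```python
def composition_stats(seq):
    """AT content (percent) and AT/GC skews of a nucleotide sequence.

    Only unambiguous A, C, G and T symbols are counted. The skew definitions
    are the conventional ones and are an assumption with respect to the
    cited study.
    """
    seq = seq.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")

    def skew(x, y):
        # Undefined when neither base occurs; report NaN instead of failing.
        return (x - y) / (x + y) if (x + y) else float("nan")

    return {
        "at_content": 100.0 * (a + t) / total,
        "at_skew": skew(a, t),
        "gc_skew": skew(g, c),
    }
```

Applying the same function to successive windows of a genome would give a simple version of the sliding window analysis mentioned in the abstract.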
Posterior Predictive Bayesian Phylogenetic Model Selection
Lewis, Paul O.; Xie, Wangang; Chen, Ming-Hui; Fan, Yu; Kuo, Lynn
2014-01-01
We present two distinctly different posterior predictive approaches to Bayesian phylogenetic model selection and illustrate these methods using examples from green algal protein-coding cpDNA sequences and flowering plant rDNA sequences. The Gelfand–Ghosh (GG) approach allows dissection of an overall measure of model fit into components due to posterior predictive variance (GGp) and goodness-of-fit (GGg), which distinguishes this method from the posterior predictive P-value approach. The conditional predictive ordinate (CPO) method provides a site-specific measure of model fit useful for exploratory analyses and can be combined over sites yielding the log pseudomarginal likelihood (LPML) which is useful as an overall measure of model fit. CPO provides a useful cross-validation approach that is computationally efficient, requiring only a sample from the posterior distribution (no additional simulation is required). Both GG and CPO add new perspectives to Bayesian phylogenetic model selection based on the predictive abilities of models and complement the perspective provided by the marginal likelihood (including Bayes Factor comparisons) based solely on the fit of competing models to observed data. [Bayesian; conditional predictive ordinate; CPO; L-measure; LPML; model selection; phylogenetics; posterior predictive.] PMID:24193892
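The CPO and LPML quantities can be sketched from a posterior sample alone, which is the computational advantage the abstract points out. The sketch assumes the usual harmonic-mean estimator, CPO_i = (mean over samples of 1/L_i)^(-1), where L_i is the likelihood of site i under one posterior sample; the input layout and function names are hypothetical.

```python
import math


def log_cpo(site_logliks):
    """Log CPO for one site, given that site's log-likelihood under each
    posterior sample. Uses log-sum-exp on the negated values for stability."""
    negated = [-value for value in site_logliks]
    peak = max(negated)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in negated))
    return math.log(len(negated)) - log_sum


def lpml(loglik_matrix):
    """Log pseudomarginal likelihood: the sum of log CPO over sites.

    loglik_matrix[s][i] is the log-likelihood of site i under posterior
    sample s (an assumed layout for this illustration).
    """
    n_sites = len(loglik_matrix[0])
    return sum(
        log_cpo([row[i] for row in loglik_matrix]) for i in range(n_sites)
    )
```

Sites with unusually low log CPO are the ones a model predicts poorly, which is what makes the per-site values useful for exploratory analysis.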
Jézéquel, Laetitia; Loeper, Jacqueline; Pompon, Denis
2008-11-01
Combinatorial libraries coding for mosaic enzymes with predefined crossover points constitute useful tools to address and model structure-function relationships and for functional optimization of enzymes based on multivariate statistics. The presented method, called sequence-independent generation of a chimera-ordered library (SIGNAL), allows easy shuffling of any predefined amino acid segment between two or more proteins. This method is particularly well adapted to the exchange of protein structural modules. The procedure could also be well suited to generate ordered combinatorial libraries independent of sequence similarities in a robotized manner. Sequence segments to be recombined are first extracted by PCR from a single-stranded template coding for an enzyme of interest using a biotin-avidin-based method. This technique allows the reduction of parental template contamination in the final library. Specific PCR primers allow amplification of two complementary mosaic DNA fragments, overlapping in the region to be exchanged. Fragments are finally reassembled using a fusion PCR. The process is illustrated via the construction of a set of mosaic CYP2B enzymes using this highly modular approach.
DNA-DNA hybridization values and their relationship to whole-genome sequence similarities.
Goris, Johan; Konstantinidis, Konstantinos T; Klappenbach, Joel A; Coenye, Tom; Vandamme, Peter; Tiedje, James M
2007-01-01
DNA-DNA hybridization (DDH) values have been used by bacterial taxonomists since the 1960s to determine relatedness between strains and are still the most important criterion in the delineation of bacterial species. Since the extent of hybridization between a pair of strains is ultimately governed by their respective genomic sequences, we examined the quantitative relationship between DDH values and genome sequence-derived parameters, such as the average nucleotide identity (ANI) of common genes and the percentage of conserved DNA. A total of 124 DDH values were determined for 28 strains for which genome sequences were available. The strains belong to six important and diverse groups of bacteria for which the intra-group 16S rRNA gene sequence identity was greater than 94 %. The results revealed a close relationship between DDH values and ANI and between DNA-DNA hybridization and the percentage of conserved DNA for each pair of strains. The recommended cut-off point of 70 % DDH for species delineation corresponded to 95 % ANI and 69 % conserved DNA. When the analysis was restricted to the protein-coding portion of the genome, 70 % DDH corresponded to 85 % conserved genes for a pair of strains. These results reveal extensive gene diversity within the current concept of "species". Examination of reciprocal values indicated that the level of experimental error associated with the DDH method is too high to reveal the subtle differences in genome size among the strains sampled. It is concluded that ANI can accurately replace DDH values for strains for which genome sequences are available.
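A simplified sketch of an ANI calculation and of the 95 % cut-off reported above is given below. It assumes the shared genes have already been aligned pairwise to equal length with `-` as the gap symbol, and it averages per-gene identity with equal weight; the cited study's actual pipeline is not described in the abstract, so the function names and these simplifications are assumptions.

```python
def average_nucleotide_identity(aligned_pairs):
    """Mean percent identity over pre-aligned gene pairs.

    aligned_pairs: iterable of (seq_a, seq_b) strings of equal length,
    with '-' marking gaps. Gap columns are ignored.
    """
    identities = []
    for seq_a, seq_b in aligned_pairs:
        if len(seq_a) != len(seq_b):
            raise ValueError("aligned sequences must have equal length")
        columns = [
            (x, y)
            for x, y in zip(seq_a.upper(), seq_b.upper())
            if x != "-" and y != "-"
        ]
        if columns:
            matches = sum(x == y for x, y in columns)
            identities.append(100.0 * matches / len(columns))
    if not identities:
        raise ValueError("no comparable aligned positions")
    return sum(identities) / len(identities)


def same_species_by_ani(ani, cutoff=95.0):
    """Apply the 95 % ANI cut-off that the abstract equates with 70 % DDH."""
    return ani >= cutoff
```

For example, two gene pairs with 100 % and 75 % identity give an ANI of 87.5 %, which falls below the cut-off.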
The Winding Road to Discovering the Link between Genetic Material and DNA
ERIC Educational Resources Information Center
Cherif, Abour H.; Roze, Maris; Movahedzadeh, Farahnaz
2015-01-01
This is an account of the three-centuries long journey to the discovery of the link between DNA and the transformation principle of heredity beginning with the discovery of the cell in 1665 and leading up to the 1953 discovery of the genetic code and the structure of DNA. This account also illustrates the way science works and how scientists do…
The changing epitome of species identification – DNA barcoding
Ajmal Ali, M.; Gyulai, Gábor; Hidvégi, Norbert; Kerti, Balázs; Al Hemaid, Fahad M.A.; Pandey, Arun K.; Lee, Joongku
2014-01-01
The discipline of taxonomy (the science of naming and classifying organisms, the original bioinformatics and a basis for all biology) is fundamentally important in ensuring the quality of life of future human generations on Earth; yet over the past few decades, teaching and research funding in taxonomy have declined because its classical way of practice often reduced the discipline to a matter of opinion. This ultimately gave rise to several problems and challenges, and the taxonomist became an endangered race in the era of genomics. Taxonomy has now suddenly become fashionable again owing to a revolutionary approach called DNA barcoding (a novel technology to provide rapid, accurate, and automated species identifications using short orthologous DNA sequences). In DNA barcoding, a complete data set can be obtained from a single specimen irrespective of morphological or life-stage characters. The core idea of DNA barcoding is based on the fact that highly conserved stretches of DNA, either coding or non-coding regions, vary only to a very minor degree within a species during evolution. Sequences suggested to be useful in DNA barcoding include cytoplasmic mitochondrial DNA (e.g. cox1), chloroplast DNA (e.g. rbcL, trnL-F, matK, ndhF, and atpB rbcL), and nuclear DNA (ITS, and housekeeping genes, e.g. gapdh). Plant DNA barcoding is now changing the epitome of species identification, and thus ultimately helping in the molecularization of taxonomy, a need of the hour. The ‘DNA barcodes’ show promise in providing a practical, standardized, species-level identification tool that can be used for biodiversity assessment, life history and ecological studies, forensic analysis, and many more. PMID:24955007
DNA copy number changes define spatial patterns of heterogeneity in colorectal cancer
Mamlouk, Soulafa; Childs, Liam Harold; Aust, Daniela; Heim, Daniel; Melching, Friederike; Oliveira, Cristiano; Wolf, Thomas; Durek, Pawel; Schumacher, Dirk; Bläker, Hendrik; von Winterfeld, Moritz; Gastl, Bastian; Möhr, Kerstin; Menne, Andrea; Zeugner, Silke; Redmer, Torben; Lenze, Dido; Tierling, Sascha; Möbs, Markus; Weichert, Wilko; Folprecht, Gunnar; Blanc, Eric; Beule, Dieter; Schäfer, Reinhold; Morkel, Markus; Klauschen, Frederick; Leser, Ulf; Sers, Christine
2017-01-01
Genetic heterogeneity between and within tumours is a major factor determining cancer progression and therapy response. Here we examined DNA sequence and DNA copy-number heterogeneity in colorectal cancer (CRC) by targeted high-depth sequencing of 100 most frequently altered genes. In 97 samples, with primary tumours and matched metastases from 27 patients, we observe inter-tumour concordance for coding mutations; in contrast, gene copy numbers are highly discordant between primary tumours and metastases as validated by fluorescent in situ hybridization. To further investigate intra-tumour heterogeneity, we dissected a single tumour into 68 spatially defined samples and sequenced them separately. We identify evenly distributed coding mutations in APC and TP53 in all tumour areas, yet highly variable gene copy numbers in numerous genes. 3D morpho-molecular reconstruction reveals two clusters with divergent copy number aberrations along the proximal–distal axis indicating that DNA copy number variations are a major source of tumour heterogeneity in CRC. PMID:28120820
Rotational dynamics of bases in the gene coding interferon alpha 17 (IFNA17).
Krasnobaeva, L A; Yakushevich, L V
2015-02-01
In the present work, rotational oscillations of nitrogenous bases in the DNA with the sequence of the gene coding interferon alpha 17 (IFNA17), are investigated. As a mathematical model simulating oscillations of the bases, we use a system of two coupled nonlinear partial differential equations that takes into account effects of dissipation, action of external fields and dependence of the equation coefficients on the sequence of bases. We apply the methods of the theory of oscillations to solve the equations in the linear approach and to construct the dispersive curves determining the dependence of the frequency of the plane waves (ω) on the wave vector (q). In the nonlinear case, the solutions in the form of kink are considered, and the main characteristics of the kink: the rest energy (E0), the rest mass (m0), the size (d) and sound velocity (C0), are calculated. With the help of the energetic method, the kink velocity (υ), the path (S), and the lifetime (τ) are also obtained.
Replication and Transcription of Eukaryotic DNA in Escherichia coli
Morrow, John F.; Cohen, Stanley N.; Chang, Annie C. Y.; Boyer, Herbert W.; Goodman, Howard M.; Helling, Robert B.
1974-01-01
Fragments of amplified Xenopus laevis DNA, coding for 18S and 28S ribosomal RNA and generated by EcoRI restriction endonuclease, have been linked in vitro to the bacterial plasmid pSC101; and the recombinant molecular species have been introduced into E. coli by transformation. These recombinant plasmids, containing both eukaryotic and prokaryotic DNA, replicate stably in E. coli. RNA isolated from E. coli minicells harboring the plasmids hybridizes to amplified X. laevis rDNA. PMID:4600264
Nadimi, Maryam; Daubois, Laurence; Hijri, Mohamed
2016-05-01
Mitochondrial (mt) genes, such as cytochrome C oxidase genes (cox), have been widely used for barcoding in many groups of organisms, although this approach has been less powerful in the fungal kingdom due to the rapid evolution of their mt genomes. The use of mt genes in phylogenetic studies of Dikarya has been met with success, while early diverging fungal lineages remain less studied, particularly the arbuscular mycorrhizal fungi (AMF). Advances in next-generation sequencing have substantially increased the number of publicly available mtDNA sequences for the Glomeromycota. As a result, comparison of mtDNA across key AMF taxa can now be applied to assess the phylogenetic signal of individual mt coding genes, as well as concatenated subsets of coding genes. Here we show comparative analyses of publicly available mt genomes of Glomeromycota, augmented with two mtDNA genomes that were newly sequenced for this study (Rhizophagus irregularis DAOM240159 and Glomus aggregatum DAOM240163), resulting in 16 complete mtDNA datasets. R. irregularis isolate DAOM240159 and G. aggregatum isolate DAOM240163 showed mt genomes measuring 72,293 bp and 69,505 bp with G+C contents of 37.1% and 37.3%, respectively. We assessed the phylogenies inferred from single mt genes and complete sets of coding genes, which are referred to as "supergenes" (16 concatenated coding genes), using Shimodaira-Hasegawa tests, in order to identify genes that best described AMF phylogeny. We found that rnl, nad5, cox1, and nad2 genes, as well as concatenated subsets of these genes, provided phylogenies that were similar to the supergene set. This mitochondrial genomic analysis was also combined with principal coordinate and partitioning analyses, which helped to unravel certain evolutionary relationships in the Rhizophagus genus and for G. aggregatum within the Glomeromycota. We showed evidence to support the position of G. aggregatum within the R. irregularis 'species complex'.
Copyright © 2016 Elsevier Inc. All rights reserved.
Fister, Karin; Fister, Iztok; Murovec, Jana; Bohanec, Borut
2017-02-01
Plant breeders' rights are undergoing dramatic changes due to changes in patent rights in terms of plant variety rights protection. Although differences in the interpretation of »breeder's exemption«, termed research exemption in the 1991 UPOV, did exist in the past in some countries, allowing breeders to use protected varieties as parents in the creation of new varieties of plants, current developments brought about by patenting conventionally bred varieties with the European Patent Office (such as EP2140023B1) have opened new challenges. Legal restrictions on germplasm availability are therefore imposed on breeders while, at the same time, no practical information on how to distinguish protected from non-protected varieties is given. We propose here a novel approach that would solve this problem by the insertion of short DNA stretches (labels) into protected plant varieties by genetic transformation. This information will then be available to breeders by a simple and standardized procedure. We propose that such a procedure should consist of using a pair of universal primers that will generate a sequence in a PCR reaction, which can be read and translated into ordinary text by a computer application. To demonstrate the feasibility of such an approach, we conducted a case study. Using the Agrobacterium tumefaciens transformation protocol, we inserted a stretch of DNA code into Nicotiana benthamiana. We also developed an on-line application that enables coding of any text message into DNA nucleotide code and, on sequencing, decoding it back into text. In the presented case study, a short command line coding the phrase »Hello world« was transformed into a DNA sequence that was inserted in the plant genome. The encoded message was reconstructed from the resulting T1 seedlings with 100 % accuracy. The feasibility and possible other applications of this approach are discussed.
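The text-to-DNA coding step can be illustrated with a minimal sketch. The mapping below (two bits per nucleotide, with A=00, C=01, G=10, T=11 over UTF-8 bytes) is an assumption chosen for simplicity; the abstract does not describe the scheme used by the authors' on-line application, and the function names are hypothetical.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def text_to_dna(text):
    """Encode text as nucleotides, four bases per UTF-8 byte."""
    nucleotides = []
    for byte in text.encode("utf-8"):
        for shift in (6, 4, 2, 0):  # most significant bit pair first
            nucleotides.append(BASES[(byte >> shift) & 0b11])
    return "".join(nucleotides)


def dna_to_text(dna):
    """Decode a nucleotide string produced by text_to_dna back into text."""
    dna = dna.upper()
    if len(dna) % 4:
        raise ValueError("sequence length must be a multiple of four")
    data = bytearray()
    for start in range(0, len(dna), 4):
        byte = 0
        for base in dna[start:start + 4]:
            byte = (byte << 2) | BASES.index(base)  # ValueError on non-ACGT
        data.append(byte)
    return data.decode("utf-8")
```

With this mapping the 11-character phrase "Hello world" becomes a 44-nucleotide label. A practical scheme would probably also avoid long homopolymer runs and add error detection, which this sketch omits.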
Epigenetics of Peripheral B-Cell Differentiation and the Antibody Response
Zan, Hong; Casali, Paolo
2015-01-01
Epigenetic modifications, such as histone post-translational modifications, DNA methylation, and alteration of gene expression by non-coding RNAs, including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), are heritable changes that are independent from the genomic DNA sequence. These regulate gene activities and, therefore, cellular functions. Epigenetic modifications act in concert with transcription factors and play critical roles in B cell development and differentiation, thereby modulating antibody responses to foreign- and self-antigens. Upon antigen encounter by mature B cells in the periphery, alterations of these lymphocytes' epigenetic landscape are induced by the same stimuli that drive the antibody response. Such alterations instruct B cells to undergo immunoglobulin (Ig) class switch DNA recombination (CSR) and somatic hypermutation (SHM), as well as differentiation to memory B cells or long-lived plasma cells for the immune memory. Inducible histone modifications, together with DNA methylation and miRNAs, modulate the transcriptome, particularly the expression of activation-induced cytidine deaminase, which is essential for CSR and SHM, and factors central to plasma cell differentiation, such as B lymphocyte-induced maturation protein-1. These inducible B cell-intrinsic epigenetic marks guide the maturation of antibody responses. Combinatorial histone modifications also function as histone codes to target CSR and, possibly, SHM machinery to the Ig loci by recruiting specific adaptors that can stabilize CSR/SHM factors. In addition, lncRNAs, such as the recently reported lncRNA-CSR and an lncRNA generated through transcription of the S region that forms G-quadruplex structures, are also important for CSR targeting.
Epigenetic dysregulation in B cells, including the aberrant expression of non-coding RNAs and alterations of histone modifications and DNA methylation, can result in aberrant antibody responses to foreign antigens, such as those on microbial pathogens, and generation of pathogenic autoantibodies, IgE in allergic reactions, as well as B cell neoplasia. Epigenetic marks would be attractive targets for new therapeutics for autoimmune and allergic diseases, and B cell malignancies. PMID:26697022
Miller-Delaney, Suzanne F.C.; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C.; Bray, Isabella M.; Reynolds, James P.; Gwinn, Ryder; Stallings, Raymond L.
2015-01-01
Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation-sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among those differentially methylated. Finally, a panel of 13 methylation-sensitive microRNAs was identified in temporal lobe epilepsy including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and the differential methylation of long non-coding RNA was documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain. PMID:25552301
Direct Introduction of Genes into Rats and Expression of the Genes
NASA Astrophysics Data System (ADS)
Benvenisty, Nissim; Reshef, Lea
1986-12-01
A method of introducing actively expressed genes into intact mammals is described. DNA precipitated with calcium phosphate has been injected intraperitoneally into newborn rats. The injected genes have been taken up and expressed by the animal tissues. To examine the generality of the method we have injected newborn rats with the chloramphenicol acetyltransferase prokaryotic gene fused with various viral and cellular gene promoters and the gene for hepatitis B surface antigen, and we observed appearance of chloramphenicol acetyltransferase activity and hepatitis B surface antigen in liver and spleen. In addition, administration of genes coding for hormones (insulin or growth hormone) resulted in their expression.
Basal jawed vertebrate phylogeny inferred from multiple nuclear DNA-coded genes
Kikugawa, Kanae; Katoh, Kazutaka; Kuraku, Shigehiro; Sakurai, Hiroshi; Ishida, Osamu; Iwabe, Naoyuki; Miyata, Takashi
2004-01-01
Background Phylogenetic analyses of jawed vertebrates based on mitochondrial sequences often result in confusing inferences which are obviously inconsistent with generally accepted trees. In particular, in a hypothesis by Rasmussen and Arnason based on mitochondrial trees, cartilaginous fishes have a terminal position in a paraphyletic cluster of bony fishes. No previous analysis based on nuclear DNA-coded genes could significantly reject the mitochondrial trees of jawed vertebrates. Results We have cloned and sequenced seven nuclear DNA-coded genes from 13 vertebrate species. These sequences, together with sequences available from databases including 13 jawed vertebrates from eight major groups (cartilaginous fishes, bichir, chondrosteans, gar, bowfin, teleost fishes, lungfishes and tetrapods) and an outgroup (a cyclostome and a lancelet), have been subjected to phylogenetic analyses based on the maximum likelihood method. Conclusion Cartilaginous fishes have been inferred to be basal to other jawed vertebrates, which is consistent with the generally accepted view. The minimum log-likelihood difference between the maximum likelihood tree and trees not supporting the basal position of cartilaginous fishes is 18.3 ± 13.1. The hypothesis by Rasmussen and Arnason has been significantly rejected with the minimum log-likelihood difference of 123 ± 23.3. Our tree has also shown that living holosteans, comprising bowfin and gar, form a monophyletic group which is the sister group to teleost fishes. This is consistent with a formerly prevalent view of vertebrate classification, although inconsistent with both of the current morphology-based and mitochondrial sequence-based trees. Furthermore, the bichir has been shown to be the basal ray-finned fish. Tetrapods and lungfish have formed a monophyletic cluster in the tree inferred from the concatenated alignment, being consistent with the currently prevalent view. 
It also remains possible that tetrapods are more closely related to ray-finned fishes than to lungfishes. PMID:15070407
Capture, Unfolding, and Detection of Individual tRNA Molecules Using a Nanopore Device
Smith, Andrew M.; Abu-Shumays, Robin; Akeson, Mark; Bernick, David L.
2015-01-01
Transfer RNAs (tRNA) are the most common RNA molecules in cells and have critical roles as both translators of the genetic code and regulators of protein synthesis. As such, numerous methods have focused on studying tRNA abundance and regulation, with the most widely used methods being RNA-seq and microarrays. Though revolutionary to transcriptomics, these assays are limited by an inability to encode tRNA modifications in the requisite cDNA. These modifications are abundant in tRNA and critical to their function. Here, we describe proof-of-concept experiments where individual tRNA molecules are examined as linear strands using a biological nanopore. This method utilizes an enzymatically ligated synthetic DNA adapter to concentrate tRNA at the lipid bilayer of the nanopore device and efficiently denature individual tRNA molecules, as they are pulled through the α-hemolysin (α-HL) nanopore. Additionally, the DNA adapter provides a loading site for ϕ29 DNA polymerase (ϕ29 DNAP), which acts as a brake on the translocating tRNA. This increases the dwell time of adapted tRNA in the nanopore, allowing us to identify the region of the nanopore signal that is produced by the translocating tRNA itself. Using adapter-modified Escherichia coli tRNAfMet and tRNALys, we show that the nanopore signal during controlled translocation is dependent on the identity of the tRNA. This confirms that adapter-modified tRNA can translocate end-to-end through nanopores and provide the foundation for future work in direct sequencing of individual transfer RNA with a nanopore-based device. PMID:26157798
Coyne, Robert S; Thiagarajan, Mathangi; Jones, Kristie M; Wortman, Jennifer R; Tallon, Luke J; Haas, Brian J; Cassidy-Hanley, Donna M; Wiley, Emily A; Smith, Joshua J; Collins, Kathleen; Lee, Suzanne R; Couvillion, Mary T; Liu, Yifan; Garg, Jyoti; Pearlman, Ronald E; Hamilton, Eileen P; Orias, Eduardo; Eisen, Jonathan A; Methé, Barbara A
2008-01-01
Background Tetrahymena thermophila, a widely studied model for cellular and molecular biology, is a binucleated single-celled organism with a germline micronucleus (MIC) and somatic macronucleus (MAC). The recent draft MAC genome assembly revealed low sequence repetitiveness, a result of the epigenetic removal of invasive DNA elements found only in the MIC genome. Such low repetitiveness makes complete closure of the MAC genome a feasible goal, which to achieve would require standard closure methods as well as removal of minor MIC contamination of the MAC genome assembly. Highly accurate preliminary annotation of Tetrahymena's coding potential was hindered by the lack of both comparative genomic sequence information from close relatives and significant amounts of cDNA evidence, thus limiting the value of the genomic information and also leaving unanswered certain questions, such as the frequency of alternative splicing. Results We addressed the problem of MIC contamination using comparative genomic hybridization with purified MIC and MAC DNA probes against a whole genome oligonucleotide microarray, allowing the identification of 763 genome scaffolds likely to contain MIC-limited DNA sequences. We also employed standard genome closure methods to essentially finish over 60% of the MAC genome. For the improvement of annotation, we have sequenced and analyzed over 60,000 verified EST reads from a variety of cellular growth and development conditions. Using this EST evidence, a combination of automated and manual reannotation efforts led to updates that affect 16% of the current protein-coding gene models. By comparing EST abundance, many genes showing apparent differential expression between these conditions were identified. Rare instances of alternative splicing and uses of the non-standard amino acid selenocysteine were also identified. Conclusion We report here significant progress in genome closure and reannotation of Tetrahymena thermophila. 
Our experience to date suggests that complete closure of the MAC genome is attainable. Using the new EST evidence, automated and manual curation has resulted in substantial improvements to the over 24,000 gene models, which will be valuable to researchers studying this model organism as well as for comparative genomics purposes. PMID:19036158
Arnaiz, Olivier; Mathy, Nathalie; Baudry, Céline; Malinsky, Sophie; Aury, Jean-Marc; Denby Wilkes, Cyril; Garnier, Olivier; Labadie, Karine; Lauderdale, Benjamin E.; Le Mouël, Anne; Marmignon, Antoine; Nowacki, Mariusz; Poulain, Julie; Prajer, Malgorzata; Wincker, Patrick; Meyer, Eric; Duharcourt, Sandra; Duret, Laurent; Bétermier, Mireille; Sperling, Linda
2012-01-01
Insertions of parasitic DNA within coding sequences are usually deleterious and are generally counter-selected during evolution. Thanks to nuclear dimorphism, ciliates provide unique models to study the fate of such insertions. Their germline genome undergoes extensive rearrangements during development of a new somatic macronucleus from the germline micronucleus following sexual events. In Paramecium, these rearrangements include precise excision of unique-copy Internal Eliminated Sequences (IES) from the somatic DNA, requiring the activity of a domesticated piggyBac transposase, PiggyMac. We have sequenced Paramecium tetraurelia germline DNA, establishing a genome-wide catalogue of ∼45,000 IESs, in order to gain insight into their evolutionary origin and excision mechanism. We obtained direct evidence that PiggyMac is required for excision of all IESs. Homology with known P. tetraurelia Tc1/mariner transposons, described here, indicates that at least a fraction of IESs derive from these elements. Most IES insertions occurred before a recent whole-genome duplication that preceded diversification of the P. aurelia species complex, but IES invasion of the Paramecium genome appears to be an ongoing process. Once inserted, IESs decay rapidly by accumulation of deletions and point substitutions. Over 90% of the IESs are shorter than 150 bp and present a remarkable size distribution with a ∼10 bp periodicity, corresponding to the helical repeat of double-stranded DNA and suggesting DNA loop formation during assembly of a transpososome-like excision complex. IESs are equally frequent within and between coding sequences; however, excision is not 100% efficient and there is selective pressure against IES insertions, in particular within highly expressed genes. 
We discuss the possibility that ancient domestication of a piggyBac transposase favored subsequent propagation of transposons throughout the germline by allowing insertions in coding sequences, a fraction of the genome in which parasitic DNA is not usually tolerated. PMID:23071448
Halász, László; Karányi, Zsolt; Boros-Oláh, Beáta; Kuik-Rózsa, Tímea; Sipos, Éva; Nagy, Éva; Mosolygó-L, Ágnes; Mázló, Anett; Rajnavölgyi, Éva; Halmos, Gábor; Székvölgyi, Lóránt
2017-01-01
The impact of R-loops on the physiology and pathology of chromosomes has been demonstrated extensively by chromatin biology research. The progress in this field has been driven by technological advancement of R-loop mapping methods that largely relied on a single approach, DNA-RNA immunoprecipitation (DRIP). Most of the DRIP protocols use the experimental design that was developed by a few laboratories, without paying attention to the potential caveats that might affect the outcome of RNA-DNA hybrid mapping. To assess the accuracy and utility of this technology, we pursued an analytical approach to estimate inherent biases and errors in the DRIP protocol. By performing DRIP-sequencing, qPCR, and receiver operator characteristic (ROC) analysis, we tested the effect of formaldehyde fixation, cell lysis temperature, mode of genome fragmentation, and removal of free RNA on the efficacy of RNA-DNA hybrid detection and implemented workflows that were able to distinguish complex and weak DRIP signals in a noisy background with high confidence. We also show that some of the workflows perform poorly and generate random answers. Furthermore, we found that the most commonly used genome fragmentation method (restriction enzyme digestion) led to the overrepresentation of lengthy DRIP fragments over coding ORFs, and this bias was enhanced at the first exons. Biased genome sampling severely compromised mapping resolution and prevented the assignment of precise biological function to a significant fraction of R-loops. The revised workflow presented herein is established and optimized using objective ROC analyses and provides reproducible and highly specific RNA-DNA hybrid detection. PMID:28341774
Modeling Structure-Function Relationships in Synthetic DNA Sequences using Attribute Grammars
Cai, Yizhi; Lux, Matthew W.; Adam, Laura; Peccoud, Jean
2009-01-01
Recognizing that certain biological functions can be associated with specific DNA sequences has led various fields of biology to adopt the notion of the genetic part. This concept provides a finer level of granularity than the traditional notion of the gene. However, a method of formally relating how a set of parts relates to a function has not yet emerged. Synthetic biology both demands such a formalism and provides an ideal setting for testing hypotheses about relationships between DNA sequences and phenotypes beyond the gene-centric methods used in genetics. Attribute grammars are used in computer science to translate the text of a program source code into the computational operations it represents. By associating attributes with parts, modifying the value of these attributes using rules that describe the structure of DNA sequences, and using a multi-pass compilation process, it is possible to translate DNA sequences into molecular interaction network models. These capabilities are illustrated by simple example grammars expressing how gene expression rates are dependent upon single or multiple parts. The translation process is validated by systematically generating, translating, and simulating the phenotype of all the sequences in the design space generated by a small library of genetic parts. Attribute grammars represent a flexible framework connecting parts with models of biological function. They will be instrumental for building mathematical models of libraries of genetic constructs synthesized to characterize the function of genetic parts. This formalism is also expected to provide a solid foundation for the development of computer assisted design applications for synthetic biology. PMID:19816554
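The idea of attaching attributes to genetic parts and deriving model parameters through grammar rules can be sketched in a few lines. Everything below is a toy illustration and not the authors' grammar: the two productions (construct -> cassette+, cassette -> promoter rbs gene terminator), the Part class, the attribute names and the numeric values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    kind: str           # "promoter", "rbs", "gene" or "terminator"
    value: float = 0.0  # attribute: strength of a promoter or RBS (arbitrary units)


def parse_construct(parts):
    """Toy attribute grammar:
         construct -> cassette+
         cassette  -> promoter rbs gene terminator
    Each cassette synthesizes one record from the attributes of its parts.
    """
    expected = ("promoter", "rbs", "gene", "terminator")
    if not parts or len(parts) % 4 != 0:
        raise ValueError("construct must consist of one or more 4-part cassettes")
    models = []
    for i in range(0, len(parts), 4):
        cassette = parts[i:i + 4]
        kinds = tuple(p.kind for p in cassette)
        if kinds != expected:
            raise ValueError(f"invalid cassette starting at part {i}: {kinds}")
        promoter, rbs, gene, _terminator = cassette
        # Semantic rule: the gene inherits its rates from the upstream parts.
        models.append({
            "gene": gene.name,
            "transcription_rate": promoter.value,
            "translation_rate": rbs.value,
        })
    return models
```

Calling parse_construct on a list of four Part objects in the order promoter, RBS, gene, terminator returns one record for that gene; a sequence that violates the structure is rejected, which mirrors the role of the syntactic rules in the approach the abstract describes.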
The full mitochondrial genome sequence of Raillietina tetragona from chicken (Cestoda: Davaineidae).
Liang, Jian-Ying; Lin, Rui-Qing
2016-11-01
In the present study, the complete mitochondrial DNA (mtDNA) of Raillietina tetragona was sequenced and its gene content and genome organization were compared with those of other tapeworms. The complete mt genome sequence of R. tetragona is 14,444 bp in length. It contains 12 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and two non-coding regions. All genes are transcribed in the same direction and have a nucleotide composition high in A and T. The A + T content of the complete mt genome is 71.4% for R. tetragona. The R. tetragona mt genome sequence provides a novel mtDNA marker for studying the molecular epidemiology and population genetics of Raillietina and has implications for the molecular diagnosis of chicken cestodosis caused by Raillietina.
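The A + T content reported in this abstract is a simple base-composition statistic. A minimal sketch of how such a value could be computed from a sequence string follows; the function name at_content and the choice to ignore ambiguous symbols such as N are assumptions, not details taken from the study.

```python
def at_content(sequence: str) -> float:
    """Return the percentage of A and T among the unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    at = sum(1 for base in counted if base in "AT")
    return 100.0 * at / len(counted)
```

Applied to a complete mitochondrial genome read from a sequence file, this kind of calculation yields the single summary percentage quoted above.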
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harford, N.; De Wilde, M.
1987-05-19
A recombinant DNA molecule is described comprising at least a portion coding for subunits A and B of cholera toxin, or a fragment or derivative of the portion wherein the fragment or derivative codes for a polypeptide having an activity which can induce an immune response to subunit A; can induce an immune response to subunit A and cause epithelial cell penetration and the enzymatic effect leading to net loss of fluid into the gut lumen; can bind to the membrane receptor for the B subunit of cholera toxin; can induce an immune response to subunit B; can induce an immune response to subunit B and bind to the membrane receptor; or has a combination of the activities.
Cave, John W; Xia, Li; Caudy, Michael
2011-01-01
In Drosophila melanogaster, achaete (ac) and m8 are model basic helix-loop-helix activator (bHLH A) and repressor genes, respectively, that have the opposite cell expression pattern in proneural clusters during Notch signaling. Previous studies have shown that activation of m8 transcription in specific cells within proneural clusters by Notch signaling is programmed by a "combinatorial" and "architectural" DNA transcription code containing binding sites for the Su(H) and proneural bHLH A proteins. Here we show the novel result that the ac promoter contains a similar combinatorial code of Su(H) and bHLH A binding sites but contains a different Su(H) site architectural code that does not mediate activation during Notch signaling, thus programming a cell expression pattern opposite that of m8 in proneural clusters.
Getting it Right: How DNA Polymerases Select the Right Nucleotide.
Ludmann, Samra; Marx, Andreas
2016-01-01
All living organisms are defined by their genetic code encrypted in their DNA. DNA polymerases are the enzymes that are responsible for all DNA syntheses occurring in nature. For DNA replication, repair and recombination these enzymes have to read the parental DNA and recognize the complementary nucleotide out of a pool of four structurally similar deoxynucleotide triphosphates (dNTPs) for a given template. The selection of the nucleotide is in accordance with the Watson-Crick rule. In this process the accuracy of DNA synthesis is crucial for the maintenance of the genome stability. However, to spur evolution a certain degree of freedom must be allowed. This brief review highlights the mechanistic basis for selecting the right nucleotide by DNA polymerases.
Schmidt-Chanasit, Jonas; Bialonski, Alexandra; Heinemann, Patrick; Ulrich, Rainer G; Günther, Stephan; Rabenau, Holger F; Doerr, Hans Wilhelm
2010-07-01
Recently two different herpes simplex virus type 2 (HSV-2) clades (A and B) were described based on DNA sequence data of the glycoprotein E (gE), G (gG) and I (gI) genes. To type the circulating HSV-2 wild-type strains in Germany by a novel approach and to monitor potential changes in the molecular epidemiology between 1997 and 2008. A total of 64 clinical HSV-2 isolates were analyzed by a novel approach using the DNA sequences of the complete open reading frames of glycoprotein B (gB) and gG. Recombination analysis of the gB and gG gene sequences was performed to reveal intragenic recombinants. Based on the phylogenetic analysis of the gB coding DNA sequence 8 of 64 (12%) isolates were classified as clade A strains and 56 of 64 (88%) isolates were classified as clade B strains. Analysis of the gG coding DNA sequence classified 4 (6%) isolates as clade A strains and 60 (94%) isolates as clade B strains. In comparison, the 8 isolates classified as clade A strains using the gB sequence data were classified as clade B strains when using the gG coding DNA sequence, suggesting intergenic recombination events. Intragenic recombination events were not detected. The first molecular survey of clinical HSV-2 isolates from Germany demonstrated the circulation of clade A and B strains and of intergenic recombinants over a period of 12 years. Copyright (c) 2010 Elsevier B.V. All rights reserved.
Kouvelis, Vassili N; Ghikas, Dimitri V; Typas, Milton A
2004-10-01
The mitochondrial genome (mtDNA) of the entomopathogenic fungus Lecanicillium muscarium (synonym Verticillium lecanii) with a total size of 24,499 bp has been analyzed. So far, it is the smallest known mitochondrial genome among Pezizomycotina, with an extremely compact gene organization and only one group-I intron in its large ribosomal RNA (rnl) gene. It contains the 14 typical genes coding for proteins related to oxidative phosphorylation, the two rRNA genes, one intronic ORF coding for a possible ribosomal protein (rps), and a set of 25 tRNA genes which recognize codons for all amino acids, except alanine and cysteine. All genes are transcribed from the same DNA strand. Gene order comparison with all available complete fungal mtDNAs (representatives of all four phyla are included) revealed some characteristic common features like uninterrupted gene pairs, overlapping genes, and extremely variable intergenic regions, that can all be exploited for the study of fungal mitochondrial genomes. Moreover, a minimum common mtDNA gene order could be detected, in two units, for all known Sordariomycetes namely nad1-nad4-atp8-atp6 and rns-cox3-rnl, which can be extended in Hypocreales, to nad4L-nad5-cob-cox1-nad1-nad4-atp8-atp6 and rns-cox3-rnl nad2-nad3, respectively. Phylogenetic analysis of all fungal mtDNA essential protein-coding genes as one unit, clearly demonstrated the superiority of small genome (mtDNA) over single gene comparisons.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boore, Jeffrey L.; Medina, Monica; Rosenberg, Lewis A.
2004-01-31
We have determined the complete sequence of the mitochondrial genome of the scaphopod mollusk Graptacme eborea (Conrad, 1846) (14,492 nts) and completed the sequence of the mitochondrial genome of the bivalve mollusk Mytilus edulis Linnaeus, 1758 (16,740 nts). (The name Graptacme eborea is a revision of the species formerly known as Dentalium eboreum.) G. eborea mtDNA contains the 37 genes that are typically found and has the genes divided about evenly between the two strands, but M. edulis contains an extra trnM and is missing atp8, and has all genes on the same strand. Each has a highly rearranged gene order relative to each other and to all other studied mtDNAs. G. eborea mtDNA has almost no strand skew, but the coding strand of M. edulis mtDNA is very rich in G and T. This is reflected in differential codon usage patterns and even in amino acid compositions. G. eborea mtDNA has fewer non-coding nucleotides than any other mtDNA studied to date, with the largest non-coding region being only 24 nt long. Phylogenetic analysis using 2,420 aligned amino acid positions of concatenated proteins weakly supports an association of the scaphopod with gastropods to the exclusion of Bivalvia, Cephalopoda, and Polyplacophora, but is generally unable to convincingly resolve the relationships among major groups of the Lophotrochozoa, in contrast to the good resolution seen for several other major metazoan groups.
Kangaroo – A pattern-matching program for biological sequences
2002-01-01
Background Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats. PMID:12150718
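The kind of query this abstract describes, finding mononucleotide repeats in coding sequences with a regular expression, can be sketched as follows. This is not Kangaroo's code or query syntax; the function name find_mononucleotide_repeats and the default minimum run length of 8 are assumptions chosen for illustration, using only Python's standard re module.

```python
import re


def find_mononucleotide_repeats(sequence: str, min_length: int = 8):
    """Return (start, base, run_length) for each run of one repeated nucleotide.

    start is a 0-based index into the sequence; runs shorter than
    min_length are ignored.
    """
    if min_length < 2:
        raise ValueError("min_length must be at least 2")
    # ([ACGT]) captures one base; \1{n,} requires that same base n more times.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (match.start(), match.group(1), len(match.group(0)))
        for match in pattern.finditer(sequence.upper())
    ]
```

Because the quantifier is greedy, each run is reported once at its full length rather than as overlapping sub-runs, which is usually what is wanted when listing candidate mutation sites.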
Wieczorek, Aneta; Fornalewicz, Karolina; Mocarski, Łukasz; Łyżeń, Robert; Węgrzyn, Grzegorz
2018-04-15
Genetic evidence for a link between DNA replication and glycolysis was demonstrated a decade ago in Bacillus subtilis, where temperature-sensitive mutations in genes coding for replication proteins could be suppressed by mutations in genes of glycolytic enzymes. Then, a strong influence of dysfunctions of particular enzymes from the central carbon metabolism (CCM) on DNA replication and repair in Escherichia coli was reported. Therefore, we asked whether such a link occurs only in bacteria or is a more general phenomenon. Here, we demonstrate that effects of silencing (provoked by siRNA) of expression of genes coding for proteins involved in DNA replication and repair (primase, DNA polymerase ι, ligase IV, and topoisomerase IIIβ) on these processes (less efficient entry into the S phase of the cell cycle and decreased level of DNA synthesis) could be suppressed by silencing of specific genes of enzymes from CCM. Silencing of other pairs of replication/repair and CCM genes resulted in enhancement of the negative effects of lower expression levels of replication/repair genes. We suggest that these results may be proposed as genetic evidence for the link between DNA replication/repair and CCM in human cells, indicating that it is a common biological phenomenon, occurring from bacteria to humans. Copyright © 2018 Elsevier B.V. All rights reserved.
Evidence of birth-and-death evolution of 5S rRNA gene in Channa species (Teleostei, Perciformes).
Barman, Anindya Sundar; Singh, Mamta; Singh, Rajeev Kumar; Lal, Kuldeep Kumar
2016-12-01
In higher eukaryotes, the minor rDNA family codes for 5S rRNA; it is arranged in tandem arrays and comprises a highly conserved 120 bp coding sequence with a variable non-transcribed spacer (NTS). The 5S rDNA repeats were initially considered to have evolved by concerted evolution. However, some recent reports, including ones on teleost fishes, suggest that the evolution of 5S rDNA repeats does not fit the concerted evolution model and may instead be explained by a birth-and-death model. To study the mode of evolution of 5S rDNA repeats in Perciformes fish species, the nucleotide sequence and molecular organization of 5S rDNA in five species of the genus Channa were analyzed in the present study. Molecular analyses revealed several variants of 5S rDNA repeats (four types of NTS), and networks created by a neighbor-net algorithm for each type of sequence (I, II, III and IV) did not show clear species-specific clustering. The stable secondary structure was predicted, and upstream and downstream conserved regulatory elements were characterized. Sequence analyses also showed the presence of two putative pseudogenes in Channa marulius. The present study supports the view that 5S rDNA repeats in the genus Channa evolved under a birth-and-death process.
Hazes, Bart
2014-02-28
Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.
Lectin cDNA and transgenic plants derived therefrom
Raikhel, N.V.
1994-01-04
Transgenic plants containing cDNA encoding Gramineae lectin are described. The plants preferably contain cDNA coding for barley lectin and store the lectin in the leaves. The transgenic plants, particularly the leaves, exhibit insecticidal and fungicidal properties. GOVERNMENT RIGHTS This application was funded under Department of Energy Contract DE-AC02-76ER01338. The U.S. Government has certain rights under this application and any patent issuing thereon.
Blochlinger, K; Diggelmann, H
1984-12-01
The DNA coding sequence for the hygromycin B phosphotransferase gene was placed under the control of the regulatory sequences of a cloned long terminal repeat of Moloney sarcoma virus. This construction allowed direct selection for hygromycin B resistance after transfection of eucaryotic cell lines not naturally resistant to this antibiotic, thus providing another dominant marker for DNA transfer in eucaryotic cells. PMID:6098829
Pathogenesis of Chagas' Disease: Parasite Persistence and Autoimmunity
Teixeira, Antonio R. L.; Hecht, Mariana M.; Guimaro, Maria C.; Sousa, Alessandro O.; Nitz, Nadjar
2011-01-01
Summary: Acute Trypanosoma cruzi infections can be asymptomatic, but chronically infected individuals can die of Chagas' disease. The transfer of the parasite mitochondrial kinetoplast DNA (kDNA) minicircle to the genome of chagasic patients can explain the pathogenesis of the disease; in cases of Chagas' disease with evident cardiomyopathy, the kDNA minicircles integrate mainly into retrotransposons at several chromosomes, but the minicircles are also detected in coding regions of genes that regulate cell growth, differentiation, and immune responses. An accurate evaluation of the role played by the genotype alterations in the autoimmune rejection of self-tissues in Chagas' disease is achieved with the cross-kingdom chicken model system, which is refractory to T. cruzi infections. The inoculation of T. cruzi into embryonated eggs prior to incubation generates parasite-free chicks, which retain the kDNA minicircle sequence mainly in the macrochromosome coding genes. Crossbreeding transfers the kDNA mutations to the chicken progeny. The kDNA-mutated chickens develop severe cardiomyopathy in adult life and die of heart failure. The phenotyping of the lesions revealed that cytotoxic CD45, CD8+ γδ, and CD8α+ T lymphocytes carry out the rejection of the chicken heart. These results suggest that the inflammatory cardiomyopathy of Chagas' disease is a genetically driven autoimmune disease. PMID:21734249
Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin
2013-01-01
DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they focus only on binary-class problems. In this paper, we address the multiclass imbalanced classification problem, as encountered in cancer DNA microarray data, by using ensemble learning. We used a one-against-all coding strategy to transform the multiclass problem into multiple binary-class problems, and applied to each of them feature subspace, an evolved version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two correction techniques, namely decision threshold adjustment or random undersampling, into each training subset to alleviate the damage of class imbalance. Specifically, a support vector machine was used as the base classifier, and a novel voting rule called counter voting was presented for making the final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that, unlike many traditional classification approaches, our methods are insensitive to class imbalance.
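The one-against-all coding step mentioned above can be illustrated with a small sketch. It only shows how a multiclass label vector is recoded into one binary problem per class and how that recoding makes imbalance worse; the function names and the toy labels are assumptions for illustration, and the feature subspace, correction and counter voting steps of the paper are not reproduced.

```python
def one_against_all(labels):
    """Recode multiclass labels into one binary label vector per class:
    1 for samples of that class, 0 for all other samples."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def positive_fraction(binary_labels):
    """Share of positive samples in a binary problem (an imbalance indicator)."""
    return sum(binary_labels) / len(binary_labels)


# Toy label vector with a skewed class distribution (hypothetical data).
tumour_types = ["A", "A", "A", "A", "A", "B", "B", "C"]
for cls, binary in one_against_all(tumour_types).items():
    print(cls, binary, positive_fraction(binary))
```

Each binary problem produced this way is at least as skewed as the original class distribution, which is why a correction such as threshold adjustment or undersampling is applied per subset.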
Campo, Daniel; García-Vázquez, Eva
2012-01-01
The 5S rDNA is organized in the genome as tandemly repeated copies of a structural unit composed of a coding sequence plus a nontranscribed spacer (NTS). The coding region is highly conserved in evolution, whereas the NTS varies in both length and sequence. It has been proposed that 5S rRNA genes are members of a gene family that has arisen through concerted evolution. In this study, we describe the molecular organization and evolution of the 5S rDNA in the genera Lepidorhombus and Scophthalmus (Scophthalmidae) and compare it with the already known 5S rDNA of the very different genera Merluccius (Merluccidae) and Salmo (Salmoninae), to identify common structural elements or patterns for understanding 5S rDNA evolution in fish. High intra- and interspecific diversity within the 5S rDNA family in all the genera can be explained by a combination of duplications, deletions, and transposition events. Sequence blocks with high similarity in all the 5S rDNA members across species were identified for the four studied genera, with evidence of intense gene conversion within noncoding regions. We propose a model to explain the evolution of the 5S rDNA, in which the evolutionary units are blocks of nucleotides rather than the entire sequences or single nucleotides. This model implies a "two-speed" evolution: slow within blocks (homogenized by recombination) and fast within the gene family (diversified by duplications and deletions).
Taylor, Jared F.; Khattab, Omar S.; Chen, Yu-Han; Chen, Yumay; Jacobsen, Steven E.; Wang, Ping H.
2015-01-01
Deciphering the multitude of epigenomic and genomic factors that influence the mutation rate is an area of great interest in modern biology. Recently, chromatin has been shown to play a part in this process. To elucidate this relationship further, we integrated our own ultra-deep sequenced human nucleosomal DNA data set with a host of published human genomic and cancer genomic data sets. Our results revealed that differences in nucleosome occupancy are associated with changes in base-specific mutation rates. Increasing nucleosome occupancy is associated with an increasing transition to transversion ratio and an increased germline mutation rate within the human genome. Additionally, cancer single nucleotide variants and microindels are enriched within nucleosomes, and both the coding and non-coding cancer mutation rates increase with increasing nucleosome occupancy. There is an enrichment of cancer indels at the theoretical start (74 bp) and end (115 bp) of linker DNA between two nucleosomes. We then hypothesized that increasing nucleosome occupancy decreases access to DNA by DNA repair machinery and could account for the increasing mutation rate. Such a relationship should not exist in DNA repair knockouts, and we thus repeated our analysis in DNA repair machinery knockouts to test our hypothesis. Indeed, our results revealed no correlation between increasing nucleosome occupancy and increasing mutation rate in DNA repair knockouts. Our findings emphasize the linkage of the genome and epigenome through the nucleosome whose properties can affect genome evolution and genetic aberrations such as cancer. PMID:26308346
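The transition to transversion ratio referred to above has a simple definition: transitions exchange a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T), and every other single-base substitution is a transversion. The sketch below shows that bookkeeping only; the function names and the toy substitution list are assumptions, and it does not reproduce the study's pipeline or its binning by nucleosome occupancy.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(substitutions):
    """Ratio of transitions to transversions in a list of (ref, alt) pairs."""
    transitions = sum(1 for ref, alt in substitutions if is_transition(ref, alt))
    transversions = len(substitutions) - transitions
    return transitions / transversions if transversions else float("inf")


# Toy substitution list (hypothetical): four transitions, two transversions.
variants = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C"), ("G", "A")]
print(ti_tv_ratio(variants))
```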
ILIR : SSC San Diego In-House Laboratory Independent Research 2001 Annual Report
2002-05-01
canine distemper virus (CDV) (a morbillivirus closely related to one infecting marine mammals) by intramuscular or intradermal inoculation with a...data.* 3. Sixt, N., A. Cardoso, A. Vallier, J. Fayolle, R. Buckland, T. F. Wild. 1998. “Canine Distemper Virus DNA Vaccination Induces Humoral and...Complementary Code Keying CCSK Cyclic Code Shift Keying CDMA Code Division Multiplexing CDV Canine Distemper Virus CFAR Constant False Alarm
2012-01-01
Background Tandemly arranged nuclear ribosomal DNA (rDNA), encoding 18S, 5.8S and 26S ribosomal RNA (rRNA), exhibit concerted evolution, a pattern thought to result from the homogenisation of rDNA arrays. However rDNA homogeneity at the single nucleotide polymorphism (SNP) level has not been detailed in organisms with more than a few hundred copies of the rDNA unit. Here we study rDNA complexity in species with arrays consisting of thousands of units. Methods We examined homogeneity of genic (18S) and non-coding internally transcribed spacer (ITS1) regions of rDNA using Roche 454 and/or Illumina platforms in four angiosperm species, Nicotiana sylvestris, N. tomentosiformis, N. otophora and N. kawakamii. We compared the data with Southern blot hybridisation revealing the structure of intergenic spacer (IGS) sequences and with the number and distribution of rDNA loci. Results and Conclusions In all four species the intragenomic homogeneity of the 18S gene was high; a single ribotype makes up over 90% of the genes. However greater variation was observed in the ITS1 region, particularly in species with two or more rDNA loci, where >55% of rDNA units were a single ribotype, with the second most abundant variant accounting for >18% of units. IGS heterogeneity was high in all species. The increased number of ribotypes in ITS1 compared with 18S sequences may reflect rounds of incomplete homogenisation with strong selection for functional genic regions and relaxed selection on ITS1 variants. The relationship between the number of ITS1 ribotypes and the number of rDNA loci leads us to propose that rDNA evolution and complexity is influenced by locus number and/or amplification of orphaned rDNA units at new chromosomal locations. PMID:23259460
Making the Bend: DNA Tertiary Structure and Protein-DNA Interactions
Harteis, Sabrina; Schneider, Sabine
2014-01-01
DNA structure functions as an overlapping code to the DNA sequence. Rapid progress in understanding the role of DNA structure in gene regulation, DNA damage recognition and genome stability has been made. The three dimensional structure of both proteins and DNA plays a crucial role for their specific interaction, and proteins can recognise the chemical signature of DNA sequence (“base readout”) as well as the intrinsic DNA structure (“shape recognition”). These recognition mechanisms do not exist in isolation but, depending on the individual interaction partners, are combined to various extents. The driving force for the interaction between protein and DNA remains the unique thermodynamics of each individual DNA-protein pair. In this review we focus on the structures and conformations adopted by DNA, both influenced by and influencing the specific interaction with the corresponding protein binding partner, as well as their underlying thermodynamics. PMID:25026169
Tang, Songsong; Gu, Yuan; Lu, Huiting; Dong, Haifeng; Zhang, Kai; Dai, Wenhao; Meng, Xiangdan; Yang, Fan; Zhang, Xueji
2018-04-03
Herein, a highly sensitive microRNA (miRNA) detection strategy was developed by combining bio-bar-code assay (BBA) with catalytic hairpin assembly (CHA). In the proposed system, two nanoprobes of magnetic nanoparticles functionalized with DNA probes (MNPs-DNA) and gold nanoparticles with numerous barcode DNA (AuNPs-DNA) were designed. In the presence of target miRNA, the MNP-DNA and AuNP-DNA hybridized with target miRNA to form a "sandwich" structure. After "sandwich" structures were separated from the solution by the magnetic field and dehybridized by high temperature, the barcode DNA sequences were released by dissolving AuNPs. The released barcode DNA sequences triggered the toehold strand displacement assembly of two hairpin probes, leading to recycling of barcode DNA sequences and producing numerous fluorescent CHA products for miRNA detection. Under the optimal experimental conditions, the proposed two-stage amplification system could sensitively detect target miRNA ranging from 10 pM to 10 aM with a limit of detection (LOD) down to 97.9 zM. It displayed good capability to discriminate single-base and three-base mismatches due to the unique sandwich structure. Notably, it presented good feasibility for selective multiplexed detection of various combinations of synthetic miRNA sequences and miRNAs extracted from different cell lysates, which were in agreement with the traditional polymerase chain reaction analysis. The two-stage amplification strategy may have significant implications for biological detection and clinical diagnosis. Copyright © 2017 Elsevier B.V. All rights reserved.
de Lange, Orlando; Wolf, Christina; Dietze, Jörn; Elsaesser, Janett; Morbitzer, Robert; Lahaye, Thomas
2014-01-01
The tandem repeats of transcription activator like effectors (TALEs) mediate sequence-specific DNA binding using a simple code. Naturally, TALEs are injected by Xanthomonas bacteria into plant cells to manipulate the host transcriptome. In the laboratory TALE DNA binding domains are reprogrammed and used to target a fused functional domain to a genomic locus of choice. Research into the natural diversity of TALE-like proteins may provide resources for the further improvement of current TALE technology. Here we describe TALE-like proteins from the endosymbiotic bacterium Burkholderia rhizoxinica, termed Bat proteins. Bat repeat domains mediate sequence-specific DNA binding with the same code as TALEs, despite less than 40% sequence identity. We show that Bat proteins can be adapted for use as transcription factors and nucleases and that sequence preferences can be reprogrammed. Unlike TALEs, the core repeats of each Bat protein are highly polymorphic. This feature allowed us to explore alternative strategies for the design of custom Bat repeat arrays, providing novel insights into the functional relevance of non-RVD residues. The Bat proteins offer fertile grounds for research into the creation of improved programmable DNA-binding proteins and comparative insights into TALE-like evolution. PMID:24792163
Conditional sterility in plants
Meagher, Richard B.; McKinney, Elizabeth; Kim, Tehryung
2010-02-23
The present disclosure provides methods, recombinant DNA molecules, recombinant host cells containing the DNA molecules, and transgenic plant cells, plant tissue and plants which contain and express at least one antisense or interference RNA specific for a thiamine biosynthetic coding sequence or a thiamine binding protein or a thiamine-degrading protein, wherein the RNA or thiamine binding protein is expressed under the regulatory control of a transcription regulatory sequence which directs expression in male and/or female reproductive tissue. These transgenic plants are conditionally sterile; i.e., they are fertile only in the presence of exogenous thiamine. Such plants are especially appropriate for use in the seed industry or in the environment, for example, for use in revegetation of contaminated soils or phytoremediation, especially when those transgenic plants also contain and express one or more chimeric genes which confer resistance to contaminants.
Yao, Chiou-Ju; Chen, Ching-Hung; Hsiao, Chung-Der
2016-07-01
In this study, we used next-generation sequencing to deduce the complete mitogenome of the Ginkgo-toothed beaked whale (Mesoplodon ginkgodens) for the first time. The nucleotide composition was asymmetric (33.3% A, 25.3% C, 12.6% G, and 28.7% T) with an overall GC content of 37.9%. The assembled mitogenome was 16,339 bp long and follows the typical vertebrate arrangement, including 13 protein coding genes, 22 transfer RNAs, 2 ribosomal RNA genes, and a non-coding control region (D-loop). The D-loop contains 870 bp and is located between tRNA-Pro and tRNA-Phe. The complete mitogenome of the Ginkgo-toothed beaked whale deduced in this study provides essential DNA molecular data for further phylogenetic and evolutionary analysis of cetaceans.
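The reported GC content follows directly from the base composition: 25.3% C plus 12.6% G gives 37.9%. A minimal sketch of that arithmetic, together with a helper that derives the composition from a raw sequence, is shown below; the function names are assumptions for illustration and the helper expects a non-empty sequence containing at least one of A, C, G or T.

```python
from collections import Counter


def base_composition(sequence):
    """Percentage of A, C, G and T among the unambiguous bases of a sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ACGT")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(composition):
    """GC content in percent, given a composition dictionary."""
    return composition["G"] + composition["C"]


# Composition as reported in the abstract above.
reported = {"A": 33.3, "C": 25.3, "G": 12.6, "T": 28.7}
print(round(gc_content(reported), 1))
```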
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pace, J.V. III; Cramer, S.N.; Knight, J.R.
1980-09-01
Calculations of the skyshine gamma-ray dose rates from three spent fuel storage pools under worst case accident conditions have been made using the discrete ordinates code DOT-IV and the Monte Carlo code MORSE and have been compared to those of two previous methods. The DNA 37N-21G group cross-section library was utilized in the calculations, together with the Claiborne-Trubey gamma-ray dose factors taken from the same library. Plots of all results are presented. It was found that the dose was a strong function of the iron thickness over the fuel assemblies, the initial angular distribution of the emitted radiation, and the photon source near the top of the assemblies. 16 refs., 11 figs., 7 tabs.
Short segment search method for phylogenetic analysis using nested sliding windows
NASA Astrophysics Data System (ADS)
Iskandar, A. A.; Bustamam, A.; Trimarsanto, H.
2017-10-01
For phylogenetic analysis in bioinformatics, the coding DNA sequence (CDS) segment is needed for maximal accuracy. However, analysis of the CDS costs a lot of time and money, so a short segment that is representative of the CDS, such as the envelope protein segment or the non-structural 3 (NS3) segment, is necessary. After a sliding window is implemented, a better short segment than the envelope protein segment and NS3 is found. This paper discusses a mathematical method for analyzing sequences using nested sliding windows to find a short segment that is representative of the whole genome. The results show that our method can find a short segment that is about 6.57% more representative of the CDS segment, in terms of topology, than the envelope segment or the NS3 segment.
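The abstract does not spell out the algorithm, so the following is only one plausible reading of "nested sliding windows": an outer loop over candidate window sizes and an inner loop that slides each window along the alignment, with every candidate segment scored by how well it represents the full CDS. The function names, the step size, and the scoring callback are all assumptions; in practice the score would come from comparing tree topologies, which is not implemented here.

```python
def nested_windows(length, sizes, step):
    """Yield (start, end) for every window size and every start position."""
    for size in sizes:
        for start in range(0, length - size + 1, step):
            yield start, start + size


def best_window(length, sizes, step, score):
    """Return the window with the highest score; score(start, end) is a
    caller-supplied function (hypothetical placeholder for a
    topology-similarity measure)."""
    return max(nested_windows(length, sizes, step), key=lambda w: score(*w))


# Toy scoring function that simply prefers windows close to positions 4..8.
toy_score = lambda start, end: -abs(start - 4) - abs(end - 8)
print(list(nested_windows(10, [4, 8], 2)))
print(best_window(10, [4, 8], 2, toy_score))
```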
The Genetic Privacy Act and commentary
DOE Office of Scientific and Technical Information (OSTI.GOV)
Annas, G.J.; Glantz, L.H.; Roche, P.A.
1995-02-28
The Genetic Privacy Act is a proposal for federal legislation. The Act is based on the premise that genetic information is different from other types of personal information in ways that require special protection. The DNA molecule holds an extensive amount of currently indecipherable information. The major goal of the Human Genome Project is to decipher this code so that the information it contains is accessible. The privacy question is, accessible to whom? The highly personal nature of the information contained in DNA can be illustrated by thinking of DNA as containing an individual's "future diary." A diary is perhaps the most personal and private document a person can create. It contains a person's innermost thoughts and perceptions, and is usually hidden and locked to assure its secrecy. Diaries describe the past. The information in one's genetic code can be thought of as a coded probabilistic future diary because it describes an important part of a unique and personal future. This document presents an introduction to the proposal for federal legislation, the Genetic Privacy Act; a copy of the proposed act; and comment.
Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske
2007-02-14
The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate < 0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
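Tracing each read back to its source by its 5' tag amounts to a lookup on the first bases of the read. The sketch below assumes dinucleotide tags, exact matching, and a hypothetical tag-to-specimen table; it is an illustration of the idea rather than the authors' analysis, and it does not handle the sequencing anomalies the abstract mentions.

```python
def demultiplex(reads, tag_to_sample, tag_length=2):
    """Assign each read to a sample by its 5' tag and strip the tag.

    Reads whose tag is not in tag_to_sample are returned separately."""
    assigned = {sample: [] for sample in tag_to_sample.values()}
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_length].upper(), read[tag_length:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            unassigned.append(read)
        else:
            assigned[sample].append(insert)
    return assigned, unassigned


# Hypothetical tag table and reads; the same amplicon comes from two specimens.
tags = {"AC": "specimen_1", "GT": "specimen_2"}
reads = ["ACTTGACC", "GTTTGACC", "NNTTGACC"]
by_sample, leftovers = demultiplex(reads, tags)
print(by_sample)
print(leftovers)
```

Counting the reads per tag in `by_sample` is also the natural place to look for the tag-dependent representation bias described above.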
Monoterpene synthases from common sage (Salvia officinalis)
Croteau, Rodney Bruce; Wise, Mitchell Lynn; Katahira, Eva Joy; Savage, Thomas Jonathan
1999-01-01
cDNAs encoding (+)-bornyl diphosphate synthase, 1,8-cineole synthase and (+)-sabinene synthase from common sage (Salvia officinalis) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID No:1; SEQ ID No:3 and SEQ ID No:5) are provided which code for the expression of (+)-bornyl diphosphate synthase (SEQ ID No:2), 1,8-cineole synthase (SEQ ID No:4) and (+)-sabinene synthase (SEQ ID No:6), respectively, from sage (Salvia officinalis). In other aspects, replicable recombinant cloning vehicles are provided which code for (+)-bornyl diphosphate synthase, 1,8-cineole synthase or (+)-sabinene synthase, or for a base sequence sufficiently complementary to at least a portion of (+)-bornyl diphosphate synthase, 1,8-cineole synthase or (+)-sabinene synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding (+)-bornyl diphosphate synthase, 1,8-cineole synthase or (+)-sabinene synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant monoterpene synthases that may be used to facilitate their production, isolation and purification in significant amounts. Recombinant (+)-bornyl diphosphate synthase, 1,8-cineole synthase and (+)-sabinene synthase may be used to obtain expression or enhanced expression of (+)-bornyl diphosphate synthase, 1,8-cineole synthase and (+)-sabinene synthase in plants in order to enhance the production of monoterpenoids, or may be otherwise employed for the regulation or expression of (+)-bornyl diphosphate synthase, 1,8-cineole synthase and (+)-sabinene synthase, or the production of their products.
A novel feature extraction scheme with ensemble coding for protein-protein interaction prediction.
Du, Xiuquan; Cheng, Jiaxing; Zheng, Tingting; Duan, Zheng; Qian, Fulan
2014-07-18
Protein-protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing because of the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed by using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search the optimal feature combination. The DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, the DXEC method achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy reaches 80% for the DXEC-RF method. We extended the experiment to a bigger and more realistic dataset that maintains 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for performing PPI prediction. The prediction service of the DXEC-RF classifier is available at http://ailab.ahu.edu.cn:8087/DXECPPI/index.jsp.
WordCluster: detecting clusters of DNA words and genomic elements
2011-01-01
Background Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well established examples are the genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method into a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and the outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method into a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php including additional features like the detection of co-localization with gene regions or the annotation enrichment tool for functional analysis of overlapped genes. PMID:21261981
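The raw ingredient of the method described above is the distance between consecutive copies of a word. The sketch below locates every copy of a k-mer and chains copies whose gap stays under a cutoff. Note the simplification: the fixed `max_gap` cutoff is an assumption standing in for the statistical significance that WordCluster assigns, which is not implemented here, and the function names and toy sequence are hypothetical.

```python
def word_positions(sequence, word):
    """Start positions of every (possibly overlapping) copy of word."""
    sequence, word = sequence.upper(), word.upper()
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def group_by_gap(positions, max_gap):
    """Chain sorted positions into clusters while consecutive copies are
    at most max_gap apart (fixed cutoff used only for illustration)."""
    clusters = []
    for pos in positions:
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Toy sequence with two groups of CAG copies separated by a long T run.
seq = "CAGCAGTTCAG" + "T" * 20 + "CAGACAG"
copies = word_positions(seq, "CAG")
print(copies)
print(group_by_gap(copies, max_gap=10))
```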
Busti, Elena; Bordoni, Roberta; Castiglioni, Bianca; Monciardini, Paolo; Sosio, Margherita; Donadio, Stefano; Consolandi, Clarissa; Rossi Bernardi, Luigi; Battaglia, Cristina; De Bellis, Gianluca
2002-01-01
Background PCR amplification of bacterial 16S rRNA genes provides the most comprehensive and flexible means of sampling bacterial communities. Sequence analysis of these cloned fragments can provide a qualitative and quantitative insight of the microbial population under scrutiny although this approach is not suited to large-scale screenings. Other methods, such as denaturing gradient gel electrophoresis, heteroduplex or terminal restriction fragment analysis are rapid and therefore amenable to field-scale experiments. A very recent addition to these analytical tools is represented by microarray technology. Results Here we present our results using a Universal DNA Microarray approach as an analytical tool for bacterial discrimination. The proposed procedure is based on the properties of the DNA ligation reaction and requires the design of two probes specific for each target sequence. One oligo carries a fluorescent label and the other a unique sequence (cZipCode or complementary ZipCode) which identifies a ligation product. Ligated fragments, obtained in the presence of a proper template (a PCR amplified fragment of the 16S rRNA gene) contain either the fluorescent label or the unique sequence and therefore are addressed to the location on the microarray where the ZipCode sequence has been spotted. Such an array is therefore "Universal" being unrelated to a specific molecular analysis. Here we present the design of probes specific for some groups of bacteria and their application to bacterial diagnostics. Conclusions The combined use of selective probes, ligation reaction and the Universal Array approach yielded an analytical procedure with a good power of discrimination among bacteria. PMID:12243651
Gomes, S L; Gober, J W; Shapiro, L
1990-01-01
Caulobacter crescentus has a single dnaK gene that is highly homologous to the hsp70 family of heat shock genes. Analysis of the cloned and sequenced dnaK gene has shown that the deduced amino acid sequence could encode a protein of 67.6 kilodaltons that is 68% identical to the DnaK protein of Escherichia coli and 49% identical to the Drosophila and human hsp70 protein family. A partial open reading frame 165 base pairs 3' to the end of dnaK encodes a peptide of 190 amino acids that is 59% identical to DnaJ of E. coli. Northern blot analysis revealed a single 4.0-kilobase mRNA homologous to the cloned fragment. Since the dnaK coding region is 1.89 kilobases, dnaK and dnaJ may be transcribed as a polycistronic message. S1 mapping and primer extension experiments showed that transcription initiated at two sites 5' to the dnaK coding sequence. A single start site of transcription was identified during heat shock at 42 degrees C, and the predicted promoter sequence conformed to the consensus heat shock promoters of E. coli. At normal growth temperature (30 degrees C), a different start site was identified 3' to the heat shock start site that conformed to the E. coli sigma 70 promoter consensus sequence. S1 protection assays and analysis of expression of the dnaK gene fused to the lux transcription reporter gene showed that expression of dnaK is temporally controlled under normal physiological conditions and that transcription occurs just before the initiation of DNA replication. Thus, in both human cells (I. K. L. Milarski and R. I. Morimoto, Proc. Natl. Acad. Sci. USA 83:9517-9521, 1986) and in a simple bacterium, the transcription of an hsp70 gene is temporally controlled as a function of the cell cycle under normal growth conditions. PMID:2345134
Ube2V2 Is a Rosetta Stone Bridging Redox and Ubiquitin Codes, Coordinating DNA Damage Responses.
Zhao, Yi; Long, Marcus J C; Wang, Yiran; Zhang, Sheng; Aye, Yimon
2018-02-28
Posttranslational modifications (PTMs) are the lingua franca of cellular communication. Most PTMs are enzyme-orchestrated. However, the reemergence of electrophilic drugs has ushered mining of unconventional/non-enzyme-catalyzed electrophile-signaling pathways. Despite the latest impetus toward harnessing kinetically and functionally privileged cysteines for electrophilic drug design, identifying these sensors remains challenging. Herein, we designed "G-REX", a technique that allows controlled release of reactive electrophiles in vivo. Mitigating toxicity/off-target effects associated with uncontrolled bolus exposure, G-REX tagged first-responding innate cysteines that bind electrophiles under true kcat/Km conditions. G-REX identified two allosteric ubiquitin-conjugating proteins, Ube2V1 and Ube2V2, sharing a novel privileged-sensor-cysteine. This non-enzyme-catalyzed-PTM triggered responses specific to each protein. Thus, G-REX is an unbiased method to identify novel functional cysteines. Contrasting conventional active-site/off-active-site cysteine-modifications that regulate target activity, modification of Ube2V2 allosterically hyperactivated its enzymatically active binding-partner Ube2N, promoting K63-linked client ubiquitination and stimulating H2AX-dependent DNA damage response. This work establishes Ube2V2 as a Rosetta-stone bridging redox and ubiquitin codes to guard genome integrity.
Mitochondrial DNA Genetics and the Heteroplasmy Conundrum in Evolution and Disease
Wallace, Douglas C.; Chalkia, Dimitra
2013-01-01
The unorthodox genetics of the mtDNA is providing new perspectives on the etiology of the common “complex” diseases. The maternally inherited mtDNA codes for essential energy genes, is present in thousands of copies per cell, and has a very high mutation rate. New mtDNA mutations arise among thousands of other mtDNAs. The mechanisms by which these “heteroplasmic” mtDNA mutations come to predominate in the female germline and somatic tissues are poorly understood, but essential for understanding the clinical variability of a range of diseases. Maternal inheritance and heteroplasmy also pose major challenges for the diagnosis and prevention of mtDNA disease. PMID:24186072
Simulation of the charge migration in DNA under irradiation with heavy ions.
Belov, Oleg V; Boyda, Denis L; Plante, Ianik; Shirmovsky, Sergey Eh
2015-01-01
A computer model to simulate the processes of charge injection and migration through DNA after irradiation by a heavy charged particle was developed. The most probable sites of charge injection were obtained by merging spatial models of short DNA sequence and a single 1 GeV/u iron particle track simulated by the code RITRACKS (Relativistic Ion Tracks). Charge migration was simulated by using a quantum-classical nonlinear model of the DNA-charge system. It was found that charge migration depends on the environmental conditions. The oxidative damage in DNA occurring during hole migration was simulated concurrently, which allowed the determination of probable locations of radiation-induced DNA lesions.
Sub-10 nm patterning with DNA nanostructures: a short perspective
NASA Astrophysics Data System (ADS)
Du, Ke; Park, Myeongkee; Ding, Junjun; Hu, Huan; Zhang, Zheng
2017-11-01
DNA is the hereditary material that contains our unique genetic code. Since the first demonstration of two-dimensional (2D) nanopatterns by using designed DNA origami ~10 years ago, DNA has evolved into a novel technique for 2D and 3D nanopatterning. It is now being used as a template for the creation of sub-10 nm structures via either ‘top-down’ or ‘bottom-up’ approaches for various applications spanning nanoelectronics, plasmonic sensing, and nanophotonics. This perspective starts with a historical overview and discusses the current state-of-the-art in DNA nanolithography. Emphasis is put on the challenges and prospects of DNA nanolithography as the next generation nanomanufacturing technique.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leong, JoAnn Ching
The nucleotide sequence of the IHNV glycoprotein gene has been determined from a cDNA clone containing the entire coding region. The glycoprotein cDNA clone contained a leader sequence of 48 bases, a coding region of 1524 nucleotides, and 39 bases at the 3' end. The entire cDNA clone contains 1609 nucleotides and encodes a protein of 508 amino acids. The deduced amino acid sequence gave a translated molecular weight of 56,795 daltons. A hydropathicity profile of the deduced amino acid sequence indicated that there were two major hydrophobic domains: one, at the N-terminus, delineating a signal peptide of 18 amino acids, and the other, at the C-terminus, delineating the transmembrane region. Five possible sites of N-linked glycosylation were identified. Although no nucleic acid homology existed between the IHNV glycoprotein gene and the glycoprotein genes of rabies and VSV, there was significant homology at the amino acid level between all three rhabdovirus glycoproteins.
Altruistic functions for selfish DNA.
Faulkner, Geoffrey J; Carninci, Piero
2009-09-15
Mammalian genomes are comprised of 30-50% transposed elements (TEs). The vast majority of these TEs are truncated and mutated fragments of retrotransposons that are no longer capable of transposition. Although initially regarded as important factors in the evolution of gene regulatory networks, TEs are now commonly perceived as neutrally evolving and non-functional genomic elements. In a major development, recent works have strongly contradicted this "selfish DNA" or "junk DNA" dogma by demonstrating that TEs use a host of novel promoters to generate RNA on a massive scale across most eukaryotic cells. This transcription frequently functions to control the expression of protein-coding genes via alternative promoters, cis regulatory non protein-coding RNAs and the formation of double stranded short RNAs. If considered in sum, these findings challenge the designation of TEs as selfish and neutrally evolving genomic elements. Here, we will expand upon these themes and discuss challenges in establishing novel TE functions in vivo.
Kobayashi, Shintaro; Yoshii, Kentaro; Hirano, Minato; Muto, Memi; Kariwa, Hiroaki
2017-02-01
Reverse genetics systems facilitate investigation of many aspects of the life cycle and pathogenesis of viruses. However, genetic instability in Escherichia coli has hampered development of a reverse genetics system for West Nile virus (WNV). In this study, we developed a novel reverse genetics system for WNV based on homologous recombination in mammalian cells. Introduction of the DNA fragment coding for the WNV structural protein together with a DNA-based replicon resulted in the release of infectious WNV. The growth rate and plaque size of the recombinant virus were almost identical to those of the parent WNV. Furthermore, chimeric WNV was produced by introducing the DNA fragment coding for the structural protein and replicon plasmid derived from various strains. Here, we report development of a novel system that will facilitate research into WNV infection. Copyright © 2016 Elsevier B.V. All rights reserved.
Biomimetic Artificial Epigenetic Code for Targeted Acetylation of Histones.
Taniguchi, Junichi; Feng, Yihong; Pandian, Ganesh N; Hashiya, Fumitaka; Hidaka, Takuya; Hashiya, Kaori; Park, Soyoung; Bando, Toshikazu; Ito, Shinji; Sugiyama, Hiroshi
2018-06-13
While the central role of locus-specific acetylation of histone proteins in eukaryotic gene expression is well established, the availability of designer tools to regulate acetylation at particular nucleosome sites remains limited. Here, we develop a unique strategy to introduce acetylation by constructing a bifunctional molecule designated Bi-PIP. Bi-PIP has a P300/CBP-selective bromodomain inhibitor (Bi) as a P300/CBP recruiter and a pyrrole-imidazole polyamide (PIP) as a sequence-selective DNA binder. Biochemical assays verified that Bi-PIPs recruit P300 to the nucleosomes having their target DNA sequences and extensively accelerate acetylation. Bi-PIPs also activated transcription of genes that have corresponding cognate DNA sequences inside living cells. Our results demonstrate that Bi-PIPs could act as a synthetic programmable histone code of acetylation, which emulates the bromodomain-mediated natural propagation system of histone acetylation to activate gene expression in a sequence-selective manner.
DNA typing in forensic medicine and in criminal investigations: a current survey.
Benecke, M
1997-05-01
Since 1985 DNA typing of biological material has become one of the most powerful tools for personal identification in forensic medicine and in criminal investigations [1-6]. Classical DNA "fingerprinting" is increasingly being replaced by polymerase chain reaction (PCR) based technology which detects very short polymorphic stretches of DNA [7-15]. DNA loci which forensic scientists study do not code for proteins, and they are spread over the whole genome [16, 17]. These loci are neutral, and few provide any information about individuals except for their identity. Minute amounts of biological material are sufficient for DNA typing. Many European countries are beginning to establish databases to store DNA profiles of crime scenes and known offenders. A brief overview is given of past and present DNA typing and the establishment of forensic DNA databases in Europe.
Structural diversity of supercoiled DNA
Irobalieva, Rossitza N.; Fogg, Jonathan M.; Catanese, Daniel J.; Sutthibutpong, Thana; Chen, Muyuan; Barker, Anna K.; Ludtke, Steven J.; Harris, Sarah A.; Schmid, Michael F.; Chiu, Wah; Zechiedrich, Lynn
2015-01-01
By regulating access to the genetic code, DNA supercoiling strongly affects DNA metabolism. Despite its importance, however, much about supercoiled DNA (positively supercoiled DNA, in particular) remains unknown. Here we use electron cryo-tomography together with biochemical analyses to investigate structures of individual purified DNA minicircle topoisomers with defined degrees of supercoiling. Our results reveal that each topoisomer, negative or positive, adopts a unique and surprisingly wide distribution of three-dimensional conformations. Moreover, we uncover striking differences in how the topoisomers handle torsional stress. As negative supercoiling increases, bases are increasingly exposed. Beyond a sharp supercoiling threshold, we also detect exposed bases in positively supercoiled DNA. Molecular dynamics simulations independently confirm the conformational heterogeneity and provide atomistic insight into the flexibility of supercoiled DNA. Our integrated approach reveals the three-dimensional structures of DNA that are essential for its function. PMID:26455586
Teh, L K; Lee, W L; Amir, J; Salleh, M Z; Ismail, R
2007-06-01
P-glycoprotein (PgP) is the most extensively studied ATP-binding cassette (ABC) transporter and is encoded by the MDR1 gene. To date, 29 single nucleotide polymorphisms (SNPs) have been identified, but only SNP C3435T has been correlated with intestinal PgP expression levels and shown to influence the absorption of orally taken drugs that are PgP substrates. Individuals homozygous for the T allele have more than fourfold lower PgP expression compared with C/C individuals. We developed a one-step, primer-based, allele-specific PCR method to detect the C3435T SNP and to investigate the distribution of this genotype in the local population. DNA was extracted from 5 mL of whole blood using a standard salting-out method. Primers specific at their 3' ends were designed to amplify the variants of C3435T. The method was validated by direct DNA sequencing. Seven hundred and sixty-three healthy blood donors from the three major ethnic groups in Malaysia were recruited, and their DNA was genotyped for C3435T using this method. The method was found to be robust and reproducible in detecting the C3435T SNP. Interethnic variations in genotype and allele frequency were observed among the ethnic groups. In comparison to both Caucasians and other Asian populations, the Malays and Chinese showed a higher frequency of allele C (50-60%), while the Indians exhibited a lower frequency (40%), similar to other Indian populations. Using a new simple method to investigate the distribution of C3435T, we found that the allele frequency of MDR1 showed variability between the different ethnic groups within the Malaysian population.
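Allele frequencies such as the C-allele percentages quoted above are normally derived from genotype counts. The following Python sketch shows that arithmetic for a biallelic SNP; the function name and the genotype counts are hypothetical and invented purely for illustration, since the abstract does not report per-genotype numbers.

```python
def allele_frequencies(n_cc, n_ct, n_tt):
    """Return (freq_C, freq_T) for a biallelic SNP from genotype counts.

    Each individual carries two alleles, so C/C contributes two C alleles,
    C/T one C and one T, and T/T two T alleles.
    """
    total_alleles = 2 * (n_cc + n_ct + n_tt)
    if total_alleles == 0:
        raise ValueError("at least one genotyped individual is required")
    freq_c = (2 * n_cc + n_ct) / total_alleles
    freq_t = (2 * n_tt + n_ct) / total_alleles
    return freq_c, freq_t


# Hypothetical counts (NOT from the study): 30 C/C, 50 C/T, 20 T/T.
freq_c, freq_t = allele_frequencies(30, 50, 20)
print(f"C allele: {freq_c:.2f}, T allele: {freq_t:.2f}")
```

Counting alleles rather than individuals is the design choice that matters here: heterozygotes contribute to both frequencies, so the two values always sum to one.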
Rhipicephalus microplus strain Deutsch, 10 BAC clone sequences
USDA-ARS's Scientific Manuscript database
The cattle tick, Rhipicephalus (Boophilus) microplus, has a genome over 2.4 times the size of the human genome, and with over 70% of repetitive DNA, this genome would prove very costly to sequence at today's prices and difficult to assemble and analyze. We used labeled DNA probes from the coding reg...
Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.
Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi
2016-03-01
Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and resulting phenotypes, we quantified DNA sequence variations at six loci coding for octopamine and dopamine receptors and their association with aversive and appetitive learning traits in a population of honeybees. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low, indicating that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that the cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.
Cipriano, Andrea; Ballarino, Monica
2018-01-01
The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as “junk” DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years. PMID:29560353
Chernicky, C L; Tan, H; Burfeind, P; Ilan, J; Ilan, J
1996-02-01
There are several cell types within the placenta that produce cytokines which can contribute to the regulatory mechanisms that ensure normal pregnancy. The immunological milieu at the maternofetal interface is considered to be crucial for survival of the fetus. Interleukin-2 (IL-2) is expressed by the syncytiotrophoblast, the cell layer between the mother and the fetus. IL-2 appears to be a key factor in maintenance of pregnancy. Therefore, it was important to determine the sequence of human placental interleukin-2. Direct sequencing of human placental IL-2 cDNA was determined for the coding region. Subclone sequencing was carried out for the 5'- and 3'-untranslated regions (5'-UTR and 3'-UTR). The 5'-UTR for human placental IL-2 cDNA is 294 bp, which is 247 nucleotides longer than that reported for cDNA IL-2 derived from T cells. The sequence of the coding region is identical to that reported for T cell IL-2, while sequence analysis of the polymerase chain reaction (PCR) product showed that the cDNA from the 3' end was the same as that reported for cDNA from T cells. Human placental IL-2 cDNA is 1,028 base pairs (excluding the poly A tail), which is 247 bp longer at the 5' end than that reported for IL-2 T cell cDNA. Therefore, the extended 5'-UTR of the placental IL-2 cDNA may be a consequence of alternative promoter utilization in the placenta.
Mechanisms of radiation-induced gene responses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woloschak, G.E.; Paunesku, T.
1996-10-01
In the process of identifying genes differentially expressed in cells exposed to ultraviolet radiation, we have identified a transcript having a 26-bp region that is highly conserved in a variety of species including Bacillus circulans, yeast, pumpkin, Drosophila, mouse, and man. When in the 5' region (flanking region or UTR) of a gene, the sequence is predominantly in the +/+ orientation with respect to the coding DNA strand, while in the coding region and the 3' region (UTR), the sequence is most frequently in the +/- orientation with respect to the coding DNA strand. In two genes, the element is split into two parts; however, in most cases, it is found only once but with a minimum of 11 consecutive nucleotides precisely depicting the original sequence. The element is found in a large number of different genes with diverse functions (from human ras p21 to B. circulans chitinase). Gel shift assays demonstrated the presence of a protein in HeLa cell extracts that binds to the sense and antisense single-stranded consensus oligomers, as well as to the double-stranded oligonucleotide. When the double-stranded oligomer was used, the size shift demonstrated an additional protein-oligomer complex larger than the one bound to either sense or antisense single-stranded consensus oligomers alone. It is speculated either that this element binds to protein(s) important in maintaining DNA in a single-stranded orientation for transcription or, alternatively, that this element is important in the transcription-coupled DNA repair process.
I-Ching, dyadic groups of binary numbers and the geno-logic coding in living bodies.
Hu, Zhengbing; Petoukhov, Sergey V; Petukhova, Elena S
2017-12-01
The ancient Chinese book I-Ching was written a few thousand years ago. It introduces the system of symbols Yin and Yang (equivalents of 0 and 1). It had a powerful impact on the culture, medicine and science of ancient China and several other countries. From the modern standpoint, I-Ching declares the importance of dyadic groups of binary numbers in nature. The system of I-Ching is represented by tables with dyadic groups of 4 bigrams, 8 trigrams and 64 hexagrams, which were declared to be fundamental archetypes of nature. The ancient Chinese did not know about the genetic code of protein sequences of amino acids, but this code is organized in accordance with the I-Ching: in particular, the genetic code is constructed on DNA molecules using 4 nitrogenous bases, 16 doublets, and 64 triplets. The article also describes the usage of dyadic groups as a foundation of the bio-mathematical doctrine of the geno-logic code, which exists in parallel with the known genetic code of amino acids but serves a different goal: to code the inherited algorithmic processes using the logical holography and the spectral logic of systems of genetic Boolean functions. Some relations of this doctrine with the I-Ching are discussed. In addition, the ratios of musical harmony that can be revealed in the parameters of DNA structure are also represented in the I-Ching book. Copyright © 2017 Elsevier Ltd. All rights reserved.
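The counts quoted above (4 bases, 16 doublets, 64 triplets; 4 bigrams, 8 trigrams, 64 hexagrams) follow from simple enumeration, which the Python sketch below reproduces. It also checks that n-bit binary numbers are closed under bitwise XOR (modulo-2 addition), the operation usually taken to define a dyadic group. The helper names are assumptions for illustration, and the sketch makes no claim about the geno-logic doctrine itself.

```python
from itertools import product

BASES = "ACGT"


def words(alphabet, length):
    """Enumerate every word of the given length over the alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=length)]


# 4 bases, 16 doublets, 64 triplets over the DNA alphabet.
counts_dna = {k: len(words(BASES, k)) for k in (1, 2, 3)}

# 4 bigrams, 8 trigrams, 64 hexagrams over the binary (Yin/Yang) alphabet.
counts_binary = {n: len(words("01", n)) for n in (2, 3, 6)}


def is_closed_under_xor(n_bits):
    """Check that XOR of any two n-bit numbers is again an n-bit number."""
    elements = range(2 ** n_bits)
    return all((a ^ b) in elements for a in elements for b in elements)


print(counts_dna)
print(counts_binary)
print(all(is_closed_under_xor(n) for n in (2, 3, 6)))
```

The match between 64 triplets and 64 hexagrams is a consequence of 4^3 = 2^6; the sketch shows only this counting identity.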
Computational fishing of new DNA methyltransferase inhibitors from natural products.
Maldonado-Rojas, Wilson; Olivero-Verbel, Jesus; Marrero-Ponce, Yovani
2015-07-01
DNA methyltransferase inhibitors (DNMTis) have become an alternative for cancer therapies. However, only two DNMTis have been approved as anticancer drugs, although with some restrictions. Natural products (NPs) are a promising source of drugs. In order to find NPs with novel chemotypes as DNMTis, 47 compounds with known activity against these enzymes were used to build a LDA-based QSAR model for active/inactive molecules (93% accuracy) based on molecular descriptors. This classifier was employed to identify potential DNMTis on 800 NPs from NatProd Collection. 447 selected compounds were docked on two human DNA methyltransferase (DNMT) structures (PDB codes: 3SWR and 2QRV) using AutoDock Vina and Surflex-Dock, prioritizing according to their score values, contact patterns at 4 Å and molecular diversity. Six consensus NPs were identified as virtual hits against DNMTs, including 9,10-dihydro-12-hydroxygambogic, phloridzin, 2',4'-dihydroxychalcone 4'-glucoside, daunorubicin, pyrromycin and centaurein. This method is an innovative computational strategy for identifying DNMTis, useful in the identification of potent and selective anticancer drugs. Copyright © 2015 Elsevier Inc. All rights reserved.
Epigenetic Patterns of PTSD: DNA Methylation in Serum of OIF/OEF Service Members
2009-03-01
DNA methylation patterns in cytokines in soldiers prior to OIF or OEF deployment; serum derived DNA is being used. PTSD cases with existing serum...having an outpatient record with a primary diagnosis of PTSD, based on ICD-9 codes; an appropriate control group was identified. For each PTSD case ... cases and controls and between pre- and post-deployments of each group. We will also measure levels of these specific cytokines using an ELISA
NASA Astrophysics Data System (ADS)
Zhou, Zhiheng; Liu, Haibai; Wang, Caixia; Lu, Qian; Huang, Qinhai; Zheng, Chanjiao; Lei, Yixiong
2015-10-01
Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in a variety of physiological and pathophysiological processes. This study investigated whether lncRNAs, as novel expression signatures, are able to modulate DNA damage and repair in cadmium (Cd) toxicity. There were aberrant expression profiles of lncRNAs in 35th Cd-induced cells as compared to untreated 16HBE cells. siRNA-mediated knockdown of ENST00000414355 inhibited the growth of DNA-damaged cells and decreased the expression of DNA-damage related genes (ATM, ATR and ATRIP), while increasing the expression of DNA-repair related genes (DDB1, DDB2, OGG1, ERCC1, MSH2, RAD50, XRCC1 and BARD1). Cadmium increased ENST00000414355 expression in the lung of Cd-exposed rats in a dose-dependent manner. A significant positive correlation was observed between blood ENST00000414355 expression and urinary/blood Cd concentrations, and there were significant correlations of lncRNA-ENST00000414355 expression with the expression of target genes in the lung of Cd-exposed rats and the blood of Cd-exposed workers. These results indicate that some lncRNAs are aberrantly expressed in Cd-treated 16HBE cells. lncRNA-ENST00000414355 may serve as a signature for DNA damage and repair related to the epigenetic mechanisms underlying cadmium toxicity and become a novel biomarker of cadmium toxicity.
Zhuo, L; Reed, K M; Phillips, R B
1995-06-01
Variation in the intergenic spacer (IGS) of the ribosomal DNA (rDNA) of lake trout (Salvelinus namaycush) was examined. Digestion of genomic DNA with restriction enzymes showed that almost every individual had a unique combination of length variants with most of this variation occurring within rather than between populations. Sequence analysis of a 2.3 kilobase (kb) EcoRI-DraI fragment spanning the 3' end of the 28S coding region and approximately 1.8 kb of the IGS revealed two blocks of repetitive DNA. Putative transcriptional termination sites were found approximately 220 bases (b) downstream from the end of the 28S coding region. Comparison of the 2.3-kb fragments with two longer (3.1 kb) fragments showed that the major difference in length resulted from variation in the number of short (89 b) repeats located 3' to the putative terminator. Repeat units within a single nucleolus organizer region (NOR) appeared relatively homogeneous and genetic analysis found variants to be stably inherited. A comparison of the number of spacer-length variants with the number of NORs found that the number of length variants per individual was always less than the number of NORs. Examination of spacer variants in five populations showed that populations with more NORs had more spacer variants, indicating that variants are present at different rDNA sites on nonhomologous chromosomes.
Frimodt-Møller, Jakob; Charbon, Godefroid; Krogfelt, Karen A; Løbner-Olesen, Anders
2017-09-11
The optimal chromosomal position(s) of a given DNA element was/were determined by transposon-mediated random insertion followed by fitness selection. In bacteria, the impact of the genetic context on the function of a genetic element can be difficult to assess. Several mechanisms, including topological effects, transcriptional interference from neighboring genes, and/or replication-associated gene dosage, may affect the function of a given genetic element. Here, we describe a method that permits the random integration of a DNA element into the chromosome of Escherichia coli and select the most favorable locations using a simple growth competition experiment. The method takes advantage of a well-described transposon-based system of random insertion, coupled with a selection of the fittest clone(s) by growth advantage, a procedure that is easily adjustable to experimental needs. The nature of the fittest clone(s) can be determined by whole-genome sequencing on a complex multi-clonal population or by easy gene walking for the rapid identification of selected clones. Here, the non-coding DNA region DARS2, which controls the initiation of chromosome replication in E. coli, was used as an example. The function of DARS2 is known to be affected by replication-associated gene dosage; the closer DARS2 gets to the origin of DNA replication, the more active it becomes. DARS2 was randomly inserted into the chromosome of a DARS2-deleted strain. The resultant clones containing individual insertions were pooled and competed against one another for hundreds of generations. Finally, the fittest clones were characterized and found to contain DARS2 inserted in close proximity to the original DARS2 location.
Piecing together cis-regulatory networks: insights from epigenomics studies in plants.
Huang, Shao-Shan C; Ecker, Joseph R
2018-05-01
5-Methylcytosine, a chemical modification of DNA, is a covalent modification found in the genomes of both plants and animals. Epigenetic inheritance of phenotypes mediated by DNA methylation is well established in plants. Most of the known mechanisms of establishing, maintaining and modifying DNA methylation have been worked out in the reference plant Arabidopsis thaliana. Major functions of DNA methylation in plants include regulation of gene expression and silencing of transposable elements (TEs) and repetitive sequences, both of which have parallels in mammalian biology, involve interaction with the transcriptional machinery, and may have profound effects on the regulatory networks in the cell. Methylome and transcriptome dynamics have been investigated in development and environmental responses in Arabidopsis and agriculturally and ecologically important plants, revealing the interdependent relationship among genomic context, methylation patterns, and expression of TE and protein coding genes. Analyses of methylome variation among plant natural populations and species have begun to quantify the extent of genetic control of methylome variation vs. true epimutation, and model the evolutionary forces driving methylome evolution in both short and long time scales. The ability of DNA methylation to positively or negatively modulate binding affinity of transcription factors (TFs) provides a natural link from genome sequence and methylation changes to transcription. Technologies that allow systematic determination of methylation sensitivities of TFs, in native genomic and methylation context without confounding factors such as histone modifications, will provide baseline datasets for building cell-type- and individual-specific regulatory networks that underlie the establishment and inheritance of complex traits. This article is categorized under: Laboratory Methods and Technologies > Genetic/Genomic Methods Biological Mechanisms > Regulatory Biology. 
© 2017 Wiley Periodicals, Inc.
Fu, Cheng-Jie; Sheikh, Sanea; Miao, Wei; Andersson, Siv G E; Baldauf, Sandra L
2014-08-21
Discoba (Excavata) is an ancient group of eukaryotes with great morphological and ecological diversity. Unlike the other major divisions of Discoba (Jakobida and Euglenozoa), little is known about the mitochondrial DNAs (mtDNAs) of Heterolobosea. We have assembled a complete mtDNA genome from the aggregating heterolobosean amoeba, Acrasis kona, which consists of a single circular highly AT-rich (83.3%) molecule of 51.5 kb. Unexpectedly, A. kona mtDNA is missing roughly 40% of the protein-coding genes and nearly half of the transfer RNAs found in the only other sequenced heterolobosean mtDNAs, those of Naegleria spp. Instead, over a quarter of A. kona mtDNA consists of novel open reading frames. Eleven of the 16 protein-coding genes missing from A. kona mtDNA were identified in its nuclear DNA and polyA RNA, and phylogenetic analyses indicate that at least 10 of these 11 putative nuclear-encoded mitochondrial (NcMt) proteins arose by direct transfer from the mitochondrion. Acrasis kona mtDNA also employs C-to-U type RNA editing, and 12 homologs of DYW-type pentatricopeptide repeat (PPR) proteins implicated in plant organellar RNA editing are found in A. kona nuclear DNA. A mapping of mitochondrial gene content onto a consensus phylogeny reveals a sporadic pattern of relative stasis and rampant gene loss in Discoba. Rampant loss occurred independently in the unique common lineage leading to Heterolobosea + Tsukubamonadida and later in the unique lineage leading to Acrasis. Meanwhile, mtDNA gene content appears to be remarkably stable in the Acrasis sister lineage leading to Naegleria and in their distant relatives Jakobida. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
A code of ethics for evidence-based research with ancient human remains.
Kreissl Lonfat, Bettina M; Kaufmann, Ina Maria; Rühli, Frank
2015-06-01
As clinical research constantly advances and the concept of evolution becomes a strong and influential part of basic medical research, the absence of a discourse that deals with the use of ancient human remains in evidence-based research is becoming unbearable. While topics such as exhibition and excavation of human remains are established ethical fields of discourse, when faced with instrumentalization of ancient human remains for research (i.e., ancient DNA extractions for disease marker analyses) the answers from traditional ethics or even more practical fields of bio-ethics or more specific biomedical ethics are rare to non-existent. The Centre for Evolutionary Medicine at the University of Zurich addressed its need for discursive action by writing a self-given code of ethics, which was developed in dialogue with the researchers at the Institute and was published online in Sept. 2011: http://evolutionäremedizin.ch/coe/. The philosophico-ethical basis for this code of conduct and ethics and the methods are published in this article. © 2015 Wiley Periodicals, Inc.
Zhang, Shanxin; Zhou, Zhiping; Chen, Xinmeng; Hu, Yong; Yang, Lindong
2017-08-07
DNase I hypersensitive sites (DHSs) are accessible chromatin regions hypersensitive to cleavage by DNase I endonucleases. DHSs are indicative of cis-regulatory DNA elements (CREs), all of which play important roles in global gene expression regulation. Recognizing DHSs in a genome therefore helps in discovering CREs. To accelerate such investigation, developing cost-effective computational methods to identify DHSs is an important complement. However, tools for identifying DHSs in plant genomes are lacking. Here we present pDHS-SVM, a computational predictor for identifying plant DHSs. To integrate global sequence-order information and local DNA properties, reverse-complement k-mers and dinucleotide-based auto covariance of DNA sequences were used to construct the feature space. In this work, fifteen physical-chemical properties of dinucleotides were used and a Support Vector Machine (SVM) was employed. To further improve the performance of the predictor and extract an optimized subset of nucleotide physical-chemical properties positive for the DHSs, a heuristic nucleotide physical-chemical property selection algorithm was introduced. With the optimized subset of properties, experimental results on Arabidopsis thaliana and rice (Oryza sativa) showed that pDHS-SVM could achieve accuracies of up to 87.00% and 85.79%, respectively. The results indicate the effectiveness of the proposed method for predicting DHSs. Furthermore, pDHS-SVM could provide a helpful complement for predicting CREs in plant genomes. Our implementation of the proposed method, pDHS-SVM, is freely available as source code at https://github.com/shanxinzhang/pDHS-SVM. Copyright © 2017 Elsevier Ltd. All rights reserved.
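The reverse-complement k-mer feature mentioned in this abstract can be illustrated with a short sketch. This is not the pDHS-SVM code; it is a generic illustration in which the function names (`reverse_complement`, `canonical`, `revc_kmer_features`) and the default `k = 2` are assumptions made for the example. Each k-mer is merged with its reverse complement, so the feature vector does not depend on which strand was read.

```python
from collections import Counter
from itertools import product

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def revc_kmer_features(seq, k=2):
    """Normalised frequencies of strand-merged k-mers (hypothetical helper).

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return {key: (counts[key] / total if total else 0.0) for key in keys}


if __name__ == "__main__":
    features = revc_kmer_features("ACGTTGCA", k=2)
    for kmer, freq in features.items():
        print(kmer, round(freq, 3))
```

For k = 2 the 16 dinucleotides collapse to 10 features, because four of them (AT, TA, CG, GC) are their own reverse complement and the remaining twelve pair up.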
Gammon, B.L.; Kraft, S.A.; Michie, M.; Allyse, M.
2016-01-01
Background The recent introduction of cell-free DNA-based non-invasive prenatal screening (cfDNA screening) into clinical practice was expected to revolutionize prenatal testing. cfDNA screening for fetal aneuploidy has demonstrated higher test sensitivity and specificity for some conditions than conventional serum screening and can be conducted early in the pregnancy. However, it is not clear whether and how clinical practices are assimilating this new type of testing into their informed consent and counselling processes. Since the introduction of cfDNA screening into practice in 2011, the uptake and scope have increased dramatically. Prenatal care providers are under pressure to stay up to date with rapidly changing cfDNA screening panels, manage increasing patient demands, and keep up with changing test costs, all while attempting to use the technology responsibly and ethically. While clinical literature on cfDNA screening has shown benefits for specific patient populations, it has also identified significant misunderstandings among providers and patients alike about the power of the technology. The unique features of cfDNA screening, in comparison to established prenatal testing technologies, have implications for informed decision-making and genetic counselling that must be addressed to ensure ethical practice.
Objectives This study explored the experiences of prenatal care providers at the forefront of non-invasive genetic screening in the United States to understand how this testing changes the practice of prenatal medicine. We aimed to learn how the experience of providing and offering this testing differs from established prenatal testing methodologies. These differences may necessitate changes to patient education and consent procedures to maintain ethical practice.
Methods We used the online American Congress of Obstetricians and Gynecologists Physician Directory to identify a systematic sample of five prenatal care providers in each U.S. state and the District of Columbia. Beginning with the lowest zip code in each state, we took every fifth name from the directory, excluding providers who were retired, did not currently practice in the state in which they were listed, or were not involved in a prenatal specialty. After repeating this step twice and sending a total of 461 invitations, 37 providers expressed interest in participating, and we completed telephone interviews with 21 providers (4.6%). We developed a semi-structured interview guide including questions about providers’ use of and attitudes toward cfDNA screening. A single interviewer conducted and audio-recorded all interviews by telephone, and the interviews lasted approximately 30 minutes each. We collaboratively developed a codebook through an iterative process of transcript review and code application, and a primary coder coded all transcripts.
Results Prenatal care providers have varying perspectives on the advantages of cfDNA screening and express a range of concerns regarding the implementation of cfDNA screening in practice. While providers agreed on several advantages of cfDNA, including increased accuracy, earlier return of results, and decreased risk of complications, many expressed concern that there is not enough time to adequately counsel and educate patients on their prenatal screening and testing options. Providers also agreed that demand for cfDNA screening has increased and expressed a desire for more information from professional societies, labs, and publications. Providers disagreed about the healthcare implications and future of cfDNA screening. Some providers anticipated that cfDNA screening would decrease healthcare costs when implemented widely and expressed optimism for expanded cfDNA screening panels. Others were concerned that cfDNA screening would increase costs over time and questioned whether the expansion to include microdeletions could be done ethically.
Conclusions The perspectives and experiences of the providers in this study allow insight into the clinical benefit, burden on prenatal practice, and potential future of cfDNA screening in clinical practice. Given the likelihood that the scope and uptake of cfDNA screening will continue to increase, it is essential to consider how these changes will affect frontline prenatal care providers and, in turn, patients. Providers’ requests for additional guidance and data as well as their concerns with the lack of time available to explain screening and testing options indicate significant potential issues with patient care. It is important to ensure that the clinical integration of cfDNA screening is managed responsibly and ethically before it expands further, exacerbating pre-existing issues. As prenatal screening evolves, so should informed consent and the resources available to women making decisions. The field must take steps to maximize the advantages of cfDNA screening and responsibly manage its ethical issues. PMID:28180146
Topological events in single molecules of E. coli DNA confined in nanochannels
Reifenberger, Jeffrey G.; Dorfman, Kevin D.; Cao, Han
2015-01-01
We present experimental data concerning potential topological events such as folds, internal backfolds, and/or knots within long molecules of double-stranded DNA when they are stretched by confinement in a nanochannel. Genomic DNA from E. coli was labeled near the ‘GCTCTTC’ sequence with a fluorescently labeled dUTP analog and stained with the DNA intercalator YOYO. Individual long molecules of DNA were then linearized and imaged using methods based on the NanoChannel Array technology (Irys® System) available from BioNano Genomics. Data were collected on 189,153 molecules of length greater than 50 kilobases. A custom code was developed to search for abnormal intensity spikes in the YOYO backbone profile along the length of individual molecules. By correlating the YOYO intensity spikes with the aligned barcode pattern to the reference, we were able to correlate the bright intensity regions of YOYO with abnormal stretching in the molecule, which suggests these events were either a knot or a region of internal backfolding within the DNA. We interpret the results of our experiments involving molecules exceeding 50 kilobases in the context of existing simulation data for relatively short DNA, typically several kilobases. The frequency of these events is lower than the predictions from simulations, while the size of the events is larger than simulation predictions and often exceeds the molecular weight of the simulated molecules. We also identified DNA molecules that exhibit large, single folds as they enter the nanochannels. Overall, topological events occur at a low frequency (~7% of all molecules) and pose an easily surmountable obstacle for the practice of genome mapping in nanochannels. PMID:25991508
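The abstract mentions custom code that searches for abnormal intensity spikes along a backbone profile but does not describe it. As a hedged illustration only, the sketch below flags runs of samples that exceed a robust threshold (median plus a multiple of the scaled median absolute deviation). The function name `find_spikes`, the `threshold` and `min_len` parameters, and the example numbers are all assumptions for this example and are not taken from the study.

```python
from statistics import median


def find_spikes(profile, threshold=3.0, min_len=2):
    """Return (start, end) index pairs of runs brighter than a robust cutoff.

    `end` is exclusive. The cutoff is median + threshold * 1.4826 * MAD,
    a common robust outlier rule; it is an assumed stand-in for the
    unpublished criterion used by the authors.
    """
    values = list(profile)
    if not values:
        return []
    med = median(values)
    mad = median([abs(x - med) for x in values])
    scale = mad if mad > 0 else 1.0  # avoid a zero-width cutoff on flat data
    cutoff = med + threshold * 1.4826 * scale

    runs = []
    start = None
    for i, x in enumerate(values):
        if x > cutoff:
            if start is None:
                start = i  # a new bright run begins
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(values) - start >= min_len:
        runs.append((start, len(values)))
    return runs


if __name__ == "__main__":
    # Made-up intensities: a flat backbone with one bright stretch.
    demo = [10, 11, 9, 10, 10, 25, 27, 26, 10, 11, 9, 10]
    print(find_spikes(demo))
```

Requiring a minimum run length keeps single noisy pixels from being reported; in a real pipeline the flagged intervals would then be compared against the label alignment, as the abstract describes.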
Early Adoption of a Multi-target Stool DNA Test for Colorectal Cancer Screening
Finney Rutten, Lila J.; Jacobson, Robert M.; Wilson, Patrick M.; Jacobson, Debra J.; Fan, Chun; Kisiel, John B.; Sweetser, Seth R.; Tulledge-Scheitel, Sidna M.; St. Sauver, Jennifer L.
2017-01-01
Objective To characterize early adoption of a novel multi-target stool deoxyribonucleic acid (MT-sDNA) screening test for colorectal cancer (CRC) and test the hypothesis that adoption differs by demographic characteristics and prior CRC screening behavior, and proceeds predictably over time. Patients and Methods We used the Rochester Epidemiology Project infrastructure to assess MT-sDNA screening test use among adults aged 50–75 years, and identified 27,147 individuals eligible/due for screening colonoscopy from November 1, 2014 through November 30, 2015, and living in Olmsted County, Minnesota in 2014. We used electronic Current Procedure Terminology and Health Care Common Procedure codes to evaluate early adoption of the MT-sDNA screening test in this population and to test whether early adoption varies by age, sex, race, and prior screening behavior. Results Overall, 2,193 (8.1%) and 974 (3.6%) of individuals were screened by colonoscopy and MT-sDNA, respectively. Age, sex, race, and prior screening were significantly and independently associated with MT-sDNA screening use compared to colonoscopy use after adjustment for all other variables. Rates of adoption of MT-sDNA screening increased over time and were highest among those aged 50–54 years, females, whites, and those with a prior history of screening. MT-sDNA screening use varied predictably by insurance coverage. Rates of colonoscopy decreased over time, while overall CRC screening rates remained steady. Conclusion Our results are generally consistent with predictions derived from prior research and the Diffusion of Innovation framework, pointing to increasing use of the new screening test over time, and early adoption by younger patients, females, whites and those with prior CRC screening. PMID:28473037
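The two screening percentages quoted in the Results follow directly from the reported counts. The short check below recomputes them; the helper name `pct` is an assumption for the example, and only the three counts stated in the abstract are used.

```python
def pct(numerator, denominator, digits=1):
    """Percentage rounded to `digits` decimal places."""
    return round(100.0 * numerator / denominator, digits)


# Counts as stated in the abstract.
ELIGIBLE = 27147     # individuals eligible/due for screening
COLONOSCOPY = 2193   # screened by colonoscopy
MT_SDNA = 974        # screened by the multi-target stool DNA test

if __name__ == "__main__":
    print("colonoscopy:", pct(COLONOSCOPY, ELIGIBLE), "%")
    print("MT-sDNA:", pct(MT_SDNA, ELIGIBLE), "%")
```

Both values agree with the rounded figures given in the abstract (8.1% and 3.6%).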
Structural modeling and molecular simulation analysis of HvAP2/EREBP from barley.
Pandey, Bharati; Sharma, Pradeep; Tyagi, Chetna; Goyal, Sukriti; Grover, Abhinav; Sharma, Indu
2016-06-01
AP2/ERF transcription factors play a critical role in plant development and stress adaptation. This study reports the three-dimensional ab initio-based model of the AP2/EREBP protein of barley and its interaction with DNA. The full-length coding sequence of the HvAP2/EREBP gene isolated from two Indian barley cultivars, RD 2503 and RD 31, was used to model the protein. Of five protein models obtained, the one with lowest C-score was chosen for further analysis. The N- and C-terminal regions of the HvAP2 protein were found to be highly disordered. The dynamic properties of AP2/EREBP and its interaction with DNA were investigated by molecular dynamics simulation. Analysis of trajectories from the simulation yielded the equilibrated conformation between 2-10 ns for the protein and 7-15 ns for the protein-DNA complex. The relationship between DNA containing the GCC box and the DNA-binding domain of HvAP2/EREBP was established by modeling an 11-base-pair-long nucleotide sequence and the HvAP2/EREBP protein using an ab initio method. Analysis of the protein-DNA interaction showed that a β-sheet motif comprising amino acid residues THR105, ARG100, ARG93, and ARG83 seems to play an important role in stabilizing the complex, as these residues form strong hydrogen bond interactions with the DNA motif. Taken together, this study provides first-hand comprehensive information detailing the structural conformation and interactions of HvAP2/EREBP proteins in barley. The study reinforces the role of computational approaches for preliminary examination of unknown proteins in the absence of experimental information. It also provides molecular insight into protein-DNA binding for understanding and enhancing abiotic stress resistance and for improving water use efficiency in crop plants.
Systematic cloning of an ORFeome using the Gateway system.
Matsuyama, Akihisa; Yoshida, Minoru
2009-01-01
With the completion of the genome projects, there are increasing demands for experimental systems that make it possible to exploit the entire set of protein-coding open reading frames (ORFs), viz. the ORFeome, en masse. Systematic proteomic studies based on cloned ORFeomes are called "reverse proteomics," and have been launched in many organisms in recent years. Cloning an ORFeome is an attractive route to a comprehensive understanding of biological phenomena, but it is a challenging and daunting task. However, recent advances in DNA cloning using site-specific recombination and in high-throughput experimental techniques have made it feasible to clone an ORFeome with minimal effort. The Gateway system is one such approach, employing the recombination reaction of bacteriophage lambda. By combining traditional DNA manipulation methods with the modern recombination-based cloning system, it is possible to clone an ORFeome of an organism on an individual level.
Metal binding proteins, recombinant host cells and methods
Summers, Anne O.; Caguiat, Jonathan J.
2004-06-15
The present disclosure provides artificial heavy metal binding proteins termed chelons by the inventors. These chelons bind cadmium and/or mercuric ions with relatively high affinity. Also disclosed are coding sequences, recombinant DNA molecules and recombinant host cells comprising those recombinant DNA molecules for expression of the chelon proteins. In the recombinant host cells or transgenic plants, the chelons can be used to bind heavy metals taken up from contaminated soil, groundwater or irrigation water and to concentrate and sequester those ions. Recombinant enteric bacteria can be used within the gastrointestinal tracts of animals or humans exposed to toxic metal ions such as mercury and/or cadmium, where the chelon recombinantly expressed is chosen in accordance with the ion to be remediated. Alternatively, the chelons can be immobilized on solid supports to bind and concentrate heavy metals from a contaminated aqueous medium, including biological fluids.
Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens.
Just, Rebecca S; Scheible, Melissa K; Fast, Spence A; Sturk-Andreaggi, Kimberly; Higginbotham, Jennifer L; Lyons, Elizabeth A; Bush, Jocelyn M; Peck, Michelle A; Ring, Joseph D; Diegoli, Toni M; Röck, Alexander W; Huber, Gabriela E; Nagl, Simone; Strobl, Christina; Zimmermann, Bettina; Parson, Walther; Irwin, Jodi A
2014-05-01
Forensic mitochondrial DNA (mtDNA) testing requires appropriate, high quality reference population data for estimating the rarity of questioned haplotypes and, in turn, the strength of the mtDNA evidence. Available reference databases (SWGDAM, EMPOP) currently include information from the mtDNA control region; however, novel methods that quickly and easily recover mtDNA coding region data are becoming increasingly available. Though these assays promise to both facilitate the acquisition of mitochondrial genome (mtGenome) data and maximize the general utility of mtDNA testing in forensics, the appropriate reference data and database tools required for their routine application in forensic casework are lacking. To address this deficiency, we have undertaken an effort to: (1) increase the large-scale availability of high-quality entire mtGenome reference population data, and (2) improve the information technology infrastructure required to access/search mtGenome data and employ them in forensic casework. Here, we describe the application of a data generation and analysis workflow to the development of more than 400 complete, forensic-quality mtGenomes from low DNA quantity blood serum specimens as part of a U.S. National Institute of Justice funded reference population databasing initiative. We discuss the minor modifications made to a published mtGenome Sanger sequencing protocol to maintain a high rate of throughput while minimizing manual reprocessing with these low template samples. The successful use of this semi-automated strategy on forensic-like samples provides practical insight into the feasibility of producing complete mtGenome data in a routine casework environment, and demonstrates that large (>2kb) mtDNA fragments can regularly be recovered from high quality but very low DNA quantity specimens. 
Further, the detailed empirical data we provide on the amplification success rates across a range of DNA input quantities will be useful moving forward as PCR-based strategies for mtDNA enrichment are considered for targeted next-generation sequencing workflows. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.
Falconi, M; Oteri, F; Eliseo, T; Cicero, D O; Desideri, A
2008-08-01
The structural dynamics of the DNA binding domains of the human papillomavirus strain 16 and the bovine papillomavirus strain 1, complexed with their DNA targets, has been investigated by modeling, molecular dynamics simulations, and nuclear magnetic resonance analysis. The simulations underline different dynamical features of the protein scaffolds and a different mechanical interaction of the two proteins with DNA. The two protein structures, although very similar, show differences in the relative mobility of secondary structure elements. Protein structural analyses, principal component analysis, and geometrical and energetic DNA analyses indicate that the two transcription factors utilize different strategies in DNA recognition and deformation. Results show that the protein indirect DNA readout is not only attributable to the flexibility of the DNA molecule but is also finely tuned by the mechanical and dynamical properties of the protein scaffold involved in the interaction.
Sub-10 nm patterning with DNA nanostructures: a short perspective
Du, Ke; Park, Myeongkee; Ding, Junjun; ...
2017-09-04
DNA is the hereditary material that contains our unique genetic code. Since the first demonstration of two-dimensional (2D) nanopatterns by using designed DNA origami ~10 years ago, DNA has evolved into a novel technique for 2D and 3D nanopatterning. It is now being used as a template for the creation of sub-10 nm structures via either 'top-down' or 'bottom-up' approaches for various applications spanning nanoelectronics, plasmonic sensing, and nanophotonics. This paper starts with a historic overview and discusses the current state of the art in DNA nanolithography. Finally, emphasis is put on the challenges and prospects of DNA nanolithography as the next-generation nanomanufacturing technique.
Phenotypic characterization of an Arabidopsis T-DNA insertion line SALK_063500.
Sng, Natasha J; Paul, Anna-Lisa; Ferl, Robert J
2018-06-01
In this article we report the identification of a homozygous-lethal T-DNA (transfer DNA) insertion within the coding region of the AT1G05290 gene in the Arabidopsis thaliana (Arabidopsis) line SALK_063500. The T-DNA insertion is found within exon one of the AT1G05290 gene; however, a homozygous T-DNA allele is unattainable. In plants heterozygous for the T-DNA allele, the expression levels of AT1G05290 were compared to those of wild-type Arabidopsis (Col-0, Columbia). Further analyses revealed an aberrant silique phenotype in the heterozygous SALK_063500 plants that is attributed to a reduced rate of pollen tube germination. These data are original and have not been published elsewhere.
The complete mitochondrial genome of Conus tulipa (Neogastropoda: Conidae).
Chen, Po-Wei; Hsiao, Sheng-Tai; Huang, Chih-Wei; Chen, Kao-Sung; Tseng, Chen-Te; Wu, Wen-Lung; Hwang, Deng-Fwu
2016-07-01
The complete mitogenome sequence of the cone snail Conus tulipa (Linnaeus, 1758) has been sequenced by a next-generation sequencing method. The assembled mitogenome is 16,599 bp in length, including 13 protein-coding genes, 22 transfer RNA genes and 2 ribosomal RNA genes. The overall base composition of C. tulipa is 28.7% A, 15.2% C, 18.4% G and 37.7% T. It shows 81.1% identity to the cone snail C. consors, 78.5% to C. borgesi and 77.5% to C. textile. Using the 13 protein-coding genes and 2 ribosomal RNA genes of C. tulipa in this study, together with those of 18 other closely related species, we constructed a species phylogenetic tree to verify the accuracy and utility of the newly determined mitogenome sequence. The complete mitogenome of C. tulipa provides essential DNA molecular data for further phylogeographic and evolutionary analysis of cone snails.
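Base-composition figures of the kind reported in these mitogenome abstracts are simple per-base percentages. The sketch below shows one way to compute them and the combined A + T content from a sequence string; the function names `base_composition` and `at_content` and the toy sequence are assumptions for illustration, not part of any of the cited studies. The `REPORTED` dictionary holds only the four C. tulipa percentages quoted above.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, C, G and T among the unambiguous bases of `seq`."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def at_content(seq):
    """Combined A + T percentage of `seq`."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


# Percentages reported for C. tulipa in the abstract above.
REPORTED = {"A": 28.7, "C": 15.2, "G": 18.4, "T": 37.7}

if __name__ == "__main__":
    toy = "ATTATGCATTAACG"  # made-up sequence, not real mitogenome data
    print(base_composition(toy))
    print("A+T of toy sequence:", round(at_content(toy), 1))
    print("reported values sum to:", round(sum(REPORTED.values()), 1))
    print("reported A+T:", round(REPORTED["A"] + REPORTED["T"], 1))
```

The same check applies to the other mitogenome records in this listing: the four reported percentages should sum to roughly 100%, and any quoted A + T figure should equal the sum of the A and T values.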
Shen, Kang-Ning; Chen, Ching-Hung; Hsiao, Chung-Der; Durand, Jean-Dominique
2016-09-01
In this study, the complete mitogenome sequence of a cryptic species from East Australia (Mugil sp. H) belonging to the worldwide Mugil cephalus species complex (Teleostei: Mugilidae) has been sequenced by a next-generation sequencing method. The assembled mitogenome, consisting of 16,845 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNAs, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop is 1067 bp long and is located between tRNA-Pro and tRNA-Phe. The overall base composition of East Australia M. cephalus is 28.4% for A, 29.3% for C, 15.4% for G and 26.9% for T. The complete mitogenome may provide essential DNA molecular data for further phylogenetic and evolutionary analysis of the flathead mullet species complex.
Shen, Kang-Ning; Yen, Ta-Chi; Chen, Ching-Hung; Li, Huei-Ying; Chen, Pei-Lung; Hsiao, Chung-Der
2016-05-01
In this study, the complete mitogenome sequence of the Northwestern Pacific 2 (NWP2) cryptic species of flathead mullet, Mugil cephalus (Teleostei: Mugilidae), has been amplified by long-range PCR and sequenced by a next-generation sequencing method. The assembled mitogenome, consisting of 16,686 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNAs, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop was 909 bp long and was located between tRNA-Pro and tRNA-Phe. The overall base composition of NWP2 M. cephalus was 28.4% for A, 29.8% for C, 26.5% for T and 15.3% for G. The complete mitogenome may provide essential DNA molecular data for further phylogenetic and evolutionary analysis of the flathead mullet species complex.
Shen, Kang-Ning; Yen, Ta-Chi; Chen, Ching-Hung; Ye, Jeng-Jia; Hsiao, Chung-Der
2016-05-01
In this study, the complete mitogenome sequence of the cryptic "lineage B" big-fin reef squid, Sepioteuthis lessoniana (Cephalopoda: Loliginidae), has been sequenced by a next-generation sequencing method. The assembled mitogenome, consisting of 16,694 bp, includes 13 protein-coding genes, 25 transfer RNAs, and 2 ribosomal RNA genes. The overall base composition of "lineage B" S. lessoniana is 36.7% for A, 18.9% for C, 34.5% for T and 9.8% for G, and it shows 90% identity to "lineage C" S. lessoniana. It also exhibits a high T + A content (71.2%) and two non-coding regions with TA tandem repeats. The complete mitogenome of the cryptic "lineage B" S. lessoniana provides essential DNA molecular data for further phylogeographic and evolutionary analysis of the big-fin reef squid species complex.
Hsiao, Chung-Der; Shen, Kang-Ning; Ching, Tzu-Yun; Wang, Ya-Hsien; Ye, Jeng-Jia; Tsai, Shiou-Yi; Wu, Shan-Chun; Chen, Ching-Hung; Wang, Chia-Hui
2016-07-01
In this study, the complete mitogenome sequence of the cryptic "lineage A" big-fin reef squid, Sepioteuthis lessoniana (Cephalopoda: Loliginidae), has been sequenced by a next-generation sequencing method. The assembled mitogenome consists of 16,605 bp, which includes 13 protein-coding genes, 22 transfer RNAs, and 2 ribosomal RNA genes. The overall base composition of "lineage A" S. lessoniana is 37.5% for A, 17.4% for C, 9.1% for G, and 35.9% for T, and it shows 87% identity to "lineage C" S. lessoniana. It is also notable for its high T + A content (73.4%) and two non-coding regions with TA tandem repeats. The complete mitogenome of the cryptic "lineage A" S. lessoniana provides essential DNA molecular data for further phylogeographic and evolutionary analysis of the big-fin reef squid species complex.
NASA Technical Reports Server (NTRS)
Reddy, A. S.; Czernik, A. J.; An, G.; Poovaiah, B. W.
1992-01-01
We cloned and sequenced a plant cDNA that encodes U1 small nuclear ribonucleoprotein (snRNP) 70K protein. The plant U1 snRNP 70K protein cDNA is not full length and lacks the coding region for 68 amino acids in the amino-terminal region as compared to human U1 snRNP 70K protein. Comparison of the deduced amino acid sequence of the plant U1 snRNP 70K protein with the amino acid sequences of animal and yeast U1 snRNP 70K proteins showed a high degree of homology. The plant U1 snRNP 70K protein is more closely related to the human counterpart than to the yeast 70K protein. The carboxy-terminal half is less well conserved but, like the vertebrate 70K proteins, is rich in charged amino acids. Northern analysis with RNA isolated from different parts of the plant indicates that the snRNP 70K gene is expressed in all of the parts tested. Southern blotting of genomic DNA using the cDNA indicates that the U1 snRNP 70K protein is coded by a single gene.
Conserved Curvature of RNA Polymerase I Core Promoter Beyond rRNA Genes: The Case of the Tritryps
Smircich, Pablo; Duhagon, María Ana; Garat, Beatriz
2015-01-01
In trypanosomatids, the RNA polymerase I (RNAPI)-dependent promoters controlling the ribosomal RNA (rRNA) genes have been well identified. Although the RNAPI transcription machinery recognizes the DNA conformation instead of the DNA sequence of promoters, no conformational study has been reported for these promoters. Here we present the in silico analysis of the intrinsic DNA curvature of the rRNA gene core promoters in Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major. We found that, in spite of the absence of sequence conservation, these promoters hold conformational properties similar to other eukaryotic rRNA promoters. Our results also indicated that the intrinsic DNA curvature pattern is conserved within the Leishmania genus and also among strains of T. cruzi and T. brucei. Furthermore, we analyzed the impact of point mutations on the intrinsic curvature and on promoter activity. In addition, we found that the core promoters of protein-coding genes transcribed by RNAPI in T. brucei show the same conserved conformational characteristics. Overall, our results indicate that DNA intrinsic curvature of the rRNA gene core promoters is conserved in these ancient eukaryotes and such conserved curvature might be a requirement of RNAPI machinery for transcription of not only rRNA genes but also protein-coding genes. PMID:26718450
Fusion of Escherichia coli heat-stable enterotoxin and heat-labile enterotoxin B subunit.
Guzman-Verduzco, L M; Kupersztoch, Y M
1987-11-01
The 3' terminus of the DNA coding for the extracellular Escherichia coli heat-stable enterotoxin (ST) devoid of transcription and translation stop signals was fused to the 5' terminus of the DNA coding for the periplasmic B subunit of the heat-labile enterotoxin (LTB) deleted of ribosomal binding sites and leader peptide. By RNA-DNA hybridization analysis, it was shown that the fused DNA was transcribed in vivo into an RNA species in close agreement with the expected molecular weight inferred from the nucleotide sequence. The translation products of the fused DNA resulted in a hybrid molecule recognized in Western blots (immunoblots) with antibodies directed against the heat-labile moiety. Anti-LTB antibodies coupled to a solid support bound ST and LTB simultaneously when incubated with ST-LTB cellular extracts. By [35S]cysteine pulse-chase experiments, it was shown that the fused ST-LTB polypeptide was converted from a precursor with an equivalent electrophoretic mobility of 20,800 daltons to an approximately 18,500-dalton species, which accumulated within the cell. The data suggest that wild-type ST undergoes at least two processing steps during its export to the culture supernatant. Blocking the natural carboxy terminus of ST inhibited the second proteolytic step and extracellular delivery of the hybrid molecule.
de Lange, Orlando; Wolf, Christina; Dietze, Jörn; Elsaesser, Janett; Morbitzer, Robert; Lahaye, Thomas
2014-06-01
The tandem repeats of transcription activator like effectors (TALEs) mediate sequence-specific DNA binding using a simple code. Naturally, TALEs are injected by Xanthomonas bacteria into plant cells to manipulate the host transcriptome. In the laboratory TALE DNA binding domains are reprogrammed and used to target a fused functional domain to a genomic locus of choice. Research into the natural diversity of TALE-like proteins may provide resources for the further improvement of current TALE technology. Here we describe TALE-like proteins from the endosymbiotic bacterium Burkholderia rhizoxinica, termed Bat proteins. Bat repeat domains mediate sequence-specific DNA binding with the same code as TALEs, despite less than 40% sequence identity. We show that Bat proteins can be adapted for use as transcription factors and nucleases and that sequence preferences can be reprogrammed. Unlike TALEs, the core repeats of each Bat protein are highly polymorphic. This feature allowed us to explore alternative strategies for the design of custom Bat repeat arrays, providing novel insights into the functional relevance of non-RVD residues. The Bat proteins offer fertile grounds for research into the creation of improved programmable DNA-binding proteins and comparative insights into TALE-like evolution. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
J Genes for Heavy Chain Immunoglobulins of Mouse
NASA Astrophysics Data System (ADS)
Newell, Nanette; Richards, Julia E.; Tucker, Philip W.; Blattner, Frederick R.
1980-09-01
A 15.8-kilobase pair fragment of BALB/c mouse liver DNA, cloned in the Charon 4Aλ phage vector system, was shown to contain the μ heavy chain constant region (CHμ ) gene for the mouse immunoglobulin M. In addition, this fragment of DNA contains at least two J genes, used to code for the carboxyl terminal portion of heavy chain variable regions. These genes are located in genomic DNA about eight kilobase pairs to the 5' side of the CHμ gene. The complete nucleotide sequence of a 1120-base pair stretch of DNA that includes the two J genes has been determined.
Bhattacharya, D; Steinkötter, J; Melkonian, M
1993-12-01
Centrin (= caltractin) is a ubiquitous, cytoskeletal protein which is a member of the EF-hand superfamily of calcium-binding proteins. A centrin-coding cDNA was isolated and characterized from the prasinophyte green alga Scherffelia dubia. Centrin PCR amplification primers were used to isolate partial, homologous cDNA sequences from the green algae Tetraselmis striata and Spermatozopsis similis. Annealing analyses suggested that centrin is a single-copy-coding region in T. striata and S. similis and other green algae studied. Centrin-coding regions from S. dubia, S. similis and T. striata encode four colinear EF-hand domains which putatively bind calcium. Phylogenetic analyses, including homologous sequences from Chlamydomonas reinhardtii and the land plant Atriplex nummularia, demonstrate that the domains of centrins are congruent and arose from the two-fold duplication of an ancestral EF hand with Domains 1+3 and Domains 2+4 clustering. The domains of centrins are also congruent with those of calmodulins demonstrating that, like calmodulin, centrin is an ancient protein which arose within the ancestor of all eukaryotes via gene duplication. Phylogenetic relationships inferred from centrin-coding region comparisons mirror results of small subunit ribosomal RNA sequence analyses suggesting that centrin-coding regions are useful evolutionary markers within the green algae.
Making Ordered DNA and Protein Structures from Computer-Printed Transparency Film Cut-Outs
ERIC Educational Resources Information Center
Jittivadhna, Karnyupha; Ruenwongsa, Pintip; Panijpan, Bhinyo
2009-01-01
Instructions are given for building physical scale models of ordered structures of B-form DNA, protein [alpha]-helix, and parallel and antiparallel protein [beta]-pleated sheets made from colored computer printouts designed for transparency film sheets. Cut-outs from these sheets are easily assembled. Conventional color coding for atoms are used…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilkins, T.A.
1993-06-01
This study investigates the molecular events of vacuole ontogeny in rapidly elongated cotton plant cells. Within the DNA coding region, the cotton and carrot cDNA clones exhibit 82.2% nucleotide sequence homology; at the amino acid level cotton and carrot catalytic subunits exhibited 95.7% identity and 2.1% amino acid similarity. When aligned with the analogous sequences from yeast, the cotton protein shared only 60.5% amino acid identity and 12.7% similarity. 10 refs., 1 tab.
The mitochondrial genome of Moniliophthora roreri, the frosty pod rot pathogen of cacao.
Costa, Gustavo G L; Cabrera, Odalys G; Tiburcio, Ricardo A; Medrano, Francisco J; Carazzolle, Marcelo F; Thomazella, Daniela P T; Schuster, Stephen C; Carlson, John E; Guiltinan, Mark J; Bailey, Bryan A; Mieczkowski, Piotr; Pereira, Gonçalo A G; Meinhardt, Lyndel W
2012-05-01
In this study, we report the sequence of the mitochondrial (mt) genome of the Basidiomycete fungus Moniliophthora roreri, which is the etiologic agent of frosty pod rot of cacao (Theobroma cacao L.). We also compare it to the mtDNA from the closely-related species Moniliophthora perniciosa, which causes witches' broom disease of cacao. The 94 Kb mtDNA genome of M. roreri has a circular topology and codes for the typical 14 mt genes involved in oxidative phosphorylation. It also codes for both rRNA genes, a ribosomal protein subunit, 13 intronic open reading frames (ORFs), and a full complement of 27 tRNA genes. The conserved genes of M. roreri mtDNA are completely syntenic with homologous genes of the 109 Kb mtDNA of M. perniciosa. As in M. perniciosa, M. roreri mtDNA contains a high number of hypothetical ORFs (28), a remarkable feature that makes Moniliophthoras the largest reservoir of hypothetical ORFs among sequenced fungal mtDNA. Additionally, the mt genome of M. roreri has three free invertron-like linear mt plasmids, one of which is very similar to that previously described as integrated into the main M. perniciosa mtDNA molecule. Moniliophthora roreri mtDNA also has a region of suspected plasmid origin containing 15 hypothetical ORFs distributed in both strands. One of these ORFs is similar to an ORF in the mtDNA gene encoding DNA polymerase in Pleurotus ostreatus. The comparison to M. perniciosa showed that the 15 Kb difference in mtDNA sizes is mainly attributed to a lower abundance of repetitive regions in M. roreri (5.8 Kb vs 20.7 Kb). The most notable differences between M. roreri and M. perniciosa mtDNA are attributed to repeats and regions of plasmid origin. These elements might have contributed to the rapid evolution of mtDNA. Since M. roreri is the second species of the genus Moniliophthora whose mtDNA genome has been sequenced, the data presented here contribute valuable information for understanding the evolution of fungal mt genomes among closely-related species. Crown Copyright © 2012. Published by Elsevier Ltd. All rights reserved.
Molecular dynamics study of some non-hydrogen-bonding base pair DNA strands
NASA Astrophysics Data System (ADS)
Tiwari, Rakesh K.; Ojha, Rajendra P.; Tiwari, Gargi; Pandey, Vishnudatt; Mall, Vijaysree
2018-05-01
In order to elucidate the structural activity of hydrophobic modified DNA, the DMMO2-D5SICS base pair is introduced as a constituent in different sets of 12-mer and 14-mer DNA sequences for molecular dynamics (MD) simulation in explicit water solvent. The AMBER 14 force field was employed for each duplex during the 200 ns production-dynamics simulation in an orthogonal water box, using the Particle-Mesh-Ewald (PME) method under infinite periodic boundary conditions (PBC), to determine conformational parameters of the complex. The force-field parameters of the modified base pair were calculated with the Gaussian code using Hartree-Fock ab initio methodology. RMSD results reveal that the conformation of the duplex is sequence dependent and that the binding energy of the complex depends on the position of the modified base pair in the nucleic acid strand. We found that non-bonding energy made a significant contribution to stabilising this type of duplex in comparison to electrostatic energy. The distortion produced within strands by this type of base pair was local and destabilised the duplex integrity near the substitution; moreover, the binding energy of the duplex depends on the position of substitution of the hydrophobic base pair and on the DNA sequence, and the results strongly support the corresponding experimental study.
Conrad, Cheyenne C; Gilroyed, Brandon H; McAllister, Tim A; Reuter, Tim
2012-10-01
Non-O157 Shiga toxin-producing Escherichia coli (STEC) are gaining recognition as human pathogens, but no standardized method exists to identify them. Sequence analysis revealed that STEC can be classified on the basis of variable O antigen regions into different O serotypes. Polymerase chain reaction is a powerful technique for thorough screening and complex diagnosis for these pathogens, but requires a positive control to verify qualitative and/or quantitative DNA-fragment amplification. Due to the pathogenic nature of STEC, controls are not readily available and cell culturing of STEC reference strains requires biosafety conditions of level 2 or higher. In order to bypass this limitation, controls of stacked O-type specific DNA-fragments coding for primer recognition sites were designed to screen for nine STEC serotypes frequently associated with human infection. The synthetic controls were amplified by PCR, cloned into a plasmid vector and transferred into bacterial host cells. Plasmids amplified by bacterial expression were purified, serially diluted and tested as standards for real-time PCR using SYBR Green and TaqMan assays. Utility of synthetic DNA controls was demonstrated in conventional and real-time PCR assays and validated with DNA from natural STEC strains. Copyright © 2012 Elsevier B.V. All rights reserved.
The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.
Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R
1982-01-01
The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. PMID:6296791
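The length figures quoted in this abstract fit together arithmetically, as the short sketch below shows. It assumes that the 1251 bp reading frame includes the stop codon (an inference from 416 x 3 + 3 = 1251, not a statement in the abstract); the variable names are illustrative.

```python
READING_FRAME_BP = 1251          # reported length of the PGK reading frame
FIVE_PRIME_UTR_NT = 36           # transcription start, upstream of the start codon
THREE_PRIME_UTR_NT = (86, 93)    # transcription stop range, downstream of the stop codon

# Assumption: the reading frame counts the stop codon, so one codon is not translated.
codons = READING_FRAME_BP // 3
amino_acids = codons - 1

# Non-polyadenylated mRNA length = 5' untranslated + reading frame + 3' untranslated.
mrna_length_range = tuple(
    FIVE_PRIME_UTR_NT + READING_FRAME_BP + tail for tail in THREE_PRIME_UTR_NT
)

print(amino_acids, mrna_length_range)
```

Under that assumption the sketch reproduces both the 416 amino acids and the 1373 to 1380 nucleotide range stated in the abstract.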
Shapiro, James A
2016-06-08
The 21st century genomics-based analysis of evolutionary variation reveals a number of novel features impossible to predict when Dobzhansky and other evolutionary biologists formulated the neo-Darwinian Modern Synthesis in the middle of the last century. These include three distinct realms of cell evolution; symbiogenetic fusions forming eukaryotic cells with multiple genome compartments; horizontal organelle, virus and DNA transfers; functional organization of proteins as systems of interacting domains subject to rapid evolution by exon shuffling and exonization; distributed genome networks integrated by mobile repetitive regulatory signals; and regulation of multicellular development by non-coding lncRNAs containing repetitive sequence components. Rather than single gene traits, all phenotypes involve coordinated activity by multiple interacting cell molecules. Genomes contain abundant and functional repetitive components in addition to the unique coding sequences envisaged in the early days of molecular biology. Combinatorial coding, plus the biochemical abilities cells possess to rearrange DNA molecules, constitute a powerful toolbox for adaptive genome rewriting. That is, cells possess “Read–Write Genomes” they alter by numerous biochemical processes capable of rapidly restructuring cellular DNA molecules. Rather than viewing genome evolution as a series of accidental modifications, we can now study it as a complex biological process of active self-modification. PMID:27338490
Cryptic splice site in the complementary DNA of glucocerebrosidase causes inefficient expression.
Bukovac, Scott W; Bagshaw, Richard D; Rigat, Brigitte A; Callahan, John W; Clarke, Joe T R; Mahuran, Don J
2008-10-15
The low levels of human lysosomal glucocerebrosidase activity expressed in transiently transfected Chinese hamster ovary (CHO) cells were investigated. Reverse transcription PCR (RT-PCR) demonstrated that a significant portion of the transcribed RNA was misspliced owing to the presence of a cryptic splice site in the complementary DNA (cDNA). Missplicing results in the deletion of 179 bp of coding sequence and a premature stop codon. A repaired cDNA was constructed abolishing the splice site without changing the amino acid sequence. The level of glucocerebrosidase expression was increased sixfold. These data demonstrate that for maximum expression of any cDNA construct, the transcription products should be examined.
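One plausible reading of the link between the 179 bp deletion and the premature stop codon is a shift in reading frame, since 179 is not a multiple of three. The abstract does not spell out the mechanism, so the sketch below only checks the arithmetic; the names are illustrative.

```python
DELETED_BP = 179  # coding sequence removed by the cryptic splice event

# A deletion preserves the reading frame only if its length is a multiple of 3.
remainder = DELETED_BP % 3
frame_shifted = remainder != 0

print(f"remainder = {remainder}, frame shifted = {frame_shifted}")
```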
Wijetunga, N. Ari; Belbin, Thomas J.; Burk, Robert D.; Whitney, Kathleen; Abadi, Maria; Greally, John M.; Einstein, Mark H.; Schlecht, Nicolas F.
2016-01-01
Objective To conduct a comprehensive mapping of the genomic DNA methylation in CDKN2A, which codes for the p16INK4A and p14ARF proteins, and 14 of the most promising DNA methylation marker candidates previously reported to be associated with progression of low-grade cervical intraepithelial neoplasia (CIN1) to cervical cancer. Methods We analyzed DNA methylation in 68 HIV-seropositive and negative women with incident CIN1, CIN2, CIN3 and invasive cervical cancer, assaying 120 CpG dinucleotide sites spanning APC, CDH1, CDH13, CDKN2A, CDKN2B, DAPK1, FHIT, GSTP1, HIC1, MGMT, MLH1, RARB, RASSF1, TERT and TIMP3 using the Illumina Infinium array. Validation was performed using high resolution mapping of the target genes with HELP-tagging for 286 CpGs, followed by fine mapping of candidate genes with targeted bisulfite sequencing. We assessed statistical differences in DNA methylation levels for each CpG locus assayed using univariate and multivariate methods correcting for multiple comparisons. Results In our discovery sample set, we identified dose-dependent differences in DNA methylation with grade of disease in CDKN2A, APC, MGMT, MLH1 and HIC1, whereas single CpG locus differences between CIN2/3 and cancer groups were seen for CDH13, DAPK1 and TERT. Only those CpGs in the gene body of CDKN2A showed a monotonic increase in methylation between persistent CIN1, CIN2, CIN3 and cancers. Conclusion Our data suggest a novel link between early cervical disease progression and DNA methylation in a region downstream of the CDKN2A transcription start site that may lead to increased p16INK4A/p14ARF expression prior to development of malignant disease. PMID:27401842
QueTAL: a suite of tools to classify and compare TAL effectors functionally and phylogenetically
Pérez-Quintero, Alvaro L.; Lamy, Léo; Gordon, Jonathan L.; Escalon, Aline; Cunnac, Sébastien; Szurek, Boris; Gagnevin, Lionel
2015-01-01
Transcription Activator-Like (TAL) effectors from Xanthomonas plant pathogenic bacteria can bind to the promoter region of plant genes and induce their expression. DNA-binding specificity is governed by a central domain made of nearly identical repeats, each determining the recognition of one base pair via two amino acid residues (a.k.a. Repeat Variable Di-residue, or RVD). Knowing how TAL effectors differ from each other within and between strains would be useful to infer functional and evolutionary relationships, but their repetitive nature precludes reliable use of traditional alignment methods. The suite QueTAL was therefore developed to offer tailored tools for comparison of TAL effector genes. The program DisTAL considers each repeat as a unit, transforms a TAL effector sequence into a sequence of coded repeats and makes pair-wise alignments between these coded sequences to construct trees. The program FuncTAL is aimed at finding TAL effectors with similar DNA-binding capabilities. It calculates correlations between position weight matrices of potential target DNA sequences predicted from the RVD sequence, and builds trees based on these correlations. The programs accurately represented phylogenetic and functional relationships between TAL effectors using either simulated or literature-curated data. When using the programs on a large set of TAL effector sequences, the DisTAL tree largely reflected the expected species phylogeny. In contrast, FuncTAL showed that TAL effectors with similar binding capabilities can be found between phylogenetically distant taxa. This suite will help users to rapidly analyse any TAL effector genes of interest and compare them to other available TAL genes and should improve our understanding of TAL effector evolution. It is available at http://bioinfo-web.mpl.ird.fr/cgi-bin2/quetal/quetal.cgi. PMID:26284082
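The repeat-as-unit idea described for DisTAL can be illustrated with a toy sketch. This is not the QueTAL implementation: the abstract does not give the scoring scheme, so a unit-cost edit distance is swapped in as a stand-in for the pair-wise alignment, and the repeat labels, effector names, and function names below are all hypothetical.

```python
def encode_repeats(effectors):
    """Give each distinct repeat one integer code, shared across all effectors."""
    codes = {}
    encoded = {}
    for name, repeats in effectors.items():
        coded = []
        for repeat in repeats:
            if repeat not in codes:
                codes[repeat] = len(codes)
            coded.append(codes[repeat])
        encoded[name] = coded
    return encoded


def edit_distance(a, b):
    """Unit-cost edit distance between two coded repeat sequences (stand-in scoring)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete a repeat
                            curr[j - 1] + 1,          # insert a repeat
                            prev[j - 1] + (x != y)))  # substitute a repeat
        prev = curr
    return prev[-1]


# Hypothetical effectors; each string stands for one whole repeat sequence.
effectors = {
    "tal1": ["r1", "r2", "r3", "r2"],
    "tal2": ["r1", "r3", "r2"],
    "tal3": ["r4", "r4", "r4", "r4"],
}
encoded = encode_repeats(effectors)
names = sorted(encoded)
distances = {
    (p, q): edit_distance(encoded[p], encoded[q])
    for i, p in enumerate(names) for q in names[i + 1:]
}
print(distances)
```

A distance matrix of this kind could then be passed to any standard tree-building method, which is the step the abstract refers to as constructing trees.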
Carneiro, Marcia W.; Miranda, José Carlos; Clarêncio, Jorge; Barral-Netto, Manoel; Brodskyn, Cláudia; Barral, Aldina; Ribeiro, José M. C.; Valenzuela, Jesus G.; de Oliveira, Camila I.
2013-01-01
Background Leishmania parasites are transmitted in the presence of sand fly saliva. Together with the parasite, the sand fly injects salivary components that change the environment at the feeding site. Mice immunized with Phlebotomus papatasi salivary gland (SG) homogenate are protected against Leishmania major infection, while immunity to Lutzomyia intermedia SG homogenate exacerbated experimental Leishmania braziliensis infection. In humans, antibodies to Lu. intermedia saliva are associated with risk of acquiring L. braziliensis infection. Despite these important findings, there is no information regarding the repertoire of Lu. intermedia salivary proteins. Methods and Findings A cDNA library from the Salivary Glands (SGs) of wild-caught Lu. intermedia was constructed, sequenced, and complemented by a proteomic approach based on 1D SDS PAGE and mass/mass spectrometry to validate the transcripts present in this cDNA library. We identified the most abundant transcripts and proteins reported in other sand fly species as well as novel proteins such as neurotoxin-like proteins, peptides with ML domain, and three small peptides found so far only in this sand fly species. DNA plasmids coding for ten selected transcripts were constructed and used to immunize BALB/c mice to study their immunogenicity. Plasmid Linb-11—coding for a 4.5-kDa protein—induced a cellular immune response and conferred protection against L. braziliensis infection. This protection correlated with a decreased parasite load and an increased frequency of IFN-γ-producing cells. Conclusions We identified the most abundant and novel proteins present in the SGs of Lu. intermedia, a vector of cutaneous leishmaniasis in the Americas. We also show for the first time that immunity to a single salivary protein from Lu. intermedia can protect against cutaneous leishmaniasis caused by L. braziliensis. PMID:23717705