Single-Molecule Electrical Random Resequencing of DNA and RNA
NASA Astrophysics Data System (ADS)
Ohshiro, Takahito; Matsubara, Kazuki; Tsutsui, Makusu; Furuhashi, Masayuki; Taniguchi, Masateru; Kawai, Tomoji
2012-07-01
Two paradigm shifts in DNA sequencing technologies--from bulk to single molecules and from optical to electrical detection--are expected to realize label-free, low-cost DNA sequencing that does not require PCR amplification. It will lead to development of high-throughput third-generation sequencing technologies for personalized medicine. Although nanopore devices have been proposed as third-generation DNA-sequencing devices, a significant milestone in these technologies has been attained by demonstrating a novel technique for resequencing DNA using electrical signals. Here we report single-molecule electrical resequencing of DNA and RNA using a hybrid method of identifying single-base molecules via tunneling currents and random sequencing. Our method reads sequences of nine types of DNA oligomers. The complete sequence of 5'-UGAGGUA-3' from the let-7 microRNA family was also identified by creating a composite of overlapping fragment sequences, which was randomly determined using tunneling current conducted by single-base molecules as they passed between a pair of nanoelectrodes.
Shinozuka, Hiroshi; Cogan, Noel O I; Shinozuka, Maiko; Marshall, Alexis; Kay, Pippa; Lin, Yi-Han; Spangenberg, German C; Forster, John W
2015-04-11
Fragmentation at random nucleotide locations is an essential process for preparation of DNA libraries to be used on massively parallel short-read DNA sequencing platforms. Although instruments for physical shearing, such as the Covaris S2 focused-ultrasonicator system, and products for enzymatic shearing, such as the Nextera technology and NEBNext dsDNA Fragmentase kit, are commercially available, a simple and inexpensive method is desirable for high-throughput sequencing library preparation. MspJI is a recently characterised restriction enzyme which recognises the sequence motif CNNR (where R = G or A) when the first base is modified to 5-methylcytosine or 5-hydroxymethylcytosine. A semi-random enzymatic DNA amplicon fragmentation method was developed based on the unique cleavage properties of MspJI. In this method, random incorporation of 5-methyl-2'-deoxycytidine-5'-triphosphate is achieved through DNA amplification with DNA polymerase, followed by DNA digestion with MspJI. Due to the recognition sequence of the enzyme, DNA amplicons are fragmented in a relatively sequence-independent manner. The size range of the resulting fragments could be controlled by optimising the 5-methyl-2'-deoxycytidine-5'-triphosphate concentration in the reaction mixture. A library suitable for sequencing using the Illumina MiSeq platform was prepared and processed using the proposed method. Alignment of generated short reads to a reference sequence demonstrated a relatively high level of random fragmentation. The proposed method may be performed with standard laboratory equipment. Although the uniformity of coverage was slightly inferior to the Covaris physical shearing procedure, due to efficiencies of cost and labour, the method may be more suitable than existing approaches for implementation in large-scale sequencing activities, such as bacterial artificial chromosome (BAC)-based genome sequence assembly, pan-genomic studies and locus-targeted genotyping-by-sequencing.
Sequence and Structure Dependent DNA-DNA Interactions
NASA Astrophysics Data System (ADS)
Kopchick, Benjamin; Qiu, Xiangyun
Molecular forces between dsDNA strands are largely dominated by electrostatics and have been extensively studied. Quantitative knowledge has been accumulated on how DNA-DNA interactions are modulated by varied biological constituents such as ions, cationic ligands, and proteins. Despite its central role in biology, the sequence of DNA has not received substantial attention, and "random" DNA sequences are typically used in biophysical studies. However, ~50% of the human genome is composed of non-random-sequence DNAs, particularly repetitive sequences. Furthermore, covalent modifications of DNA such as methylation play key roles in gene functions. Such DNAs with specific sequences or modifications often take on structures other than the canonical B-form. Here we present a series of quantitative measurements of the DNA-DNA forces with the osmotic stress method on different DNA sequences, from short repeats to the most frequent sequences in the genome, and to modifications such as bromination and methylation. We observe peculiar behaviors that appear to be strongly correlated with the incurred structural changes. We speculate on the causes in terms of differences in hydration shells and DNA surface structures.
Simulations Using Random-Generated DNA and RNA Sequences
ERIC Educational Resources Information Center
Bryce, C. F. A.
1977-01-01
Using a very simple computer program written in BASIC, a very large number of random-generated DNA or RNA sequences are obtained. Students use these sequences to predict complementary sequences and translational products, evaluate base compositions, determine frequencies of particular triplet codons, and suggest possible secondary structures.…
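The exercises listed in this abstract can be sketched in a few lines of Python rather than BASIC. This is only an illustrative sketch: the function names are hypothetical, bases are assumed to be drawn uniformly, and nothing here is taken from the original 1977 program.

```python
import random
from collections import Counter

DNA_BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def random_dna(length, seed=None):
    """Return a random DNA string; each base is drawn uniformly (an assumption)."""
    rng = random.Random(seed)
    return "".join(rng.choice(DNA_BASES) for _ in range(length))


def reverse_complement(seq):
    """Complement every base, then reverse to read the other strand 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def transcribe(seq):
    """Convert a coding-strand DNA sequence to RNA by replacing T with U."""
    return seq.replace("T", "U")


def base_composition(seq):
    """Fraction of each base in the sequence."""
    counts = Counter(seq)
    return {base: counts.get(base, 0) / len(seq) for base in DNA_BASES}


def codon_counts(seq):
    """Count non-overlapping triplets in reading frame 0; trailing bases are ignored."""
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
```

A student exercise might generate `random_dna(60, seed=1)` and compare `base_composition` of the result with the expected 25% per base.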
Competition between B-Z and B-L transitions in a single DNA molecule: Computational studies
NASA Astrophysics Data System (ADS)
Kwon, Ah-Young; Nam, Gi-Moon; Johner, Albert; Kim, Seyong; Hong, Seok-Cheol; Lee, Nam-Kyung
2016-02-01
Under negative torsion, DNA adopts left-handed helical forms, such as Z-DNA and L-DNA. Using the random copolymer model developed for a wormlike chain, we represent a single DNA molecule with structural heterogeneity as a helical chain consisting of monomers which can be characterized by different helical senses and pitches. By Monte Carlo simulation, where we take into account bending and twist fluctuations explicitly, we study the sequence dependence of B-Z transitions under torsional stress and tension, focusing on the interaction with B-L transitions. We consider core sequences, (GC)n repeats or (TG)n repeats, which can interconvert between the right-handed B form and the left-handed Z form, embedded in a random sequence, which can convert to the left-handed L form with a different (tension-dependent) helical pitch. We show that Z-DNA formation from the (GC)n sequence is always supported by unwinding torsional stress, but Z-DNA formation from (TG)n sequences, which are more costly to convert but numerous, can be strongly influenced by the quenched disorder in the surrounding random sequence.
Systematic Evaluation of the Dependence of Deoxyribozyme Catalysis on Random Region Length
Velez, Tania E.; Singh, Jaydeep; Xiao, Ying; Allen, Emily C.; Wong, On Yi; Chandra, Madhavaiah; Kwon, Sarah C.; Silverman, Scott K.
2012-01-01
Functional nucleic acids are DNA and RNA aptamers that bind targets, or they are deoxyribozymes and ribozymes that have catalytic activity. These functional DNA and RNA sequences can be identified from random-sequence pools by in vitro selection, which requires choosing the length of the random region. Shorter random regions allow more complete coverage of sequence space but may not permit the structural complexity necessary for binding or catalysis. In contrast, longer random regions are sampled incompletely but may allow adoption of more complicated structures that enable function. In this study, we systematically examined random region length (N20 through N60) for two particular deoxyribozyme catalytic activities, DNA cleavage and tyrosine-RNA nucleopeptide linkage formation. For both activities, we previously identified deoxyribozymes using only N40 regions. In the case of DNA cleavage, here we found that shorter N20 and N30 regions allowed robust catalytic function, either by DNA hydrolysis or by DNA deglycosylation and strand scission via β-elimination, whereas longer N50 and N60 regions did not lead to catalytically active DNA sequences. Follow-up selections with N20, N30, and N40 regions revealed an interesting interplay of metal ion cofactors and random region length. Separately, for Tyr-RNA linkage formation, N30 and N60 regions provided catalytically active sequences, whereas N20 was unsuccessful, and the N40 deoxyribozymes were functionally superior (in terms of rate and yield) to N30 and N60. Collectively, the results indicate that with future in vitro selection experiments for DNA and RNA catalysts, and by extension for aptamers, random region length should be an important experimental variable. PMID:23088677
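The trade-off between random region length and coverage of sequence space can be made concrete with a little arithmetic. The sketch below assumes four equally likely bases per position and a hypothetical starting pool of 1e14 molecules; the pool size is an illustrative assumption, not a figure from the abstract.

```python
import math


def sequence_space_size(random_region_length):
    """Number of distinct sequences for a random region of the given length."""
    return 4 ** random_region_length


def expected_fraction_sampled(random_region_length, pool_molecules):
    """Expected fraction of sequence space present in the pool.

    Assumes each molecule is an independent uniform draw, so the expected
    fraction of distinct sequences seen is 1 - exp(-pool / space).
    """
    ratio = pool_molecules / sequence_space_size(random_region_length)
    return -math.expm1(-ratio)


if __name__ == "__main__":
    pool = 1e14  # hypothetical pool size, chosen only for illustration
    for n in (20, 30, 40, 50, 60):
        print(f"N{n}: space = {sequence_space_size(n):.3e}, "
              f"fraction sampled = {expected_fraction_sampled(n, pool):.3e}")
```

Under these assumptions an N20 pool is sampled essentially completely, while an N40 pool covers only a tiny fraction of its sequence space, which is the tension the abstract describes.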
Long-range correlations and charge transport properties of DNA sequences
NASA Astrophysics Data System (ADS)
Liu, Xiao-liang; Ren, Yi; Xie, Qiong-tao; Deng, Chao-sheng; Xu, Hui
2010-04-01
By using Hurst's analysis and transfer approach, the rescaled range functions and Hurst exponents of human chromosome 22 and enterobacteria phage lambda DNA sequences are investigated and the transmission coefficients, Landauer resistances and Lyapunov coefficients of finite segments based on above genomic DNA sequences are calculated. In a comparison with quasiperiodic and random artificial DNA sequences, we find that λ-DNA exhibits anticorrelation behavior characterized by a Hurst exponent 0.5
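As a rough illustration of rescaled range (R/S) analysis, the sketch below estimates a Hurst exponent from a numeric series derived from a DNA string. The purine/pyrimidine mapping, the window sizes, and all function names are assumptions made for this example; the abstract does not specify how the authors encoded the sequences.

```python
import math
import random


def dna_to_walk_steps(seq):
    """Map purines (A, G) to +1 and pyrimidines (C, T) to -1 (an assumed encoding)."""
    return [1 if base in "AG" else -1 for base in seq]


def rescaled_range(values):
    """R/S statistic: range of cumulative mean-adjusted sums over the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    cumulative = 0.0
    lowest = highest = 0.0
    for v in values:
        cumulative += v - mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return (highest - lowest) / std if std > 0 else None


def hurst_exponent(values, window_sizes=(16, 32, 64, 128, 256)):
    """Slope of log(mean R/S) against log(window size), by least squares.

    Needs at least two window sizes that fit inside the series.
    """
    xs, ys = [], []
    for w in window_sizes:
        stats = [rescaled_range(values[i:i + w])
                 for i in range(0, len(values) - w + 1, w)]
        stats = [s for s in stats if s]  # drop constant windows
        if stats:
            xs.append(math.log(w))
            ys.append(math.log(sum(stats) / len(stats)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope_den = sum((x - mean_x) ** 2 for x in xs)
    return slope_num / slope_den


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(4096))
    print(hurst_exponent(dna_to_walk_steps(seq)))
```

For an uncorrelated random sequence the estimate should fall near 0.5, although short windows are known to bias R/S estimates upward somewhat.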
Rényi continuous entropy of DNA sequences.
Vinga, Susana; Almeida, Jonas S
2004-12-07
Entropy measures of DNA sequences estimate their randomness or, inversely, their repeatability. L-block Shannon discrete entropy accounts for the empirical distribution of all length-L words and has convergence problems for finite sequences. A new entropy measure that extends Shannon's formalism is proposed. Rényi's quadratic entropy, calculated with the Parzen window density estimation method applied to CGR/USM continuous maps of DNA sequences, constitutes a novel technique to evaluate global sequence randomness without some of the drawbacks of the former method. The asymptotic behaviour of this new measure was analytically deduced and the calculation of entropies for several synthetic and experimental biological sequences was performed. The results obtained were compared with the distributions of the null model of randomness obtained by simulation. The biological sequences have shown a different p-value according to the kernel resolution of Parzen's method, which might indicate an unknown level of organization of their patterns. This new technique can be very useful in the study of DNA sequence complexity and provide additional tools for DNA entropy estimation. The main MATLAB applications developed and additional material are available at the webpage . Specialized functions can be obtained from the authors.
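For reference, the L-block Shannon entropy that this abstract takes as its starting point can be sketched as follows. This is the discrete baseline only, not the Rényi/Parzen method the authors propose; the use of overlapping words and the function name are assumptions of this example.

```python
import math
from collections import Counter


def block_entropy(seq, block_length):
    """Shannon entropy (in bits) of the empirical distribution of length-L words.

    Words are taken with a sliding window of step 1 (overlapping), which is
    one common convention and is assumed here.
    """
    words = [seq[i:i + block_length]
             for i in range(len(seq) - block_length + 1)]
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

The convergence problem mentioned in the abstract shows up directly: a sequence of length n contains at most n - L + 1 words, so once 4^L approaches n the empirical distribution is badly undersampled and the estimate is biased low.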
Hiding message into DNA sequence through DNA coding and chaotic maps.
Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman
2014-09-01
The paper proposes an improved reversible substitution method for hiding data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance robustness and enlarge the hiding capacity: encoding the secret message by DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than competing methods with respect to robustness and capacity.
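Two of the ingredients named here, DNA coding and a piecewise linear chaotic map (PWLCM), can be illustrated with a small sketch. The 2-bit coding table, the way map values are turned into positions, and every function name below are assumptions for illustration; the abstract does not give the paper's actual coding rule, key schedule, or embedding procedure.

```python
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}  # assumed coding rule
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bytes_as_dna(data):
    """Encode each byte as four bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_dna_to_bytes(dna):
    """Inverse of encode_bytes_as_dna."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def pwlcm(x, p):
    """One step of the standard PWLCM on [0, 1] with control parameter 0 < p < 0.5."""
    if x >= 0.5:
        x = 1.0 - x  # the map is symmetric about 0.5
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)


def hiding_locations(x0, p, count, cover_length, max_steps=100000):
    """Derive distinct positions in a cover sequence from the chaotic orbit.

    Scaling map values to indices is a hypothetical choice for this sketch.
    """
    locations, seen = [], set()
    x = x0
    for _ in range(max_steps):
        if len(locations) == count:
            return locations
        x = pwlcm(x, p)
        index = min(int(x * cover_length), cover_length - 1)
        if index not in seen:
            seen.add(index)
            locations.append(index)
    raise ValueError("orbit did not yield enough distinct locations")
```

In a scheme of this kind the pair `(x0, p)` would act as part of the secret key, since the receiver must regenerate the same orbit to find the hidden bases.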
DNA-based random number generation in security circuitry.
Gearheart, Christy M; Arazi, Benjamin; Rouchka, Eric C
2010-06-01
DNA-based circuit design is an area of research in which traditional silicon-based technologies are replaced by naturally occurring phenomena taken from biochemistry and molecular biology. This research focuses on further developing DNA-based methodologies to mimic digital data manipulation. While exhibiting fundamental principles, this work was done in conjunction with the vision that DNA-based circuitry, when the technology matures, will form the basis for a tamper-proof security module, revolutionizing the meaning and concept of tamper-proofing and possibly preventing it altogether based on accurate scientific observations. A paramount part of such a solution would be self-generation of random numbers. A novel prototype schema employs solid phase synthesis of oligonucleotides for random construction of DNA sequences; temporary storage and retrieval is achieved through plasmid vectors. A discussion of how to evaluate sequence randomness is included, as well as how these techniques are applied to a simulation of the random number generation circuitry. Simulation results show generated sequences successfully pass three selected NIST random number generation tests specified for security applications.
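The abstract does not say which three NIST tests were used. As an illustration of what evaluating sequence randomness can look like, the sketch below applies the frequency (monobit) test from NIST SP 800-22 to a DNA string; the 2-bit base encoding is an assumption of this example.

```python
import math

BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}  # assumed encoding


def dna_to_bits(seq):
    """Convert a DNA string to a bit string, two bits per base."""
    return "".join(BASE_TO_BITS[base] for base in seq)


def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test p-value for a string of '0'/'1'."""
    n = len(bits)
    partial_sum = sum(1 if bit == "1" else -1 for bit in bits)
    s_obs = abs(partial_sum) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def passes_monobit(bits, alpha=0.01):
    """The sequence is accepted as random when the p-value is at least alpha."""
    return monobit_p_value(bits) >= alpha
```

The monobit test only checks the balance of ones and zeros, so a perfectly periodic string such as `"10" * 50` passes it; that is why test suites combine several complementary tests.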
Portable and Error-Free DNA-Based Data Storage.
Yazdi, S M Hossein Tabatabaei; Gabrys, Ryan; Milenkovic, Olgica
2017-07-10
DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency. The basic system implementation steps include synthesizing DNA strings that contain user information and subsequently retrieving them via high-throughput sequencing technologies. Existing architectures enable reading and writing but do not offer random-access and error-free data recovery from low-cost, portable devices, which is crucial for making the storage technology competitive with classical recorders. Here we show for the first time that a portable, random-access platform may be implemented in practice using nanopore sequencers. The novelty of our approach is to design an integrated processing pipeline that encodes data to avoid costly synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable sequencing via new iterative alignment and deletion error-correcting codes. Our work represents the only known random access DNA-based data storage system that uses error-prone nanopore sequencers, while still producing error-free readouts with the highest reported information rate/density. As such, it represents a crucial step towards practical employment of DNA molecules as storage media.
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis
Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab
2012-01-01
RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information (the total number of nucleotide bases and the A, T, G and C base contents along with their respective percentages), and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available. Availability: http://www.cemb.edu.pk/sw.html Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language. PMID:23055611
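For readers unfamiliar with the Nussinov algorithm mentioned above, here is a minimal sketch of the textbook version, which maximizes the number of base pairs in a single strand. It is not RDNAnalyzer's extended implementation (the tool itself is written in C#); the Watson-Crick-only pairing rule and the minimum loop length of 3 are assumptions of this example.

```python
DNA_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, textbook Nussinov recurrence.

    dp[i][j] holds the best pair count for seq[i..j]. A pair (k, j) is only
    allowed when more than min_loop - 1 unpaired bases can sit between them.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j is left unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in DNA_PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

For the hairpin-like string `GGGAAATCCC` this returns 3, corresponding to the three G-C pairs closing a four-base loop. A full predictor would add a traceback step to recover which bases pair.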
DNA/RNA hybrid substrates modulate the catalytic activity of purified AID.
Abdouni, Hala S; King, Justin J; Ghorbani, Atefeh; Fifield, Heather; Berghuis, Lesley; Larijani, Mani
2018-01-01
Activation-induced cytidine deaminase (AID) converts cytidine to uridine at Immunoglobulin (Ig) loci, initiating somatic hypermutation and class switching of antibodies. In vitro, AID acts on single stranded DNA (ssDNA), but neither double-stranded DNA (dsDNA) oligonucleotides nor RNA, and it is believed that transcription is the in vivo generator of ssDNA targeted by AID. It is also known that the Ig loci, particularly the switch (S) regions targeted by AID are rich in transcription-generated DNA/RNA hybrids. Here, we examined the binding and catalytic behavior of purified AID on DNA/RNA hybrid substrates bearing either random sequences or GC-rich sequences simulating Ig S regions. If substrates were made up of a random sequence, AID preferred substrates composed entirely of DNA over DNA/RNA hybrids. In contrast, if substrates were composed of S region sequences, AID preferred to mutate DNA/RNA hybrids over substrates composed entirely of DNA. Accordingly, AID exhibited a significantly higher affinity for binding DNA/RNA hybrid substrates composed specifically of S region sequences, than any other substrates composed of DNA. Thus, in the absence of any other cellular processes or factors, AID itself favors binding and mutating DNA/RNA hybrids composed of S region sequences. AID:DNA/RNA complex formation and supporting mutational analyses suggest that recognition of DNA/RNA hybrids is an inherent structural property of AID.
Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso
2015-07-01
In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification task is performed with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach an accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches an accuracy of 20.9%. Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments.
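A spectral (k-mer) representation of the kind mentioned in this abstract can be sketched as a fixed-length count vector. The sliding-window counting and the lexicographic ordering of k-mers below are assumptions for illustration; the abstract does not describe the authors' exact feature construction or how the signature k-mers are selected.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(seq, k):
    """Return counts of every possible k-mer over ACGT, in lexicographic order.

    k-mers containing other symbols (for example N) are simply not counted.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get("".join(kmer), 0) for kmer in product("ACGT", repeat=k)]


def random_fragment_spectrum(seq, k, fragment_length, rng):
    """Spectrum of a randomly placed fragment, mimicking short-read input.

    rng is any object with a randrange method, such as random.Random(seed).
    """
    start = rng.randrange(len(seq) - fragment_length + 1)
    return kmer_spectrum(seq[start:start + fragment_length], k)
```

Because the vector has 4^k entries regardless of sequence length, full-length barcodes and 200-bp fragments land in the same feature space, which is what makes alignment-free comparison possible.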
Recombination of polynucleotide sequences using random or defined primers
Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin H; Giver, Lorraine J.
2000-01-01
A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Recombination of polynucleotide sequences using random or defined primers
Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin; Giver, Lorraine J.
2001-01-01
A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
PuLSE: Quality control and quantification of peptide sequences explored by phage display libraries.
Shave, Steven; Mann, Stefan; Koszela, Joanna; Kerr, Alastair; Auer, Manfred
2018-01-01
The design of highly diverse phage display libraries is based on the assumption that DNA bases are incorporated at similar rates within the randomized sequence. As library complexity increases and expected copy numbers of unique sequences decrease, the exploration of library space becomes sparser and the presence of truly random sequences becomes critical. We present the program PuLSE (Phage Library Sequence Evaluation) as a tool for assessing randomness and therefore diversity of phage display libraries. PuLSE runs on a collection of sequence reads in the fastq file format and generates tables profiling the library in terms of unique DNA sequence counts and positions, translated peptide sequences, and normalized 'expected' occurrences from base to residue codon frequencies. The output allows at-a-glance quantitative quality control of a phage library in terms of sequence coverage both at the DNA base and translated protein residue level, which has been missing from toolsets and literature. The open source program PuLSE is available in two formats: a C++ source code package for compilation and integration into existing bioinformatics pipelines, and precompiled binaries for ease of use.
Distribution and sequence homogeneity of an abundant satellite DNA in the beetle, Tenebrio molitor.
Davis, C A; Wyatt, G R
1989-01-01
The mealworm beetle, Tenebrio molitor, contains an unusually abundant and homogeneous satellite DNA which constitutes up to 60% of its genome. The satellite DNA is shown to be present in all of the chromosomes by in situ hybridization. 18 dimers of the repeat unit were cloned and sequenced. The consensus sequence is 142 nt long and lacks any internal repeat structure. Monomers of the sequence are very similar, showing on average a 2% divergence from the calculated consensus. Variant nucleotides are scattered randomly throughout the sequence although some variants are more common than others. Neighboring repeat units are no more alike than randomly chosen ones. The results suggest that some mechanism, perhaps gene conversion, is acting to maintain the homogeneity of the satellite DNA despite its abundance and distribution on all of the chromosomes. PMID:2762148
Theory on the mechanism of site-specific DNA-protein interactions in the presence of traps
NASA Astrophysics Data System (ADS)
Niranjani, G.; Murugan, R.
2016-08-01
The speed of site-specific binding of transcription factor (TF) proteins with genomic DNA seems to be strongly retarded by randomly occurring sequence traps. Traps are DNA sequences that share significant similarity with the original specific binding sites (SBSs). It is an intriguing question how naturally occurring TFs and their SBSs are designed to manage the retarding effects of such randomly occurring traps. We develop a simple random walk model of the site-specific binding of TFs with genomic DNA in the presence of sequence traps. Our dynamical model predicts that (a) the retarding effects of traps will be minimal when the traps are arranged around the SBS such that there is a negative correlation between the binding strength of TFs with traps and the distance of traps from the SBS and (b) the retarding effects of sequence traps can be appeased by the condensed conformational state of DNA. Our computational analysis of the distribution of sequence traps around the putative binding sites of various TFs in the mouse and human genomes agrees well with the theoretical predictions. We propose that the distribution of traps can be used as an additional metric to efficiently identify the SBSs of TFs on genomic DNA.
Transposon facilitated DNA sequencing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berg, D.E.; Berg, C.M.; Huang, H.V.
1990-01-01
The purpose of this research is to investigate and develop methods that exploit the power of bacterial transposable elements for large-scale DNA sequencing. Our premise is that the use of transposons to put primer binding sites randomly in target DNAs should provide access to all portions of large DNA fragments, without the inefficiencies of methods involving random subcloning and attendant repetitive sequencing, or of sequential synthesis of many oligonucleotide primers that are used to march systematically along a DNA molecule. Two unrelated bacterial transposons, Tn5 and γδ, are being used because they have both proven useful for molecular analyses, and because they differ sufficiently in mechanism and specificity of transposition to merit parallel development.
NASA Astrophysics Data System (ADS)
Tsao, Shih-Ming; Lai, Ji-Ching; Horng, Horng-Er; Liu, Tu-Chen; Hong, Chin-Yih
2017-04-01
Aptamers are oligonucleotides that can bind to specific target molecules. Most aptamers are generated using random libraries in the standard systematic evolution of ligands by exponential enrichment (SELEX). Each random library contains oligonucleotides with a randomized central region and two fixed primer regions at both ends. The fixed primer regions are necessary for amplifying target-bound sequences by PCR. However, these extra sequences may cause non-specific binding, which potentially interferes with good binding for random sequences. The Magnetic-Assisted Rapid Aptamer Selection (MARAS) is a newly developed protocol for generating single-strand DNA aptamers. No repeat selection cycle is required in the protocol. This study proposes and demonstrates a method to isolate aptamers for C-reactive proteins (CRP) from a randomized ssDNA library containing no fixed sequences at the 5′ and 3′ termini using the MARAS platform. Furthermore, the isolated primer-free aptamer was sequenced and its binding affinity for CRP was analyzed. The specificity of the obtained aptamer was validated using blind serum samples. The result was consistent with monoclonal antibody-based nephelometry analysis, which indicated that a primer-free aptamer has high specificity toward targets. MARAS is a feasible platform for efficiently generating primer-free aptamers for clinical diagnoses.
Hoshino, Tatsuhiko; Inagaki, Fumio
2017-01-01
Next-generation sequencing (NGS) is a powerful tool for analyzing environmental DNA and provides a comprehensive molecular view of microbial communities. To obtain the copy number of particular sequences in the NGS library, however, additional quantitative analysis such as quantitative PCR (qPCR) or digital PCR (dPCR) is required. Furthermore, the number of sequences in a sequence library does not always reflect the original copy number of a target gene because of biases caused by PCR amplification, making it difficult to convert the proportion of particular sequences in the NGS library to the copy number using the mass of input DNA. To address this issue, we applied a stochastic labeling approach with random-tag sequences and developed an NGS-based quantification protocol, which enables simultaneous sequencing and quantification of the targeted DNA. This quantitative sequencing (qSeq) is initiated from single-primer extension (SPE) using a primer with a random tag adjacent to the 5' end of the target-specific sequence. During SPE, each DNA molecule is stochastically labeled with the random tag. Subsequently, first-round PCR is conducted, specifically targeting the SPE product, followed by second-round PCR to index for NGS. The number of random tags is only determined during the SPE step and is therefore not affected by the two rounds of PCR that may introduce amplification biases. In the case of 16S rRNA genes, after NGS sequencing and taxonomic classification, the absolute copy number of the 16S rRNA gene of each target phylotype can be estimated by Poisson statistics by counting the random tags incorporated at the end of the sequence. To test the feasibility of this approach, the 16S rRNA gene of Sulfolobus tokodaii was subjected to qSeq, which resulted in accurate quantification of 5.0 × 10^3 to 5.0 × 10^4 copies of the 16S rRNA gene. Furthermore, qSeq was applied to mock microbial communities and environmental samples, and the results were comparable to those obtained using digital PCR and relative abundance based on a standard sequence library. We demonstrated that the qSeq protocol proposed here is advantageous for providing less-biased absolute copy numbers of each target DNA simultaneously with NGS sequencing. With this new experimental scheme in microbial ecology, microbial community compositions can be explored in a more quantitative manner, thus expanding our knowledge of microbial ecosystems in natural environments.
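The Poisson-statistics step can be illustrated with a standard occupancy argument: if N molecules each receive one of M equally likely tags, the expected number of distinct tags observed is M(1 - exp(-N/M)), which can be inverted to estimate N. The sketch below assumes a fully random tag of four bases per position and is only a generic estimator; the abstract does not give the exact formula used in the qSeq protocol.

```python
import math


def estimate_copy_number(distinct_tags_observed, tag_length):
    """Estimate labeled molecule count from the number of distinct random tags.

    Inverts k = M * (1 - exp(-N / M)) with M = 4 ** tag_length, assuming every
    tag is equally likely (an idealization of real primer synthesis).
    """
    tag_space = 4 ** tag_length
    if distinct_tags_observed >= tag_space:
        raise ValueError("tag space is saturated; copy number cannot be estimated")
    return -tag_space * math.log1p(-distinct_tags_observed / tag_space)
```

When the number of observed tags is small compared with the tag space, the estimate is close to the raw tag count; the correction grows as more molecules are expected to share a tag by chance.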
Random access in large-scale DNA data storage.
Organick, Lee; Ang, Siena Dumas; Chen, Yuan-Jyue; Lopez, Randolph; Yekhanin, Sergey; Makarychev, Konstantin; Racz, Miklos Z; Kamath, Govinda; Gopalan, Parikshit; Nguyen, Bichlien; Takahashi, Christopher N; Newman, Sharon; Parker, Hsing-Yeh; Rashtchian, Cyrus; Stewart, Kendall; Gupta, Gagan; Carlson, Robert; Mulligan, John; Carmean, Douglas; Seelig, Georg; Ceze, Luis; Strauss, Karin
2018-03-01
Synthetic DNA is durable and can encode digital data with high density, making it an attractive medium for data storage. However, recovering stored data at large scale currently requires all the DNA in a pool to be sequenced, even if only a subset of the information needs to be extracted. Here, we encode and store 35 distinct files (over 200 MB of data) in more than 13 million DNA oligonucleotides, and show that we can recover each file individually and with no errors, using a random access approach. We design and validate a large library of primers that enable individual recovery of all files stored within the DNA. We also develop an algorithm that greatly reduces the sequencing read coverage required for error-free decoding by maximizing information from all sequence reads. These advances demonstrate a viable, large-scale system for DNA data storage and retrieval.
Estimating Genomic Distance from DNA Sequence Location in Cell Nuclei by a Random Walk Model
NASA Astrophysics Data System (ADS)
van den Engh, Ger; Sachs, Rainer; Trask, Barbara J.
1992-09-01
The folding of chromatin in interphase cell nuclei was studied by fluorescent in situ hybridization with pairs of unique DNA sequence probes. The sites of DNA sequences separated by 100 to 2000 kilobase pairs (kbp) are distributed in interphase chromatin according to a random walk model. This model provides the basis for calculating the spacing of sequences along the linear DNA molecule from interphase distance measurements. An interphase mapping strategy based on this model was tested with 13 probes from a 4-megabase pair (Mbp) region of chromosome 4 containing the Huntington disease locus. The results confirmed the locations of the probes and showed that the remaining gap in the published maps of this region is negligible in size. Interphase distance measurements should facilitate construction of chromosome maps with an average marker density of one per 100 kbp, approximately ten times greater than that achieved by hybridization to metaphase chromosomes.
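The mapping idea can be sketched under the usual random-walk property that the mean-square physical distance between two sites grows linearly with their genomic separation. The calibration values below are hypothetical, not data from the study, and the helper names are illustrative only.

```python
def fit_slope(separations_kbp, mean_sq_distances_um2):
    """Least-squares slope through the origin for <r^2> = slope * separation."""
    numerator = sum(d * r2 for d, r2 in zip(separations_kbp, mean_sq_distances_um2))
    denominator = sum(d * d for d in separations_kbp)
    return numerator / denominator


def estimate_separation_kbp(mean_sq_distance_um2, slope):
    """Invert the random-walk relation to estimate genomic separation."""
    return mean_sq_distance_um2 / slope


# Hypothetical calibration: probe pairs with known separations (kbp)
# and their measured mean-square interphase distances (um^2).
known_kbp = [100, 500, 1000, 2000]
measured_um2 = [0.2, 1.0, 2.0, 4.0]

slope = fit_slope(known_kbp, measured_um2)
print(estimate_separation_kbp(3.0, slope))
```

The abstract reports that this relation holds for separations of 100 to 2000 kbp, so a sketch like this should not be extrapolated beyond that range.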
Xian, Zhi-Hong; Cong, Wen-Ming; Zhang, Shu-Hui; Wu, Meng-Chao
2005-01-01
AIM: To study the genetic alterations and their association with clinicopathological characteristics of hepatocellular carcinoma (HCC), and to find the tumor related DNA fragments. METHODS: DNA isolated from tumors and corresponding noncancerous liver tissues of 56 HCC patients was amplified by random amplified polymorphic DNA (RAPD) with 10 random 10-mer arbitrary primers. The RAPD bands showing obvious differences in tumor tissue DNA relative to that of normal tissue were separated, purified, cloned and sequenced. DNA sequences were analyzed and compared with GenBank data. RESULTS: A total of 56 cases of HCC were demonstrated to have genetic alterations, which were detected by at least one primer. The detectability of genetic alterations ranged from 20% to 70% in each case, and from 17.9% to 50% for each primer. Serum HBV infection, tumor size, histological grade, tumor capsule, as well as tumor intrahepatic metastasis, might be correlated with genetic alterations on certain primers. A band of amplified fragments of about 480 bp, with higher intensity in tumor DNA relative to normal DNA, could be seen in 27 of 56 tumor samples using primer 4. Sequence analysis of these fragments showed 91% homology with Homo sapiens double homeobox protein DUX10 gene. CONCLUSION: Genetic alterations are a frequent event in HCC, and tumor related DNA fragments have been found in this study, which may be associated with hepatocarcinogenesis. RAPD is an effective method for the identification and analysis of genetic alterations in HCC, and may provide new information for further evaluating the molecular mechanism of hepatocarcinogenesis. PMID:15996039
Toward DNA-based Security Circuitry: First Step - Random Number Generation.
Bogard, Christy M; Arazi, Benjamin; Rouchka, Eric C
2008-08-10
DNA-based circuit design is an area of research in which traditional silicon-based technologies are replaced by naturally occurring phenomena taken from biochemistry and molecular biology. Our team investigates the implications of DNA-based circuit design in serving security applications. As an initial step we develop random number generation circuitry. A novel prototype schema employs solid-phase synthesis of oligonucleotides for random construction of DNA sequences. Temporary storage and retrieval are achieved through plasmid vectors.
Analysis on the DNA Fingerprinting of Aspergillus Oryzae Mutant Induced by High Hydrostatic Pressure
NASA Astrophysics Data System (ADS)
Wang, Hua; Zhang, Jian; Yang, Fan; Wang, Kai; Shen, Si-Le; Liu, Bing-Bing; Zou, Bo; Zou, Guang-Tian
2011-01-01
The mutant strains of Aspergillus oryzae (HP300a) are screened under 300 MPa for 20 min. Compared with the control strains, the screened mutant strains have unique properties such as genetic stability, rapid growth, abundant spores, and high protease activity. Random amplified polymorphic DNA (RAPD) and inter simple sequence repeats (ISSR) are used to analyze the DNA fingerprinting of HP300a and the control strains. There are 67.9% and 51.3% polymorphic bands obtained by these two markers, respectively, indicating significant genetic variations between HP300a and the control strains. In addition, in a comparison of HP300a and the control strains, the genetic distances of random sequence and simple sequence repeat of DNA are 0.51 and 0.34, respectively.
High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA.
Chandrananda, Dineika; Thorne, Natalie P; Bahlo, Melanie
2015-06-17
High-throughput sequencing of cell-free DNA fragments found in human plasma has been used to non-invasively detect fetal aneuploidy, monitor organ transplants and investigate tumor DNA. However, many biological properties of this extracellular genetic material remain unknown. Research that further characterizes circulating DNA could substantially increase its diagnostic value by allowing the application of more sophisticated bioinformatics tools that lead to an improved signal to noise ratio in the sequencing data. In this study, we investigate various features of cell-free DNA in plasma using deep-sequencing data from two pregnant women (>70X, >50X) and compare them with matched cellular DNA. We utilize a descriptive approach to examine how the biological cleavage of cell-free DNA affects different sequence signatures such as fragment lengths, sequence motifs at fragment ends and the distribution of cleavage sites along the genome. We show that the size distributions of these cell-free DNA molecules are dependent on their autosomal and mitochondrial origin as well as the genomic location within chromosomes. DNA mapping to particular microsatellites and alpha repeat elements displays unique size signatures. By correlating the mapping locations of these fragments with ENCODE annotation of chromatin organization, we show that cell-free fragments occur in clusters along the genome, localize to nucleosomal arrays and are preferentially cleaved at linker regions. Our work further demonstrates that cell-free autosomal DNA cleavage is sequence dependent. The region spanning up to 10 positions on either side of the DNA cleavage site shows a consistent pattern of preference for specific nucleotides. This sequence motif is present in cleavage sites localized to nucleosomal cores and linker regions but is absent in nucleosome-free mitochondrial DNA. These background signals in cell-free DNA sequencing data stem from the non-random biological cleavage of these fragments. 
This sequence structure can be harnessed to improve bioinformatics algorithms, in particular for CNV and structural variant detection. Descriptive measures for cell-free DNA features developed here could also be used in biomarker analysis to monitor the changes that occur during different pathological conditions.
Unexpected substrate specificity of T4 DNA ligase revealed by in vitro selection
NASA Technical Reports Server (NTRS)
Harada, Kazuo; Orgel, Leslie E.
1993-01-01
We have used in vitro selection techniques to characterize DNA sequences that are ligated efficiently by T4 DNA ligase. We find that the ensemble of selected sequences ligates about 50 times as efficiently as the random mixture of sequences used as the input for selection. Surprisingly, many of the selected sequences failed to produce a match at or close to the ligation junction. None of the 20 selected oligomers that we sequenced produced a match two bases upstream from the ligation junction.
Singular over-representation of an octameric palindrome, HIP1, in DNA from many cyanobacteria.
Robinson, N J; Robinson, P J; Gupta, A; Bleasby, A J; Whitton, B A; Morby, A P
1995-03-11
An octameric palindrome (5'-GCGATCGC-3') is abundant in cyanobacterial sequences within databases (GenBank/EMBL) and was designated HIP1 (highly iterated palindrome). The frequency of occurrence of all 256 octameric palindromes has now been determined in sub-databases, revealing large and unique over-representation of HIP1 in cyanobacterial entries. DNA sequences from other bacteria were searched for any over-represented octameric palindromes analogous to HIP1. Only two sequences were identified, in the genomes of a thermophile and halophilic archaebacteria, although these were less abundant than HIP1 in cyanobacteria and relate to codon usage. To test the proposed widespread distribution of HIP1 in DNA from the cyanobacterium Synechococcus PCC 6301, randomly selected genomic clones were partly sequenced. HIP1 constituted 2.5% of the novel sequences, equivalent to a site on average once every 320 nucleotides. An oligonucleotide including HIP1 was also tested in PCR. Multiple products were obtained using template DNA from cyanobacterial strains in which HIP1 is abundant in known sequences, and some strains generated characteristic HIP-PCR banding patterns. However, analysis of DNA from one strain (not previously represented in databases) by random sequencing, HIP-PCR and PvuI digestion confirms that not all cyanobacterial genomes are rich in HIP1.
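The figure of 256 octameric palindromes follows from the fact that a reverse-complement palindrome of length 8 is fully determined by its first four bases (4^4 = 256). The sketch below enumerates them and counts overlapping occurrences in a sequence; the test sequence is made up for illustration, and the function names are not from the paper.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def octameric_palindromes():
    """All 8-mers equal to their own reverse complement."""
    halves = ("".join(bases) for bases in product("ACGT", repeat=4))
    return [half + reverse_complement(half) for half in halves]


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


palindromes = octameric_palindromes()
HIP1 = "GCGATCGC"
toy_sequence = "AAGCGATCGCTTGCGATCGC"  # made-up sequence, two HIP1 sites
print(len(palindromes), HIP1 in palindromes, count_overlapping(toy_sequence, HIP1))
```

At 8 nucleotides per site, one site every 320 nucleotides corresponds to 8/320 = 2.5% of the sequence, which matches the proportion quoted in the abstract.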
DNA capture elements for rapid detection and identification of biological agents
NASA Astrophysics Data System (ADS)
Kiel, Johnathan L.; Parker, Jill E.; Holwitt, Eric A.; Vivekananda, Jeeva
2004-08-01
DNA capture elements (DCEs; aptamers) are artificial DNA sequences, from a random pool of sequences, selected for their specific binding to potential biological warfare agents. These sequences were selected by an affinity method using filters to which the target agent was attached and the DNA isolated and amplified by polymerase chain reaction (PCR) in an iterative, increasingly stringent, process. Reporter molecules were attached to the finished sequences. To date, we have made DCEs to Bacillus anthracis spores, Shiga toxin, Venezuelan Equine Encephalitis (VEE) virus, and Francisella tularensis. These DCEs have demonstrated specificity and sensitivity equal to or better than antibody.
Using Playing Cards to Simulate a Molecular Clock
ERIC Educational Resources Information Center
Westerling, Karin E.
2008-01-01
Changes in DNA base-repair may serve as an indicator of the time elapsed since divergence from a common ancestor. DNA sequences can now be analyzed. The simulation presented in this article allows students to observe the accumulation of changes in a randomly mutating sequence of playing cards. The cards are analogous to DNA nucleotide or protein…
Kerschner, Joseph E.; Erdos, Geza; Hu, Fen Ze; Burrows, Amy; Cioffi, Joseph; Khampang, Pawjai; Dahlgren, Margaret; Hayes, Jay; Keefe, Randy; Janto, Benjamin; Post, J. Christopher; Ehrlich, Garth D.
2010-04-01
Objectives: We sought to construct and partially characterize complementary DNA (cDNA) libraries prepared from the middle ear mucosa (MEM) of chinchillas to better understand pathogenic aspects of infection and inflammation, particularly with respect to leukotriene biogenesis and response. Methods: Chinchilla MEM was harvested from controls and after middle ear inoculation with nontypeable Haemophilus influenzae. RNA was extracted to generate cDNA libraries. Randomly selected clones were subjected to sequence analysis to characterize the libraries and to provide DNA sequence for phylogenetic analyses. Reverse transcription-polymerase chain reaction of the RNA pools was used to generate cDNA sequences corresponding to genes associated with leukotriene biosynthesis and metabolism. Results: Sequence analysis of 921 randomly selected clones from the uninfected MEM cDNA library produced approximately 250,000 nucleotides of almost entirely novel sequence data. Searches of the GenBank database with the Basic Local Alignment Search Tool provided for identification of 515 unique genes expressed in the MEM and not previously described in chinchillas. In almost all cases, the chinchilla cDNA sequences displayed much greater homology to human or other primate genes than with rodent species. Genes associated with leukotriene metabolism were present in both normal and infected MEM. Conclusions: Based on both phylogenetic comparisons and gene expression similarities with humans, chinchilla MEM appears to be an excellent model for the study of middle ear inflammation and infection. The higher degree of sequence similarity between chinchillas and humans compared to chinchillas and rodents was unexpected. The cDNA libraries from normal and infected chinchilla MEM will serve as useful molecular tools in the study of otitis media and should yield important information with respect to middle ear pathogenesis. PMID:20433028
Duyk, G M; Kim, S W; Myers, R M; Cox, D R
1990-11-01
Identification and recovery of transcribed sequences from cloned mammalian genomic DNA remains an important problem in isolating genes on the basis of their chromosomal location. We have developed a strategy that facilitates the recovery of exons from random pieces of cloned genomic DNA. The basis of this "exon trapping" strategy is that, during a retroviral life cycle, genomic sequences of nonviral origin are correctly spliced and may be recovered as a cDNA copy of the introduced segment. By using this genetic assay for cis-acting sequences required for RNA splicing, we have screened approximately 20 kilobase pairs of cloned genomic DNA and have recovered all four predicted exons. PMID:2247475
[Establishment of systemic lupus erythematosus-like murine model with Sm mimotope].
Xie, Hong-Fu; Feng, Hao; Zeng, Hai-Yan; Li, Ji; Shi, Wei; Yi, Mei; Wu, Bin
2007-04-01
To establish a systemic lupus erythematosus (SLE)-like murine model by immunizing BALB/C mice with Sm mimotope. Sm mimotope was identified by screening a 12-mer random peptide library with monoclonal anti-Smith antibody. Sm mimotope was initially defined with sandwich ELISA, DNA sequencing, and deduced amino acid sequence, and BALB/C mice were subcutaneously injected with a mixture of phage clones. Serum Sm antibody, anti-double stranded DNA (dsDNA) antibody, and antinuclear antibody (ANA) of mice were detected using direct immunofluorescence; kidney histological changes were examined by HE staining. Five randomly selected peptides were sequenced, and the amino acid sequences IR, SQ, and PP were detected at a higher frequency. High-titer IgG autoantibodies against dsDNA, Sm, and ANA in the sera of the experimental group were detected by ELISA 28 days after immunization with Sm mimotope. Proteinuria was detected 33 days later; immune complex and nephritis were observed in kidney specimens. An SLE-like murine model can be successfully induced by Sm phage mimotope.
2004-01-01
The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5′-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline. PMID:15489334
Kilo-sequencing: an ordered strategy for rapid DNA sequence data acquisition.
Barnes, W M; Bevan, M
1983-01-01
A strategy for rapid DNA sequence acquisition in an ordered, nonrandom manner, while retaining all of the conveniences of the dideoxy method with M13 transducing phage DNA template, is described. Target DNA 3 to 14 kb in size can be stably carried by our M13 vectors. Suitable targets are stretches of DNA which lack an enzyme recognition site which is unique on our cloning vectors and adjacent to the sequencing primer; current sites that are so useful when lacking are Pst, Xba, HindIII, BglII, EcoRI. By an in vitro procedure, we cut RF DNA once randomly and once specifically, to create thousands of deletions which start at the unique restriction site adjacent to the dideoxy sequencing primer and extend various distances across the target DNA. Phage carrying a desired size of deletions, whose DNA as template will give rise to DNA sequence data in a desired location along the target DNA, may be purified by electrophoresis alive on agarose gels. Phage running in the same location on the agarose gel thus conveniently give rise to nucleotide sequence data from the same kilobase of target DNA. PMID:6298723
DNA polymerase preference determines PCR priming efficiency.
Pan, Wenjing; Byrne-Steele, Miranda; Wang, Chunlin; Lu, Stanley; Clemmons, Scott; Zahorchak, Robert J; Han, Jian
2014-01-30
Polymerase chain reaction (PCR) is one of the most important developments in modern biotechnology. However, PCR is known to introduce biases, especially during multiplex reactions. Recent studies have implicated the DNA polymerase as the primary source of bias, particularly initiation of polymerization on the template strand. In our study, amplification from a synthetic library containing a 12 nucleotide random portion was used to provide an in-depth characterization of DNA polymerase priming bias. The synthetic library was amplified with three commercially available DNA polymerases using an anchored primer with a random 3' hexamer end. After normalization, the next generation sequencing (NGS) results of the amplified libraries were directly compared to the unamplified synthetic library. Here, high throughput sequencing was used to systematically demonstrate and characterize DNA polymerase priming bias. We demonstrate that certain sequence motifs are preferred over others as primers where the six nucleotide sequences at the 3' end of the primer, as well as the sequences four base pairs downstream of the priming site, may influence priming efficiencies. DNA polymerases in the same family from two different commercial vendors prefer similar motifs, while another commercially available enzyme from a different DNA polymerase family prefers different motifs. Furthermore, the preferred priming motifs are GC-rich. The DNA polymerase preference for certain sequence motifs was verified by amplification from single-primer templates. We incorporated the observed DNA polymerase preference into a primer-design program that guides the placement of the primer to an optimal location on the template. DNA polymerase priming bias was characterized using a synthetic library amplification system and NGS. 
The characterization of DNA polymerase priming bias was then utilized to guide the primer-design process and demonstrate varying amplification efficiencies among three commercially available DNA polymerases. The results suggest that the interaction of the DNA polymerase with the primer:template junction during the initiation of DNA polymerization is very important in terms of overall amplification bias and has broader implications for both the primer design process and multiplex PCR.
Isolation and characterization of target sequences of the chicken CdxA homeobox gene.
Margalit, Y; Yarus, S; Shapira, E; Gruenbaum, Y; Fainsod, A
1993-01-01
The DNA binding specificity of the chicken homeodomain protein CDXA was studied. Using a CDXA-glutathione-S-transferase fusion protein, DNA fragments containing the binding site for this protein were isolated. The sources of DNA were oligonucleotides with random sequence and chicken genomic DNA. The DNA fragments isolated were sequenced and tested in DNA binding assays. Sequencing revealed that most DNA fragments are AT rich which is a common feature of homeodomain binding sites. By electrophoretic mobility shift assays it was shown that the different target sequences isolated bind to the CDXA protein with different affinities. The specific sequences bound by the CDXA protein in the genomic fragments isolated, were determined by DNase I footprinting. From the footprinted sequences, the CDXA consensus binding site was determined. The CDXA protein binds the consensus sequence A, A/T, T, A/T, A, T, A/G. The CAUDAL binding site in the ftz promoter is also included in this consensus sequence. When tested, some of the genomic target sequences were capable of enhancing the transcriptional activity of reporter plasmids when introduced into CDXA expressing cells. This study determined the DNA sequence specificity of the CDXA protein and it also shows that this protein can further activate transcription in cells in culture. PMID:7909943
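The consensus A, A/T, T, A/T, A, T, A/G translates directly into a regular expression. The following sketch scans the forward strand of a sequence for candidate sites; the example sequence and the function name `find_consensus_sites` are illustrative only, and a real search would also scan the reverse complement and account for binding affinity.

```python
import re

# Consensus from the abstract: A, A/T, T, A/T, A, T, A/G.
# The lookahead lets overlapping candidate sites be reported.
CDXA_CONSENSUS = re.compile(r"(?=(A[AT]T[AT]AT[AG]))")


def find_consensus_sites(seq: str):
    """Return (0-based position, matched 7-mer) for each forward-strand hit."""
    return [(m.start(), m.group(1)) for m in CDXA_CONSENSUS.finditer(seq.upper())]


sites = find_consensus_sites("GGATTTATGCC")  # made-up test sequence
print(sites)
```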
Wang, Yongjie; Kleespies, Regina G; Ramle, Moslim B; Jehle, Johannes A
2008-09-01
The genomic sequence analysis of many large dsDNA viruses is hampered by insufficient sample material. Here, we report a whole genome amplification of the Oryctes rhinoceros nudivirus (OrNV) isolate Ma07 starting from as little as about 10 ng of purified viral DNA, using a multiple displacement amplification (MDA) method based on phi29 DNA polymerase and exonuclease-resistant random hexamers. About 60 µg of high molecular weight DNA with fragment sizes of up to 25 kbp was amplified. A genomic DNA clone library was generated using the product DNA. After 8-fold sequencing coverage, the 127,615 bp OrNV whole genome was sequenced successfully. The results demonstrate that the MDA-based whole genome amplification enables rapid access to genomic information from exiguous virus samples.
Identification of Prostate Cancer-Specific microDNAs
2016-02-01
circular DNA by rolling circle amplification (RCA) and then amplified DNA fragments were subject to deep sequencing. Deep sequencing of the...demonstrate the existence of microDNAs in prostate cancer. We adopted multiple displacement amplification (MDA) with random 2 primers for enriched...prostate cancer cells through multiple displacement amplification and next generation sequencing. [Figure residue: axis label "Relative cell growth (%)"]
Ribosomal RNA Genes Contribute to the Formation of Pseudogenes and Junk DNA in the Human Genome.
Robicheau, Brent M; Susko, Edward; Harrigan, Amye M; Snyder, Marlene
2017-02-01
Approximately 35% of the human genome can be identified as sequence devoid of a selected-effect function, and not derived from transposable elements or repeated sequences. We provide evidence supporting a known origin for a fraction of this sequence. We show that: 1) highly degraded, but near full length, ribosomal DNA (rDNA) units, including both 45S and Intergenic Spacer (IGS), can be found at multiple sites in the human genome on chromosomes without rDNA arrays, 2) that these rDNA sequences have a propensity for being centromere proximal, and 3) that sequence at all human functional rDNA array ends is divergent from canonical rDNA to the point that it is pseudogenic. We also show that small sequence strings of rDNA (from 45S + IGS) can be found distributed throughout the genome and are identifiable as an "rDNA-like signal", representing 0.26% of the q-arm of HSA21 and ∼2% of the total sequence of other regions tested. The size of sequence strings found in the rDNA-like signal intergrade into the size of sequence strings that make up the full-length degrading rDNA units found scattered throughout the genome. We conclude that the displaced and degrading rDNA sequences are likely of a similar origin but represent different stages in their evolution towards random sequence. Collectively, our data suggests that over vast evolutionary time, rDNA arrays contribute to the production of junk DNA. The concept that the production of rDNA pseudogenes is a by-product of concerted evolution represents a previously under-appreciated process; we demonstrate here its importance. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Antipova, Valeriya N; Zheleznaya, Lyudmila A; Zyrina, Nadezhda V
2014-08-01
In the absence of added DNA, thermophilic DNA polymerases synthesize, from free dNTPs, double-stranded DNA consisting of numerous repetitive units (ab initio DNA synthesis). The addition of thermophilic restriction endonuclease (REase), or nicking endonuclease (NEase), effectively stimulates ab initio DNA synthesis and determines the nucleotide sequence of reaction products. We have found that NEases Nt.AlwI, Nb.BbvCI, and Nb.BsmI with non-palindromic recognition sites stimulate the synthesis of sequences organized mainly as palindromes. Moreover, the nucleotide sequence of the palindromes appeared to be dependent on NEase recognition/cleavage modes. Thus, the heterodimeric Nb.BbvCI stimulated the synthesis of palindromes composed of two recognition sites of this NEase, which were separated by AT-rich sequences or (A)n (T)m spacers. Palindromic DNA sequences obtained in the ab initio DNA synthesis with the monomeric NEases Nb.BsmI and Nt.AlwI contained, along with the sites of these NEases, randomly synthesized sequences consisting of blocks of short repeats. These findings could aid investigation of the potential of highly productive ab initio DNA synthesis for the creation of DNA molecules with a desired sequence. © 2014 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.
Extracting DNA words based on the sequence features: non-uniform distribution and integrity.
Li, Zhi; Cao, Hongyan; Cui, Yuehua; Zhang, Yanbo
2016-01-25
A DNA sequence can be viewed as an unknown language with words as its functional units. Given that most sequence alignment algorithms, such as motif discovery algorithms, depend on the quality of background information about sequences, it is necessary to develop an ab initio algorithm for extracting the "words" based only on the DNA sequences. We considered that non-uniform distribution and integrity were two important features of a word, based on which we developed an ab initio algorithm to extract "DNA words" that have potential functional meaning. A Kolmogorov-Smirnov test was used to test whether the distribution of DNA sequences is consistent with a uniform distribution, and the integrity was judged by the sequence and position alignment. Two random base sequences were adopted as negative controls, and an English book was used as a positive control to verify our algorithm. We applied our algorithm to the genomes of Saccharomyces cerevisiae and 10 strains of Escherichia coli to show the utility of the methods. The results provide strong evidence that the algorithm is a promising tool for building a DNA dictionary ab initio. Our method provides a fast way for large scale screening of important DNA elements and offers potential insights into the understanding of a genome.
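The abstract does not say exactly how the Kolmogorov-Smirnov test was applied. One plausible reading, shown here purely as an assumption, is to compare the positions at which a candidate word occurs against a uniform distribution over the sequence length. The sketch computes only the one-sample KS statistic D, with no p-value, and the positions are invented.

```python
def ks_statistic_uniform(positions, length):
    """One-sample KS statistic D of word positions against Uniform(0, length)."""
    xs = sorted(p / length for p in positions)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one position")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Compare the empirical CDF just after and just before each point
        # with the uniform CDF, which equals x on [0, 1].
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


# Invented positions in a 1000-bp sequence.
evenly_spread = ks_statistic_uniform([125, 375, 625, 875], 1000)
clustered = ks_statistic_uniform([900, 910, 920, 930], 1000)
print(evenly_spread, clustered)
```

A small D is compatible with uniformly scattered occurrences, while a large D indicates clustering, which is the kind of non-uniform distribution the abstract treats as a feature of a word.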
Studier, F. William
1995-04-18
Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.
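To put the quoted library sizes in perspective, the short calculation below compares each one with the total number of possible primers of that length (4^k for length k). It is arithmetic only and does not model how the libraries in the disclosure were selected; the function name is illustrative.

```python
def library_fraction(library_size: int, k: int) -> float:
    """Fraction of all possible k-mers represented by a library of given size."""
    return library_size / 4 ** k


# Library sizes quoted in the abstract.
for k, size in [(8, 10_000), (9, 14_200), (10, 44_000)]:
    total = 4 ** k
    print(f"{k}-mers: {size} of {total} possible ({library_fraction(size, k):.1%})")
```

Each library therefore covers only a modest fraction of the possible primers of its length, which is consistent with the abstract's emphasis on reusing individual primers across many templates.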
High-density fiber-optic DNA random microsphere array.
Ferguson, J A; Steemers, F J; Walt, D R
2000-11-15
A high-density fiber-optic DNA microarray sensor was developed to monitor multiple DNA sequences in parallel. Microarrays were prepared by randomly distributing DNA probe-functionalized 3.1-µm-diameter microspheres in an array of wells etched in a 500-µm-diameter optical imaging fiber. Registration of the microspheres was performed using an optical encoding scheme and a custom-built imaging system. Hybridization was visualized using fluorescent-labeled DNA targets with a detection limit of 10 fM. Hybridization times of seconds are required for nanomolar target concentrations, and analysis is performed in minutes.
Antonov, V A; Altukhova, V V; Savchenko, S S; Zamaraev, V S; Iliukhin, V I; Alekseev, V V
2007-01-01
Burkholderia mallei is a highly pathogenic microorganism for both humans and animals. In this work, the possibility of using genotyping methods to differentiate between strains of B. mallei was studied. A collection of 14 isolates of B. mallei was characterized using randomly amplified polymorphic DNA (RAPD) and multilocus sequence typing (MLST). RAPD was the best method for detecting strain differences in B. mallei. It was suggested that this method would be an increasingly useful molecular epidemiological tool.
In silico evidence for sequence-dependent nucleosome sliding
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lequieu, Joshua; Schwartz, David C.; de Pablo, Juan J.
Nucleosomes represent the basic building block of chromatin and provide an important mechanism by which cellular processes are controlled. The locations of nucleosomes across the genome are not random but instead depend on both the underlying DNA sequence and the dynamic action of other proteins within the nucleus. These processes are central to cellular function, and the molecular details of the interplay between DNA sequence and nucleosome dynamics remain poorly understood. In this work, we investigate this interplay in detail by relying on a molecular model, which permits development of a comprehensive picture of the underlying free energy surfaces and the corresponding dynamics of nucleosome repositioning. The mechanism of nucleosome repositioning is shown to be strongly linked to DNA sequence and directly related to the binding energy of a given DNA sequence to the histone core. It is also demonstrated that chromatin remodelers can override DNA-sequence preferences by exerting torque, and the histone H4 tail is then identified as a key component by which DNA-sequence, histone modifications, and chromatin remodelers could in fact be coupled.
Nucleosome Positioning and Epigenetics
NASA Astrophysics Data System (ADS)
Schwab, David; Bruinsma, Robijn
2008-03-01
The role of chromatin structure in gene regulation has recently taken center stage in the field of epigenetics, phenomena that change the phenotype without changing the DNA sequence. Recent work has also shown that nucleosomes, a complex of DNA wrapped around a histone octamer, experience a sequence dependent energy landscape due to the variation in DNA bend stiffness with sequence composition. In this talk, we consider the role nucleosome positioning might play in the formation of heterochromatin, a compact form of DNA generically responsible for gene silencing. In particular, we discuss how different patterns of nucleosome positions, periodic or random, could either facilitate or suppress heterochromatin stability and formation.
Howland, Shanshan W; Poh, Chek-Meng; Rénia, Laurent
2011-09-01
Directional cloning of complementary DNA (cDNA) primed by oligo(dT) is commonly achieved by appending a restriction site to the primer, whereas the second strand is synthesized through the combined action of RNase H and Escherichia coli DNA polymerase I (PolI). Although random primers provide more uniform and complete coverage, directional cloning with the same strategy is highly inefficient. We report that phosphorothioate linkages protect the tail sequence appended to random primers from the 5'→3' exonuclease activity of PolI. We present a simple strategy for constructing a random-primed cDNA library using the efficient, size-independent, and seamless In-Fusion cloning method instead of restriction enzymes. Copyright © 2011 Elsevier Inc. All rights reserved.
Random-breakage mapping method applied to human DNA sequences
NASA Technical Reports Server (NTRS)
Lobrich, M.; Rydberg, B.; Cooper, P. K.; Chatterjee, A. (Principal Investigator)
1996-01-01
The random-breakage mapping method [Game et al. (1990) Nucleic Acids Res., 18, 4453-4461] was applied to DNA sequences in human fibroblasts. The methodology involves NotI restriction endonuclease digestion of DNA from irradiated cells, followed by pulsed-field gel electrophoresis, Southern blotting and hybridization with DNA probes recognizing the single copy sequences of interest. The Southern blots show a band for the unbroken restriction fragments and a smear below this band due to radiation induced random breaks. This smear pattern contains two discontinuities in intensity at positions that correspond to the distance of the hybridization site to each end of the restriction fragment. By analyzing the positions of those discontinuities we confirmed the previously mapped position of the probe DXS1327 within a NotI fragment on the X chromosome, thus demonstrating the validity of the technique. We were also able to position the probes D21S1 and D21S15 with respect to the ends of their corresponding NotI fragments on chromosome 21. A third chromosome 21 probe, D21S11, has previously been reported to be close to D21S1, although an uncertainty about a second possible location existed. Since both probes D21S1 and D21S11 hybridized to a single NotI fragment and yielded a similar smear pattern, this uncertainty is removed by the random-breakage mapping method.
Linear and Nonlinear Statistical Characterization of DNA
NASA Astrophysics Data System (ADS)
Norio Oiwa, Nestor; Goldman, Carla; Glazier, James
2002-03-01
We find spatial order in the distribution of protein-coding (including RNAs) and control segments of GenBank genomic sequences, irrespective of ATCG content. This is achieved by correlations, histograms, fractal dimensions and singularity spectra. Estimates of these quantities in complete nuclear genomes indicate that coding sequences are long-range correlated and that their disposition is self-similar (multifractal) for eukaryotes. These characteristics are absent in prokaryotes, where there are few noncoding sequences, suggesting that `junk' DNA plays a relevant role in genome structure and function. Concerning the genetic message of ATCG sequences, we build a random walk (Levy flight), using DNA symmetry arguments, where we associate A, T, C and G with left, right, down and up steps, respectively. Nonlinear analysis of mitochondrial DNA walks reveals a multifractal pattern based on palindromic sequences, which fold into hairpins and loops.
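The base-to-step mapping stated in this abstract (A, T, C, G as left, right, down, up) can be sketched as a two-dimensional walk. This is only a minimal illustration of that mapping with unit steps; the function name `dna_walk`, the unit step size, and the choice to skip ambiguous bases are assumptions, and the authors' Levy-flight construction and multifractal analysis are not reproduced.

```python
from typing import List, Tuple

# Step mapping taken from the abstract: A = left, T = right, C = down, G = up.
STEPS = {"A": (-1, 0), "T": (1, 0), "C": (0, -1), "G": (0, 1)}


def dna_walk(sequence: str) -> List[Tuple[int, int]]:
    """Return the list of (x, y) points visited, starting at the origin.

    Assumption: characters other than A, T, C, G (for example N) are skipped.
    """
    x = y = 0
    path = [(x, y)]
    for base in sequence.upper():
        if base not in STEPS:
            continue
        dx, dy = STEPS[base]
        x += dx
        y += dy
        path.append((x, y))
    return path
```

With this mapping, a sequence and its complement trace mirror-image horizontal and vertical displacements, which is one way to see why a symmetry argument might motivate pairing A with T and C with G on opposite directions.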
Pryde, S E; Richardson, A J; Stewart, C S; Flint, H J
1999-12-01
Random clones of 16S ribosomal DNA gene sequences were isolated after PCR amplification with eubacterial primers from total genomic DNA recovered from samples of the colonic lumen, colonic wall, and cecal lumen from a pig. Sequences were also obtained for cultures isolated anaerobically from the same colonic-wall sample. Phylogenetic analysis showed that many sequences were related to those of Lactobacillus or Streptococcus spp. or fell into clusters IX, XIVa, and XI of gram-positive bacteria. In addition, 59% of randomly cloned sequences showed less than 95% similarity to database entries or sequences from cultivated organisms. Cultivation bias is also suggested by the fact that the majority of isolates (54%) recovered from the colon wall by culturing were related to Lactobacillus and Streptococcus, whereas this group accounted for only one-third of the sequence variation for the same sample from random cloning. The remaining cultured isolates were mainly Selenomonas related. A higher proportion of Lactobacillus reuteri-related sequences than of Lactobacillus acidophilus- and Lactobacillus amylovorus-related sequences were present in the colonic-wall sample. Since the majority of bacterial ribosomal sequences recovered from the colon wall are less than 95% related to known organisms, the roles of many of the predominant wall-associated bacteria remain to be defined.
An extended sequence specificity for UV-induced DNA damage.
Chung, Long H; Murray, Vincent
2018-01-01
The sequence specificity of UV-induced DNA damage was determined with a higher precision and accuracy than previously reported. UV light induces two major damage adducts: cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). Employing capillary electrophoresis with laser-induced fluorescence and taking advantage of the distinct properties of the CPDs and 6-4PPs, we studied the sequence specificity of UV-induced DNA damage in a purified DNA sequence using two approaches: end-labelling and a polymerase stop/linear amplification assay. A mitochondrial DNA sequence that contained a random nucleotide composition was employed as the target DNA sequence. With previous methodology, the UV sequence specificity was determined at a dinucleotide or trinucleotide level; however, in this paper, we have extended the UV sequence specificity to a hexanucleotide level. With the end-labelling technique (for 6-4PPs), the consensus sequence was found to be 5'-GCTC*AC (where C* is the breakage site); while with the linear amplification procedure, it was 5'-TCTT*AC. With end-labelling, the dinucleotide frequency of occurrence was highest for 5'-TC*, 5'-TT* and 5'-CC*; whereas it was 5'-TT* for linear amplification. The influence of neighbouring nucleotides on the degree of UV-induced DNA damage was also examined. The core sequences consisted of pyrimidine nucleotides 5'-CTC* and 5'-CTT* while an A at position "1" and C at position "2" enhanced UV-induced DNA damage. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.
Wang, Yongming; Lin, Xiuyun; Dong, Bo; Wang, Yingdian; Liu, Bao
2004-01-01
RAPD (randomly amplified polymorphic DNA) and ISSR (inter-simple sequence repeat) fingerprinting on HpaII/MspI-digested genomic DNA of nine elite japonica rice cultivars implies inter-cultivar DNA methylation polymorphism. Using both DNA fragments isolated from RAPD or ISSR gels and selected low-copy sequences as probes, methylation-sensitive Southern blot analysis confirms the existence of extensive DNA methylation polymorphism in both genes and DNA repeats among the rice cultivars. The cultivar-specific methylation patterns are stably maintained, and can be used as reliable molecular markers. Transcriptional analysis of four selected sequences (RdRP, AC9, HSP90 and MMR) on leaves and roots from normal and 5-azacytidine-treated seedlings of three representative cultivars shows an association between the transcriptional activity of one of the genes, the mismatch repair (MMR) gene, and its CG methylation patterns.
NASA Technical Reports Server (NTRS)
Ho, P. S.; Ellison, M. J.; Quigley, G. J.; Rich, A.
1986-01-01
The ease with which a particular DNA segment adopts the left-handed Z-conformation depends largely on the sequence and on the degree of negative supercoiling to which it is subjected. We describe a computer program (Z-hunt) that is designed to search long sequences of naturally occurring DNA and retrieve those nucleotide combinations of up to 24 bp in length which show a strong propensity for Z-DNA formation. Incorporated into Z-hunt is a statistical mechanical model based on empirically determined energetic parameters for the B to Z transition accumulated to date. The Z-forming potential of a sequence is assessed by ranking its behavior as a function of negative superhelicity relative to the behavior of similar sized randomly generated nucleotide sequences assembled from over 80,000 combinations. The program makes it possible to compare directly the Z-forming potential of sequences with different base compositions and different sequence lengths. Using Z-hunt, we have analyzed the DNA sequences of the bacteriophage phi X174, plasmid pBR322, the animal virus SV40 and the replicative form of the eukaryotic adenovirus-2. The results are compared with those previously obtained by others from experiments designed to locate Z-DNA forming regions in these sequences using probes which show specificity for the left-handed DNA conformation.
Randrianjatovo-Gbalou, Irina; Rosario, Sandrine; Sismeiro, Odile; Varet, Hugo; Legendre, Rachel; Coppée, Jean-Yves; Huteau, Valérie; Pochet, Sylvie; Delarue, Marc
2018-05-21
Nucleic acid aptamers, especially RNA, exhibit valuable advantages compared to protein therapeutics in terms of size, affinity and specificity. However, the synthesis of libraries of large random RNAs is still difficult and expensive. The engineering of polymerases able to directly generate these libraries has the potential to replace the chemical synthesis approach. Here, we start with a DNA polymerase that already displays a significant template-free nucleotidyltransferase activity, human DNA polymerase theta, and we mutate it based on the knowledge of its three-dimensional structure as well as previous mutational studies on members of the same polA family. One mutant exhibited a high tolerance towards ribonucleotides (NTPs) and displayed an efficient ribonucleotidyltransferase activity that resulted in the assembly of long RNA polymers. HPLC analysis and RNA sequencing of the products were used to quantify the incorporation of the four NTPs as a function of initial NTP concentrations and established the randomness of each generated nucleic acid sequence. The same mutant revealed a propensity to accept other modified nucleotides and to extend them in long fragments. Hence, this mutant can deliver libraries of random natural and modified RNA polymers ready to use for SELEX, with custom lengths and balanced or unbalanced ratios.
DNA unzipping phase diagram calculated via replica theory.
Roland, C Brian; Hatch, Kristi Adamson; Prentiss, Mara; Shakhnovich, Eugene I
2009-05-01
We show how single-molecule unzipping experiments can provide strong evidence that the zero-force melting transition of long molecules of natural dsDNA should be classified as a phase transition of the higher-order type (continuous). Toward this end, we study a statistical-mechanics model for the fluctuating structure of a long molecule of dsDNA, and compute the equilibrium phase diagram for the experiment in which the molecule is unzipped under applied force. We consider a perfect-matching dsDNA model, in which the loops are volume-excluding chains with arbitrary loop exponent $c$. We include stacking interactions, hydrogen bonds, and main-chain entropy. We include sequence heterogeneity at the level of random sequences; in particular, there is no correlation in the base-pairing (bp) energy from one sequence position to the next. We present heuristic arguments to demonstrate that the low-temperature macrostate does not exhibit degenerate ergodicity breaking. We use this claim to understand the results of our replica-theoretic calculation of the equilibrium properties of the system. As a function of temperature, we obtain the minimal force at which the molecule separates completely. This critical-force curve is a line in the temperature-force phase diagram that marks the regions where the molecule exists primarily as a double helix versus the region where the molecule exists as two separate strands. We compare our random-sequence model to magnetic tweezer experiments performed on the 48 502 bp genome of bacteriophage lambda. We find good agreement with the experimental data, which is restricted to temperatures between 24 and 50 degrees C. At higher temperatures, the critical-force curve of our random-sequence model is very different from that of the homogeneous-sequence version of our model. For both sequence models, the critical force falls to zero at the melting temperature $T_{c}$ like $|T-T_{c}|^{\alpha}$.
For the homogeneous-sequence model, $\alpha = 1/2$ almost exactly, while for the random-sequence model, $\alpha \approx 0.9$. Importantly, the shape of the critical-force curve is connected, via our theory, to the manner in which the helix fraction falls to zero at $T_{c}$. The helix fraction is the property that is used to classify the melting transition as a type of phase transition. In our calculation, the shape of the critical-force curve holds strong evidence that the zero-force melting transition of long natural dsDNA should be classified as a higher-order (continuous) phase transition. Specifically, the order is 3rd or greater.
Al-Khalifah, Nasser S; Shanavaskhan, A E
2017-01-01
Ambiguity in the total number of date palm cultivars across the world is pointing toward the necessity for an enumerative study using standard morphological and molecular markers. Among molecular markers, DNA markers are more suitable and ubiquitous to most applications. They are highly polymorphic in nature, frequently occurring in genomes, easy to access, and highly reproducible. Various molecular markers such as restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), simple sequence repeats (SSR), inter-simple sequence repeats (ISSR), and random amplified polymorphic DNA (RAPD) markers have been successfully used as efficient tools for analysis of genetic variation in date palm. This chapter explains a stepwise protocol for extracting total genomic DNA from date palm leaves. A user-friendly protocol for RAPD analysis and a table showing the primers used in different molecular techniques that produce polymorphisms in date palm are also provided.
mtDNA sequence diversity of Hazara ethnic group from Pakistan.
Rakha, Allah; Fatima; Peng, Min-Sheng; Adan, Atif; Bi, Rui; Yasmin, Memona; Yao, Yong-Gang
2017-09-01
The present study was undertaken to investigate mitochondrial DNA (mtDNA) control region sequences of Hazaras from Pakistan, so as to generate an mtDNA reference database for forensic casework in Pakistan and to analyze the phylogenetic relationship of this particular ethnic group with geographically proximal populations. Complete mtDNA control region (nt 16024-576) sequences were generated through Sanger sequencing for 319 Hazara individuals from Quetta, Baluchistan. The population sample set showed a total of 189 distinct haplotypes, belonging mainly to West Eurasian (51.72%), East & Southeast Asian (29.78%) and South Asian (18.50%) haplogroups. Compared with other populations from Pakistan, the Hazara population had a relatively high haplotype diversity (0.9945) and a lower random match probability (0.0085). The dataset has been incorporated into the EMPOP database under accession number EMP00680. The data herein comprise the largest, and likely most thoroughly examined, control region mtDNA dataset from Hazaras of Pakistan. Copyright © 2017 Elsevier B.V. All rights reserved.
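The two summary statistics quoted in this abstract can be sketched as follows. The abstract does not state which formulas were used, so this assumes the definitions common in forensic genetics: random match probability as the sum of squared haplotype frequencies, and haplotype diversity as n/(n-1) times one minus that sum. The function name `match_statistics` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def match_statistics(haplotypes: Iterable[str]) -> Tuple[float, float]:
    """Return (haplotype diversity, random match probability).

    Assumed definitions (not stated in the abstract):
      RMP       = sum of squared haplotype frequencies
      diversity = n / (n - 1) * (1 - RMP)
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    rmp = sum((c / n) ** 2 for c in counts.values())
    diversity = n / (n - 1) * (1.0 - rmp)
    return diversity, rmp
```

Under these assumed definitions, plugging the reported random match probability (0.0085) and sample size (319) into the diversity formula gives about 0.9946, which agrees with the reported 0.9945 to within the rounding of the inputs.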
Effects of 16S rDNA sampling on estimates of the number of endosymbiont lineages in sucking lice
Burleigh, J. Gordon; Light, Jessica E.; Reed, David L.
2016-01-01
Phylogenetic trees can reveal the origins of endosymbiotic lineages of bacteria and detect patterns of co-evolution with their hosts. Although taxon sampling can greatly affect phylogenetic and co-evolutionary inference, most hypotheses of endosymbiont relationships are based on few available bacterial sequences. Here we examined how different sampling strategies of Gammaproteobacteria sequences affect estimates of the number of endosymbiont lineages in parasitic sucking lice (Insecta: Phthiraptera: Anoplura). We estimated the number of louse endosymbiont lineages using both newly obtained and previously sequenced 16S rDNA bacterial sequences and more than 42,000 16S rDNA sequences from other Gammaproteobacteria. We also performed parametric and nonparametric bootstrapping experiments to examine the effects of phylogenetic error and uncertainty on these estimates. Sampling of 16S rDNA sequences affects the estimates of endosymbiont diversity in sucking lice until we reach a threshold of genetic diversity, the size of which depends on the sampling strategy. Sampling by maximizing the diversity of 16S rDNA sequences is more efficient than randomly sampling available 16S rDNA sequences. Although simulation results validate estimates of multiple endosymbiont lineages in sucking lice, the bootstrap results suggest that the precise number of endosymbiont origins is still uncertain. PMID:27547523
Image Encryption Algorithm Based on Hyperchaotic Maps and Nucleotide Sequences Database
2017-01-01
Image encryption technology is one of the main means to ensure the safety of image information. Using the characteristics of chaos, such as randomness, regularity, ergodicity, and initial value sensitiveness, combined with the unique space conformation of DNA molecules and their unique information storage and processing ability, an efficient method for image encryption based on the chaos theory and a DNA sequence database is proposed. In this paper, digital image encryption employs a process of transforming the image pixel gray value by using chaotic sequence scrambling image pixel location and establishing superchaotic mapping, which maps quaternary sequences and DNA sequences, and by combining with the logic of the transformation between DNA sequences. The bases are replaced under the displaced rules by using DNA coding in a certain number of iterations that are based on the enhanced quaternary hyperchaotic sequence; the sequence is generated by Chen chaos. The cipher feedback mode and chaos iteration are employed in the encryption process to enhance the confusion and diffusion properties of the algorithm. Theoretical analysis and experimental results show that the proposed scheme not only demonstrates excellent encryption but also effectively resists chosen-plaintext attack, statistical attack, and differential attack. PMID:28392799
Arnold, Frances H.; Shao, Zhixin; Zhao, Huimin; Giver, Lorraine J.
2002-01-01
A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
El Sharabasy, Sherif F; Soliman, Khaled A
2017-01-01
The date palm is an ancient domesticated plant with great diversity and has been cultivated in the Middle East and North Africa for at least 5000 years. Date palm cultivars are classified based on the fruit moisture content, as dry, semidry, and soft dates. There are a number of biochemical and molecular techniques available for characterization of the date palm variation. This chapter focuses on the DNA-based markers random amplified polymorphic DNA (RAPD) and inter-simple sequence repeats (ISSR) techniques, in addition to biochemical markers based on isozyme analysis. These techniques coupled with appropriate statistical tools proved useful for determining phylogenetic relationships among date palm cultivars and provide information resources for date palm gene banks.
iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model
Lin, Wei-Zhong; Fang, Jian-An; Xiao, Xuan; Chou, Kuo-Chen
2011-01-01
DNA-binding proteins play crucial roles in various cellular processes. Developing high throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in the field of genome annotation. Although many efforts have been made in this regard, further effort is needed to enhance the prediction power. By incorporating the features into the general form of pseudo amino acid composition that were extracted from protein sequences via the “grey model” and by adopting the random forest operation engine, we proposed a new predictor, called iDNA-Prot, for identifying uncharacterized proteins as DNA-binding proteins or non-DNA binding proteins based on their amino acid sequences information alone. The overall success rate by iDNA-Prot was 83.96% that was obtained via jackknife tests on a newly constructed stringent benchmark dataset in which none of the proteins included has pairwise sequence identity to any other in a same subset. In addition to achieving high success rate, the computational time for iDNA-Prot is remarkably shorter in comparison with the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high throughput tool for large-scale analysis of DNA-binding proteins. As a user-friendly web-server, iDNA-Prot is freely accessible to the public at the web-site on http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results. PMID:21935457
PCV2d-2 is the predominant type of PCV2 DNA in pig samples collected in the U.S. during 2014-2016.
Xiao, Chao-Ting; Harmon, Karen M; Halbur, Patrick G; Opriessnig, Tanja
2016-12-25
Porcine circovirus type 2 (PCV2) vaccination was introduced in the US in 2006 and since has been adopted by most pig producers. While porcine circovirus associated disease (PCVAD) outbreaks are now relatively uncommon in the US, PCV2 remains a concern which is emphasized by increasing numbers of PCR and sequencing requests for PCV2. In the present study, randomly selected lung tissues from 586 pigs submitted in 2015 were tested for presence of PCV2 DNA. Positive samples were further characterized by sequencing and combined with available PCV2 open-reading-frame (ORF) 2 sequences from the client data base of the Iowa State University Veterinary Diagnostic Laboratory. The prevalence of PCV2 in the randomly selected lung tissues was 23% (135/586) with 11.3% PCV2a, 29% PCV2b and 71.8% for PCV2d subgroup PCV2d-2. A total of 455 ORF2 sequences obtained from 2014 through 2016 were analyzed and PCV2d accounted for 66.7% of the 2014 sequences, 71.8% of the 2015 sequences, and 72% of the 2016 sequences. Interestingly, only 1.9% (9/455) of the sequences belonged to the recently identified PCV2e genotype. The present data indicates that despite an almost 100% PCV2 vaccine coverage in the US, PCV2 DNA can still be detected in almost 1 of 4 randomly selected pig tissues. PCV2d-2 is now the predominant genotype in the USA suggesting that PCV2d-2 may have some advantage over PCV2a and PCV2b in its ability to replicate in pigs under vaccination pressure. Copyright © 2016. Published by Elsevier B.V.
Damodar R. Kethidi; David B. Roden; Tim R. Ladd; Peter J. Krell; Arthur Ratnakaran; Qili Feng
2003-01-01
DNA markers were identified for the molecular detection of the Asian long-horned beetle (ALB), Anoplophora glabripennis (Mot.), based on sequence characterized amplified regions (SCARs) derived from random amplified polymorphic DNA (RAPD) fragments. A 2,740-bp DNA fragment that was present only in ALB and not in other Cerambycids was identified after...
Chen, Jianchi; Civerolo, Edwin L; Jarret, Robert L; Van Sluys, Marie-Anne; de Oliveira, Mariana C
2005-02-01
Xylella fastidiosa causes many important plant diseases including Pierce's disease (PD) in grape and almond leaf scorch disease (ALSD). DNA-based methodologies, such as randomly amplified polymorphic DNA (RAPD) analysis, have been playing key roles in genetic information collection of the bacterium. This study further analyzed the nucleotide sequences of selected RAPDs from X. fastidiosa strains in conjunction with the available genome sequence databases and unveiled several previously unknown novel genetic traits. These include a sequence highly similar to those in the phage family of Podoviridae. Genome comparisons among X. fastidiosa strains suggested that the "phage" is currently active. Two other RAPDs were also related to horizontal gene transfer: one was part of a broadly distributed cryptic plasmid and the other was associated with conjugal transfer. One RAPD inferred a genomic rearrangement event among X. fastidiosa PD strains and another identified a single nucleotide polymorphism of evolutionary value.
Polanski, A; Kimmel, M; Chakraborty, R
1998-05-12
Distribution of pairwise differences of nucleotides from data on a sample of DNA sequences from a given segment of the genome has been used in the past to draw inferences about the past history of population size changes. However, all earlier methods assume a given model of population size changes (such as sudden expansion), parameters of which (e.g., time and amplitude of expansion) are fitted to the observed distributions of nucleotide differences among pairwise comparisons of all DNA sequences in the sample. Our theory indicates that for any time-dependent population size, N(tau) (in which time tau is counted backward from present), a time-dependent coalescence process yields the distribution, p(tau), of the time of coalescence between two DNA sequences randomly drawn from the population. Prediction of p(tau) and N(tau) requires the use of a reverse Laplace transform known to be unstable. Nevertheless, simulated data obtained from three models of monotone population change (stepwise, exponential, and logistic) indicate that the pattern of a past population size change leaves its signature on the pattern of DNA polymorphism. Application of the theory to the published mtDNA sequences indicates that the current mtDNA sequence variation is not inconsistent with a logistic growth of the human population.
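The observed quantity this abstract starts from, the distribution of pairwise nucleotide differences in a sample of aligned sequences, can be computed with a short sketch. The function name `pairwise_difference_distribution` is hypothetical, and the code covers only the empirical distribution; the coalescent theory and the inverse Laplace transform step discussed in the abstract are not implemented.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence


def pairwise_difference_distribution(sequences: Sequence[str]) -> Dict[int, float]:
    """Relative frequency of each number of differing sites over all sequence pairs.

    Assumes the sequences are already aligned and of equal length.
    """
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}
```

For example, the three sequences AAAA, AAAT and AATT give pairs differing at 1, 2 and 1 sites, so the distribution puts weight 2/3 on one difference and 1/3 on two.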
Reducing DNA context dependence in bacterial promoters
Carr, Swati B.; Densmore, Douglas M.
2017-01-01
Variation in the DNA sequence upstream of bacterial promoters is known to affect the expression levels of the products they regulate, sometimes dramatically. While neutral synthetic insulator sequences have been found to buffer promoters from upstream DNA context, there are no established methods for designing effective insulator sequences with predictable effects on expression levels. We address this problem with Degenerate Insulation Screening (DIS), a novel method based on a randomized 36-nucleotide insulator library and a simple, high-throughput, flow-cytometry-based screen that randomly samples from a library of 4^36 potential insulated promoters. The results of this screen can then be compared against a reference uninsulated device to select a set of insulated promoters providing a precise level of expression. We verify this method by insulating the constitutive, inducible, and repressible promoters of a four transcriptional-unit inverter (NOT-gate) circuit, finding both that order dependence is largely eliminated by insulation and that circuit performance is also significantly improved, with a 5.8-fold mean improvement in on/off ratio. PMID:28422998
Time- and Cost-Efficient Identification of T-DNA Insertion Sites through Targeted Genomic Sequencing
Lepage, Étienne; Zampini, Éric; Boyle, Brian; Brisson, Normand
2013-01-01
Forward genetic screens enable the unbiased identification of genes involved in biological processes. In Arabidopsis, several mutant collections are publicly available, which greatly facilitates such practice. Most of these collections were generated by agrotransformation of a T-DNA at random sites in the plant genome. However, precise mapping of T-DNA insertion sites in mutants isolated from such screens is a laborious and time-consuming task. Here we report a simple, low-cost and time efficient approach to precisely map T-DNA insertions simultaneously in many different mutants. By combining sequence capture, next-generation sequencing and 2D-PCR pooling, we developed a new method that allowed the rapid localization of T-DNA insertion sites in 55 out of 64 mutant plants isolated in a screen for gyrase inhibition hypersensitivity. PMID:23951038
Identification of Prostate Cancer-Specific microDNAs
2014-12-01
displacement amplification (MDA). 2 adopted multiple displacement amplification (MDA) with random primers for enriched circular DNA by rolling circle ... amplification (RCA) (Fig. 1) and then amplified DNA fragments were subject to deep sequencing. Sequence NO of Reads seq 1 184 seq 2 133 seq 3 2407 seq...prostate cancer cells through multiple displacement amplification . Clone #7 is the top candidate which has been cloned in an expression vector and it
ERIC Educational Resources Information Center
Wernersson, Rasmus
2007-01-01
An important part of teaching students how to use the BLAST tool for searching large sequence databases is to train the students to think critically about the quality of the sequence hits found--both in terms of the statistical significance and how informative the individual hits are. This paper describes how generating truly random sequences by…
Ma, Xin; Guo, Jing; Sun, Xiao
2016-01-01
DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.
Sites of instability in the human TCF3 (E2A) gene adopt G-quadruplex DNA structures in vitro
Williams, Jonathan D.; Fleetwood, Sara; Berroyer, Alexandra; Kim, Nayun; Larson, Erik D.
2015-01-01
The formation of highly stable four-stranded DNA, called G-quadruplex (G4), promotes site-specific genome instability. G4 DNA structures fold from repetitive guanine sequences, and increasing experimental evidence connects G4 sequence motifs with specific gene rearrangements. The human transcription factor 3 (TCF3) gene (also termed E2A) is subject to genetic instability associated with severe disease, most notably a common translocation event t(1;19) associated with acute lymphoblastic leukemia. The sites of instability in TCF3 are not randomly distributed, but focused to certain sequences. We asked if G4 DNA formation could explain why TCF3 is prone to recombination and mutagenesis. Here we demonstrate that sequences surrounding the major t(1;19) break site and a region associated with copy number variations both contain G4 sequence motifs. The motifs identified readily adopt G4 DNA structures that are stable enough to interfere with DNA synthesis in physiological salt conditions in vitro. When introduced into the yeast genome, TCF3 G4 motifs promoted gross chromosomal rearrangements in a transcription-dependent manner. Our results provide a molecular rationale for the site-specific instability of human TCF3, suggesting that G4 DNA structures contribute to oncogenic DNA breaks and recombination. PMID:26029241
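G4 sequence motifs of the kind mentioned above are often located computationally with a simple pattern: four runs of three or more guanines separated by short loops. The sketch below uses that widely used heuristic with loops of 1 to 7 bases. It is an assumption for illustration, not necessarily the criterion the authors applied, and it scans one strand only.

```python
import re

# Heuristic G4 motif: four G-runs (>= 3 G each) separated by 1-7 nt loops.
# This pattern is an assumed, commonly used rule of thumb.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_g4_motifs(seq):
    """Return (start, matched_text) for non-overlapping candidate G4 motifs."""
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]


# Toy sequence (made up, not from TCF3).
hits = find_g4_motifs("TTGGGAGGGTGGGAGGGTT")
print(hits)
```

To cover the complementary strand, the same scan can be run on the reverse complement, or with an equivalent C-run pattern.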
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Suhkmann; Zhang, Ziming; Upchurch, Sean
2004-04-16
ARID is a homologous family of DNA-binding domains that occur in DNA binding proteins from a wide variety of species, ranging from yeast to nematodes, insects, mammals and plants. SWI1, a member of the SWI/SNF protein complex that is involved in chromatin remodeling during transcription, contains the ARID motif. The ARID domain of human SWI1 (also known as p270) does not select for a specific DNA sequence from a random sequence pool. The lack of sequence specificity shown by the SWI1 ARID domain stands in contrast to the other characterized ARID domains, which recognize specific AT-rich sequences. We have solved the three-dimensional structure of human SWI1 ARID using solution NMR methods. In addition, we have characterized non-specific DNA-binding by the SWI1 ARID domain. Results from this study indicate that a flexible long internal loop in ARID motif is likely to be important for sequence specific DNA-recognition. The structure of human SWI1 ARID domain also represents a distinct structural subfamily. Studies of ARID indicate that boundary of the DNA binding structural and functional domains can extend beyond the sequence homologous region in a homologous family of proteins. Structural studies of homologous domains such as ARID family of DNA-binding domains should provide information to better predict the boundary of structural and functional domains in structural genomic studies. Key Words: ARID, SWI1, NMR, structural genomics, protein-DNA interaction.
Molecular identification of Mango, Mangifera indica L.var. totupura
Jagarlamudi, Sankar; G, Rosaiah; Kurapati, Ravi Kumar; Pinnamaneni, Rajasekhar
2011-01-01
Mango (Mangifera indica) belonging to Anacardiaceae family is a fruit that grows in tropical regions. It is considered as the King of fruits. The present work was taken up to identify a tool for identifying the mango species at the molecular level. The chloroplast trnL-F region was amplified from extracted total genomic DNA using the polymerase chain reaction (PCR) and sequenced. Sequence of the dominant DGGE band revealed that Mangifera indica in tested leaves was Mangifera indica (100% similarity to the ITS sequences of Mangifera indica). This sequence was deposited in NCBI with the accession no. GQ927757. Abbreviations AFLP - Amplified fragment length polymorphism, cpDNA - Chloroplast DNA, DGGE - Denaturing gradient gel electrophoresis, DNA - Deoxyribonucleic acid, EDTA - Ethylenediamine tetraacetic acid, HCl - Hydrochloric acid, ISSR - Inter simple sequence repeats, ITS - Internal transcribed spacer, MATAB - Methyl Ammonium Bromide, Na2SO3 - Sodium sulphite, NaCl - Sodium chloride, NCBI - National Centre for Biotechnology Information, PCR - Polymerase chain reaction, PEG - Polyethylene glycol, RAPD - Randomly amplified polymorphic DNA, trnL-F - Transfer RNA genes start codon- termination codon. PMID:21423885
Brandstätter, Anita; Peterson, Christine T; Irwin, Jodi A; Mpoke, Solomon; Koech, Davy K; Parson, Walther; Parsons, Thomas J
2004-10-01
Large forensic mtDNA databases which adhere to strict guidelines for generation and maintenance are not available for many populations outside of the United States and western Europe. We have established a high quality mtDNA control region sequence database for urban Nairobi as both a reference database for forensic investigations, and as a tool to examine the genetic variation of Kenyan sequences in the context of known African variation. The Nairobi sequences exhibited high variation and a low random match probability, indicating utility for forensic testing. Haplogroup identification and frequencies were compared with those reported from other published studies on African, or African-origin populations from Mozambique, Sierra Leone, and the United States, and suggest significant differences in the mtDNA compositions of the various populations. The quality of the sequence data in our study was investigated and supported using phylogenetic measures. Our data demonstrate the diversity and distinctiveness of African populations, and underline the importance of establishing additional forensic mtDNA databases of indigenous African populations.
Oligo Design: a computer program for development of probes for oligonucleotide microarrays.
Herold, Keith E; Rasooly, Avraham
2003-12-01
Oligonucleotide microarrays have demonstrated potential for the analysis of gene expression, genotyping, and mutational analysis. Our work focuses primarily on the detection and identification of bacteria based on known short sequences of DNA. Oligo Design, the software described here, automates several design aspects that enable the improved selection of oligonucleotides for use with microarrays for these applications. Two major features of the program are: (i) a tiling algorithm for the design of short overlapping temperature-matched oligonucleotides of variable length, which are useful for the analysis of single nucleotide polymorphisms and (ii) a set of tools for the analysis of multiple alignments of gene families and related short DNA sequences, which allow for the identification of conserved DNA sequences for PCR primer selection and variable DNA sequences for the selection of unique probes for identification. Note that the program does not address the full genome perspective but, instead, is focused on the genetic analysis of short segments of DNA. The program is Internet-enabled and includes a built-in browser and the automated ability to download sequences from GenBank by specifying the GI number. The program also includes several utilities, including audio recital of a DNA sequence (useful for verifying sequences against a written document), a random sequence generator that provides insight into the relationship between melting temperature and GC content, and a PCR calculator.
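The link between melting temperature and GC content that the random sequence generator is said to illustrate can be reproduced in a few lines. This sketch assumes the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C, which is a rough estimate for short oligonucleotides. The abstract does not say which formula Oligo Design uses, and all function names here are hypothetical.

```python
import random


def random_dna(length, rng):
    """Return a random DNA string with uniform base frequencies."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def gc_content(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Wallace rule estimate in degrees C: 2 per A/T plus 4 per G/C.

    An assumed rule of thumb for short oligos, not the program's method.
    """
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


rng = random.Random(0)  # fixed seed so the run is repeatable
oligos = [random_dna(20, rng) for _ in range(5)]
for oligo in oligos:
    print(oligo, f"GC={gc_content(oligo):.2f}", f"Tm~{wallace_tm(oligo)} C")
```

For a fixed length the estimate rises linearly with GC content, which is the relationship such a generator makes visible.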
Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney
2012-01-01
RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amenable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676
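The counting idea described above, measuring abundance by unique barcodes rather than by reads, can be sketched as follows. The input layout (a cDNA identifier plus the two end barcodes per read) and the function name are assumptions for illustration. The sketch also treats barcodes as error-free, whereas the authors rely on barcode design to tolerate sequencing errors.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct barcode pairs per cDNA.

    reads: iterable of (cdna_id, barcode_5p, barcode_3p) tuples, a
    hypothetical layout. PCR duplicates share a barcode pair and are
    therefore counted once.
    """
    barcodes = defaultdict(set)
    for cdna, b5, b3 in reads:
        barcodes[cdna].add((b5, b3))
    return {cdna: len(pairs) for cdna, pairs in barcodes.items()}


# Made-up reads: geneA has 3 reads but only 2 distinct barcode pairs.
reads = [
    ("geneA", "AAC", "GTT"),
    ("geneA", "AAC", "GTT"),  # PCR duplicate of the read above
    ("geneA", "CCA", "TGG"),
    ("geneB", "AAC", "GTT"),
]
molecule_counts = count_molecules(reads)
print(molecule_counts)
```

Read counts for geneA and geneB would be 3 and 1, while the barcode-based counts are 2 and 1, which is how amplification bias is removed from the estimate.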
Alternative DNA structure formation in the mutagenic human c-MYC promoter
del Mundo, Imee Marie A.; Zewail-Foote, Maha; Kerwin, Sean M.
2017-01-01
Mutation ‘hotspot’ regions in the genome are susceptible to genetic instability, implicating them in diseases. These hotspots are not random and often co-localize with DNA sequences potentially capable of adopting alternative DNA structures (non-B DNA, e.g. H-DNA and G4-DNA), which have been identified as endogenous sources of genomic instability. There are regions that contain overlapping sequences that may form more than one non-B DNA structure. The extent to which one structure impacts the formation/stability of another, within the sequence, is not fully understood. To address this issue, we investigated the folding preferences of oligonucleotides from a chromosomal breakpoint hotspot in the human c-MYC oncogene containing both potential G4-forming and H-DNA-forming elements. We characterized the structures formed in the presence of G4-DNA-stabilizing K+ ions or H-DNA-stabilizing Mg2+ ions using multiple techniques. We found that under conditions favorable for H-DNA formation, a stable intramolecular triplex DNA structure predominated; whereas, under K+-rich, G4-DNA-forming conditions, a plurality of unfolded and folded species were present. Thus, within a limited region containing sequences with the potential to adopt multiple structures, only one structure predominates under a given condition. The predominance of H-DNA implicates this structure in the instability associated with the human c-MYC oncogene. PMID:28334873
Brain Connectivity as a DNA Sequencing Problem
NASA Astrophysics Data System (ADS)
Zador, Anthony
The mammalian cortex consists of millions or billions of neurons, each connected to thousands of other neurons. Traditional methods for determining the brain connectivity rely on microscopy to visualize neuronal connections, but such methods are slow, labor-intensive and often lack single neuron resolution. We have recently developed a new method, MAPseq, to recast the determination of brain wiring into a form that can exploit the tremendous recent advances in high-throughput DNA sequencing. DNA sequencing technology has outpaced even Moore's law, so that the cost of sequencing the human genome has dropped from a billion dollars in 2001 to below a thousand dollars today. MAPseq works by introducing random sequences of DNA, "barcodes", to tag neurons uniquely. With MAPseq, we can determine the connectivity of over 50K single neurons in a single mouse cortex in about a week, an unprecedented throughput, ushering in the era of "big data" for brain wiring. We are now developing analytical tools and algorithms to make sense of these novel data sets.
Oishi, M; Gohma, H; Lejukole, H Y; Taniguchi, Y; Yamada, T; Suzuki, K; Shinkai, H; Uenishi, H; Yasue, H; Sasaki, Y
2004-05-01
Expressed sequence tags (ESTs) generated based on characterization of clones isolated randomly from cDNA libraries are used to study gene expression profiles in specific tissues and to provide useful information for characterizing tissue physiology. In this study, two directionally cloned cDNA libraries were constructed from 60 day-old bovine whole fetus and fetal placenta. We have characterized 5357 and 1126 clones, and then identified 3464 and 795 unique sequences for the fetus and placenta cDNA libraries: 1851 and 504 showed homology to already identified genes, and 1613 and 291 showed no significant matches to any of the sequences in DNA databases, respectively. Further, we found 94 unique sequences overlapping in both the fetus and the placenta, leading to a catalog of 4165 genes expressed in 60 day-old fetus and placenta. The catalog is used to examine expression profile of genes in 60 day-old bovine fetus and placenta.
Packaging of Dinoroseobacter shibae DNA into Gene Transfer Agent Particles Is Not Random.
Tomasch, Jürgen; Wang, Hui; Hall, April T K; Patzelt, Diana; Preusse, Matthias; Petersen, Jörn; Brinkmann, Henner; Bunk, Boyke; Bhuju, Sabin; Jarek, Michael; Geffers, Robert; Lang, Andrew S; Wagner-Döbler, Irene
2018-01-01
Gene transfer agents (GTAs) are phage-like particles which contain a fragment of genomic DNA of the bacterial or archaeal producer and deliver this to a recipient cell. GTA gene clusters are present in the genomes of almost all marine Rhodobacteraceae (Roseobacters) and might be important contributors to horizontal gene transfer in the world's oceans. For all organisms studied so far, no obvious evidence of sequence specificity or other nonrandom process responsible for packaging genomic DNA into GTAs has been found. Here, we show that knock-out of an autoinducer synthase gene of Dinoroseobacter shibae resulted in overproduction and release of functional GTA particles (DsGTA). Next-generation sequencing of the 4.2-kb DNA fragments isolated from DsGTAs revealed that packaging was not random. DNA from low-GC conjugative plasmids but not from high-GC chromids was excluded from packaging. Seven chromosomal regions were strongly overrepresented in DNA isolated from DsGTA. These packaging peaks lacked identifiable conserved sequence motifs that might represent recognition sites for the GTA terminase complex. Low-GC regions of the chromosome, including the origin and terminus of replication, were underrepresented in DNA isolated from DsGTAs. DNA methylation reduced packaging frequency while the level of gene expression had no influence. Chromosomal regions found to be over- and underrepresented in DsGTA-DNA were regularly spaced. We propose that a "headful" type of packaging is initiated at the sites of coverage peaks and, after linearization of the chromosomal DNA, proceeds in both directions from the initiation site. GC-content, DNA-modifications, and chromatin structure might influence at which sites GTA packaging can be initiated. © The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Packaging of Dinoroseobacter shibae DNA into Gene Transfer Agent Particles Is Not Random
Wang, Hui; Hall, April T K; Patzelt, Diana; Preusse, Matthias; Petersen, Jörn; Brinkmann, Henner; Bunk, Boyke; Bhuju, Sabin; Jarek, Michael; Geffers, Robert; Lang, Andrew S; Wagner-Döbler, Irene
2018-01-01
Gene transfer agents (GTAs) are phage-like particles which contain a fragment of genomic DNA of the bacterial or archaeal producer and deliver this to a recipient cell. GTA gene clusters are present in the genomes of almost all marine Rhodobacteraceae (Roseobacters) and might be important contributors to horizontal gene transfer in the world’s oceans. For all organisms studied so far, no obvious evidence of sequence specificity or other nonrandom process responsible for packaging genomic DNA into GTAs has been found. Here, we show that knock-out of an autoinducer synthase gene of Dinoroseobacter shibae resulted in overproduction and release of functional GTA particles (DsGTA). Next-generation sequencing of the 4.2-kb DNA fragments isolated from DsGTAs revealed that packaging was not random. DNA from low-GC conjugative plasmids but not from high-GC chromids was excluded from packaging. Seven chromosomal regions were strongly overrepresented in DNA isolated from DsGTA. These packaging peaks lacked identifiable conserved sequence motifs that might represent recognition sites for the GTA terminase complex. Low-GC regions of the chromosome, including the origin and terminus of replication, were underrepresented in DNA isolated from DsGTAs. DNA methylation reduced packaging frequency while the level of gene expression had no influence. Chromosomal regions found to be over- and underrepresented in DsGTA-DNA were regularly spaced. We propose that a “headful” type of packaging is initiated at the sites of coverage peaks and, after linearization of the chromosomal DNA, proceeds in both directions from the initiation site. GC-content, DNA-modifications, and chromatin structure might influence at which sites GTA packaging can be initiated. PMID:29325123
Tamori, Akihiro; Yamanishi, Yoshihiro; Kawashima, Shuichi; Kanehisa, Minoru; Enomoto, Masaru; Tanaka, Hiromu; Kubo, Shoji; Shiomi, Susumu; Nishiguchi, Shuhei
2005-08-15
Integration of hepatitis B virus (HBV) DNA into the human genome is one of the most important steps in HBV-related carcinogenesis. This study attempted to find the link between HBV DNA, the adjoining cellular sequence, and altered gene expression in hepatocellular carcinoma (HCC) with integrated HBV DNA. We examined 15 cases of HCC infected with HBV by cassette ligation-mediated PCR. The human DNA adjacent to the integrated HBV DNA was sequenced. Protein coding sequences were searched for in the human sequence. In five cases with HBV DNA integration, from which good quality RNA was extracted, gene expression was examined by cDNA microarray analysis. The human DNA sequence successive to integrated HBV DNA was determined in the 15 HCCs. Eight protein-coding regions were involved: ras-responsive element binding protein 1, calmodulin 1, mixed lineage leukemia 2 (MLL2), FLJ333655, LOC220272, LOC255345, LOC220220, and LOC168991. The MLL2 gene was expressed in three cases with HBV DNA integrated into exon 3 of MLL2 and in one case with HBV DNA integrated into intron 3 of MLL2. Gene expression analysis suggested that two HCCs with HBV integrated into MLL2 had similar patterns of gene expression compared with three HCCs with HBV integrated into other loci of human chromosomes. HBV DNA was integrated at random sites of human DNA, and the MLL2 gene was one of the targets for integration. Our results suggest that HBV DNA might modulate human genes near integration sites, followed by integration site-specific expression of such genes during hepatocarcinogenesis.
Haider, Nadia
2017-01-01
Investigation of genetic variation and phylogenetic relationships among date palm (Phoenix dactylifera L.) cultivars is useful for their conservation and genetic improvement. Various molecular markers such as restriction fragment length polymorphisms (RFLPs), simple sequence repeat (SSR), representational difference analysis (RDA), and amplified fragment length polymorphism (AFLP) have been developed to molecularly characterize date palm cultivars. PCR-based markers random amplified polymorphic DNA (RAPD) and inter-simple sequence repeat (ISSR) are powerful tools to determine the relatedness of date palm cultivars that are difficult to distinguish morphologically. In this chapter, the principles, materials, and methods of RAPD and ISSR techniques are presented. Analysis of data generated from these two techniques and the use of these data to reveal phylogenetic relationships among date palm cultivars are also discussed.
High-density, microsphere-based fiber optic DNA microarrays.
Epstein, Jason R; Leung, Amy P K; Lee, Kyong Hoon; Walt, David R
2003-05-01
A high-density fiber optic DNA microarray has been developed consisting of oligonucleotide-functionalized, 3.1-µm-diameter microspheres randomly distributed on the etched face of an imaging fiber bundle. The fiber bundles are comprised of 6000-50000 fused optical fibers and each fiber terminates with an etched well. The microwell array is capable of housing complementary-sized microspheres, each containing thousands of copies of a unique oligonucleotide probe sequence. The array fabrication process results in random microsphere placement. Determining the position of microspheres in the random array requires an optical encoding scheme. This array platform provides many advantages over other array formats. The microsphere-stock suspension concentration added to the etched fiber can be controlled to provide inherent sensor redundancy. Examining identical microspheres has a beneficial effect on the signal-to-noise ratio. As other sequences of interest are discovered, new microsphere sensing elements can be added to existing microsphere pools and new arrays can be fabricated incorporating the new sequences without altering the existing detection capabilities. These microarrays contain the smallest feature sizes (3 µm) of any DNA array, allowing interrogation of extremely small sample volumes. Reducing the feature size results in higher local target molecule concentrations, creating rapid and highly sensitive assays. The microsphere array platform is also flexible in its applications; research has included DNA-protein interaction profiles, microbial strain differentiation, and non-labeled target interrogation with molecular beacons. Fiber optic microsphere-based DNA microarrays have a simple fabrication protocol enabling their expansion into other applications, such as single cell-based assays.
Rapid and efficient cDNA library screening by self-ligation of inverse PCR products (SLIP).
Hoskins, Roger A; Stapleton, Mark; George, Reed A; Yu, Charles; Wan, Kenneth H; Carlson, Joseph W; Celniker, Susan E
2005-12-02
cDNA cloning is a central technology in molecular biology. cDNA sequences are used to determine mRNA transcript structures, including splice junctions, open reading frames (ORFs) and 5'- and 3'-untranslated regions (UTRs). cDNA clones are valuable reagents for functional studies of genes and proteins. Expressed Sequence Tag (EST) sequencing is the method of choice for recovering cDNAs representing many of the transcripts encoded in a eukaryotic genome. However, EST sequencing samples a cDNA library at random, and it recovers transcripts with low expression levels inefficiently. We describe a PCR-based method for directed screening of plasmid cDNA libraries. We demonstrate its utility in a screen of libraries used in our Drosophila EST projects for 153 transcription factor genes that were not represented by full-length cDNA clones in our Drosophila Gene Collection. We recovered high-quality, full-length cDNAs for 72 genes and variously compromised clones for an additional 32 genes. The method can be used at any scale, from the isolation of cDNA clones for a particular gene of interest, to the improvement of large gene collections in model organisms and the human. Finally, we discuss the relative merits of directed cDNA library screening and RT-PCR approaches.
Santini, A C; Magalhães, J T; Cascardo, J C M; Corrêa, R X
2016-04-28
Chromobacterium violaceum is a free-living Gram-negative bacillus usually found in the water and soil in tropical regions, which causes infections in humans. Chromobacteriosis is characterized by rapid dissemination and high mortality. The aim of this study was to detect the genetic variability among C. violaceum type strain ATCC 12472, and seven isolates from the environment and one from a pulmonary secretion from a chromobacteriosis patient from Ilhéus, Bahia. The molecular characterization of all samples was performed by polymerase chain reaction (PCR) sequencing and 16S rDNA analysis. Primers specific for two ATCC 12472 pathogenicity genes, hilA and yscD, as well as random amplified polymorphic DNA (RAPD), were used for PCR amplification and comparative sequencing of the products. For a more specific approach, the PCR products of 16S rDNA were digested with restriction enzymes. Seven of the samples, including type-strain ATCC 12472, were amplified by the hilA primers; these were subsequently sequenced. Gene yscD was amplified only in type-strain ATCC 12472. MspI and AluI digestion revealed 16S rDNA polymorphisms. These data allowed the generation of a dendrogram for each analysis. The isolates of C. violaceum have variability in random genomic regions demonstrated by RAPD. Also, these isolates have variability in pathogenicity genes, as demonstrated by sequencing and restriction enzyme digestion.
Bacolla, Albino; Tainer, John A; Vasquez, Karen M; Cooper, David N
2016-07-08
Gross chromosomal rearrangements (including translocations, deletions, insertions and duplications) are a hallmark of cancer genomes and often create oncogenic fusion genes. An obligate step in the generation of such gross rearrangements is the formation of DNA double-strand breaks (DSBs). Since the genomic distribution of rearrangement breakpoints is non-random, intrinsic cellular factors may predispose certain genomic regions to breakage. Notably, certain DNA sequences with the potential to fold into secondary structures [potential non-B DNA structures (PONDS); e.g. triplexes, quadruplexes, hairpin/cruciforms, Z-DNA and single-stranded looped-out structures with implications in DNA replication and transcription] can stimulate the formation of DNA DSBs. Here, we tested the postulate that these DNA sequences might be found at, or in close proximity to, rearrangement breakpoints. By analyzing the distribution of PONDS-forming sequences within ±500 bases of 19 947 translocation and 46 365 sequence-characterized deletion breakpoints in cancer genomes, we find significant association between PONDS-forming repeats and cancer breakpoints. Specifically, (AT)n, (GAA)n and (GAAA)n constitute the most frequent repeats at translocation breakpoints, whereas A-tracts occur preferentially at deletion breakpoints. Translocation breakpoints near PONDS-forming repeats also recur in different individuals and patient tumor samples. Hence, PONDS-forming sequences represent an intrinsic risk factor for genomic rearrangements in cancer genomes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Primers for polymerase chain reaction to detect genomic DNA of Toxocara canis and T. cati.
Wu, Z; Nagano, I; Xu, D; Takahashi, Y
1997-03-01
Primers for polymerase chain reaction to amplify genomic DNA of both Toxocara canis and T. cati were constructed by adapting cloning and sequencing random amplified polymorphic DNA. The primers are expected to detect eggs and/or larvae of T. canis and T. cati, both of which are known to cause toxocariasis in humans.
Ultraaccurate genome sequencing and haplotyping of single human cells.
Chu, Wai Keung; Edge, Peter; Lee, Ho Suk; Bansal, Vikas; Bafna, Vineet; Huang, Xiaohua; Zhang, Kun
2017-11-21
Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10^-8 and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.
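The N50 statistic quoted above summarizes contig contiguity: it is the length L such that contigs of length L or longer together contain at least half of the total assembled bases. A minimal sketch of the usual definition follows, with a hypothetical function name and made-up lengths (not the SISSOR data).

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Walk the contigs from longest to shortest and return the length at
    which the running total first reaches half of the overall total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Made-up contig lengths, for illustration only.
contigs = [2, 3, 4, 5, 6]
print(n50(contigs))
```

Because the statistic is weighted by length, a few long contigs raise it far more than many short ones.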
Selection and Screening of DNA Aptamers for Inorganic Nanomaterials.
Zhou, Yibo; Huang, Zhicheng; Yang, Ronghua; Liu, Juewen
2018-02-21
Searching for DNA sequences that can strongly and selectively bind to inorganic surfaces is a long-standing topic in bionanotechnology, analytical chemistry and biointerface research. This can be achieved either by aptamer selection starting with a very large library of ≈10^14 random DNA sequences, or by careful screening of a much smaller library (usually from a few to a few hundred) with rationally designed sequences. Unlike typical molecular targets, inorganic surfaces often have quite strong DNA adsorption affinities due to polyvalent binding and even chemical interactions. This leads to a very high background binding making aptamer selection difficult. Screening, on the other hand, can be designed to compare relative binding affinities of different DNA sequences and could be more appropriate for inorganic surfaces. The resulting sequences have been used for DNA-directed assembly, sorting of carbon nanotubes, and DNA-controlled growth of inorganic nanomaterials. It was recently discovered that poly-cytosine (C) DNA can strongly bind to a diverse range of nanomaterials including nanocarbons (graphene oxide and carbon nanotubes), various metal oxides and transition-metal dichalcogenides. In this Concept article, we articulate the need for screening and potential artifacts associated with traditional aptamer selection methods for inorganic surfaces. Representative examples of application are discussed, and a few future research opportunities are proposed towards the end of this article. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Superimposed Code Theoretic Analysis of DNA Codes and DNA Computing
2010-03-01
because only certain collections (partitioned by font type) of sequences are allowed to be in each position (e.g., Arial = position 0, Comic ...rigidity of short oligos and the shape of the polar charge. Oligo movement was modeled by a Brownian motion 3 dimensional random walk. The one...temperature, kB is Boltz he viscosity of the medium. The random walk motion is modeled by assuming the oligo is on a three dimensional lattice and may
Alternative DNA structure formation in the mutagenic human c-MYC promoter.
Del Mundo, Imee Marie A; Zewail-Foote, Maha; Kerwin, Sean M; Vasquez, Karen M
2017-05-05
Mutation 'hotspot' regions in the genome are susceptible to genetic instability, implicating them in diseases. These hotspots are not random and often co-localize with DNA sequences potentially capable of adopting alternative DNA structures (non-B DNA, e.g. H-DNA and G4-DNA), which have been identified as endogenous sources of genomic instability. There are regions that contain overlapping sequences that may form more than one non-B DNA structure. The extent to which one structure impacts the formation/stability of another, within the sequence, is not fully understood. To address this issue, we investigated the folding preferences of oligonucleotides from a chromosomal breakpoint hotspot in the human c-MYC oncogene containing both potential G4-forming and H-DNA-forming elements. We characterized the structures formed in the presence of G4-DNA-stabilizing K+ ions or H-DNA-stabilizing Mg2+ ions using multiple techniques. We found that under conditions favorable for H-DNA formation, a stable intramolecular triplex DNA structure predominated; whereas, under K+-rich, G4-DNA-forming conditions, a plurality of unfolded and folded species were present. Thus, within a limited region containing sequences with the potential to adopt multiple structures, only one structure predominates under a given condition. The predominance of H-DNA implicates this structure in the instability associated with the human c-MYC oncogene. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Polymerase ribozyme efficiency increased by G/T-rich DNA oligonucleotides
Yao, Chengguo; Müller, Ulrich F.
2011-01-01
The RNA world hypothesis states that the early evolution of life went through a stage where RNA served as genome and as catalyst. The replication of RNA world organisms would have been facilitated by ribozymes that catalyze RNA polymerization. To recapitulate an RNA world in the laboratory, a series of RNA polymerase ribozymes was developed previously. However, these ribozymes have a polymerization efficiency that is too low for self-replication, and the most efficient ribozymes prefer one specific template sequence. The limiting factor for polymerization efficiency is the weak sequence-independent binding to its primer/template substrate. Most of the known polymerase ribozymes bind an RNA heptanucleotide to form the P2 duplex on the ribozyme. By modifying this heptanucleotide, we were able to significantly increase polymerization efficiency. Truncations at the 3′-terminus of this heptanucleotide increased full-length primer extension by 10-fold, on a specific template sequence. In contrast, polymerization on several different template sequences was improved dramatically by replacing the RNA heptanucleotide with DNA oligomers containing randomized sequences of 15 nt. The presence of G and T in the random sequences was sufficient for this effect, with an optimal composition of 60% G and 40% T. Our results indicate that these DNA sequences function by establishing many weak and nonspecific base-pairing interactions to the single-stranded portion of the template. Such low-specificity interactions could have had important functions in an RNA world. PMID:21622900
Genomic DNA sequence and cytosine methylation changes of adult rice leaves after seeds space flight
NASA Astrophysics Data System (ADS)
Shi, Jinming
In this study, cytosine methylation at CCGG sites and genomic DNA sequence changes in adult rice leaves after seed space flight were detected by methylation-sensitive amplification polymorphism (MSAP) and amplified fragment length polymorphism (AFLP) techniques, respectively. Rice seeds were planted in the trial field after a 4-day space flight on the Shenzhou-6 spaceship of China. Adult leaves of space-treated rice, including 8 plants chosen randomly and 2 plants with phenotypic mutation, were used for AFLP and MSAP analysis. Polymorphisms of both DNA sequence and cytosine methylation were detected. For MSAP analysis, the average polymorphic frequencies of the on-ground controls, space-treated plants and mutants were 1.3%, 3.1% and 11%, respectively. For AFLP analysis, the average polymorphic frequencies were 1.4%, 2.9% and 8%, respectively. In total, 27 and 22 polymorphic fragments were cloned and sequenced from MSAP and AFLP analysis, respectively. Nine of the 27 fragments from MSAP analysis show homology to coding sequence. Of the 22 polymorphic fragments from AFLP analysis, none shows homology to mRNA sequence and eight show homology to repeat regions or retrotransposon sequences. These results suggest that although both genomic DNA sequence and cytosine methylation status can be affected by space flight, the genomic regions homologous to the fragments from the genomic DNA and cytosine methylation analyses were different.
Usefulness of fire ant genetics in insecticide efficacy trials
USDA-ARS's Scientific Manuscript database
Mature fire ant colonies contain an average of 80,000 worker ants. For this study, eight fire ant workers were randomly sampled from each colony. DNA fingerprints for each individual ant were generated using 21 simple sequence repeats (SSR) markers that were developed from fire ant DNA by other lab...
Yang, Xiumin; Sugita, Takashi; Takashima, Masako; Hiruma, Masataro; Li, Ruoyu; Sudo, Hajime; Ogawa, Hideoki; Ikeda, Shigaku
2009-04-01
Trichophyton rubrum is the most common pathogen causing dermatophytosis worldwide. Recent genetic investigations showed that the microorganism originated in Africa and then spread to Europe and North America via Asia. We investigated the intraspecific diversity of T. rubrum isolated from two closely located Asian countries, Japan and China. A total of 150 clinical isolates of T. rubrum obtained from Japanese and Chinese patients were analyzed by randomly amplified polymorphic DNA (RAPD) and DNA sequence analysis of the non-transcribed spacer (NTS) region in the rRNA gene. RAPD analysis divided the 150 strains into two major clusters, A and B. Of the Japanese isolates, 30% belonged to cluster A and 70% belonged to cluster B, whereas 91% of the Chinese isolates were in cluster A. The NTS region of the rRNA gene was divided into four major groups (I-IV) based on DNA sequencing. The majority of Japanese isolates were type IV (51%), and the majority of Chinese isolates were type III (75%). These results suggest that although Japan and China are neighboring countries, the origins of T. rubrum isolates from these countries may not be identical. These findings provide information useful for tracing the global transmission routes of T. rubrum.
Conditional poliovirus mutants made by random deletion mutagenesis of infectious cDNA.
Kirkegaard, K; Nelsen, B
1990-01-01
Small deletions were introduced into DNA plasmids bearing cDNA copies of Mahoney type 1 poliovirus RNA. The procedure used was similar to that of P. Hearing and T. Shenk (J. Mol. Biol. 167:809-822, 1983), with modifications designed to introduce only one lesion randomly into each DNA molecule. Methods to map small deletions in either large DNA or RNA molecules were employed. Two poliovirus mutants, VP1-101 and VP1-102, were selected from mutagenized populations on the basis of their host range phenotype, showing a large reduction in the relative numbers of plaques on CV1 and HeLa cells compared with wild-type virus. The deletions borne by the mutant genomes were mapped to the region encoding the amino terminus of VP1. That these lesions were responsible for the mutant phenotypes was substantiated by reintroduction of the sequenced lesions into a wild-type poliovirus cDNA by deoxyoligonucleotide-directed mutagenesis. The deletion of nucleotides encoding amino acids 8 and 9 of VP1 was responsible for the VP1-101 phenotype; the VP1-102 defect was caused by the deletion of the sequences encoding the first four amino acids of VP1. The peptide sequence at the VP1-VP3 proteolytic cleavage site was altered from glutamine-glycine to glutamine-methionine in VP1-102; this apparently did not alter the proteolytic cleavage pattern. The biochemical defects resulting from these mutations are discussed in the accompanying report. PMID:2152811
Superimposed Code Theoretic Analysis of Deoxyribonucleic Acid (DNA) Codes and DNA Computing
2010-01-01
partitioned by font type) of sequences are allowed to be in each position (e.g., Arial = position 0, Comic = position 1, etc. ) and within each collection...movement was modeled by a Brownian motion 3 dimensional random walk. The one dimensional diffusion coefficient D for the ellipsoid shape with 3...temperature, kB is Boltzmann’s constant, and η is the viscosity of the medium. The random walk motion is modeled by assuming the oligo is on a three
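The lattice random walk mentioned in this excerpt can be illustrated with a minimal sketch. Because the excerpt is truncated, everything here is an assumption made for illustration: a simple cubic lattice, unit steps with equal probability in the six axis directions, and the hypothetical helper names `lattice_walk` and `mean_squared_displacement`. It is not the report's model, and it does not reproduce the diffusion-coefficient formula that the excerpt elides.

```python
import random

# The six unit moves on a simple cubic lattice (an assumed geometry).
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def lattice_walk(n_steps, rng):
    """Return the end position of an unbiased walk that starts at the origin."""
    x = y = z = 0
    for _ in range(n_steps):
        dx, dy, dz = rng.choice(STEPS)
        x, y, z = x + dx, y + dy, z + dz
    return x, y, z


def mean_squared_displacement(n_walks, n_steps, seed=0):
    """Average squared end-to-end distance over many independent walks."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0
    for _ in range(n_walks):
        x, y, z = lattice_walk(n_steps, rng)
        total += x * x + y * y + z * z
    return total / n_walks


msd = mean_squared_displacement(n_walks=2000, n_steps=100)
```

For an unbiased walk of this kind the mean squared displacement grows linearly with the number of steps, which is the discrete counterpart of diffusive (Brownian) motion.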
EMPOP-quality mtDNA control region sequences from Kashmiri of Azad Jammu & Kashmir, Pakistan.
Rakha, Allah; Peng, Min-Sheng; Bi, Rui; Song, Jiao-Jiao; Salahudin, Zeenat; Adan, Atif; Israr, Muhammad; Yao, Yong-Gang
2016-11-01
The mitochondrial DNA (mtDNA) control region (nucleotide position 16024-576) sequences were generated by Sanger sequencing for 317 self-identified Kashmiris from all districts of Azad Jammu & Kashmir, Pakistan. The population sample set showed a total of 251 haplotypes, with a relatively high haplotype diversity (0.9977) and a low random match probability (0.54%). The matrilineal lineages belonged to three different phylogeographic origins: Western Eurasian (48.9%), South Asian (47.0%) and East Asian (4.1%). The present study was compared to previous data from Pakistan and other worldwide populations (Central Asia, Western Asia, and East & Southeast Asia). The dataset is made available through EMPOP under accession number EMP00679 and will serve as an mtDNA reference database in forensic casework in Pakistan. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
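The two summary statistics quoted in this abstract can be sketched as follows. The abstract does not state which estimators were used, so this assumes the common definitions: random match probability as the sum of squared haplotype frequencies, and haplotype diversity as Nei's unbiased estimator n/(n-1) * (1 - sum of squared frequencies). Under that assumption the reported values are mutually consistent, since 317/316 * (1 - 0.0054) is approximately 0.9977. The function names and the toy haplotype labels are hypothetical.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies (assumed definition)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def haplotype_diversity(haplotypes):
    """Nei's unbiased estimator: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two samples are required")
    return n / (n - 1) * (1.0 - random_match_probability(haplotypes))


# Toy data: hypothetical haplotype labels, not taken from the study.
sample = ["H1", "H1", "H2", "H3", "H4"]
rmp = random_match_probability(sample)
hd = haplotype_diversity(sample)
```

In practice each element of `sample` would be a full control-region haplotype string rather than a short label.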
DNA based random key generation and management for OTP encryption.
Zhang, Yunpeng; Liu, Xin; Sun, Manhui
2017-09-01
One-time pad (OTP) is a principle of key generation applied to the stream ciphering method which offers total privacy. The OTP encryption scheme has been proved to be unbreakable in theory, but is difficult to realize in practical applications. Because OTP encryption specifically requires the absolute randomness of the key, its development has suffered from severe constraints. DNA cryptography is a new and promising technology in the field of information security. The storage capabilities of DNA chromosomes can be used as one-time pad structures with pseudo-random number generation and indexing in order to encrypt plaintext messages. In this paper, we present a feasible solution to the OTP symmetric key generation and transmission problem with DNA at the molecular level. Through recombinant DNA technology, by using only sender-receiver known restriction enzymes to combine the secure key represented by DNA sequence and the T vector, we generate the DNA bio-hiding secure key and then place the recombinant plasmid in implanted bacteria for secure key transmission. The designed bio experiments and simulation results show that the security of the transmission of the key is further improved and the environmental requirements of key transmission are reduced. Analysis has demonstrated that the proposed DNA-based random key generation and management solutions are marked by high security and usability. Published by Elsevier B.V.
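The idea of a one-time-pad key represented as a DNA sequence can be illustrated with a short sketch. The abstract does not give the paper's encoding, so the two-bits-per-base mapping (A=00, C=01, G=10, T=11) and the function names are assumptions, and `secrets.token_bytes` merely stands in for the key source. The molecular steps (restriction enzymes, T vector, plasmid transport) are not modeled.

```python
import secrets

BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_dna; the length must be a multiple of four."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def otp_xor(message: bytes, key: bytes) -> bytes:
    """XOR message with a key of equal length (encrypts and decrypts)."""
    if len(key) != len(message):
        raise ValueError("an OTP key must be exactly as long as the message")
    return bytes(m ^ k for m, k in zip(message, key))


plaintext = b"DNA"
key = secrets.token_bytes(len(plaintext))  # stand-in for the random key
key_as_dna = bytes_to_dna(key)             # the form that would be transported
ciphertext = otp_xor(plaintext, key)
recovered = otp_xor(ciphertext, dna_to_bytes(key_as_dna))
```

The key is used once and is as long as the message, which is the property that the OTP security argument depends on.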
Bhat, Abhay Prasad; Shin, Minsang; Choy, Hyon E
2014-07-01
Histone-like nucleoid structuring protein (H-NS) is a small but abundant protein present in enteric bacteria and is involved in compaction of the DNA and regulation of transcription. Recent reports have suggested that H-NS binds to a specific AT-rich DNA sequence rather than to intrinsically curved DNA in a sequence-independent manner. We detected two high-specificity H-NS binding sites in the LEE5 promoter of EPEC centered at -110 and -138, which were close to the proposed consensus H-NS binding motif. To identify the H-NS binding sequence in the LEE5 promoter, we took a random mutagenesis approach and found that mutations at around -138 were specifically defective in the regulation by H-NS. It was concluded that H-NS exerts maximum repression via the specific sequence at around -138 and subsequently contacts a subunit of RNAP through oligomerization.
Schönberg, Anna; Theunert, Christoph; Li, Mingkun; Stoneking, Mark; Nasidze, Ivan
2011-09-01
To investigate the demographic history of human populations from the Caucasus and surrounding regions, we used high-throughput sequencing to generate 147 complete mtDNA genome sequences from random samples of individuals from three groups from the Caucasus (Armenians, Azeri and Georgians), and one group each from Iran and Turkey. Overall diversity is very high, with 144 different sequences that fall into 97 different haplogroups found among the 147 individuals. Bayesian skyline plots (BSPs) of population size change through time show a population expansion around 40-50 kya, followed by a constant population size, and then another expansion around 15-18 kya for the groups from the Caucasus and Iran. The BSP for Turkey differs the most from the others, with an increase from 35 to 50 kya followed by a prolonged period of constant population size, and no indication of a second period of growth. An approximate Bayesian computation approach was used to estimate divergence times between each pair of populations; the oldest divergence times were between Turkey and the other four groups from the South Caucasus and Iran (~400-600 generations), while the divergence time of the three Caucasus groups from each other was comparable to their divergence time from Iran (average of ~360 generations). These results illustrate the value of random sampling of complete mtDNA genome sequences that can be obtained with high-throughput sequencing platforms.
Glenn, Travis C; Lance, Stacey L; McKee, Anna M; Webster, Bonnie L; Emery, Aidan M; Zerlotini, Adhemar; Oliveira, Guilherme; Rollinson, David; Faircloth, Brant C
2013-10-17
Urogenital schistosomiasis caused by Schistosoma haematobium is widely distributed across Africa and is increasingly being targeted for control. Genome sequences and population genetic parameters can give insight into the potential for population- or species-level drug resistance. Microsatellite DNA loci are genetic markers in wide use by Schistosoma researchers, but there are few primers available for S. haematobium. We sequenced 1,058,114 random DNA fragments from clonal cercariae collected from a snail infected with a single Schistosoma haematobium miracidium. We assembled and aligned the S. haematobium sequences to the genomes of S. mansoni and S. japonicum, identifying microsatellite DNA loci across all three species and designing primers to amplify the loci in S. haematobium. To validate our primers, we screened 32 randomly selected primer pairs with population samples of S. haematobium. We designed >13,790 primer pairs to amplify unique microsatellite loci in S. haematobium (available at http://www.cebio.org/projetos/schistosoma-haematobium-genome). The three Schistosoma genomes contained similar overall frequencies of microsatellites, but the frequency and length distributions of specific motifs differed among species. We identified 15 primer pairs that amplified consistently and were easily scored. We genotyped these 15 loci in S. haematobium individuals from six locations: Zanzibar had the highest levels of diversity; Malawi, Mauritius, Nigeria, and Senegal were nearly as diverse; but the sample from South Africa was much less diverse. About half of the primers in the database of Schistosoma haematobium microsatellite DNA loci should yield amplifiable and easily scored polymorphic markers, thus providing thousands of potential markers. Sequence conservation among S. haematobium, S. japonicum, and S. mansoni is relatively high, thus it should now be possible to identify markers that are universal among Schistosoma species (i.e., using DNA sequences conserved among species), as well as other markers that are specific to species or species-groups (i.e., using DNA sequences that differ among species). Full genome-sequencing of additional species and specimens of S. haematobium, S. japonicum, and S. mansoni is desirable to better characterize differences within and among these species, to develop additional genetic markers, and to examine genes as well as conserved non-coding elements associated with drug resistance.
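Locating microsatellite loci in sequence data can be sketched with a regular expression. This is only an illustration: the abstract does not describe the authors' detection software or thresholds, so the unit sizes (2 to 6 bases), the minimum of four tandem copies, and the name `find_microsatellites` are assumptions. One known simplification is that a homopolymer run such as AAAAAAAA is reported as a repeat of the unit "AA".

```python
import re


def find_microsatellites(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for each tandem repeat found.

    The pattern captures a short unit (shortest candidate first) and then
    requires at least min_repeats - 1 further back-to-back copies of it.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Toy read containing four tandem copies of "AC" starting at index 3.
hits = find_microsatellites("TTGACACACACGTT")
```

Primer design would then work on the flanking sequence on either side of each hit, which this sketch does not attempt.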
Ramalho-Ortigão, J M; Temporal, P; de Oliveira , S M; Barbosa, A F; Vilela, M L; Rangel, E F; Brazil, R P; Traub-Cseko, Y M
2001-01-01
Molecular studies of insect disease vectors are of paramount importance for understanding parasite-vector relationship. Advances in this area have led to important findings regarding changes in vectors' physiology upon blood feeding and parasite infection. Mechanisms for interfering with the vectorial capacity of insects responsible for the transmission of diseases such as malaria, Chagas disease and dengue fever are being devised with the ultimate goal of developing transgenic insects. A primary necessity for this goal is information on gene expression and control in the target insect. Our group is investigating molecular aspects of the interaction between Leishmania parasites and Lutzomyia sand flies. As an initial step in our studies we have used random sequencing of cDNA clones from two expression libraries made from head/thorax and abdomen of sugar fed L. longipalpis for the identification of expressed sequence tags (EST). We applied differential display reverse transcriptase-PCR and randomly amplified polymorphic DNA-PCR to characterize differentially expressed mRNA from sugar and blood fed insects, and, in one case, from a L. (V.) braziliensis-infected L. longipalpis. We identified 37 cDNAs that have shown homology to known sequences from GenBank. Of these, 32 cDNAs code for constitutive proteins such as zinc finger protein, glutamine synthetase, G binding protein, ubiquitin conjugating enzyme. Three are putative differentially expressed cDNAs from blood fed and Leishmania-infected midgut, a chitinase, a V-ATPase and a MAP kinase. Finally, two sequences are homologous to Drosophila melanogaster gene products recently discovered through the Drosophila genome initiative.
R. Steven Wagner; Mark P. Miller; Charles M. Crisafulli; Susan M. Haig
2005-01-01
The Larch Mountain salamander (Plethodon larselli Burns, 1954) is an endemic species in the Pacific northwestern United States facing threats related to habitat destruction. To facilitate development of conservation strategies, we used DNA sequences and RAPDs (random amplified polymorphic DNA) to examine differences among populations of this...
Wang, Xiaolong; Li, Lin; Zhao, Jiaxin; Li, Fangliang; Guo, Wei; Chen, Xia
2017-04-01
To evaluate the effects of different preservation methods (stored in a -20°C ice chest, preserved in liquid nitrogen and dried in silica gel) on inter simple sequence repeat (ISSR) or random amplified polymorphic DNA (RAPD) analyses in various botanical specimens (including broad-leaved plants, needle-leaved plants and succulent plants) for different times (three weeks and three years), we used a statistical analysis based on the number of bands, genetic index and cluster analysis. The results demonstrate that methods used to preserve samples can provide sufficient amounts of genomic DNA for ISSR and RAPD analyses; however, the effect of different preservation methods on these analyses varies significantly, and the preservation time has little effect on these analyses. Our results provide a reference for researchers to select the most suitable preservation method depending on their study subject for the analysis of molecular markers based on genomic DNA. Copyright © 2017 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.
Chaitanya, Lakshmi; van Oven, Mannis; Brauer, Silke; Zimmermann, Bettina; Huber, Gabriela; Xavier, Catarina; Parson, Walther; de Knijff, Peter; Kayser, Manfred
2016-03-01
The use of mitochondrial DNA (mtDNA) for maternal lineage identification often marks the last resort when investigating forensic and missing-person cases involving highly degraded biological materials. As with all comparative DNA testing, a match between evidence and reference sample requires a statistical interpretation, for which high-quality mtDNA population frequency data are crucial. Here, we determined, under high quality standards, the complete mtDNA control-region sequences of 680 individuals from across the Netherlands sampled at 54 sites, covering the entire country with 10 geographic sub-regions. The complete mtDNA control region (nucleotide positions 16,024-16,569 and 1-576) was amplified with two PCR primers and sequenced with ten different sequencing primers using the EMPOP protocol. Haplotype diversity of the entire sample set was very high at 99.63% and, accordingly, the random-match probability was 0.37%. No population substructure within the Netherlands was detected with our dataset. Phylogenetic analyses were performed to determine mtDNA haplogroups. Inclusion of these high-quality data in the EMPOP database (accession number: EMP00666) will improve its overall data content and geographic coverage in the interest of all EMPOP users worldwide. Moreover, this dataset will serve as (the start of) a national reference database for mtDNA applications in forensic and missing person casework in the Netherlands. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Walker, David Lee
1999-12-01
This study uses dynamical analysis to examine in a quantitative fashion the information coding mechanism in DNA sequences. This exceeds the simple dichotomy of either modeling the mechanism by comparing DNA sequence walks as Fractal Brownian Motion (fbm) processes. The 2-D mappings of the DNA sequences for this research are from Iterated Function System (IFS) (also known as the "Chaos Game Representation" (CGR)) mappings of the DNA sequences. This technique converts a 1-D sequence into a 2-D representation that preserves subsequence structure and provides a visual representation. The second step of this analysis involves the application of Wavelet Packet Transforms, a recently developed technique from the field of signal processing. A multi-fractal model is built by using wavelet transforms to estimate the Hurst exponent, H. The Hurst exponent is a non-parametric measurement of the dynamism of a system. This procedure is used to evaluate gene-coding events in the DNA sequence of cystic fibrosis mutations. The H exponent is calculated for various mutation sites in this gene. The results of this study indicate the presence of anti-persistent, random walks and persistent "sub-periods" in the sequence. This indicates the hypothesis of a multi-fractal model of DNA information encoding warrants further consideration. This work examines the model's behavior in both pathological (mutations) and non-pathological (healthy) base pair sequences of the cystic fibrosis gene. These mutations both natural and synthetic were introduced by computer manipulation of the original base pair text files. The results show that disease severity and system "information dynamics" correlate. These results have implications for genetic engineering as well as in mathematical biology. They suggest that there is scope for more multi-fractal models to be developed.
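The Chaos Game Representation mentioned in this abstract can be sketched in a few lines: each base pulls the current point halfway toward that base's corner of the unit square. The corner assignment below (A bottom-left, C top-left, G top-right, T bottom-right) is a commonly used convention and is assumed here, since the abstract does not specify one. The function name is hypothetical, and the wavelet and Hurst-exponent steps are not shown.

```python
# Assumed corner layout of the unit square for the four bases.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_representation(sequence):
    """Map a 1-D DNA sequence to a list of 2-D points, one per base."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # ambiguous symbols such as N are skipped in this sketch
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


points = chaos_game_representation("ACGT")
```

Because every step halves the distance to a corner, sequences that share a suffix of length k land in the same sub-square of side 2^-k, which is how the mapping preserves subsequence structure.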
Cartwright, Joseph F; Anderson, Karin; Longworth, Joseph; Lobb, Philip; James, David C
2018-06-01
High-fidelity replication of biologic-encoding recombinant DNA sequences by engineered mammalian cell cultures is an essential pre-requisite for the development of stable cell lines for the production of biotherapeutics. However, immortalized mammalian cells characteristically exhibit an increased point mutation frequency compared to mammalian cells in vivo, both across their genomes and at specific loci (hotspots). Thus unforeseen mutations in recombinant DNA sequences can arise and be maintained within producer cell populations. These may affect both the stability of recombinant gene expression and give rise to protein sequence variants with variable bioactivity and immunogenicity. Rigorous quantitative assessment of recombinant DNA integrity should therefore form part of the cell line development process and be an essential quality assurance metric for instances where synthetic/multi-component assemblies are utilized to engineer mammalian cells, such as the assessment of recombinant DNA fidelity or the mutability of single-site integration target loci. Based on Pacific Biosciences (Menlo Park, CA) single molecule real-time (SMRT™) circular consensus sequencing (CCS) technology we developed a rDNA sequence analysis tool to process the multi-parallel sequencing of ∼40,000 single recombinant DNA molecules. After statistical filtering of raw sequencing data, we show that this analytical method is capable of detecting single point mutations in rDNA to a minimum single mutation frequency of 0.0042% (<1/24,000 bases). Using a stable CHO transfectant pool harboring a randomly integrated 5 kb plasmid construct encoding GFP we found that 28% of recombinant plasmid copies contained at least one low frequency (<0.3%) point mutation. These mutations were predominantly found in GC base pairs (85%) and there was no positional bias in mutation across the plasmid sequence. There was no discernible difference between the mutation frequencies of coding and non-coding DNA. 
The putative ratio of non-synonymous and synonymous changes within the open reading frames (ORFs) in the plasmid sequence indicates that natural selection does not impact upon the prevalence of these mutations. Here we have demonstrated the abundance of mutations that fall outside of the reported range of detection of next generation sequencing (NGS) and second generation sequencing (SGS) platforms, providing a methodology capable of being utilized in cell line development platforms to identify the fidelity of recombinant genes throughout the production process. © 2018 Wiley Periodicals, Inc.
Random whole metagenomic sequencing for forensic discrimination of soils.
Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian
2014-01-01
Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.
Prescott, D M
1994-01-01
Ciliates contain two types of nuclei: a micronucleus and a macronucleus. The micronucleus serves as the germ line nucleus but does not express its genes. The macronucleus provides the nuclear RNA for vegetative growth. Mating cells exchange haploid micronuclei, and a new macronucleus develops from a new diploid micronucleus. The old macronucleus is destroyed. This conversion consists of amplification, elimination, fragmentation, and splicing of DNA sequences on a massive scale. Fragmentation produces subchromosomal molecules in Tetrahymena and Paramecium cells and much smaller, gene-sized molecules in hypotrichous ciliates to which telomere sequences are added. These molecules are then amplified, some to higher copy numbers than others. rDNA is differentially amplified to thousands of copies per macronucleus. Eliminated sequences include transposonlike elements and sequences called internal eliminated sequences that interrupt gene coding regions in the micronuclear genome. Some, perhaps all, of these are excised as circular molecules and destroyed. In at least some hypotrichs, segments of some micronuclear genes are scrambled in a nonfunctional order and are reordered during macronuclear development. Vegetatively growing ciliates appear to possess a mechanism for adjusting copy numbers of individual genes, which corrects gene imbalances resulting from random distribution of DNA molecules during amitosis of the macronucleus. Other distinctive features of ciliate DNA include an altered use of the conventional stop codons. PMID:8078435
Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, A R
2016-11-05
DNA barcoding is a molecular diagnostic method that allows automated and accurate identification of species based on a short and standardized fragment of DNA. To this end, an attempt has been made in this study to develop a computational approach for identifying a species by comparing its barcode with the barcode sequences of known species present in the reference library. Each barcode sequence was first mapped onto a numeric feature vector based on k-mer frequencies and then Random forest methodology was employed on the transformed dataset for species identification. The proposed approach outperformed similarity-based, tree-based and diagnostic-based approaches and was found comparable with existing supervised learning based approaches in terms of species identification success rate when compared using real and simulated datasets. Based on the proposed approach, an online web interface SPIDBAR has also been developed and made freely available at http://cabgrid.res.in:8080/spidbar/ for species identification by taxonomists. Copyright © 2016 Elsevier B.V. All rights reserved.
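The first step described in this abstract, mapping a barcode sequence onto a numeric vector of k-mer frequencies, can be sketched as below. The abstract does not state the value of k, the normalisation, or how ambiguous bases were handled, so the choices here (k=2 by default, relative frequencies, windows containing non-ACGT symbols ignored) and the function name are assumptions. The Random forest step is omitted; the resulting vectors would simply be the input rows of whichever classifier is used.

```python
from itertools import product


def kmer_frequency_vector(sequence, k=2):
    """Return relative frequencies of all 4**k k-mers in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with N or other symbols are ignored
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Toy sequence: windows are AC, CG, GT, TA, AC.
vec = kmer_frequency_vector("ACGTAC", k=2)
```

The vector length is fixed at 4**k regardless of sequence length, which is what lets barcodes of different lengths share one feature space.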
[The primary structure of a vaccine strain of tobacco mosaic virus V-69].
Shiian, A N; Mil'shina, N V; Snegireva, P B; Pukhal'skiĭ, V A
1994-12-01
A random set of cDNA fragments were synthesized on genomic RNA of TMV vaccine strain V-69, using random primers and reverse transcriptase. Following synthesis of double-stranded cDNA, they were cloned into the pUC-19 plasmid; and 28 clones were sequenced (insert size 100-500 bp). High nucleotide sequence homology of V-69 (more than 95%) was shown only with tomato strain TMV-L [1]. Sequenced clones represent 54% of the genome (50% of the replicase gene, 98% of the transport protein gene, and 60% of the coat protein gene). In this genome region, 24 base substitutions were revealed, as compared to the wild-type TMV-L sequence. Six base substitutions resulted in changes in corresponding amino acid codons. No substitutions coincided with those discovered in the related TMV vaccine strain L11A [2], while two substitutions in the replicase gene were identical to those found in TMV strain Lta1 [3], which is capable of overcoming protection in tomatoes with the resistance gene Tm-1.
Navarro, B; Daròs, J A; Flores, R
1996-01-01
Two PCR-based methods are described for obtaining clones of small circular RNAs of unknown sequence and for which only minute amounts are available. To avoid introducing any assumption about the RNA sequence, synthesis of the cDNAs is initiated with random primers. The cDNA population is then PCR-amplified using a primer whose sequence is present at both sides of the cDNAs, since they have been obtained with random hexamers and then a linker with the sequence of the PCR primer has been ligated to their termini, or because the cDNAs have been synthesized with an oligonucleotide that contains the sequence of the PCR primer at its 5' end and six randomized positions at its 3' end. The procedures need only approximately 50 ng of purified RNA template. The reasons for the emergence of cloning artifacts and precautions to avoid them are discussed.
Dean, Frank B.; Nelson, John R.; Giesler, Theresa L.; Lasken, Roger S.
2001-01-01
We describe a simple method of using rolling circle amplification to amplify vector DNA such as M13 or plasmid DNA from single colonies or plaques. Using random primers and φ29 DNA polymerase, circular DNA templates can be amplified 10,000-fold in a few hours. This procedure removes the need for lengthy growth periods and traditional DNA isolation methods. Reaction products can be used directly for DNA sequencing after phosphatase treatment to inactivate unincorporated nucleotides. Amplified products can also be used for in vitro cloning, library construction, and other molecular biology applications. PMID:11381035
Lam, Kelly Y C; Chan, Gallant K L; Xin, Gui-Zhong; Xu, Hong; Ku, Chuen-Fai; Chen, Jian-Ping; Yao, Ping; Lin, Huang-Quan; Dong, Tina T X; Tsim, Karl W K
2015-12-15
Cordyceps sinensis is an endoparasitic fungus widely used as a tonic and medicinal food in the practice of traditional Chinese medicine (TCM). In historical usage, Cordyceps refers specifically to the species C. sinensis. However, a number of closely related species are also named Cordyceps, and they are commonly sold as C. sinensis. The substitutes and adulterants of C. sinensis are often introduced either intentionally or accidentally in the herbal market, which seriously affects the therapeutic effects or even leads to life-threatening poisoning. Here, we aim to identify Cordyceps by DNA sequencing technology. Two different DNA-based approaches were compared. The internal transcribed spacer (ITS) sequences and the random amplified polymorphic DNA (RAPD)-sequence characterized amplified region (SCAR) were developed here to authenticate different species of Cordyceps. Both approaches generally enabled discrimination of C. sinensis from others. The application of the two methods, supporting each other, increases the security of identification. For better reproducibility and faster analysis, the SCAR markers derived from the RAPD results provide a new method for quick authentication of Cordyceps.
DNA detection on ultrahigh-density optical fiber-based nanoarrays.
Tam, Jenny M; Song, Linan; Walt, David R
2009-04-15
Nanoarrays for DNA detection were fabricated on etched nanofiber bundles based on recently developed techniques for microscale arrays. Two different-sized nanoarrays were created: one with 700 nm feature sizes and a 1 µm center-to-center pitch (approximately 1 × 10^6 array elements/mm^2) and one with 300 nm feature sizes and a 500 nm center-to-center pitch (4.6 × 10^6 array elements/mm^2). A random, multiplexed array composed of oligonucleotide-functionalized nanospheres was constructed and used for parallel detection and analysis of fluorescently labeled DNA targets. We have used these arrays to detect a variety of target sequences including Bacillus thuringiensis kurstaki and vaccinia virus sequences, two potential biowarfare agents, as well as interleukin-2 sequences, an immune system modulator that has been used for the diagnosis of HIV.
Investigation of microsatellite instability in Turkish breast cancer patients.
Demokan, Semra; Muslumanoglu, Mahmut; Yazici, H; Igci, Abdullah; Dalay, Nejat
2002-01-01
Multiple somatic and inherited genetic changes that lead to loss of growth control may contribute to the development of breast cancer. Microsatellites are tandem repeats of simple sequences that occur abundantly and at random throughout most eucaryotic genomes. Microsatellite instability (MI), characterized by the presence of random contractions or expansions in the length of simple sequence repeats or microsatellites, is observed in a variety of tumors. The aim of this study was to compare tumor DNA fingerprints with constitutional DNA fingerprints to investigate changes specific to breast cancer and evaluate its correlation with clinical characteristics. Tumor and normal tissue samples of 38 patients with breast cancer were investigated by comparing PCR-amplified microsatellite sequences D2S443 and D21S1436. Microsatellite instability at D21S1436 and D2S443 was found in 5 (13%) and 7 (18%) patients, respectively. Two patients displayed instability at both marker loci. No association was found between MI and age, family history, lymph node involvement and other clinical parameters.
Rector, Annabel; Tachezy, Ruth; Van Ranst, Marc
2004-01-01
The discovery of novel viruses has often been accomplished by using hybridization-based methods that necessitate the availability of a previously characterized virus genome probe or knowledge of the viral nucleotide sequence to construct consensus or degenerate PCR primers. In their natural replication cycle, certain viruses employ a rolling-circle mechanism to propagate their circular genomes, and multiply primed rolling-circle amplification (RCA) with φ29 DNA polymerase has recently been applied in the amplification of circular plasmid vectors used in cloning. We employed an isothermal RCA protocol that uses random hexamer primers to amplify the complete genomes of papillomaviruses without the need for prior knowledge of their DNA sequences. We optimized this RCA technique with extracted human papillomavirus type 16 (HPV-16) DNA from W12 cells, using a real-time quantitative PCR assay to determine amplification efficiency, and obtained a 2.4 × 10^4-fold increase in HPV-16 DNA concentration. We were able to clone the complete HPV-16 genome from this multiply primed RCA product. The optimized protocol was subsequently applied to a bovine fibropapillomatous wart tissue sample. Whereas no papillomavirus DNA could be detected by restriction enzyme digestion of the original sample, multiply primed RCA enabled us to obtain a sufficient amount of papillomavirus DNA for restriction enzyme analysis, cloning, and subsequent sequencing of a novel variant of bovine papillomavirus type 1. The multiply primed RCA method allows the discovery of previously unknown papillomaviruses, and possibly also other circular DNA viruses, without a priori sequence information. PMID:15113879
Exploring the loblolly pine (Pinus taeda L.) genome by BAC sequencing and Cot analysis.
Perera, Dinum; Magbanua, Zenaida V; Thummasuwan, Supaphan; Mukherjee, Dipaloke; Arick, Mark; Chouvarine, Philippe; Nairn, Campbell J; Schmutz, Jeremy; Grimwood, Jane; Dean, Jeffrey F D; Peterson, Daniel G
2018-07-15
Loblolly pine (LP; Pinus taeda L.) is an economically and ecologically important tree in the southeastern U.S. To advance understanding of the loblolly pine (LP; Pinus taeda L.) genome, we sequenced and analyzed 100 BAC clones and performed a Cot analysis. The Cot analysis indicates that the genome is composed of 57, 24, and 10% highly-repetitive, moderately-repetitive, and single/low-copy sequences, respectively (the remaining 9% of the genome is a combination of fold back and damaged DNA). Although single/low-copy DNA only accounts for 10% of the LP genome, the amount of single/low-copy DNA in LP is still 14 times the size of the Arabidopsis genome. Since gene numbers in LP are similar to those in Arabidopsis, much of the single/low-copy DNA of LP would appear to be composed of DNA that is both gene- and repeat-poor. Macroarrays prepared from a LP bacterial artificial chromosome (BAC) library were hybridized with probes designed from cell wall synthesis/wood development cDNAs, and 50 of the "targeted" clones were selected for further analysis. An additional 25 clones were selected because they contained few repeats, while 25 more clones were selected at random. The 100 BAC clones were Sanger sequenced and assembled. Of the targeted BACs, 80% contained all or part of the cDNA used to target them. One targeted BAC was found to contain fungal DNA and was eliminated from further analysis. Combinations of similarity-based and ab initio gene prediction approaches were utilized to identify and characterize potential coding regions in the 99 BACs containing LP DNA. From this analysis, we identified 154 gene models (GMs) representing both putative protein-coding genes and likely pseudogenes. Ten of the GMs (all of which were specifically targeted) had enough support to be classified as intact genes. 
Interestingly, the 154 GMs had statistically indistinguishable (α = 0.05) distributions in the targeted and random BAC clones (15.18 and 12.61 GM/Mb, respectively), whereas the low-repeat BACs contained significantly fewer GMs (7.08 GM/Mb). However, when GM length was considered, the targeted BACs had a significantly greater percentage of their length in GMs (3.26%) when compared to random (1.63%) and low-repeat (0.62%) BACs. The results of our study provide insight into LP evolution and inform ongoing efforts to produce a reference genome sequence for LP, while characterization of genes involved in cell wall production highlights carbon metabolism pathways that can be leveraged for increasing wood production. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Regulatory sequence analysis tools.
van Helden, Jacques
2003-07-01
The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.
Beckenbach, Andrew T.
2012-01-01
The complete mitochondrial DNA sequences of eight representatives of lower Diptera, suborder Nematocera, along with nearly complete sequences from two other species, are presented. These taxa represent eight families not previously represented by complete mitochondrial DNA sequences. Most of the sequences retain the ancestral dipteran mitochondrial gene arrangement, while one sequence, that of the midge Arachnocampa flava (family Keroplatidae), has an inversion of the trnE gene. The most unusual result is the extensive rearrangement of the mitochondrial genome of a winter crane fly, Paracladura trichoptera (family Trichocera). The pattern of rearrangement indicates that the mechanism of rearrangement involved a tandem duplication of the entire mitochondrial genome, followed by random and nonrandom loss of one copy of each gene. Another winter crane fly retains the ancestral dipteran gene arrangement. A preliminary mitochondrial phylogeny of the Diptera is also presented. PMID:22155689
Schlötelburg, C; von Wintzingerode, F; Hauck, R; Hegemann, W; Göbel, U B
2000-07-01
A 16S-rDNA-based molecular study was performed to determine the bacterial diversity of an anaerobic, 1,2-dichloropropane-dechlorinating bioreactor consortium derived from sediment of the River Saale, Germany. Total community DNA was extracted and bacterial 16S rRNA genes were subsequently amplified using conserved primers. A clone library was constructed and analysed by sequencing the 16S rDNA inserts of randomly chosen clones followed by dot blot hybridization with labelled polynucleotide probes. The phylogenetic analysis revealed significant sequence similarities of several as yet uncultured bacterial species in the bioreactor to those found in other reductively dechlorinating freshwater consortia. In contrast, no close relationship was obtained with as yet uncultured bacteria found in reductively dechlorinating consortia derived from marine habitats. One rDNA clone showed >97% sequence similarity to Dehalobacter species, known for reductive dechlorination of tri- and tetrachloroethene. These results suggest that reductive dechlorination in microbial freshwater habitats depends upon a specific bacterial community structure.
Entropy and long-range memory in random symbolic additive Markov chains
NASA Astrophysics Data System (ADS)
Melnik, S. S.; Usatenko, O. V.
2016-06-01
The goal of this paper is to develop an estimate for the entropy of random symbolic sequences with elements belonging to a finite alphabet. As a plausible model, we use the high-order additive stationary ergodic Markov chain with long-range memory. Supposing that the correlations between random elements of the chain are weak, we express the conditional entropy of the sequence by means of the symbolic pair correlation function. We also examine an algorithm for estimating the conditional entropy of finite symbolic sequences. We show that the entropy contains two contributions, i.e., the correlation and the fluctuation. The obtained analytical results are used for numerical evaluation of the entropy of written English texts and DNA nucleotide sequences. The developed theory opens the way for constructing a more consistent and sophisticated approach to describe the systems with strong short-range and weak long-range memory.
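As a point of comparison for the estimate described above, the conditional entropy of a finite symbolic sequence can be approximated directly from block frequencies. The sketch below is the simple plug-in estimator (difference of block entropies), not the authors' expression in terms of the pair correlation function; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def block_entropy(seq, n):
    """Shannon entropy (in bits) of the empirical distribution of length-n blocks."""
    if n == 0:
        return 0.0
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def conditional_entropy(seq, memory):
    """Plug-in estimate of the entropy of the next symbol given `memory` previous ones.

    Computed as H(memory + 1) - H(memory). For short sequences the estimate is
    biased and can even come out slightly negative, which is one reason
    finite-size (fluctuation) corrections matter.
    """
    return block_entropy(seq, memory + 1) - block_entropy(seq, memory)
```

For a strictly periodic string such as "ACGT" repeated many times, the estimate with `memory=0` is 2 bits per symbol, while with `memory=1` it drops to approximately zero because each symbol is fully determined by its predecessor.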
Rizk, Francine; Laverdure, Sylvain; d'Alençon, Emmanuelle; Bossin, Hervé; Dupressoir, Thierry
2018-01-01
The Lepidopteran ambidensovirus 1 isolated from Junonia coenia (hereafter JcDV) is an invertebrate parvovirus considered as a viral transduction vector as well as a potential tool for the biological control of insect pests. Previous works showed that JcDV-based circular plasmids experimentally integrate into insect cell genomic DNA. In order to approach the natural conditions of infection and possible integration, we generated linear JcDV-gfp-based molecules which were transfected into non-permissive Spodoptera frugiperda (Sf9) cultured cells. Cells were monitored for the expression of green fluorescent protein (GFP) and DNA was analyzed for integration of transduced viral sequences. Non-structural protein modulation of the VP-gene cassette promoter activity was additionally assayed. We show that linear JcDV-derived molecules are capable of long term genomic integration and sustained transgene expression in Sf9 cells. As expected, only the deletion of both inverted terminal repeats (ITR) or the polyadenylation signals of NS and VP genes dramatically impairs the global transduction/expression efficiency. However, all the integrated viral sequences we characterized appear "scrambled" whatever the viral content of the transfected vector. Despite a strong GFP expression, we were unable to recover any full sequence of the original constructs and found rearranged viral and non-viral sequences as well. Cellular flanking sequences were identified as non-coding ones. On the other hand, the kinetics of GFP expression over time led us to investigate the apparent down-regulation by non-structural proteins of the VP-gene cassette promoter. Altogether, our results show that JcDV-derived sequences included in linear DNA molecules are able to drive efficiently the integration and expression of a foreign gene into the genome of insect cells, whatever their composition, provided that at least one ITR is present.
However, the transfected sequences were extensively rearranged with cellular DNA during or after random integration in the host cell genome. Lastly, the non-structural proteins seem to participate in the regulation of p9 promoter activity rather than to the integration of viral sequences.
Nanoparticle-labeled DNA capture elements for detection and identification of biological agents
NASA Astrophysics Data System (ADS)
Kiel, Johnathan L.; Holwitt, Eric A.; Parker, Jill E.; Vivekananda, Jeevalatha; Franz, Veronica
2004-12-01
Aptamers, synthetic DNA capture elements (DCEs), can be made chemically or in genetically engineered bacteria. DNA capture elements are artificial DNA sequences, from a random pool of sequences, selected for their specific binding to potential biological warfare or terrorism agents. These sequences were selected by an affinity method using filters to which the target agent was attached and the DNA isolated and amplified by polymerase chain reaction (PCR) in an iterative, increasingly stringent, process. The probes can then be conjugated to Quantum Dots and superparamagnetic nanoparticles. The former provide intense, bleach-resistant fluorescent detection of bioagent and the latter provide a means to collect the bioagents with a magnet. The fluorescence can be detected in a flow cytometer, in a fluorescence plate reader, or with a fluorescence microscope. To date, we have made DCEs to Bacillus anthracis spores, Shiga toxin, Venezuelan Equine Encephalitis (VEE) virus, and Francisella tularensis. DCEs can easily distinguish Bacillus anthracis from its nearest relatives, Bacillus cereus and Bacillus thuringiensis. Development of a high-throughput process is currently being investigated.
Nullomers and High Order Nullomers in Genomic Sequences
Vergni, Davide; Santoni, Daniele
2016-01-01
A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers, whether their absence in genomes has just a statistical explanation or it is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built on eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally the study of mean helical rise of nullomers sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. The obtained results show that nullomers are the consequence of the peculiar structure of DNA (also including biased CpG frequency and CpGs islands), so that the hypermutability model, also taking into account CpG islands, seems to be not sufficient to explain nullomer phenomenon. 
Finally, high order nullomers could emphasize those features that already make simple nullomers useful in several applications. PMID:27906971
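The definition of a nullomer given above translates into a short enumeration. The sketch below lists the k-mers absent from a single strand and checks a first-order version of the "high order" property, assuming that "mutated" means a single-base substitution; the paper's exact definition of order, and whether the reverse-complement strand is included, are not stated in the abstract, so both choices here are assumptions.

```python
from itertools import product


def nullomers(seq, k, alphabet="ACGT"):
    """Return all k-mers over `alphabet` that never occur in `seq` (one strand only)."""
    seq = seq.upper()
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return sorted(w for w in all_kmers if w not in present)


def is_high_order(word, null_set, alphabet="ACGT"):
    """True if every single-base substitution of `word` is itself in `null_set`.

    Assumption: 'mutated sequences' means all one-substitution neighbours.
    """
    for i, base in enumerate(word):
        for alt in alphabet:
            if alt != base and word[:i] + alt + word[i + 1:] not in null_set:
                return False
    return True
```

Exhaustive enumeration costs 4^k set lookups, so it is only practical for the short word lengths (roughly k up to 12 or so) at which nullomers are usually studied.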
Mizas, Ch; Sirakoulis, G Ch; Mardiris, V; Karafyllidis, I; Glykos, N; Sandaltzopoulos, R
2008-04-01
Change of DNA sequence that fuels evolution is, to a certain extent, a deterministic process because mutagenesis does not occur in an absolutely random manner. So far, it has not been possible to decipher the rules that govern DNA sequence evolution due to the extreme complexity of the entire process. In our attempt to approach this issue we focus solely on the mechanisms of mutagenesis and deliberately disregard the role of natural selection. Hence, in this analysis, evolution refers to the accumulation of genetic alterations that originate from mutations and are transmitted through generations without being subjected to natural selection. We have developed a software tool that allows modelling of a DNA sequence as a one-dimensional cellular automaton (CA) with four states per cell which correspond to the four DNA bases, i.e. A, C, T and G. The four states are represented by numbers of the quaternary number system. Moreover, we have developed genetic algorithms (GAs) in order to determine the rules of CA evolution that simulate the DNA evolution process. Linear evolution rules were considered and square matrices were used to represent them. If DNA sequences of different evolution steps are available, our approach allows the determination of the underlying evolution rule(s). Conversely, once the evolution rules are deciphered, our tool may reconstruct the DNA sequence in any previous evolution step for which the exact sequence information was unknown. The developed tool may be used to test various parameters that could influence evolution. We describe a paradigm relying on the assumption that mutagenesis is governed by a near-neighbour-dependent mechanism. Based on the satisfactory performance of our system in the deliberately simplified example, we propose that our approach could offer a starting point for future attempts to understand the mechanisms that govern evolution. The developed software is open-source and has a user-friendly graphical input interface.
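To make the cellular-automaton model above concrete, the following sketch applies one linear, nearest-neighbour rule to a sequence encoded in base 4. The numeric mapping (A=0, C=1, T=2, G=3), the fixed zero boundary, and the helper names are all assumptions chosen for illustration; the abstract states only that states are quaternary digits and that linear rules are represented by square matrices.

```python
BASES = "ACTG"  # assumed encoding: A=0, C=1, T=2, G=3


def encode(seq):
    """Convert a DNA string into a list of quaternary cell states."""
    return [BASES.index(b) for b in seq.upper()]


def decode(states):
    """Convert quaternary cell states back into a DNA string."""
    return "".join(BASES[s] for s in states)


def neighbour_rule_matrix(n, left, centre, right):
    """Build an n x n tridiagonal matrix for a linear near-neighbour rule."""
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = centre
        if i > 0:
            m[i][i - 1] = left
        if i < n - 1:
            m[i][i + 1] = right
    return m


def step(states, matrix):
    """One evolution step: new_state = (matrix x states) mod 4."""
    return [sum(a * s for a, s in zip(row, states)) % 4 for row in matrix]
```

With the rule coefficients (left, centre, right) = (1, 1, 0), the sequence ACTG evolves in one step to ACGC. In the setting the abstract describes, a genetic algorithm would search over such matrices for the one that best maps an earlier sequence onto a later one.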
cWINNOWER algorithm for finding fuzzy DNA motifs
NASA Technical Reports Server (NTRS)
Liang, S.; Samanta, M. P.; Biegel, B. A.
2004-01-01
The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4). Copyright Imperial College Press.
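The signal definition above, a length-l pattern within d mutations of the motif, amounts to a Hamming-distance test. The sketch below scans a sequence for signals of a known motif; this is only the definition made executable, since cWINNOWER itself must find the motif without knowing it in advance. The function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_signals(seq, motif, d):
    """Start positions of every window of len(motif) within d mismatches of motif."""
    l = len(motif)
    return [i for i in range(len(seq) - l + 1)
            if hamming(seq[i:i + l], motif) <= d]
```

Because any two signals of the same motif each differ from it in at most d positions, they differ from each other in at most 2d positions; that pairwise bound is what allows signals to be linked into the cliques the abstract refers to.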
cWINNOWER Algorithm for Finding Fuzzy DNA Motifs
NASA Technical Reports Server (NTRS)
Liang, Shoudan
2003-01-01
The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4).
Kijima, T E; Innan, Hideki
2013-11-01
A population genetic simulation framework is developed to understand the behavior and molecular evolution of DNA sequences of transposable elements. Our model incorporates random transposition and excision of transposable element (TE) copies, two modes of selection against TEs, and degeneration of transpositional activity by point mutations. We first investigated the relationships between the behavior of the copy number of TEs and these parameters. Our results show that when selection is weak, the genome can maintain a relatively large number of TEs, but most of them are less active. In contrast, with strong selection, the genome can maintain only a limited number of TEs but the proportion of active copies is large. In such a case, there could be substantial fluctuations of the copy number over generations. We also explored how DNA sequences of TEs evolve through the simulations. In general, active copies form clusters around the original sequence, while less active copies have long branches specific to themselves, exhibiting a star-shaped phylogeny. It is demonstrated that the phylogeny of TE sequences could be informative to understand the dynamics of TE evolution.
Detection of Hepatocyte Clones Containing Integrated Hepatitis B Virus DNA Using Inverse Nested PCR.
Tu, Thomas; Jilbert, Allison R
2017-01-01
Chronic hepatitis B virus (HBV) infection is a major cause of liver cirrhosis and hepatocellular carcinoma (HCC), leading to ~600,000 deaths per year worldwide. Many of the steps that occur during progression from the normal liver to cirrhosis and/or HCC are unknown. Integration of HBV DNA into random sites in the host cell genome occurs as a by-product of the HBV replication cycle and forms a unique junction between virus and cellular DNA. Analyses of integrated HBV DNA have revealed that HCCs are clonal and imply that they develop from the transformation of hepatocytes, the only liver cell known to be infected by HBV. Integrated HBV DNA has also been shown, at least in some tumors, to cause insertional mutagenesis in cancer driver genes, which may facilitate the development of HCC. Studies of HBV DNA integration in the histologically normal liver have provided additional insight into HBV-associated liver disease, suggesting that hepatocytes with a survival or growth advantage undergo high levels of clonal expansion even in the absence of oncogenic transformation. Here we describe inverse nested PCR (invPCR), a highly sensitive method that allows detection, sequencing, and enumeration of virus-cell DNA junctions formed by the integration of HBV DNA. The invPCR protocol is composed of two major steps: inversion of the virus-cell DNA junction and single-molecule nested PCR. The invPCR method is highly specific and inexpensive and can be tailored to DNA extracted from large or small amounts of liver. This procedure also allows detection of genome-wide random integration of any known DNA sequence and is therefore a useful technique for molecular biology, virology, and genetic research.
Assignment of the human caltractin gene (CALT) to Xq28 by fluorescence in situ hybridization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tanaka, Tanaka; Okui, Keiko; Nakamura, Yusuke
1994-12-01
The centrosome is the major microtubule-organizing center of interphase eukaryotic cells, and its duplication is essential to eukaryotic cell division. Caltractin, a structural component of centrosomes, is highly homologous in amino acid sequence to the product of the CDC31 gene of Saccharomyces cerevisiae. In S. cerevisiae, an important role for CDC31 in duplication of the spindle pole body (SPB), a kind of microtubule-organizing center, has been demonstrated by an experiment in which mutant CDC31 prevented SPB duplication and led to formation of a monopolar spindle. In view of the localization of human caltractin in centrosomes and the sequence homology it bears to yeast CDC31, it is reasonable to assume that caltractin functions in humans as CDC31 does in yeast. As a part of the Human Genome Project, we have been determining nucleotide sequences of DNA clones randomly selected from a directionally cloned cDNA library constructed from fetal brain mRNA obtained from Clontech (La Jolla, CA). By comparing 5' partial DNA sequences of these cDNA clones with known DNA sequences in the database, we found one clone that was highly homologous to the caltractin gene of Chlamydomonas, which turned out to be the same as a human gene identified recently. 4 refs., 1 fig.
Schadt, Eric E.; Banerjee, Onureena; Fang, Gang; Feng, Zhixing; Wong, Wing H.; Zhang, Xuegong; Kislyuk, Andrey; Clark, Tyson A.; Luong, Khai; Keren-Paz, Alona; Chess, Andrew; Kumar, Vipin; Chen-Plotkin, Alice; Sondheimer, Neal; Korlach, Jonas; Kasarskis, Andrew
2013-01-01
Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types. PMID:23093720
Preferential cleavage sites for Sau3A restriction endonuclease in human ribosomal DNA.
Kupriyanova, N S; Kirilenko, P M; Netchvolodov, K K; Ryskov, A P
2000-07-21
Previous studies of cloned ribosomal DNA (rDNA) variants isolated from the cosmid library of human chromosome 13 have revealed disproportionate representation of different rDNA regions (N. S. Kupriyanova, K. K. Netchvolodov, P. M. Kirilenko, B. I. Kapanadze, N. K. Yankovsky, and A. P. Ryskov, Mol. Biol. 30, 51-60, 1996). Here we show nonrandom cleavage of human rDNA with Sau3A or its isoschizomer MboI under mild hydrolysis conditions. The hypersensitive cleavage sites were found to be located in the ribosomal intergenic spacer (rIGS), especially in the regions of about 5-5.5 and 11 kb upstream of the rRNA transcription start point. This finding is based on sequencing mapping of the rDNA insert ends in randomly selected cosmid clones of human chromosome 13 and on the data of digestion kinetics of cloned and noncloned human genomic rDNA with Sau3A and MboI. The results show that the methylation status and superhelicity state of the rIGS have no effect on cleavage site sensitivity. It is interesting that all primary cleavage sites are adjacent to or extend into Alu or Psi cdc 27 retroposons of the rIGS, suggesting a possible role of neighboring sequences in nuclease accessibility. The results explain the unequal representation of rDNA sequences in the human genomic DNA library used for this study. Copyright 2000 Academic Press.
Ishihara, Satoru; Kotomura, Naoe; Yamamoto, Naoki; Ochiai, Hiroshi
2017-08-15
Ligation-mediated polymerase chain reaction (LM-PCR) is a common technique for amplification of a pool of DNA fragments. Here, a double-stranded oligonucleotide consisting of two primer sequences in back-to-back orientation was designed as an adapter for LM-PCR. When DNA fragments were ligated with this adapter, the fragments were sandwiched between two adapters in random orientations. In the ensuing PCR, ligation products linked at each end to an opposite side of the adapter, i.e. to a distinct primer sequence, were preferentially amplified compared with products linked at each end to an identical primer sequence. The use of this adapter in LM-PCR reduced the impairment of PCR by substrate DNA with a high GC content, compared with the use of traditional LM-PCR adapters. This result suggested that our method has the potential to contribute to reduction of the amplification bias that is caused by an intrinsic property of the sequence context in substrate DNA. A DNA preparation obtained from a chromatin immunoprecipitation assay using pulldown of a specific form of histone H3 was successfully amplified using the modified LM-PCR, and the amplified products could be used as probes in a fluorescence in situ hybridization analysis. Copyright © 2017 Elsevier Inc. All rights reserved.
Xavier, Miguel J; Nixon, Brett; Roman, Shaun D; Aitken, Robert John
2018-01-01
Current approaches for DNA extraction and fragmentation from mammalian spermatozoa pose several challenges for the investigation of the oxidative stress burden carried in the genome of male gametes. Indeed, the potential introduction of oxidative DNA damage induced by reactive oxygen species, reducing agents (dithiothreitol or beta-mercaptoethanol), and DNA shearing techniques used in the preparation of samples for chromatin immunoprecipitation and next-generation sequencing serves to confound the reliability and accuracy of the results obtained. Here we report optimised methodology that minimises, or completely eliminates, exposure to DNA damaging compounds during extraction and fragmentation procedures. Specifically, we show that Micrococcal nuclease (MNase) digestion prior to cellular lysis generates a greater DNA yield with minimal collateral oxidation while randomly fragmenting the entire paternal genome. This modified methodology represents a significant improvement over traditional fragmentation achieved via sonication in the preparation of genomic DNA from human spermatozoa for downstream applications, such as next-generation sequencing. We also present a redesigned bioinformatic pipeline framework adjusted to correctly analyse this form of data and detect statistically relevant targets of oxidation.
Hong, Seung Beom; Kim, Ki Cheol; Kim, Wook
2015-07-01
We generated complete mitochondrial DNA (mtDNA) control region sequences from 704 unrelated individuals residing in six major provinces in Korea. In addition to our earlier survey of the distribution of mtDNA haplogroup variation, a total of 560 different haplotypes characterized by 271 polymorphic sites were identified, of which 473 haplotypes were unique. The gene diversity and random match probability were 0.9989 and 0.0025, respectively. According to the pairwise comparison of the 704 control region sequences, the mean number of pairwise differences between individuals was 13.47±6.06. Based on the result of mtDNA control region sequences, pairwise FST genetic distances revealed genetic homogeneity of the Korean provinces on a peninsular level, except in samples from Jeju Island. This result indicates there may be a need to formulate a local mtDNA database for Jeju Island, to avoid bias in forensic parameter estimates caused by genetic heterogeneity of the population. Thus, the present data may help not only in personal identification but also in determining maternal lineages to provide an expanded and reliable Korean mtDNA database. These data will be available on the EMPOP database via accession number EMP00661. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
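The two forensic parameters quoted above can be illustrated with a minimal sketch. The abstract does not say which estimators were used, so this assumes the commonly used definitions: random match probability as the sum of squared haplotype frequencies, and gene diversity as the unbiased estimator n/(n-1) × (1 - Σp²). The function name and the toy sample are hypothetical.

```python
from collections import Counter


def forensic_stats(haplotypes):
    """Return (gene_diversity, random_match_probability) for haplotype labels.

    Assumed estimators (not confirmed by the abstract):
      random match probability = sum of squared haplotype frequencies
      gene diversity           = n / (n - 1) * (1 - random match probability)
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    rmp = sum((c / n) ** 2 for c in counts.values())
    diversity = n / (n - 1) * (1 - rmp)
    return diversity, rmp


# Toy sample: five individuals, four distinct haplotypes (hypothetical labels).
sample = ["H1", "H1", "H2", "H3", "H4"]
diversity, rmp = forensic_stats(sample)
print(f"gene diversity = {diversity:.4f}, random match probability = {rmp:.4f}")
```

Under these assumed definitions, the reported values are mutually consistent to rounding: with n = 704 and a random match probability of 0.0025, the gene diversity is approximately 704/703 × (1 - 0.0025) ≈ 0.9989.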
DNA damage induced by ascorbate in the presence of Cu2+.
Kobayashi, S; Ueda, K; Morita, J; Sakai, H; Komano, T
1988-01-25
DNA damage induced by ascorbate in the presence of Cu2+ was investigated by use of bacteriophage φX174 double-stranded supercoiled DNA and linear restriction fragments as substrates. Single-strand cleavage was induced when supercoiled DNA was incubated with 5 µM to 10 mM ascorbate and 50 µM Cu2+ at 37 °C for 10 min. The induced DNA damage was analyzed by sequencing of fragments singly labeled at their 5'- or 3'-end. DNA was cleaved directly and almost uniformly at every nucleotide by ascorbate and Cu2+. Piperidine treatment after the reaction showed that ascorbate and Cu2+ induced another kind of DNA damage different from the direct cleavage. The damage proceeded to DNA cleavage by piperidine treatment and was sequence-specific rather than random. These results indicate that ascorbate induces two classes of DNA damage in the presence of Cu2+, one being direct strand cleavage, probably via damage to the DNA backbone, and the other being a base modification labile to alkali treatment. These two classes of DNA damage were inhibited by potassium iodide, catalase and metal chelators, suggesting the involvement of radicals generated from ascorbate hydroperoxide.
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fleischmann, R.D.; Adams, M.D.; White, O.
1995-07-28
An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism. 46 refs., 4 figs., 4 tabs.
Single Molecule Visualization of Protein-DNA Complexes: Watching Machines at Work
NASA Astrophysics Data System (ADS)
Kowalczykowski, Stephen
2013-03-01
We can now watch individual proteins acting on single molecules of DNA. Such imaging provides unprecedented interrogation of fundamental biophysical processes. Visualization is achieved through the application of two complementary procedures. In one, single DNA molecules are attached to a polystyrene bead and are then captured by an optical trap. The DNA, a worm-like coil, is extended either by the force of solution flow in a micro-fabricated channel, or by capturing the opposite DNA end in a second optical trap. In the second procedure, DNA is attached by one end to a glass surface. The coiled DNA is elongated either by continuous solution flow or by subsequently tethering the opposite end to the surface. Protein action is visualized by fluorescent reporters: fluorescent dyes that bind double-stranded DNA (dsDNA), fluorescent biosensors for single-stranded DNA (ssDNA), or fluorescently-tagged proteins. Individual molecules are imaged using either epifluorescence microscopy or total internal reflection fluorescence (TIRF) microscopy. Using these approaches, we imaged the search for DNA sequence homology conducted by the RecA-ssDNA filament. The manner by which RecA protein finds a single homologous sequence in the genome had remained undefined for almost 30 years. Single-molecule imaging revealed that the search occurs through a mechanism termed "intersegmental contact sampling," in which the randomly coiled structure of DNA is essential for reiterative sampling of DNA sequence identity: an example of parallel processing. In addition, the assembly of RecA filaments on single molecules of single-stranded DNA was visualized. Filament assembly requires nucleation of a protein dimer on DNA, and subsequent growth occurs via monomer addition. Furthermore, we discovered a class of proteins that catalyzed both nucleation and growth of filaments, revealing how the cell controls assembly of this protein-DNA complex.
Random and non-random monoallelic expression.
Chess, Andrew
2013-01-01
Monoallelic expression poses an intriguing problem in epigenetics because it requires the unequal treatment of two segments of DNA that are present in the same nucleus and which can have absolutely identical sequences. This review will consider different known types of monoallelic expression. For all monoallelically expressed genes, their respective allele-specific patterns of expression have the potential to affect brain function and dysfunction.
Ozawa, Tatsuhiko; Kondo, Masato; Isobe, Masaharu
2004-01-01
The 3' rapid amplification of cDNA ends (3' RACE) is widely used to isolate cDNA with unknown 3' flanking sequences. However, conventional 3' RACE often fails to amplify cDNA from a large transcript if there is a long distance between the 5' gene-specific primer and the poly(A) stretch, since conventional 3' RACE uses a 3' oligo-dT-containing primer complementary to the poly(A) tail of mRNA for first-strand cDNA synthesis. To overcome this problem, we have developed an improved 3' RACE method suitable for the isolation of cDNA derived from very large transcripts. By using an oligonucleotide containing a random 9-mer together with a GC-rich sequence for suppression PCR in first-strand cDNA synthesis, we have been able to amplify the cDNA from a very large transcript, such as that of the microtubule-actin crosslinking factor 1 (MACF1) gene, which encodes a transcript of 20 kb in size. When there is no splicing variant, our highly specific amplification allows us to perform direct sequencing of 3' RACE products without requiring cloning in bacterial hosts. Thus, this stepwise 3' RACE walking will help rapid characterization of the 3' structure of a gene, even when it encodes a very large transcript.
Groves, Benjamin; Kuchina, Anna; Rosenberg, Alexander B.; Jojic, Nebojsa; Fields, Stanley; Seelig, Georg
2017-01-01
Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs as well as native S. cerevisiae 5′ UTRs. The model additionally was used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model. PMID:29097404
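One of the sequence features named above, the upstream open reading frame (uORF), can be located with a simple scan. The following is a minimal sketch and not the authors' pipeline: it assumes a uORF is an ATG inside the 5′ UTR followed by an in-frame stop codon that also lies inside the UTR, and the function name and example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr):
    """Return (start, end) index pairs of uORFs fully contained in a 5' UTR.

    Assumption for this sketch: a uORF starts at an ATG and ends at the
    first in-frame stop codon located within the UTR itself. `end` is the
    index just past the stop codon.
    """
    utr = utr.upper().replace("U", "T")  # accept RNA or DNA spelling
    uorfs = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


# Hypothetical UTR containing ATG AAA TAG starting at index 2.
print(find_uorfs("CCATGAAATAGCC"))
```

An ATG with no in-frame stop inside the UTR is ignored here; whether such out-of-frame or overlapping starts should be counted is a modelling choice the abstract does not specify.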
Target Site Recognition by a Diversity-Generating Retroelement
Guo, Huatao; Tse, Longping V.; Nieh, Angela W.; Czornyj, Elizabeth; Williams, Steven; Oukil, Sabrina; Liu, Vincent B.; Miller, Jeff F.
2011-01-01
Diversity-generating retroelements (DGRs) are in vivo sequence diversification machines that are widely distributed in bacterial, phage, and plasmid genomes. They function to introduce vast amounts of targeted diversity into protein-encoding DNA sequences via mutagenic homing. Adenine residues are converted to random nucleotides in a retrotransposition process from a donor template repeat (TR) to a recipient variable repeat (VR). Using the Bordetella bacteriophage BPP-1 element as a prototype, we have characterized requirements for DGR target site function. Although sequences upstream of VR are dispensable, a 24 bp sequence immediately downstream of VR, which contains short inverted repeats, is required for efficient retrohoming. The inverted repeats form a hairpin or cruciform structure and mutational analysis demonstrated that, while the structure of the stem is important, its sequence can vary. In contrast, the loop has a sequence-dependent function. Structure-specific nuclease digestion confirmed the existence of a DNA hairpin/cruciform, and marker coconversion assays demonstrated that it influences the efficiency, but not the site of cDNA integration. Comparisons with other phage DGRs suggested that similar structures are a conserved feature of target sequences. Using a kanamycin resistance determinant as a reporter, we found that transplantation of the IMH and hairpin/cruciform-forming region was sufficient to target the DGR diversification machinery to a heterologous gene. In addition to furthering our understanding of DGR retrohoming, our results suggest that DGRs may provide unique tools for directed protein evolution via in vivo DNA diversification. PMID:22194701
Albayrak, Levent; Khanipov, Kamil; Pimenova, Maria; Golovko, George; Rojas, Mark; Pavlidis, Ioannis; Chumakov, Sergei; Aguilar, Gerardo; Chávez, Arturo; Widger, William R; Fofanov, Yuriy
2016-12-12
Low-abundance mutations in mitochondrial populations (mutations with minor allele frequency ≤ 1%) are associated with cancer, aging, and neurodegenerative disorders. While recent progress in high-throughput sequencing technology has significantly improved the heteroplasmy identification process, the ability of this technology to detect low-abundance mutations can be affected by the presence of similar sequences originating from nuclear DNA (nDNA). To determine to what extent nDNA can cause false positive low-abundance heteroplasmy calls, we have identified mitochondrial locations of all subsequences that are common or similar (one mismatch allowed) between nDNA and mitochondrial DNA (mtDNA). The analysis revealed up to a 25-fold variation in the lengths of the longest common and longest similar (one mismatch allowed) subsequences across the mitochondrial genome. The longest subsequences shared between nDNA and mtDNA in several regions of the mitochondrial genome were found to be as short as 11 bases, which not only allows these regions to be used to design new, very specific PCR primers, but also supports the hypothesis of the non-random introduction of mtDNA into the human nuclear DNA. Analysis of the mitochondrial locations of the subsequences shared between nDNA and mtDNA suggested that even very short (36 bases) single-end sequencing reads can be used to identify low-abundance variation in 20.4% of the mitochondrial genome. For longer (76 and 150 bases) reads, the proportion of the mitochondrial genome where nDNA presence will not interfere was found to be 44.5 and 67.9%, respectively, whereas low-abundance mutations at 100% of locations can be identified using single reads 417 bases long. This observation suggests that the analysis of low-abundance variations in mitochondrial populations can be extended to a variety of large data collections such as NCBI Sequence Read Archive, European Nucleotide Archive, The Cancer Genome Atlas, and International Cancer Genome Consortium.
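The idea of mapping shared subsequences along the mitochondrial genome can be sketched on toy data. This is a simplified illustration, not the authors' method: it handles exact matches only (the study also allows one mismatch), ignores the reverse strand, and uses a naive search that would not scale to real genomes. Function names and toy sequences are hypothetical.

```python
def longest_shared_lengths(mt_seq, nuc_seq):
    """For each mtDNA position, length of the longest exact substring
    starting there that also occurs somewhere in the nuclear sequence."""
    lengths = []
    for i in range(len(mt_seq)):
        k = 0
        # Extend while the growing prefix is still found in the nuclear sequence.
        while i + k < len(mt_seq) and mt_seq[i:i + k + 1] in nuc_seq:
            k += 1
        lengths.append(k)
    return lengths


def fraction_unambiguous_starts(lengths, read_len):
    """Fraction of read start positions whose read of length `read_len`
    cannot occur verbatim in the nuclear sequence."""
    starts = range(len(lengths) - read_len + 1)
    if not starts:
        return 0.0
    return sum(lengths[i] < read_len for i in starts) / len(starts)


# Toy sequences standing in for mtDNA and nDNA (hypothetical).
toy_mt = "ACGTTGCA"
toy_nuc = "TTACGTAA"
shared = longest_shared_lengths(toy_mt, toy_nuc)
print(shared, fraction_unambiguous_starts(shared, read_len=3))
```

A read that is longer than the longest shared substring at its start position cannot have come verbatim from the nuclear sequence, which is the intuition behind longer reads covering a larger share of the mitochondrial genome.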
Blind Predictions of DNA and RNA Tweezers Experiments with Force and Torque
Chou, Fang-Chieh; Lipfert, Jan; Das, Rhiju
2014-01-01
Single-molecule tweezers measurements of double-stranded nucleic acids (dsDNA and dsRNA) provide unprecedented opportunities to dissect how these fundamental molecules respond to forces and torques analogous to those applied by topoisomerases, viral capsids, and other biological partners. However, tweezers data are still most commonly interpreted post facto in the framework of simple analytical models. Testing falsifiable predictions of state-of-the-art nucleic acid models would be more illuminating but has not been performed. Here we describe a blind challenge in which numerical predictions of nucleic acid mechanical properties were compared to experimental data obtained recently for dsRNA under applied force and torque. The predictions were enabled by the HelixMC package, first presented in this paper. HelixMC advances crystallography-derived base-pair level models (BPLMs) to simulate kilobase-length dsDNAs and dsRNAs under external forces and torques, including their global linking numbers. These calculations recovered the experimental bending persistence length of dsRNA within the error of the simulations and accurately predicted that dsRNA's “spring-like” conformation would give a two-fold decrease of stretch modulus relative to dsDNA. Further blind predictions of helix torsional properties, however, exposed inaccuracies in current BPLM theory, including three-fold discrepancies in torsional persistence length at the high force limit and the incorrect sign of dsRNA link-extension (twist-stretch) coupling. Beyond these experiments, HelixMC predicted that ‘nucleosome-excluding’ poly(A)/poly(T) is at least two-fold stiffer than random-sequence dsDNA in bending, stretching, and torsional behaviors; Z-DNA to be at least three-fold stiffer than random-sequence dsDNA, with a near-zero link-extension coupling; and non-negligible effects from base pair step correlations. We propose that experimentally testing these predictions should be powerful next steps for understanding the flexibility of dsDNA and dsRNA in sequence contexts and under mechanical stresses relevant to their biology. PMID:25102226
Tuma Sabah, Jinan; Zulkifli, Razauden Mohamed; Shahir, Shafinaz; Ahmed, Farediah; Abdul Kadir, Mohammed Rafiq; Zakaria, Zarita
2018-05-15
Distinctive bioactivities possessed by luteolin (3', 4', 5, 7-tetrahydroxy-flavone) are advantageous for sundry practical applications. This paper reports the in vitro selection and characterization of single-stranded DNA (ssDNA) aptamers specific for luteolin (LUT). A 76-mer library containing 10^15 randomized ssDNA molecules was screened via systematic evolution of ligands by exponential enrichment (SELEX). The recovered ssDNA pool from the 8th round was amplified with unlabeled primers and cloned into the pSTBlue-1 vector prior to sequencing. The 22 LUT-binding aptamer variants were further classified into seven groups based on their N40 random sequence regions, and one representative from each group was characterized. The dissociation constants of the aptamers designated LUT#28, LUT#20 and LUT#3 were determined to be 107, 214 and 109 nM, respectively, indicating high binding affinity towards LUT. Prediction analysis of the secondary structure suggested discrete features with typical loop and stem motifs. Furthermore, compared to LUT#28 and LUT#20, LUT#3 displayed higher specificity, with insignificant binding toward kaempferol and quercetin despite their structural and functional similarity to luteolin. In addition, LUT#3 can detect free luteolin in solution within the range 0.2-1 mM. These results suggest that, under the conditions tested, the LUT#3 aptamer is the most suitable LUT recognition tool at laboratory scale. Copyright © 2018 Elsevier Inc. All rights reserved.
Targeted gene insertion for molecular medicine.
Voigt, Katrin; Izsvák, Zsuzsanna; Ivics, Zoltán
2008-11-01
Genomic insertion of a functional gene together with suitable transcriptional regulatory elements is often required for long-term therapeutical benefit in gene therapy for several genetic diseases. A variety of integrating vectors for gene delivery exist. Some of them exhibit random genomic integration, whereas others have integration preferences based on attributes of the targeted site, such as primary DNA sequence and physical structure of the DNA, or through tethering to certain DNA sequences by host-encoded cellular factors. Uncontrolled genomic insertion bears the risk of the transgene being silenced due to chromosomal position effects, and can lead to genotoxic effects due to mutagenesis of cellular genes. None of the vector systems currently used in either preclinical experiments or clinical trials displays sufficient preferences for target DNA sequences that would ensure appropriate and reliable expression of the transgene and simultaneously prevent hazardous side effects. We review in this paper the advantages and disadvantages of both viral and non-viral gene delivery technologies, discuss mechanisms of target site selection of integrating genetic elements (viruses and transposons), and suggest distinct molecular strategies for targeted gene delivery.
Tohala, Luma; Oukacine, Farid; Ravelet, Corinne; Peyrin, Eric
2017-05-01
We recently reported that a great variety of DNA oligonucleotides (ONs) used as chiral selectors in partial-filling capillary electrophoresis (CE) exhibited interesting enantioresolution properties toward low-affinity DNA binders. Herein, the sequence prerequisites of ONs for the CE enantioseparation process were studied. First, the chiral resolution properties of a series of homopolymeric sequences (Poly-dT) of different lengths (from 5 to 60-mer) were investigated. It was shown that the size increase-dependent random coil-like conformation of Poly-dT favorably acted on the apparent selectivity and resolution. The base-unpairing state constituted also an important factor in the chiral resolution ability of ONs as the switch from the single-stranded to double-stranded structure was responsible for a significant decrease in the analyte selectivity range. Finally, the chemical diversity enhanced the enantioresolution ability of single-stranded ONs. The present work could lay the foundation for the design of performant ON chiral selectors for the CE separation of weak DNA binder enantiomers. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco
2016-03-01
Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classify 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequencing. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification performance was achieved by Kraken than by the RDP: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or even deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken accumulates the information from both sequence orientations, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for use in the taxonomic assignment of sequences obtained by Sanger sequencing or by third generation sequencing, whose main goal is to generate longer sequences. Copyright © 2016 Elsevier B.V. All rights reserved.
Li, XiaoChing; Wang, Xiu-Jie; Tannenhauser, Jonathan; Podell, Sheila; Mukherjee, Piali; Hertel, Moritz; Biane, Jeremy; Masuda, Shoko; Nottebohm, Fernando; Gaasterland, Terry
2007-01-01
Vocal learning and neuronal replacement have been studied extensively in songbirds, but until recently, few molecular and genomic tools for songbird research existed. Here we describe new molecular/genomic resources developed in our laboratory. We made cDNA libraries from zebra finch (Taeniopygia guttata) brains at different developmental stages. A total of 11,000 cDNA clones from these libraries, representing 5,866 unique gene transcripts, were randomly picked and sequenced from the 3′ ends. A web-based database was established for clone tracking, sequence analysis, and functional annotations. Our cDNA libraries were not normalized. Sequencing ESTs without normalization produced many developmental stage-specific sequences, yielding insights into patterns of gene expression at different stages of brain development. In particular, the cDNA library made from brains at posthatching day 30–50, corresponding to the period of rapid song system development and song learning, has the most diverse and richest set of genes expressed. We also identified five microRNAs whose sequences are highly conserved between zebra finch and other species. We printed cDNA microarrays and profiled gene expression in the high vocal center of both adult male zebra finches and canaries (Serinus canaria). Genes differentially expressed in the high vocal center were identified from the microarray hybridization results. Selected genes were validated by in situ hybridization. Networks among the regulated genes were also identified. These resources provide songbird biologists with tools for genome annotation, comparative genomics, and microarray gene expression analysis. PMID:17426146
Identification of apple cultivars on the basis of simple sequence repeat markers.
Liu, G S; Zhang, Y G; Tao, R; Fang, J G; Dai, H Y
2014-09-12
DNA markers are useful tools that play an important role in plant cultivar identification. They are usually based on polymerase chain reaction (PCR) and include simple sequence repeats (SSRs), inter-simple sequence repeats, and random amplified polymorphic DNA. However, DNA markers have not been used effectively in the complete identification of plant cultivars because of the lack of known DNA fingerprints. Recently, a novel approach called the cultivar identification diagram (CID) strategy was developed to facilitate the use of DNA markers to separate individual plants. The CID was designed so that each PCR generates a polymorphic marker that directly allows cultivar samples to be separated at each step. Therefore, it could be used to identify cultivars and varieties easily with fewer primers. In this study, 60 apple cultivars, including a few main cultivars in fields and varieties from descendants (Fuji x Telamon), were examined. Of the 20 pairs of SSR primers screened, 8 pairs gave reproducible, polymorphic DNA amplification patterns. The banding patterns obtained from these 8 primers were used to construct a CID map. Each cultivar or variety in this study was distinguished from the others completely, indicating that this method can be used for efficient cultivar identification. The result contributed to studies on germplasm resources and the seedling industry in fruit trees.
Povinelli, C M
1992-01-01
In order to detect sequence-based information predictive for the location of eukaryotic transcriptional regulatory domains, the frequencies and distributions of the 36 possible purine/pyrimidine reverse complement hexamer pairs were determined for test sets of real and random sequences. The distribution of one of the hexamer pairs (RRYYRR/YYRRYY, referred to as M1) was further examined in a larger set of sequences (> 32 genes, 230 kb). Predominant clusters of M1 and the locations of eukaryotic transcriptional regulatory domains were found to be associated and non-randomly distributed along the DNA, consistent with a periodicity of approximately 1.2 kb. In the context of higher ordered chromatin this would align promoters, enhancers and the predominant clusters of M1 longitudinally along one face of a 30 nm fiber. Using only information about the distribution of the M1 motif, 50-70% of a sequence could be eliminated as being unlikely to contain transcriptional regulatory domains with an 87% recovery of the regulatory domains present.
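The count of 36 hexamer pairs follows from the purine/pyrimidine alphabet: of the 2^6 = 64 R/Y hexamers, 8 are their own reverse complement and the remaining 56 form 28 pairs, giving 28 + 8 = 36. A minimal sketch that checks this by enumeration and counts occurrences of the M1 pair in a sequence is shown below. The function names are hypothetical, and the counting function assumes an input containing only A, C, G and T.

```python
from itertools import product

# Purines (A, G) map to R; pyrimidines (C, T) map to Y.
TO_RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}
# Base pairing always joins a purine with a pyrimidine.
RY_COMPLEMENT = {"R": "Y", "Y": "R"}


def reverse_complement_ry(hexamer):
    """Reverse complement of a string written in the R/Y alphabet."""
    return "".join(RY_COMPLEMENT[b] for b in reversed(hexamer))


def hexamer_pairs():
    """All distinct {hexamer, reverse complement} pairs over R/Y."""
    pairs = set()
    for letters in product("RY", repeat=6):
        h = "".join(letters)
        pairs.add(tuple(sorted((h, reverse_complement_ry(h)))))
    return pairs


def count_motif_pair(dna, motif="RRYYRR"):
    """Count windows matching the motif or its reverse complement.

    Assumes `dna` contains only A, C, G, T (no ambiguity codes).
    """
    ry = "".join(TO_RY[b] for b in dna.upper())
    targets = {motif, reverse_complement_ry(motif)}
    return sum(ry[i:i + 6] in targets for i in range(len(ry) - 5))


print(len(hexamer_pairs()))          # number of R/Y hexamer pairs
print(count_motif_pair("AGCTGAA"))   # M1 (RRYYRR/YYRRYY) occurrences
```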
DNA-based identification of Brassica vegetable species for the juice industry.
Etoh, Kazumi; Niijima, Noritaka; Yokoshita, Masahiko; Fukuoka, Shin-Ichi
2003-10-01
Since kale (Brassica oleracea var. acephala), a cruciferous vegetable with a high level of vitamins and functional compounds beneficial to health and wellness, has become widely used in the juice industry, a precise method for quality control of vegetable species is necessary. We describe here a DNA-based identification method to distinguish kale from cabbage (Brassica oleracea var. capitata), a closely related species, which can be inadvertently mixed with kale during the manufacturing process. Using genomic DNA from these vegetables and combinatory sets of nucleotide primers, we screened for random amplified polymorphic DNA (RAPD) fragments and found three cabbage-specific fragments. These RAPD fragments, with lengths of 1.4, 0.5, and 1.5 kb, were purified, subcloned, and sequenced. Based on sequence-tagged sites (STS), we designed sets of primers to detect cabbage-specific identification (CAI) DNA markers. Utilizing the CAI markers, we successfully distinguished more than 10 different local cabbage accessions from 20 kale accessions, and identified kale juices experimentally spiked with different amounts of cabbage.
Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.
Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C
2018-01-10
Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from the COSMIC database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing cancer cells. Copyright © 2017 Elsevier B.V. All rights reserved.
Comparison of Methods of Detection of Exceptional Sequences in Prokaryotic Genomes.
Rusinov, I S; Ershova, A S; Karyagina, A S; Spirin, S A; Alexeevski, A V
2018-02-01
Many proteins need recognition of specific DNA sequences for functioning. The number of recognition sites and their distribution along the DNA might be of biological importance. For example, the number of restriction sites is often reduced in prokaryotic and phage genomes to decrease the probability of DNA cleavage by restriction endonucleases. We call a sequence an exceptional one if its frequency in a genome significantly differs from one predicted by some mathematical model. An exceptional sequence could be either under- or over-represented, depending on its frequency in comparison with the predicted one. Exceptional sequences could be considered biologically meaningful, for example, as targets of DNA-binding proteins or as parts of abundant repetitive elements. Several methods are used to predict the frequency of a short sequence in a genome based on the actual frequencies of certain of its subsequences. The most popular are methods based on Markov chain models. However, no rigorous comparison of the methods has previously been performed. We compared three methods for the prediction of short sequence frequencies: the maximum-order Markov chain model-based method, the method that uses geometric mean of extended Markovian estimates, and the method that utilizes frequencies of all subsequences including discontiguous ones. We applied them to restriction sites in complete genomes of 2500 prokaryotic species and demonstrated that the results depend greatly on the method used: lists of 5% of the most under-represented sites differed by up to 50%. The method designed by Burge and coauthors in 1992, which utilizes all subsequences of the sequence, showed a higher precision than the other two methods both on prokaryotic genomes and randomly generated sequences after computational imitation of selective pressure. We propose this method as the first choice for detection of exceptional sequences in prokaryotic genomes.
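The maximum-order Markov chain estimate mentioned above can be sketched as follows. In its commonly used form, the expected count of a word w of length n is N(w[1..n-1]) * N(w[2..n]) / N(w[2..n-1]), where N counts overlapping occurrences. The sketch below assumes that form; the function names are hypothetical, and the code does not reproduce the other two methods compared in the paper.

```python
def count_overlapping(genome: str, word: str) -> int:
    """Count occurrences of word in genome, allowing overlaps."""
    count = 0
    start = genome.find(word)
    while start != -1:
        count += 1
        start = genome.find(word, start + 1)
    return count


def markov_expected_count(genome: str, word: str) -> float:
    """Maximum-order Markov estimate of how often word should occur.

    Uses counts of the word's prefix (all but the last letter), suffix
    (all but the first letter) and core (both end letters removed).
    """
    if len(word) < 3:
        raise ValueError("word must have at least 3 letters")
    prefix = count_overlapping(genome, word[:-1])
    suffix = count_overlapping(genome, word[1:])
    core = count_overlapping(genome, word[1:-1])
    return prefix * suffix / core if core else 0.0


def representation_ratio(genome: str, word: str) -> float:
    """Observed/expected ratio; values well below 1 suggest under-representation."""
    expected = markov_expected_count(genome, word)
    observed = count_overlapping(genome, word)
    return observed / expected if expected else float("nan")
```

A call such as `representation_ratio(genome, "GAATTC")` would give the observed-to-expected ratio for one restriction site. Deciding whether a ratio is statistically exceptional needs a significance model that this sketch does not include.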
RAD tag sequencing as a source of SNP markers in Cynara cardunculus L
2012-01-01
Background The globe artichoke (Cynara cardunculus L. var. scolymus) genome is relatively poorly explored, especially compared to those of the other major Asteraceae crops sunflower and lettuce. No SNP markers are in the public domain. We have combined the recently developed restriction-site associated DNA (RAD) approach with the Illumina DNA sequencing platform to effect the rapid and mass discovery of SNP markers for C. cardunculus. Results RAD tags were sequenced from the genomic DNA of three C. cardunculus mapping population parents, generating 9.7 million reads, corresponding to ~1 Gbp of sequence. An assembly based on paired ends produced ~6.0 Mbp of genomic sequence, separated into ~19,000 contigs (mean length 312 bp), of which ~21% were fragments of putative coding sequence. The shared sequences allowed for the discovery of ~34,000 SNPs and nearly 800 indels, equivalent to a SNP frequency of 5.6 per 1,000 nt, and an indel frequency of 0.2 per 1,000 nt. A sample of heterozygous SNP loci was mapped by CAPS assays and this exercise provided validation of our mining criteria. The repetitive fraction of the genome had a high representation of retrotransposon sequence, followed by simple repeats, AT-low complexity regions and mobile DNA elements. The genomic k-mer distribution and CpG rate of C. cardunculus, compared with data derived from three whole genome-sequenced dicot species, provided further evidence of the random representation of the C. cardunculus genome generated by RAD sampling. Conclusion The RAD tag sequencing approach is a cost-effective and rapid method to develop SNP markers in a highly heterozygous species. Our approach permitted the generation of a large and robust SNP dataset through the adoption of optimized filtering criteria. PMID:22214349
Baxter, Laura L; Hsu, Benjamin J; Umayam, Lowell; Wolfsberg, Tyra G; Larson, Denise M; Frith, Martin C; Kawai, Jun; Hayashizaki, Yoshihide; Carninci, Piero; Pavan, William J
2007-06-01
As part of the RIKEN mouse encyclopedia project, two cDNA libraries were prepared from melanocyte-derived cell lines, using techniques of full-length clone selection and subtraction/normalization to enrich for rare transcripts. End sequencing showed that these libraries display over 83% complete coding sequence at the 5' end and 96-97% complete coding sequence at the 3' end. Evaluation of the libraries, derived from B16F10Y tumor cells and melan-c cells, revealed that they contain clones for a majority of the genes previously demonstrated to function in melanocyte biology. Analysis of genomic locations for transcripts revealed that the distribution of melanocyte genes is non-random throughout the genome. Three genomic regions identified that showed significant clustering of melanocyte-expressed genes contain one or more genes previously shown to regulate melanocyte development or function. A catalog of genes expressed in these libraries is presented, providing a valuable resource of cDNA clones and sequence information that can be used for identification of new genes important for melanocyte development, function, and disease.
The (not so) immortal strand hypothesis.
Tomasetti, Cristian; Bozic, Ivana
2015-03-01
Non-random segregation of DNA strands during stem cell replication has been proposed as a mechanism to minimize accumulated genetic errors in stem cells of rapidly dividing tissues. According to this hypothesis, an "immortal" DNA strand is passed to the stem cell daughter and not the more differentiated cell, keeping the stem cell lineage replication error-free. After it was introduced, experimental evidence both in favor and against the hypothesis has been presented. Using a novel methodology that utilizes cancer sequencing data we are able to estimate the rate of accumulation of mutations in healthy stem cells of the colon, blood and head and neck tissues. We find that in these tissues mutations in stem cells accumulate at rates strikingly similar to those expected without the protection from the immortal strand mechanism. Utilizing an approach that is fundamentally different from previous efforts to confirm or refute the immortal strand hypothesis, we provide evidence against non-random segregation of DNA during stem cell replication. Our results strongly suggest that parental DNA is passed randomly to stem cell daughters and provides new insight into the mechanism of DNA replication in stem cells. Copyright © 2015. Published by Elsevier B.V.
The (not so) Immortal Strand Hypothesis
Tomasetti, Cristian; Bozic, Ivana
2015-01-01
Background Non-random segregation of DNA strands during stem cell replication has been proposed as a mechanism to minimize accumulated genetic errors in stem cells of rapidly dividing tissues. According to this hypothesis, an “immortal” DNA strand is passed to the stem cell daughter and not the more differentiated cell, keeping the stem cell lineage replication error-free. After it was introduced, experimental evidence both in favor and against the hypothesis has been presented. Principal Findings Using a novel methodology that utilizes cancer sequencing data we are able to estimate the rate of accumulation of mutations in healthy stem cells of the colon, blood and head and neck tissues. We find that in these tissues mutations in stem cells accumulate at rates strikingly similar to those expected without the protection from the immortal strand mechanism. Significance Utilizing an approach that is fundamentally different from previous efforts to confirm or refute the immortal strand hypothesis, we provide strong evidence against non-random segregation of DNA during stem cell replication. Our results strongly suggest that parental DNA is passed randomly to stem cell daughters and provides new insight into the mechanism of DNA replication in stem cells. PMID:25700960
Random and Non-Random Monoallelic Expression
Chess, Andrew
2013-01-01
Monoallelic expression poses an intriguing problem in epigenetics because it requires the unequal treatment of two segments of DNA that are present in the same nucleus and which can have absolutely identical sequences. This review will consider different known types of monoallelic expression. For all monoallelically expressed genes, their respective allele-specific patterns of expression have the potential to affect brain function and dysfunction. PMID:22763620
Universality of long-range correlations in expansion randomization systems
NASA Astrophysics Data System (ADS)
Messer, P. W.; Lässig, M.; Arndt, P. F.
2005-10-01
We study the stochastic dynamics of sequences evolving by single-site mutations, segmental duplications, deletions, and random insertions. These processes are relevant for the evolution of genomic DNA. They define a universality class of non-equilibrium 1D expansion-randomization systems with generic stationary long-range correlations in a regime of growing sequence length. We obtain explicitly the two-point correlation function of the sequence composition and the distribution function of the composition bias in sequences of finite length. The characteristic exponent χ of these quantities is determined by the ratio of two effective rates, which are explicitly calculated for several specific sequence evolution dynamics of the universality class. Depending on the value of χ, we find two different scaling regimes, which are distinguished by the detectability of the initial composition bias. All analytic results are accurately verified by numerical simulations. We also discuss the non-stationary build-up and decay of correlations, as well as more complex evolutionary scenarios, where the rates of the processes vary in time. Our findings provide a possible example for the emergence of universality in molecular biology.
Xu, Chao; Dong, Wenpan; Shi, Shuo; Cheng, Tao; Li, Changhao; Liu, Yanlei; Wu, Ping; Wu, Hongkun; Gao, Peng; Zhou, Shiliang
2015-11-01
A well-covered reference library is crucial for successful identification of species by DNA barcoding. The biggest difficulty in building such a reference library is the lack of materials of organisms. Herbarium collections are potentially an enormous resource of materials. In this study, we demonstrate that it is feasible to build such reference libraries using the reconstructed (self-primed PCR amplified) DNA from the herbarium specimens. We used 179 rosaceous specimens to test the effects of DNA reconstruction, 420 randomly sampled specimens to estimate the usable percentage and another 223 specimens of true cherries (Cerasus, Rosaceae) to test the coverage of usable specimens to the species. The barcodes rbcLb (the central four-sevenths of the rbcL gene) and matK were each amplified in two halves and sequenced on Roche GS 454 FLX+. DNA from the herbarium specimens was typically shorter than 300 bp. DNA reconstruction enabled amplification of fragments of 400-500 bp without bringing or inducing any sequence errors. About one-third of specimens in the national herbarium of China (PE) were proven usable after DNA reconstruction. The specimens in PE cover all Chinese true cherry species and 91.5% of vascular species listed in Flora of China. It is very possible to build well-covered reference libraries for DNA barcoding of vascular species in China. As exemplified in this study, DNA reconstruction and DNA-labelled next-generation sequencing can accelerate the construction of local reference libraries. By putting the local reference libraries together, a global library for DNA barcoding becomes closer to reality. © 2015 John Wiley & Sons Ltd.
Genetic dissection of the consensus sequence for the class 2 and class 3 flagellar promoters
Wozniak, Christopher E.; Hughes, Kelly T.
2008-01-01
Summary Computational searches for DNA binding sites often utilize consensus sequences. These search models make assumptions that the frequency of a base pair in an alignment relates to the base pair’s importance in binding and presume that base pairs contribute independently to the overall interaction with the DNA binding protein. These two assumptions have generally been found to be accurate for DNA binding sites. However, these assumptions are often not satisfied for promoters, which are involved in additional steps in transcription initiation after RNA polymerase has bound to the DNA. To test these assumptions for the flagellar regulatory hierarchy, class 2 and class 3 flagellar promoters were randomly mutagenized in Salmonella. Important positions were then saturated for mutagenesis and compared to scores calculated from the consensus sequence. Double mutants were constructed to determine how mutations combined for each promoter type. Mutations in the binding site for FlhD4C2, the activator of class 2 promoters, better satisfied the assumptions for the binding model than did mutations in the class 3 promoter, which is recognized by the σ28 transcription factor. These in vivo results indicate that the activator sites within flagellar promoters can be modeled using simple assumptions but that the DNA sequences recognized by the flagellar sigma factor require more complex models. PMID:18486950
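The two modelling assumptions discussed above (base frequency in an alignment reflects importance, and positions contribute independently) are the ones built into a simple position weight matrix. The sketch below shows such an additive scoring model in Python. The function names, the pseudocount, the uniform background, and the example sites are all illustrative assumptions and are not taken from the study.

```python
import math
from collections import Counter


def position_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build log-odds scores per position from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        # Frequency in the alignment stands in for importance (assumption 1).
        matrix.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def additive_score(matrix, candidate):
    """Sum per-position scores; this is the independence assumption (assumption 2)."""
    return sum(column[base] for column, base in zip(matrix, candidate))


# Hypothetical aligned sites, for illustration only.
example_sites = ["TAAA", "TAAA", "TAAT"]
example_matrix = position_weight_matrix(example_sites)
```

Under this model, the predicted effect of a double mutant is simply the sum of the two single-mutant effects. The double-mutant experiments described in the abstract test whether real promoters behave that way.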
An insight into the sialome of the blood-sucking bug Triatoma infestans, a vector of Chagas' disease
Assumpção, Teresa C. F.; Francischetti, Ivo M. B.; Andersen, John F.; Schwarz, Alexandra; Santana, Jaime M.; Ribeiro, José M. C.
2008-01-01
Triatoma infestans is a hemipteran vector of Chagas’ disease that feeds exclusively on vertebrate blood in all life stages. Hematophagous insects’ salivary glands (SG) produce potent pharmacological compounds that counteract host hemostasis, including anti-clotting, anti-platelet, and vasodilatory molecules. To obtain a further insight into the salivary biochemical and pharmacological complexity of this insect, a cDNA library from its salivary glands was randomly sequenced. Also, salivary proteins were submitted to two-dimensional gel (2D-gel) electrophoresis followed by MS analysis. We present the analysis of a set of 1,534 (SG) cDNA sequences, 645 of which coded for proteins of a putative secretory nature. Most salivary proteins described as lipocalins matched peptide sequences obtained from proteomic results. PMID:18207082
Megabase sequencing of human genome by ordered-shotgun-sequencing (OSS) strategy
NASA Astrophysics Data System (ADS)
Chen, Ellson Y.
1997-05-01
So far we have used OSS strategy to sequence over 2 megabases DNA in large-insert clones from regions of human X chromosomes with different characteristic levels of GC content. The method starts by randomly fragmenting a BAC, YAC or PAC to 8-12 kb pieces and subcloning those into lambda phage. Insert-ends of these clones are sequenced and overlapped to create a partial map. Complete sequencing is then done on a minimal tiling path of selected subclones, recursively focusing on those at the edges of contigs to facilitate mergers of clones across the entire target. To reduce manual labor, PCR processes have been adapted to prepare sequencing templates throughout the entire operation. The streamlined process can thus lend itself to further automation. The OSS approach is suitable for large-scale genomic sequencing, providing considerable flexibility in the choice of subclones or regions for more or less intensive sequencing. For example, subclones containing contaminating host cell DNA or cloning vector can be recognized and ignored with minimal sequencing effort; regions overlapping a neighboring clone already sequenced need not be redone; and segments containing tandem repeats or long repetitive sequences can be spotted early on and targeted for additional attention.
Local Renyi entropic profiles of DNA sequences.
Vinga, Susana; Almeida, Jonas S
2007-10-16
In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.
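To make the ingredients above more concrete, the following Python sketch computes Chaos Game Representation (CGR) coordinates for a DNA string and a Rényi entropy over a coarse grid of CGR cells. It is a simplified, discrete stand-in: the authors' method uses continuous Parzen-window and fractal-kernel density estimates implemented in Matlab, which are not reproduced here. The corner assignment, function names, and grid binning are assumptions made for illustration.

```python
import math
from collections import Counter

# One commonly used corner assignment for the unit square (an assumption here).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Iterated map: each base moves the point halfway toward its corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def renyi_entropy(probabilities, alpha=2.0):
    """Discrete Rényi entropy of order alpha (alpha != 1), in nats."""
    if alpha == 1.0:
        raise ValueError("alpha = 1 is the Shannon limit and is not handled here")
    return math.log(sum(p ** alpha for p in probabilities if p > 0)) / (1 - alpha)


def cgr_cell_entropy(seq, resolution=2, alpha=2.0):
    """Rényi entropy of CGR points binned on a 2^resolution by 2^resolution grid."""
    side = 2 ** resolution
    cells = Counter(
        (min(int(x * side), side - 1), min(int(y * side), side - 1))
        for x, y in cgr_points(seq)
    )
    total = sum(cells.values())
    return renyi_entropy([n / total for n in cells.values()], alpha)
```

Because each CGR cell at resolution L corresponds to a particular suffix of length L, binning at different resolutions gives a crude view of the scale dependence that the entropic profiles are designed to capture.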
Local Renyi entropic profiles of DNA sequences
Vinga, Susana; Almeida, Jonas S
2007-01-01
Background In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. Conclusion The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures. PMID:17939871
Silva-Sanchez, Aaron; Liu, Cun Ren; Vale, Andre M.; Khass, Mohamed; Kapoor, Pratibha; Elgavish, Ada; Ivanov, Ivaylo I.; Ippolito, Gregory C.; Schelonka, Robert L.; Schoeb, Trenton R.; Burrows, Peter D.; Schroeder, Harry W.
2015-01-01
Variability in the developing antibody repertoire is focused on the third complementarity determining region of the H chain (CDR-H3), which lies at the center of the antigen binding site where it often plays a decisive role in antigen binding. The power of VDJ recombination and N nucleotide addition has led to the common conception that the sequence of CDR-H3 is unrestricted in its variability and random in its composition. Under this view, the immune response is solely controlled by somatic positive and negative clonal selection mechanisms that act on individual B cells to promote production of protective antibodies and prevent the production of self-reactive antibodies. This concept of a repertoire of random antigen binding sites is inconsistent with the observation that diversity (DH) gene segment sequence content by reading frame (RF) is evolutionarily conserved, creating biases in the prevalence and distribution of individual amino acids in CDR-H3. For example, arginine, which is often found in the CDR-H3 of dsDNA binding autoantibodies, is under-represented in the commonly used DH RFs rearranged by deletion, but is a frequent component of rarely used inverted RF1 (iRF1), which is rearranged by inversion. To determine the effect of altering this germline bias in DH gene segment sequence on autoantibody production, we generated mice that by genetic manipulation are forced to utilize an iRF1 sequence encoding two arginines. Over a one year period we collected serial serum samples from these unimmunized, specific pathogen-free mice and found that more than one-fifth of them contained elevated levels of dsDNA-binding IgG, but not IgM; whereas mice with a wild type DH sequence did not. Thus, germline bias against the use of arginine enriched DH sequence helps to reduce the likelihood of producing self-reactive antibodies. PMID:25706374
Continuous in vitro evolution of bacteriophage RNA polymerase promoters
NASA Technical Reports Server (NTRS)
Breaker, R. R.; Banerji, A.; Joyce, G. F.
1994-01-01
Rapid in vitro evolution of bacteriophage T7, T3, and SP6 RNA polymerase promoters was achieved by a method that allows continuous enrichment of DNAs that contain functional promoter elements. This method exploits the ability of a special class of nucleic acid molecules to replicate continuously in the presence of both a reverse transcriptase and a DNA-dependent RNA polymerase. Replication involves the synthesis of both RNA and cDNA intermediates. The cDNA strand contains an embedded promoter sequence, which becomes converted to a functional double-stranded promoter element, leading to the production of RNA transcripts. Synthetic cDNAs, including those that contain randomized promoter sequences, can be used to initiate the amplification cycle. However, only those cDNAs that contain functional promoter sequences are able to produce RNA transcripts. Furthermore, each RNA transcript encodes the RNA polymerase promoter sequence that was responsible for initiation of its own transcription. Thus, the population of amplifying molecules quickly becomes enriched for those templates that encode functional promoters. Optimal promoter sequences for phage T7, T3, and SP6 RNA polymerase were identified after a 2-h amplification reaction, initiated in each case with a pool of synthetic cDNAs encoding greater than 10^10 promoter sequence variants.
Identification of species with DNA-based technology: current progress and challenges.
Pereira, Filipe; Carneiro, João; Amorim, António
2008-01-01
One of the grand challenges of modern biology is to develop accurate and reliable technologies for a rapid screening of DNA sequence variation. This topic of research is of prime importance for the detection and identification of species in numerous fields of investigation, such as taxonomy, epidemiology, forensics, archaeology or ecology. Molecular identification is also central for the diagnosis, treatment and control of infections caused by different pathogens. In recent years, a variety of DNA-based approaches have been developed for the identification of individuals in a myriad of taxonomic groups. Here, we provide an overview of most commonly used assays, with emphasis on those based on DNA hybridizations, restriction enzymes, random PCR amplifications, species-specific PCR primers and DNA sequencing. A critical evaluation of all methods is presented focusing on their discriminatory power, reproducibility and user-friendliness. Having in mind that the current trend is to develop small-scale devices with a high-throughput capacity, we briefly review recent technological achievements for DNA analysis that offer great potentials for the identification of species.
NASA Astrophysics Data System (ADS)
Mackiewicz, P.; Gierlik, A.; Kowalczuk, M.; Szczepanik, D.; Dudek, M. R.; Cebrat, S.
1999-12-01
We have analysed protein coding and intergenic sequences in the Borrelia burgdorferi (the Lyme disease bacterium) genome using different kinds of DNA walks. Genes occupying the leading strand of DNA have significantly different nucleotide composition from genes occupying the lagging strand. Nucleotide compositional bias of the two DNA strands reflects the amino acid composition of proteins. 96% of genes coding for ribosomal proteins lie on the leading DNA strand, which suggests that the positions of these as well as other genes are non-random. In the B. burgdorferi genome, the asymmetry in intergenic DNA sequences is lower than the asymmetry in the third positions in codons. All these characters of the B. burgdorferi genome suggest that both replication-associated mutational pressure and recombination mechanisms have established the specific structure of the genome and now any recombination leading to inversion of a gene with respect to the direction of replication is forbidden. This property of the genome allows us to assume that it is in a steady state, which enables us to fix some parameters for simulations of DNA evolution.
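A DNA walk of the kind referred to above turns a sequence into a cumulative trajectory, so that strand compositional asymmetry shows up as a sustained trend. The abstract does not specify the walk rules used, so the minimal Python sketch below assumes one common choice (step up on G, step down on C, no step otherwise); the function names are hypothetical.

```python
from itertools import accumulate


def dna_walk(seq, up="G", down="C"):
    """Cumulative walk: +1 for the 'up' base, -1 for the 'down' base, 0 otherwise."""
    steps = (1 if base == up else -1 if base == down else 0 for base in seq.upper())
    return list(accumulate(steps))


def walk_extrema(walk):
    """Indices of the minimum and maximum of the walk (first occurrence of each)."""
    return walk.index(min(walk)), walk.index(max(walk))
```

Restricting the input to third codon positions or to intergenic regions, as the study does, would only require passing the corresponding subsequence to `dna_walk`. The other base pair is covered by passing `up="A"` and `down="T"`.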
DNA motifs associated with aberrant CpG island methylation.
Feltus, F Alex; Lee, Eva K; Costello, Joseph F; Plass, Christoph; Vertino, Paula M
2006-05-01
Epigenetic silencing involving the aberrant methylation of promoter region CpG islands is widely recognized as a tumor suppressor silencing mechanism in cancer. However, the molecular pathways underlying aberrant DNA methylation remain elusive. Recently we showed that, on a genome-wide level, CpG island loci differ in their intrinsic susceptibility to aberrant methylation and that this susceptibility can be predicted based on underlying sequence context. These data suggest that there are sequence/structural features that contribute to the protection from or susceptibility to aberrant methylation. Here we use motif elicitation coupled with classification techniques to identify DNA sequence motifs that selectively define methylation-prone or methylation-resistant CpG islands. Motifs common to 28 methylation-prone or 47 methylation-resistant CpG island-containing genomic fragments were determined using the MEME and MAST algorithms. The five most discriminatory motifs derived from methylation-prone sequences were found to be associated with CpG islands in general and were nonrandomly distributed throughout the genome. In contrast, the eight most discriminatory motifs derived from the methylation-resistant CpG islands were randomly distributed throughout the genome. Interestingly, this latter group tended to associate with Alu and other repetitive sequences. Used together, the frequency of occurrence of these motifs successfully discriminated methylation-prone and methylation-resistant CpG island groups with an accuracy of 87% after 10-fold cross-validation. The motifs identified here are candidate methylation-targeting or methylation-protection DNA sequences.
Lewandowska, Dagmara W; Zagordi, Osvaldo; Geissberger, Fabienne-Desirée; Kufner, Verena; Schmutz, Stefan; Böni, Jürg; Metzner, Karin J; Trkola, Alexandra; Huber, Michael
2017-08-08
Sequence-specific PCR is the most common approach for virus identification in diagnostic laboratories. However, as specific PCR only detects pre-defined targets, novel virus strains or viruses not included in routine test panels will be missed. Recently, advances in high-throughput sequencing allow for virus-sequence-independent identification of entire virus populations in clinical samples, yet standardized protocols are needed to allow broad application in clinical diagnostics. Here, we describe a comprehensive sample preparation protocol for high-throughput metagenomic virus sequencing using random amplification of total nucleic acids from clinical samples. In order to optimize metagenomic sequencing for application in virus diagnostics, we tested different enrichment and amplification procedures on plasma samples spiked with RNA and DNA viruses. A protocol including filtration, nuclease digestion, and random amplification of RNA and DNA in separate reactions provided the best results, allowing reliable recovery of viral genomes and a good correlation of the relative number of sequencing reads with the virus input. We further validated our method by sequencing a multiplexed viral pathogen reagent containing a range of human viruses from different virus families. Our method proved successful in detecting the majority of the included viruses with high read numbers and compared well to other protocols in the field validated against the same reference reagent. Our sequencing protocol works not only with plasma but also with other clinical samples such as urine and throat swabs. The workflow for virus metagenomic sequencing that we established proved successful in detecting a variety of viruses in different clinical samples. Our protocol supplements existing virus-specific detection strategies providing opportunities to identify atypical and novel viruses commonly not accounted for in routine diagnostic panels.
Abécassis, V; Pompon, D; Truan, G
2000-10-15
The design of a family shuffling strategy (CLERY: Combinatorial Libraries Enhanced by Recombination in Yeast) associating PCR-based and in vivo recombination and expression in yeast is described. This strategy was tested using human cytochrome P450 CYP1A1 and CYP1A2 as templates, which share 74% nucleotide sequence identity. Construction of highly shuffled libraries of mosaic structures and reduction of parental gene contamination were two major goals. Library characterization involved multiprobe hybridization on DNA macro-arrays. The statistical analysis of randomly selected clones revealed a high proportion of chimeric genes (86%) and a homogeneous representation of the parental contribution among the sequences (55.8 +/- 2.5% for parental sequence 1A2). A microtiter plate screening system was designed to achieve colorimetric detection of polycyclic hydrocarbon hydroxylation by transformed yeast cells. Full sequences of five randomly picked and five functionally selected clones were analyzed. Results confirmed the shuffling efficiency and allowed calculation of the average length of sequence exchange and mutation rates. The efficient and statistically representative generation of mosaic structures by this type of family shuffling in a yeast expression system constitutes a novel and promising tool for structure-function studies and tuning enzymatic activities of multicomponent eucaryote complexes involving non-soluble enzymes.
Murray, R; Pederson, K; Prosser, H; Muller, D; Hutchison, C A; Frelinger, J A
1988-01-01
We have used random oligonucleotide mutagenesis (or saturation mutagenesis) to create a library of point mutations in the alpha 1 protein domain of a Major Histocompatibility Complex (MHC) molecule. This protein domain is critical for T cell and B cell recognition. We altered the MHC class I H-2DP gene sequence such that synthetic mutant alpha 1 exons (270 bp of coding sequence), which contain mutations identified by sequence analysis, can replace the wild type alpha 1 exon. The synthetic exons were constructed from twelve overlapping oligonucleotides which contained an average of 1.3 random point mutations per intact exon. DNA sequence analysis of mutant alpha 1 exons has shown a point mutant distribution that fits a Poisson distribution, and thus emphasizes the utility of this mutagenesis technique to "scan" a large protein sequence for important mutations. We report our use of saturation mutagenesis to scan an entire exon of the H-2DP gene, a cassette strategy to replace the wild type alpha 1 exon with individual mutant alpha 1 exons, and analysis of mutant molecules expressed on the surface of transfected mouse L cells. PMID:2903482
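As a rough illustration of the Poisson fit mentioned in the abstract above, the following sketch computes the expected fraction of exons carrying k point mutations when the mean is 1.3 per exon (the figure quoted in the abstract). The function name `poisson_pmf` and the choice to tabulate k = 0..4 are assumptions made for illustration; the authors' actual analysis is not described in this listing.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Mean number of random point mutations per exon, taken from the abstract.
MEAN_MUTATIONS_PER_EXON = 1.3

def expected_fractions(max_k=4, lam=MEAN_MUTATIONS_PER_EXON):
    """Expected fraction of exons with 0..max_k mutations (illustrative helper)."""
    return {k: poisson_pmf(k, lam) for k in range(max_k + 1)}

if __name__ == "__main__":
    for k, frac in expected_fractions().items():
        print(f"{k} mutations: {frac:.3f}")
```

Under this model a little over a quarter of the exons would be expected to carry no mutation at all, which is one reason sequencing each synthetic exon before use is helpful.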
Applying Agrep to r-NSA to solve multiple sequences approximate matching.
Ni, Bing; Wong, Man-Hon; Lam, Chi-Fai David; Leung, Kwong-Sak
2014-01-01
This paper addresses the approximate matching problem in a database consisting of multiple DNA sequences, where the proposed approach applies Agrep to a new truncated suffix array, r-NSA. The construction time of the structure is linear in the database size, and indexing a substring in the structure takes constant time. The number of characters processed in applying Agrep is analysed theoretically, and the theoretical upper bound closely approximates the empirical number of characters, which is obtained by enumerating the characters in the actual structure built. Experiments are carried out using (synthetic) random DNA sequences, as well as (real) genome sequences including Hepatitis-B Virus and X-chromosome. Experimental results show that, compared to the straightforward approach that applies Agrep to multiple sequences individually, the proposed approach solves the matching problem in much shorter time. The speed-up of our approach depends on the sequence patterns, and for highly similar homologous genome sequences, which are the common cases in real-life genomes, it can be up to several orders of magnitude.
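To make the matching problem concrete, here is a minimal sketch of the "straightforward" baseline the abstract compares against: each sequence is scanned individually for substrings within a given edit distance of the pattern. It uses a plain dynamic-programming scan rather than Agrep's bit-parallel algorithm, and it does not implement r-NSA; the function names and the toy database are assumptions for illustration only.

```python
def approx_find(pattern, text, max_errors):
    """Return end positions (exclusive) in text where some substring ending there
    is within max_errors edits (substitution, insertion, deletion) of pattern."""
    m = len(pattern)
    # prev[i] = best edit distance between pattern[:i] and a substring of the
    # text ending at the previous position; before any text is read it is i.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        curr = [0] * (m + 1)  # curr[0] = 0 because a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(prev[i - 1] + cost,  # match or substitution
                          prev[i] + 1,         # extra character in the text
                          curr[i - 1] + 1)     # missing character in the text
        if curr[m] <= max_errors:
            hits.append(j)
        prev = curr
    return hits

def search_database(pattern, sequences, max_errors):
    """Scan every sequence individually (the baseline approach); names are hypothetical."""
    return {name: approx_find(pattern, seq, max_errors)
            for name, seq in sequences.items()}

if __name__ == "__main__":
    db = {"seq1": "TTACGTTT", "seq2": "GGAAGTCC"}  # toy data, not from the paper
    print(search_database("ACGT", db, max_errors=1))
```

The cost of this baseline grows with the total length of all sequences times the pattern length, which is the redundancy an index over shared substrings is meant to avoid when the sequences are highly similar.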
Unlocking hidden genomic sequence
Keith, Jonathan M.; Cochran, Duncan A. E.; Lala, Gita H.; Adams, Peter; Bryant, Darryn; Mitchelson, Keith R.
2004-01-01
Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs. PMID:14973330
Single-molecule dilution and multiple displacement amplification for molecular haplotyping.
Paul, Philip; Apgar, Josh
2005-04-01
Separate haploid analysis is frequently required for heterozygous genotyping to resolve phase ambiguity or confirm allelic sequence. We demonstrate a technique of single-molecule dilution followed by multiple strand displacement amplification to haplotype polymorphic alleles. Dilution of DNA to haploid equivalency, or a single molecule, is a simple method for separating di-allelic DNA. Strand displacement amplification is a robust method for non-specific DNA expansion that employs random hexamers and phage polymerase Phi29 for double-stranded DNA displacement and primer extension, resulting in high processivity and exceptional product length. Single-molecule dilution was followed by strand displacement amplification to expand separated alleles to microgram quantities of DNA for more efficient haplotype analysis of heterozygous genes.
Prychitko, T M; Moore, W S
1997-10-01
Estimating phylogenies from DNA sequence data has become the major methodology of molecular phylogenetics. To date, molecular phylogenetics of the vertebrates has been very dependent on mtDNA, but studies involving mtDNA are limited because the several genes comprising the mt-genome are inherited as a single linkage group. The only apparent solution to this problem is to sequence additional genes, each representing a distinct linkage group, so that the resultant gene trees provide independent estimates of the species tree. There exists the need to find novel gene sequences which contain enough phylogenetic information to resolve relationships between closely related species. A possible source is the nuclear-encoded introns, because they evolve more rapidly than exons. We designed primers to amplify and sequence the 7 intron from the beta-fibrinogen gene for a recently evolved group, the woodpeckers. We sequenced the entire intron for 10 specimens representing five species. Nucleotide substitutions are randomly distributed along the length of the intron, suggesting selective neutrality. A preliminary analysis indicates that the phylogenetic signal in the intron is as strong as that in the mitochondrial encoded cytochrome b (cyt b) gene. The topology of the beta-fibrinogen tree is identical to that of the cyt b tree. This analysis demonstrates the ability of the 7 intron of beta-fibrinogen to provide well resolved, independent gene trees for recently evolved groups and establishes it as a source of sequences to be used in other phylogenetic studies. Copyright 1997 Academic Press
Choi, Sangdun; Chang, Mi Sook; Stuecker, Tara; Chung, Christine; Newcombe, David A; Venkateswaran, Kasthuri
2012-12-01
In this study, fosmid cloning strategies were used to assess the microbial populations in water from the International Space Station (ISS) drinking water system (henceforth referred to as Prebiocide and Tank A water samples). The goals of this study were: to compare the sensitivity of the fosmid cloning strategy with that of traditional culture-based and 16S rRNA-based approaches and to detect the widest possible spectrum of microbial populations during the water purification process. Initially, microbes could not be cultivated, and conventional PCR failed to amplify 16S rDNA fragments from these low biomass samples. Therefore, randomly primed rolling-circle amplification was used to amplify any DNA that might be present in the samples, followed by size selection by using pulsed-field gel electrophoresis. The amplified high-molecular-weight DNA from both samples was cloned into fosmid vectors. Several hundred clones were randomly selected for sequencing, followed by Blastn/Blastx searches. Sequences encoding specific genes from Burkholderia, a species abundant in the soil and groundwater, were found in both samples. Bradyrhizobium and Mesorhizobium, which belong to rhizobia, a large community of nitrogen fixers often found in association with plant roots, were present in the Prebiocide samples. Ralstonia, which is prevalent in soils with a high heavy metal content, was detected in the Tank A samples. The detection of many unidentified sequences suggests the presence of potentially novel microbial fingerprints. The bacterial diversity detected in this pilot study using a fosmid vector approach was higher than that detected by conventional 16S rRNA gene sequencing.
Slama-Schwok, A; Zakrzewska, K; Léger, G; Leroux, Y; Takahashi, M; Käs, E; Debey, P
2000-01-01
Using spectroscopic methods, we have studied the structural changes induced in both protein and DNA upon binding of the High-Mobility Group I (HMG-I) protein to a 21-bp sequence derived from mouse satellite DNA. We show that these structural changes depend on the stoichiometry of the protein/DNA complexes formed, as determined by Job plots derived from experiments using pyrene-labeled duplexes. Circular dichroism and melting temperature experiments extended in the far ultraviolet range show that while native HMG-I is mainly random coiled in solution, it adopts a beta-turn conformation upon forming a 1:1 complex in which the protein first binds to one of two dA.dT stretches present in the duplex. HMG-I structure in the 1:1 complex is dependent on the sequence of its DNA target. A 3:1 HMG-I/DNA complex can also form and is characterized by a small increase in the DNA natural bend and/or compaction coupled to a change in the protein conformation, as determined from fluorescence resonance energy transfer (FRET) experiments. In addition, a peptide corresponding to an extended DNA-binding domain of HMG-I induces an ordered condensation of DNA duplexes. Based on the constraints derived from pyrene excimer measurements, we present a model of these nucleated structures. Our results illustrate an extreme case of protein structure induced by DNA conformation that may bear on the evolutionary conservation of the DNA-binding motifs of HMG-I. We discuss the functional relevance of the structural flexibility of HMG-I associated with the nature of its DNA targets and the implications of the binding stoichiometry for several aspects of chromatin structure and gene regulation. PMID:10777751
Lactobacillus hammesii sp. nov., isolated from French sourdough.
Valcheva, Rosica; Korakli, Maher; Onno, Bernard; Prévost, Hervé; Ivanova, Iskra; Ehrmann, Matthias A; Dousset, Xavier; Gänzle, Michael G; Vogel, Rudi F
2005-03-01
Twenty morphologically different strains were chosen from French wheat sourdough isolates. Cells were Gram-positive, non-spore-forming, non-motile rods. The isolates were identified using amplified-fragment length polymorphism, randomly amplified polymorphic DNA and 16S rRNA gene sequence analysis. All isolates were members of the genus Lactobacillus. They were identified as representing Lactobacillus plantarum, Lactobacillus paralimentarius, Lactobacillus sanfranciscensis, Lactobacillus spicheri and Lactobacillus sakei. However, two isolates (LP38(T) and LP39) could be clearly discriminated from recognized Lactobacillus species on the basis of genotyping methods. 16S rRNA gene sequence similarity and DNA-DNA relatedness data indicate that the two strains belong to a novel Lactobacillus species, for which the name Lactobacillus hammesii is proposed. The type strain is LP38(T) (=DSM 16381(T)=CIP 108387(T)=TMW 1.1236(T)).
In silico Analysis of 2085 Clones from a Normalized Rat Vestibular Periphery 3′ cDNA Library
Roche, Joseph P.; Cioffi, Joseph A.; Kwitek, Anne E.; Erbe, Christy B.; Popper, Paul
2005-01-01
The inserts from 2400 cDNA clones isolated from a normalized Rattus norvegicus vestibular periphery cDNA library were sequenced and characterized. The Wackym-Soares vestibular 3′ cDNA library was constructed from the saccular and utricular maculae, the ampullae of all three semicircular canals and Scarpa's ganglia containing the somata of the primary afferent neurons, microdissected from 104 male and female rats. The inserts from 2400 randomly selected clones were sequenced from the 5′ end. Each sequence was analyzed using the BLAST algorithm compared to the GenBank nonredundant, rat genome, mouse genome and human genome databases to search for high homology alignments. Of the initial 2400 clones, 315 (13%) were found to be of poor quality and did not yield useful information, and therefore were eliminated from the analysis. Of the remaining 2085 sequences, 918 (44%) were found to represent 758 unique genes having useful annotations that were identified in databases within the public domain or in the published literature; these sequences were designated as known characterized sequences. 1141 sequences (55%) aligned with 1011 unique sequences that had no useful annotations and were designated as known but uncharacterized sequences. Of the remaining 26 sequences (1%), 24 aligned with rat genomic sequences, but none matched previously described rat expressed sequence tags or mRNAs. No significant alignment to the rat or human genomic sequences could be found for the remaining 2 sequences. Of the 2085 sequences analyzed, 86% were singletons. The known, characterized sequences were analyzed with the FatiGO online data-mining tool (http://fatigo.bioinfo.cnio.es/) to identify level 5 biological process gene ontology (GO) terms for each alignment and to group alignments with similar or identical GO terms. Numerous genes were identified that have not been previously shown to be expressed in the vestibular system.
Further characterization of the novel cDNA sequences may lead to the identification of genes with vestibular-specific functions. Continued analysis of the rat vestibular periphery transcriptome should provide new insights into vestibular function and generate new hypotheses. Physiological studies are necessary to further elucidate the roles of the identified genes and novel sequences in vestibular function. PMID:16103642
Within-genome evolution of REPINs: a new family of miniature mobile DNA in bacteria.
Bertels, Frederic; Rainey, Paul B
2011-06-01
Repetitive sequences are a conserved feature of many bacterial genomes. While first reported almost thirty years ago, and frequently exploited for genotyping purposes, little is known about their origin, maintenance, or processes affecting the dynamics of within-genome evolution. Here, beginning with analysis of the diversity and abundance of short oligonucleotide sequences in the genome of Pseudomonas fluorescens SBW25, we show that over-represented short sequences define three distinct groups (GI, GII, and GIII) of repetitive extragenic palindromic (REP) sequences. Patterns of REP distribution suggest that closely linked REP sequences form a functional replicative unit: REP doublets are over-represented, randomly distributed in extragenic space, and more highly conserved than singlets. In addition, doublets are organized as inverted repeats, which together with intervening spacer sequences are predicted to form hairpin structures in ssDNA or mRNA. We refer to these newly defined entities as REPINs (REP doublets forming hairpins) and identify short reads from population sequencing that reveal putative transposition intermediates. The proximal relationship between GI, GII, and GIII REPINs and specific REP-associated tyrosine transposases (RAYTs), combined with features of the putative transposition intermediate, suggests a mechanism for within-genome dissemination. Analysis of the distribution of REPs in a range of RAYT-containing bacterial genomes, including Escherichia coli K-12 and Nostoc punctiforme, show that REPINs are a widely distributed, but hitherto unrecognized, family of miniature non-autonomous mobile DNA.
The Conjugative Relaxase TrwC Promotes Integration of Foreign DNA in the Human Genome.
González-Prieto, Coral; Gabriel, Richard; Dehio, Christoph; Schmidt, Manfred; Llosa, Matxalen
2017-06-15
Bacterial conjugation is a mechanism of horizontal DNA transfer. The relaxase TrwC of the conjugative plasmid R388 cleaves one strand of the transferred DNA at the oriT gene, covalently attaches to it, and leads the single-stranded DNA (ssDNA) into the recipient cell. In addition, TrwC catalyzes site-specific integration of the transferred DNA into its target sequence present in the genome of the recipient bacterium. Here, we report the analysis of the efficiency and specificity of the integrase activity of TrwC in human cells, using the type IV secretion system of the human pathogen Bartonella henselae to introduce relaxase-DNA complexes. Compared to Mob relaxase from plasmid pBGR1, we found that TrwC mediated a 10-fold increase in the rate of plasmid DNA transfer to human cells and a 100-fold increase in the rate of chromosomal integration of the transferred DNA. We used linear amplification-mediated PCR and plasmid rescue to characterize the integration pattern in the human genome. DNA sequence analysis revealed mostly reconstituted oriT sequences, indicating that TrwC is active and recircularizes transferred DNA in human cells. One TrwC-mediated site-specific integration event was detected, proving that TrwC is capable of mediating site-specific integration in the human genome, albeit with very low efficiency compared to the rate of random integration. Our results suggest that TrwC may stabilize the plasmid DNA molecules in the nucleus of the human cell, probably by recircularization of the transferred DNA strand. This stabilization would increase the opportunities for integration of the DNA by the host machinery. IMPORTANCE Different biotechnological applications, including gene therapy strategies, require permanent modification of target cells. Long-term expression is achieved either by extrachromosomal persistence or by integration of the introduced DNA. 
Here, we studied the utility of conjugative relaxase TrwC, a bacterial protein with site-specific integrase activity in bacteria, as an integrase in human cells. Although it is not efficient as a site-specific integrase, we found that TrwC is active in human cells and promotes random integration of the transferred DNA in the human genome, probably acting as a DNA chaperone until it is integrated by host mechanisms. TrwC-DNA complexes can be delivered to human cells through a type IV secretion system involved in pathogenesis. Thus, TrwC could be used in vivo to transfer the DNA of interest into the appropriate cell and promote its integration. If used in combination with a site-specific nuclease, it could lead to site-specific integration of the incoming DNA by homologous recombination. Copyright © 2017 American Society for Microbiology.
The Conjugative Relaxase TrwC Promotes Integration of Foreign DNA in the Human Genome
González-Prieto, Coral; Gabriel, Richard; Dehio, Christoph; Schmidt, Manfred
2017-01-01
ABSTRACT Bacterial conjugation is a mechanism of horizontal DNA transfer. The relaxase TrwC of the conjugative plasmid R388 cleaves one strand of the transferred DNA at the oriT gene, covalently attaches to it, and leads the single-stranded DNA (ssDNA) into the recipient cell. In addition, TrwC catalyzes site-specific integration of the transferred DNA into its target sequence present in the genome of the recipient bacterium. Here, we report the analysis of the efficiency and specificity of the integrase activity of TrwC in human cells, using the type IV secretion system of the human pathogen Bartonella henselae to introduce relaxase-DNA complexes. Compared to Mob relaxase from plasmid pBGR1, we found that TrwC mediated a 10-fold increase in the rate of plasmid DNA transfer to human cells and a 100-fold increase in the rate of chromosomal integration of the transferred DNA. We used linear amplification-mediated PCR and plasmid rescue to characterize the integration pattern in the human genome. DNA sequence analysis revealed mostly reconstituted oriT sequences, indicating that TrwC is active and recircularizes transferred DNA in human cells. One TrwC-mediated site-specific integration event was detected, proving that TrwC is capable of mediating site-specific integration in the human genome, albeit with very low efficiency compared to the rate of random integration. Our results suggest that TrwC may stabilize the plasmid DNA molecules in the nucleus of the human cell, probably by recircularization of the transferred DNA strand. This stabilization would increase the opportunities for integration of the DNA by the host machinery. IMPORTANCE Different biotechnological applications, including gene therapy strategies, require permanent modification of target cells. Long-term expression is achieved either by extrachromosomal persistence or by integration of the introduced DNA. 
Here, we studied the utility of conjugative relaxase TrwC, a bacterial protein with site-specific integrase activity in bacteria, as an integrase in human cells. Although it is not efficient as a site-specific integrase, we found that TrwC is active in human cells and promotes random integration of the transferred DNA in the human genome, probably acting as a DNA chaperone until it is integrated by host mechanisms. TrwC-DNA complexes can be delivered to human cells through a type IV secretion system involved in pathogenesis. Thus, TrwC could be used in vivo to transfer the DNA of interest into the appropriate cell and promote its integration. If used in combination with a site-specific nuclease, it could lead to site-specific integration of the incoming DNA by homologous recombination. PMID:28411218
Fortin, Connor H; Schulze, Katharina V; Babbitt, Gregory A
2015-01-01
It is now widely accepted that DNA sequences defining DNA-protein interactions functionally depend upon local biophysical features of the DNA backbone that are important in defining sites of binding interaction in the genome (e.g. DNA shape, charge and intrinsic dynamics). However, these physical features of the DNA polymer are not directly apparent when analyzing and viewing Shannon information content calculated at single nucleobases in a traditional sequence logo plot. Thus, sequence logo plots are severely limited in that they convey no explicit information regarding the structural dynamics of the DNA backbone, a feature often critical to binding specificity. We present TRX-LOGOS, an R software package and Perl wrapper code that interfaces the JASPAR database for computational regulatory genomics. TRX-LOGOS extends the traditional sequence logo plot to include Shannon information content calculated with regard to the dinucleotide-based BI-BII conformation shifts in phosphate linkages on the DNA backbone, thereby adding a visual measure of intrinsic DNA flexibility that can be critical for many DNA-protein interactions. TRX-LOGOS is available as an R graphics module offered at both SourceForge and as a download supplement at this journal. To demonstrate the general utility of TRX logo plots, we first calculated the information content for 416 Saccharomyces cerevisiae transcription factor binding sites functionally confirmed in the Yeastract database and matched to previously published yeast genomic alignments. We discovered that flanking regions contain significantly higher information content at phosphate linkages than can be observed at nucleobases. We also examined broader transcription factor classifications defined by the JASPAR database, and discovered that many general signatures of transcription factor binding are locally more information rich at the level of DNA backbone dynamics than nucleobase sequence.
We used TRX-logos in combination with MEGA 6.0 software for molecular evolutionary genetics analysis to visually compare the human Forkhead box/FOX protein evolution to its binding site evolution. We also compared the DNA binding signatures of human TP53 tumor suppressor determined by two different laboratory methods (SELEX and ChIP-seq). Further analysis of the entire yeast genome, center aligned at the start codon, also revealed a distinct sequence-independent 3 bp periodic pattern in information content, present only in coding regions, and perhaps indicative of the non-random organization of the genetic code. TRX-LOGOS is useful in any situation in which important information content in DNA can be better visualized at the positions of phosphate linkages (i.e. dinucleotides) where the dynamic properties of the DNA backbone function to facilitate DNA-protein interaction.
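For readers unfamiliar with the quantity plotted in a traditional sequence logo, the sketch below computes per-position Shannon information content (in bits) at single nucleobases for a set of aligned binding sites. It is not TRX-LOGOS code (which is an R package) and it does not model the BI-BII dinucleotide states; it also omits the small-sample correction that some logo tools apply. The function name and the toy sites are assumptions for illustration.

```python
import math
from collections import Counter

def information_content(aligned_sites, alphabet="ACGT"):
    """Per-position information content in bits: log2(|alphabet|) minus Shannon entropy.
    Assumes equal-length sites made only of symbols in alphabet."""
    max_bits = math.log2(len(alphabet))
    bits = []
    for pos in range(len(aligned_sites[0])):
        counts = Counter(site[pos] for site in aligned_sites)
        total = sum(counts[s] for s in alphabet)
        entropy = 0.0
        for s in alphabet:
            if counts[s]:
                p = counts[s] / total
                entropy -= p * math.log2(p)
        bits.append(max_bits - entropy)
    return bits

if __name__ == "__main__":
    sites = ["ACGT", "ACGA", "ACCT", "ACTG"]  # toy alignment, not real binding sites
    print(information_content(sites))
```

A fully conserved position scores 2 bits and a position with uniform base usage scores 0 bits; the same formula could in principle be applied to any other alphabet of states, such as classes of dinucleotide steps, by passing a different `alphabet`.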
In vitro selection of zinc fingers with altered DNA-binding specificity.
Jamieson, A C; Kim, S H; Wells, J A
1994-05-17
We have used random mutagenesis and phage display to alter the DNA-binding specificity of Zif268, a transcription factor that contains three zinc finger domains. Four residues in the helix of finger 1 of Zif268 that potentially mediate DNA binding were identified from an X-ray structure of the Zif268-DNA complex. A library was constructed in which these residues were randomly mutated and the Zif268 variants were fused to a truncated version of the gene III coat protein on the surface of M13 filamentous phage particles. The phage displayed the mutant proteins in a monovalent fashion and were sorted by repeated binding and elution from affinity matrices containing different DNA sequences. When the matrix contained the natural nine base pair operator sequence 5'-GCG-TGG-GCG-3', native-like zinc fingers were isolated. New finger 1 variants were found by sorting with two different operators in which the singly modified triplets, GTG and TCG, replaced the native finger 1 triplet, GCG. Overall, the selected finger 1 variants contained a preponderance of polar residues at the four sites. Interestingly, the net charge of the four residues in any selected finger never deviated more than one unit from neutrality despite the fact that about half the variants contained three or four charged residues over the four sites. Measurements of the dissociation constants for two of these purified finger 1 variants by gel-shift assay showed their specificities to vary over a 10-fold range, with the greatest affinity being for the DNA binding site for which they were sorted.(ABSTRACT TRUNCATED AT 250 WORDS)
Leray, Matthieu; Knowlton, Nancy
2017-01-01
DNA metabarcoding, the PCR-based profiling of natural communities, is becoming the method of choice for biodiversity monitoring because it circumvents some of the limitations inherent to traditional ecological surveys. However, potential sources of bias that can affect the reproducibility of this method remain to be quantified. The interpretation of differences in patterns of sequence abundance and the ecological relevance of rare sequences remain particularly uncertain. Here we used one artificial mock community to explore the significance of abundance patterns and disentangle the effects of two potential biases on data reproducibility: indexed PCR primers and random sampling during Illumina MiSeq sequencing. We amplified a short fragment of the mitochondrial Cytochrome c Oxidase Subunit I (COI) for a single mock sample containing equimolar amounts of total genomic DNA from 34 marine invertebrates belonging to six phyla. We used seven indexed broad-range primers and sequenced the resulting library on two consecutive Illumina MiSeq runs. The total number of Operational Taxonomic Units (OTUs) was ∼4 times higher than expected based on the composition of the mock sample. Moreover, the total number of reads for the 34 components of the mock sample differed by up to three orders of magnitude. However, 79 out of 86 of the unexpected OTUs were represented by <10 sequences that did not appear consistently across replicates. Our data suggest that random sampling of rare OTUs (e.g., small associated fauna such as parasites) accounted for most of the variation in OTU presence-absence, whereas biases associated with indexed PCRs accounted for a larger amount of variation in relative abundance patterns. These results suggest that random sampling during sequencing leads to the low reproducibility of rare OTUs. We suggest that the strategy for handling rare OTUs should depend on the objectives of the study.
Systematic removal of rare OTUs may avoid inflating diversity based on common β descriptors but will exclude positive records of taxa that are functionally important. Our results further reinforce the need for technical replicates (parallel PCR and sequencing from the same sample) in metabarcoding experimental designs. Data reproducibility should be determined empirically as it will depend upon the sequencing depth, the type of sample, the sequence analysis pipeline, and the number of replicates. Moreover, estimating relative biomasses or abundances based on read counts remains elusive at the OTU level.
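The effect of random sampling on rare OTUs can be illustrated with a simple back-of-the-envelope model, which is an assumption of this sketch and not the authors' analysis: if an OTU makes up a fraction p of the sequenced library and N reads are drawn independently, the chance of seeing it at least once is 1 - (1 - p)^N. The function names and the example numbers below are hypothetical.

```python
def detection_probability(rel_abundance, n_reads):
    """Chance of sampling at least one read of an OTU with the given relative
    abundance when n_reads are drawn independently (simplifying assumption)."""
    return 1.0 - (1.0 - rel_abundance) ** n_reads

def detected_in_all_replicates(rel_abundance, n_reads, n_replicates):
    """Chance the OTU appears in every one of n_replicates independent runs."""
    return detection_probability(rel_abundance, n_reads) ** n_replicates

if __name__ == "__main__":
    # Hypothetical values: one template in 100,000, sequenced to 100,000 reads.
    p, depth = 1e-5, 100_000
    print(f"single run: {detection_probability(p, depth):.3f}")
    print(f"both of two runs: {detected_in_all_replicates(p, depth, 2):.3f}")
```

Under these assumed numbers an OTU is found in a single run roughly two times out of three, and in both of two runs well under half the time, which is consistent in spirit with rare OTUs not appearing consistently across replicates.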
Entropic Profiler – detection of conservation in genomes using information theory
Fernandes, Francisco; Freitas, Ana T; Almeida, Jonas S; Vinga, Susana
2009-01-01
Background: In recent decades, with the successive availability of whole genome sequences, many research efforts have been made to mathematically model DNA. Entropic Profiles (EP) were proposed recently as a new measure of continuous entropy of genome sequences. EP represent local information plots related to DNA randomness and are based on information theory and statistical concepts. They express the weighted relative abundance of motifs for each position in genomes. Their study is very relevant because under- or over-represented segments are often associated with significant biological meaning. Findings: The Entropic Profiler application presented here is a new tool designed to detect and extract under- and over-represented DNA segments in genomes by using EP. It computes EP very efficiently by resorting to improved algorithms and data structures, which include modified suffix trees. Available through a web interface and as downloadable source code, it allows users to study positions and to search for motifs inside the whole sequence or within a specified range. DNA sequences can be entered from different sources, including FASTA files, pre-loaded examples or previously saved work. Besides the EP value plots, p-values and z-scores for each motif are also computed, along with the Chaos Game Representation of the sequence. Conclusion: EP are directly related to the statistical significance of motifs and can be considered as a new method to extract and classify significant regions in genomes and estimate local scales in DNA. The present implementation establishes an efficient and useful tool for whole genome analysis. PMID:19416538
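The Chaos Game Representation mentioned in the abstract above maps a DNA sequence to points in the unit square: starting from the centre, each base moves the current point halfway towards that base's corner. The sketch below follows that general recipe; the corner assignment (A bottom-left, C top-left, G top-right, T bottom-right) is an assumed convention, and the code is not taken from the Entropic Profiler implementation.

```python
# Assumed corner layout for the unit square; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def chaos_game_representation(sequence, start=(0.5, 0.5)):
    """Return the list of CGR points visited while reading the sequence.
    Symbols outside A/C/G/T (e.g. N) are skipped in this sketch."""
    x, y = start
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0  # move halfway to the corner
        points.append((x, y))
    return points

if __name__ == "__main__":
    for point in chaos_game_representation("ACGT"):
        print(point)
```

Because each step halves the distance to a corner, all sequences sharing the same length-L suffix land in the same sub-square of side 2^-L, which is why point density in a CGR plot reflects how often each motif occurs.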
Multiplex single-molecule interaction profiling of DNA-barcoded proteins.
Gu, Liangcai; Li, Chao; Aach, John; Hill, David E; Vidal, Marc; Church, George M
2014-11-27
In contrast with advances in massively parallel DNA sequencing, high-throughput protein analyses are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule protein detection using optical methods is limited by the number of spectrally non-overlapping chromophores. Here we introduce a single-molecular-interaction sequencing (SMI-seq) technology for parallel protein interaction profiling leveraging single-molecule advantages. DNA barcodes are attached to proteins collectively via ribosome display or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide thin film to construct a random single-molecule array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies) and analysed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimetre. Furthermore, protein interactions can be measured on the basis of the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor and antibody-binding profiling, are demonstrated. SMI-seq enables 'library versus library' screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity.
Multiplex single-molecule interaction profiling of DNA barcoded proteins
Gu, Liangcai; Li, Chao; Aach, John; Hill, David E.; Vidal, Marc; Church, George M.
2014-01-01
In contrast with advances in massively parallel DNA sequencing, high-throughput protein analyses are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule (SM) protein detection achieved using optical methods is limited by the number of spectrally nonoverlapping chromophores. Here, we introduce a single molecular interaction-sequencing (SMI-Seq) technology for parallel protein interaction profiling leveraging SM advantages. DNA barcodes are attached to proteins collectively via ribosome display or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide (PAA) thin film to construct a random SM array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies) and analyzed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimeter. Furthermore, protein interactions can be measured based on the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor (GPCR) and antibody binding profiling, were demonstrated. SMI-Seq enables “library vs. library” screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity. PMID:25252978
Trinucleotide cassettes increase diversity of T7 phage-displayed peptide library.
Krumpe, Lauren R H; Schumacher, Kathryn M; McMahon, James B; Makowski, Lee; Mori, Toshiyuki
2007-10-05
Amino acid sequence diversity is introduced into a phage-displayed peptide library by randomizing library oligonucleotide DNA. We recently evaluated the diversity of peptide libraries displayed on T7 lytic phage and M13 filamentous phage and showed that T7 phage can display a more diverse amino acid sequence repertoire due to differing processes of viral morphogenesis. In this study, we evaluated and compared the diversity of a 12-mer T7 phage-displayed peptide library randomized using codon-corrected trinucleotide cassettes with a T7 and an M13 12-mer phage-displayed peptide library constructed using the degenerate codon randomization method. We herein demonstrate that the combination of trinucleotide cassette amino acid codon randomization and T7 phage display construction methods resulted in a significant enhancement to the functional diversity of a 12-mer peptide library. This novel library exhibited superior amino acid uniformity and order-of-magnitude increases in amino acid sequence diversity as compared to degenerate codon randomized peptide libraries. Comparative analyses of the biophysical characteristics of the 12-mer peptide libraries revealed the trinucleotide cassette-randomized library to be a unique resource. The combination of T7 phage display and trinucleotide cassette randomization resulted in a novel resource for the potential isolation of binding peptides for new and previously studied molecular targets.
van Zyl, Leonel; von Arnold, Sara; Bozhkov, Peter; Chen, Yongzhong; Egertsdotter, Ulrika; MacKay, John; Sederoff, Ronald R.; Shen, Jing; Zelena, Lyubov
2002-01-01
Hybridization of labelled cDNA from various cell types with high-density arrays of expressed sequence tags is a powerful technique for investigating gene expression. Few conifer cDNA libraries have been sequenced. Because of the high level of sequence conservation between Pinus and Picea we have investigated the use of arrays from one genus for studies of gene expression in the other. The partial cDNAs from 384 identifiable genes expressed in differentiating xylem of Pinus taeda were printed on nylon membranes in randomized replicates. These were hybridized with labelled cDNA from needles or embryogenic cultures of Pinus taeda, P. sylvestris and Picea abies, and with labelled cDNA from leaves of Nicotiana tabacum. The Spearman correlation of gene expression for pairs of conifer species was high for needles (r² = 0.78–0.86), and somewhat lower for embryogenic cultures (r² = 0.68–0.83). The correlation of gene expression for tobacco leaves and needles of each of the three conifer species was lower but sufficiently high (r² = 0.52–0.63) to suggest that many partial gene sequences are conserved in angiosperms and gymnosperms. Heterologous probing was further used to identify tissue-specific gene expression over species boundaries. To evaluate the significance of differences in gene expression, conventional parametric tests were compared with permutation tests after four methods of normalization. Permutation tests after Z-normalization provide the highest degree of discrimination but may enhance the probability of type I errors. It is concluded that arrays of cDNA from loblolly pine are useful for studies of gene expression in other pines or spruces. PMID:18629264
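The permutation testing after Z-normalization mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so the test statistic (absolute difference of group means), the function names and the replicate values below are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev

def z_normalize(values):
    """Centre values on zero and scale to unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def permutation_p_value(group_a, group_b, n_perm, rng):
    """Two-sided permutation test on the absolute difference of group means."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the replicates
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical spot intensities for one gene: four replicates per tissue.
needles = [5.1, 5.3, 4.9, 5.2]
embryos = [1.0, 1.2, 0.9, 1.1]

# Normalize all eight spots together, then split back into the two groups.
normalized = z_normalize(needles + embryos)
p = permutation_p_value(normalized[:4], normalized[4:], 2000, random.Random(0))
print(f"permutation p-value: {p:.3f}")
```

With only four replicates per group there are just 70 distinct relabellings, so the smallest attainable p-value is bounded well above zero; that granularity is one reason replicate number matters for this kind of test.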
Digital transcriptome profiling using selective hexamer priming for cDNA synthesis.
Armour, Christopher D; Castle, John C; Chen, Ronghua; Babak, Tomas; Loerch, Patrick; Jackson, Stuart; Shah, Jyoti K; Dey, John; Rohl, Carol A; Johnson, Jason M; Raymond, Christopher K
2009-09-01
We developed a procedure for the preparation of whole transcriptome cDNA libraries depleted of ribosomal RNA from only 1 µg of total RNA. The method relies on a collection of short, computationally selected oligonucleotides, called 'not-so-random' (NSR) primers, to obtain full-length, strand-specific representation of nonribosomal RNA transcripts. In this study we validated the technique by profiling human whole brain and universal human reference RNA using ultra-high-throughput sequencing.
Template-Directed Copolymerization, Random Walks along Disordered Tracks, and Fractals
NASA Astrophysics Data System (ADS)
Gaspard, Pierre
2016-12-01
In biology, template-directed copolymerization is the fundamental mechanism responsible for the synthesis of DNA, RNA, and proteins. More than 50 years have passed since the discovery of DNA structure and its role in coding genetic information. Yet, the kinetics and thermodynamics of information processing in DNA replication, transcription, and translation remain poorly understood. Challenging issues are the facts that DNA or RNA sequences constitute disordered media for the motion of polymerases or ribosomes while errors occur in copying the template. Here, it is shown that these issues can be addressed and sequence heterogeneity effects can be quantitatively understood within a framework revealing universal aspects of information processing at the molecular scale. In steady growth regimes, the local velocities of polymerases or ribosomes along the template are distributed as the continuous or fractal invariant set of a so-called iterated function system, which determines the copying error probabilities. The growth may become sublinear in time with a scaling exponent that can also be deduced from the iterated function system.
Dridi, Bédis; Henry, Mireille; El Khéchine, Amel; Raoult, Didier; Drancourt, Michel
2009-01-01
Background The low and variable prevalence of Methanobrevibacter smithii and Methanosphaera stadtmanae DNA in human stool contrasts with the paramount role of these methanogenic Archaea in digestion processes. We hypothesized that this contrast is a consequence of the inefficiencies of current protocols for archaeon DNA extraction. We developed a new protocol for the extraction and PCR-based detection of M. smithii and M. stadtmanae DNA in human stool. Methodology/Principal Findings Stool specimens collected from 700 individuals were filtered, mechanically lysed twice, and incubated overnight with proteinase K prior to DNA extraction using a commercial DNA extraction kit. Total DNA was used as a template for quantitative real-time PCR targeting M. smithii and M. stadtmanae 16S rRNA and rpoB genes. Amplification of 16S rRNA and rpoB yielded positive detection of M. smithii in 95.7% and M. stadtmanae in 29.4% of specimens. Sequencing of 16S rRNA gene PCR products from 30 randomly selected specimens (15 for M. smithii and 15 for M. stadtmanae) yielded a sequence similarity of 99–100% using the reference M. smithii ATCC 35061 and M. stadtmanae DSM 3091 sequences. Conclusions/Significance In contrast to previous reports, these data indicate a high prevalence of the methanogens M. smithii and M. stadtmanae in the human gut, with the former being an almost ubiquitous inhabitant of the intestinal microbiome. PMID:19759898
Genes expressed during the development and ripening of watermelon fruit.
Levi, A; Davis, A; Hernandez, A; Wechter, P; Thimmapuram, J; Trebitsh, T; Tadmor, Y; Katzir, N; Portnoy, V; King, S
2006-11-01
A normalized cDNA library was constructed using watermelon flesh mRNA from three distinct developmental time-points and was subtracted by hybridization with leaf cDNA. Random cDNA clones of the watermelon flesh subtraction library were sequenced from the 5' end in order to identify potentially informative genes associated with fruit setting, development, and ripening. One-thousand and forty-six 5'-end sequences (expressed sequence tags; ESTs) were assembled into 832 non-redundant sequences, designated as "EST-unigenes". Of these 832 "EST-unigenes", 254 (approximately 30%) have no significant homology to sequences published so far for other plant species. Additionally, 168 "EST-unigenes" (approximately 20%) correspond to genes with unknown function, whereas 410 "EST-unigenes" (approximately 50%) correspond to genes with known function in other plant species. These "EST-unigenes" are mainly associated with metabolism, membrane transport, cytoskeleton synthesis and structure, cell wall formation and cell division, signal transduction, nucleic acid binding and transcription factors, defense and stress response, and secondary metabolism. This study provides the scientific community with novel genetic information for watermelon as well as an expanded pool of genes associated with fruit development in watermelon. These genes will be useful targets in future genetic and functional genomic studies of watermelon and its development.
The Teaching of Protein Synthesis--A Microcomputer Based Method.
ERIC Educational Resources Information Center
Goodridge, Frank
1983-01-01
Describes two computer programs (BASIC for 32K Commodore PET) for teaching protein synthesis. The first is an interactive test of base-pairing knowledge, and the second generates random DNA nucleotide sequences, with instructions for substitution, insertion, and deletion printed out for each student. (JN)
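The second program described above (random DNA sequences plus substitution, insertion and deletion exercises) can be approximated in a few lines. This is a modern Python sketch rather than the original Commodore PET BASIC, and the function names, sequence length and mutation position are assumptions chosen for illustration.

```python
import random

BASES = "ACGT"

def random_dna(length, rng):
    """Return a random DNA sequence of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))

def substitute(seq, pos, rng):
    """Replace the base at pos with a different, randomly chosen base."""
    alternatives = [b for b in BASES if b != seq[pos]]
    return seq[:pos] + rng.choice(alternatives) + seq[pos + 1:]

def insert(seq, pos, rng):
    """Insert one random base immediately before position pos."""
    return seq[:pos] + rng.choice(BASES) + seq[pos:]

def delete(seq, pos):
    """Remove the base at position pos."""
    return seq[:pos] + seq[pos + 1:]

rng = random.Random(1983)  # fixed seed so a worksheet can be regenerated
original = random_dna(24, rng)
print("original    ", original)
print("substitution", substitute(original, 5, rng))
print("insertion   ", insert(original, 5, rng))
print("deletion    ", delete(original, 5))
```

Seeding the generator differently for each student would give every student a distinct sequence while keeping each worksheet reproducible.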
Linkage mapping in a watermelon population segregating for fusarium wilt resistance
Leigh K. Hawkins; Fenny Dane; Thomas L. Kubisiak; Billy B. Rhodes; Robert L. Jarret
2001-01-01
Isozyme, randomly amplified polymorphic DNA (RAPD), and simple sequence repeats (SSR) markers were used to generate a linkage map in an F2 and F3 watermelon (Citrullus lanatus (Thumb.) Matsum. & Nakai) population derived from a cross between the fusarium wilt (Fusarium oxysporum f....
Short-Sequence DNA Repeats in Prokaryotic Genomes
van Belkum, Alex; Scherer, Stewart; van Alphen, Loek; Verbrugh, Henri
1998-01-01
Short-sequence DNA repeat (SSR) loci can be identified in all eukaryotic and many prokaryotic genomes. These loci harbor short or long stretches of repeated nucleotide sequence motifs. DNA sequence motifs in a single locus can be identical and/or heterogeneous. SSRs are encountered in many different branches of the prokaryote kingdom. They are found in genes encoding products as diverse as microbial surface components recognizing adhesive matrix molecules and specific bacterial virulence factors such as lipopolysaccharide-modifying enzymes or adhesins. SSRs enable genetic and consequently phenotypic flexibility. SSRs function at various levels of gene expression regulation. Variations in the number of repeat units per locus or changes in the nature of the individual repeat sequences may result from recombination processes or polymerase inadequacy such as slipped-strand mispairing (SSM), either alone or in combination with DNA repair deficiencies. These rather complex phenomena can occur with relative ease, with SSM approaching a frequency of 10⁻⁴ per bacterial cell division and allowing high-frequency genetic switching. Bacteria use this random strategy to adapt their genetic repertoire in response to selective environmental pressure. SSR-mediated variation has important implications for bacterial pathogenesis and evolutionary fitness. Molecular analysis of changes in SSRs allows epidemiological studies on the spread of pathogenic bacteria. The occurrence, evolution and function of SSRs, and the molecular methods used to analyze them are discussed in the context of responsiveness to environmental factors, bacterial pathogenicity, epidemiology, and the availability of full-genome sequences for increasing numbers of microorganisms, especially those that are medically relevant. PMID:9618442
The organization of repeating units in mitochondrial DNA from yeast petite mutants.
Bos, J L; Heyting, C; Van der Horst, G; Borst, P
1980-04-01
We have reinvestigated the linkage orientation of repeating units in mtDNAs of yeast ρ(-) petite mutants containing an inverted duplication. All five petite mtDNAs studied contain a continuous segment of wild-type mtDNA, part of which is duplicated and present in inverted form in the repeat. We show by restriction enzyme analysis that the non-duplicated segments between the inverted duplications are present in random orientation in all five petite mtDNAs. There is no segregation of sub-types with unique orientation. We attribute this to the high rate of intramolecular recombination between the inverted duplications. The results provide additional evidence for the high rate of recombination of yeast mtDNA even in haploid ρ(-) petite cells. We conclude that only two types of stable sequence organization exist in petite mtDNA: petites without an inverted duplication have repeats linked in straight head-to-tail arrangement (abcabc); petites with an inverted duplication have repeats in which the non-duplicated segments are present in random orientation.
Lin, Jinke; Kudrna, Dave; Wing, Rod A.
2011-01-01
We describe the construction and characterization of a publicly available BAC library for the tea plant, Camellia sinensis. Using modified methods, the library was constructed with the aim of developing public molecular resources to advance tea plant genomics research. The library consists of a total of 401,280 clones with an average insert size of 135 kb, providing an approximate coverage of 13.5 haploid genome equivalents. No empty vector clones were observed in a random sampling of 576 BAC clones. Further analysis of 182 BAC-end sequences from randomly selected clones revealed a GC content of 40.35% and low chloroplast and mitochondrial contamination. Repetitive sequence analyses indicated that LTR retrotransposons were the most predominant sequence class (86.93%–87.24%), followed by DNA retrotransposons (11.16%–11.69%). Additionally, we found 25 simple sequence repeats (SSRs) that could potentially be used as genetic markers. PMID:21234344
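The coverage figure quoted above follows from the usual relation coverage = (number of clones × mean insert size) / haploid genome size. The sketch below inverts that relation to show the genome size implied by the abstract's numbers; the genome size itself is not stated in the abstract, so the printed value is an inference, and the function names are assumptions. The last function uses the standard Clarke-Carbon expression for the chance that a given locus is present in the library.

```python
def library_coverage(n_clones, mean_insert_kb, genome_size_kb):
    """Genome equivalents represented by a clone library."""
    return n_clones * mean_insert_kb / genome_size_kb

def implied_genome_size_kb(n_clones, mean_insert_kb, coverage):
    """Haploid genome size implied by a stated coverage."""
    return n_clones * mean_insert_kb / coverage

def probability_locus_present(n_clones, mean_insert_kb, genome_size_kb):
    """Clarke-Carbon estimate: P = 1 - (1 - f)^N, with f = insert / genome."""
    fraction = mean_insert_kb / genome_size_kb
    return 1.0 - (1.0 - fraction) ** n_clones

# Figures taken from the abstract.
n_clones = 401_280
mean_insert_kb = 135
stated_coverage = 13.5

genome_kb = implied_genome_size_kb(n_clones, mean_insert_kb, stated_coverage)
print(f"implied haploid genome size: {genome_kb / 1e6:.2f} Gb")
print(f"coverage check: {library_coverage(n_clones, mean_insert_kb, genome_kb):.1f}x")
print(f"P(locus present): {probability_locus_present(n_clones, mean_insert_kb, genome_kb):.6f}")
```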
Initial Characterization of the Pf-Int Recombinase from the Malaria Parasite Plasmodium falciparum
Ghorbal, Mehdi; Scheidig-Benatar, Christine; Bouizem, Salma; Thomas, Christophe; Paisley, Genevieve; Faltermeier, Claire; Liu, Melanie; Scherf, Artur; Lopez-Rubio, Jose-Juan; Gopaul, Deshmukh N.
2012-01-01
Background Genetic variation is an essential means of evolution and adaptation in many organisms in response to environmental change. Certain DNA alterations can be carried out by site-specific recombinases (SSRs) that fall into two families: the serine and the tyrosine recombinases. SSRs are seldom found in eukaryotes. A gene homologous to a tyrosine site-specific recombinase has been identified in the genome of Plasmodium falciparum. The sequence is highly conserved among five other members of Plasmodia. Methodology/Principal Findings The predicted open reading frame encodes for a ∼57 kDa protein containing a C-terminal domain including the putative tyrosine recombinase conserved active site residues R-H-R-(H/W)-Y. The N-terminus has the typical alpha-helical bundle and potentially a mixed alpha-beta domain resembling that of λ-Int. Pf-Int mRNA is expressed differentially during the P. falciparum erythrocytic life stages, peaking in the schizont stage. Recombinant Pf-Int and affinity chromatography of DNA from genomic or synthetic origin were used to identify potential DNA targets after sequencing or micro-array hybridization. Interestingly, the sequences captured also included highly variable subtelomeric genes such as var, rif, and stevor sequences. Electrophoretic mobility shift assays with DNA were carried out to verify Pf-Int/DNA binding. Finally, Pf-Int knock-out parasites were created in order to investigate the biological role of Pf-Int. Conclusions/Significance Our data identify for the first time a malaria parasite gene with structural and functional features of recombinases. Pf-Int may bind to and alter DNA, either in a sequence specific or in a non-specific fashion, and may contribute to programmed or random DNA rearrangements. Pf-Int is the first molecular player identified with a potential role in genome plasticity in this pathogen. 
Finally, the Pf-Int knock-out parasite is viable, showing no detectable impact on blood-stage development, which is compatible with such a function. PMID:23056326
A simple method for the computation of first neighbour frequencies of DNAs from CD spectra
Marck, Christian; Guschlbauer, Wilhelm
1978-01-01
A procedure for the computation of the first neighbour frequencies of DNAs is presented. This procedure is based on the first neighbour approximation of Gray and Tinoco. We show that knowledge of all ten elementary CD signals attached to the ten double-stranded first neighbour configurations is not necessary. One can obtain the ten frequencies of an unknown DNA with the use of eight elementary CD signals corresponding to eight linearly independent polymer sequences. These signals can be extracted very simply from any eight or more CD spectra of double-stranded DNAs of known frequencies. The ten frequencies of a DNA are obtained by a least-squares fit of its CD spectrum with these elementary signals. One advantage of this procedure is that it does not require linear programming; it can be used with CD data digitized at a large number of wavelengths, thus permitting an accurate resolution of the CD spectra. In favorable cases, the ten frequencies of a DNA (not used as input data) can be determined with an average absolute error < 2%. We have also observed that certain satellite DNAs, those of Drosophila virilis and Callinectes sapidus, have CD spectra compatible with those of DNAs of quasi-random sequence; these satellite DNAs should also adopt the B-form in solution. PMID:673843
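The fitting step described above, expressing a measured spectrum as a weighted sum of eight elementary signals, is an ordinary linear least-squares problem. The sketch below shows only that step using NumPy. The Gaussian-shaped basis spectra, the wavelength grid, the noise level and the weights are synthetic placeholders, not real CD data, and the conversion of the eight fitted weights into ten first-neighbour frequencies is not specified in the abstract and is therefore left out.

```python
import numpy as np

rng = np.random.default_rng(0)
wavelengths = np.linspace(220.0, 320.0, 101)  # nm; hypothetical sampling grid

# Hypothetical elementary signals: eight Gaussian bands, one per column.
centers = np.linspace(235.0, 305.0, 8)
basis = np.exp(-((wavelengths[:, None] - centers[None, :]) / 8.0) ** 2)

# Synthetic "unknown" spectrum: known weights plus a little measurement noise.
true_weights = rng.dirichlet(np.ones(8))
spectrum = basis @ true_weights + rng.normal(0.0, 1e-4, wavelengths.size)

# Least-squares fit of the spectrum with the elementary signals.
fitted, residuals, rank, _ = np.linalg.lstsq(basis, spectrum, rcond=None)

print("rank of basis matrix:", rank)
print("largest weight error:", float(np.max(np.abs(fitted - true_weights))))
```

Because every wavelength contributes one equation, using more wavelengths only adds rows to the system, which is consistent with the abstract's remark that finely sampled spectra can be used directly.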
Liu, Wei-Guo; Liang, Cun-Zhen; Yang, Jin-Sheng; Wang, Gui-Ping; Liu, Miao-Miao
2013-02-01
The bacterial diversity in a biological desulfurization reactor operated continuously for 1 year was studied by 16S rDNA cloning and sequencing. Forty clones were randomly selected, and their partial 16S rDNA genes (ca. 1,400 bp) were sequenced and searched with BLAST. The results indicated that there were dominant bacteria in the biological desulfurization reactor: 33 clones belonged to 3 different published phyla, while 1 clone belonged to an unknown phylum. The dominant bacterial group in the system was Proteobacteria, which accounted for 85.3%. The bacterial community was composed as follows: gamma-Proteobacteria (55.9%), beta-Proteobacteria (17.6%), Actinobacteridae (8.8%), delta-Proteobacteria (5.9%), alpha-Proteobacteria (5.9%), and Sphingobacteria (2.9%). Halothiobacillus sp. ST15 and Thiobacillus sp. UAM-I were the major desulfurization strains.
Parvari, R; Ziv, E; Lentner, F; Tel-Or, S; Burstein, Y; Schechter, I
1987-01-01
cDNA libraries of chicken spleen and Harder gland (a gland enriched with immunocytes) constructed in pBR322 were screened by differential hybridization and by mRNA hybrid-selected translation. Eleven L-chain cDNA clones were identified from which VL probes were prepared and each was annealed with kidney DNA restriction digests. All VL probes revealed the same set of bands, corresponding to about 15 germline VL genes of one subgroup. The nucleotide sequences of six VL clones showed ≥85% homology, and the predicted amino acid sequences were identical or nearly identical to the major N-terminal sequence of L-chains in chicken serum. These findings, and the fact that the VL clones were randomly selected from normal lymphoid tissues, strongly indicate that the bulk of chicken L-chains is encoded by a few germline VL genes, probably far fewer than 15, since many of the VL genes are known to be pseudogenes. Therefore, it is likely that somatic mechanisms operating prior to specific triggering by antigen play a major role in the generation of antibody diversity in chicken. Analysis of the constant region locus (sequencing of the CL gene and cDNAs) demonstrates a single CL isotype and suggests the presence of CL allotypes.
Subtraction of cap-trapped full-length cDNA libraries to select rare transcripts.
Hirozane-Kishikawa, Tomoko; Shiraki, Toshiyuki; Waki, Kazunori; Nakamura, Mari; Arakawa, Takahiro; Kawai, Jun; Fagiolini, Michela; Hensch, Takao K; Hayashizaki, Yoshihide; Carninci, Piero
2003-09-01
The normalization and subtraction of highly expressed cDNAs from relatively large tissues before cloning dramatically enhanced the gene discovery by sequencing for the mouse full-length cDNA encyclopedia, but these methods have not been suitable for limited RNA materials. To normalize and subtract full-length cDNA libraries derived from limited quantities of total RNA, here we report a method to subtract plasmid libraries excised from size-unbiased amplified lambda phage cDNA libraries that avoids heavily biasing steps such as PCR and plasmid library amplification. The proportion of full-length cDNAs and the gene discovery rate are high, and library diversity can be validated by in silico randomization.
Choi, Sangdun; Chang, Mi Sook; Stuecker, Tara; Chung, Christine; Newcombe, David A.; Venkateswaran, Kasthuri
2012-01-01
In this study, fosmid cloning strategies were used to assess the microbial populations in water from the International Space Station (ISS) drinking water system (henceforth referred to as Prebiocide and Tank A water samples). The goals of this study were: to compare the sensitivity of the fosmid cloning strategy with that of traditional culture-based and 16S rRNA-based approaches and to detect the widest possible spectrum of microbial populations during the water purification process. Initially, microbes could not be cultivated, and conventional PCR failed to amplify 16S rDNA fragments from these low biomass samples. Therefore, randomly primed rolling-circle amplification was used to amplify any DNA that might be present in the samples, followed by size selection by using pulsed-field gel electrophoresis. The amplified high-molecular-weight DNA from both samples was cloned into fosmid vectors. Several hundred clones were randomly selected for sequencing, followed by Blastn/Blastx searches. Sequences encoding specific genes from Burkholderia, a species abundant in the soil and groundwater, were found in both samples. Bradyrhizobium and Mesorhizobium, which belong to rhizobia, a large community of nitrogen fixers often found in association with plant roots, were present in the Prebiocide samples. Ralstonia, which is prevalent in soils with a high heavy metal content, was detected in the Tank A samples. The detection of many unidentified sequences suggests the presence of potentially novel microbial fingerprints. The bacterial diversity detected in this pilot study using a fosmid vector approach was higher than that detected by conventional 16S rRNA gene sequencing. PMID:23346038
Wang, Pei; Lu, Yanli; Zheng, Mingmin; Rong, Tingzhao; Tang, Qilin
2011-01-01
Genetic relationship of a newly discovered teosinte from Nicaragua, Zea nicaraguensis with waterlogging tolerance, was determined based on randomly amplified polymorphic DNA (RAPD) markers and the internal transcribed spacer (ITS) sequences of nuclear ribosomal DNA using 14 accessions from Zea species. RAPD analysis showed that a total of 5,303 fragments were produced by 136 random decamer primers, of which 84.86% bands were polymorphic. RAPD-based UPGMA analysis demonstrated that the genus Zea can be divided into section Luxuriantes including Zea diploperennis, Zea luxurians, Zea perennis and Zea nicaraguensis, and section Zea including Zea mays ssp. mexicana, Zea mays ssp. parviglumis, Zea mays ssp. huehuetenangensis and Zea mays ssp. mays. ITS sequence analysis showed the lengths of the entire ITS region of the 14 taxa in Zea varied from 597 to 605 bp. The average GC content was 67.8%. In addition to the insertion/deletions, 78 variable sites were recorded in the total ITS region with 47 in ITS1, 5 in 5.8S, and 26 in ITS2. Sequences of these taxa were analyzed with neighbor-joining (NJ) and maximum parsimony (MP) methods to construct the phylogenetic trees, selecting Tripsacum dactyloides L. as the outgroup. The phylogenetic relationships of Zea species inferred from the ITS sequences are highly concordant with the RAPD evidence that resolved two major subgenus clades. Both RAPD and ITS sequence analyses indicate that Zea nicaraguensis is more closely related to Zea luxurians than the other teosintes and cultivated maize, which should be regarded as a section Luxuriantes species. PMID:21525982
MADS-box genes in maize: Frequent targets of selection during domestication
USDA-ARS's Scientific Manuscript database
MADS-box genes encode transcription factors that are key regulators of plant inflorescence and flower development. We examined DNA sequence variation in 32 maize MADS-box genes and 32 random loci from the maize genome and investigated their involvement in maize domestication and improvement. Using n...
Taylor, Maria Lucia; Chávez-Tapia, Catalina B; Rojas-Martínez, Alberto; del Rocio Reyes-Montes, Maria; del Valle, Mirian Bobadilla; Zúñiga, Gerardo
2005-09-01
Fourteen Histoplasma capsulatum isolates recovered from infected bats captured in Mexican caves and two human H. capsulatum reference strains were analyzed using random amplification of polymorphic DNA (RAPD) PCR and partial DNA sequences of four genes. Cluster analysis of RAPD patterns revealed differences for two H. capsulatum isolates from one migratory bat species, Tadarida brasiliensis. Three groups were identified by distance and maximum-parsimony analyses of the arf, H-anti, ole, and tub1 H. capsulatum genes. Group I included most isolates from infected bats and one clinical strain from central Mexico; group II included the two isolates from T. brasiliensis; the human G-217B reference strain from the USA formed an independent group III. Isolates from group II showed diversity in relation to groups I and III, suggesting a different H. capsulatum population.
Gonçalves, R B; Väisänen, M L; Van Steenbergen, T J; Sundqvist, G; Mouton, C
1999-01-01
Genomic fingerprints from the DNA of 27 strains of Porphyromonas endodontalis from diverse clinical and geographic origins were generated as random amplified polymorphic DNA (RAPD) using the technique of PCR amplification with a single primer of arbitrary sequence. Cluster analysis of the combined RAPD data obtained with three selected 9- or 10-mer-long primers identified 25 distinct RAPD types which clustered as three main groups identifying three genogroups. Genogroups I and II included exclusively P. endodontalis isolates of oral origin, while 7/9 human intestinal strains of genogroup III which linked at a similarity level of 52% constituted the most homogeneous group in our study. Genotypic diversity within P. endodontalis, as shown by RAPD analysis, suggests that the taxon is composed of two oral genogroups and one intestinal genogroup. This hypothesis remains to be confirmed.
NASA Astrophysics Data System (ADS)
Yu, Jianzhong; Ma, Xiaolei; Pan, Kehou; Yang, Guanpin; Yu, Wengong
2010-07-01
We constructed and characterized a normalized cDNA library of Nannochloropsis oculata CS-179, and obtained 905 nonredundant sequences (NRSs) ranging from 431 to 1,756 bp in length. Among them, 496 were very similar to nonredundant sequences in GenBank (E ≤ 1.0e-05), and 349 ESTs had significant hits with the clusters of eukaryotic orthologous groups (KOG). Bases G and/or C at the third position of codons of 14 amino acid residues suggested a strong bias in the conserved domain of 362 NRSs (>60%). We also identified the unigenes encoding phosphorus and nitrogen transporters, suggesting that N. oculata could efficiently transport and metabolize phosphorus and nitrogen, and recognized the unigenes that are involved in the biosynthesis and storage of both fatty acids and polyunsaturated fatty acids (PUFAs), which will facilitate the demonstration of the eicosapentaenoic acid (EPA) biosynthesis pathway of N. oculata. In comparison with the original cDNA library, the normalized library significantly increased the efficiency of random sequencing and of discovering rarely expressed genes, and decreased the frequency of abundant gene sequences.
Within-Genome Evolution of REPINs: a New Family of Miniature Mobile DNA in Bacteria
Bertels, Frederic; Rainey, Paul B.
2011-01-01
Repetitive sequences are a conserved feature of many bacterial genomes. While first reported almost thirty years ago, and frequently exploited for genotyping purposes, little is known about their origin, maintenance, or processes affecting the dynamics of within-genome evolution. Here, beginning with analysis of the diversity and abundance of short oligonucleotide sequences in the genome of Pseudomonas fluorescens SBW25, we show that over-represented short sequences define three distinct groups (GI, GII, and GIII) of repetitive extragenic palindromic (REP) sequences. Patterns of REP distribution suggest that closely linked REP sequences form a functional replicative unit: REP doublets are over-represented, randomly distributed in extragenic space, and more highly conserved than singlets. In addition, doublets are organized as inverted repeats, which together with intervening spacer sequences are predicted to form hairpin structures in ssDNA or mRNA. We refer to these newly defined entities as REPINs (REP doublets forming hairpins) and identify short reads from population sequencing that reveal putative transposition intermediates. The proximal relationship between GI, GII, and GIII REPINs and specific REP-associated tyrosine transposases (RAYTs), combined with features of the putative transposition intermediate, suggests a mechanism for within-genome dissemination. Analysis of the distribution of REPs in a range of RAYT–containing bacterial genomes, including Escherichia coli K-12 and Nostoc punctiforme, show that REPINs are a widely distributed, but hitherto unrecognized, family of miniature non-autonomous mobile DNA. PMID:21698139
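The starting point described above, finding short oligonucleotides that occur far more often than chance would predict, can be sketched as a k-mer count compared against a simple null model. The null model used here (bases drawn independently at their observed frequencies), the fold threshold, the planted motif and the toy genome are all assumptions for illustration and are not taken from the study.

```python
import random
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def overrepresented_kmers(seq, k, fold):
    """Return k-mers observed at least `fold` times more often than expected
    if bases were drawn independently at their observed frequencies."""
    counts = kmer_counts(seq, k)
    n_windows = len(seq) - k + 1
    base_freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    hits = {}
    for kmer, observed in counts.items():
        expected = float(n_windows)
        for base in kmer:
            expected *= base_freq[base]
        if observed >= fold * expected:
            hits[kmer] = (observed, expected)
    return hits

# Toy genome: random background with a hypothetical 8-mer planted 60 times.
rng = random.Random(25)
motif = "GGCATTCG"
chunks = []
for _ in range(60):
    chunks.append("".join(rng.choice("ACGT") for _ in range(300)))
    chunks.append(motif)
genome = "".join(chunks)

hits = overrepresented_kmers(genome, 8, fold=100)
for kmer, (observed, expected) in hits.items():
    print(kmer, observed, round(expected, 2))
```

A real analysis would also need to treat a sequence and its reverse complement together and to use a null model that reflects local base composition; both are omitted here for brevity.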
NASA Astrophysics Data System (ADS)
Acquisti, Claudia; Allegrini, Paolo; Bogani, Patrizia; Buiatti, Marcello; Catanese, Elena; Fronzoni, Leone; Grigolini, Paolo; Mersi, Giuseppe; Palatella, Luigi
2004-04-01
We investigate a possible way to connect the presence of Low-Complexity Sequences (LCS) in DNA genomes with the nonstationary properties of base correlations. Under the hypothesis that these variations signal a change in DNA function, we use a new technique, called the Non-Stationarity Entropic Index (NSEI) method, and we prove that this technique is an efficient way to detect functional changes with respect to a random baseline. The remarkable aspect is that NSEI does not require any training data or fitting parameters, the only arbitrariness being the choice of a marker in the sequence. We make this choice on the basis of biological information about LCS distributions in genomes. We show that there exists a correlation between changes in the amount of LCS and the ratio of long- to short-range correlation.
The nucleoid protein Dps binds genomic DNA of Escherichia coli in a non-random manner
Kondrashov, F. A.; Toshchakov, S. V.; Dominova, I.; Shvyreva, U. S.; Vrublevskaya, V. V.; Morenkov, O. S.; Panyukov, V. V.
2017-01-01
Dps is a multifunctional homododecameric protein that oxidizes Fe2+ ions, accumulating them in the form of Fe2O3 within its protein cavity, interacts with DNA, tightly condensing the bacterial nucleoid upon starvation, and performs some other functions. In the two decades since the discovery of this protein, its ferroxidase activity has become rather well studied, but the mechanism of Dps interaction with DNA still remains enigmatic. The crucial role of lysine residues in the unstructured N-terminal tails led to the conventional point of view that Dps binds DNA without sequence or structural specificity. However, deletion of dps changed the profile of proteins in starved cells, a SELEX screen revealed genomic regions preferentially bound in vitro, and a certain affinity of Dps for artificial branched molecules was detected by atomic force microscopy. Here we report a non-random distribution of Dps binding sites across the bacterial chromosome in exponentially growing cells and show their enrichment with inverted repeats prone to form secondary structures. We found that the Dps-bound regions overlap with sites occupied by other nucleoid proteins, and contain overrepresented motifs typical for their consensus sequences. Of the two types of genomic domains with extensive protein occupancy, which can be highly expressed or transcriptionally silent, only those that are enriched with RNA polymerase molecules were preferentially occupied by Dps. In the dps-null mutant we, therefore, observed a differentially altered expression of several targeted genes and found suppressed transcription from the dps promoter. In most cases this can be explained by the relieved interference with Dps for nucleoid proteins exploiting sequence-specific modes of DNA binding. 
Thus, protecting bacterial cells from different stresses during exponential growth, Dps can modulate transcriptional integrity of the bacterial chromosome hampering RNA biosynthesis from some genes via competition with RNA polymerase or, vice versa, competing with inhibitors to activate transcription. PMID:28800583
Kim, Dong Hyun; Patnaik, Bharat Bhusan; Seo, Gi Won; Kang, Seong Min; Lee, Yong Seok; Lee, Bok Luel; Han, Yeon Soo
2013-11-01
We have identified novel ricin-type (R-type) lectin by sequencing of random clones from cDNA library of the coleopteran beetle, Tenebrio molitor. The cDNA sequence is comprised of 495 bp encoding a protein of 164 amino acid residues and shows 49% identity with galectin of Tribolium castaneum. Bioinformatics analysis shows that the amino acid residues from 35 to 162 belong to ricin-type beta-trefoil structure. The transcript was significantly upregulated after early hours of injection with peptidoglycans derived from Gram (+) and Gram (-) bacteria, beta-1, 3 glucan from fungi and an intracellular pathogen, Listeria monocytogenes suggesting putative function in innate immunity. Copyright © 2013 Elsevier Inc. All rights reserved.
Bose, Jeffrey L
2016-01-01
The ability to create mutations is an important step towards understanding bacterial physiology and virulence. While targeted approaches are invaluable, the ability to produce genome-wide random mutations can lead to crucial discoveries. Transposon mutagenesis is a useful approach, but many interesting mutations can be missed by these insertions that interrupt coding and noncoding sequences due to the integration of an entire transposon. Chemical mutagenesis and UV-based random mutagenesis are alternate approaches to isolate mutations of interest with the potential of only single nucleotide changes. Once a standard method, difficulty in identifying mutation sites had decreased the popularity of this technique. However, thanks to the recent emergence of economical whole-genome sequencing, this approach to making mutations can once again become a viable option. Therefore, this chapter provides an overview protocol for random mutagenesis using UV light or DNA-damaging chemicals.
Templated sequence insertion polymorphisms in the human genome
NASA Astrophysics Data System (ADS)
Onozawa, Masahiro; Aplan, Peter
2016-11-01
Templated Sequence Insertion Polymorphism (TSIP) is a recently described form of polymorphism recognized in the human genome, in which a sequence that is templated from a distant genomic region is inserted into the genome, seemingly at random. TSIPs can be grouped into two classes based on nucleotide sequence features at the insertion junctions; Class 1 TSIPs show features of insertions that are mediated via the LINE-1 ORF2 protein, including 1) target-site duplication (TSD), 2) polyadenylation 10-30 nucleotides downstream of a “cryptic” polyadenylation signal, and 3) preference for insertion at a 5’-TTTT/A-3’ sequence. In contrast, class 2 TSIPs show features consistent with repair of a DNA double-strand break via insertion of a DNA “patch” that is derived from a distant genomic region. Survey of a large number of normal human volunteers demonstrates that most individuals have 25-30 TSIPs, and that these TSIPs track with specific geographic regions. Similar to other forms of human polymorphism, we suspect that these TSIPs may be important for the generation of human diversity and genetic diseases.
Tian, Yang-jie; Yang, Hong; Wu, Xiu-juan; Li, Dao-tang
2005-01-01
Seashore landfill aquifers are environments of special physicochemical conditions (high organic load and high salinity), and microbes in leachate-polluted aquifers play a significant role for intrinsic bioremediation. In order to characterize microbial diversity and look for clues on the relationship between microbial community structure and hydrochemistry, a culture-independent examination of a typical groundwater sample obtained from a seashore landfill was conducted by sequence analysis of 16S rDNA clone library. Two sets of universal 16S rDNA primers were used to amplify DNA extracted from the groundwater so that problems arising from primer efficiency and specificity could be reduced. Of 74 clones randomly selected from the libraries, 30 contained unique sequences whose analysis showed that the majority of them belonged to bacteria (95.9%), with Proteobacteria (63.5%) being the dominant division. One archaeal sequence and one eukaryotic sequence were found as well. Bacterial sequences belonging to the following phylogenic groups were identified: Bacteroidetes (20.3%), β, γ, δ and ε-subdivisions of Proteobacteria (47.3%, 9.5%, 5.4% and 1.3%, respectively), Firmicutes (1.4%), Actinobacteria (2.7%), Cyanobacteria (2.7%). The percentages of Proteobacteria and Bacteroides in seawater were greater than those in the groundwater from a non-seashore landfill, indicating a possible influence of seawater. Quite a few sequences had close relatives in marine or hypersaline environments. Many sequences showed affiliations with microbes involved in anaerobic fermentation. The remarkable abundance of sequences related to (per)chlorate-reducing bacteria (ClRB) in the groundwater was significant and worthy of further study. PMID:15682499
Speranskaya, Anna S; Krinitsina, Anastasia A; Kudryavtseva, Anna V; Poltronieri, Palmiro; Santino, Angelo; Oparina, Nina Y; Dmitriev, Alexey A; Belenikin, Maxim S; Guseva, Marina A; Shevelev, Alexei B
2012-08-01
The group of Kunitz-type protease inhibitors (KPI) from potato is encoded by a polymorphic family of multiple allelic and non-allelic genes. The previous explanations of the KPI variability were based on the hypothesis of random mutagenesis as a key factor of KPI polymorphism. KPI-A genes from the genomes of Solanum tuberosum cv. Istrinskii and the wild species Solanum palustre were amplified by PCR with subsequent cloning in plasmids. True KPI sequences were derived from comparison of the cloned copies. "Hot spots" of recombination in KPI genes were independently identified by DnaSP 4.0 and TOPALi v2.5 software. The KPI-A sequence from potato cv. Istrinskii was found to be 100% identical to the gene from Solanum nigrum. This fact illustrates a high degree of similarity of KPI genes in the genus Solanum. Pairwise comparison of KPI A and B genes unambiguously showed a non-uniform extent of polymorphism at different nt positions. Moreover, the occurrence of substitutions was not random along the strand. Taken together, these facts contradict the traditional hypothesis of random mutagenesis as a principal source of KPI gene polymorphism. The experimentally found mosaic structure of KPI genes in both plants studied is consistent with the hypothesis suggesting recombination of ancestral genes. The same mechanism was proposed earlier for other resistance-conferring genes in the nightshade family (Solanaceae). Based on the data obtained, we searched for potential motifs of site-specific binding with plant DNA recombinases. During this work, we analyzed the sequencing data reported by the Potato Genome Sequencing Consortium (PGSC), 2011 and found considerable inconsistence of their data concerning the number, location, and orientation of KPI genes of groups A and B. The key role of recombination rather than random point mutagenesis in KPI polymorphism was demonstrated for the first time. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
Lau, Billy T; Ji, Hanlee P
2017-09-21
RNA-Seq measures gene expression by counting sequence reads belonging to unique cDNA fragments. Molecular barcodes commonly in the form of random nucleotides were recently introduced to improve gene expression measures by detecting amplification duplicates, but are susceptible to errors generated during PCR and sequencing. This results in false positive counts, leading to inaccurate transcriptome quantification especially at low input and single-cell RNA amounts where the total number of molecules present is minuscule. To address this issue, we demonstrated the systematic identification of molecular species using transposable error-correcting barcodes that are exponentially expanded to tens of billions of unique labels. We experimentally showed random-mer molecular barcodes suffer from substantial and persistent errors that are difficult to resolve. To assess our method's performance, we applied it to the analysis of known reference RNA standards. By including an inline random-mer molecular barcode, we systematically characterized the presence of sequence errors in random-mer molecular barcodes. We observed that such errors are extensive and become more dominant at low input amounts. We described the first study to use transposable molecular barcodes and its use for studying random-mer molecular barcode errors. Extensive errors found in random-mer molecular barcodes may warrant the use of error correcting barcodes for transcriptome analysis as input amounts decrease.
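The effect of barcode errors on molecule counts can be illustrated with a small sketch. This is not the transposable error-correcting barcode scheme described in the abstract; it is a naive Hamming-distance collapse of random-mer barcodes, and the function names, the toy reads, and the distance threshold of 1 are all assumptions chosen for illustration.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def collapse_barcodes(barcodes, max_dist=1):
    """Greedily merge each barcode into a more abundant one within max_dist.

    Returns a dict mapping each retained barcode to its merged read count.
    """
    counts = Counter(barcodes)
    # Visit barcodes from most to least abundant; ties are broken alphabetically.
    ordered = sorted(counts, key=lambda bc: (-counts[bc], bc))
    merged = {}
    for bc in ordered:
        parent = next((p for p in merged if hamming(p, bc) <= max_dist), None)
        if parent is None:
            merged[bc] = counts[bc]      # treat as a distinct molecule
        else:
            merged[parent] += counts[bc]  # treat as an error copy of parent
    return merged


# Toy data (hypothetical): two true molecules, each with one erroneous copy.
reads = ["ACGT"] * 5 + ["ACGA"] + ["TTTT"] * 3 + ["TTTA"]
naive_count = len(set(reads))        # counts every distinct barcode
collapsed = collapse_barcodes(reads)  # merges near-identical barcodes
```

In this toy input the naive count is 4 while the collapsed count is 2. A greedy collapse like this cannot tell a real low-abundance molecule from an error copy, which is one reason designed error-correcting barcodes may be preferable at low input amounts.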
Replication Protein A-1 Has a Preference for the Telomeric G-rich Sequence in Trypanosoma cruzi.
Pavani, Raphael Souza; Vitarelli, Marcela O; Fernandes, Carlos A H; Mattioli, Fabio F; Morone, Mariana; Menezes, Milene C; Fontes, Marcos R M; Cano, Maria Isabel N; Elias, Maria Carolina
2018-05-01
Replication protein A (RPA), the major eukaryotic single-stranded binding protein, is a heterotrimeric complex formed by RPA-1, RPA-2, and RPA-3. RPA is a fundamental player in replication, repair, recombination, and checkpoint signaling. In addition, increasing evidence has added functions for RPA in telomere maintenance, such as interaction with telomerase to facilitate its activity and involvement in telomere capping under some conditions. Trypanosoma cruzi, the etiological agent of Chagas disease, is a protozoan parasite that appeared early in the evolution of eukaryotes. Recently, we have shown that T. cruzi RPA has canonical functions, being involved in DNA replication and the DNA damage response. Here, we found by FISH/IF assays that T. cruzi RPA localizes at telomeres even outside replication (S) phase. In vitro analysis showed that one telomeric repeat is sufficient to bind RPA-1. Telomeric DNA induces different secondary structural modifications in RPA-1 in comparison with other types of DNA. In addition, RPA-1 has a higher affinity for the telomeric sequence than for a random sequence, suggesting that RPA may play specific roles in the T. cruzi telomeric region. © 2017 The Author(s) Journal of Eukaryotic Microbiology © 2017 International Society of Protistologists.
Amarger, V; Mercier, L
1995-01-01
We have applied the recently developed technique of random amplified polymorphic DNA (RAPD) for the discrimination between two jojoba clones at the genomic level. Among a set of 30 primers tested, a simple reproducible pattern with three distinct fragments for clone D and two distinct fragments for clone E was obtained with primer OPB08. Since RAPD products are the results of arbitrarily priming events and because a given primer can amplify a number of non-homologous sequences, we wondered whether or not RAPD bands, even those of similar size, were derived from different loci in the two clones. To answer this question, two complementary approaches were used: i) cloning and sequencing of the amplification products from clone E; and ii) complementary Southern analysis of RAPD gels using cloned or amplified fragments (directly recovered from agarose gels) as RFLP probes. The data reported here show that the RAPD reaction generates multiple amplified fragments. Some fragments, although resolved as a single band on agarose gels, contain different DNA species of the same size. Furthermore, it appears that the cloned RAPD products of known sequence that do not target repetitive DNA can be used as hybridization probes in RFLP to detect a polymorphism among individuals.
Quantification of DNA cleavage specificity in Hi-C experiments.
Meluzzi, Dario; Arya, Gaurav
2016-01-08
Hi-C experiments produce large numbers of DNA sequence read pairs that are typically analyzed to deduce genomewide interactions between arbitrary loci. A key step in these experiments is the cleavage of cross-linked chromatin with a restriction endonuclease. Although this cleavage should happen specifically at the enzyme's recognition sequence, an unknown proportion of cleavage events may involve other sequences, owing to the enzyme's star activity or to random DNA breakage. A quantitative estimation of these non-specific cleavages may enable simulating realistic Hi-C read pairs for validation of downstream analyses, monitoring the reproducibility of experimental conditions and investigating biophysical properties that correlate with DNA cleavage patterns. Here we describe a computational method for analyzing Hi-C read pairs to estimate the fractions of cleavages at different possible targets. The method relies on expressing an observed local target distribution downstream of aligned reads as a linear combination of known conditional local target distributions. We validated this method using Hi-C read pairs obtained by computer simulation. Application of the method to experimental Hi-C datasets from murine cells revealed interesting similarities and differences in patterns of cleavage across the various experiments considered. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
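The idea of expressing an observed distribution as a linear combination of known component distributions can be sketched for the simplest case of two components. The code below is an illustrative assumption, not the authors' implementation: it fits a single mixing fraction f by ordinary least squares, and the function name and toy distributions are hypothetical.

```python
def estimate_specific_fraction(observed, specific, nonspecific):
    """Least-squares estimate of f in observed ~ f*specific + (1-f)*nonspecific."""
    if not (len(observed) == len(specific) == len(nonspecific)):
        raise ValueError("all distributions must have the same number of bins")
    # Rewrite the model as (observed - nonspecific) ~ f * (specific - nonspecific).
    diff = [s - n for s, n in zip(specific, nonspecific)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("component distributions are identical")
    numer = sum((o - n) * d for o, n, d in zip(observed, nonspecific, diff))
    # Clamp to [0, 1] so the result can be read as a fraction.
    return min(1.0, max(0.0, numer / denom))


# Toy distributions over four distance bins (made-up numbers).
specific = [0.7, 0.2, 0.05, 0.05]       # cleavage at the recognition site
nonspecific = [0.25, 0.25, 0.25, 0.25]  # random breakage
true_f = 0.8
observed = [true_f * s + (1 - true_f) * n for s, n in zip(specific, nonspecific)]
estimate = estimate_specific_fraction(observed, specific, nonspecific)
```

With noise-free toy data the estimate recovers the mixing fraction of 0.8. With more than two candidate targets the same idea would call for a constrained multi-component fit, for example non-negative least squares.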
Siqueira, Juliana D; Ng, Terry F; Miller, Melissa; Li, Linlin; Deng, Xutao; Dodd, Erin; Batac, Francesca; Delwart, Eric
2017-07-01
Over the past century, the southern sea otter (SSO; Enhydra lutris nereis) population has been slowly recovering from near extinction due to overharvest. The SSO is a threatened subspecies under federal law and a fully protected species under California law, US. Through a multiagency collaborative program, stranded animals are rehabilitated and released, while deceased animals are necropsied and tissues are cryopreserved to facilitate scientific study. Here, we processed archival tissues to enrich particle-associated viral nucleic acids, which we randomly amplified and deeply sequenced to identify viral genomes through sequence similarities. Anelloviruses and endogenous retroviral sequences made up over 50% of observed viral sequences. Polyomavirus, parvovirus, and adenovirus sequences made up most of the remaining reads. We characterized and phylogenetically analyzed the full genome of sea otter polyomavirus 1 and the complete coding sequence of sea otter parvovirus 1 and found that the closest known viruses infect primates and domestic pigs ( Sus scrofa domesticus), respectively. We tested archived tissues from 69 stranded SSO necropsied over 14 yr (2000-13) by PCR. Polyomavirus, parvovirus, and adenovirus infections were detected in 51, 61, and 29% of examined animals, respectively, with no significant increase in frequency over time, suggesting endemic infection. We found that 80% of tested SSO were infected with at least one of the three DNA viruses, whose tissue distribution we determined in 261 tissue samples. Parvovirus DNA was most frequently detected in mesenteric lymph node, polyomavirus DNA in spleen, and adenovirus DNA in multiple tissues (spleen, retropharyngeal and mesenteric lymph node, lung, and liver). This study describes the virome in tissues of a threatened species and shows that stranded SSO are frequently infected with multiple viruses, warranting future research to investigate associations between these infections and observed lesions.
Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction
Laehnemann, David; Borkhardt, Arndt
2016-01-01
Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159
2010-01-01
Background Little genomic or transcriptomic information on Ganoderma lucidum (Lingzhi) is known. This study aims to discover the transcripts involved in secondary metabolite biosynthesis and developmental regulation of G. lucidum using an expressed sequence tag (EST) library. Methods A cDNA library was constructed from the G. lucidum fruiting body. Its high-quality ESTs were assembled into unique sequences with contigs and singletons. The unique sequences were annotated according to sequence similarities to genes or proteins available in public databases. The detection of simple sequence repeats (SSRs) was performed by online analysis. Results A total of 1,023 clones were randomly selected from the G. lucidum library and sequenced, yielding 879 high-quality ESTs. These ESTs showed similarities to a diverse range of genes. The sequences encoding squalene epoxidase (SE) and farnesyl-diphosphate synthase (FPS) were identified in this EST collection. Several candidate genes, such as hydrophobin, MOB2, profilin and PHO84, were detected for the first time in G. lucidum. Thirteen (13) potential SSR-motif microsatellite loci were also identified. Conclusion The present study demonstrates a successful application of EST analysis in the discovery of transcripts involved in the secondary metabolite biosynthesis and the developmental regulation of G. lucidum. PMID:20230644
Song, Wen Jun; Qin, Qi Wei; Qiu, Jin; Huang, Can Hua; Wang, Fan; Hew, Choy Leong
2004-01-01
Here we report the complete genome sequence of Singapore grouper iridovirus (SGIV). Sequencing of the random shotgun and restriction endonuclease genomic libraries showed that the entire SGIV genome consists of 140,131 nucleotide bp. One hundred sixty-two open reading frames (ORFs) from the sense and antisense DNA strands, coding for lengths varying from 41 to 1,268 amino acids, were identified. Computer-assisted analyses of the deduced amino acid sequences revealed that 77 of the ORFs exhibited homologies to known virus genes, 23 of which matched functional iridovirus proteins. Forty-two putative conserved domains or signatures were detected in the National Center for Biotechnology Information CD-Search database and PROSITE database. An assortment of enzyme activities involved in DNA replication, transcription, nucleotide metabolism, cell signaling, etc., were identified. Viruses were cultured on a cell line derived from the embryonated egg of the grouper Epinephelus tauvina, isolated, and purified by sucrose gradient ultracentrifugation. The protein extract from the purified virions was analyzed by polyacrylamide gel electrophoresis followed by in-gel digestion of protein bands. Matrix-assisted laser desorption ionization-time of flight mass spectrometry and database searching led to identification of 26 proteins. Twenty of these represented novel or previously unidentified genes, which were further confirmed by reverse transcription-PCR (RT-PCR) and DNA sequencing of their respective RT-PCR products. PMID:15507645
Evaluation of microbial community in hydrothermal field by direct DNA sequencing
NASA Astrophysics Data System (ADS)
Kawarabayasi, Y.; Maruyama, A.
2002-12-01
Many extremophiles have been discovered in terrestrial and marine hydrothermal fields. Some thermophiles can grow beyond 90°C in culture, while direct microscopic analysis occasionally indicates that microbes may survive in much hotter hydrothermal fluids. However, it is very difficult to isolate and cultivate such microbes from these environments; over 99% of all microbes remain undiscovered. Based on our experience with entire microbial genome analysis (Y.K.) and microbial community analysis (A.M.), we set out to find unique microbes and genes in hydrothermal fields through direct sequencing of environmental DNA fragments. First, shotgun plasmid libraries were constructed directly from DNA prepared from mixed microbes collected by an in situ filtration system from low-temperature fluids at RM24 in the Southern East Pacific Rise (S-EPR). Gene amplification (PCR) was not used, to prevent mutations from being introduced in the process. The nucleotide sequences of 285 clones showed that none had an identical sequence in public databases. Among the 27 clones whose entire sequences were determined, no ORF was identified on 14 clones, as is the case for introns in eukaryotes. On four clones, multiple tandem repeats of tetranucleotide units were identified. This type of sequence has been identified in some familial diseases in humans. The result indicates that living or dead material with eukaryotic features may exist in this low-temperature field. Second, shotgun plasmid libraries were constructed from environmental DNA prepared from Beppu hot springs. Among 143 randomly selected clones used for sequencing, no known sequence was identified. Unlike the clones in the S-EPR library, clear ORFs were identified on all nine clones whose entire sequences were determined. One clone, H4052, was found to contain the complete aspartyl-tRNA synthetase gene. Phylogenetic analysis using the amino acid sequences of this gene indicated that it separated from those of other Euryarchaea before the differentiation of species. Thus, some novel archaeal species are expected to be present in this field. The present direct cloning and sequencing technique is now opening a window onto a new world in hydrothermal microbial community analysis.
Roden, Suzanne E; Dutton, Peter H; Morin, Phillip A
2009-01-01
The green sea turtle, Chelonia mydas, was used as a case study for single nucleotide polymorphism (SNP) discovery in a species that has little genetic sequence information available. As green turtles have a complex population structure, additional nuclear markers other than microsatellites could add to our understanding of their complex life history. Amplified fragment length polymorphism technique was used to generate sets of random fragments of genomic DNA, which were then electrophoretically separated with precast gels, stained with SYBR green, excised, and directly sequenced. It was possible to perform this method without the use of polyacrylamide gels, radioactive or fluorescent labeled primers, or hybridization methods, reducing the time, expense, and safety hazards of SNP discovery. Within 13 loci, 2547 base pairs were screened, resulting in the discovery of 35 SNPs. Using this method, it was possible to yield a sufficient number of loci to screen for SNP markers without the availability of prior sequence information.
SCAR marker specific to detect Magnaporthe grisea infecting finger millets (Eleusine coracana).
Gnanasing Jesumaharaja, L; Manikandan, R; Raguchander, T
2016-09-01
To determine the molecular variability and develop a specific Sequence Characterized Amplified Region (SCAR) marker for the detection of Magnaporthe grisea causing blast disease in finger millet. Random amplified polymorphic DNA (RAPD) analysis was performed with 14 isolates of M. grisea using 20 random primers. A SCAR marker was developed for accurate and specific detection of M. grisea infecting only finger millets. Genetic similarity within each group and variation between the groups were observed. Among the primers, OPF-08 generated a polymorphic RAPD profile that showed a common fragment of 478 bp in all the isolates. This fragment was cloned and sequenced. SCAR primers, Mg-SCAR-FP and Mg-SCAR-RP, were designed using the sequence of the cloned product. The specificity of the SCAR primers was evaluated using purified DNA from M. grisea isolates from finger millets and from other pathogens, viz., Pyricularia oryzae, Colletotrichum gloeosporioides, Colletotrichum falcatum and Colletotrichum capsici, infecting different crops. The SCAR primers amplified only a specific 460 bp fragment from DNA of M. grisea isolates, and this fragment was not amplified in the other pathogens tested. The SCAR primers distinguish the blast pathogen of finger millet from that of rice, as there is no amplification in the rice blast pathogen. The PCR-based SCAR marker is a convenient tool for specific and rapid detection of M. grisea in finger millets. Genetic diversity in the fungal population helps in developing a suitable SCAR marker to identify the blast pathogen at an early stage of infection. © 2016 The Society for Applied Microbiology.
Stepanchick, Ann; Zhi, Huijun; Cavanaugh, Alice H; Rothblum, Katrina; Schneider, David A; Rothblum, Lawrence I
2013-03-29
The human homologue of yeast Rrn3 is an RNA polymerase I-associated transcription factor that is essential for ribosomal DNA (rDNA) transcription. The generally accepted model is that Rrn3 functions as a bridge between RNA polymerase I and the transcription factors bound to the committed template. In this model Rrn3 would mediate an interaction between the mammalian Rrn3-polymerase I complex and SL1, the rDNA transcription factor that binds to the core promoter element of the rDNA. In the course of studying the role of Rrn3 in recruitment, we found that Rrn3 was in fact a DNA-binding protein. Analysis of the sequence of Rrn3 identified a domain with sequence similarity to the DNA binding domain of heat shock transcription factor 2. Randomization, or deletion, of the amino acids in this region in Rrn3, amino acids 382-400, abrogated its ability to bind DNA, indicating that this domain was an important contributor to DNA binding by Rrn3. Control experiments demonstrated that these mutant Rrn3 constructs were capable of interacting with both rpa43 and SL1, two other activities demonstrated to be essential for Rrn3 function. However, neither of these Rrn3 mutants was capable of functioning in transcription in vitro. Moreover, although wild-type human Rrn3 complemented a yeast rrn3-ts mutant, the DNA-binding site mutant did not. These results demonstrate that DNA binding by Rrn3 is essential for transcription by RNA polymerase I.
Stepanchick, Ann; Zhi, Huijun; Cavanaugh, Alice H.; Rothblum, Katrina; Schneider, David A.; Rothblum, Lawrence I.
2013-01-01
The human homologue of yeast Rrn3 is an RNA polymerase I-associated transcription factor that is essential for ribosomal DNA (rDNA) transcription. The generally accepted model is that Rrn3 functions as a bridge between RNA polymerase I and the transcription factors bound to the committed template. In this model Rrn3 would mediate an interaction between the mammalian Rrn3-polymerase I complex and SL1, the rDNA transcription factor that binds to the core promoter element of the rDNA. In the course of studying the role of Rrn3 in recruitment, we found that Rrn3 was in fact a DNA-binding protein. Analysis of the sequence of Rrn3 identified a domain with sequence similarity to the DNA binding domain of heat shock transcription factor 2. Randomization, or deletion, of the amino acids in this region in Rrn3, amino acids 382–400, abrogated its ability to bind DNA, indicating that this domain was an important contributor to DNA binding by Rrn3. Control experiments demonstrated that these mutant Rrn3 constructs were capable of interacting with both rpa43 and SL1, two other activities demonstrated to be essential for Rrn3 function. However, neither of these Rrn3 mutants was capable of functioning in transcription in vitro. Moreover, although wild-type human Rrn3 complemented a yeast rrn3-ts mutant, the DNA-binding site mutant did not. These results demonstrate that DNA binding by Rrn3 is essential for transcription by RNA polymerase I. PMID:23393135
Wang, Hongxia; Walla, James A; Zhong, Shaobin; Huang, Danqiong; Dai, Wenhao
2012-11-01
Chokecherry (Prunus virginiana L.) (2n = 4x = 32) is a unique Prunus species for both genetics and disease-resistance research due to its tetraploid nature and X-disease resistance. However, no genetic and genomic information on chokecherry is available. A partial chokecherry genome was sequenced using Roche 454 sequencing technology. A total of 145,094 reads covering 4.8 Mbp of the chokecherry genome were generated and 15,113 contigs were assembled, of which 11,675 contigs were larger than 100 bp in size. A total of 481 SSR loci were identified from 234 (out of 11,675) contigs and 246 polymerase chain reaction (PCR) primer pairs were designed. Of the 246 primers, 212 (86.2 %) effectively produced amplification from the genomic DNA of chokecherry. All 212 amplifiable chokecherry primers were used to amplify genomic DNA from 11 other rosaceous species (sour cherry, sweet cherry, black cherry, peach, apricot, plum, apple, crabapple, pear, juneberry, and raspberry). Thus, chokecherry SSR primers are transferable across Prunus species and other rosaceous species. An average of 63.2 and 58.7 % of amplifiable chokecherry primers amplified DNA from cherry and other Prunus species, respectively, while 47.2 % of amplifiable chokecherry primers amplified DNA from other rosaceous species. Using random genome sequence data generated from next-generation sequencing technology to identify microsatellite loci appears to be rapid and cost-efficient, particularly for species with no sequence information available. Sequence information and confirmed transferability of the identified chokecherry SSRs among species will be valuable for genetic research in Prunus and other rosaceous species. Key message A total of 246 SSR primers were identified from chokecherry genome sequences, of which 212 were confirmed to be amplifiable both in chokecherry and in 11 other rosaceous species.
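Identifying SSR loci in assembled contigs can be sketched with a regular expression. The abstract does not state which SSR-detection tool or thresholds were used, so everything below is an assumption for illustration: the function name, the unit lengths of 2 to 6 bases, the minimum of 4 repeats, and the toy contig.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for perfect tandem repeats in seq."""
    seq = seq.upper()
    # A lazily matched unit followed by at least (min_repeats - 1) exact copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Toy contig built explicitly: (AC)5 and (AG)6 embedded in flanking sequence.
contig = "TTG" + "AC" * 5 + "GGCT" + "AG" * 6 + "TTC"
ssrs = find_ssrs(contig)
```

For this contig the sketch reports an (AC)5 repeat at position 3 and an (AG)6 repeat at position 17 (0-based). It only finds perfect repeats and does not filter units that are themselves homopolymers, so a dedicated SSR tool would be the safer choice for real data.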
Zook, Justin M.; Samarov, Daniel; McDaniel, Jennifer; Sen, Shurjo K.; Salit, Marc
2012-01-01
While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has less dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration. PMID:22859977
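The relationship between error probability and Phred-scaled quality, which underlies the reported improvements of 5 to 13 units, can be sketched as follows. This is not GATK's implementation. It assumes only the standard definition Q = -10 log10(p) and a simple pseudo-count estimate of the empirical error rate from bases aligned to a spike-in of known sequence; the function names and the example counts are hypothetical.

```python
import math


def phred(error_prob: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(p)."""
    if not 0 < error_prob <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_prob)


def empirical_quality(mismatches: int, total: int) -> float:
    """Empirical Phred quality for a bin of bases aligned to a known sequence.

    Adds one pseudo-count mismatch (and two pseudo-count bases) so that
    an error-free bin still yields a finite quality.
    """
    return phred((mismatches + 1) / (total + 2))


# Hypothetical bin: bases reported at Q30, 9 mismatches in 9,998 spike-in bases.
reported_q = 30
observed_q = empirical_quality(9, 9998)  # (9 + 1) / (9998 + 2) = 1e-3
shift = observed_q - reported_q          # correction to apply to this bin
```

Because the spike-in sequence is known, every mismatch in such a bin can be counted as a sequencing error, which is what makes the empirical estimate independent of a SNP database for the genome being sequenced.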
The female urinary microbiome in urgency urinary incontinence.
Pearce, Meghan M; Zilliox, Michael J; Rosenfeld, Amy B; Thomas-White, Krystal J; Richter, Holly E; Nager, Charles W; Visco, Anthony G; Nygaard, Ingrid E; Barber, Matthew D; Schaffer, Joseph; Moalli, Pamela; Sung, Vivian W; Smith, Ariana L; Rogers, Rebecca; Nolen, Tracy L; Wallace, Dennis; Meikle, Susan F; Gai, Xiaowu; Wolfe, Alan J; Brubaker, Linda
2015-09-01
The purpose of this study was to characterize the urinary microbiota in women who are planning treatment for urgency urinary incontinence and to describe clinical associations with urinary symptoms, urinary tract infection, and treatment outcomes. Catheterized urine samples were collected from multisite randomized trial participants who had no clinical evidence of urinary tract infection; 16S ribosomal RNA gene sequencing was used to dichotomize participants as either DNA sequence-positive or sequence-negative. Associations with demographics, urinary symptoms, urinary tract infection risk, and treatment outcomes were determined. In sequence-positive samples, microbiotas were characterized on the basis of their dominant microorganisms. More than one-half (51.1%; 93/182) of the participants' urine samples were sequence-positive. Sequence-positive participants were younger (55.8 vs 61.3 years old; P = .0007), had a higher body mass index (33.7 vs 30.1 kg/m(2); P = .0009), had a higher mean baseline daily urgency urinary incontinence episodes (5.7 vs 4.2 episodes; P < .0001), responded better to treatment (decrease in urgency urinary incontinence episodes, -4.4 vs -3.3; P = .0013), and were less likely to experience urinary tract infection (9% vs 27%; P = .0011). In sequence-positive samples, 8 major bacterial clusters were identified; 7 clusters were dominated not only by a single genus, most commonly Lactobacillus (45%) or Gardnerella (17%), but also by other taxa (25%). The remaining cluster had no dominant genus (13%). DNA sequencing confirmed urinary bacterial DNA in many women with urgency urinary incontinence who had no signs of infection. Sequence status was associated with baseline urgency urinary incontinence episodes, treatment response, and posttreatment urinary tract infection risk. Copyright © 2015 Elsevier Inc. All rights reserved.
Deep Sequencing to Identify the Causes of Viral Encephalitis
Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.
2014-01-01
Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691
NASA Astrophysics Data System (ADS)
Hanasaki, Itsuo; Yukimoto, Naoya; Uehara, Satoshi; Shintaku, Hirofumi; Kawano, Satoyuki
2015-04-01
Because long DNA molecules usually exist in random coil states due to the entropic effect, linearisation is required for devices equipped with nanopores where electrical sequencing is necessary during single-file translocation. We present a novel technique for linearising DNA molecules in a micro-channel. In our device, electrodes are embedded in the bottom surface of the channel. The application of a voltage induces the trapping of λDNA molecules on the positive electrode. An instantaneous voltage drop is used to put the λDNA molecules in a partly released state and the hydrodynamic force of the solution induces linearisation. Phenomena were directly observed using an optical microscopy system equipped with a high-speed camera and the linearisation principle was explored in detail. Furthermore, we estimate the tensile characteristics produced by the flow of the solution through a numerical model of a tethered polymer subject to a Poiseuille flow. The mean tensile force is in the range of 0.1-1 pN. This is sufficiently smaller than the structural transition point of λDNA but counterbalances the entropic elasticity that causes the random coil shape of λDNA molecules in solution. We show the important role of thermal fluctuation in the manipulation of molecules in solution and clarify the tensile conditions required for DNA linearisation using a combination of solution flow and voltage variation in a microchannel.
Whole genome sequencing distinguishes between relapse and reinfection in recurrent leprosy cases
Bührer-Sékula, Samira; Benjak, Andrej; Loiseau, Chloé; Singh, Pushpendra; Pontes, Maria A. A.; Gonçalves, Heitor S.; Hungria, Emerith M.; Busso, Philippe; Piton, Jérémie; Silveira, Maria I. S.; Cruz, Rossilene; Schetinni, Antônio; Costa, Maurício B.; Virmond, Marcos C. L.; Diorio, Suzana M.; Dias-Baptista, Ida M. F.; Rosa, Patricia S.; Matsuoka, Masanori; Penna, Maria L. F.; Cole, Stewart T.; Penna, Gerson O.
2017-01-01
Background Since leprosy is both treated and controlled by multidrug therapy (MDT) it is important to monitor recurrent cases for drug resistance and to distinguish between relapse and reinfection as a means of assessing therapeutic efficacy. All three objectives can be reached with single nucleotide resolution using next generation sequencing and bioinformatics analysis of Mycobacterium leprae DNA present in human skin. Methodology DNA was isolated by means of optimized extraction and enrichment methods from samples from three recurrent cases in leprosy patients participating in an open-label, randomized, controlled clinical trial of uniform MDT in Brazil (U-MDT/CT-BR). Genome-wide sequencing of M. leprae was performed and the resultant sequence assemblies analyzed in silico. Principal findings In all three cases, no mutations responsible for resistance to rifampicin, dapsone and ofloxacin were found, thus eliminating drug resistance as a possible cause of disease recurrence. However, sequence differences were detected between the strains from the first and second disease episodes in all three patients. In one case, clear evidence was obtained for reinfection with an unrelated strain whereas in the other two cases, relapse appeared more probable. Conclusions/Significance This is the first report of using M. leprae whole genome sequencing to reveal that treated and cured leprosy patients who remain in endemic areas can be reinfected by another strain. Next generation sequencing can be applied reliably to M. leprae DNA extracted from biopsies to discriminate between cases of relapse and reinfection, thereby providing a powerful tool for evaluating different outcomes of therapeutic regimens and for following disease transmission. PMID:28617800
Michlewski, Gracjan; Finnegan, David J.; Elfick, Alistair; Rosser, Susan J.
2017-01-01
Abstract Delivery of DNA to cells and its subsequent integration into the host genome is a fundamental task in molecular biology, biotechnology and gene therapy. Here we describe an IP-free one-step method that enables stable genome integration into either prokaryotic or eukaryotic cells. A synthetic mariner transposon is generated by flanking a DNA sequence with short inverted repeats. When purified recombinant Mos1 or Mboumar-9 transposase is co-transfected with transposon-containing plasmid DNA, it penetrates prokaryotic or eukaryotic cells and integrates the target DNA into the genome. In vivo integrations by purified transposase can be achieved by electroporation, chemical transfection or Lipofection of the transposase:DNA mixture, in contrast to other published transposon-based protocols which require electroporation or microinjection. As in other transposome systems, no helper plasmids are required since transposases are not expressed inside the host cells, thus leading to generation of stable cell lines. Since it does not require electroporation or microinjection, this tool has the potential to be applied for automated high-throughput creation of libraries of random integrants for purposes including gene knock-out libraries, screening for optimal integration positions or safe genome locations in different organisms, selection of the highest production of valuable compounds for biotechnology, and sequencing. PMID:28204586
Search for methylation-sensitive amplification polymorphisms in mutant figs.
Rodrigues, M G F; Martins, A B G; Bertoni, B W; Figueira, A; Giuliatti, S
2013-07-08
Fig (Ficus carica) breeding programs that use conventional approaches to develop new cultivars are rare, owing to limited genetic variability and the difficulty in obtaining plants via gamete fusion. Cytosine methylation in plants leads to gene repression, thereby affecting transcription without changing the DNA sequence. Previous studies using random amplification of polymorphic DNA and amplified fragment length polymorphism markers revealed no polymorphisms among select fig mutants that originated from gamma-irradiated buds. Therefore, we conducted methylation-sensitive amplified polymorphism analysis to verify the existence of variability due to epigenetic DNA methylation among these mutant selections compared to the main cultivar 'Roxo-de-Valinhos'. Samples of genomic DNA were double-digested with either HpaII (methylation sensitive) or MspI (methylation insensitive) and with EcoRI. Fourteen primer combinations were tested, and on an average, non-methylated CCGG, symmetrically methylated CmCGG, and hemimethylated hmCCGG sites accounted for 87.9, 10.1, and 2.0%, respectively. MSAP analysis was effective in detecting differentially methylated sites in the genomic DNA of fig mutants, and methylation may be responsible for the phenotypic variation between treatments. Further analyses such as polymorphic DNA sequencing are necessary to validate these differences, standardize the regions of methylation, and analyze reads using bioinformatic tools.
Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi
2016-03-02
Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10(4) -member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kundu, Sourav, E-mail: sourav.kundu@saha.ac.in; Karmakar, S. N., E-mail: sachindranath.karmakar@saha.ac.in
We present a tight-binding study of conformation dependent electronic transport properties of the DNA double-helix including its helical symmetry. We have studied the changes in the localization properties of DNA as we alter the number of stacked bases within every pitch of the double-helix, keeping fixed the total number of nitrogen bases within the DNA molecule. We take three DNA sequences, two periodic and one random, and observe that in all cases the localization length increases as we increase the radius of the DNA double-helix, i.e., the number of nucleobases within a pitch. We have also investigated the effect of backbone energetics on the I-V response of the system and found that in the presence of helical symmetry, depending on the interplay of conformal variation and disorder, DNA can be found in metallic, semiconducting, or insulating phases, as observed experimentally.
2012-01-01
Background In the last 30 years, a number of DNA fingerprinting methods such as RFLP, RAPD, AFLP, SSR, DArT, have been extensively used in marker development for molecular plant breeding. However, it remains a daunting task to identify highly polymorphic and closely linked molecular markers for a target trait for molecular marker-assisted selection. The next-generation sequencing (NGS) technology is far more powerful than any existing generic DNA fingerprinting methods in generating DNA markers. In this study, we employed a grain legume crop Lupinus angustifolius (lupin) as a test case, and examined the utility of an NGS-based method of RAD (restriction-site associated DNA) sequencing as DNA fingerprinting for rapid, cost-effective marker development tagging a disease resistance gene for molecular breeding. Results Twenty informative plants from a cross of RxS (disease resistant x susceptible) in lupin were subjected to RAD single-end sequencing by multiplex identifiers. The entire RAD sequencing products were resolved in two lanes of the 16-lanes per run sequencing platform Solexa HiSeq2000. A total of 185 million raw reads, approximately 17 Gb of sequencing data, were collected. Sequence comparison among the 20 test plants discovered 8207 SNP markers. Filtration of DNA sequencing data with marker identification parameters resulted in the discovery of 38 molecular markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective, simple PCR-based markers. Linkage analysis using marker genotyping data and disease resistance phenotyping data on a F8 population consisting of 186 individual plants confirmed that all these five markers were linked to the R gene. 
Two of these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic distance of 0.9 centiMorgan (cM), and are now replacing the markers previously developed by a traditional DNA fingerprinting method for marker-assisted selection in the Australian national lupin breeding program. Conclusions We demonstrated that more than 30 molecular markers linked to a target gene of agronomic trait of interest can be identified from a small portion (1/8) of one sequencing run on HiSeq2000 by applying NGS based RAD sequencing in marker development. The markers developed by the strategy described in this study are all co-dominant SNP markers, which can readily be converted into high throughput multiplex format or low-cost, simple PCR-based markers desirable for large scale marker implementation in plant breeding programs. The high density and closely linked molecular markers associated with a target trait help to overcome a major bottleneck for implementation of molecular markers on a wide range of germplasm in breeding programs. We conclude that application of NGS based RAD sequencing as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant breeding. The strategy does not require any prior genome knowledge or molecular information for the species under investigation, and it is applicable to other plant species. PMID:22805587
Yang, Huaan; Tao, Ye; Zheng, Zequn; Li, Chengdao; Sweetingham, Mark W; Howieson, John G
2012-07-17
In the last 30 years, a number of DNA fingerprinting methods such as RFLP, RAPD, AFLP, SSR, DArT, have been extensively used in marker development for molecular plant breeding. However, it remains a daunting task to identify highly polymorphic and closely linked molecular markers for a target trait for molecular marker-assisted selection. The next-generation sequencing (NGS) technology is far more powerful than any existing generic DNA fingerprinting methods in generating DNA markers. In this study, we employed a grain legume crop Lupinus angustifolius (lupin) as a test case, and examined the utility of an NGS-based method of RAD (restriction-site associated DNA) sequencing as DNA fingerprinting for rapid, cost-effective marker development tagging a disease resistance gene for molecular breeding. Twenty informative plants from a cross of RxS (disease resistant x susceptible) in lupin were subjected to RAD single-end sequencing by multiplex identifiers. The entire RAD sequencing products were resolved in two lanes of the 16-lanes per run sequencing platform Solexa HiSeq2000. A total of 185 million raw reads, approximately 17 Gb of sequencing data, were collected. Sequence comparison among the 20 test plants discovered 8207 SNP markers. Filtration of DNA sequencing data with marker identification parameters resulted in the discovery of 38 molecular markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective, simple PCR-based markers. Linkage analysis using marker genotyping data and disease resistance phenotyping data on a F8 population consisting of 186 individual plants confirmed that all these five markers were linked to the R gene. 
Two of these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic distance of 0.9 centiMorgan (cM), and are now replacing the markers previously developed by a traditional DNA fingerprinting method for marker-assisted selection in the Australian national lupin breeding program. We demonstrated that more than 30 molecular markers linked to a target gene of agronomic trait of interest can be identified from a small portion (1/8) of one sequencing run on HiSeq2000 by applying NGS based RAD sequencing in marker development. The markers developed by the strategy described in this study are all co-dominant SNP markers, which can readily be converted into high throughput multiplex format or low-cost, simple PCR-based markers desirable for large scale marker implementation in plant breeding programs. The high density and closely linked molecular markers associated with a target trait help to overcome a major bottleneck for implementation of molecular markers on a wide range of germplasm in breeding programs. We conclude that application of NGS based RAD sequencing as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant breeding. The strategy does not require any prior genome knowledge or molecular information for the species under investigation, and it is applicable to other plant species.
The molecular biology of environmental aromatic hydrocarbons
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weiss, S.B.
The induction of mutations in living cells by polycyclic aromatic hydrocarbons (PAH) has been recognized for many years. Although the mechanism for this occurrence has been examined by numerous investigators, the precise nature and type of mutations induced is still unclear. Earlier investigations of DNA damage and repair were primarily examined by the random alkylation of bacterial and mammalian DNAs, in vivo, using a variety of different PAH agents. This procedure is still used today. Though informative, such studies have not offered any explanation of the mechanism by which PAH agents induce carcinogenesis. We have attempted to examine the repair of PAH-damaged DNA using small DNA oligomer constructs as targets for site-specific alkylation. DNA constructs containing a single BPDE alkylated site in each duplex strand were ligated into M13 RF DNA and used to transfect E. coli. Progeny M13 DNA was isolated from E. coli colonies grown on agar plates containing IPTG and Xgal. DNA sequence analysis of the isolated progeny M13 DNA, at the site of construct insertion, was found to contain large deletions and illegitimate recombinants. These sequence rearrangements occurred in either recA+ or recA- host cells, suggesting that SOS processing was not involved in the deletions and the recombinants observed. The mechanism by which BPDE induces illegitimate recombinants has not been resolved; however, it is possible that the closely spaced adducts activate the recombinant machinery in our DNA-damaged cells. 1 ref., 6 figs., 1 tab.
LINE-1 retrotransposons: from 'parasite' sequences to functional elements.
Paço, Ana; Adega, Filomena; Chaves, Raquel
2015-02-01
Long interspersed nuclear elements-1 (LINE-1) are the most abundant and active retrotransposons in the mammalian genomes. Traditionally, the occurrence of LINE-1 sequences in the genome of mammals has been explained by the selfish DNA hypothesis. Nevertheless, recently, it has also been argued that these sequences could play important roles in these genomes, as in the regulation of gene expression, genome modelling and X-chromosome inactivation. The non-random chromosomal distribution is a striking feature of these retroelements that somehow reflects its functionality. In the present study, we have isolated and analysed a fraction of the open reading frame 2 (ORF2) LINE-1 sequence from three rodent species, Cricetus cricetus, Peromyscus eremicus and Praomys tullbergi. Physical mapping of the isolated sequences revealed an interspersed longitudinal AT pattern of distribution along all the chromosomes of the complement in the three genomes. A detailed analysis shows that these sequences are preferentially located in the euchromatic regions, although some signals could be detected in the heterochromatin. In addition, a coincidence between the location of imprinted gene regions (as Xist and Tsix gene regions) and the LINE-1 retroelements was also observed. According to these results, we propose an involvement of LINE-1 sequences in different genomic events as gene imprinting, X-chromosome inactivation and evolution of repetitive sequences located at the heterochromatic regions (e.g. satellite DNA sequences) of the rodents' genomes analysed.
Classification of Pelteobagrus fish in Poyang Lake based on mitochondrial COI gene sequence.
Zhong, Bin; Chen, Ting-Ting; Gong, Rui-Yue; Zhao, Zhe-Xia; Wang, Binhua; Fang, Chunlin; Mao, Hui-Ling
2016-11-01
We used DNA molecular marker technology to compensate for the shortcomings of traditional morphological taxonomy. A total of 770 Pelteobagrus fish were collected from Poyang Lake. After preliminary morphological classification, eight samples of each species were randomly selected for DNA extraction. The mitochondrial COI gene sequence was cloned using universal primers and sequenced. The results showed that four species of Pelteobagrus live in Poyang Lake. The average intraspecific genetic distance was 0.003, while the average interspecific genetic distance was 0.128, so the interspecific distance far exceeds the intraspecific distance. In addition, phylogenetic tree analysis showed that the molecular systematics agreed with the morphological classification, indicating that the COI gene is an effective DNA molecular marker for Pelteobagrus classification. Surprisingly, the genetic distance of some individuals (P. e6, P. n6, P. e5, and P. v4) from their originally assigned species exceeded the species threshold (2%), and these individuals should be reclassified as Pelteobagrus fulvidraco. Another individual, P. v3, was markedly different: its genetic distance from its originally assigned species, Pelteobagrus vachelli, exceeded 8.4%. Its taxonomic status remains to be studied further.
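The 2% species threshold mentioned in the abstract above can be illustrated with a minimal sketch that computes an uncorrected pairwise distance (p-distance) between two aligned sequences. This is a simplified stand-in: the abstract does not state which distance model the authors used, and the function names and the gap-handling rule are assumptions made for illustration.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def exceeds_species_threshold(seq_a: str, seq_b: str, threshold: float = 0.02) -> bool:
    """True if the pairwise distance is above the barcode threshold (2% here)."""
    return p_distance(seq_a, seq_b) > threshold


if __name__ == "__main__":
    reference = "A" * 100
    sample = "A" * 97 + "CCC"  # three substitutions in 100 sites
    print(p_distance(reference, sample))
    print(exceeds_species_threshold(reference, sample))
```

A sample whose distance to its assigned species exceeds the threshold would be flagged for re-examination, which mirrors the reasoning applied to individuals such as P. v3.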
Characterization of Microbial Communities in Gas Industry Pipelines
Zhu, Xiang Y.; Lubeck, John; Kilbane, John J.
2003-01-01
Culture-independent techniques, denaturing gradient gel electrophoresis (DGGE) analysis, and random cloning of 16S rRNA gene sequences amplified from community DNA were used to determine the diversity of microbial communities in gas industry pipelines. Samples obtained from natural gas pipelines were used directly for DNA extraction, inoculated into sulfate-reducing bacterium medium, or used to inoculate a reactor that simulated a natural gas pipeline environment. The variable V2-V3 (average size, 384 bp) and V3-V6 (average size, 648 bp) regions of bacterial and archaeal 16S rRNA genes, respectively, were amplified from genomic DNA isolated from nine natural gas pipeline samples and analyzed. A total of 106 bacterial 16S rDNA sequences were derived from DGGE bands, and these formed three major clusters: beta and gamma subdivisions of Proteobacteria and gram-positive bacteria. The most frequently encountered bacterial species was Comamonas denitrificans, which was not previously reported to be associated with microbial communities found in gas pipelines or with microbially influenced corrosion. The 31 archaeal 16S rDNA sequences obtained in this study were all related to those of methanogens and phylogenetically fall into three clusters: order I, Methanobacteriales; order III, Methanomicrobiales; and order IV, Methanosarcinales. Further microbial ecology studies are needed to better understand the relationship among bacterial and archaeal groups and the involvement of these groups in the process of microbially influenced corrosion in order to develop improved ways of monitoring and controlling microbially influenced corrosion. PMID:12957923
High-density fiber optic biosensor arrays
NASA Astrophysics Data System (ADS)
Epstein, Jason R.; Walt, David R.
2002-02-01
Novel approaches are required to coordinate the immense amounts of information derived from diverse genomes. This concept has influenced the expanded role of high-throughput DNA detection and analysis in the biological sciences. A high-density fiber optic DNA biosensor was developed consisting of oligonucleotide-functionalized, 3.1 μm diameter microspheres deposited into the etched wells on the distal face of a 500 μm imaging fiber bundle. Imaging fiber bundles containing thousands of optical fibers, each associated with a unique oligonucleotide probe sequence, were the foundation for an optically connected, individually addressable DNA detection platform. Different oligonucleotide-functionalized microspheres were combined in a stock solution, and randomly dispersed into the etched wells. Microsphere positions were registered from optical dyes incorporated onto the microspheres. The distribution process provided an inherent redundancy that increases the signal-to-noise ratio as the square root of the number of sensors examined. The representative amount of each probe-type in the array was dependent on their initial stock solution concentration, and as other sequences of interest arise, new microsphere elements can be added to arrays without altering the existing detection capabilities. The oligonucleotide probe sequences hybridize to fluorescently-labeled, complementary DNA target solutions. Fiber optic DNA microarray research has included DNA-protein interaction profiles, microbial strain differentiation, non-labeled target interrogation with molecular beacons, and single cell-based assays. This biosensor array is proficient in DNA detection linked to specific disease states, single nucleotide polymorphism (SNP) discrimination, and gene expression analysis. This array platform permits multiple detection formats, provides smaller feature sizes, and enables sensor design flexibility. 
High-density fiber optic microarray biosensors provide a fast, reversible format with the detection limit of a few hundred molecules.
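The square-root scaling of signal-to-noise ratio with sensor redundancy, stated in the abstract above, can be sketched as follows. The sketch assumes that replicate sensors carry the same signal and independent noise of equal variance, so that averaging N of them reduces the noise by a factor of sqrt(N). The function names are illustrative.

```python
import math


def snr_after_averaging(single_sensor_snr: float, n_sensors: int) -> float:
    """SNR obtained by averaging n_sensors redundant sensors with independent noise."""
    if n_sensors < 1:
        raise ValueError("need at least one sensor")
    return single_sensor_snr * math.sqrt(n_sensors)


def sensors_needed(gain: float) -> int:
    """Smallest number of redundant sensors that gives at least the requested SNR gain."""
    if gain < 1.0:
        raise ValueError("gain must be at least 1")
    return math.ceil(gain * gain)


if __name__ == "__main__":
    print(snr_after_averaging(2.0, 25))  # 25 replicate beads give a 5-fold gain
    print(sensors_needed(3.0))           # a 3-fold gain needs 9 replicate beads
```

Because the gain grows only as the square root, each further doubling of the SNR requires four times as many replicate microspheres.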
de-Carvalho, Jorge; Rodrigues, Rogério M M; Tomé, Brigitte; Henriques, Sílvia F; Mira, Nuno P; Sá-Correia, Isabel; Ferreira, Guilherme N M
2014-04-21
A novel quartz crystal microbalance (QCM) analytical method is developed based on the transmission line model (TLM) algorithm to analyze the binding of transcription factors (TFs) to immobilized DNA oligoduplexes. The method is used to characterize the mechanical properties of biological films through the estimation of the film dynamic shear moduli, G′ and G″, and the film thickness. Using the Saccharomyces cerevisiae transcription factor Haa1 (Haa1DBD) as a biological model, two sensors were prepared by immobilizing DNA oligoduplexes, one containing the Haa1 recognition element (HRE(wt)) and another with a random sequence (HRE(neg)) used as a negative control. The immobilization of DNA oligoduplexes was followed in real time and we show that DNA strands initially adsorb with low or no tilting, lying flat close to the surface, and then lift off the surface, leading to final film tilting angles of 62.9° and 46.7° for HRE(wt) and HRE(neg), respectively. Furthermore, we show that the binding of Haa1DBD to HRE(wt) leads to a more ordered and compact film, and forces a 31.7° bending of the immobilized HRE(wt) oligoduplex. This work demonstrates the suitability of the QCM to monitor the specific binding of TFs to immobilized DNA sequences and provides an analytical methodology to study protein-DNA biophysics and kinetics.
Scar-less multi-part DNA assembly design automation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hillson, Nathan J.
The present invention provides a method of designing an implementation of a DNA assembly. In an exemplary embodiment, the method includes (1) receiving a list of DNA sequence fragments to be assembled together and an order in which to assemble the DNA sequence fragments, (2) designing DNA oligonucleotides (oligos) for each of the DNA sequence fragments, and (3) creating a plan for adding flanking homology sequences to each of the DNA oligos. In an exemplary embodiment, the method includes (1) receiving a list of DNA sequence fragments to be assembled together and an order in which to assemble the DNA sequence fragments, (2) designing DNA oligonucleotides (oligos) for each of the DNA sequence fragments, and (3) creating a plan for adding optimized overhang sequences to each of the DNA oligos.
Lou, Wangchao; Wang, Xiaoqing; Chen, Fan; Chen, Yixiao; Jiang, Bo; Zhang, Hua
2014-01-01
Developing an efficient method for identifying DNA-binding proteins is highly desirable because of their vital roles in gene regulation, and it would be invaluable for advancing our understanding of protein functions. In this study, we proposed a new method for the prediction of DNA-binding proteins, which performs feature ranking using random forest and wrapper-based feature selection using a forward best-first search strategy. The features comprise information from primary sequence, predicted secondary structure, predicted relative solvent accessibility, and position specific scoring matrix. The proposed method, called DBPPred, used Gaussian naïve Bayes as the underlying classifier since it outperformed five other classifiers, including decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. As a result, the proposed DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 according to the five-fold cross validation with ten runs on the training benchmark dataset PDB594. Subsequently, blind tests on the independent dataset PDB186 were performed with the proposed model trained on the entire PDB594 dataset and with five existing methods (iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader); DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. Independent tests of DBPPred on a large non-DNA binding protein dataset and two RNA binding protein datasets also showed improved or comparable quality when compared with the relevant prediction methods. Moreover, we observed that for the majority of the features selected by the proposed method, the mean feature values differ statistically significantly between the DNA-binding and the non DNA-binding proteins. 
All of the experimental results indicate that the proposed DBPPred can be an alternative perspective predictor for large-scale determination of DNA-binding proteins. PMID:24475169
Design of nucleic acid sequences for DNA computing based on a thermodynamic approach
Tanaka, Fumiaki; Kameda, Atsushi; Yamamoto, Masahito; Ohuchi, Azuma
2005-01-01
We have developed an algorithm for designing multiple sequences of nucleic acids that have a uniform melting temperature between the sequence and its complement and that do not hybridize non-specifically with each other based on the minimum free energy (ΔGmin). Sequences that satisfy these constraints can be utilized in computations, various engineering applications such as microarrays, and nano-fabrications. Our algorithm is a random generate-and-test algorithm: it generates a candidate sequence randomly and tests whether the sequence satisfies the constraints. The novelty of our algorithm is that the filtering method uses a greedy search to calculate ΔGmin. This effectively excludes inappropriate sequences before ΔGmin is calculated, thereby reducing computation time drastically when compared with an algorithm without the filtering. Experimental results in silico showed the superiority of the greedy search over the traditional approach based on the Hamming distance. In addition, experimental results in vitro demonstrated that the experimental free energy (ΔGexp) of 126 sequences correlated better with ΔGmin (|R| = 0.90) than with the Hamming distance (|R| = 0.80). These results validate the rationality of a thermodynamic approach. We implemented our algorithm in a graphical user interface-based program written in Java. PMID:15701762
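The generate-and-test loop described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' Java program: the melting temperature uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), and the cross-hybridization check is a Hamming-distance filter, which is only a crude stand-in for ΔGmin (the abstract itself reports that the thermodynamic criterion performs better). All function and parameter names (`design_sequences`, `tm_target`, `tm_tol`, `min_dist`) are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def wallace_tm(seq):
    """Wallace-rule melting temperature estimate (an assumption, not the paper's model)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def design_sequences(count, length, tm_target, tm_tol, min_dist, rng, max_tries=100000):
    """Randomly generate candidates and keep those that pass both tests."""
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == count:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        # Test 1: uniform melting temperature.
        if abs(wallace_tm(cand) - tm_target) > tm_tol:
            continue
        # Test 2: stand-in for the non-specific hybridization check.
        rc = reverse_complement(cand)
        if all(hamming(cand, s) >= min_dist and hamming(rc, s) >= min_dist
               for s in accepted):
            accepted.append(cand)
    return accepted


if __name__ == "__main__":
    for s in design_sequences(5, 20, 60, 2, 6, random.Random(0)):
        print(s, wallace_tm(s))
```

The function may return fewer than `count` sequences if `max_tries` is exhausted, so callers should check the length of the result.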
Microbeads display of proteins using emulsion PCR and cell-free protein synthesis.
Gan, Rui; Yamanaka, Yumiko; Kojima, Takaaki; Nakano, Hideo
2008-01-01
We developed a method for coupling protein to its coding DNA on magnetic microbeads using emulsion PCR and cell-free protein synthesis in emulsion. A PCR mixture containing streptavidin-coated microbeads was compartmentalized by water-in-oil (w/o) emulsion with an estimated 0.5 template molecules per droplet. The template molecules were amplified and immobilized on beads via bead-linked reverse primers and biotinylated forward primers. After amplification, the templates were sequentially labeled with streptavidin and biotinylated anti-glutathione S-transferase (GST) antibody. The pool of beads was then subjected to cell-free protein synthesis compartmentalized in another w/o emulsion, in which templates were coupled to their coding proteins. We mixed two types of DNA templates of Histidine6 tag (His6)-fused and FLAG tag-fused GST in a ratio of 1:1,000 (His6:FLAG) for use as a model DNA library. After incubation with fluorescein isothiocyanate (FITC)-labeled anti-His6 (C-term) antibody, the beads with the His6 gene were enriched 917-fold in a single-round screening by using flow cytometry. A library with a theoretical diversity of 10^6 was constructed by randomizing the middle four residues of the His6 tag. After a two-round screening, the randomized sequences were substantially converged to peptide-encoding sequences recognized by the anti-His6 antibody.
Iterative dictionary construction for compression of large DNA data sets.
Kuruppu, Shanika; Beresford-Smith, Bryan; Conway, Thomas; Zobel, Justin
2012-01-01
Genomic repositories increasingly include individual as well as reference sequences, which tend to share long identical and near-identical strings of nucleotides. However, the sequential processing used by most compression algorithms, and the volumes of data involved, mean that these long-range repetitions are not detected. An order-insensitive, disk-based dictionary construction method can detect this repeated content and use it to compress collections of sequences. We explore a dictionary construction method that improves repeat identification in large DNA data sets. Our adaptation, COMRAD, of an existing disk-based method identifies exact repeated content in collections of sequences with similarities within and across the set of input sequences. COMRAD compresses the data over multiple passes, which is an expensive process, but allows COMRAD to compress large data sets within reasonable time and space. COMRAD allows for random access to individual sequences and subsequences without decompressing the whole data set. COMRAD has no competitor in terms of the size of data sets that it can compress (extending to many hundreds of gigabytes) and, even for smaller data sets, the results are competitive compared to alternatives; as an example, 39 S. cerevisiae genomes compressed to 0.25 bits per base.
DNA fingerprinting of Brassica juncea cultivars using microsatellite probes.
Bhatia, S; Das, S; Jain, A; Lakshmikumaran, M
1995-09-01
The genetic variability in the Brassica juncea cultivars was detected by employing in-gel hybridization of restricted DNA to simple repetitive sequences such as (GATA)4, (GACA)4 and (CAC)5. The most informative probe/enzyme combination was (GATA)4/EcoRI, yielding highly polymorphic fingerprint patterns for the B. juncea cultivars. This technique was found to be dependable for establishing the variety specific patterns for most of the cultivars studied, a prerequisite for germplasm preservation. The results of the present study were compared with those reported in our earlier study in which random amplification of polymorphic DNA (RAPD) was used for assessing the genetic variability in the B. juncea cultivars.
Mak, Chi H.; Pham, Phuong; Afif, Samir A.; Goodman, Myron F.
2015-01-01
Enzymes that rely on random walk to search for substrate targets in a heterogeneously dispersed medium can leave behind complex spatial profiles of their catalyzed conversions. The catalytic signatures of these random-walk enzymes are the result of two coupled stochastic processes: scanning and catalysis. Here we develop analytical models to understand the conversion profiles produced by these enzymes, comparing an intrusive model, in which scanning and catalysis are tightly coupled, against a loosely coupled passive model. Diagrammatic theory and path-integral solutions of these models revealed clearly distinct predictions. Comparison to experimental data from catalyzed deaminations deposited on single-stranded DNA by the enzyme activation-induced deoxycytidine deaminase (AID) demonstrates that catalysis and diffusion are strongly intertwined, where the chemical conversions give rise to new stochastic trajectories that were absent if the substrate DNA was homogeneous. The C → U deamination profiles in both analytical predictions and experiments exhibit a strong contextual dependence, where the conversion rate of each target site is strongly contingent on the identities of other surrounding targets, with the intrusive model showing an excellent fit to the data. These methods can be applied to deduce sequence-dependent catalytic signatures of other DNA modification enzymes, with potential applications to cancer, gene regulation, and epigenetics. PMID:26465508
Heler, Robert; Wright, Addison V; Vucelja, Marija; Bikard, David; Doudna, Jennifer A; Marraffini, Luciano A
2017-01-05
CRISPR loci and their associated (Cas) proteins encode a prokaryotic immune system that protects against viruses and plasmids. Upon infection, a low fraction of cells acquire short DNA sequences from the invader. These sequences (spacers) are integrated in between the repeats of the CRISPR locus and immunize the host against the matching invader. Spacers specify the targets of the CRISPR immune response through transcription into short RNA guides that direct Cas nucleases to the invading DNA molecules. Here we performed random mutagenesis of the RNA-guided Cas9 nuclease to look for variants that provide enhanced immunity against viral infection. We identified a mutation, I473F, that increases the rate of spacer acquisition by more than two orders of magnitude. Our results highlight the role of Cas9 during CRISPR immunization and provide a useful tool to study this rare process and develop it as a biotechnological application.
First report of human parvovirus 4 detection in Iran.
Asiyabi, Sanaz; Nejati, Ahmad; Shoja, Zabihollah; Shahmahmoodi, Shohreh; Jalilvand, Somayeh; Farahmand, Mohammad; Gorzin, Ali-Akbar; Najafi, Alireza; Haji Mollahoseini, Mostafa; Marashi, Sayed Mahdi
2016-08-01
Parvovirus 4 (PARV4) is an emerging and intriguing virus that has recently received much attention. The high prevalence of PARV4 infection in high-risk groups such as HIV-infected patients highlights the potential clinical outcomes that this virus might have. Molecular techniques were used to determine both the presence and the genotype of circulating PARV4 in previously collected serum samples from 133 HIV-infected patients and 120 healthy blood donors. Nested PCR was applied to assess the presence of the PARV4 DNA genome in both groups. PARV4 DNA was detected in 35.3% of HIV-infected patients compared to 16.6% of healthy donors. To genetically characterize the PARV4 genotype in these groups, positive samples were randomly selected and subjected to sequencing and phylogenetic analysis. All PARV4 sequences were found to be genotype 1 and clustered with the reference sequences of PARV4 genotype 1. J. Med. Virol. 88:1314-1318, 2016. © 2016 Wiley Periodicals, Inc.
Romer, Katherine A.; Kayombya, Guy-Richard; Fraenkel, Ernest
2007-01-01
WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply these strategies. Using a single submission form, users can run several motif discovery programs and score, cluster and visualize the results. In addition, the Bayesian motif discovery program THEME can be used to determine the class of transcription factors that is most likely to regulate a set of sequences. Input can be provided as a list of gene or probe identifiers. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species. WebMOTIFS is freely available at http://fraenkel.mit.edu/webmotifs. PMID:17584794
Inagaki, Hidehito; Ohye, Tamae; Kogo, Hiroshi; Kato, Takema; Bolor, Hasbaira; Taniguchi, Mariko; Shaikh, Tamim H; Emanuel, Beverly S; Kurahashi, Hiroki
2009-02-01
Chromosomal aberrations have been thought to be random events. However, recent findings introduce a new paradigm in which certain DNA segments have the potential to adopt unusual conformations that lead to genomic instability and nonrandom chromosomal rearrangement. One of the best-studied examples is the palindromic AT-rich repeat (PATRR), which induces recurrent constitutional translocations in humans. Here, we established a plasmid-based model that promotes frequent intermolecular rearrangements between two PATRRs in HEK293 cells. In this model system, the proportion of PATRR plasmid that extrudes a cruciform structure correlates to the levels of rearrangement. Our data suggest that PATRR-mediated translocations are attributable to unusual DNA conformations that confer a common pathway for chromosomal rearrangements in humans.
Furano, A V; Somerville, C C; Tsichlis, P N; D'Ambrosio, E
1986-01-01
The long interspersed repeated DNA family of rats (LINE or L1Rn family) contains about 40,000 6.7-kilobase (kb) long members (1). LINE members may be currently mobile since their presence or absence causes allelic variation at three single copy loci (2, 3): insulin 1, Moloney leukemia virus integration 2 (Mlvi-2) (4), and immunoglobulin heavy chain (Igh). To characterize target sites for LINE insertion, we compared the DNA sequences of the unoccupied Mlvi-2 target site, its LINE-containing allele, and several other LINE-containing sites. Although not homologous overall, the target sites share three characteristics: First, depending on the site, they are from 68% to 86% (A+T) compared to 58% (A+T) for total rat DNA (5). Depending on the site, a 7- to 15-bp target site sequence becomes duplicated and flanks the inserted LINE member. The second is a version (0 or 1 mismatch) of the hexanucleotide, TACTCA, which is also present in the LINE member, in a highly conserved region located just before the A-rich right end of the LINE member. The third is a stretch of alternating purine/pyrimidine (PQ). The A-rich right ends of different LINE members vary in length and composition, and the sequence of a particularly long one suggests that it contains the A-rich target site from a previous transposition. PMID:3012480
2011-01-01
Background: Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results: An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 was sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion: An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml). PMID:21266061
Alam, Nuhu; Kim, Jeong Hwa; Shim, Mi Ja; Lee, U Youn
2010-01-01
This study evaluated the optimal vegetative growth conditions and molecular phylogenetic relationships of eleven strains of Agrocybe cylindracea collected from different ecological regions of Korea, China and Taiwan. The optimal temperature and pH for mycelial growth were 25°C and pH 6, respectively. Potato dextrose agar and Hennerberg were the favorable media for vegetative growth, whereas glucose tryptone was unfavorable. Dextrin, maltose, and fructose were the most effective carbon sources. The most suitable nitrogen sources were arginine and glycine, whereas methionine, alanine, histidine, and urea were least effective for the mycelial propagation of A. cylindracea. The internal transcribed spacer (ITS) regions of rDNA were amplified using PCR. The sequence of ITS2 was more variable than that of ITS1, while the 5.8S sequences were identical. The reciprocal homologies of the ITS sequences ranged from 98 to 100%. The strains were also analyzed by random amplification of polymorphic DNA (RAPD) using 20 arbitrary primers. Fifteen primers efficiently amplified the genomic DNA. The average number of polymorphic bands observed per primer was 3.8. The numbers of amplified bands varied based on the primers and strains, with polymorphic fragments ranging from 0.1 to 2.9 kb. The results of RAPD analysis were similar to those obtained from the ITS region sequences. The results revealed that RAPD and ITS techniques were well suited for detecting the genetic diversity of all A. cylindracea strains tested. PMID:23956633
Werner, Benjamin; Sottoriva, Andrea
2018-06-01
The immortal strand hypothesis poses that stem cells could produce differentiated progeny while conserving the original template strand, thus avoiding accumulating somatic mutations. However, quantitating the extent of non-random DNA strand segregation in human stem cells remains difficult in vivo. Here we show that the change of the mean and variance of the mutational burden with age in healthy human tissues allows estimating strand segregation probabilities and somatic mutation rates. We analysed deep sequencing data from healthy human colon, small intestine, liver, skin and brain. We found highly effective non-random DNA strand segregation in all adult tissues (mean strand segregation probability: 0.98, standard error bounds (0.97, 0.99)). In contrast, non-random strand segregation efficiency is reduced to 0.87 (0.78, 0.88) in neural tissue during early development, suggesting stem cell pool expansions due to symmetric self-renewal. Healthy somatic mutation rates differed across tissue types, ranging from 3.5 × 10^-9/bp/division in small intestine to 1.6 × 10^-7/bp/division in skin.
Jia, Ying; Cantu, Bruno A; Sánchez, Elda E; Pérez, John C
2008-06-15
To advance our knowledge of snake venom composition and of the transcripts expressed in the venom gland at the molecular level, we constructed a cDNA library from the venom gland of Agkistrodon piscivorus leucostoma to generate an expressed sequence tag (EST) database. From the randomly sequenced 2112 independent clones, we obtained ESTs for 1309 (62%) cDNAs, which showed significant deduced amino acid sequence similarity (scores >80) to previously characterized proteins in the National Center for Biotechnology Information (NCBI) database. Ribosomal proteins make up 47 clones (2%), and the remaining 756 (36%) cDNAs either are of unknown identity or show BLASTX sequence identity scores of <80 with known GenBank accessions. The most highly expressed gene, encoding phospholipase A2 (PLA2) and accounting for 35% of A. p. leucostoma venom gland cDNAs, was identified and further confirmed by sodium dodecyl sulfate/polyacrylamide gel electrophoresis (SDS-PAGE) of crude venom and protein sequencing. A total of 180 representative genes were obtained from the sequence assemblies and deposited in the EST database. Clones showing sequence identity to disintegrins, thrombin-like enzymes, hemorrhagic toxins, fibrinogen clotting inhibitors and plasminogen activators were also identified in our EST database. These data can be used to develop a research program that will help us identify genes encoding proteins that are of medical importance or proteins involved in the mechanisms of the toxin venom.
Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions
Gardner, Shea N; Mariella, Jr., Raymond P; Christian, Allen T; Young, Jennifer A; Clague, David S
2013-06-25
A method of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths.
Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions
Gardner, Shea N [San Leandro, CA; Mariella, Jr., Raymond P.; Christian, Allen T [Tracy, CA; Young, Jennifer A [Berkeley, CA; Clague, David S [Livermore, CA
2011-01-18
A method of fabricating a DNA molecule of user-defined sequence. The method comprises the steps of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an even or odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths. In one embodiment starting sequence fragments are of different lengths, n, n+1, n+2, etc.
Ocan, Moses; Bwanga, Freddie; Okeng, Alfred; Katabazi, Fred; Kigozi, Edgar; Kyobe, Samuel; Ogwal-Okeng, Jasper; Obua, Celestino
2016-08-19
In the absence of an effective vaccine, malaria treatment and eradication is still a challenge in most endemic areas globally. This is especially the case with the currently reported emergence of resistance to artemisinin agents in Southeast Asia. This study therefore explored the prevalence of K13-propeller gene polymorphisms among Plasmodium falciparum parasites in northern Uganda. Adult patients (≥18 years) presenting to the out-patient departments of Lira and Gulu regional referral hospitals in northern Uganda were randomly recruited. Laboratory investigation for the presence of Plasmodium infection among patients was done using a Plasmodium falciparum exclusive rapid diagnostic test, histidine rich protein-2 (HRP2) (Pf). Finger prick capillary blood from patients with a positive malaria test was spotted on Whatman no. 903 filter paper. The parasite DNA was extracted using the chelex resin method and sequenced for mutations in the K13-propeller gene using Sanger sequencing. PCR DNA sequence products were analyzed using DnaSP 5.10.01 software; data were further processed in an Excel 2007 spreadsheet. A total of 60 parasite DNA samples were sequenced. Polymorphisms in the K13-propeller gene were detected in four (4) of the 60 parasite DNA samples sequenced. A non-synonymous polymorphism at codon 533 previously detected in Cambodia was found in the parasite DNA samples analyzed. Polymorphisms at codon 522 (non-synonymous) and codon 509 (synonymous) were also found in the samples analyzed. The study found evidence of positive selection in the Plasmodium falciparum population in northern Uganda (Tajima's D = -1.83205; Fu and Li's D = -1.82458). A polymorphism in the K13-propeller gene previously reported in Cambodia has been found in the Ugandan Plasmodium falciparum parasites. There is a need for continuous surveillance for artemisinin resistance gene markers in the country.
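For readers unfamiliar with the Tajima's D statistic quoted in this abstract, the following is a minimal sketch of the standard formula (Tajima, 1989). It is not the DnaSP implementation used in the study. It assumes aligned, equal-length sequences without gaps, counts each variable column as one segregating site, and returns `None` where the statistic is undefined; the function name `tajimas_d` is hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences (no gaps assumed)."""
    n = len(seqs)
    # S: number of segregating (variable) alignment columns.
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # The variance term vanishes for n < 4, so D is treated as undefined there.
    if n < 4 or S == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two diversity estimators, scaled by its standard deviation.
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    # Toy alignment in which every variant is a singleton (excess of rare variants).
    rare = ["AAAAA"] * 5 + ["TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT"]
    print(tajimas_d(rare))
```

An excess of rare variants, as in the toy alignment, pushes D below zero; variants at intermediate frequency push it above zero.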
Sargsyan, Ori
2012-05-25
Hitchhiking and severe bottleneck effects have an impact on the dynamics of genetic diversity of a population by inducing homogenization at a single locus and at the genome-wide scale, respectively. As a result, identification and differentiation of the signatures of such events from DNA sequence data at a single locus is challenging. This study develops an analytical framework for identifying and differentiating recent homogenization events at multiple neutral loci in low recombination regions. The dynamics of genetic diversity at a locus after a recent homogenization event is modeled according to the infinite-sites mutation model and the Wright-Fisher model of reproduction with constant population size. In this setting, I derive analytical expressions for the distribution, mean, and variance of the number of polymorphic sites in a random sample of DNA sequences from a locus affected by a recent homogenization event. Based on this framework, three likelihood-ratio based tests are presented for identifying and differentiating recent homogenization events at multiple loci. Lastly, I apply the framework to two data sets. First, I consider human DNA sequences from four non-coding loci on different chromosomes for inferring the evolutionary history of modern human populations. The results suggest, in particular, that recent homogenization events at the loci are identifiable when the effective human population size is 50000 or greater in contrast to 10000, and the estimates of the recent homogenization events agree with the “Out of Africa” hypothesis. Second, I use HIV DNA sequences from HIV-1-infected patients to infer the times of HIV seroconversions. The estimates are contrasted with other estimates derived as the mid-time point between the last HIV-negative and first HIV-positive screening tests. Finally, the results show that significant discrepancies can exist between the estimates.
Frimodt-Møller, Jakob; Charbon, Godefroid; Krogfelt, Karen A; Løbner-Olesen, Anders
2017-09-11
The optimal chromosomal position(s) of a given DNA element was/were determined by transposon-mediated random insertion followed by fitness selection. In bacteria, the impact of the genetic context on the function of a genetic element can be difficult to assess. Several mechanisms, including topological effects, transcriptional interference from neighboring genes, and/or replication-associated gene dosage, may affect the function of a given genetic element. Here, we describe a method that permits the random integration of a DNA element into the chromosome of Escherichia coli and select the most favorable locations using a simple growth competition experiment. The method takes advantage of a well-described transposon-based system of random insertion, coupled with a selection of the fittest clone(s) by growth advantage, a procedure that is easily adjustable to experimental needs. The nature of the fittest clone(s) can be determined by whole-genome sequencing on a complex multi-clonal population or by easy gene walking for the rapid identification of selected clones. Here, the non-coding DNA region DARS2, which controls the initiation of chromosome replication in E. coli, was used as an example. The function of DARS2 is known to be affected by replication-associated gene dosage; the closer DARS2 gets to the origin of DNA replication, the more active it becomes. DARS2 was randomly inserted into the chromosome of a DARS2-deleted strain. The resultant clones containing individual insertions were pooled and competed against one another for hundreds of generations. Finally, the fittest clones were characterized and found to contain DARS2 inserted in close proximity to the original DARS2 location.
NASA Technical Reports Server (NTRS)
Breaker, R. R.; Joyce, G. F.; Hoyce, G. F. (Principal Investigator)
1994-01-01
BACKGROUND: Several types of RNA enzymes (ribozymes) have been identified in biological systems and generated in the laboratory. Considering the variety of known RNA enzymes and the similarity of DNA and RNA, it is reasonable to imagine that DNA might be able to function as an enzyme as well. No such DNA enzyme has been found in nature, however. We set out to identify a metal-dependent DNA enzyme using in vitro selection methodology. RESULTS: Beginning with a population of 10^14 DNAs containing 50 random nucleotides, we carried out five successive rounds of selective amplification, enriching for individuals that best promote the Pb2+-dependent cleavage of a target ribonucleoside 3'-O-P bond embedded within an otherwise all-DNA sequence. By the fifth round, the population as a whole carried out this reaction at a rate of 0.2 min^-1. Based on the sequence of 20 individuals isolated from this population, we designed a simplified version of the catalytic domain that operates in an intermolecular context with a turnover rate of 1 min^-1. This rate is about 10^5-fold increased compared to the uncatalyzed reaction. CONCLUSIONS: Using in vitro selection techniques, we obtained a DNA enzyme that catalyzes the Pb2+-dependent cleavage of an RNA phosphoester in a reaction that proceeds with rapid turnover. The catalytic rate compares favorably to that of known RNA enzymes. We expect that other examples of DNA enzymes will soon be forthcoming.
Counting of oligomers in sequences generated by Markov chains for DNA motif discovery.
Shan, Gao; Zheng, Wei-Mou
2009-02-01
By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to exactly calculate the first and second moments of word counts and the probability that a word occurs at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs on a set of promoter sequences extracted from the A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.
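The two quantities named in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or their oligo.c code; it is a minimal dynamic-programming version, assuming a first-order Markov chain given as an initial distribution `init` and a transition table `trans` (both hypothetical names), and using a standard pattern-matching automaton to track partial matches of the word.

```python
def expected_count(word, length, init, trans):
    """Expected number of (possibly overlapping) occurrences of `word`."""
    alphabet = list(init)
    # Probability of the rest of the word given its first letter.
    tail = 1.0
    for a, b in zip(word, word[1:]):
        tail *= trans[a][b]
    dist, total = dict(init), 0.0
    for _ in range(length - len(word) + 1):
        total += dist[word[0]] * tail
        # Advance the marginal letter distribution by one position.
        dist = {b: sum(dist[a] * trans[a][b] for a in alphabet) for b in alphabet}
    return total


def match_automaton(word, alphabet):
    """delta[s][c]: prefix length matched after reading c with s letters matched."""
    m = len(word)
    fail, k = [0] * (m + 1), 0
    for i in range(1, m):
        while k and word[i] != word[k]:
            k = fail[k]
        if word[i] == word[k]:
            k += 1
        fail[i + 1] = k
    delta = []
    for s in range(m):
        row = {}
        for c in alphabet:
            k = s
            while k and word[k] != c:
                k = fail[k]
            row[c] = k + 1 if word[k] == c else 0
        delta.append(row)
    return delta


def prob_at_least_once(word, length, init, trans):
    """Probability that `word` occurs at least once in a text of `length`."""
    alphabet, m = list(init), len(word)
    delta = match_automaton(word, alphabet)
    hit, states = 0.0, {}
    for a in alphabet:                      # first letter
        s = delta[0][a]
        if s == m:
            hit += init[a]
        else:
            states[(a, s)] = states.get((a, s), 0.0) + init[a]
    for _ in range(length - 1):             # remaining letters
        nxt = {}
        for (a, s), p in states.items():
            for b in alphabet:
                q, t = p * trans[a][b], delta[s][b]
                if t == m:
                    hit += q                # absorbed: word has occurred
                else:
                    nxt[(b, t)] = nxt.get((b, t), 0.0) + q
        states = nxt
    return hit
```

As a sanity check, for a uniform two-letter chain and the word "AA" in texts of length 3, three of the eight equally likely texts contain the word, and the two possible start positions each contribute 1/4 to the expected count.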
Menkis, Audrius; Marčiulynas, Adas; Gedminas, Artūras; Lynikienė, Jūratė; Povilaitienė, Aistė
2015-11-01
The aim of this study was to assess the diversity and composition of fungal communities in damaged and undamaged shoots of Norway spruce (Picea abies) following recent invasion of the spruce bud scale (Physokermes piceae) in Lithuania. Sampling was done in July 2013 and included 50 random lateral shoots from ten random trees in each of five visually undamaged and five damaged 40-50-year-old pure stands of P. abies. DNA was isolated from 500 individual shoots, subjected to amplification of the internal transcribed spacer of fungal ribosomal DNA (ITS rDNA), barcoded and sequenced. Clustering of 149,426 high-quality sequences resulted in 1193 non-singleton contigs of which 1039 (87.1 %) were fungal. In total, there were 893 fungal taxa in damaged shoots and 608 taxa in undamaged shoots (p < 0.0001). Furthermore, 431 (41.5 %) fungal taxa were exclusively in damaged shoots, 146 (14.0 %) were exclusively in undamaged shoots, and 462 (44.5 %) were common to both types of samples. Correspondence analysis showed that study sites representing damaged and undamaged shoots were separated from each other, indicating that their fungal communities were largely different and, therefore, heavily affected by P. piceae. In conclusion, the results demonstrated that invasive alien tree pests may have a profound effect on the fungal mycobiota associated with the phyllosphere of P. abies, and therefore, in addition to their direct negative effect owing to physical damage of the tissue, they may also indirectly determine health, sustainability and, ultimately, distribution of the forest tree species.
Improving the prospects of cleavage-based nanopore sequencing engines
NASA Astrophysics Data System (ADS)
Brady, Kyle T.; Reiner, Joseph E.
2015-08-01
Recently proposed methods for DNA sequencing involve the use of cleavage-based enzymes attached to the opening of a nanopore. The idea is that DNA interacting with either an exonuclease or polymerase protein will lead to a small molecule being cleaved near the mouth of the nanopore, and subsequent entry into the pore will yield information about the DNA sequence. The prospects for this approach seem promising, but it has been shown that diffusion related effects impose a limit on the capture probability of molecules by the pore, which limits the efficacy of the technique. Here, we revisit the problem with the goal of optimizing the capture probability via a step decrease in the nucleotide diffusion coefficient between the pore and bulk solutions. It is shown through random walk simulations and a simplified analytical model that decreasing the molecule's diffusion coefficient in the bulk relative to its value in the pore increases the nucleotide capture probability. Specifically, we show that at sufficiently high applied transmembrane potentials (≥100 mV), increasing the potential by a factor f is equivalent to decreasing the diffusion coefficient ratio Dbulk/Dpore by the same factor f. This suggests a promising route toward implementation of cleavage-based sequencing protocols. We also discuss the feasibility of forming a step function in the diffusion coefficient across the pore-bulk interface.
Peerbolte, R; Leenhouts, K; Hooykaas-van Slogteren, G M; Hoge, J H; Wullems, G J; Schilperoort, R A
1986-07-01
Transformed clones from a shooty tobacco crown gall tumor, induced by Agrobacterium tumefaciens strain LBA1501, having a Tn1831 insertion in the auxin locus, were investigated for their T-DNA structure and expression. In addition to clones with the expected phenotype, i.e. phytohormone autonomy, regeneration of non-rooting shoots and octopine synthesis (Aut(+)Reg(+)Ocs(+) 'type I' clones), clones were obtained with an aberrant phenotype. Among these were the Aut(-)Reg(-)Ocs(+) 'type II' clones. Two shooty type I clones and three type II callus clones (all randomly chosen) as well as a rooting shoot regenerated from a type II clone via a high kinetin treatment, all had a T-DNA structure which differed significantly from 'regular' T-DNA structures. No Tn1831 DNA sequences were detected in these clones. The two type I clones were identical: they both contained the same highly truncated T-DNA segments. One TL-DNA segment of approximately 0.7 kb, originating from the left part of the TL-region, was present at one copy per diploid tobacco genome. Another segment with a maximum size of about 7 kb was derived from the right hand part of the TL-region and was present at minimally two copies. Three copies of a truncated TR-DNA segment were detected, probably starting at the right TR-DNA border repeat and ending halfway the regular TR-region. Indications have been obtained that at least some of the T-DNA segments are closely linked, sometimes via intervening plant DNA sequences. The type I clones harbored TL-DNA transcripts 4, 6a/b and 3 as well as TR-DNA transcript 0'. The type II clones harbored three to six highly truncated T-DNA segments, originating from the right part of the TL-region. In addition they had TR-DNA segments, similar to those of the type I clones. On Northern blots TR-DNA transcripts 0' and 1' were detected as well as the TL-DNA transcripts 3 and 6a/b and an 1800 bp hybrid transcript (tr.Y) containing gene 6b sequences. Possible origins of the observed irregularities in T-DNA structures are discussed in relation to fidelity of transformation of plant cells via Agrobacterium.
Application of k-means clustering algorithm in grouping the DNA sequences of hepatitis B virus (HBV)
NASA Astrophysics Data System (ADS)
Bustamam, A.; Tasman, H.; Yuniarti, N.; Frisca, Mursidah, I.
2017-07-01
Based on WHO data, an estimated 15 million people worldwide who are infected with hepatitis B (HBsAg+), which is caused by the hepatitis B virus (HBV), are also infected with hepatitis D, which is caused by the hepatitis D virus (HDV). Hepatitis D infection can occur simultaneously with hepatitis B (co-infection) or after a person is exposed to chronic hepatitis B (super-infection). Since HDV cannot live without HBV, HDV infection is closely related to HBV infection; hence it is realistic to expect that every effort to prevent hepatitis B can indirectly prevent hepatitis D. This paper presents clustering of HBV DNA sequences using the k-means clustering algorithm and R programming. The clustering process starts with collecting HBV DNA sequences from GenBank, then performing feature extraction on the HBV DNA sequences using n-mer frequencies; the extraction results are collected as a matrix and normalized using min-max normalization to the interval [0, 1], which is then used as the input data. The number of clusters is two, and the initial centroid of each cluster is chosen randomly. In each iteration, the distance of every object to each centroid is calculated using the Euclidean distance, and the minimum distance determines the membership in a cluster, until two convergent clusters are created. As a result, the HBV viruses in the first cluster are more virulent than the HBV viruses in the second cluster, so the HBV viruses in the first cluster can potentially evolve with HDV viruses that cause hepatitis D.
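The pipeline described above (n-mer frequency extraction, min-max normalization to [0, 1], then k-means with two clusters, random initial centroids and Euclidean distance) can be sketched as follows. The paper used R; this is a Python stand-in written for illustration, and the function names, the default n-mer size and the seed parameter are assumptions rather than details from the paper.

```python
import random
from itertools import product


def nmer_frequencies(seq, n=2, alphabet="ACGT"):
    """Relative frequency of every n-mer over `alphabet`, in a fixed order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=n)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - n + 1):
        kmer = seq[i:i + n]
        if kmer in counts:          # skip windows containing other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in kmers]


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]


def kmeans(points, k=2, seed=0, max_iter=100):
    """Plain k-means with randomly chosen initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each point to the nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:    # assignments stable: converged
            break
        labels = new_labels
        # Move each centroid to the mean of its members (if it has any).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

A typical call would build one feature row per sequence with `nmer_frequencies`, pass the rows through `min_max_normalize`, and hand the result to `kmeans`. Cluster labels are arbitrary, so any biological interpretation of the two groups has to come from outside the algorithm.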
Rojas-Cartagena, Carmencita; Ortíz-Pineda, Pablo; Ramírez-Gómez, Francisco; Suárez-Castillo, Edna C.; Matos-Cruz, Vanessa; Rodríguez, Carlos; Ortíz-Zuazaga, Humberto; García-Arrarás, José E.
2010-01-01
Repair and regeneration are key processes for tissue maintenance, and their disruption may lead to disease states. Little is known about the molecular mechanisms that underlie the repair and regeneration of the digestive tract. The sea cucumber Holothuria glaberrima represents an excellent model to dissect and characterize the molecular events during intestinal regeneration. To study the gene expression profile, cDNA libraries were constructed from normal, 3-day, and 7-day regenerating intestines of H. glaberrima. Clones were randomly sequenced and queried against the nonredundant protein database at the National Center for Biotechnology Information. RT-PCR analyses were made of several genes to determine their expression profile during intestinal regeneration. A total of 5,173 sequences from three cDNA libraries were obtained. About 46.2, 35.6, and 26.2% of the sequences for the normal, 3-day, and 7-day cDNA libraries, respectively, shared significant similarity with known sequences in the protein database of GenBank but showed only 10% similarity among them. Analysis of the libraries in terms of functional processes, protein domains, and most common sequences suggests that a differential expression profile is taking place during the regeneration process. Further examination of the expressed sequence tag dataset revealed that 12 putative genes are differentially expressed at a significant level (R > 6). Experimental validation by RT-PCR analysis reveals that at least three genes (unknown C-4677-1, melanotransferrin, and centaurin) present a differential expression during regeneration. These findings strongly suggest that the gene expression profile varies among regeneration stages and provide evidence for the existence of differential gene expression. PMID:17579180
Hsmar1 Transposition Is Sensitive to the Topology of the Transposon Donor and the Target
Claeys Bouuaert, Corentin; Chalmers, Ronald
2013-01-01
Hsmar1 is a member of the Tc1-mariner superfamily of DNA transposons. These elements mobilize within the genome of their host by a cut-and-paste mechanism. We have exploited the in vitro reaction provided by Hsmar1 to investigate the effect of DNA supercoiling on transposon integration. We found that the topology of both the transposon and the target affect integration. Relaxed transposons have an integration defect that can be partially restored in the presence of elevated levels of negatively supercoiled target DNA. Negatively supercoiled DNA is a better target than nicked or positively supercoiled DNA, suggesting that underwinding of the DNA helix promotes target interactions. Like other Tc1-mariner elements, Hsmar1 integrates into 5′-TA dinucleotides. The direct vicinity of the target TA provides little sequence specificity for target interactions. However, transposition within a plasmid substrate was not random and some TA dinucleotides were targeted preferentially. The distribution of intramolecular target sites was not affected by DNA topology. PMID:23341977
Translocation-coupled DNA cleavage by the Type ISP restriction-modification enzymes
Chand, Mahesh Kumar; Nirwan, Neha; Diffin, Fiona M.; van Aelst, Kara; Kulkarni, Manasi; Pernstich, Christian; Szczelkun, Mark D.; Saikrishnan, Kayarat
2015-01-01
Endonucleolytic double-strand DNA break production requires separate strand cleavage events. Although catalytic mechanisms for simple dimeric endonucleases are available, there are many complex nuclease machines which are poorly understood in comparison. Here we studied the single polypeptide Type ISP restriction-modification (RM) enzymes, which cleave random DNA between distant target sites when two enzymes collide following convergent ATP-driven translocation. We report the 2.7 Angstroms resolution X-ray crystal structure of a Type ISP enzyme-DNA complex, revealing that both the helicase-like ATPase and nuclease are unexpectedly located upstream of the direction of translocation, inconsistent with simple nuclease domain-dimerization. Using single-molecule and biochemical techniques, we demonstrate that each ATPase remodels its DNA-protein complex and translocates along DNA without looping it, leading to a collision complex where the nuclease domains are distal. Sequencing of single cleavage events suggests a previously undescribed endonuclease model, where multiple, stochastic strand nicking events combine to produce DNA scission. PMID:26389736
Plumed-Ferrer, C; Barberio, A; Franklin-Guild, R; Werner, B; McDonough, P; Bennett, J; Gioia, G; Rota, N; Welcome, F; Nydam, D V; Moroni, P
2015-09-01
In total, 181 streptococci-like bacteria isolated from intramammary infections (IMI) were submitted by a veterinary clinic to Quality Milk Production Services (QMPS, Cornell University, Ithaca, NY). The isolates were characterized by sequence analysis, and 46 Lactococcus lactis ssp. lactis and 47 Lactococcus garvieae were tested for susceptibility to 17 antibiotics. No resistant strains were found for β-lactam antibiotics widely used in clinical practice (penicillin, ampicillin, and amoxicillin), and all minimum inhibitory concentrations (MIC) were far from the resistance breakpoints. Eight strains had MIC intermediate to cefazolin. The random amplification of polymorphic DNA (RAPD)-PCR fingerprint patterns showed a slightly higher heterogeneity for Lc. lactis ssp. lactis isolates than for Lc. garvieae isolates. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Assessing Date Palm Genetic Diversity Using Different Molecular Markers.
Atia, Mohamed A M; Sakr, Mahmoud M; Adawy, Sami S
2017-01-01
Molecular marker technologies which rely on DNA analysis provide powerful tools to assess biodiversity at different levels, i.e., among and within species. A range of different molecular marker techniques have been developed and extensively applied for detecting variability in date palm at the DNA level. Recently, the employment of gene-targeting molecular marker approaches to study biodiversity and genetic variations in many plant species has increased the attention of researchers interested in date palm to carry out phylogenetic studies using these novel marker systems. Molecular markers are good indicators of genetic distances among accessions, because DNA-based markers are neutral in the face of selection. Here we describe the employment of multidisciplinary molecular marker approaches: amplified fragment length polymorphism (AFLP), start codon targeted (SCoT) polymorphism, conserved DNA-derived polymorphism (CDDP), intron-targeted amplified polymorphism (ITAP), simple sequence repeats (SSR), and random amplified polymorphic DNA (RAPD) to assess genetic diversity in date palm.
Identification of a new retrotransposable element in loblolly pine
M.N. Islam-Faridi; A.M. Morse; K.E. Smith; J.M. Davis; S. Garcia; H.V. Amerson; M.A. Majid; T.L. Kubisiak; C.D. Nelson
2005-01-01
We initiated a project to locate the genomic position of fusiform rust resistance gene 1 (Fr1) in loblolly pine using fluorescent in situ hybridization (FISH). Four random amplified polymorphic DNA (RAPD) markers previously found to be tightly linked to Fr1 were cloned and sequenced, providing a total coverage of about 2 Kb. In order to obtain discernible signal of...
NASA Astrophysics Data System (ADS)
Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.
2017-07-01
A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, using a permutation of the four DNA base codes: adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of a sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have now become highly advanced, high-throughput sequencing technologies. So far, DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing can produce ambiguous results, making it difficult to determine whether a given code is A, T, G, or C. To solve this problem, in this study we introduce another representation of DNA codes, namely the Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probabilities that the bases A, T, G, C appear in Q, and PA + PT + PG + PC = 1. Furthermore, using Quaternion representations we are able to construct an improved scoring matrix for global sequence alignment by applying a dot product method. Moreover, this scoring matrix produces better, higher-quality match and mismatch scores between two DNA base codes. In the implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from GenBank, while the target DNA sequence was received from our collaborator's database. As a result, we found that the Quaternion representations improve the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
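The idea of scoring ambiguous bases through a dot product of probability vectors can be sketched in a few lines. The authors worked in Octave and do not give their exact scoring formula in the abstract, so everything specific here is an assumption made for illustration: the equal-probability vectors for the ambiguity codes N, R and Y, the interpolation between a match and a mismatch score, the linear gap penalty, and all function names.

```python
# Probability vectors in the order (PA, PT, PG, PC); each sums to 1.
# The vectors for N, R and Y assume equal probabilities (an assumption).
BASE_VECTORS = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "T": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "C": (0.0, 0.0, 0.0, 1.0),
    "N": (0.25, 0.25, 0.25, 0.25),   # any base
    "R": (0.5, 0.0, 0.5, 0.0),       # purine: A or G
    "Y": (0.0, 0.5, 0.0, 0.5),       # pyrimidine: T or C
}


def score(a, b, match=1.0, mismatch=-1.0):
    """Expected score of aligning symbols a and b.

    The dot product is the probability that both positions hold the same
    base; the score interpolates between match and mismatch (assumed form).
    """
    dot = sum(x * y for x, y in zip(BASE_VECTORS[a], BASE_VECTORS[b]))
    return match * dot + mismatch * (1.0 - dot)


def needleman_wunsch(s, t, gap=-2.0):
    """Optimal global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [i * gap]
        for j, b in enumerate(t, 1):
            cur.append(max(prev[j - 1] + score(a, b),   # align a with b
                           prev[j] + gap,               # gap in t
                           cur[j - 1] + gap))           # gap in s
        prev = cur
    return prev[-1]
```

With this scoring, an unambiguous match contributes the full match score, while aligning a definite base against N contributes a value between match and mismatch, so ambiguous positions lower the score less than outright mismatches do.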
Yokota, Shin-ichi; Konno, Mutsuko; Fujiwara, Shin-ichi; Toita, Nariaki; Takahashi, Michiko; Yamamoto, Soh; Ogasawara, Noriko; Shiraishi, Tsukasa
2015-10-01
The infection route of Helicobacter pylori has been recognized to be mainly intrafamilial, preferentially mother-to-child, especially in developed countries. To determine the transmission route, we examined whether multilocus sequence typing (MLST) was useful for analysis of intrafamilial infection. The possibility of intraspousal infection was also evaluated. Clonal relationships between strains derived from 35 index Japanese pediatric patients, and their family members were analyzed by two genetic typing procedures, MLST and random amplified polymorphic DNA (RAPD) fingerprinting. Mostly coincident results were obtained by MLST and RAPD. By MLST, the allele of loci in the isolates mostly matched between the index child and both the father and mother for 9 (25.7%) of the 35 patients, between the index child and the mother for 25 (60.0%) of the 35 patients. MLST is useful for analyzing the infection route of H. pylori as a highly reproducible method. Intrafamilial, especially mother-to-children and sibling, infection is the dominant transmission route. Intraspousal infection is also thought to occur in about a quarter in the Japanese families. © 2015 John Wiley & Sons Ltd.
Fu, J J; Mei, Z Q; Tania, M; Yang, L Q; Cheng, J L; Khan, M A
2015-05-25
The sequence-characterized amplified region (SCAR) is a valuable molecular technique for the genetic identification of any species. This method is mainly derived from the molecular cloning of the amplified DNA fragments achieved from the random amplified polymorphic DNA (RAPD). In this study, we collected DNA from 10 species of Ganoderma mushroom and amplified the DNA using an improved RAPD technique. The amplified fragments were then cloned into a T-vector, and positive clones were screened, identified, and sequenced for the development of SCAR markers. After designing PCR primers and optimizing PCR conditions, 4 SCAR markers, named LZ1-4, LZ2-2, LZ8-2, and LZ9-15, were developed, which were specific to Ganoderma gibbosum (LZ1-4 and LZ8-2), Ganoderma sinense (LZ2-2 and LZ8-2), Ganoderma tropicum (LZ8-2), and Ganoderma lucidum HG (LZ9-15). These 4 novel SCAR markers were deposited into GenBank with the accession Nos. KM391935, KM391936, KM391937, and KM391938, respectively. Thus, in this study we developed specific SCAR markers for the identification and authentication of different Ganoderma species.
Zambelli, Filippo; Mertens, Joke; Dziedzicka, Dominika; Sterckx, Johan; Markouli, Christina; Keller, Alexander; Tropel, Philippe; Jung, Laura; Viville, Stephane; Van de Velde, Hilde; Geens, Mieke; Seneca, Sara; Sermon, Karen; Spits, Claudia
2018-06-07
In this study, we deep-sequenced the mtDNA of human embryonic and induced pluripotent stem cells (hESCs and hiPSCs) and their source cells and found that the majority of variants pre-existed in the cells used to establish the lines. Early-passage hESCs carried few and low-load heteroplasmic variants, similar to those identified in oocytes and inner cell masses. The number and heteroplasmic loads of these variants increased with prolonged cell culture. The study of 120 individual cells of early- and late-passage hESCs revealed a significant diversity in mtDNA heteroplasmic variants at the single-cell level and that the variants that increase during time in culture are always passenger to the appearance of chromosomal abnormalities. We found that early-passage hiPSCs carry much higher loads of mtDNA variants than hESCs, which single-fibroblast sequencing proved pre-existed in the source cells. Finally, we show that these variants are stably transmitted during short-term differentiation. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
Grawunder, U; Lieber, M R
1997-01-01
The recombination activating gene (RAG) 1 and 2 proteins are required for initiation of V(D)J recombination in vivo and have been shown to be sufficient to introduce DNA double-strand breaks at recombination signal sequences (RSSs) in a cell-free assay in vitro. RSSs consist of a highly conserved palindromic heptamer that is separated from a slightly less conserved A/T-rich nonamer by either a 12 or 23 bp spacer of random sequence. Despite the high sequence specificity of RAG-mediated cleavage at RSSs, direct binding of the RAG proteins to these sequences has been difficult to demonstrate by standard methods. Even when this can be demonstrated, questions about the order of events for an individual RAG-RSS complex will require methods that monitor aspects of the complex during transitions from one step of the reaction to the next. Here we have used template-independent DNA polymerase terminal deoxynucleotidyl transferase (TdT) in order to assess occupancy of the reaction intermediates by the RAG complex during the reaction. In addition, this approach allows analysis of the accessibility of end products of a RAG-catalyzed cleavage reaction for N nucleotide addition. The results indicate that RAG proteins form a long-lived complex with the RSS once the initial nick is generated, because the 3'-OH group at the nick remains obstructed for TdT-catalyzed N nucleotide addition. In contrast, the 3'-OH group generated at the signal end after completion of the cleavage reaction can be efficiently tailed by TdT, suggesting that the RAG proteins disassemble from the signal end after DNA double-strand cleavage has been completed. Therefore, a single RAG complex maintains occupancy from the first step (nick formation) to the second step (cleavage). In addition, the results suggest that N region diversity at V(D)J junctions within rearranged immunoglobulin and T cell receptor gene loci can only be introduced after the generation of RAG-catalyzed DNA double-strand breaks, i.e. during the DNA end joining phase of the V(D)J recombination reaction. PMID:9060432
Nicosia, Aldo; Maggio, Teresa; Mazzola, Salvatore; Cuttitta, Angela
2013-10-30
Anemonia viridis is a widespread and extensively studied Mediterranean species of sea anemone from which a large number of polypeptide toxins, such as blood depressing substances (BDS) peptides, have been isolated. The first members of this class, BDS-1 and BDS-2, are polypeptides belonging to the β-defensin fold family and were initially described for their antihypertensive and antiviral activities. BDS-1 and BDS-2 are 43 amino acid peptides characterised by three disulfide bonds that act as neurotoxins affecting Kv3.1, Kv3.2 and Kv3.4 channel gating kinetics. In addition, BDS-1 inactivates the Nav1.7 and Nav1.3 channels. The development of a large dataset of A. viridis expressed sequence tags (ESTs) and the identification of 13 putative BDS-like cDNA sequences has attracted interest, especially as scientific and diagnostic tools. A comparison of BDS cDNA sequences showed that the untranslated regions are more conserved than the protein-coding regions. Moreover, the KA/KS ratios calculated for all pairwise comparisons showed values greater than 1, suggesting mechanisms of accelerated evolution. The structures of the BDS homologs were predicted by molecular modelling. All toxins possess similar 3D structures that consist of a triple-stranded antiparallel β-sheet and an additional small antiparallel β-sheet located downstream of the cleavage/maturation site; however, the orientation of the triple-stranded β-sheet appears to differ among the toxins. To characterise the spatial expression profile of the putative BDS cDNA sequences, tissue-specific cDNA libraries, enriched for BDS transcripts, were constructed. In addition, the proper amplification of ectodermal or endodermal markers ensured the tissue specificity of each library. 
Sequencing randomly selected clones from each library revealed ectodermal-specific expression of ten BDS transcripts, while transcripts of BDS-8, BDS-13, BDS-14 and BDS-15 failed to be retrieved, likely due to under-representation in our cDNA libraries. The calculation of the relative abundance of BDS transcripts in the cDNA libraries revealed that BDS-1, BDS-3, BDS-4, BDS-5 and BDS-6 are the most represented transcripts.
Large-Scale Concatenation cDNA Sequencing
Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.
1997-01-01
A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174
Mariella, Jr., Raymond P.
2008-11-18
A method of synthesizing a desired double-stranded DNA of a predetermined length and of a predetermined sequence. Preselected sequence segments that will complete the desired double-stranded DNA are determined. Preselected segment sequences of DNA that will be used to complete the desired double-stranded DNA are provided. The preselected segment sequences of DNA are assembled to produce the desired double-stranded DNA.
Homologous and heterologous recombination between adenovirus vector DNA and chromosomal DNA.
Stephen, Sam Laurel; Sivanandam, Vijayshankar Ganesh; Kochanek, Stefan
2008-11-01
Adenovirus vector DNA is perceived to remain as an episome following gene transfer. We quantitatively and qualitatively analysed recombination between high capacity adenoviral vector (HC-AdV) and chromosomal DNA following gene transfer in vitro. We studied homologous and heterologous recombination with a single HC-AdV carrying (i) a large genomic HPRT fragment with the HPRT CHICAGO mutation causing translational stop upon homologous recombination with the HPRT locus and (ii) a selection marker to allow for clonal selection in the event of heterologous recombination. We analysed the sequences at the junctions between vector and chromosomal DNA. In primary cells and in cell lines, the frequency of homologous recombination ranged from 2 x 10^-5 to 1.6 x 10^-6. Heterologous recombination occurred at rates between 5.5 x 10^-3 and 1.1 x 10^-4. HC-AdV DNA integrated via the termini mostly as intact molecules. Analysis of the junction sequences indicated vector integration in a relatively random manner without an obvious preference for particular chromosomal regions, but with a preference for integration into genes. Integration into protooncogenes or tumor suppressor genes was not observed. Patchy homologies between vector termini and chromosomal DNA were found at the site of integration. Although the majority of integrations had occurred without causing mutations in the chromosomal DNA, cases of nucleotide substitutions and insertions were observed. In several cases, deletions of even relatively large chromosomal regions were likely. These results extend previous information on the integration patterns of adenovirus vector DNA and contribute to a risk-benefit assessment of adenovirus-mediated gene transfer.
Nanopore Technology: A Simple, Inexpensive, Futuristic Technology for DNA Sequencing.
Gupta, P D
2016-10-01
In health care, the importance of DNA sequencing has been fully established. Sanger's capillary electrophoresis DNA sequencing methodology is time-consuming and cumbersome, and hence expensive. Lately, because of its versatility, DNA sequencing has become a household name, and therefore there is an urgent need for a simple, fast, inexpensive DNA sequencing technology. At the beginning of this century efforts were made, and nanopore DNA sequencing technology was developed; it is still in its infancy, but nevertheless it is the futuristic technology.
The genome-wide DNA sequence specificity of the anti-tumour drug bleomycin in human cells.
Murray, Vincent; Chen, Jon K; Tanaka, Mark M
2016-07-01
The cancer chemotherapeutic agent, bleomycin, cleaves DNA at specific sites. For the first time, the genome-wide DNA sequence specificity of bleomycin breakage was determined in human cells. Utilising Illumina next-generation DNA sequencing techniques, over 200 million bleomycin cleavage sites were examined to elucidate the bleomycin genome-wide DNA selectivity. The genome-wide bleomycin cleavage data were analysed by four different methods to determine the cellular DNA sequence specificity of bleomycin strand breakage. For the most highly cleaved DNA sequences, the preferred site of bleomycin breakage was at 5'-GT* dinucleotide sequences (where the asterisk indicates the bleomycin cleavage site), with lesser cleavage at 5'-GC* dinucleotides. This investigation also determined longer bleomycin cleavage sequences, with preferred cleavage at 5'-GT*A and 5'-TGT* trinucleotide sequences, and 5'-TGT*A tetranucleotides. For cellular DNA, the hexanucleotide DNA sequence 5'-RTGT*AY (where R is a purine and Y is a pyrimidine) was the most highly cleaved DNA sequence. It was striking that alternating purine-pyrimidine sequences were highly cleaved by bleomycin. The highest intensity cleavage sites in cellular and purified DNA were very similar although there were some minor differences. Statistical nucleotide frequency analysis indicated a G nucleotide was present at the -3 position (relative to the cleavage site) in cellular DNA but was absent in purified DNA.
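As a small illustration of the reported consensus, the hexanucleotide 5'-RTGT*AY can be located in a sequence with a regular expression. This is only a motif scan on the given strand, not the authors' genome-wide analysis; the function name and the choice to report the 0-based index of the T marked by the asterisk are assumptions made for this sketch.

```python
import re

# R = purine (A or G), Y = pyrimidine (C or T). A lookahead is used so that
# overlapping motif instances (e.g. ATGTATGTAC) are all reported.
MOTIF = re.compile(r"(?=([AG]TGTA[CT]))")


def preferred_sites(seq):
    """Return (motif, index of the asterisk-marked T) for each 5'-RTGT*AY hit.

    Indices are 0-based positions in `seq`; only the given strand is scanned.
    """
    seq = seq.upper()
    return [(m.group(1), m.start() + 3) for m in MOTIF.finditer(seq)]


for motif, pos in preferred_sites("ccATGTACggATGTATGTAC"):
    print(motif, pos)
```

To cover both strands, the same function could be applied to the reverse complement of the sequence, with the positions mapped back accordingly.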
Edwards, W. Barry
2013-01-01
The aim of this study was to identify potential ligands of PSMA suitable for further development as novel PSMA-targeted peptides using phage display technology. The human PSMA protein was immobilized as a target followed by incubation with a 15-mer phage display random peptide library. After one round of prescreening and two rounds of screening, high-stringency screening at the third round of panning was performed to identify the highest affinity binders. Phages which had a specific binding activity to PSMA in human prostate cancer cells were isolated and the DNA corresponding to the 15-mers were sequenced to provide three consensus sequences: GDHSPFT, SHFSVGS and EVPRLSLLAVFL as well as other sequences that did not display consensus. Two of the peptide sequences deduced from DNA sequencing of binding phages, SHSFSVGSGDHSPFT and GRFLTGGTGRLLRIS were labeled with 5-carboxyfluorescein and shown to bind and co-internalize with PSMA on human prostate cancer cells by fluorescence microscopy. The high stringency requirements yielded peptides with affinities KD∼1 µM or greater which are suitable starting points for affinity maturation. While these values were less than anticipated, the high stringency did yield peptide sequences that apparently bound to different surfaces on PSMA. These peptide sequences could be the basis for further development of peptides for prostate cancer tumor imaging and therapy. PMID:23935860
Loukanov, Alexandre; Filipov, Chavdar; Lecheva, Marta; Emin, Saim
2015-11-01
The immobilization and stretching of randomly coiled DNA molecules on hydrophobic carbon film is a challenging microscopic technique with various applications, especially in genome sequencing. In this report the pyrenyl nucleus is used as an anchor moiety to give double-stranded DNA a higher affinity for the graphite surface. DNA and pyrene are joined through a linker composed of four aliphatic methylene groups. For the preparation of pyrene-terminated DNA, a multifunctional phosphoramidite monomer was designed. It contains a pyrenylbutoxy group as an anchor moiety for π-stacking attachment to the carbon film, and 2-cyanoethyloxy and diisopropylamino groups as coupling groups for conjugation to an activated oligonucleotide chain or DNA molecule. This monomer derivative was suitable for incorporation into automated solid-phase DNA synthesis and was attached to the 5' terminus of the DNA chain through a phosphodiester linkage. The successful immobilization and stretching of pyrene-terminated DNA was demonstrated with a conventional 100 kV transmission electron microscope. The microscopic analysis confirmed the stretched shape of the negatively charged nucleic acid pieces on the hydrophobic carbon film. © 2015 Wiley Periodicals, Inc.
Kuhn, G C S; Teo, C H; Schwarzacher, T; Heslop-Harrison, J S
2009-05-01
Satellite DNA (satDNA) is a major component of genomes but relatively little is known about the fine-scale organization of unrelated satDNAs residing at the same chromosome location, and the sequence structure and dynamics of satDNA junctions. We studied the organization and sequence junctions of two nonhomologous satDNAs, pBuM and DBC-150, in three species from the neotropical Drosophila buzzatii cluster (repleta group). In situ hybridization to microchromosomes, interphase nuclei and extended DNA fibers showed frequent interspersion of the two satellites in D. gouveai, D. antonietae and, to a lesser extent, D. seriema. We isolated by PCR six pBuM x DBC-150 junctions: four are exclusive to D. gouveai and two are exclusive to D. antonietae. The six junction breakpoints occur at different positions within monomers, suggesting independent origin. Four junctions showed abrupt transitions between the two satellites, whereas two junctions showed a distinct 10 bp tandem duplication before the junction. Unlike pBuM, DBC-150 junction repeats are more variable than randomly cloned monomers and showed diagnostic features in common with a 3-monomer higher-order repeat seen in the sister species D. serido. The high levels of interspersion between pBuM and DBC-150 repeats suggest extensive rearrangements between the two satellites, possibly favored by specific features of the microchromosomes. Our interpretation is that the junctions evolved by multiple events of illegitimate recombination between nonhomologous satDNA repeats, with subsequent rounds of unequal crossing-over expanding the copy number of some of the junctions.
Si, Zengzhi; Du, Bing; Huo, Jinxi; He, Shaozhen; Liu, Qingchang; Zhai, Hong
2016-11-21
Sweetpotato, Ipomoea batatas (L.) Lam., is an important food crop widely grown in the world. However, little is known about the genome of this species because it is a highly heterozygous hexaploid. Gaining a more in-depth knowledge of the sweetpotato genome is therefore necessary and imperative. In this study, the first bacterial artificial chromosome (BAC) library of sweetpotato was constructed. Clones from the BAC library were end-sequenced and analyzed to provide genome-wide information about this species. The BAC library contained 240,384 clones with an average insert size of 101 kb and had a 7.93-10.82 × coverage of the genome, and the probability of isolating any single-copy DNA sequence from the library was more than 99%. Both ends of 8310 BAC clones randomly selected from the library were sequenced to generate 11,542 high-quality BAC-end sequences (BESs), with an accumulative length of 7,595,261 bp and an average length of 658 bp. Analysis of the BESs revealed that 12.17% of the sweetpotato genome was known repetitive DNA, including 7.37% long terminal repeat (LTR) retrotransposons, 1.15% non-LTR retrotransposons and 1.42% Class II DNA transposons; 18.31% of the genome was identified as sweetpotato-unique repetitive DNA and 10.00% of the genome was predicted to be coding regions. In total, 3,846 simple sequence repeats (SSRs) were identified, with a density of one SSR per 1.93 kb, from which 288 SSR primers were designed and tested for length polymorphism using 20 sweetpotato accessions; 173 (60.07%) of them produced polymorphic bands. Sweetpotato BESs had significant hits to the genome sequences of I. trifida and more matches to the whole-genome sequences of Solanum lycopersicum than those of Vitis vinifera, Theobroma cacao and Arabidopsis thaliana. The first BAC library for sweetpotato has been successfully constructed.
The high-quality BESs provide first insights into sweetpotato genome composition, and have significant hits to the genome sequences of I. trifida and more matches to the whole-genome sequences of Solanum lycopersicum. These resources will serve as a robust platform for high-resolution mapping, gene cloning, assembly of genome sequences, comparative genomics and evolutionary studies of sweetpotato.
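The coverage and recovery-probability figures quoted in this abstract can be checked with a short calculation. The sketch below assumes the standard library-representation formula P = 1 - (1 - I/G)^N (insert size I, genome size G, N clones); the abstract does not state which formula the authors used. The two genome sizes are back-calculated here from the reported 7.93-10.82 × coverage range and are therefore assumptions, not values taken from the source.

```python
def library_coverage(n_clones, insert_bp, genome_bp):
    """Genome equivalents represented in the library (N * I / G)."""
    return n_clones * insert_bp / genome_bp


def prob_sequence_present(n_clones, insert_bp, genome_bp):
    """Probability that a given single-copy sequence is in at least one clone."""
    fraction = insert_bp / genome_bp  # share of the genome held by one clone
    return 1.0 - (1.0 - fraction) ** n_clones


if __name__ == "__main__":
    N_CLONES = 240_384      # from the abstract
    INSERT_BP = 101_000     # average insert size, from the abstract
    # Assumed genome sizes, back-calculated from the reported coverage range.
    for genome_bp in (2.244e9, 3.062e9):
        cov = library_coverage(N_CLONES, INSERT_BP, genome_bp)
        p = prob_sequence_present(N_CLONES, INSERT_BP, genome_bp)
        print(f"G = {genome_bp:.3e} bp: coverage = {cov:.2f}x, P = {p:.5f}")
```

Under these assumptions both genome-size estimates give a probability above 99%, in line with the figure reported above.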
Tóth, Júlia; van Aelst, Kara; Salmons, Hannah; Szczelkun, Mark D.
2012-01-01
DNA cleavage by the Type III Restriction–Modification (RM) enzymes requires the binding of a pair of RM enzymes at two distant, inversely orientated recognition sequences followed by helicase-catalysed ATP hydrolysis and long-range communication. Here we addressed the dissociation from DNA of these enzymes at two stages: during long-range communication and following DNA cleavage. First, we demonstrated that a communicating species can be trapped in a DNA domain without a recognition site, with a non-specific DNA association lifetime of ∼200 s. If free DNA ends were present the lifetime became too short to measure, confirming that ends accelerate dissociation. Secondly, we observed that Type III RM enzymes can dissociate upon DNA cleavage and go on to cleave further DNA molecules (they can ‘turnover’, albeit inefficiently). The relationship between the observed cleavage rate and enzyme concentration indicated independent binding of each site and a requirement for simultaneous interaction of at least two enzymes per DNA to achieve cleavage. In light of various mechanisms for helicase-driven motion on DNA, we suggest these results are most consistent with a thermally driven random 1D search model (i.e. ‘DNA sliding’). PMID:22523084
Single Cell Total RNA Sequencing through Isothermal Amplification in Picoliter-Droplet Emulsion.
Fu, Yusi; Chen, He; Liu, Lu; Huang, Yanyi
2016-11-15
Prevalent single cell RNA amplification and sequencing chemistries mainly focus on polyadenylated RNAs in eukaryotic cells by using oligo(dT) primers for reverse transcription. We develop a new RNA amplification method, "easier-seq", to reverse transcribe and amplify the total RNAs, both with and without polyadenylate tails, from a single cell for transcriptome sequencing with high efficiency, reproducibility, and accuracy. By distributing the reverse transcribed cDNA molecules into 1.5 × 10^5 aqueous droplets in oil, the cDNAs are isothermally amplified using random primers in each of these 65-pL reactors separately. This new method greatly improves the ease of single-cell RNA sequencing by reducing the experimental steps. Meanwhile, with less chance to induce errors, this method can easily maintain the quality of single-cell sequencing. In addition, this polyadenylate-tail-independent method can be seamlessly applied to prokaryotic cell RNA sequencing.
Al-Atiyat, R M; Aljumaah, R S
2014-08-27
This study aimed to estimate evolutionary distances and to reconstruct phylogenetic trees between different Awassi sheep populations. Thirty-two sheep individuals from three different geographical areas of Jordan and the Kingdom of Saudi Arabia (KSA) were randomly sampled. DNA was extracted from the tissue samples and sequenced using the T7 promoter universal primer. Different phylogenetic trees were reconstructed from 0.64-kb DNA sequences using the MEGA software with the best general time reversible distance model. Three methods of distance estimation were then used. The maximum composite likelihood test was considered for reconstructing maximum likelihood, neighbor-joining and UPGMA trees. The maximum likelihood tree indicated three major clusters separated by cytosine (C) and thymine (T). The greatest distance was shown between the South sheep and North sheep. On the other hand, the KSA sheep as an outgroup showed shorter evolutionary distance to the North sheep population than to the others. The neighbor-joining and UPGMA trees showed quite reliable clusters of evolutionary differentiation of Jordan sheep populations from the Saudi population. The overall results support geographical information and ecological types of the sheep populations studied. Summing up, the resulting phylogenetic trees may contribute to the limited information about the genetic relatedness and phylogeny of Awassi sheep in nearby Arab countries.
Read clouds uncover variation in complex regions of the human genome
Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E.; West, Robert; Sidow, Arend; Batzoglou, Serafim
2015-01-01
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. PMID:26286554
A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences
Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L.
2017-01-01
An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5′-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5′-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed near quantitatively. Shorter than full-length DNA sequences are first washed away from the capture support; the solid-phase purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. PMID:28628204
Yao, Lin; Yang, Qian; Song, Jinzhu; Tan, Chong; Guo, Changhong; Wang, Li; Qu, Lianhai; Wang, Yun
2013-04-01
Trichoderma harzianum 88, a filamentous soil fungus, is an effective biocontrol agent against several plant pathogens. High-throughput sequencing was used here to study the mycoparasitism mechanisms of T. harzianum 88. Plate confrontation tests of T. harzianum 88 against plant pathogens were conducted, and a cDNA library was constructed from T. harzianum 88 mycelia in the presence of plant pathogen cell walls. Randomly selected transcripts from the cDNA library were compared with eukaryotic plant and fungal genomes. Of the 1,386 transcripts sequenced, the most abundant Gene Ontology (GO) classification group was "physiological process". Differential expression of 19 genes was confirmed by real-time RT-PCR at different mycoparasitism stages against plant pathogens. Gene expression analysis revealed the transcription of various genes involved in mycoparasitism of T. harzianum 88. Our study provides helpful insights into the mechanisms of T. harzianum 88-plant pathogen interactions.
Facilitated sequence counting and assembly by template mutagenesis
Levy, Dan; Wigler, Michael
2014-01-01
Presently, inferring the long-range structure of the DNA templates is limited by short read lengths. Accurate template counts suffer from distortions occurring during PCR amplification. We explore the utility of introducing random mutations in identical or nearly identical templates to create distinguishable patterns that are inherited during subsequent copying. We simulate the applications of this process under assumptions of error-free sequencing and perfect mapping, using cytosine deamination as a model for mutation. The simulations demonstrate that within readily achievable conditions of nucleotide conversion and sequence coverage, we can accurately count the number of otherwise identical molecules as well as connect variants separated by long spans of identical sequence. We discuss many potential applications, such as transcript profiling, isoform assembly, haplotype phasing, and de novo genome assembly. PMID:25313059
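The idea of making identical templates distinguishable can be illustrated with a toy simulation. The sketch below assumes, as the abstract does, error-free sequencing, and models cytosine deamination as independent C-to-T conversion at a fixed per-site rate. The function names, the template, and the conversion rate are hypothetical choices for this example and are not taken from the paper.

```python
import random


def deaminate(template, rate, rng):
    """Convert each C to T independently with probability `rate`."""
    return "".join(
        "T" if base == "C" and rng.random() < rate else base
        for base in template
    )


def count_distinct_patterns(template, n_copies, rate, seed=0):
    """Mutagenize n_copies of one template and count distinct outcomes.

    If every copy receives a unique pattern, the number of distinct
    patterns equals the number of original molecules.
    """
    rng = random.Random(seed)
    return len({deaminate(template, rate, rng) for _ in range(n_copies)})


if __name__ == "__main__":
    template = "ACGT" * 40  # 160 nt with 40 convertible cytosines
    for rate in (0.0, 0.05, 0.5):
        n = count_distinct_patterns(template, n_copies=100, rate=rate)
        print(f"conversion rate {rate}: {n} distinct patterns from 100 copies")
```

With no conversion all copies collapse to a single pattern, whereas at intermediate rates the number of distinct patterns approaches the true molecule count, which is the property the counting application relies on.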
Environmental distribution, abundance and activity of the Miscellaneous Crenarchaeotal Group
NASA Astrophysics Data System (ADS)
Lloyd, K. G.; Biddle, J.; Teske, A.
2011-12-01
Many marine sedimentary microbes have only been identified by 16S rRNA sequences. Consequently, little is known about the types of metabolism, activity levels, or relative abundance of these groups in marine sediments. We found that one of these uncultured groups, called the Miscellaneous Crenarchaeotal Group (MCG), dominated clone libraries made from reverse transcribed 16S rRNA, and 454 pyrosequenced 16S rRNA genes, in the White Oak River estuary. Primers suitable for quantitative PCR were developed for MCG and used to show that 16S rRNA DNA copy numbers from MCG account for nearly all the archaeal 16S rRNA genes present. RT-qPCR shows much less MCG rRNA than total archaeal rRNA, but comparisons of different primers for each group suggest bias in the RNA-based work relative to the DNA-based work. There is no evidence of a population shift with depth below the sulfate-methane transition zone, suggesting that the metabolism of MCG may not be tied to sulfur or methane cycles. We classified 2,771 new sequences within the SSU Silva 106 database; together with the sequences already classified in the Silva database, these were used to build an MCG database of 4,646 sequences, which allowed us to increase the number of named MCG subgroups from 7 to 19. Percent terrestrial sequences in each subgroup is positively correlated with percent of the marine sequences that are nearshore, suggesting that membership in the different subgroups is not random, but dictated by environmental selective pressures. Given their high phylogenetic diversity, ubiquitous distribution in anoxic environments, and high DNA copy number relative to total archaea, members of MCG are most likely anaerobic heterotrophs that are integral to the post-depositional marine carbon cycle.
An improved model for whole genome phylogenetic analysis by Fourier transform.
Yin, Changchuan; Yau, Stephen S-T
2015-10-07
DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determining factor, and the 2D mapping reduces the nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra have the same dimensionality in the Fourier frequency space, and the Euclidean distances between the full Fourier power spectra of the DNA sequences are then used as the dissimilarity metrics. The improved DFT method, with increased computational performance by 2D numerical representation, can be applicable to any DNA sequences of different length ranges. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets.
The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied on phylogenetic analysis for individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
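The general pipeline described in this abstract (numerical mapping, DFT power spectrum, length equalization, Euclidean distance) can be sketched in a few lines. This is only an illustration under stated assumptions: the complex-plane mapping below is a hypothetical stand-in for the authors' 2D representation, and plain linear interpolation replaces their even scaling algorithm, neither of which is specified in the abstract. The DFT is written out directly for clarity and is only practical for short sequences.

```python
import cmath
import math

# Hypothetical 2D mapping (one complex number per base); an assumption,
# not the representation defined in the paper.
MAPPING = {"A": -1j, "T": 1j, "C": -1 + 0j, "G": 1 + 0j}


def power_spectrum(seq):
    """Power spectrum |X[k]|^2 of the mapped sequence via a direct O(n^2) DFT."""
    x = [MAPPING[base] for base in seq.upper()]
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n)
    ]


def stretch(spec, m):
    """Stretch a spectrum to length m by linear interpolation.

    Stand-in for the paper's even scaling step (an assumption).
    """
    n = len(spec)
    if n == m:
        return list(spec)
    out = []
    for i in range(m):
        pos = i * (n - 1) / (m - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(spec[lo] * (1 - frac) + spec[hi] * frac)
    return out


def dft_distance(seq_a, seq_b):
    """Euclidean distance between length-equalized power spectra."""
    pa, pb = power_spectrum(seq_a), power_spectrum(seq_b)
    m = max(len(pa), len(pb))
    pa, pb = stretch(pa, m), stretch(pb, m)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(pa, pb)))


if __name__ == "__main__":
    print(dft_distance("ACGTACGT", "ACGTACGA"))
    print(dft_distance("ACGTACGT", "ACGTACGTACGT"))
```

A pairwise matrix of such distances could then be passed to any hierarchical clustering routine to build a tree, as the abstract describes.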
Capture-SELEX: Selection of DNA Aptamers for Aminoglycoside Antibiotics
2012-01-01
Small organic molecules are challenging targets for aptamer selection using the SELEX technology (SELEX: Systematic Evolution of Ligands by EXponential enrichment). Often they are not suitable for immobilization on solid surfaces, which is a common procedure in known aptamer selection methods. The Capture-SELEX procedure allows the selection of DNA aptamers for solute targets. A special SELEX library was constructed with the aim to immobilize this library on magnetic beads or other surfaces. For this purpose a docking sequence was incorporated into the random region of the library enabling hybridization to a complementary oligo fixed on magnetic beads. Oligonucleotides of the library which exhibit high affinity to the target and a secondary structure fitting to the target are released from the beads for binding to the target during the aptamer selection process. The oligonucleotides of these binding complexes were amplified, purified, and immobilized via the docking sequence to the magnetic beads as the starting point of the following selection round. Based on this Capture-SELEX procedure, the successful DNA aptamer selection for the aminoglycoside antibiotic kanamycin A as a small molecule target is described. PMID:23326761
Chang, S C; Macêdo, D P C; Souza-Motta, C M; Oliveira, N T
2013-08-12
Fusarium verticillioides is a pathogen of agriculturally important crops, especially maize. It is considered one of the most important pathogens responsible for fumonisin contamination of food products, which causes severe, chronic, and acute intoxication in humans and animals. Moreover, it is recognized as a cause of localized infections in immunocompetent patients and disseminated infections among severely immunosuppressed patients. Several molecular tools have been used to analyze the intraspecific variability of fungi. The objective of this study was to use molecular markers to compare pathogenic isolates of F. verticillioides and isolates of the same species obtained from clinical samples of patients with Fusarium mycoses. The molecular markers that we used were inter-simple sequence repeat markers (primers GTG5 and GACA4), intron splice site primer (primer EI1), random amplified polymorphic DNA marker (primer OPW-6), and restriction fragment length polymorphism-internal transcribed spacer (ITS) from rDNA. From the data obtained, clusters were generated based on the UPGMA clustering method. The amplification products obtained using primers ITS4 and ITS5 and loci ITS1-5.8-ITS2 of the rDNA yielded fragments of approximately 600 bp for all the isolates. Digestion of the ITS region fragment using restriction enzymes such as EcoRI, DraI, BshI, AluI, HaeIII, HinfI, MspI, and PstI did not permit differentiation among pathogenic and clinical isolates. The inter-simple sequence repeat, intron splice site primer, and random amplified polymorphic DNA markers presented high genetic homogeneity among clinical isolates in contrast to the high variability found among the phytopathogenic isolates of F. verticillioides.
Abu Salim, Kamariah; Chase, Mark W.; Dexter, Kyle G.; Pennington, R. Toby; Tan, Sylvester; Kaye, Maria Ellen; Samuel, Rosabelle
2017-01-01
DNA barcoding is a fast and reliable tool to assess and monitor biodiversity and, via community phylogenetics, to investigate ecological and evolutionary processes that may be responsible for the community structure of forests. In this study, DNA barcodes for the two widely used plastid coding regions rbcL and matK are used to contribute to identification of morphologically undetermined individuals, as well as to investigate phylogenetic structure of tree communities in 70 subplots (10 × 10 m) of a 25-ha forest-dynamics plot in Brunei (Borneo, Southeast Asia). The combined matrix (rbcL + matK) comprised 555 haplotypes (from ≥154 genera, 68 families and 25 orders sensu APG, Angiosperm Phylogeny Group, 2016), making a substantial contribution to tree barcode sequences from Southeast Asia. Barcode sequences were used to reconstruct phylogenetic relationships using maximum likelihood, both with and without constraining the topology of taxonomic orders to match that proposed by the Angiosperm Phylogeny Group. A third phylogenetic tree was reconstructed using the program Phylomatic to investigate the influence of phylogenetic resolution on results. Detection of non-random patterns of community assembly was determined by net relatedness index (NRI) and nearest taxon index (NTI). In most cases, community assembly was either random or phylogenetically clustered, which likely indicates the importance of habitat filtering based on phylogenetically correlated traits in determining community structure. Different phylogenetic trees gave similar overall results, but the Phylomatic tree produced greater variation across plots for NRI and NTI values, presumably due to noise introduced by using an unresolved phylogenetic tree. Our results suggest that using a DNA barcode tree has benefits over the traditionally used Phylomatic approach by increasing precision and accuracy and allowing the incorporation of taxonomically unidentified individuals into analyses.
PMID:29049301
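The net relatedness index mentioned above is conventionally computed as a standardized effect size of the mean pairwise phylogenetic distance (MPD): NRI = -(MPD_obs - mean(MPD_null)) / sd(MPD_null), with positive values indicating phylogenetic clustering. The sketch below assumes that conventional definition and a simple null model that draws the same number of taxa at random from the species pool; the abstract does not say which null model the authors used, and the toy distance table is invented for illustration.

```python
import itertools
import random
import statistics


def mpd(dist, taxa):
    """Mean pairwise distance among taxa; dist is keyed by frozenset pairs."""
    pairs = list(itertools.combinations(taxa, 2))
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def nri(dist, community, pool, n_null=999, seed=0):
    """Net relatedness index under a random-draw null model (assumed)."""
    rng = random.Random(seed)
    observed = mpd(dist, community)
    null = [mpd(dist, rng.sample(pool, len(community))) for _ in range(n_null)]
    return -(observed - statistics.mean(null)) / statistics.stdev(null)


if __name__ == "__main__":
    # Toy pool: two clades, distance 1 within a clade and 10 between clades.
    pool = ["a1", "a2", "a3", "b1", "b2", "b3"]
    dist = {
        frozenset((x, y)): (1.0 if x[0] == y[0] else 10.0)
        for x, y in itertools.combinations(pool, 2)
    }
    print("single-clade community:", nri(dist, ["a1", "a2", "a3"], pool))
    print("mixed community:", nri(dist, ["a1", "a2", "b1"], pool))
```

The nearest taxon index follows the same pattern with the mean nearest-taxon distance in place of MPD.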
Brikun, I; Suziedelis, K; Berg, D E
1994-01-01
Derivatives of Escherichia coli K-12 of known ancestry were characterized by random amplified polymorphic DNA (RAPD) fingerprinting to better understand genome evolution in this family of closely related strains. This sensitive method entails PCR amplification with arbitrary primers at low stringency and yields arrays of anonymous DNA fragments that are strain specific. Among 150 fragments scored, eight were polymorphic in that they were produced from some but not all strains. Seven polymorphic bands were chromosomal, and one was from the F-factor plasmid. Five of the six mapped polymorphic chromosomal bands came from just 7% of the genome, a 340-kb segment that includes the terminus of replication. Two of these were from the cryptic Rac prophage, and the inability to amplify them from strains was attributable to deletion (excision) or to rearrangement of Rac. Two other terminus-region segments that resulted in polymorphic bands appeared to have sustained point mutations that affected the ability to amplify them. Control experiments showed that RAPD bands from the 340-kb terminus-region segment and also from two plasmids (P1 and F) were represented in approximate proportion to their size. Optimization experiments showed that the concentration of thermostable polymerase strongly affected the arrays of RAPD products obtained. Comparison of RAPD polymorphisms and positions of strains exhibiting them in the pedigree suggests that many sequence changes occurred in these historic E. coli strains during their storage. We propose that the clustering of such mutations near the terminus reflects errors during completion of chromosome replication, possibly during slow growth in the stab cultures that were often used to store E. coli strains in the early years of bacterial genetics. PMID:8132463
Mason, Christopher E.; Shu, Feng-Jue; Wang, Cheng; Session, Ryan M.; Kallen, Roland G.; Sidell, Neil; Yu, Tianwei; Liu, Mei Hui; Cheung, Edwin; Kallen, Caleb B.
2010-01-01
Location analysis for estrogen receptor-α (ERα)-bound cis-regulatory elements was determined in MCF7 cells using chromatin immunoprecipitation (ChIP)-on-chip. Here, we present the estrogen response element (ERE) sequences that were identified at ERα-bound loci and quantify the incidence of ERE sequences under two stringencies of detection: <10% and 10–20% nucleotide deviation from the canonical ERE sequence. We demonstrate that ∼50% of all ERα-bound loci do not have a discernable ERE and show that most ERα-bound EREs are not perfect consensus EREs. Approximately one-third of all ERα-bound ERE sequences reside within repetitive DNA sequences, most commonly of the AluS family. In addition, the 3-bp spacer between the inverted ERE half-sites, rather than being random nucleotides, is C(A/T)G-enriched at bona fide receptor targets. Diverse ERα-bound loci were validated using electrophoretic mobility shift assay and ChIP-polymerase chain reaction (PCR). The functional significance of receptor-bound loci was demonstrated using luciferase reporter assays which proved that repetitive element ERE sequences contribute to enhancer function. ChIP-PCR demonstrated estrogen-dependent recruitment of the coactivator SRC3 to these loci in vivo. Our data demonstrate that ERα binds to widely variant EREs with less sequence specificity than had previously been suspected and that binding at repetitive and nonrepetitive genomic targets is favored by specific trinucleotide spacers. PMID:20047966
Mason, Christopher E; Shu, Feng-Jue; Wang, Cheng; Session, Ryan M; Kallen, Roland G; Sidell, Neil; Yu, Tianwei; Liu, Mei Hui; Cheung, Edwin; Kallen, Caleb B
2010-04-01
Location analysis for estrogen receptor-alpha (ERalpha)-bound cis-regulatory elements was determined in MCF7 cells using chromatin immunoprecipitation (ChIP)-on-chip. Here, we present the estrogen response element (ERE) sequences that were identified at ERalpha-bound loci and quantify the incidence of ERE sequences under two stringencies of detection: <10% and 10-20% nucleotide deviation from the canonical ERE sequence. We demonstrate that approximately 50% of all ERalpha-bound loci do not have a discernable ERE and show that most ERalpha-bound EREs are not perfect consensus EREs. Approximately one-third of all ERalpha-bound ERE sequences reside within repetitive DNA sequences, most commonly of the AluS family. In addition, the 3-bp spacer between the inverted ERE half-sites, rather than being random nucleotides, is C(A/T)G-enriched at bona fide receptor targets. Diverse ERalpha-bound loci were validated using electrophoretic mobility shift assay and ChIP-polymerase chain reaction (PCR). The functional significance of receptor-bound loci was demonstrated using luciferase reporter assays which proved that repetitive element ERE sequences contribute to enhancer function. ChIP-PCR demonstrated estrogen-dependent recruitment of the coactivator SRC3 to these loci in vivo. Our data demonstrate that ERalpha binds to widely variant EREs with less sequence specificity than had previously been suspected and that binding at repetitive and nonrepetitive genomic targets is favored by specific trinucleotide spacers.
Gender Identification in Date Palm Using Molecular Markers.
Awan, Faisal Saeed; Maryam; Jaskani, Muhammad J; Sadia, Bushra
2017-01-01
Breeding of date palm is complicated because of its long life cycle and heterozygous nature. Sexual propagation of date palm does not produce true-to-type plants. Sex of date palms cannot be identified until the first flowering stage. Molecular markers such as random amplified polymorphic DNA (RAPD), sequence-characterized amplified regions (SCAR), and simple sequence repeats (SSR) have successfully been used to identify the sex-linked loci in the plant genome and to isolate the corresponding genes. This chapter highlights the use of three molecular markers including RAPD, SCAR, and SSR to identify the gender of date palm seedlings.
A genetic linkage map for hazelnut (Corylus avellana L.) based on RAPD and SSR markers
Shawn A. Mehlenbacher; Rebecca N. Brown; Eduardo R. Nouhra; Tufan Gokirmak; Nahla V. Bassil; Thomas L. Kubisiak
2006-01-01
A linkage map for European hazelnut (Corylus avellana L.) was constructed using random amplified polymorphic DNA (RAPD) and simple sequence repeat (SSR) markers and the 2-way pseudotestcross approach. A full-sib population of 144 seedlings from the cross OSU 252.146 x OSU 414.062 was used. RAPD markers in testcross configuration, segregating 1:1, were...
Yin, Changchuan
2015-04-01
To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal-to-noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction by a Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
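The 3-periodicity SNR mentioned above can be sketched for the baseline 4D binary representation (one 0/1 indicator sequence per base). This is an illustration under stated assumptions, not the GCC method: the SNR is taken here as the power at frequency N/3 divided by the mean power over all non-zero frequencies, which is one common definition and may differ from the one used in the study. The function names are hypothetical, and the naive DFT is only practical for short sequences.

```python
import cmath


def power_spectrum(seq: str) -> list[float]:
    """Total power S[k]: sum over A, C, G, T of |DFT of that base's 0/1 indicator|^2."""
    n = len(seq)
    spectrum = [0.0] * n
    for base in "ACGT":
        indicator = [1.0 if b == base else 0.0 for b in seq.upper()]
        for k in range(n):
            x = sum(
                indicator[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)
            )
            spectrum[k] += abs(x) ** 2
    return spectrum


def periodicity_snr(seq: str) -> float:
    """Power at k = N/3 relative to the mean power over k = 1..N-1 (assumed definition)."""
    n = len(seq)
    if n == 0 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    s = power_spectrum(seq)
    mean_power = sum(s[1:]) / (n - 1)
    return s[n // 3] / mean_power


if __name__ == "__main__":
    print(periodicity_snr("ATG" * 20))   # strongly 3-periodic toy sequence
    print(periodicity_snr("ACGT" * 15))  # 4-periodic toy sequence, no peak at N/3
```

A sliding-window version of the same calculation is the usual way to turn this into an STFT-style exon scan; the window length would then be a further assumption.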
Single-cell genomic sequencing using Multiple Displacement Amplification.
Lasken, Roger S
2007-10-01
Single microbial cells can now be sequenced using DNA amplified by the Multiple Displacement Amplification (MDA) reaction. The few femtograms of DNA in a bacterium are amplified into micrograms of high molecular weight DNA suitable for DNA library construction and Sanger sequencing. The MDA-generated DNA also performs well when used directly as template for pyrosequencing by the 454 Life Sciences method. While MDA from single cells loses some of the genomic sequence, this approach will greatly accelerate the pace of sequencing from uncultured microbes. The genetically linked sequences from single cells are also a powerful tool to be used in guiding genomic assembly of shotgun sequences of multiple organisms from environmental DNA extracts (metagenomic sequences).
Encounter times of chromatin loci influenced by polymer decondensation
NASA Astrophysics Data System (ADS)
Amitai, A.; Holcman, D.
2018-03-01
The time for a DNA sequence to find its homologous counterpart depends on a long random search inside the cell nucleus. Using polymer models, we compute here the mean first encounter time (MFET) between two sites located on two different polymer chains and confined locally by potential wells. We find that reducing tethering forces acting on the polymers results in local decondensation, and numerical simulations of the polymer model show that these changes are associated with a reduction of the MFET by several orders of magnitude. We derive here a new asymptotic formula for the MFET, confirmed by Brownian simulations. We conclude from the present modeling approach that the fast search for homology is mediated by a local chromatin decondensation due to the release of multiple chromatin tethering forces. The present scenario could explain how the homologous recombination pathway for double-stranded DNA repair is controlled by its random search step.
DNA barcode analysis: a comparison of phylogenetic and statistical classification methods.
Austerlitz, Frederic; David, Olivier; Schaeffer, Brigitte; Bleakley, Kevin; Olteanu, Madalina; Leblois, Raphael; Veuille, Michel; Laredo, Catherine
2009-11-10
DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignation methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was molecular diversity of the data. Addition of genetically independent loci--nuclear genes--improved the predictive performance of most methods. The study implies that taxonomists can influence the quality of their analyses either by choosing a method best-adapted to the configuration of their sample, or, given a certain method, increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
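The "one nearest neighbour" rule singled out above is simple enough to sketch directly. The version below assumes pre-aligned barcode sequences of equal length and uses Hamming distance; the study may have used a different distance, and the reference sequences and species labels here are invented toy data.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def one_nearest_neighbour(query: str, references: list[tuple[str, str]]) -> str:
    """Return the species label of the reference sequence closest to `query`.

    `references` is a list of (species, aligned_sequence) pairs.
    Ties are resolved in favour of the earliest reference in the list.
    """
    species, _ = min(references, key=lambda ref: hamming(query, ref[1]))
    return species


if __name__ == "__main__":
    # Hypothetical toy reference set, not real CO1 data.
    refs = [
        ("sp_A", "ACGTACGTAC"),
        ("sp_A", "ACGTACGTAT"),
        ("sp_B", "TCGAACGGAC"),
    ]
    print(one_nearest_neighbour("ACGTACGTAA", refs))
```

Because the rule has no tunable parameters beyond the distance itself, it has little to mis-specify when the data configuration changes, which is consistent with the robustness reported in the abstract.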
Hamula, Camille L A; Peng, Hanyong; Wang, Zhixin; Tyrrell, Gregory J; Li, Xing-Fang; Le, X Chris
2016-03-15
Streptococcus pyogenes is a clinically important pathogen consisting of various serotypes determined by different M proteins expressed on the cell surface. The M type is therefore a useful marker to monitor the spread of invasive S. pyogenes in a population. Serotyping and nucleic acid amplification/sequencing methods for the identification of M types are laborious, inconsistent, and usually confined to reference laboratories. The primary objective of this work is to develop a technique that enables generation of aptamers binding to specific M-types of S. pyogenes. We describe here an in vitro technique that directly used live bacterial cells and the Systematic Evolution of Ligands by Exponential Enrichment (SELEX) strategy. Live S. pyogenes cells were incubated with DNA libraries consisting of 40-nucleotide randomized sequences. Those sequences that bound to the cells were separated, amplified using polymerase chain reaction (PCR), purified using gel electrophoresis, and served as the input DNA pool for the next round of SELEX selection. A specially designed forward primer containing extended polyA20/5Sp9 facilitated gel electrophoresis purification of ssDNA after PCR amplification. A counter-selection step using non-target cells was introduced to improve selectivity. DNA libraries of different starting sequence diversity (10^16 and 10^14) were compared. Aptamer pools from each round of selection were tested for their binding to the target and non-target cells using flow cytometry. Selected aptamer pools were then cloned and sequenced. Individual aptamer sequences were screened on the basis of their binding to the 10 M-types that were used as targets. Aptamer pools obtained from SELEX rounds 5-8 showed high affinity to the target S. pyogenes cells. Tests against non-target Streptococcus bovis, Streptococcus pneumoniae, and Enterococcus species demonstrated selectivity of these aptamers for binding to S. pyogenes.
Several aptamer sequences were found to bind preferentially to the M11 M-type of S. pyogenes. Estimated binding dissociation constants (Kd) were in the low nanomolar range for the M11 specific sequences; for example, sequence E-CA20 had a Kd of 7±1 nM. These affinities are comparable to those of a monoclonal antibody. The improved bacterial cell-SELEX technique is successful in generating aptamers selective for S. pyogenes and some of its M-types. These aptamers are potentially useful for detecting S. pyogenes, achieving binding profiles of the various M-types, and developing new M-typing technologies for non-specialized laboratories or point-of-care testing. Copyright © 2015 Elsevier Inc. All rights reserved.
Ma, Y-Z; Tomita, M
2013-01-01
Thinopyrum intermedium is a useful source of resistance genes for Barley Yellow Dwarf Virus (BYDV), one of the most damaging wheat diseases. In this study, wheat/Th. intermedium translocation lines with a BYDV resistance gene were developed using the Th. intermedium 7Ai-1 chromosome. Genomic in situ hybridization (GISH), using a Th. intermedium total genomic DNA probe, enabled detection of 7Ai-1-derived small chromatins containing a BYDV resistance gene, which were translocated onto the end of wheat chromosomes in the lines Y95011 and Y960843. Random amplified polymorphic DNA (RAPD) analyses using 120 random 10-mer primers were conducted to compare the BYDV-resistant translocation lines with susceptible lines. Two primers amplified the DNA fragments specific to the resistant line that would be useful as molecular markers to identify 7Ai-1-derived BYDV resistance chromatin in the wheat genome. Additionally, the isolated Th. intermedium-specific retrotransposon-like sequence pTi28 can be used to identify Th. intermedium chromatin transferred to the wheat genome.
Acquisition of New DNA Sequences After Infection of Chicken Cells with Avian Myeloblastosis Virus
Shoyab, M.; Baluda, M. A.; Evans, R.
1974-01-01
DNA-RNA hybridization studies between 70S RNA from avian myeloblastosis virus (AMV) and an excess of DNA from (i) AMV-induced leukemic chicken myeloblasts or (ii) a mixture of normal and of congenitally infected K-137 chicken embryos producing avian leukosis viruses revealed the presence of fast- and slow-hybridizing virus-specific DNA sequences. However, the leukemic cells contained twice the level of AMV-specific DNA sequences observed in normal chicken embryonic cells. The fast-reacting sequences were two to three times more numerous in leukemic DNA than in DNA from the mixed embryos. The slow-reacting sequences had a reiteration frequency of approximately 9 and 6, in the two respective systems. Both the fast- and the slow-reacting DNA sequences in leukemic cells exhibited a higher Tm (2 C) than the respective DNA sequences in normal cells. In normal and leukemic cells the slow hybrid sequences appeared to have a Tm which was 2 C higher than that of the fast hybrid sequences. Individual non-virus-producing chicken embryos, either group-specific antigen positive or negative, contained 40 to 100 copies of the fast sequences and 2 to 6 copies of the slowly hybridizing sequences per cell genome. Normal rat cells did not contain DNA that hybridized with AMV RNA, whereas non-virus-producing rat cells transformed by B-77 avian sarcoma virus contained only the slowly reacting sequences. The results demonstrate that leukemic cells transformed by AMV contain new AMV-specific DNA sequences which were not present before infection. PMID:16789139
Zhao, Yinhe; Wang, Guoying; Zhang, Jinpeng; Yang, Junbo; Peng, Shang; Gao, Lianming; Li, Chengyun; Hu, Jinyong; Li, Dezhu; Gao, Lizhi
2006-07-01
Asarum caudigerum (Aristolochiaceae) is an important species of paleoherb in relation to understanding the origin and evolution of angiosperm flowers, due to its basal position in the angiosperms. The aim of this study was to isolate floral-related genes from A. caudigerum, and to infer evolutionary relationships among florally expression-related genes, to further illustrate the origin and diversification of flowers in angiosperms. A subtracted floral cDNA library was constructed from floral buds using suppression subtractive hybridization (SSH). The cDNA of floral buds and leaves at the seedling stage were used as a tester and a driver, respectively. To further identify the function of putative MADS-box transcription factors, phylogenetic trees were reconstructed in order to infer evolutionary relationships within the MADS-box gene family. In the forward-subtracted floral cDNA library, 1920 clones were randomly sequenced, from which 567 unique expressed sequence tags (ESTs) were obtained. Among them, 127 genes failed to show significant similarity to any published sequences in GenBank and thus are putatively novel genes. Phylogenetic analysis indicated that a total of 29 MADS-box transcription factors were members of the APETALA3(AP3) subfamily, while nine others were putative MADS-box transcription factors that formed a cluster with MADS-box genes isolated from Amborella, the basal-most angiosperm, and those from the gymnosperms. This suggests that the origin of A. caudigerum is intermediate between the angiosperms and gymnosperms.
3-base periodicity in coding DNA is affected by intercodon dinucleotides
Sánchez, Joaquín
2011-01-01
All coding DNAs exhibit 3-base periodicity (TBP), which may be defined as the tendency of nucleotides and higher order n-tuples, e.g. trinucleotides (triplets), to be preferentially spaced by 3, 6, 9, etc. bases, and we have proposed an association between TBP and clustering of same-phase triplets. We here investigated if TBP was affected by intercodon dinucleotide tendencies and whether clustering of same-phase triplets was involved. Under a constant protein sequence, intercodon dinucleotide frequencies depend on the distribution of synonymous codons. So, possible effects were revealed by randomly exchanging synonymous codons without altering protein sequences to subsequently document changes in TBP via frequency distribution of distances (FDD) of DNA triplets. A tripartite positive correlation was found between intercodon dinucleotide frequencies, clustering of same-phase triplets and TBP. So, intercodon C|A (where “|” indicates the boundary between codons) was more frequent in native human DNA than in the codon-shuffled sequences; higher C|A frequency occurred along with more frequent clustering of C|AN triplets (where N jointly represents A, C, G and T) and with intense CAN TBP. The opposite was found for C|G, which was less frequent in native than in shuffled sequences; lower C|G frequency occurred together with reduced clustering of C|GN triplets and with less intense CGN TBP. We hence propose that intercodon dinucleotides affect TBP via same-phase triplet clustering. A possible biological relevance of our findings is briefly discussed. PMID:21814388
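The codon-shuffling control described above can be sketched as follows. This is one plausible reading, not the authors' code: codons are permuted only among positions that encode the same amino acid, which keeps both the protein sequence and the overall codon usage fixed while changing which codons sit next to each other. It assumes an in-frame, upper-case sequence over A, C, G, T and the standard genetic code; all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(dna: str) -> list[str]:
    """Split an in-frame sequence into codons, ignoring any trailing partial codon."""
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def translate(dna: str) -> str:
    return "".join(CODE[c] for c in codons(dna))


def shuffle_synonymous(dna: str, rng: random.Random) -> str:
    """Permute codons among positions that encode the same amino acid."""
    cds = codons(dna)
    slots = defaultdict(list)
    for i, c in enumerate(cds):
        slots[CODE[c]].append(i)
    out = cds[:]
    for positions in slots.values():
        pool = [cds[i] for i in positions]
        rng.shuffle(pool)
        for i, c in zip(positions, pool):
            out[i] = c
    return "".join(out)


def intercodon_dinucleotides(dna: str) -> Counter:
    """Count dinucleotides spanning codon boundaries (last base | first base)."""
    cds = codons(dna)
    return Counter(a[2] + b[0] for a, b in zip(cds, cds[1:]))


if __name__ == "__main__":
    native = "ATGGCCGCAGCTCTGTTAAAGAAATGA"  # toy sequence, not real data
    shuffled = shuffle_synonymous(native, random.Random(0))
    print(translate(native) == translate(shuffled))
    print(intercodon_dinucleotides(native))
    print(intercodon_dinucleotides(shuffled))
```

Comparing the C|A and C|G counts between a native sequence and many shuffled replicates gives the kind of native-versus-shuffled contrast the abstract reports.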
Matsuda, M; Tazumi, A; Kagawa, S; Sekizuka, T; Murayama, O; Moore, JE; Millar, BC
2006-01-01
Background At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis) are available, and they differ at a few nucleotide positions. It is therefore important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximate full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. When the approximate full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France was cloned, sequenced and compared, nucleotide sequence differences were demonstrated at six loci in the 1,469-nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501 where substitutions and deletions were noted. PMID:16398935
Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth
2015-01-01
ABSTRACT Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative d-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. PMID:25968644
Development of new strains and related SCAR markers for an edible mushroom, Hypsizygus marmoreus.
Lee, Chang Y; Park, Jeong-Eun; Lee, Jia; Kim, Jong-Kuk; Ro, Hyeon-Su
2012-02-01
New fast-growing and less bitter varieties of Hypsizygus marmoreus were developed by crossing monokaryotic mycelia from a commercial strain (Hm1-1) and a wild strain (Hm3-10). Six of the better tasting new strains with a shorter cultivation period were selected from 400 crosses in a large-scale cultivation experiment. We attempted to develop sequence characterized amplified region (SCAR) markers to identify the new strain from other commercial strains. For the SCAR markers, we conducted molecular genetic analysis on a wild strain and the eight most cultivated H. marmoreus strains collected from various areas in East Asia by randomly amplified polymorphic DNA. Ten unique DNA bands for a commercial Hm1-1 strain and the Hm3-10 strain were extracted and their sequences were determined. Primer sets were designed based on the determined sequences. PCR reactions with the primer sets revealed that four primer sets successfully discriminated the new strains from other commercial strains and are thus suitable for commercial purposes. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
McCutchen-Maloney, Sandra L.
2002-01-01
DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera which gives a decrease in signal.
Sanderson, Nicholas D.; Atkins, Bridget L.; Brent, Andrew J.; Cole, Kevin; Foster, Dona; McNally, Martin A.; Oakley, Sarah; Peto, Leon; Taylor, Adrian; Peto, Tim E. A.; Crook, Derrick W.; Eyre, David W.
2017-01-01
ABSTRACT Culture of multiple periprosthetic tissue samples is the current gold standard for microbiological diagnosis of prosthetic joint infections (PJI). Additional diagnostic information may be obtained through culture of sonication fluid from explants. However, current techniques can have relatively low sensitivity, with prior antimicrobial therapy and infection by fastidious organisms influencing results. We assessed if metagenomic sequencing of total DNA extracts obtained direct from sonication fluid can provide an alternative rapid and sensitive tool for diagnosis of PJI. We compared metagenomic sequencing with standard aerobic and anaerobic culture in 97 sonication fluid samples from prosthetic joint and other orthopedic device infections. Reads from Illumina MiSeq sequencing were taxonomically classified using Kraken. Using 50 derivation samples, we determined optimal thresholds for the number and proportion of bacterial reads required to identify an infection and confirmed our findings in 47 independent validation samples. Compared to results from sonication fluid culture, the species-level sensitivity of metagenomic sequencing was 61/69 (88%; 95% confidence interval [CI], 77 to 94%; for derivation samples 35/38 [92%; 95% CI, 79 to 98%]; for validation samples, 26/31 [84%; 95% CI, 66 to 95%]), and genus-level sensitivity was 64/69 (93%; 95% CI, 84 to 98%). Species-level specificity, adjusting for plausible fastidious causes of infection, species found in concurrently obtained tissue samples, and prior antibiotics, was 85/97 (88%; 95% CI, 79 to 93%; for derivation samples, 43/50 [86%; 95% CI, 73 to 94%]; for validation samples, 42/47 [89%; 95% CI, 77 to 96%]). High levels of human DNA contamination were seen despite the use of laboratory methods to remove it. Rigorous laboratory good practice was required to minimize bacterial DNA contamination. We demonstrate that metagenomic sequencing can provide accurate diagnostic information in PJI. 
Our findings, combined with the increasing availability of portable, random-access sequencing technology, offer the potential to translate metagenomic sequencing into a rapid diagnostic tool in PJI. PMID:28490492
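The sensitivity figures quoted in this abstract are simple proportions with binomial confidence intervals, which can be sketched as below. The abstract does not say which interval method was used; this sketch assumes the Wilson score interval, so its bounds may differ by a percentage point or two from the reported ones (an exact method would give slightly wider intervals). The function name is hypothetical.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Species-level sensitivity reported in the abstract: 61 of 69.
    tp, positives = 61, 69
    low, high = wilson_interval(tp, positives)
    print(f"sensitivity {tp / positives:.0%}, Wilson 95% CI {low:.0%} to {high:.0%}")
```

The same function applies unchanged to the specificity counts (for example 85 of 97) and to the derivation and validation subsets.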
Holahan, Matthew R.; Madularu, Dan; McConnell, Erin M.; Walsh, Ryan; DeRosa, Maria C.
2011-01-01
Systemic administration of the noncompetitive NMDA-receptor antagonist, MK-801, has been proposed to model cognitive deficits similar to those seen in patients with schizophrenia. The present work investigated the ability of a dopamine-binding DNA aptamer to regulate these MK-801-induced cognitive deficits when injected into the nucleus accumbens. Rats were trained to bar press for chocolate pellet rewards then randomly assigned to receive an intra-accumbens injection of a DNA aptamer (200 nM; n = 7), tris buffer (n = 6) or a randomized DNA oligonucleotide (n = 7). Animals were then treated systemically with MK-801 (0.1 mg/kg) and tested for their ability to extinguish their bar pressing response. Two control groups were also included that did not receive MK-801. Data revealed that injection of Tris buffer or the random oligonucleotide sequence into the nucleus accumbens prior to treatment with MK-801 did not reduce the MK-801-induced extinction deficit. Animals continued to press at a high rate over the entire course of the extinction session. Injection of the dopamine aptamer reversed this MK-801-induced elevation in lever pressing to levels as seen in rats not treated with MK-801. Tests for activity showed that the aptamer did not impair locomotor activity. Results demonstrate the in vivo utility of DNA aptamers as tools to investigate neurobiological processes in preclinical animal models of mental health disease. PMID:21779401
Direct observation of single flexible polymers using single stranded DNA†
Brockman, Christopher; Kim, Sun Ju
2012-01-01
Over the last 15 years, double stranded DNA (dsDNA) has been used as a model polymeric system for nearly all single polymer dynamics studies. However, dsDNA is a semiflexible polymer with markedly different molecular properties compared to flexible chains, including synthetic organic polymers. In this work, we report a new system for single polymer studies of flexible chains based on single stranded DNA (ssDNA). We developed a method to synthesize ssDNA for fluorescence microscopy based on rolling circle replication, which generates long strands (>65 kb) of ssDNA containing “designer” sequences, thereby preventing intramolecular base pair interactions. Polymers are synthesized to contain amine-modified bases randomly distributed along the backbone, which enables uniform labelling of polymer chains with a fluorescent dye to facilitate fluorescence microscopy and imaging. Using this approach, we synthesized ssDNA chains with long contour lengths (>30 μm) and relatively low dye loading ratios (~1 dye per 100 bases). In addition, we used epifluorescence microscopy to image single ssDNA polymer molecules stretching in flow in a microfluidic device. Overall, we anticipate that ssDNA will serve as a useful model system to probe the dynamics of polymeric materials at the molecular level. PMID:22956981
Methylation patterns of repetitive DNA sequences in germ cells of Mus musculus.
Sanford, J; Forrester, L; Chapman, V; Chandley, A; Hastie, N
1984-03-26
The major and the minor satellite sequences of Mus musculus were undermethylated in both sperm and oocyte DNAs relative to the amount of undermethylation observed in adult somatic tissue DNA. This hypomethylation was specific for satellite sequences in sperm DNA. Dispersed repetitive and low copy sequences show a high degree of methylation in sperm DNA; however, a dispersed repetitive sequence was undermethylated in oocyte DNA. This finding suggests a difference in the amount of total genomic DNA methylation between sperm and oocyte DNA. The methylation levels of the minor satellite sequences did not change during spermiogenesis, and were not associated with the onset of meiosis or a specific stage in sperm development.
Process of labeling specific chromosomes using recombinant repetitive DNA
Moyzis, R.K.; Meyne, J.
1988-02-12
Chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members and consensus sequences of the repetitive DNA families for the chromosome preferential sequences. The selected low homology regions are then hybridized with chromosomes to determine those low homology regions hybridized with a specific chromosome under normal stringency conditions.
Host-Associated Metagenomics: A Guide to Generating Infectious RNA Viromes
Robert, Catherine; Pascalis, Hervé; Michelle, Caroline; Jardot, Priscilla; Charrel, Rémi; Raoult, Didier; Desnues, Christelle
2015-01-01
Background Metagenomic analyses have been widely used in the last decade to describe viral communities in various environments or to identify the etiology of human, animal, and plant pathologies. Here, we present a simple and standardized protocol that allows for the purification and sequencing of RNA viromes from complex biological samples with an important reduction of host DNA and RNA contaminants, while preserving the infectivity of viral particles. Principal Findings We evaluated different viral purification steps, random reverse transcriptions and sequence-independent amplifications of a pool of representative RNA viruses. Viruses remained infectious after the purification process. We then validated the protocol by sequencing the RNA virome of human body lice engorged in vitro with artificially contaminated human blood. The full genomes of the most abundant viruses absorbed by the lice during the blood meal were successfully sequenced. Interestingly, random amplifications differed in the genome coverage of segmented RNA viruses. Moreover, the majority of reads were taxonomically identified, and only 7–15% of all reads were classified as “unknown”, depending on the random amplification method. Conclusion The protocol reported here could easily be applied to generate RNA viral metagenomes from complex biological samples of different origins. Our protocol allows further virological characterizations of the described viral communities because it preserves the infectivity of viral particles and allows for the isolation of viruses. PMID:26431175
Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.
Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook
2014-11-01
As many structures of protein-DNA complexes have become known in the past years, several computational methods have been developed to predict DNA-binding sites in proteins. However, its inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3% and an MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and an MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data.
To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
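For readers comparing the reported scores, the three metrics can be computed from a binary confusion matrix as sketched below. The counts in the example are illustrative values chosen for a balanced toy case; they are not taken from the study, which does not report its confusion matrices in the abstract. The function name is hypothetical, and no guard is included for empty rows or columns.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, F-measure (F1) and Matthews correlation coefficient."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"accuracy": accuracy, "f_measure": f_measure, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical balanced example: 100 binding and 100 non-binding nucleotides.
    print(binary_metrics(tp=67, fp=33, tn=67, fn=33))
```

In a balanced case like this one, an accuracy near 67% corresponds to an MCC near 0.34, which is why the two figures in the abstract move together.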
A Three-Dimensional Model of the Yeast Genome
NASA Astrophysics Data System (ADS)
Noble, William; Duan, Zhi-Jun; Andronescu, Mirela; Schutz, Kevin; McIlwain, Sean; Kim, Yoo Jung; Lee, Choli; Shendure, Jay; Fields, Stanley; Blau, C. Anthony
Layered on top of information conveyed by DNA sequence and chromatin are higher order structures that encompass portions of chromosomes, entire chromosomes, and even whole genomes. Interphase chromosomes are not positioned randomly within the nucleus, but instead adopt preferred conformations. Disparate DNA elements co-localize into functionally defined aggregates or factories for transcription and DNA replication. In budding yeast, Drosophila and many other eukaryotes, chromosomes adopt a Rabl configuration, with arms extending from centromeres adjacent to the spindle pole body to telomeres that abut the nuclear envelope. Nonetheless, the topologies and spatial relationships of chromosomes remain poorly understood. Here we developed a method to globally capture intra- and inter-chromosomal interactions, and applied it to generate a map at kilobase resolution of the haploid genome of Saccharomyces cerevisiae. The map recapitulates known features of genome organization, thereby validating the method, and identifies new features. Extensive regional and higher order folding of individual chromosomes is observed. Chromosome XII exhibits a striking conformation that implicates the nucleolus as a formidable barrier to interaction between DNA sequences at either end. Inter-chromosomal contacts are anchored by centromeres and include interactions among transfer RNA genes, among origins of early DNA replication and among sites where chromosomal breakpoints occur. Finally, we constructed a three-dimensional model of the yeast genome. Our findings provide a glimpse of the interface between the form and function of a eukaryotic genome.
Assessment of genome origins and genetic diversity in the genus Eleusine with DNA markers.
Salimath, S S; de Oliveira, A C; Godwin, I D; Bennetzen, J L
1995-08-01
Finger millet (Eleusine coracana), an allotetraploid cereal, is widely cultivated in the arid and semiarid regions of the world. Three DNA marker techniques, restriction fragment length polymorphism (RFLP), randomly amplified polymorphic DNA (RAPD), and inter simple sequence repeat amplification (ISSR), were employed to analyze 22 accessions belonging to 5 species of Eleusine. An 8 probe--3 enzyme RFLP combination, 18 RAPD primers, and 6 ISSR primers, respectively, revealed 14, 10, and 26% polymorphism in 17 accessions of E. coracana from Africa and Asia. These results indicated a very low level of DNA sequence variability in the finger millets but did allow each line to be distinguished. The different Eleusine species could be easily identified by DNA marker technology and the 16% intraspecific polymorphism exhibited by the two analyzed accessions of E. floccifolia suggested a much higher level of diversity in this species than in E. coracana. Between species, E. coracana and E. indica shared the most markers, while E. indica and E. tristachya shared a considerable number of markers, indicating that these three species form a close genetic assemblage within the Eleusine. Eleusine floccifolia and E. compressa were found to be the most divergent among the species examined. Comparison of RFLP, RAPD, and ISSR technologies, in terms of the quantity and quality of data output, indicated that ISSRs are particularly promising for the analysis of plant genome diversity.
[Variability of nuclear 18S-25S rDNA of Gentiana lutea L. in nature and in tissue culture in vitro].
Mel'nyk, V M; Spiridonova, K V; Andrieiev, I O; Strashniuk, N M; Kunakh, V A
2004-01-01
The 18S-25S rDNA sequence in the genomes of G. lutea plants from different natural populations and from tissue culture was studied by blot hybridization. Ribosomal repeats were shown to be represented by variants that differ in size and in the presence of an additional HindIII restriction site. The genome of an individual plant usually possesses several variants of DNA repeats. Interpopulation variability in their quantitative ratio and in the presence of some of the variants was shown. Modifications of the range of rDNA repeats not exceeding intraspecific variability were observed in callus tissues in comparison with plants of the initial population. The non-randomness of genome modifications in the course of cell adaptation to in vitro conditions makes it possible, to some extent, to forecast these modifications in tissue culture.
NASA Technical Reports Server (NTRS)
Balcer-Kubiczek, E. K.; Meltzer, S. J.; Han, L. H.; Zhang, X. F.; Shi, Z. M.; Harrison, G. H.; Abraham, J. M.
1997-01-01
A novel polymerase chain reaction (PCR)-based method was used to identify candidate genes whose expression is altered in cancer cells by ionizing radiation. Transcriptional induction of randomly selected genes in control versus irradiated human HL60 cells was compared. Among several complementary DNA (cDNA) clones recovered by this approach, one cDNA clone (CL68-5) was downregulated in X-irradiated HL60 cells but unaffected by 12-O-tetradecanoyl phorbol-13-acetate, forskolin, or cyclosporin-A. DNA sequencing of the CL68-5 cDNA revealed 100% nucleotide sequence homology to the reported human Csa-19 gene. Northern blot analysis of RNA from control and irradiated cells revealed the expression of a single 0.7-kilobase (kb) messenger RNA (mRNA) transcript. This 0.7-kb Csa-19 mRNA transcript was also expressed in a variety of human adult and corresponding fetal normal tissues. Moreover, when the effect of X- or fission neutron-irradiation on Csa-19 mRNA was compared in cultured human cells differing in p53 gene status (p53-/- versus p53+/+), downregulation of Csa-19 by X-rays or fission neutrons was similar in p53-wild type and p53-null cell lines. Our results provide the first known example of a radiation-responsive gene in human cancer cells whose expression is not associated with p53, adenylate cyclase or protein kinase C.
Metagenomic characterization of airborne viral DNA diversity in the near-surface atmosphere.
Whon, Tae Woong; Kim, Min-Soo; Roh, Seong Woon; Shin, Na-Ri; Lee, Hae-Won; Bae, Jin-Woo
2012-08-01
Airborne viruses are expected to be ubiquitous in the atmosphere but they still remain poorly understood. This study investigated the temporal and spatial dynamics of airborne viruses and their genotypic characteristics in air samples collected from three distinct land use types (a residential district [RD], a forest [FR], and an industrial complex [IC]) and from rainwater samples freshly precipitated at the RD site (RD-rain). Viral abundance exhibited a seasonal fluctuation in the range between 1.7 × 10(6) and 4.0 × 10(7) viruses m(-3), which increased from autumn to winter and decreased toward spring, but no significant spatial differences were observed. Temporal variations in viral abundance were inversely correlated with seasonal changes in temperature and absolute humidity. Metagenomic analysis of air viromes amplified by rolling-circle phi29 polymerase-based random hexamer priming indicated the dominance of plant-associated single-stranded DNA (ssDNA) geminivirus-related viruses, followed by animal-infecting circovirus-related sequences, with low numbers of nanoviruses and microphages-related genomes. Particularly, the majority of the geminivirus-related viruses were closely related to ssDNA mycoviruses that infect plant-pathogenic fungi. Phylogenetic analysis based on the replication initiator protein sequence indicated that the airborne ssDNA viruses were distantly related to known ssDNA viruses, suggesting that a high diversity of viruses were newly discovered. This research is the first to report the seasonality of airborne viruses and their genetic diversity, which enhances our understanding of viral ecology in temperate regions.
Metagenomic Characterization of Airborne Viral DNA Diversity in the Near-Surface Atmosphere
Whon, Tae Woong; Kim, Min-Soo; Roh, Seong Woon; Shin, Na-Ri; Lee, Hae-Won
2012-01-01
Airborne viruses are expected to be ubiquitous in the atmosphere but they still remain poorly understood. This study investigated the temporal and spatial dynamics of airborne viruses and their genotypic characteristics in air samples collected from three distinct land use types (a residential district [RD], a forest [FR], and an industrial complex [IC]) and from rainwater samples freshly precipitated at the RD site (RD-rain). Viral abundance exhibited a seasonal fluctuation in the range between 1.7 × 10^6 and 4.0 × 10^7 viruses m^-3, which increased from autumn to winter and decreased toward spring, but no significant spatial differences were observed. Temporal variations in viral abundance were inversely correlated with seasonal changes in temperature and absolute humidity. Metagenomic analysis of air viromes amplified by rolling-circle phi29 polymerase-based random hexamer priming indicated the dominance of plant-associated single-stranded DNA (ssDNA) geminivirus-related viruses, followed by animal-infecting circovirus-related sequences, with low numbers of nanoviruses and microphages-related genomes. Particularly, the majority of the geminivirus-related viruses were closely related to ssDNA mycoviruses that infect plant-pathogenic fungi. Phylogenetic analysis based on the replication initiator protein sequence indicated that the airborne ssDNA viruses were distantly related to known ssDNA viruses, suggesting that a high diversity of viruses were newly discovered. This research is the first to report the seasonality of airborne viruses and their genetic diversity, which enhances our understanding of viral ecology in temperate regions. PMID:22623790
Enlightenment of Yeast Mitochondrial Homoplasmy: Diversified Roles of Gene Conversion
Ling, Feng; Mikawa, Tsutomu; Shibata, Takehiko
2011-01-01
Mitochondria have their own genomic DNA. Unlike the nuclear genome, each cell contains hundreds to thousands of copies of mitochondrial DNA (mtDNA). The copies of mtDNA tend to have heterogeneous sequences, due to the high frequency of mutagenesis, but are quickly homogenized within a cell (“homoplasmy”) during vegetative cell growth or through a few sexual generations. Heteroplasmy is strongly associated with mitochondrial diseases, diabetes and aging. Recent studies revealed that the yeast cell has the machinery to homogenize mtDNA, using a common DNA processing pathway with gene conversion; i.e., both genetic events are initiated by a double-stranded break, which is processed into 3′ single-stranded tails. One of the tails is base-paired with the complementary sequence of the recipient double-stranded DNA to form a D-loop (homologous pairing), in which repair DNA synthesis is initiated to restore the sequence lost by the breakage. Gene conversion generates sequence diversity, depending on the divergence between the donor and recipient sequences, especially when it occurs among a number of copies of a DNA sequence family with some sequence variations, such as in immunoglobulin diversification in chicken. MtDNA can be regarded as a sequence family, in which the members tend to be diversified by a high frequency of spontaneous mutagenesis. Thus, it would be interesting to determine why and how double-stranded breakage and D-loop formation induce sequence homogenization in mitochondria and sequence diversification in nuclear DNA. We will review the mechanisms and roles of mtDNA homoplasmy, in contrast to nuclear gene conversion, which diversifies gene and genome sequences, to provide clues toward understanding how the common DNA processing pathway results in such divergent outcomes. PMID:24710143
"First generation" automated DNA sequencing technology.
Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M
2011-10-01
Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
Maggert, Keith A.
2014-01-01
The ribosomal DNA (rDNA) arrays are causal agents in X-Y chromosome pairing in meiosis I of Drosophila males. Despite broad variation in X-linked and Y-linked rDNA copy number, polymorphisms in regulatory/spacer sequences between rRNA genes, and variance in copy number of interrupting R1 and R2 retrotransposable elements, there is little evidence that different rDNA arrays affect pairing efficacy. I investigated whether induced rDNA copy number polymorphisms affect chromosome pairing in a “competitive” situation in which complex pairing configurations were possible using males with XYY constitution. Using a common normal X chromosome, one of two different full-length Y chromosomes, and a third chromosome from a series of otherwise-isogenic rDNA deletions, I detected no differences in X-Y or Y-Y pairing or chromosome segregation frequencies that could not be attributed to random variation alone. This work was performed in the context of an undergraduate teaching program at Texas A&M University, and I discuss the pedagogical utility of this and other such experiments. PMID:24449686
Sun, Qinghui; Ba, Zhaofen; Wu, Guoying; Wang, Wei; Lin, Shuxiang; Yang, Hongjiang
2016-05-01
Carbapenem resistance mechanisms were investigated in 32 imipenem-resistant Pseudomonas aeruginosa clinical isolates recovered from hospitalised children. Sequence analysis revealed that 31 of the isolates had an insertion sequence element ISRP10 disrupting the porin gene oprD, demonstrating that ISRP10 inactivation of oprD conferred imipenem resistance in the majority of the isolates. Multilocus sequence typing (MLST) was used to discriminate the isolates. In total, 11 sequence types (STs) were identified including 3 novel STs, and 68.3% (28/41) of the tested strains were characterised as clone ST253. In combination with random amplified polymorphic DNA (RAPD) analysis, the imipenem-resistant isolates displayed a relatively high degree of genetic variability and were unlikely to be associated with nosocomial infections. Copyright © 2016 Elsevier B.V. and the International Society of Chemotherapy. All rights reserved.
Wu, Jia Qian; Du, Jiang; Rozowsky, Joel; Zhang, Zhengdong; Urban, Alexander E; Euskirchen, Ghia; Weissman, Sherman; Gerstein, Mark; Snyder, Michael
2008-01-03
Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there is still much uncertainty regarding precisely what portion of the genome is transcribed, the exact structures of these novel transcripts, and the levels of the transcripts produced. We have interrogated the transcribed loci in 420 selected ENCyclopedia Of DNA Elements (ENCODE) regions using rapid amplification of cDNA ends (RACE) sequencing. We analyzed annotated known gene regions, but primarily we focused on novel transcriptionally active regions (TARs), which were previously identified by high-density oligonucleotide tiling arrays and on random regions that were not believed to be transcribed. We found RACE sequencing to be very sensitive and were able to detect low levels of transcripts in specific cell types that were not detectable by microarrays. We also observed many instances of sense-antisense transcripts; further analysis suggests that many of the antisense transcripts (but not all) may be artifacts generated from the reverse transcription reaction. Our results show that the majority of the novel TARs analyzed (60%) are connected to other novel TARs or known exons. Of previously unannotated random regions, 17% were shown to produce overlapping transcripts. Furthermore, it is estimated that 9% of the novel transcripts encode proteins. We conclude that RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types. Using this method, it appears that much of the genome is represented in polyA+ RNA. Moreover, a fraction of the novel RNAs can encode protein and are likely to be functional.
Influence of DNA sequence on the structure of minicircles under torsional stress
Wang, Qian; Irobalieva, Rossitza N.; Chiu, Wah; Schmid, Michael F.; Fogg, Jonathan M.; Zechiedrich, Lynn
2017-01-01
The sequence dependence of the conformational distribution of DNA under various levels of torsional stress is an important unsolved problem. Combining theory and coarse-grained simulations shows that the DNA sequence and a structural correlation due to topology constraints of a circle are the main factors that dictate the 3D structure of a 336 bp DNA minicircle under torsional stress. We found that DNA minicircle topoisomers can have multiple bend locations under high torsional stress and that the positions of these sharp bends are determined by the sequence, and by a positive mechanical correlation along the sequence. We showed that simulations and theory are able to provide sequence-specific information about individual DNA minicircles observed by cryo-electron tomography (cryo-ET). We provided a sequence-specific cryo-ET tomogram fitting of DNA minicircles, registering the sequence within the geometric features. Our results indicate that the conformational distribution of minicircles under torsional stress can be designed, which has important implications for using minicircle DNA for gene therapy. PMID:28609782
Importance Sampling of Word Patterns in DNA and Protein Sequences
Chan, Hock Peng; Chen, Louis H.Y.
2010-01-01
Monte Carlo methods can provide accurate p-value estimates of word counting test statistics and are easy to implement. They are especially attractive when an asymptotic theory is absent or when either the search sequence or the word pattern is too short for the application of asymptotic formulae. Naive direct Monte Carlo is undesirable for the estimation of small probabilities because the associated rare events of interest are seldom generated. We propose instead efficient importance sampling algorithms that use controlled insertion of the desired word patterns on randomly generated sequences. The implementation is illustrated on word patterns of biological interest: palindromes and inverted repeats, patterns arising from position-specific weight matrices (PSWMs), and co-occurrences of pairs of motifs. PMID:21128856
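The naive direct Monte Carlo baseline that this abstract contrasts against can be sketched as follows. This is only an illustrative sketch, not the importance sampling algorithm the authors propose: the function names, the uniform i.i.d. background model over A/C/G/T, and the default trial count are all assumptions made for the example.

```python
import random


def count_overlapping(seq, word):
    """Count occurrences of `word` in `seq`, allowing overlaps."""
    return sum(
        1 for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i)
    )


def naive_mc_pvalue(word, seq_len, observed, n_trials=2000,
                    alphabet="ACGT", seed=0):
    """Estimate P(count >= observed) by direct simulation.

    Assumption: background sequences are i.i.d. uniform over `alphabet`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        seq = "".join(rng.choice(alphabet) for _ in range(seq_len))
        if count_overlapping(seq, word) >= observed:
            hits += 1
    return hits / n_trials
```

With a few thousand trials, an event whose true probability is far below 1/n_trials will usually be seen zero times, so the estimate collapses to 0. That is the weakness of direct simulation for small p-values that motivates importance sampling.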
Analysis of DNA Sequences by an Optical Time-Integrating Correlator: Proof-of-Concept Experiments.
1992-05-01
Table of contents (excerpt): DNA Analysis Strategy; 2.1 Representation of DNA Bases; 2.2 DNA Analysis Strategy; 3.0 Custom Generators for DNA Sequences; 3.1 Hardware Design ... List of figures (excerpt): ... of the DNA bases where each base is represented by a 7-bit-long pseudorandom sequence; Figure 4: Coarse analysis of a DNA sequence; Figure 5: Fine ... a 20-base-long database. List of tables (excerpt): Table 1: Short representations of the DNA bases where each base is represented by 7-bit-long ...
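The idea of representing each base by a 7-bit pseudorandom sequence and locating a query by correlation can be sketched in software. The report's actual codes are not given in this excerpt, so the four codes below (cyclic shifts of one length-7 maximal-length sequence) are hypothetical, as are the function names; the sketch also only evaluates base-aligned offsets rather than modelling the optical time-integrating hardware.

```python
# Hypothetical 7-bit codes: cyclic shifts of the m-sequence 1110100.
# In bipolar form (+1/-1), identical codes correlate to 7 and any two
# different codes correlate to -1.
CODES = {"A": "1110100", "C": "0111010", "G": "0011101", "T": "1001110"}


def encode(dna):
    """Map a DNA string to a bipolar (+1/-1) chip sequence, 7 chips per base."""
    return [1 if bit == "1" else -1 for base in dna for bit in CODES[base]]


def correlate(database, query):
    """Correlation score of `query` at every base-aligned offset in `database`."""
    db, q = encode(database), encode(query)
    n_offsets = len(database) - len(query) + 1
    return [
        sum(x * y for x, y in zip(db[7 * k: 7 * k + len(q)], q))
        for k in range(n_offsets)
    ]


def best_match(database, query):
    """Return (offset, score) of the highest correlation peak."""
    scores = correlate(database, query)
    offset = max(range(len(scores)), key=scores.__getitem__)
    return offset, scores[offset]
```

Under these assumed codes, a perfect match of k bases scores 7k and each mismatched base lowers the score by 8, so the exact match stands out as the correlation peak.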
Comparing viral metagenomics methods using a highly multiplexed human viral pathogens reagent
Li, Linlin; Deng, Xutao; Mee, Edward T.; Collot-Teixeira, Sophie; Anderson, Rob; Schepelmann, Silke; Minor, Philip D.; Delwart, Eric
2014-01-01
Unbiased metagenomic sequencing holds significant potential as a diagnostic tool for the simultaneous detection of any previously genetically described viral nucleic acids in clinical samples. Viral genome sequences can also inform on likely phenotypes including drug susceptibility or neutralization serotypes. In this study, the effects of different variables of the laboratory methods often used to generate viral metagenomics libraries on the efficiency of viral detection and virus genome coverage were compared. A biological reagent consisting of 25 different human RNA and DNA viral pathogens was used to estimate the effect of filtration and nuclease digestion, DNA/RNA extraction methods, pre-amplification and the use of different library preparation kits on the detection of viral nucleic acids. Filtration and nuclease treatment led to slight decreases in the percentage of viral sequence reads and number of viruses detected. For nucleic acid extractions, silica spin columns improved viral sequence recovery relative to magnetic beads and Trizol extraction. Pre-amplification using random RT-PCR, while generating more viral sequence reads, resulted in detection of fewer viruses, more overlapping sequences, and lower genome coverage. The ScriptSeq library preparation method retrieved more viruses and a greater fraction of their genomes than the TruSeq and Nextera methods. Viral metagenomics sequencing was able to simultaneously detect up to 22 different viruses in the biological reagent analyzed including all those detected by qPCR. Further optimization will be required for the detection of viruses in biologically more complex samples such as tissues, blood, or feces. PMID:25497414
Sun, Cheng; Wyngaard, Grace; Walton, D Brian; Wichman, Holly A; Mueller, Rachel Lockridge
2014-03-11
Chromatin diminution is the programmed deletion of DNA from presomatic cell or nuclear lineages during development, producing single organisms that contain two different nuclear genomes. Phylogenetically diverse taxa undergo chromatin diminution--some ciliates, nematodes, copepods, and vertebrates. In cyclopoid copepods, chromatin diminution occurs in taxa with massively expanded germline genomes; depending on species, germline genome sizes range from 15 - 75 Gb, 12-74 Gb of which are lost from pre-somatic cell lineages at germline--soma differentiation. This is more than an order of magnitude more sequence than is lost from other taxa. To date, the sequences excised from copepods have not been analyzed using large-scale genomic datasets, and the processes underlying germline genomic gigantism in this clade, as well as the functional significance of chromatin diminution, have remained unknown. Here, we used high-throughput genomic sequencing and qPCR to characterize the germline and somatic genomes of Mesocyclops edax, a freshwater cyclopoid copepod with a germline genome of ~15 Gb and a somatic genome of ~3 Gb. We show that most of the excised DNA consists of repetitive sequences that are either 1) verifiable transposable elements (TEs), or 2) non-simple repeats of likely TE origin. Repeat elements in both genomes are skewed towards younger (i.e. less divergent) elements. Excised DNA is a non-random sample of the germline repeat element landscape; younger elements, and high frequency DNA transposons and LINEs, are disproportionately eliminated from the somatic genome. Our results suggest that germline genome expansion in M. edax reflects explosive repeat element proliferation, and that billions of base pairs of such repeats are deleted from the somatic genome every generation. Thus, we hypothesize that chromatin diminution is a mechanism that controls repeat element load, and that this load can evolve to be divergent between tissue types within single organisms.
2014-01-01
Background Chromatin diminution is the programmed deletion of DNA from presomatic cell or nuclear lineages during development, producing single organisms that contain two different nuclear genomes. Phylogenetically diverse taxa undergo chromatin diminution — some ciliates, nematodes, copepods, and vertebrates. In cyclopoid copepods, chromatin diminution occurs in taxa with massively expanded germline genomes; depending on species, germline genome sizes range from 15 – 75 Gb, 12–74 Gb of which are lost from pre-somatic cell lineages at germline – soma differentiation. This is more than an order of magnitude more sequence than is lost from other taxa. To date, the sequences excised from copepods have not been analyzed using large-scale genomic datasets, and the processes underlying germline genomic gigantism in this clade, as well as the functional significance of chromatin diminution, have remained unknown. Results Here, we used high-throughput genomic sequencing and qPCR to characterize the germline and somatic genomes of Mesocyclops edax, a freshwater cyclopoid copepod with a germline genome of ~15 Gb and a somatic genome of ~3 Gb. We show that most of the excised DNA consists of repetitive sequences that are either 1) verifiable transposable elements (TEs), or 2) non-simple repeats of likely TE origin. Repeat elements in both genomes are skewed towards younger (i.e. less divergent) elements. Excised DNA is a non-random sample of the germline repeat element landscape; younger elements, and high frequency DNA transposons and LINEs, are disproportionately eliminated from the somatic genome. Conclusions Our results suggest that germline genome expansion in M. edax reflects explosive repeat element proliferation, and that billions of base pairs of such repeats are deleted from the somatic genome every generation. 
Thus, we hypothesize that chromatin diminution is a mechanism that controls repeat element load, and that this load can evolve to be divergent between tissue types within single organisms. PMID:24618421
Rajendram, D; Ayenza, R; Holder, F M; Moran, B; Long, T; Shah, H N
2006-12-01
We assessed the potential use of Whatman FTA paper as a device for archiving and long-term storage of bacterial cell suspensions of over 400 bacterial strains representing 61 genera, the molecular applications of immobilised DNA on FTA paper, and tested its microbial inactivation properties. The FTA paper extracted bacterial DNA is of sufficiently high quality to successfully carry out the molecular detection of several key genes including 16S rRNA, esp (Enterococcus surface protein), Bft (Bacteroides fragilis enterotoxin) and por (porin protein) by PCR and for DNA fingerprinting by random amplified polymorphic DNA-PCR (RAPD-PCR). To test the long-term stability of the FTA immobilised DNA, 100 of the 400 archived bacterial samples were randomly selected following 3 years of storage at ambient temperature and PCR amplification was used to monitor its success. All of the 100 samples were successfully amplified using the 16S rDNA gene as a target and confirmed by DNA sequencing. Furthermore, the DNA was eluted into solution from the FTA cards using a new alkaline elution procedure for evaluation by real-time PCR-based assays. The viability of cells retained on the FTA cards varied among broad groups of bacteria. For the more fragile gram-negative species, no viable cells were retained even at high cell densities of between 10(7) and 10(8) colony forming units (cfu) ml(-1), and for the most robust species such as spore-formers and acid-fast bacteria, complete inactivation was achieved at cell densities ranging between 10(1) and 10(4) cfu ml(-1). The inactivation of bacterial cells on FTA cards suggests that this is a safe medium for the storage and transport of bacterial nucleic acids.
Lee, Ju Seok; Chen, Junghuei; Deaton, Russell; Kim, Jin-Woo
2014-01-01
Genetic material extracted from in situ microbial communities has high promise as an indicator of biological system status. However, the challenge is to access genomic information from all organisms at the population or community scale to monitor the biosystem's state. Hence, there is a need for a better diagnostic tool that provides a holistic view of a biosystem's genomic status. Here, we introduce an in vitro methodology for genomic pattern classification of biological samples that taps large amounts of genetic information from all genes present and uses that information to detect changes in genomic patterns and classify them. We developed a biosensing protocol, termed Biological Memory, that has in vitro computational capabilities to "learn" and "store" genomic sequence information directly from genomic samples without knowledge of their explicit sequences, and that discovers differences in vitro between previously unknown inputs and learned memory molecules. The Memory protocol was designed and optimized based upon (1) common in vitro recombinant DNA operations using 20-base random probes, including polymerization, nuclease digestion, and magnetic bead separation, to capture a snapshot of the genomic state of a biological sample as a DNA memory and (2) the thermal stability of DNA duplexes between new input and the memory to detect similarities and differences. For efficient read out, a microarray was used as an output method. When the microarray-based Memory protocol was implemented to test its capability and sensitivity using genomic DNA from two model bacterial strains, i.e., Escherichia coli K12 and Bacillus subtilis, results indicate that the Memory protocol can "learn" input DNA, "recall" similar DNA, differentiate between dissimilar DNA, and detect relatively small concentration differences in samples. 
This study demonstrated not only the in vitro information processing capabilities of DNA, but also its promise as a genomic pattern classifier that could access information from all organisms in a biological system without explicit genomic information. The Memory protocol has high potential for many applications, including in situ biomonitoring of ecosystems, screening for diseases, biosensing of pathological features in water and food supplies, and non-biological information processing of memory devices, among many.
Laser mass spectrometry for DNA sequencing, disease diagnosis, and fingerprinting
NASA Astrophysics Data System (ADS)
Chen, C. H. Winston; Taranenko, N. I.; Zhu, Y. F.; Chung, C. N.; Allman, S. L.
1997-05-01
Since laser mass spectrometry has the potential for achieving very fast DNA analysis, we recently applied it to DNA sequencing, DNA typing for fingerprinting, and DNA screening for disease diagnosis. Two different approaches for sequencing DNA have been successfully demonstrated. One is to sequence DNA with DNA ladders produced from Sanger's enzymatic method. The other is to do direct sequencing without DNA ladders. The need for quick DNA typing for identification purposes is critical for forensic application. Our preliminary results indicate laser mass spectrometry can possibly be used for rapid DNA fingerprinting applications at a much lower cost than gel electrophoresis. Population screening for certain genetic diseases can be a very efficient step toward reducing medical costs through prevention. Since laser mass spectrometry can provide very fast DNA analysis, we applied laser mass spectrometry to disease diagnosis. Clinical samples with both base deletion and point mutation have been tested with complete success.
Colombo, M M; Swanton, M T; Donini, P; Prescott, D M
1984-01-01
Oxytricha nova is a hypotrichous ciliate with micronuclei and macronuclei. Micronuclei, which contain large, chromosomal-sized DNA, are genetically inert but undergo meiosis and exchange during cell mating. Macronuclei, which contain only small, gene-sized DNA molecules, provide all of the nuclear RNA needed to run the cell. After cell mating the macronucleus is derived from a micronucleus, a derivation that includes excision of the genes from chromosomes and elimination of the remaining DNA. The eliminated DNA includes all of the repetitious sequences and approximately 95% of the unique sequences. We cloned large restriction fragments from the micronucleus that confer replication ability on a replication-deficient plasmid in Saccharomyces cerevisiae. Sequences that confer replication ability are called autonomously replicating sequences. The frequency and effectiveness of autonomously replicating sequences in micronuclear DNA are similar to those reported for DNAs of other organisms introduced into yeast cells. Of the 12 micronuclear fragments with autonomously replicating sequence activity, 9 also showed homology to macronuclear DNA, indicating that they contain a macronuclear gene sequence. We conclude from this that autonomously replicating sequence activity is nonrandomly distributed throughout micronuclear DNA and is preferentially associated with those regions of micronuclear DNA that contain genes. PMID:6092934
DNA sequence-dependent mechanics and protein-assisted bending in repressor-mediated loop formation
Boedicker, James Q.; Garcia, Hernan G.; Johnson, Stephanie; Phillips, Rob
2014-01-01
As the chief informational molecule of life, DNA is subject to extensive physical manipulations. The energy required to deform double-helical DNA depends on sequence, and this mechanical code of DNA influences gene regulation, such as through nucleosome positioning. Here we examine the sequence-dependent flexibility of DNA in bacterial transcription factor-mediated looping, a context for which the role of sequence remains poorly understood. Using a suite of synthetic constructs repressed by the Lac repressor and two well-known sequences that show large flexibility differences in vitro, we make precise statistical mechanical predictions as to how DNA sequence influences loop formation and test these predictions using in vivo transcription and in vitro single-molecule assays. Surprisingly, sequence-dependent flexibility does not affect in vivo gene regulation. By theoretically and experimentally quantifying the relative contributions of sequence and the DNA-bending protein HU to DNA mechanical properties, we reveal that bending by HU dominates DNA mechanics and masks intrinsic sequence-dependent flexibility. Such a quantitative understanding of how mechanical regulatory information is encoded in the genome will be a key step towards a predictive understanding of gene regulation at single-base pair resolution. PMID:24231252
El-Sherry, Shiem; Ogedengbe, Mosun E; Hafeez, Mian A; Barta, John R
2013-07-01
Multiple 18S rDNA sequences were obtained from two single-oocyst-derived lines of each of Eimeria meleagrimitis and Eimeria adenoeides. After analysing the 15 new 18S rDNA sequences from two lines of E. meleagrimitis and 17 new sequences from two lines of E. adenoeides, there were clear indications that divergent, paralogous 18S rDNA copies existed within the nuclear genome of E. meleagrimitis. In contrast, mitochondrial cytochrome c oxidase subunit I (COI) partial sequences from all lines of a particular Eimeria sp. were identical and, in phylogenetic analyses, COI sequences clustered unambiguously in monophyletic and highly-supported clades specific to individual Eimeria sp. Phylogenetic analysis of the new 18S rDNA sequences from E. meleagrimitis showed that they formed two distinct clades: Type A with four new sequences; and Type B with nine new sequences; both Types A and B sequences were obtained from each of the single-oocyst-derived lines of E. meleagrimitis. Together these rDNA types formed a well-supported E. meleagrimitis clade. Types A and B 18S rDNA sequences from E. meleagrimitis had a mean sequence identity of only 97.4% whereas mean sequence identity within types was 99.1-99.3%. The observed intraspecific sequence divergence among E. meleagrimitis 18S rDNA sequence types was even higher (approximately 2.6%) than the interspecific sequence divergence present between some well-recognized species such as Eimeria tenella and Eimeria necatrix (1.1%). Our observations suggest that, unlike COI sequences, 18S rDNA sequences are not reliable molecular markers to be used alone for species identification with coccidia, although 18S rDNA sequences have clear utility for phylogenetic reconstruction of apicomplexan parasites at the genus and higher taxonomic ranks. Copyright © 2013. Published by Elsevier Ltd.
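The mean sequence identities quoted above (for example 97.4% between types versus 99.1-99.3% within types) are averages of pairwise comparisons. The following is a minimal sketch of that arithmetic, assuming the sequences have already been aligned to equal length and that columns containing a gap are skipped; the function names and the gap convention are illustrative assumptions, not the authors' procedure.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def mean_pairwise_identity(seqs) -> float:
    """Average percent identity over all unordered pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)
```

Applying `mean_pairwise_identity` separately to the sequences within one type and to pairs drawn across types would give the kind of within-type and between-type figures the abstract compares.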
Improved deoxyribozymes for synthesis of covalently branched DNA and RNA.
Lee, Christine S; Mui, Timothy P; Silverman, Scott K
2011-01-01
A covalently branched nucleic acid can be synthesized by joining the 2'-hydroxyl of the branch-site ribonucleotide of a DNA or RNA strand to the activated 5'-phosphorus of a separate DNA or RNA strand. We have previously used deoxyribozymes to synthesize several types of branched nucleic acids for experiments in biotechnology and biochemistry. Here, we report in vitro selection experiments to identify improved deoxyribozymes for synthesis of branched DNA and RNA. Each of the new deoxyribozymes requires Mn²⁺ as a cofactor, rather than Mg²⁺ as used by our previous branch-forming deoxyribozymes, and each has an initially random region of 40 rather than 22 or fewer combined nucleotides. The deoxyribozymes all function by forming a three-helix-junction (3HJ) complex with their two oligonucleotide substrates. For synthesis of branched DNA, the best new deoxyribozyme, 8LV13, has k_obs on the order of 0.1 min⁻¹, which is about two orders of magnitude faster than our previously identified 15HA9 deoxyribozyme. 8LV13 also functions at closer-to-neutral pH than does 15HA9 (pH 7.5 versus 9.0) and has useful tolerance for many DNA substrate sequences. For synthesis of branched RNA, two new deoxyribozymes, 8LX1 and 8LX6, were identified with broad sequence tolerances and substantial activity at pH 7.5, versus pH 9.0 for many of our previous deoxyribozymes that form branched RNA. These experiments provide new, and in key aspects improved, practical catalysts for preparation of synthetic branched DNA and RNA.
Primer-Free Aptamer Selection Using A Random DNA Library
Pan, Weihua; Xin, Ping; Patrick, Susan; Dean, Stacey; Keating, Christine; Clawson, Gary
2010-01-01
Aptamers are highly structured oligonucleotides (DNA or RNA) that can bind to targets with affinities comparable to antibodies [1]. They are identified through an in vitro selection process called Systematic Evolution of Ligands by EXponential enrichment (SELEX) to recognize a wide variety of targets, from small molecules to proteins and other macromolecules [2-4]. Aptamers have properties that are well suited for in vivo diagnostic and/or therapeutic applications: besides good specificity and affinity, they are easily synthesized, survive more rigorous processing conditions, are poorly immunogenic, and their relatively small size can result in facile penetration of tissues. Aptamers that are identified through the standard SELEX process usually comprise ~80 nucleotides (nt), since they are typically selected from nucleic acid libraries with ~40 nt long randomized regions plus fixed primer sites of ~20 nt on each side. The fixed primer sequences thus can comprise nearly 50% of the library sequences, and therefore may positively or negatively affect identification of aptamers in the selection process [3], although bioinformatics approaches suggest that the fixed sequences do not contribute significantly to aptamer structure after selection [5]. To address these potential problems, primer sequences have been blocked by complementary oligonucleotides or switched to different sequences midway during the rounds of SELEX [6], or they have been trimmed to 6-9 nt [7, 8]. Wen and Gray [9] designed a primer-free genomic SELEX method, in which the primer sequences were completely removed from the library before selection and were then regenerated to allow amplification of the selected genomic fragments. However, to employ the technique, a unique genomic library has to be constructed, which possesses limited diversity, and regeneration after rounds of selection relies on a linear reamplification step.
Alternatively, efforts to circumvent problems caused by fixed primer sequences using high efficiency partitioning are met with problems regarding PCR amplification [10]. We have developed a primer-free (PF) selection method that significantly simplifies SELEX procedures and effectively eliminates primer-interference problems [11, 12]. The protocols work in a straightforward manner. The central random region of the library is purified without extraneous flanking sequences and is bound to a suitable target (for example to a purified protein or complex mixtures such as cell lines). Then the bound sequences are obtained, reunited with flanking sequences, and re-amplified to generate selected sub-libraries. As an example, here we selected aptamers to S100B, a protein marker for melanoma. Binding assays showed Kd values in the 10⁻⁷ to 10⁻⁸ M range after a few rounds of selection, and we demonstrate that the aptamers function effectively in a sandwich binding format. PMID:20689511
Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang
2018-01-01
DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html PMID:29416743
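The two performance figures reported above, the Matthews correlation coefficient (MCC) and accuracy, are both functions of a binary confusion matrix. The sketch below shows the standard formulas only; the argument names are illustrative, and it does not reproduce the DHSpred evaluation or its data.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention for
    degenerate confusion matrices (an assumption of this sketch).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, MCC stays near zero for a predictor that guesses at random even when the two classes are imbalanced, which is one reason both numbers are commonly reported together.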
NASA Astrophysics Data System (ADS)
Rybenkov, Valentin V.
2016-09-01
The ability of living systems to defy thermodynamics without explicitly violating it is a continued source of inspiration to many biophysicists. The story of type-2 DNA topoisomerases is a beautiful example from that book. DNA topoisomerases catalyze a concerted DNA cleavage-religation reaction, which is interrupted by a strand passage event. This sequence of events results in a seemingly unhindered transfer of one piece of DNA through another upon their random collision. An obvious consequence of such transfer is a change in the topological state of the colliding DNAs; hence the name of the enzymes, topoisomerases. There are several classes of topoisomerases, which differ in how they capture the cleaved and transported DNA segments (which are often referred to as the gate and transfer segments, or the G- and T-segments for short). Type-2 topoisomerases have two cleavage-religation centers. They open a gate in double-stranded DNA and transfer another piece of double-stranded DNA through it [1]. In doing so, they manage to collect information about the rest of the DNA and perform strand passage in a directional manner so as to take the molecule away from thermodynamic equilibrium [2].
Tripathi, Pooja; Muth, Theodore R.
2017-01-01
Agrobacterium tumefaciens mediated T-DNA integration is a common tool for plant genome manipulation. However, there is controversy regarding whether T-DNA integration is biased towards genes or randomly distributed throughout the genome. In order to address this question, we performed high-throughput mapping of T-DNA-genome junctions obtained in the absence of selection at several time points after infection. T-DNA-genome junctions were detected as early as 6 hours post-infection. T-DNA distribution was apparently uniform throughout the chromosomes, yet local biases toward AT-rich motifs and T-DNA border sequence micro-homology were detected. Analysis of the epigenetic landscape of previously isolated sites of T-DNA integration in Kanamycin-selected transgenic plants showed an association with extremely low methylation and nucleosome occupancy. Conversely, non-selected junctions from this study showed no correlation with methylation and had chromatin marks, such as high nucleosome occupancy and high H3K27me3, that correspond to three-dimensional-interacting heterochromatin islands embedded within euchromatin. Such structures may play a role in capturing and silencing invading T-DNA. PMID:28742090
Shah, Kushani; Thomas, Shelby; Stein, Arnold
2013-01-01
In this report, we describe a 5-week laboratory exercise for undergraduate biology and biochemistry students in which students learn to sequence DNA and to genotype their DNA for selected single nucleotide polymorphisms (SNPs). Students use miniaturized DNA sequencing gels that require approximately 8 min to run. The students perform G, A, T, C Sanger sequencing reactions. They prepare and run the gels, perform Southern blots (which require only 10 min), and detect sequencing ladders using a colorimetric detection system. Students enlarge their sequencing ladders from digital images of their small nylon membranes, and read the sequence manually. They compare their reads with the actual DNA sequence using BLAST2. After mastering the DNA sequencing system, students prepare their own DNA from a cheek swab, polymerase chain reaction-amplify a region of their DNA that encompasses a SNP of interest, and perform sequencing to determine their genotype at the SNP position. A family pedigree can also be constructed. The SNP chosen by the instructor was rs17822931, which is in the ABCC11 gene and is the determinant of human earwax type. Genotypes at the rs17822931 site vary in different ethnic populations. © 2013 by The International Union of Biochemistry and Molecular Biology.
Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas
2009-06-01
The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.
Single-Stranded Condensation Stochastically Blocks G-Quadruplex Assembly in Human Telomeric RNA.
Gutiérrez, Irene; Garavís, Miguel; de Lorenzo, Sara; Villasante, Alfredo; González, Carlos; Arias-Gonzalez, J Ricardo
2018-05-17
TERRA is an RNA molecule transcribed from human subtelomeric regions toward chromosome ends potentially involved in regulation of heterochromatin stability, semiconservative replication, and telomerase inhibition, among others. TERRA contains tandem repeats of the sequence GGGUUA, with a strong tendency to fold into a four-stranded arrangement known as a parallel G-quadruplex. Here, we demonstrate by using single-molecule force spectroscopy that this potential is limited by the inherent capacity of RNA to self-associate randomly and further condense into entropically more favorable structures. Using optical tweezers, we stretched, one molecule at a time, RNA constructs with more than four and fewer than eight hexanucleotide repeats (and thus unable to form several G-quadruplexes in tandem), flanked by non-G-rich overhangs of random sequence. We found that condensed RNA stochastically blocks G-quadruplex folding pathways with a probability of nearly 20%, a behavior that is not found in analogous DNA molecules.
Entropy of finite random binary sequences with weak long-range correlations.
Melnik, S S; Usatenko, O V
2014-11-01
We study the N-step binary stationary ergodic Markov chain and analyze its differential entropy. Supposing that the correlations are weak we express the conditional probability function of the chain through the pair correlation function and represent the entropy as a functional of the pair correlator. Since the model uses the two-point correlators instead of the block probability, it makes it possible to calculate the entropy of strings at much longer distances than using standard methods. A fluctuation contribution to the entropy due to finiteness of random chains is examined. This contribution can be of the same order as its regular part even at the relatively short lengths of subsequences. A self-similar structure of entropy with respect to the decimation transformations is revealed for some specific forms of the pair correlation function. Application of the theory to the DNA sequence of the R3 chromosome of Drosophila melanogaster is presented.
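For comparison, the "standard methods" mentioned above estimate entropy from block probabilities. The sketch below shows that block-counting approach, not the pair-correlator functional developed in the paper. It assumes that "differential entropy" is meant in the sense of the entropy added per symbol, h_n = H_{n+1} - H_n; that reading, and the function names, are assumptions of this sketch.

```python
import math
from collections import Counter


def block_entropy(bits, n: int) -> float:
    """Shannon entropy, in bits, of the overlapping length-n blocks of a sequence."""
    if n < 1 or n > len(bits):
        raise ValueError("block length must be between 1 and len(bits)")
    counts = Counter(tuple(bits[i:i + n]) for i in range(len(bits) - n + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(bits, n: int) -> float:
    """Estimate h_n = H_{n+1} - H_n, the entropy of the next symbol given n previous ones."""
    if n == 0:
        return block_entropy(bits, 1)
    return block_entropy(bits, n + 1) - block_entropy(bits, n)
```

Because the number of possible blocks grows as 2^n, these estimates become unreliable once n approaches log2 of the sequence length, which is the limitation the two-point-correlator approach is said to avoid. On short sequences the estimate can even come out slightly negative, a simple illustration of finite-size effects.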
Holes influence the mutation spectrum of human mitochondrial DNA
NASA Astrophysics Data System (ADS)
Villagran, Martha; Miller, John
Mutations drive evolution and disease, showing highly non-random patterns of variant frequency vs. nucleotide position. We use computational DNA hole spectroscopy [M.Y. Suarez-Villagran & J.H. Miller, Sci. Rep. 5, 13571 (2015)] to reveal sites of enhanced hole probability in selected regions of human mitochondrial DNA. A hole is a mobile site of positive charge created when an electron is removed, for example by radiation or contact with a mutagenic agent. The hole spectra are quantum mechanically computed using a two-stranded tight binding model of DNA. We observe significant correlation between spectra of hole probabilities and of genetic variation frequencies from the MITOMAP database. These results suggest that hole-enhanced mutation mechanisms exert a substantial, perhaps dominant, influence on mutation patterns in DNA. One example is where a trapped hole induces a hydrogen bond shift, known as tautomerization, which then triggers a base-pair mismatch during replication. Our results deepen overall understanding of sequence specific mutation rates, encompassing both hotspots and cold spots, which drive molecular evolution.
The span of correlations in dolphin whistle sequences
NASA Astrophysics Data System (ADS)
Ferrer-i-Cancho, Ramon; McCowan, Brenda
2012-06-01
Long-range correlations are found in symbolic sequences from human language, music and DNA. Determining the span of correlations in dolphin whistle sequences is crucial for shedding light on their communicative complexity. Dolphin whistles share various statistical properties with human words, i.e. Zipf's law for word frequencies (namely that the probability of the ith most frequent word of a text is about i^(-α)) and a parallel of the tendency of more frequent words to have more meanings. The finding of Zipf's law for word frequencies in dolphin whistles has been the topic of an intense debate on its implications. One of the major arguments against the relevance of Zipf's law in dolphin whistles is that it is not possible to distinguish the outcome of a die-rolling experiment from that of a linguistic or communicative source producing Zipf's law for word frequencies. Here we show that statistically significant whistle-whistle correlations extend back to the second previous whistle in the sequence, using a global randomization test, and to the fourth previous whistle, using a local randomization test. None of these correlations are expected from a die-rolling experiment or from other simple explanations of Zipf's law for word frequencies, such as Simon's model, that produce sequences of unpredictable elements.
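A global randomization test of the kind mentioned above can be sketched as follows. The abstract does not give the test statistic or shuffling scheme, so this sketch assumes lagged mutual information as the statistic and a shuffle of the whole sequence as the null model; the function names and defaults are hypothetical.

```python
import math
import random
from collections import Counter


def lagged_mutual_information(seq, lag: int) -> float:
    """Mutual information, in bits, between each symbol and the symbol `lag` positions earlier."""
    if lag < 1 or lag >= len(seq):
        raise ValueError("lag must be between 1 and len(seq) - 1")
    pairs = list(zip(seq[:-lag], seq[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def global_randomization_test(seq, lag: int, n_shuffles: int = 1000, seed: int = 0):
    """Return (observed statistic, one-sided p-value) against whole-sequence shuffles.

    Shuffling keeps the symbol frequencies (and hence any Zipf-like
    distribution) but destroys the ordering, so a small p-value points to
    correlation at this lag beyond what frequencies alone produce.
    """
    rng = random.Random(seed)
    observed = lagged_mutual_information(seq, lag)
    shuffled = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if lagged_mutual_information(shuffled, lag) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_shuffles + 1)
```

Repeating the test for lags 1, 2, 3 and so on, with a correction for multiple comparisons, would give an estimate of how far back significant correlations extend.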
2013-01-01
Background: Mitochondrial DNA (mtDNA) typing can be a useful aid for identifying people from compromised samples when nuclear DNA is too damaged, degraded or below detection thresholds for routine short tandem repeat (STR)-based analysis. Standard mtDNA typing, focused on PCR amplicon sequencing of the control region (HVS I and HVS II), is limited by the resolving power of this short sequence, which misses up to 70% of the variation present in the mtDNA genome. Methods: We used in-solution hybridisation-based DNA capture (using DNA capture probes prepared from modern human mtDNA) to recover mtDNA from post-mortem human remains in which the majority of DNA is both highly fragmented (<100 base pairs in length) and chemically damaged. The method ‘immortalises’ the finite quantities of DNA in valuable extracts as DNA libraries, which is followed by the targeted enrichment of endogenous mtDNA sequences and characterisation by next-generation sequencing (NGS). Results: We sequenced whole mitochondrial genomes for human identification from samples where standard nuclear STR typing produced only partial profiles or demonstrably failed and/or where standard mtDNA hypervariable region sequences lacked resolving power. Multiple rounds of enrichment can substantially improve coverage and sequencing depth of mtDNA genomes from highly degraded samples. The application of this method has led to the reliable mitochondrial sequencing of human skeletal remains from unidentified World War Two (WWII) casualties approximately 70 years old and from archaeological remains (up to 2,500 years old). Conclusions: This approach has potential applications in forensic science, historical human identification cases, archived medical samples, kinship analysis and population studies. In particular the methodology can be applied to any case, involving human or non-human species, where whole mitochondrial genome sequences are required to provide the highest level of maternal lineage discrimination.
Multiple rounds of in-solution hybridisation-based DNA capture can retrieve whole mitochondrial genome sequences from even the most challenging samples. PMID:24289217
Read clouds uncover variation in complex regions of the human genome.
Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E; West, Robert; Sidow, Arend; Batzoglou, Serafim
2015-10-01
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. © 2015 Bishara et al.; Published by Cold Spring Harbor Laboratory Press.
Direct Detection and Sequencing of Damaged DNA Bases
2011-01-01
Products of various forms of DNA damage have been implicated in a variety of important biological processes, such as aging, neurodegenerative diseases, and cancer. Therefore, there exists great interest to develop methods for interrogating damaged DNA in the context of sequencing. Here, we demonstrate that single-molecule, real-time (SMRT®) DNA sequencing can directly detect damaged DNA bases in the DNA template - as a by-product of the sequencing method - through an analysis of the DNA polymerase kinetics that are altered by the presence of a modified base. We demonstrate the sequencing of several DNA templates containing products of DNA damage, including 8-oxoguanine, 8-oxoadenine, O6-methylguanine, 1-methyladenine, O4-methylthymine, 5-hydroxycytosine, 5-hydroxyuracil, 5-hydroxymethyluracil, or thymine dimers, and show that these base modifications can be readily detected with single-modification resolution and DNA strand specificity. We characterize the distinct kinetic signatures generated by these DNA base modifications. PMID:22185597
Direct detection and sequencing of damaged DNA bases.
Clark, Tyson A; Spittle, Kristi E; Turner, Stephen W; Korlach, Jonas
2011-12-20
Products of various forms of DNA damage have been implicated in a variety of important biological processes, such as aging, neurodegenerative diseases, and cancer. Therefore, there exists great interest to develop methods for interrogating damaged DNA in the context of sequencing. Here, we demonstrate that single-molecule, real-time (SMRT®) DNA sequencing can directly detect damaged DNA bases in the DNA template - as a by-product of the sequencing method - through an analysis of the DNA polymerase kinetics that are altered by the presence of a modified base. We demonstrate the sequencing of several DNA templates containing products of DNA damage, including 8-oxoguanine, 8-oxoadenine, O6-methylguanine, 1-methyladenine, O4-methylthymine, 5-hydroxycytosine, 5-hydroxyuracil, 5-hydroxymethyluracil, or thymine dimers, and show that these base modifications can be readily detected with single-modification resolution and DNA strand specificity. We characterize the distinct kinetic signatures generated by these DNA base modifications.
A comprehensive list of cloned human DNA sequences
Schmidtke, Jörg; Cooper, David N.
1987-01-01
A list of DNA sequences cloned from the human genome is presented. Intended as a guide to clone availability, this list includes published reports of cDNA, genomic and synthetic clones comprising gene and pseudogene sequences, uncharacterised DNA segments and repetitive DNA elements. PMID:3575113
A comprehensive list of cloned human DNA sequences
Schmidtke, Jörg; Cooper, David N.
1990-01-01
A list of DNA sequences cloned from the human genome is presented. Intended as a guide to clone availability, this list includes published reports of cDNA, genomic and synthetic clones comprising gene and pseudogene sequences, uncharacterised DNA segments and repetitive DNA elements. PMID:2333227
A comprehensive list of cloned human DNA sequences
Schmidtke, Jörg; Cooper, David N.
1988-01-01
A list of DNA sequences cloned from the human genome is presented. Intended as a guide to clone availability, this list includes published reports of cDNA, genomic and synthetic clones comprising gene and pseudogene sequences, uncharacterised DNA segments and repetitive DNA elements. PMID:3368330
A comprehensive list of cloned human DNA sequences
Schmidtke, Jörg; Cooper, David N.
1989-01-01
A list of DNA sequences cloned from the human genome is presented. Intended as a guide to clone availability, this list includes published reports of cDNA, genomic and synthetic clones comprising gene and pseudogene sequences, uncharacterised DNA segments and repetitive DNA elements. PMID:2654889
Theoretical modeling of masking DNA application in aptamer-facilitated biomarker discovery.
Cherney, Leonid T; Obrecht, Natalia M; Krylov, Sergey N
2013-04-16
In aptamer-facilitated biomarker discovery (AptaBiD), aptamers are selected from a library of random DNA (or RNA) sequences for their ability to specifically bind cell-surface biomarkers. The library is incubated with intact cells, and cell-bound DNA molecules are separated from those unbound and amplified by the polymerase chain reaction (PCR). The partitioning/amplification cycle is repeated multiple times while alternating target cells and control cells. Efficient aptamer selection in AptaBiD relies on the inclusion of masking DNA within the cell and library mixture. Masking DNA lacks primer regions for PCR amplification and is typically taken in excess to the library. The role of masking DNA within the selection mixture is to outcompete any nonspecific binding sequences within the initial library, thus allowing specific DNA sequences (i.e., aptamers) to be selected more efficiently. Efficient AptaBiD requires an optimum ratio of masking DNA to library DNA, at which aptamers still bind specific binding sites but nonaptamers within the library do not bind nonspecific binding sites. Here, we have developed a mathematical model that describes the binding processes taking place within the equilibrium mixture of masking DNA, library DNA, and target cells. An obtained mathematical solution allows one to estimate the concentration of masking DNA that is required to outcompete the library DNA at a desirable ratio of bound masking DNA to bound library DNA. The required concentration depends on concentrations of the library and cells as well as on unknown cell characteristics. These characteristics include the concentration of total binding sites on the cell surface, N, and equilibrium dissociation constants, K(nsL) and K(nsM), for nonspecific binding of the library DNA and masking DNA, respectively. 
We developed a theory that allows the determination of N, K(nsL), and K(nsM) based on measurements of EC50 values for cells mixed separately with the library and masking DNA (EC50 is the concentration of fluorescently labeled DNA at which half of the maximum fluorescence signal from DNA-bound cells is reached). We also obtained expressions for signals from bound DNA (measured by flow cytometry) in terms of N, K(nsL), and K(nsM). These expressions can be used for the verification of N, K(nsL), and K(nsM) values found from EC50 measurements. The developed procedure was applied to MCF-7 breast cancer cells, and corresponding values of N, K(nsL), and K(nsM) were established for the first time. The concentration of masking DNA required for AptaBiD with MCF-7 breast cancer cells was also estimated.
Li, Leilei; Wieme, Anneleen; Spitaels, Freek; Balzarini, Tom; Nunes, Olga C; Manaia, Célia M; Van Landschoot, Anita; De Vuyst, Luc; Cleenwerck, Ilse; Vandamme, Peter
2014-07-01
Five acetic acid bacteria isolates, awK9_3, awK9_4 ( = LMG 27543), awK9_5 ( = LMG 28092), awK9_6 and awK9_9, obtained during a study of micro-organisms present in traditionally produced kefir, were grouped on the basis of their MALDI-TOF MS profile with LMG 1530 and LMG 1531(T), two strains currently classified as members of the genus Acetobacter. Phylogenetic analysis based on nearly complete 16S rRNA gene sequences as well as on concatenated partial sequences of the housekeeping genes dnaK, groEL and rpoB indicated that these isolates were representatives of a single novel species together with LMG 1530 and LMG 1531(T) in the genus Acetobacter, with Acetobacter aceti, Acetobacter nitrogenifigens, Acetobacter oeni and Acetobacter estunensis as nearest phylogenetic neighbours. Pairwise similarities of 16S rRNA gene sequences between LMG 1531(T) and the type strains of the above-mentioned species were 99.7%, 99.1%, 98.4% and 98.2%, respectively. DNA-DNA hybridizations confirmed that status, while amplified fragment length polymorphism (AFLP) and random amplified polymorphic DNA (RAPD) data indicated that LMG 1531(T), LMG 1530, LMG 27543 and LMG 28092 represent at least two different strains of the novel species. The major fatty acid of LMG 1531(T) and LMG 27543 was C18 : 1ω7c. The major ubiquinone present was Q-9 and the DNA G+C contents of LMG 1531(T) and LMG 27543 were 58.3 and 56.7 mol%, respectively. The strains were able to grow on D-fructose and D-sorbitol as a single carbon source. They were also able to grow on yeast extract with 30% D-glucose and on standard medium with pH 3.6 or containing 1% NaCl. They had a weak ability to produce acid from D-arabinose. These features enabled their differentiation from their nearest phylogenetic neighbours. The name Acetobacter sicerae sp. nov. is proposed with LMG 1531(T) ( = NCIMB 8941(T)) as the type strain. © 2014 IUMS.
Parvari, R; Avivi, A; Lentner, F; Ziv, E; Tel-Or, S; Burstein, Y; Schechter, I
1988-03-01
cDNA clones encoding the variable and constant regions of chicken immunoglobulin (Ig) gamma-chains were obtained from spleen cDNA libraries. Southern blots of kidney DNA show that the variable region sequences of eight cDNA clones reveal the same set of bands corresponding to approximately 30 cross-hybridizing VH genes of one subgroup. Since the VH clones were randomly selected, it is likely that the bulk of chicken H-chains are encoded by a single VH subgroup. Nucleotide sequence determinations of two cDNA clones reveal VH, D, JH and the constant region. The VH segments are closely related to each other (83% homology) as expected for VH of the same subgroup. The JHs are 15 residues long and differ by one amino acid. The Ds differ markedly in sequence (20% homology) and size (10 and 20 residues). These findings strongly indicate multiple (at least two) D genes which by a combinatorial joining mechanism diversify the H-chains, a mechanism which is not operative in the chicken L-chain locus. The most notable among the chicken Igs is the so-called 7S IgG because its H-chain differs in many important aspects from any mammalian IgG. The sequence of the C gamma cDNA reported here resolves this issue. The chicken C gamma is 426 residues long with four CH domains (unlike mammalian C gamma which has three CH domains) and it shows 25% homology to the chicken C mu. The chicken C gamma is most related to the mammalian C epsilon in length, the presence of four CH domains and the distribution of cysteines in the CH1 and CH2 domains. We propose that the unique chicken C gamma is the ancestor of the mammalian C epsilon and C gamma subclasses, and discuss the evolution of the H-chain locus from that of chicken with presumably three genes (mu, gamma, alpha) to the mammalian loci with 8-10 H-chain genes.
Silicene nanoribbon as a new DNA sequencing device
NASA Astrophysics Data System (ADS)
Alesheikh, Sara; Shahtahmassebi, Nasser; Roknabadi, Mahmood Rezaee; Pilevar Shahri, Raheleh
2018-02-01
The importance of DNA sequencing across many fields motivates the search for fast and inexpensive methods. Nanotechnology contributes to this effort by introducing nanostructures that can be used for DNA sequencing. In this work we study the interaction between a zigzag silicene nanoribbon and DNA nucleobases using DFT and the non-equilibrium Green's function approach, to investigate the possibility of using zigzag silicene nanoribbons as biosensors for DNA sequencing.
Sequence periodicity in nucleosomal DNA and intrinsic curvature.
Nair, T Murlidharan
2010-05-17
Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy, the more rigid the DNA helix; thus it is natural to expect that sequences involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understanding their sequence-dependent structures. Higher order DNA structures and the distribution of molecular bend loci associated with 146-base nucleosome core DNA sequences from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. The higher order structures associated with nucleosomes of C. elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results point to the possibility that context-dependent curvature of varying degrees is associated with nucleosomal DNA.
Assessing the Fidelity of Ancient DNA Sequences Amplified From Nuclear Genes
Binladen, Jonas; Wiuf, Carsten; Gilbert, M. Thomas P.; Bunce, Michael; Barnett, Ross; Larson, Greger; Greenwood, Alex D.; Haile, James; Ho, Simon Y. W.; Hansen, Anders J.; Willerslev, Eske
2006-01-01
To date, the field of ancient DNA has relied almost exclusively on mitochondrial DNA (mtDNA) sequences. However, a number of recent studies have reported the successful recovery of ancient nuclear DNA (nuDNA) sequences, thereby allowing the characterization of genetic loci directly involved in phenotypic traits of extinct taxa. It is well documented that postmortem damage in ancient mtDNA can lead to the generation of artifactual sequences. However, as yet no one has thoroughly investigated the damage spectrum in ancient nuDNA. By comparing clone sequences from 23 fossil specimens, recovered from environments ranging from permafrost to desert, we demonstrate the presence of miscoding lesion damage in both the mtDNA and nuDNA, resulting in insertion of erroneous bases during amplification. Interestingly, no significant differences in the frequency of miscoding lesion damage are recorded between mtDNA and nuDNA despite great differences in cellular copy numbers. For both mtDNA and nuDNA, we find significant positive correlations between total sequence heterogeneity and the rates of type 1 transitions (adenine → guanine and thymine → cytosine) and type 2 transitions (cytosine → thymine and guanine → adenine), respectively. Type 2 transitions are by far the most dominant and increase relative to those of type 1 with damage load. The results suggest that the deamination of cytosine (and 5-methyl cytosine) to uracil (and thymine) is the main cause of miscoding lesions in both ancient mtDNA and nuDNA sequences. We argue that the problems presented by postmortem damage, as well as problems with contamination from exogenous sources of conserved nuclear genes, allelic variation, and the reliance on single nucleotide polymorphisms, call for great caution in studies relying on ancient nuDNA sequences. PMID:16299392
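The type 1 / type 2 transition classes defined in this abstract can be tallied from an aligned reference and clone sequence. The following is a minimal sketch under stated assumptions: the two sequences are already aligned and of equal length, positions containing anything other than A, C, G or T are skipped, and the function name is hypothetical. It is not the authors' analysis pipeline.

```python
# Transition classes as defined in the abstract (reference base -> clone base).
TYPE1 = {("A", "G"), ("T", "C")}
TYPE2 = {("C", "T"), ("G", "A")}


def classify_substitutions(reference, clone):
    """Count type 1 transitions, type 2 transitions and other substitutions.

    Assumes pre-aligned sequences of equal length; gaps and ambiguity
    codes are ignored rather than counted.
    """
    if len(reference) != len(clone):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"type1": 0, "type2": 0, "other": 0}
    for ref_base, clone_base in zip(reference.upper(), clone.upper()):
        if ref_base not in "ACGT" or clone_base not in "ACGT":
            continue  # skip gaps and ambiguous bases
        if ref_base == clone_base:
            continue
        pair = (ref_base, clone_base)
        if pair in TYPE1:
            counts["type1"] += 1
        elif pair in TYPE2:
            counts["type2"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    print(classify_substitutions("ACGTACGT", "ATGTGCAT"))
```

A predominance of type 2 counts across many clones would be consistent with the cytosine deamination signal the abstract describes.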
[Current applications of high-throughput DNA sequencing technology in antibody drug research].
Yu, Xin; Liu, Qi-Gang; Wang, Ming-Rong
2012-03-01
Since the publication in 2005 of a high-throughput DNA sequencing technology based on PCR carried out in oil emulsions, high-throughput DNA sequencing platforms have evolved into a robust technology for sequencing genomes and diverse DNA libraries. Antibody libraries with vast numbers of members currently serve as a foundation for discovering novel antibody drugs, and high-throughput DNA sequencing technology makes it possible to rapidly identify functional antibody variants with desired properties. Herein we present a review of current applications of high-throughput DNA sequencing technology in the analysis of antibody library diversity, sequencing of CDR3 regions, identification of potent antibodies based on sequence frequency, discovery of functional genes, and combination with various display technologies, so as to provide an alternative approach to the discovery and development of antibody drugs.
DNA fingerprinting, DNA barcoding, and next generation sequencing technology in plants.
Sucher, Nikolaus J; Hennell, James R; Carles, Maria C
2012-01-01
DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.
Mammalian DNA enriched for replication origins is enriched for snap-back sequences.
Zannis-Hadjopoulos, M; Kaufmann, G; Martin, R G
1984-11-15
Using the instability of replication loops as a method for the isolation of double-stranded nascent DNA, extruded DNA enriched for replication origins was obtained and denatured. Snap-back DNA, single-stranded DNA with inverted repeats (palindromic sequences), reassociates rapidly into stem-loop structures with zero-order kinetics when conditions are changed from denaturing to renaturing, and can be assayed by chromatography on hydroxyapatite. Origin-enriched nascent DNA strands from mouse, rat and monkey cells growing either synchronously or asynchronously were purified and assayed for the presence of snap-back sequences. The results show that origin-enriched DNA is also enriched for snap-back sequences, implying that some origins for mammalian DNA replication contain or lie near palindromic sequences.
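Snap-back behaviour depends on inverted repeats: a stretch of a single strand followed, after a short loop, by its own reverse complement. The sketch below scans a sequence for such stem pairs. It is only an illustration of the sequence feature, not the hydroxyapatite assay used in the study; the function names and the default loop bounds are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem_len, min_loop=3, max_loop=50):
    """Return (left_start, right_start) pairs that could fold into a stem-loop.

    A hit means the stem_len bases starting at left_start are the exact
    reverse complement of the stem_len bases starting at right_start,
    separated by a loop of min_loop..max_loop bases (assumed bounds).
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem_len + 1):
        target = reverse_complement(seq[i:i + stem_len])
        first = i + stem_len + min_loop
        last = min(i + stem_len + max_loop, len(seq) - stem_len)
        for j in range(first, last + 1):
            if seq[j:j + stem_len] == target:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    example = "TT" + "GAATTGC" + "AAAA" + "GCAATTC" + "TT"
    print(find_inverted_repeats(example, stem_len=7))
```

Exact matching keeps the example short; real palindromic regions usually tolerate mismatches in the stem, which this sketch does not model.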
Bmcystatin, a cysteine proteinase inhibitor characterized from the tick Boophilus microplus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lima, Cassia A.; Sasaki, Sergio D.; Tanaka, Aparecida S.
2006-08-18
The bovine tick Rhipicephalus (Boophilus) microplus is a blood-sucking animal, which is responsible for Babesia spp and Anaplasma marginale transmission for cattle. From a B. microplus fat body cDNA library, 465 selected clones were sequenced randomly and resulted in 60 Contigs. An open reading frame (ORF) contains 98 amino acids named Bmcystatin, due to 70% amino acid identity to a classical type 1 cystatin from Ixodes scapularis tick (GenBank Accession No. DQ066227). The Bmcystatin amino acid sequence analysis showed two cysteine residues, theoretical pI of 5.92 and M(r) of 11 kDa. Bmcystatin gene was cloned in pET 26b vector and the protein expressed using bacteria Escherichia coli BL21 SI. Recombinant Bmcystatin (rBmcystatin) purified by affinity chromatography on Ni-NTA-agarose column and ionic exchange chromatography on HiTrap Q column presented molecular mass of 11 kDa, by SDS-PAGE and the N-terminal amino acid sequenced revealed unprocessed N-terminal containing part of pelB signal sequence. Purified rBmcystatin showed to be a C1 cysteine peptidase inhibitor with K(i) value of 0.1 and 0.6 nM for human cathepsin L and VTDCE (vitellin degrading cysteine endopeptidase), respectively. The rBmcystatin expression analyzed by semi-quantitative RT-PCR confirmed the amplification of a specific DNA sequence (294 bp) in the fat body and ovary cDNA preparation. On the other hand, a protein band was detected in the fat body, ovary, and the salivary gland extracts using anti-Bmcystatin antibody by Western blot. The present results suggest a possible role of Bmcystatin in the ovary, even though the gene was cloned from the fat body, which could be another site of this protein synthesis.
Birky, C William
2013-01-01
Phylogenetic trees of DNA sequences of a group of specimens may include clades of two kinds: those produced by stochastic processes (random genetic drift) within a species, and clades that represent different species. The ratio of the mean pairwise sequence difference between a pair of clades (K) to the mean pairwise sequence difference within a clade (θ) can be used to determine whether the clades are samples from different species (K/θ ≥ 4) or the same species (K/θ<4) with probability ≥ 0.95. Previously I applied this criterion to delimit species of asexual organisms. Here I use data from the literature to show how it can also be applied to delimit sexual species using four groups of sexual organisms as examples: ravens, spotted leopards, sea butterflies, and liverworts. Mitochondrial or chloroplast genes are used because these segregate earlier during speciation than most nuclear genes and hence detect earlier stages of speciation. In several cases the K/θ ratio was greater than 4, confirming the original authors' intuition that the clades were sufficiently different to be assigned to different species. But the K/θ ratio split each of two liverwort species into two evolutionary species, and showed that support for the distinction between the common and Chihuahuan raven species is weak. I also discuss some possible sources of error in using the K/θ ratio; the most significant one would be cases where males migrate between different populations but females do not, making the use of maternally inherited organelle genes problematic. The K/θ ratio must be used with some caution, like all other methods for species delimitation. Nevertheless, it is a simple theory-based quantitative method for using DNA sequences to make rigorous decisions about species delimitation in sexual as well as asexual eukaryotes.
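The K/θ criterion stated in this abstract can be sketched for two small sets of aligned sequences. The version below is a simplification: it uses uncorrected pairwise differences (p-distances) for both K and θ and takes θ from the more variable of the two clades, whereas the published method may apply corrections that the abstract does not spell out. Function names are hypothetical, and each clade needs at least two sequences.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(clade):
    """Mean pairwise difference among sequences of one clade (>= 2 sequences)."""
    pairs = list(combinations(clade, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(clade_a, clade_b):
    """Mean pairwise difference between sequences of two clades (K)."""
    pairs = list(product(clade_a, clade_b))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def k_theta_ratio(clade_a, clade_b):
    """K / theta, with theta taken from the more variable clade (an assumption)."""
    theta = max(mean_within(clade_a), mean_within(clade_b))
    k = mean_between(clade_a, clade_b)
    return float("inf") if theta == 0 else k / theta


if __name__ == "__main__":
    clade_a = ["AAAAAAAAAA", "AAAAAAAAAT"]
    clade_b = ["CCCCCAAAAA", "CCCCCAAAAT"]
    ratio = k_theta_ratio(clade_a, clade_b)
    print(ratio, "different species" if ratio >= 4 else "same species")
```

The threshold of 4 is the one given in the abstract; the toy sequences exist only to exercise the arithmetic.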
Bentley, L; Fehrsen, J; Jordaan, F; Huismans, H; du Plessis, D H
2000-04-01
VP2 is an outer capsid protein of African horsesickness virus (AHSV) and is recognized by serotype-discriminatory neutralizing antibodies. With the objective of locating its antigenic regions, a filamentous phage library was constructed that displayed peptides derived from the fragmentation of a cDNA copy of the gene encoding VP2. Peptides ranging in size from approximately 30 to 100 amino acids were fused with pIII, the attachment protein of the display vector, fUSE2. To ensure maximum diversity, the final library consisted of three sub-libraries. The first utilized enzymatically fragmented DNA encoding only the VP2 gene, the second included plasmid sequences, while the third included a PCR step designed to allow different peptide-encoding sequences to recombine before ligation into the vector. The resulting composite library was subjected to immunoaffinity selection with AHSV-specific polyclonal chicken IgY, polyclonal horse immunoglobulins and a monoclonal antibody (MAb) known to neutralize AHSV. Antigenic peptides were located by sequencing the DNA of phages bound by the antibodies. Most antigenic determinants capable of being mapped by this method were located in the N-terminal half of VP2. Important binding areas were mapped with high resolution by identifying the minimum overlapping areas of the selected peptides. The MAb was also used to screen a random 17-mer epitope library. Sequences that may be part of a discontinuous neutralization epitope were identified. The amino acid sequences of the antigenic regions on VP2 of serotype 3 were compared with corresponding regions on three other serotypes, revealing regions with the potential to discriminate AHSV serotypes serologically.
Bjørnsgaard Aas, Anders; Davey, Marie Louise; Kauserud, Håvard
2017-07-01
The formation of chimeric sequences can create significant methodological bias in PCR-based DNA metabarcoding analyses. During mixed-template amplification of barcoding regions, chimera formation is frequent and well documented. However, profiling of fungal communities typically uses the more variable rDNA region ITS. Due to a larger research community, tools for chimera detection have been developed mainly for the 16S/18S markers. However, these tools are widely applied to the ITS region without verification of their performance. We examined the rate of chimera formation during amplification and 454 sequencing of the ITS2 region from fungal mock communities of different complexities. We evaluated the chimera detecting ability of two common chimera-checking algorithms: perseus and uchime. Large proportions of the chimeras reported were false positives. No false negatives were found in the data set. Verified chimeras accounted for only 0.2% of the total ITS2 reads, which is considerably less than what is typically reported in 16S and 18S metabarcoding analyses. Verified chimeric 'parent sequences' had significantly higher per cent identity to one another than to random members of the mock communities. Community complexity increased the rate of chimera formation. GC content was higher around the verified chimeric break points, potentially facilitating chimera formation through base pair mismatching in the neighbouring regions of high similarity in the chimeric region. We conclude that the hypervariable nature of the ITS region seems to buffer the rate of chimera formation in comparison with other, less variable barcoding regions, due to shorter regions of high sequence similarity. © 2016 John Wiley & Sons Ltd.
Martino, Amanda J.; Rhodes, Matthew E.; Biddle, Jennifer F.; Brandt, Leah D.; Tomsho, Lynn P.; House, Christopher H.
2011-01-01
A degenerate polymerase chain reaction (PCR)-based method of whole-genome amplification, designed to work fluidly with 454 sequencing technology, was developed and tested for use on deep marine subsurface DNA samples. While optimized here for use with Roche 454 technology, the general framework presented may be applicable to other next generation sequencing systems as well (e.g., Illumina, Ion Torrent). The method, which we have called random amplification metagenomic PCR (RAMP), involves the use of specific primers from Roche 454 amplicon sequencing, modified by the addition of a degenerate region at the 3′ end. It utilizes a PCR reaction, which resulted in no amplification from blanks, even after 50 cycles of PCR. After efforts to optimize experimental conditions, the method was tested with DNA extracted from cultured E. coli cells, and genome coverage was estimated after sequencing on three different occasions. Coverage did not vary greatly with the different experimental conditions tested, and was around 62% with a sequencing effort equivalent to a theoretical genome coverage of 14.10×. The GC content of the sequenced amplification product was within 2% of the predicted values for this strain of E. coli. The method was also applied to DNA extracted from marine subsurface samples from ODP Leg 201 site 1229 (Peru Margin), and results of a taxonomic analysis revealed microbial communities dominated by Proteobacteria, Chloroflexi, Firmicutes, Euryarchaeota, and Crenarchaeota, among others. These results were similar to those obtained previously for those samples; however, variations in the proportions of taxa identified illustrate well the generally accepted view that community analysis is sensitive to both the amplification technique used and the method of assigning sequences to taxonomic groups. Overall, we find that RAMP represents a valid methodology for amplifying metagenomes from low-biomass samples. PMID:22319519
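The gap between the roughly 62% observed coverage and the 14.10× theoretical depth can be put in context with the standard Lander-Waterman approximation, in which reads placed uniformly at random at depth c leave a fraction exp(-c) of the genome uncovered. The sketch below applies that idealized model; the uniform-placement assumption and the function names are illustrative and are not taken from the paper.

```python
import math


def expected_fraction_covered(depth):
    """Expected covered fraction at a given mean depth, assuming uniform random reads."""
    return 1.0 - math.exp(-depth)


def equivalent_uniform_depth(fraction_covered):
    """Uniform-sampling depth that would yield the given covered fraction."""
    if not 0.0 <= fraction_covered < 1.0:
        raise ValueError("fraction_covered must be in [0, 1)")
    return -math.log(1.0 - fraction_covered)


if __name__ == "__main__":
    print(expected_fraction_covered(14.10))   # idealized expectation at 14.10x
    print(equivalent_uniform_depth(0.62))     # depth implied by 62% coverage
```

Under this model 14.10× would cover essentially the whole genome, while 62% coverage corresponds to a uniform depth of just under 1×, which suggests strongly uneven representation after amplification.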
Genome-wide analysis of Tol2 transposon reintegration in zebrafish.
Kondrychyn, Igor; Garcia-Lecea, Marta; Emelyanov, Alexander; Parinov, Sergey; Korzh, Vladimir
2009-09-08
Tol2, a member of the hAT family of transposons, has become a useful tool for genetic manipulation of model animals, but information about its interactions with vertebrate genomes is still limited. Furthermore, published reports on Tol2 have mainly been based on random integration of the transposon system after co-injection of a plasmid DNA harboring the transposon and a transposase mRNA. It is important to understand how Tol2 would behave upon activation after integration into the genome. We performed a large-scale enhancer trap (ET) screen and generated 338 insertions of the Tol2 transposon-based ET cassette into the zebrafish genome. These insertions were generated by remobilizing the transposon from two different donor sites in two transgenic lines. We found that 39% of Tol2 insertions occurred in transcription units, mostly into introns. Analysis of the transposon target sites revealed no strict specificity at the DNA sequence level. However, Tol2 was prone to target AT-rich regions with weak palindromic consensus sequences centered at the insertion site. Our systematic analysis of sequential remobilizations of the Tol2 transposon from two independent sites within a vertebrate genome has revealed properties such as a tendency to integrate into transcription units and into AT-rich palindrome-like sequences. This information will influence the development of various applications involving DNA transposons and Tol2 in particular.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hancock, Stephen P.; Stella, Stefano; Cascio, Duilio
2016-03-09
The abundant Fis nucleoid protein selectively binds poorly related DNA sequences with high affinities to regulate diverse DNA reactions. Fis binds DNA primarily through DNA backbone contacts and selects target sites by reading conformational properties of DNA sequences, most prominently intrinsic minor groove widths. High-affinity binding requires Fis-stabilized DNA conformational changes that vary depending on DNA sequence. In order to better understand the molecular basis for high affinity site recognition, we analyzed the effects of DNA sequence within and flanking the core Fis binding site on binding affinity and DNA structure. X-ray crystal structures of Fis-DNA complexes containing variable sequences in the noncontacted center of the binding site or variations within the major groove interfaces show that the DNA can adapt to the Fis dimer surface asymmetrically. We show that the presence and position of pyrimidine-purine base steps within the major groove interfaces affect both local DNA bending and minor groove compression to modulate affinities and lifetimes of Fis-DNA complexes. Sequences flanking the core binding site also modulate complex affinities, lifetimes, and the degree of local and global Fis-induced DNA bending. In particular, a G immediately upstream of the 15 bp core sequence inhibits binding and bending, and A-tracts within the flanking base pairs increase both complex lifetimes and global DNA curvatures. Taken together, our observations support a revised DNA motif specifying high-affinity Fis binding and highlight the range of conformations that Fis-bound DNA can adopt. Lastly, the affinities and DNA conformations of individual Fis-DNA complexes are likely to be tailored to their context-specific biological functions.
Specific minor groove solvation is a crucial determinant of DNA binding site recognition
Harris, Lydia-Ann; Williams, Loren Dean; Koudelka, Gerald B.
2014-01-01
The DNA sequence preferences of nearly all sequence specific DNA binding proteins are influenced by the identities of bases that are not directly contacted by protein. Discrimination between non-contacted base sequences is commonly based on the differential abilities of DNA sequences to allow narrowing of the DNA minor groove. However, the factors that govern the propensity of minor groove narrowing are not completely understood. Here we show that the differential abilities of various DNA sequences to support formation of a highly ordered and stable minor groove solvation network are a key determinant of non-contacted base recognition by a sequence-specific binding protein. In addition, disrupting the solvent network in the non-contacted region of the binding site alters the protein's ability to recognize contacted base sequences at positions 5–6 bases away. This observation suggests that DNA solvent interactions link contacted and non-contacted base recognition by the protein. PMID:25429976
A Method for Preparing DNA Sequencing Templates Using a DNA-Binding Microplate
Yang, Yu; Hebron, Haroun R.; Hang, Jun
2009-01-01
A DNA-binding matrix was immobilized on the surface of a 96-well microplate and used for plasmid DNA preparation for DNA sequencing. The same DNA-binding plate was used for bacterial growth, cell lysis, DNA purification, and storage. In a single step using one buffer, bacterial cells were lysed by enzymes, and released DNA was captured on the plate simultaneously. After two wash steps, DNA was eluted and stored in the same plate. Inclusion of phosphates in the culture medium was found to enhance the yield of plasmid significantly. Purified DNA samples were used successfully in DNA sequencing with high consistency and reproducibility. Eleven vectors and nine libraries were tested using this method. In 10 μl sequencing reactions using 3 μl sample and 0.25 μl BigDye Terminator v3.1, the results from a 3730xl sequencer gave a success rate of 90–95% and read-lengths of 700 bases or more. The method is fully automatable and convenient for manual operation as well. It enables reproducible, high-throughput, rapid production of DNA with purity and yields sufficient for high-quality DNA sequencing at a substantially reduced cost. PMID:19568455
Dendritic Cell-Based Immunotherapy of Breast Cancer: Modulation by CpG DNA
2005-09-01
tumor-associated antigens and bacterial DNA oligodeoxynucleotides containing unmethylated CpG sequences (CpG DNA) further augment the immune priming...associated antigens by cytotoxic T lymphocytes, and bacterial DNA oligodeoxynucleotides containing unmethylated CpG sequences (CpG DNA) can further...further amplify their immunostimulatory capacity and bacterial DNA oligodeoxynucleotides (ODN) containing unmethylated CpG sequences (CpG DNA) provide such
Morozumi, Takeya; Toki, Daisuke; Eguchi-Ogawa, Tomoko; Uenishi, Hirohide
2011-09-01
Large-scale cDNA-sequencing projects require an efficient strategy for mass sequencing. Here we describe a method for sequencing pooled cDNA clones using a combination of transposon insertion and Gateway technology. Our method reduces the number of shotgun clones that are unsuitable for reconstruction of cDNA sequences, and has the advantage of reducing the total costs of the sequencing project.
Bogovič Matijašić, Bojana; Obermajer, Tanja; Lipoglavšek, Luka; Sernel, Tjaša; Locatelli, Igor; Kos, Mitja; Šmid, Alenka; Rogelj, Irena
2016-07-01
We conducted a randomized double-blind, placebo-controlled multicentric study to investigate the influence of a synbiotic fermented milk on the fecal microbiota composition of 30 adults with irritable bowel syndrome (IBS). The synbiotic product contained Lactobacillus acidophilus La-5, Bifidobacterium animalis ssp. lactis BB-12, Streptococcus thermophilus, and dietary fiber (90% inulin, 10% oligofructose), and a heat-treated fermented milk without probiotic bacteria or dietary fiber served as placebo. Stool samples were collected after a run-in period, a 4-wk consumption period, and a 1-wk follow-up period, and were subjected to real-time PCR and 16S rDNA profiling by next-generation sequencing. After 4 wk of synbiotic (11 subjects) or placebo (19 subjects) consumption, a greater increase in DNA specific for L. acidophilus La-5 and Bifidobacterium animalis ssp. lactis was detected in the feces of the synbiotic group compared with the placebo group by quantitative real-time PCR. After 1 wk of follow-up, the content of L. acidophilus La-5 and B. animalis ssp. lactis decreased to levels close to initial levels. No significant changes with time or differences between the groups were observed for Lactobacillus, Enterobacteriaceae, Bifidobacterium, or all bacteria. The presence of viable BB-12- and La-5-like bacteria in the feces resulting from the intake of synbiotic product was confirmed by random amplification of polymorphic DNA (RAPD)-PCR. At the end of the consumption period, the feces of all subjects assigned to the synbiotic group contained viable bacteria with a BB-12-like RAPD profile, and after 1 wk of follow-up, BB-12-like bacteria remained in the feces of 87.5% of these subjects. The presence of La-5-like colonies was observed less frequently (37.5 and 25% of subjects, respectively). Next-generation sequencing of 16S rDNA amplicons revealed that only the percentage of sequences assigned to Strep. thermophilus was temporarily increased in both groups, whereas the global profile of the fecal microbiota of patients was not altered by consumption of the synbiotic or placebo. In conclusion, daily consumption of a synbiotic fermented milk had a short-term effect on the amount and proportion of La-5-like strains and B. animalis ssp. lactis in the fecal microbiome of IBS patients. Furthermore, both synbiotic and placebo products caused a temporary increase in fecal Strep. thermophilus. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Biological sequence compression algorithms.
Matsumoto, T; Sadakane, K; Imai, H
2000-01-01
Today, more and more DNA sequences are becoming available. Information about DNA sequences is stored in molecular biology databases. These databases will continue to grow in size and importance, so this information must be stored and communicated efficiently. Furthermore, sequence compression can be used to define similarities between biological sequences. Standard compression algorithms such as gzip or compress cannot compress DNA sequences; they only expand them in size. On the other hand, CTW (the Context Tree Weighting method) can compress DNA sequences to less than two bits per symbol. These algorithms do not use the special structures of biological sequences. Two characteristic structures of DNA sequences are known: palindromes (reverse complements) and approximate repeats. Several DNA-specific algorithms that use these structures can compress DNA sequences to less than two bits per symbol. In this paper, we improve CTW so that it can exploit the characteristic structures of DNA sequences. Before encoding the next symbol, the algorithm searches for an approximate repeat and a palindrome using hashing and dynamic programming. If there is a palindrome or an approximate repeat of sufficient length, our algorithm represents it by its length and distance. With this preprocessing, the new program achieves a slightly higher compression ratio than existing DNA-oriented compression algorithms. We also describe a new compression algorithm for protein sequences.
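The repeat and palindrome search described in this abstract can be illustrated with a minimal sketch. It is not the authors' program: it only finds exact matches of a fixed word length through a hash table, whereas the abstract describes approximate matching with dynamic programming. The function names and the word length k are assumptions made for illustration. For reference, a four-letter alphabet needs log2(4) = 2 bits per symbol without any modeling, which is the baseline the abstract refers to.

```python
from collections import defaultdict

# Translation table for complementing A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_matches(seq, pos, k=4):
    """Find earlier exact copies of the word seq[pos:pos+k] (hypothetical helper).

    Only words that end before `pos` are indexed, so every hit refers to
    text that a decoder would already have seen. Returns (kind, start)
    pairs, where kind is "repeat" or "palindrome" (reverse complement).
    """
    index = defaultdict(list)
    for i in range(0, pos - k + 1):
        index[seq[i:i + k]].append(i)
    word = seq[pos:pos + k]
    hits = [("repeat", i) for i in index.get(word, [])]
    hits += [("palindrome", i) for i in index.get(reverse_complement(word), [])]
    return hits
```

A compressor built on this idea would replace a sufficiently long hit by its length and distance, as the abstract describes, and fall back to symbol-by-symbol coding otherwise.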
High-Throughput Next-Generation Sequencing of Polioviruses
Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.
2016-01-01
The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929
Detection of DNA Methylation by Whole-Genome Bisulfite Sequencing.
Li, Qing; Hermanson, Peter J; Springer, Nathan M
2018-01-01
DNA methylation plays an important role in the regulation of the expression of transposons and genes. Various methods have been developed to assay DNA methylation levels. Bisulfite sequencing is considered to be the "gold standard" for single-base resolution measurement of DNA methylation levels. Coupled with next-generation sequencing, whole-genome bisulfite sequencing (WGBS) allows DNA methylation to be evaluated at a genome-wide scale. Here, we describe a protocol for WGBS in plant species with large genomes. This protocol has been successfully applied to assay genome-wide DNA methylation levels in maize and barley. This protocol has also been successfully coupled with sequence capture technology to assay DNA methylation levels in a targeted set of genomic regions.
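As a small worked example of what a single-base methylation level means in bisulfite data: bisulfite converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C. A common per-site estimate is therefore the fraction of C calls among all C and T calls covering that cytosine. The sketch below assumes read counts are already available; the function name is hypothetical and not part of the protocol in the abstract.

```python
def methylation_level(c_reads, t_reads):
    """Estimate the methylation level at one reference cytosine.

    c_reads: reads showing C (protected from conversion, i.e. methylated)
    t_reads: reads showing T (converted, i.e. unmethylated)
    Returns None when the site has no coverage.
    """
    total = c_reads + t_reads
    if total == 0:
        return None  # level is undefined without coverage
    return c_reads / total
```

For example, a site covered by three C reads and one T read would be estimated as 75% methylated.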
Krefft, Daria; Papkov, Aliaksei; Prusinowski, Maciej; Zylicz-Stachula, Agnieszka; Skowron, Piotr M
2018-05-11
Acoustic or hydrodynamic shearing, sonication and enzymatic digestion are used to fragment DNA. However, these methods have several disadvantages, such as DNA damage, difficulties in fragmentation control, irreproducibility and under-representation of some DNA segments. The ideal DNA fragmentation tool would be a gentle enzymatic method, offering a cleavage frequency high enough to eliminate bias in the distribution of DNA fragments and to allow easy control of partial digests. Only three such frequently cleaving natural restriction endonucleases (REases) have been discovered: CviJI, SetI and FaiI. Therefore, we have previously developed two artificial enzymatic specificities, cleaving DNA approximately every 3 bp: TspGWI/sinefungin (SIN) and TaqII/SIN. In this paper we present the third developed specificity: TthHB27I/SIN(SAM), a new genomic tool based on Type IIS/IIC/IIG Thermus-family REases-methyltransferases (MTases). In the presence of dimethyl sulfoxide (DMSO) and S-adenosyl-L-methionine (SAM) or its analogue SIN, the 6-bp cognate TthHB27I recognition sequence 5'-CAARCA-3' is converted into a combined 3.2-3.0-bp 'site' or its statistical equivalent, while a cleavage distance of 11/9 nt is retained. Protocols for various modes of limited DNA digestions were developed. In the presence of DMSO and SAM or SIN, TthHB27I is transformed from a rare 6-bp cutter into a very frequent one, with an approximately 3-bp specificity. Thus, TthHB27I/SIN(SAM) is a new tool in the sparsely represented group of such prototype REase specificities. Moreover, this modified TthHB27I enzyme is uniquely suited for controlled DNA fragmentation, due to partial DNA cleavage, which is an inherent feature of the Thermus-family enzymes. Such a tool can be used for generating quasi-random libraries as well as for other DNA manipulations requiring high-frequency cleavage and a uniform distribution of cuts along DNA.
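The difference between a rare 6-bp cutter and an approximately 3-bp cutter can be made concrete with a short calculation. The sketch below assumes random DNA with equal base frequencies and counts one strand only; under that assumption a fully specified site of effective length n is expected about once every 4^n bp. Reading the "statistical equivalent" in the abstract as such an effective length is an interpretation, and the helper names are hypothetical.

```python
from fractions import Fraction

# Number of bases matched by each IUPAC symbol used here.
DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "N": 4}


def site_probability(site):
    """Probability that a random position matches `site` on one strand,
    assuming independent positions and equal base frequencies."""
    p = Fraction(1)
    for symbol in site:
        p *= Fraction(DEGENERACY[symbol], 4)
    return p


def mean_spacing(effective_length):
    """Expected distance in bp between sites of the given effective length."""
    return 4 ** effective_length
```

Under these assumptions the cognate site CAARCA matches with probability 1/2048 per position, whereas a 3-bp equivalent site is expected every 64 bp, which is why the relaxed specificity yields far more uniform and frequent cuts.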
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stapleton, Mark; Liao, Guochun; Brokstein, Peter
2002-08-12
Collections of full-length nonredundant cDNA clones are critical reagents for functional genomics. The first step toward these resources is the generation and single-pass sequencing of cDNA libraries that contain a high proportion of full-length clones. The first release of the Drosophila Gene Collection (DGCr1) was produced from six libraries representing various tissues, developmental stages, and the cultured S2 cell line. Nearly 80,000 random 5' expressed sequence tags (ESTs) from these libraries were collapsed into a nonredundant set of 5849 cDNAs, corresponding to ~40 percent of the 13,474 predicted genes in Drosophila. To obtain cDNA clones representing the remaining genes, we have generated an additional 157,835 5' ESTs from two previously existing and three new libraries. One new library is derived from adult testis, a tissue we previously did not exploit for gene discovery; two new cap-trapped normalized libraries are derived from 0-22 hr embryos and adult heads. Taking advantage of the annotated D. melanogaster genome sequence, we clustered the ESTs by aligning them to the genome. Clusters that overlap genes not already represented by cDNA clones in the DGCr1 were analyzed further, and putative full-length clones were selected for inclusion in the new DGC. This second release of the DGC (DGCr2) contains 5061 additional clones, extending the collection to 10,910 cDNAs representing >70 percent of the predicted genes in Drosophila.
Jiménez, Diego Javier; Montaña, José Salvador; Martínez, María Mercedes
2011-01-01
With the purpose of isolating and characterizing free nitrogen fixing bacteria (FNFB) of the genus Azotobacter, soil samples were collected randomly from different vegetable organic cultures with neutral pH in different zones of Boyacá-Colombia. Isolations were done in selective nitrogen-free Ashby-Sucrose agar, obtaining a recovery of 40%. Twenty-four isolates were evaluated for colony and cellular morphology, pigment production and metabolic activities. Molecular characterization was carried out using amplified ribosomal DNA restriction analysis (ARDRA). After digestion of 16S rDNA Y1-Y3 PCR products (1487 bp) with AluI, HpaII and RsaI endonucleases, a polymorphism of 16% was obtained. Cluster analysis showed three main groups based on DNA fingerprints. Comparison between ribotypes generated by isolates and in silico restriction of 16S rDNA partial sequences with the same restriction enzymes was done with Gen Workbench v.2.2.4 software. Nevertheless, Y1-Y2 PCR products were analysed using BLASTn. Isolate C5T from tomato (Lycopersicon esculentum) grown soils presented the same in silico restriction patterns as A. chroococcum (AY353708) and 99% similarity with the same sequence. Isolate C5CO from cauliflower (Brassica oleracea var. botrytis) grown soils showed black pigmentation in Ashby-Benzoate agar and high similarity (91%) with the A. nigricans (AB175651) sequence. In this work we demonstrated the utility of molecular techniques and bioinformatics tools as a support to conventional techniques in characterization of the genus Azotobacter from vegetable-grown soils. PMID:24031700
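The in silico restriction step mentioned in this abstract can be sketched in a few lines. This is not the software used by the authors; it is a minimal illustration that assumes a linear molecule and uses the standard recognition sites and cut positions of the three enzymes (AluI AG^CT, HpaII C^CGG, RsaI GT^AC). All three sites are palindromic, so scanning one strand is sufficient. The function name and data layout are assumptions.

```python
# Recognition site and cut offset (bases from the start of the site).
ENZYMES = {
    "AluI": ("AGCT", 2),   # AG^CT, blunt
    "HpaII": ("CCGG", 1),  # C^CGG
    "RsaI": ("GTAC", 2),   # GT^AC, blunt
}


def digest(seq, enzyme):
    """Return fragment lengths from a complete digest of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)  # continue after this hit
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]
```

Comparing the fragment-length lists obtained for a reference 16S rDNA sequence with the band pattern of an isolate is the kind of ribotype comparison the abstract describes.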
Characterization of the repetitive DNA elements in the genome of fish lymphocystis disease viruses.
Schnitzler, P; Darai, G
1989-09-01
The complete DNA nucleotide sequence of the repetitive DNA elements in the genome of fish lymphocystis disease virus (FLDV) isolated from two different species (flounder and dab) was determined. The size of these repetitive DNA elements was found to be 1413 bp which corresponds to the DNA sequences of the 5' terminus of the EcoRI DNA fragment B (0.034 to 0.052 m.u.) and to the EcoRI DNA fragment M (0.718 to 0.736 m.u.) of the FLDV genome causing lymphocystis disease in flounder and plaice. The degree of DNA nucleotide homology between both regions was found to be 99%. The repetitive DNA element in the genome of FLDV isolated from other fish species (dab) was identified and is located within the EcoRI DNA fragment B and J of the viral genome. The DNA nucleotide sequence of one duplicate of this repetition (EcoRI DNA fragment J) was determined (1410 bp) and compared to the DNA nucleotide sequences of the repetitive DNA elements of the genome of FLDV isolated from flounder. It was found that the repetitive DNA elements of the genome of FLDV derived from two different fish species are highly conserved and possess a degree of DNA sequence homology of 94%. The DNA sequences of each strand of the individual repetitive element possess one open reading frame.
NASA Astrophysics Data System (ADS)
Suenaga, A.; Yatsu, C.; Komeiji, Y.; Uebayasi, M.; Meguro, T.; Yamato, I.
2000-08-01
A molecular dynamics simulation of the Escherichia coli trp-repressor/operator complex was performed for 800 ps on the special-purpose computer MD-GRAPE to elucidate protein-DNA interactions in solution. The Ewald summation method was employed to treat the electrostatic interaction without cutoff. The DNA kept a stable conformation in comparison with the result of the conventional cutoff method. Thus, the trajectories obtained were used to analyze the protein-DNA interaction and to understand the role of the dynamics of water molecules forming the sequence-specific recognition interface. The dynamical cross-correlation map showed a significant positive correlation between the helix-turn-helix DNA-binding motifs and the major grooves of operator DNA. The extensive contact surface was stable during the simulation. Most of the contacts consisted of direct interactions between phosphates of DNA and the protein, but several water-mediated polar contacts were also observed. These water-mediated interactions, which were also seen in the crystal structure (Z. Otwinowski, et al., Nature, 335 (1988) 321), emerged spontaneously from the randomized initial configuration of the solvent. This result suggests the importance of the water-mediated interaction in specific recognition of DNA by the trp-repressor, consistent with X-ray structural information.
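The dynamical cross-correlation map mentioned in this abstract is commonly defined as C_ij = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>), where dr_i is the displacement of atom i from its mean position and the averages run over trajectory frames. The abstract does not state which variant was used, so the sketch below is an illustration of that common definition only. It uses plain Python lists, assumes the frames are already superimposed, and assumes every atom actually moves (otherwise the normalization divides by zero).

```python
import math


def dccm(trajectory):
    """Dynamical cross-correlation matrix.

    trajectory: list of frames, each a list of (x, y, z) tuples, one per atom.
    Returns a square list-of-lists with values between -1 and 1.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    # Mean position of every atom over the trajectory.
    mean = [[sum(frame[a][d] for frame in trajectory) / n_frames
             for d in range(3)] for a in range(n_atoms)]
    # Displacement of every atom from its mean, frame by frame.
    disp = [[[frame[a][d] - mean[a][d] for d in range(3)]
             for a in range(n_atoms)] for frame in trajectory]

    def cov(i, j):
        # Time average of the dot product of the two displacements.
        return sum(sum(f[i][d] * f[j][d] for d in range(3))
                   for f in disp) / n_frames

    return [[cov(i, j) / math.sqrt(cov(i, i) * cov(j, j))
             for j in range(n_atoms)] for i in range(n_atoms)]
```

Values near +1 indicate atoms that move together, as reported here for the helix-turn-helix motifs and the major grooves; values near -1 indicate anticorrelated motion.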
Direct Single-Molecule Observation of Mode and Geometry of RecA-Mediated Homology Search.
Lee, Andrew J; Endo, Masayuki; Hobbs, Jamie K; Wälti, Christoph
2018-01-23
Genomic integrity, when compromised by accrued DNA lesions, is maintained through efficient repair via homologous recombination. For this process the ubiquitous recombinase A (RecA), and its homologues such as the human Rad51, are of central importance, able to align and exchange homologous sequences within single-stranded and double-stranded DNA in order to swap out defective regions. Here, we directly observe the widely debated mechanism of RecA homology searching at a single-molecule level using high-speed atomic force microscopy (HS-AFM) in combination with tailored DNA origami frames to present the reaction targets in a way suitable for AFM-imaging. We show that RecA nucleoprotein filaments move along DNA substrates via short-distance facilitated diffusions, or slides, interspersed with longer-distance random moves, or hops. Importantly, from the specific interaction geometry, we find that the double-stranded substrate DNA resides in the secondary DNA binding-site within the RecA nucleoprotein filament helical groove during the homology search. This work demonstrates that tailored DNA origami, in conjunction with HS-AFM, can be employed to reveal directly conformational and geometrical information on dynamic protein-DNA interactions which was previously inaccessible at an individual single-molecule level.
[Whole Genome Sequencing of Human mtDNA Based on Ion Torrent PGM™ Platform].
Cao, Y; Zou, K N; Huang, J P; Ma, K; Ping, Y
2017-08-01
To analyze and detect the whole genome sequence of human mitochondrial DNA (mtDNA) by Ion Torrent PGM™ platform and to study the differences of mtDNA sequence in different tissues. Samples were collected from 6 unrelated individuals by forensic postmortem examination, including chest blood, hair, costal cartilage, nail, skeletal muscle and oral epithelium. Amplification of the whole genome sequence of mtDNA was performed with 4 pairs of primers. Libraries were constructed with the Ion Shear™ Plus Reagents kit and the Ion Plus Fragment Library kit. Whole genome sequencing of mtDNA was performed using the Ion Torrent PGM™ platform. Sanger sequencing was used to determine the heteroplasmy positions and the mutation positions in the HVⅠ region. The whole genome sequence of mtDNA was amplified successfully from all samples. The six unrelated individuals belonged to 6 different haplotypes. Different tissues in one individual showed heteroplasmy differences. The heteroplasmy positions and the mutation positions in the HVⅠ region were verified by Sanger sequencing. A consistency check by the Kappa method showed that the mtDNA sequence results were highly consistent across different tissues. The testing method used in the present study for sequencing the whole genome of human mtDNA can detect heteroplasmy differences between tissues, and its results show good consistency. The results provide guidance for further applications of mtDNA in forensic science. Copyright© by the Editorial Department of Journal of Forensic Medicine
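The consistency check mentioned in this abstract can be illustrated with Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two sets of calls and p_e is the agreement expected by chance. That the authors' "Kappa method" is Cohen's kappa is an assumption, and the representation of tissue results as per-position base calls is hypothetical.

```python
from collections import Counter


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two equally long lists of categorical calls."""
    n = len(calls_a)
    # Observed agreement: fraction of positions with identical calls.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement from the marginal call frequencies.
    freq_a = Counter(calls_a)
    freq_b = Counter(calls_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

A value close to 1 corresponds to the high consistency between tissues reported above; the function is undefined when both lists contain a single identical category, because chance agreement is then 1.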
Molecular contributions to conservation
Haig, Susan M.
1998-01-01
Recent advances in molecular technology have opened a new chapter in species conservation efforts, as well as population biology. DNA sequencing, MHC (major histocompatibility complex), minisatellite, microsatellite, and RAPD (random amplified polymorphic DNA) procedures allow for identification of parentage, more distant relatives, founders to new populations, unidentified individuals, population structure, effective population size, population-specific markers, etc. PCR (polymerase chain reaction) amplification of mitochondrial DNA, nuclear DNA, ribosomal DNA, chloroplast DNA, and other systems provide for more sophisticated analyses of metapopulation structure, hybridization events, and delineation of species, subspecies, and races, all of which aid in setting species recovery priorities. Each technique can be powerful in its own right but is most credible when used in conjunction with other molecular techniques and, most importantly, with ecological and demographic data collected from the field. Surprisingly few taxa of concern have been assayed with any molecular technique. Thus, rather than showcasing exhaustive details from a few well-known examples, this paper attempts to present a broad range of cases in which molecular techniques have been used to provide insight into conservation efforts.
Partial bisulfite conversion for unique template sequencing
Kumar, Vijay; Rosenbaum, Julie; Wang, Zihua; Forcier, Talitha; Ronemus, Michael; Wigler, Michael
2018-01-01
We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate count and long-range assembly of initial template molecules from short-read sequence data. We explore count and low-error sequencing by profiling 135 000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone. PMID:29161423
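The clustering idea in this abstract, grouping reads that carry the same conversion pattern, can be sketched as follows. This is a deliberately simplified illustration, not the muSeq pipeline: it assumes reads that span the whole reference region on the top strand, no sequencing errors, and only C-to-T conversions (the G-to-A pattern of the opposite strand and partially overlapping reads are ignored). Function names are hypothetical.

```python
from collections import defaultdict


def conversion_signature(reference, read):
    """Positions where a reference C is read as T."""
    return tuple(i for i, (ref_base, read_base) in enumerate(zip(reference, read))
                 if ref_base == "C" and read_base == "T")


def cluster_reads(reference, reads):
    """Group reads by conversion signature; each group is taken to
    represent one initial template molecule."""
    clusters = defaultdict(list)
    for read in reads:
        clusters[conversion_signature(reference, read)].append(read)
    return dict(clusters)
```

The number of distinct signatures then estimates the number of initial templates, which is the basis of the copy number inference described in the abstract.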
Sequence periodicity in nucleosomal DNA and intrinsic curvature
2010-01-01
Background Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy, the more rigid the DNA helix; thus it is natural to expect that sequences that are involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understanding their sequence-dependent structures. Results Higher order DNA structures and the distribution of molecular bend loci associated with 146-base nucleosome core DNA sequences from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. Conclusions The higher order structures associated with nucleosomes of C. elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results point to the possibility that context-dependent curvature of varying degrees is associated with nucleosomal DNA. PMID:20487515
Murray, V
1999-01-01
This article reviews the literature concerning the sequence specificity of DNA-damaging agents. DNA-damaging agents are widely used in cancer chemotherapy. It is important to understand fully the determinants of DNA sequence specificity so that more effective DNA-damaging agents can be developed as antitumor drugs. There are five main methods of DNA sequence specificity analysis: cleavage of end-labeled fragments, linear amplification with Taq DNA polymerase, ligation-mediated polymerase chain reaction (PCR), single-strand ligation PCR, and footprinting. The DNA sequence specificity in purified DNA and in intact mammalian cells is reviewed for several classes of DNA-damaging agent. These include agents that form covalent adducts with DNA, free radical generators, topoisomerase inhibitors, intercalators and minor groove binders, enzymes, and electromagnetic radiation. The main sites of adduct formation are at the N-7 of guanine in the major groove of DNA and the N-3 of adenine in the minor groove, whereas free radical generators abstract hydrogen from the deoxyribose sugar and topoisomerase inhibitors cause enzyme-DNA cross-links to form. Several issues involved in the determination of the DNA sequence specificity are discussed. The future directions of the field, with respect to cancer chemotherapy, are also examined.
Deciphering the genomic targets of alkylating polyamide conjugates using high-throughput sequencing
Chandran, Anandhakumar; Syed, Junetha; Taylor, Rhys D.; Kashiwazaki, Gengo; Sato, Shinsuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi
2016-01-01
Chemically engineered small molecules targeting specific genomic sequences play an important role in drug development research. Pyrrole-imidazole polyamides (PIPs) are a group of molecules that can bind to the DNA minor-groove and can be engineered to target specific sequences. Their biological effects rely primarily on their selective DNA binding. However, the binding mechanism of PIPs at the chromatinized genome level is poorly understood. Herein, we report a method using high-throughput sequencing to identify the DNA-alkylating sites of PIP-indole-seco-CBI conjugates. High-throughput sequencing analysis of conjugate 2 showed highly similar DNA-alkylating sites on synthetic oligos (histone-free DNA) and on human genomes (chromatinized DNA context). To our knowledge, this is the first report identifying alkylation sites across genomic DNA by alkylating PIP conjugates using high-throughput sequencing. PMID:27098039
Arias, Covadonga R.; Pujalte, María Jesús; Garay, Esperanza; Aznar, Rosa
1998-01-01
Genetic relationships among 132 strains of Vibrio vulnificus (clinical, environmental, and diseased-eel isolates from different geographic origins, as well as seawater and shellfish isolates from the western Mediterranean coast, including reference strains) were analyzed by random amplified polymorphic DNA (RAPD) PCR. Results were validated by ribotyping. For ribotyping, DNAs were digested with KpnI and hybridized with an oligonucleotide probe complementary to a highly conserved sequence in the 23S rRNA gene. Random amplification of DNA was performed with M13 and T3 universal primers. The comparison between ribotyping and RAPD PCR revealed an overall agreement regarding the high level of homogeneity of diseased-eel isolates in contrast to the genetic heterogeneity of Mediterranean isolates. The latter suggests the existence of autochthonous clones present in Mediterranean coastal waters. Both techniques have revealed a genetic proximity among Spanish fish farm isolates and a close relationship between four Spanish eel farm isolates and some Mediterranean isolates. Whereas the differentiation within diseased-eel isolates was only possible by ribotyping, RAPD PCR was able to differentiate phenotypically atypical isolates of V. vulnificus. On the basis of our results, RAPD PCR is proposed as a better technique than ribotyping for rapid typing in the routine analysis of new V. vulnificus isolates. PMID:9726889
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.
Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D
2017-01-01
This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, which included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
Kidd, Kenneth K; Pakstis, Andrew J; Speed, William C; Lagacé, Robert; Chang, Joseph; Wootton, Sharon; Haigh, Eva; Kidd, Judith R
2014-09-01
SNPs that are molecularly very close (<10 kb) will generally have extremely low recombination rates, much less than 10^-4. Multiple haplotypes will often exist because of the history of the origins of the variants at the different sites, rare recombinants, and the vagaries of random genetic drift and/or selection. Such multiallelic haplotype loci are potentially important in forensic work for individual identification, for defining ancestry, and for identifying familial relationships. The new DNA sequencing capabilities currently available make possible continuous runs of a few hundred base pairs so that we can now determine the allelic combination of multiple SNPs on each chromosome of an individual, i.e., the phase, for multiple SNPs within a small segment of DNA. Therefore, we have begun to identify regions, encompassing two to four SNPs with an extent of <200 bp that define multiallelic haplotype loci. We have identified candidate regions and have collected pilot data on many candidate microhaplotype loci. Here we present 31 microhaplotype loci that have at least three alleles, have high heterozygosity, are globally informative, and are statistically independent at the population level. This study of microhaplotype loci (microhaps) provides proof of principle that such markers exist and validates their usefulness for ancestry inference, lineage-clan-family inference, and individual identification. The true value of microhaplotypes will come with sequencing methods that can establish alleles unambiguously, including disentangling of mixtures, because a single sequencing run on a single strand of DNA will encompass all of the SNPs. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
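The selection criteria in this abstract (at least three alleles, high heterozygosity) can be made concrete with the standard expected heterozygosity, H = 1 - sum(p_i^2), where p_i is the frequency of haplotype allele i at a locus. The sketch below computes it from a list of phased haplotypes; representing each haplotype as a short string of SNP alleles is an assumption made for illustration, and no small-sample correction is applied.

```python
from collections import Counter


def expected_heterozygosity(haplotypes):
    """Expected heterozygosity of one microhaplotype locus.

    haplotypes: list of phased haplotype strings, one per chromosome,
    e.g. "AC" for a two-SNP locus.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return 1 - sum((count / n) ** 2 for count in counts.values())


def allele_count(haplotypes):
    """Number of distinct haplotype alleles observed at the locus."""
    return len(set(haplotypes))
```

A two-SNP locus can carry up to four haplotype alleles, so its heterozygosity can exceed the 0.5 maximum of a single biallelic SNP, which is what makes such loci informative.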
Genome editing with CompoZr custom zinc finger nucleases (ZFNs).
Hansen, Keith; Coussens, Matthew J; Sago, Jack; Subramanian, Shilpi; Gjoka, Monika; Briner, Dave
2012-06-14
Genome editing is a powerful technique that can be used to elucidate gene function and the genetic basis of disease. Traditional gene editing methods such as chemical-based mutagenesis or random integration of DNA sequences confer indiscriminate genetic changes in an overall inefficient manner and require incorporation of undesirable synthetic sequences or use of aberrant culture conditions, potentially confusing biological study. By contrast, transient ZFN expression in a cell can facilitate precise, heritable gene editing in a highly efficient manner without the need for administration of chemicals or integration of synthetic transgenes. Zinc finger nucleases (ZFNs) are enzymes which bind and cut distinct sequences of double-stranded DNA (dsDNA). A functional CompoZr ZFN unit consists of two individual monomeric proteins that bind a DNA "half-site" of approximately 15-18 nucleotides (see Figure 1). When two ZFN monomers "home" to their adjacent target sites the DNA-cleavage domains dimerize and create a double-strand break (DSB) in the DNA. Introduction of ZFN-mediated DSBs in the genome lays a foundation for highly efficient genome editing. Imperfect repair of DSBs in a cell via the non-homologous end-joining (NHEJ) DNA repair pathway can result in small insertions and deletions (indels). Creation of indels within the gene coding sequence of a cell can result in frameshift and subsequent functional knockout of a gene locus at high efficiency. While this protocol describes the use of ZFNs to create a gene knockout, integration of transgenes may also be conducted via homology-directed repair at the ZFN cut site. The CompoZr Custom ZFN Service represents a systematic, comprehensive, and well-characterized approach to targeted gene editing for the scientific community with ZFN technology. 
Sigma scientists work closely with investigators to 1) perform due diligence analysis including analysis of relevant gene structure, biology, and model system pursuant to the project goals, 2) apply this knowledge to develop a sound targeting strategy, 3) then design, build, and functionally validate ZFNs for activity in a relevant cell line. The investigator receives positive control genomic DNA and primers, and ready-to-use ZFN reagents supplied in both plasmid DNA and in-vitro transcribed mRNA format. These reagents may then be delivered for transient expression in the investigator's cell line or cell type of choice. Samples are then tested for gene editing at the locus of interest by standard molecular biology techniques including PCR amplification, enzymatic digest, and electrophoresis. After positive signal for gene editing is detected in the initial population, cells are single-cell cloned and genotyped for identification of mutant clones/alleles.
Characterization and Modulation of Proteins Involved in Sulfur Mustard Vesication
2000-06-01
PARP staining was present throughout the nucleus, the DBD showed a more localized punctate pattern in the region of the nucleolus and throughout the...34 oligonucleotide is synthesized that is identical in base composition to the antisense, but had a randomly generated sequence. This is an important control...reversed this inhibitory effect. The roles of PARP in modulating the composition and enzyme activities of the DNA synthesome were further investigated by
Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors.
Adalsteinsson, Viktor A; Ha, Gavin; Freeman, Samuel S; Choudhury, Atish D; Stover, Daniel G; Parsons, Heather A; Gydush, Gregory; Reed, Sarah C; Rotem, Denisse; Rhoades, Justin; Loginov, Denis; Livitz, Dimitri; Rosebrock, Daniel; Leshchiner, Ignaty; Kim, Jaegil; Stewart, Chip; Rosenberg, Mara; Francis, Joshua M; Zhang, Cheng-Zhong; Cohen, Ofir; Oh, Coyin; Ding, Huiming; Polak, Paz; Lloyd, Max; Mahmud, Sairah; Helvie, Karla; Merrill, Margaret S; Santiago, Rebecca A; O'Connor, Edward P; Jeong, Seong H; Leeson, Rachel; Barry, Rachel M; Kramkowski, Joseph F; Zhang, Zhenwei; Polacek, Laura; Lohr, Jens G; Schleicher, Molly; Lipscomb, Emily; Saltzman, Andrea; Oliver, Nelly M; Marini, Lori; Waks, Adrienne G; Harshman, Lauren C; Tolaney, Sara M; Van Allen, Eliezer M; Winer, Eric P; Lin, Nancy U; Nakabayashi, Mari; Taplin, Mary-Ellen; Johannessen, Cory M; Garraway, Levi A; Golub, Todd R; Boehm, Jesse S; Wagle, Nikhil; Getz, Gad; Love, J Christopher; Meyerson, Matthew
2017-11-06
Whole-exome sequencing of cell-free DNA (cfDNA) could enable comprehensive profiling of tumors from blood but the genome-wide concordance between cfDNA and tumor biopsies is uncertain. Here we report ichorCNA, software that quantifies tumor content in cfDNA from 0.1× coverage whole-genome sequencing data without prior knowledge of tumor mutations. We apply ichorCNA to 1439 blood samples from 520 patients with metastatic prostate or breast cancers. In the earliest tested sample for each patient, 34% of patients have ≥10% tumor-derived cfDNA, sufficient for standard coverage whole-exome sequencing. Using whole-exome sequencing, we validate the concordance of clonal somatic mutations (88%), copy number alterations (80%), mutational signatures, and neoantigens between cfDNA and matched tumor biopsies from 41 patients with ≥10% cfDNA tumor content. In summary, we provide methods to identify patients eligible for comprehensive cfDNA profiling, revealing its applicability to many patients, and demonstrate high concordance of cfDNA and metastatic tumor whole-exome sequencing.
An evolution based biosensor receptor DNA sequence generation algorithm.
Kim, Eungyeong; Lee, Malrey; Gatton, Thomas M; Lee, Jaewan; Zang, Yupeng
2010-01-01
A biosensor is composed of a bioreceptor, an associated recognition molecule, and a signal transducer that can selectively detect target substances for analysis. DNA based biosensors utilize receptor molecules that allow hybridization with the target analyte. However, most DNA biosensor research uses oligonucleotides as the target analytes and does not address the potential problems of real samples. The identification of recognition molecules suitable for real target analyte samples is an important step towards further development of DNA biosensors. This study examines the characteristics of DNA used as bioreceptors and proposes a hybrid evolution-based DNA sequence generating algorithm, based on DNA computing, to identify suitable DNA bioreceptor recognition molecules for stable hybridization with real target substances. The Traveling Salesman Problem (TSP) approach is applied in the proposed algorithm to evaluate the safety and fitness of the generated DNA sequences. This approach improves efficiency and stability for enhanced and variable-length DNA sequence generation and allows extension to generation of variable-length DNA sequences with diverse receptor recognition requirements.
Structural and Thermodynamic Signatures of DNA Recognition by Mycobacterium tuberculosis DnaA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tsodikov, Oleg V.; Biswas, Tapan
An essential protein, DnaA, binds to 9-bp DNA sites within the origin of replication oriC. These binding events are prerequisite to forming an enigmatic nucleoprotein scaffold that initiates replication. The number, sequences, positions, and orientations of these short DNA sites, or DnaA boxes, within the oriCs of different bacteria vary considerably. To investigate features of DnaA boxes that are important for binding Mycobacterium tuberculosis DnaA (MtDnaA), we have determined the crystal structures of the DNA binding domain (DBD) of MtDnaA bound to a cognate MtDnaA-box (at 2.0 Å resolution) and to a consensus Escherichia coli DnaA-box (at 2.3 Å). These structures, complemented by calorimetric equilibrium binding studies of MtDnaA DBD in a series of DnaA-box variants, reveal the main determinants of DNA recognition and establish the [T/C][T/A][G/A]TCCACA sequence as a high-affinity MtDnaA-box. Bioinformatic and calorimetric analyses indicate that DnaA-box sequences in mycobacterial oriCs generally differ from the optimal binding sequence. This sequence variation occurs commonly at the first 2 bp, making an in vivo mycobacterial DnaA-box effectively a 7-mer and not a 9-mer. We demonstrate that the decrease in the affinity of these MtDnaA-box variants for MtDnaA DBD relative to that of the highest-affinity box TTGTCCACA is less than 10-fold. The understanding of DnaA-box recognition by MtDnaA and E. coli DnaA enables one to map DnaA-box sequences in the genomes of M. tuberculosis and other eubacteria.
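The high-affinity motif [T/C][T/A][G/A]TCCACA translates directly into a regular expression, which gives one simple way to map candidate boxes in a sequence. The sketch below is an assumption-laden illustration, not the authors' mapping procedure: the function names are hypothetical, only exact matches to the 9-mer are reported, and minus-strand positions are given relative to the reverse-complemented sequence.

```python
import re

# Lookahead so that overlapping candidate boxes are all reported.
DNAA_BOX = re.compile(r"(?=([TC][TA][GA]TCCACA))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_dnaa_boxes(seq):
    """Return (strand, start, 9-mer) for every exact match on either strand."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for match in DNAA_BOX.finditer(s):
            hits.append((strand, match.start(), match.group(1)))
    return hits


print(find_dnaa_boxes("GGTTGTCCACAGG"))
```

Because the abstract notes that in vivo boxes often vary at the first 2 bp, a more permissive scan could relax those two positions; that choice is left to the reader.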
Lakshmanan, Lakshmi Narayanan; Gruber, Jan; Halliwell, Barry; Gunawan, Rudiyanto
2015-01-01
Non D-loop direct repeats (DRs) in mitochondrial DNA (mtDNA) have been commonly implicated in the mutagenesis of mtDNA deletions associated with neuromuscular disease and ageing. Further, these DRs have been hypothesized to put a constraint on the lifespan of mammals and are under a negative selection pressure. Using a compendium of 294 mammalian mtDNA, we re-examined the relationship between species lifespan and the mutagenicity of such DRs. Contradicting the prevailing hypotheses, we found no significant evidence that long-lived mammals possess fewer mutagenic DRs than short-lived mammals. By comparing DR counts in human mtDNA with those in selectively randomized sequences, we also showed that the number of DRs in human mtDNA is primarily determined by global mtDNA properties, such as the bias in synonymous codon usage (SCU) and nucleotide composition. We found that SCU bias in mtDNA positively correlates with DR counts, where repeated usage of a subset of codons leads to more frequent DR occurrences. While bias in SCU and nucleotide composition has been attributed to nucleotide mutational bias, mammalian mtDNA still exhibit higher SCU bias and DR counts than expected from such mutational bias, suggesting a lack of negative selection against non D-loop DRs. PMID:25855815
DNA barcode goes two-dimensions: DNA QR code web server.
Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin
2012-01-01
The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.
TaxI: a software tool for DNA barcoding using distance methods
Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel
2005-01-01
DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid, but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as good as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
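The core idea of comparing a query against each reference through separate pairwise alignments can be sketched as follows. This is not TaxI's code: it assumes each pairwise alignment has already been computed (gaps written as "-"), uses an uncorrected p-distance rather than any particular model of sequence evolution, and all function names are hypothetical.

```python
def p_distance(aligned_a, aligned_b):
    """Proportion of differing sites, ignoring columns that contain a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # indel columns are skipped rather than scored
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def rank_references(pairwise_alignments):
    """pairwise_alignments maps reference name -> (aligned_query, aligned_reference)."""
    distances = {
        name: p_distance(query, ref)
        for name, (query, ref) in pairwise_alignments.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical alignments of one query against two references.
alignments = {
    "species_a": ("ACGT-A", "ACCTTA"),
    "species_b": ("ACGTA", "ACGTA"),
}
print(rank_references(alignments))
```

Because each reference gets its own alignment to the query, no multiple alignment of the whole reference set is needed, which is the property the abstract highlights for indel-rich rRNA sequences.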
Tabor, Stanley; Richardson, Charles C.
1995-04-25
A method for sequencing a strand of DNA, including the steps of: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three deoxyribonucleoside triphosphates in different amounts, under conditions favoring primer extension to form nucleic acid fragments complementary to the DNA to be sequenced; labelling the nucleic acid fragments; separating them and determining the position of the deoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby to determine the DNA sequence.
Kukita, Yoji; Matoba, Ryo; Uchida, Junji; Hamakawa, Takuya; Doki, Yuichiro; Imamura, Fumio; Kato, Kikuya
2015-08-01
Circulating tumour DNA (ctDNA) is an emerging field of cancer research. However, current ctDNA analysis is usually restricted to one or a few mutation sites due to technical limitations. In the case of massively parallel DNA sequencers, the number of false positives caused by a high read error rate is a major problem. In addition, the final sequence reads do not represent the original DNA population due to the global amplification step during the template preparation. We established a high-fidelity target sequencing system of individual molecules identified in plasma cell-free DNA using barcode sequences; this system consists of the following two steps. (i) A novel target sequencing method that adds barcode sequences by adaptor ligation. This method uses linear amplification to eliminate the errors introduced during the early cycles of polymerase chain reaction. (ii) The monitoring and removal of erroneous barcode tags. This process involves the identification of individual molecules that have been sequenced and for which the number of mutations has been quantitated absolutely. Using plasma cell-free DNA from patients with gastric or lung cancer, we demonstrated that the system achieved near complete elimination of false positives and enabled de novo detection and absolute quantitation of mutations in plasma cell-free DNA. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
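The general principle behind barcode-based error suppression is that reads sharing a barcode derive from one original molecule, so a per-position majority vote removes most random read errors. The sketch below shows only that generic principle under simplifying assumptions (equal-length reads, exact barcode matching, a hypothetical minimum family size); it does not reproduce the authors' adaptor-ligation workflow or their removal of erroneous barcode tags.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_reads=2):
    """reads: iterable of (barcode, sequence); returns barcode -> consensus sequence."""
    families = defaultdict(list)
    for barcode, sequence in reads:
        families[barcode].append(sequence)

    consensus = {}
    for barcode, sequences in families.items():
        if len(sequences) < min_reads:
            continue  # too few reads to vote; family is dropped
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*sequences)
        )
    return consensus


# Hypothetical reads: one read in family "AAT" carries a G>T error at position 3.
reads = [("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"), ("GGC", "TTTT")]
print(consensus_by_barcode(reads))
```

Counting the resulting consensus molecules, rather than raw reads, is what makes an absolute (per-molecule) quantitation of mutations possible in schemes of this kind.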
Aguilar, William; Paz, Manuel M; Vargas, Anayatzinc; Clement, Cristina C; Cheng, Shu-Yuan; Champeil, Elise
2018-04-20
Mitomycin C (MC), a potent antitumor drug, and decarbamoylmitomycin C (DMC), a derivative lacking the carbamoyl group, form highly cytotoxic DNA interstrand crosslinks. The major interstrand crosslink formed by DMC is the C1'' epimer of the major crosslink formed by MC. The molecular basis for the stereochemical configuration exhibited by DMC was investigated using biomimetic synthesis. The formation of DNA-DNA crosslinks by DMC is diastereospecific and diastereodivergent: Only the 1''S-diastereomer of the initially formed monoadduct can form crosslinks at GpC sequences, and only the 1''R-diastereomer of the monoadduct can form crosslinks at CpG sequences. We also show that CpG and GpC sequences react with divergent diastereoselectivity in the first alkylation step: 1''S stereochemistry is favored at GpC sequences and 1''R stereochemistry is favored at CpG sequences. Therefore, the first alkylation step results, at each sequence, in the selective formation of the diastereomer able to generate an interstrand DNA-DNA crosslink after the "second arm" alkylation. Examination of the known DNA adduct pattern obtained after treatment of cancer cell cultures with DMC indicates that the GpC sequence is the major target for the formation of DNA-DNA crosslinks in vivo by this drug. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Sproul, John S; Maddison, David R
2017-11-01
Despite advances that allow DNA sequencing of old museum specimens, sequencing small-bodied, historical specimens can be challenging and unreliable as many contain only small amounts of fragmented DNA. Dependable methods to sequence such specimens are especially critical if the specimens are unique. We attempt to sequence small-bodied (3-6 mm) historical specimens (including nomenclatural types) of beetles that have been housed, dried, in museums for 58-159 years, and for which few or no suitable replacement specimens exist. To better understand ideal approaches of sample preparation and produce preparation guidelines, we compared different library preparation protocols using low amounts of input DNA (1-10 ng). We also explored low-cost optimizations designed to improve library preparation efficiency and sequencing success of historical specimens with minimal DNA, such as enzymatic repair of DNA. We report successful sample preparation and sequencing for all historical specimens despite our low-input DNA approach. We provide a list of guidelines related to DNA repair, bead handling, reducing adapter dimers and library amplification. We present these guidelines to facilitate more economical use of valuable DNA and enable more consistent results in projects that aim to sequence challenging, irreplaceable historical specimens. © 2017 John Wiley & Sons Ltd.
Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Chadaram, Sudha; Mande, Sharmila S
2011-11-30
Obtaining accurate estimates of microbial diversity using rDNA profiling is the first step in most metagenomics projects. Consequently, most metagenomic projects spend considerable amounts of time, money and manpower for experimentally cloning, amplifying and sequencing the rDNA content in a metagenomic sample. In the second step, the entire genomic content of the metagenome is extracted, sequenced and analyzed. Since DNA sequences obtained in this second step also contain rDNA fragments, rapid in silico identification of these rDNA fragments would drastically reduce the cost, time and effort of current metagenomic projects by entirely bypassing the experimental steps of primer based rDNA amplification, cloning and sequencing. In this study, we present an algorithm called i-rDNA that can facilitate the rapid detection of 16S rDNA fragments from amongst millions of sequences in metagenomic data sets with high detection sensitivity. Performance evaluation with data sets/database variants simulating typical metagenomic scenarios indicates the significantly high detection sensitivity of i-rDNA. Moreover, i-rDNA can process a million sequences in less than an hour on a simple desktop with modest hardware specifications. In addition to the speed of execution, high sensitivity and low false positive rate, the utility of the algorithmic approach discussed in this paper is immense given that it would help in bypassing the entire experimental step of primer-based rDNA amplification, cloning and sequencing. Application of this algorithmic approach would thus drastically reduce the cost, time and human efforts invested in all metagenomic projects. A web-server for the i-rDNA algorithm is available at http://metagenomics.atc.tcs.com/i-rDNA/
Biosensors for DNA sequence detection
NASA Technical Reports Server (NTRS)
Vercoutere, Wenonah; Akeson, Mark
2002-01-01
DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.
Thomas, W. Kelley; Vida, J. T.; Frisse, Linda M.; Mundo, Manuel; Baldwin, James G.
1997-01-01
To effectively integrate DNA sequence analysis and classical nematode taxonomy, we must be able to obtain DNA sequences from formalin-fixed specimens. Microdissected sections of nematodes were removed from specimens fixed in formalin, using standard protocols and without destroying morphological features. The fixed sections provided sufficient template for multiple polymerase chain reaction-based DNA sequence analyses. PMID:19274156
Stec, James; Wang, Jing; Coombes, Kevin; Ayers, Mark; Hoersch, Sebastian; Gold, David L.; Ross, Jeffrey S; Hess, Kenneth R.; Tirrell, Stephen; Linette, Gerald; Hortobagyi, Gabriel N.; Symmans, W. Fraser; Pusztai, Lajos
2005-01-01
We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by basic local alignment tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene. PMID:16049308
Star, Bastiaan; Nederbragt, Alexander J.; Hansen, Marianne H. S.; Skage, Morten; Gilfillan, Gregor D.; Bradbury, Ian R.; Pampoulie, Christophe; Stenseth, Nils Chr; Jakobsen, Kjetill S.; Jentoft, Sissel
2014-01-01
Degradation-specific processes and variation in laboratory protocols can bias the DNA sequence composition from samples of ancient or historic origin. Here, we identify a novel artifact in sequences from historic samples of Atlantic cod (Gadus morhua), which forms interrupted palindromes consisting of reverse complementary sequence at the 5′ and 3′-ends of sequencing reads. The palindromic sequences themselves have specific properties – the bases at the 5′-end align well to the reference genome, whereas extensive misalignments exist among the bases at the terminal 3′-end. The terminal 3′ bases are artificial extensions likely caused by the occurrence of hairpin loops in single stranded DNA (ssDNA), which can be ligated and amplified in particular library creation protocols. We propose that such hairpin loops allow the inclusion of erroneous nucleotides, specifically at the 3′-end of DNA strands, with the 5′-end of the same strand providing the template. We also find these palindromes in previously published ancient DNA (aDNA) datasets, albeit at varying and substantially lower frequencies. This artifact can negatively affect the yield of endogenous DNA in these types of samples and introduces sequence bias. PMID:24608104
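An interrupted palindrome of the kind described above has a 3′ end that is the reverse complement of the 5′ end of the same read. A minimal sketch of a screen for this pattern follows; it is not the authors' detection pipeline, the function names and the `min_stem` threshold are hypothetical, and only perfect stems are counted.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminal_palindrome_length(read):
    """Length k of the longest stem where the last k bases are the
    reverse complement of the first k bases (k is at most half the read)."""
    read = read.upper()
    k = 0
    while k < len(read) // 2 and COMPLEMENT.get(read[-1 - k]) == read[k]:
        k += 1
    return k


def looks_like_hairpin_artifact(read, min_stem=5):
    # min_stem is an arbitrary illustrative cutoff, not a published value.
    return terminal_palindrome_length(read) >= min_stem


# Hypothetical read: stem ACGTTG, loop AAAA, then the reverse complement CAACGT.
read = "ACGTTG" + "AAAA" + "CAACGT"
print(terminal_palindrome_length(read), looks_like_hairpin_artifact(read))
```

Short stems arise by chance in random sequence, so any real screen would need a threshold calibrated against read length and would probably tolerate mismatches.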
Yamada, Kazuhiko; Nishida-Umehara, Chizuko; Matsuda, Yoichi
2004-03-01
We isolated a new family of satellite DNA sequences from HaeIII- and EcoRI-digested genomic DNA of the Blakiston's fish owl (Ketupa blakistoni). The repetitive sequences were organized in tandem arrays of the 174 bp element, and localized to the centromeric regions of all macrochromosomes, including the Z and W chromosomes, and microchromosomes. This hybridization pattern was consistent with the distribution of C-band-positive centromeric heterochromatin, and the satellite DNA sequences occupied 10% of the total genome as a major component of centromeric heterochromatin. The sequences were homogenized between macro- and microchromosomes in this species, and therefore intraspecific divergence of the nucleotide sequences was low. The 174 bp element cross-hybridized to the genomic DNA of six other Strigidae species, but not to that of the Tytonidae, suggesting that the satellite DNA sequences are conserved in the same family but fairly divergent between the different families in the Strigiformes. Secondly, the centromeric satellite DNAs were cloned from eight Strigidae species, and the nucleotide sequences of 41 monomer fragments were compared within and between species. Molecular phylogenetic relationships of the nucleotide sequences were highly correlated with both the taxonomy based on morphological traits and the phylogenetic tree constructed by DNA-DNA hybridization. These results suggest that the satellite DNA sequence has evolved by concerted evolution in the Strigidae and that it is a good taxonomic and phylogenetic marker to examine genetic diversity between Strigiformes species.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sobottka, Marcelo, E-mail: sobottka@mtm.ufsc.br; Hart, Andrew G., E-mail: ahart@dim.uchile.cl
Highlights: We propose a simple stochastic model to construct primitive DNA sequences. The model provides an explanation for Chargaff's second parity rule in primitive DNA sequences. The model is also used to predict a novel type of strand symmetry in primitive DNA sequences. We extend the results to bacterial DNA sequences and compare distributional properties intrinsic to the model to statistical estimates from 1049 bacterial genomes. We find statistical evidence that the novel type of strand symmetry holds for bacterial DNA sequences. -- Abstract: Chargaff's second parity rule for short oligonucleotides states that the frequency of any short nucleotide sequence on a strand is approximately equal to the frequency of its reverse complement on the same strand. Recent studies have shown that, with the exception of organellar DNA, this parity rule generally holds for double-stranded DNA genomes and fails to hold for single-stranded genomes. While Chargaff's first parity rule is fully explained by the Watson-Crick pairing in the DNA double helix, a definitive explanation for the second parity rule has not yet been determined. In this work, we propose a model based on a hidden Markov process for approximating the distributional structure of primitive DNA sequences. Then, we use the model to provide another possible theoretical explanation for Chargaff's second parity rule, and to predict novel distributional aspects of bacterial DNA sequences.
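Chargaff's second parity rule, as stated above, can be checked empirically on a single strand by comparing each k-mer count with the count of its reverse complement. The sketch below is only such an empirical check, not the hidden Markov model proposed in the paper; the function names and the normalized deviation score are assumptions made for illustration.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def parity_deviation(seq, k):
    """0.0 when every k-mer is exactly as frequent as its reverse complement
    on the same strand; 1.0 when no k-mer has its reverse complement present."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    kmers = set(counts) | {reverse_complement(m) for m in counts}
    mismatch = sum(abs(counts[m] - counts[reverse_complement(m)]) for m in kmers)
    return mismatch / (2 * total)  # each pair is visited twice, hence the factor 2


print(parity_deviation("AATT", 2))   # AA/TT balance and AT is self-complementary: 0.0
print(parity_deviation("AAAA", 1))   # A present, T absent: 1.0
```

On real genomes the score would be small but non-zero, and it is the size of that departure for short k that the rule describes as approximately zero for double-stranded genomes.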
Maruyama, Toru; Yamagishi, Keisuke; Mori, Tetsushi; Takeyama, Haruko
2015-01-01
Whole genome amplification (WGA) is essential for obtaining genome sequences from single bacterial cells because the quantity of template DNA contained in a single cell is very low. Multiple displacement amplification (MDA), using Phi29 DNA polymerase and random primers, is the most widely used method for single-cell WGA. However, single-cell MDA usually results in uneven genome coverage because of amplification bias, background amplification of contaminating DNA, and formation of chimeras by linking of non-contiguous chromosomal regions. Here, we present a novel MDA method, termed droplet MDA, that minimizes amplification bias and amplification of contaminants by using picoliter-sized droplets for compartmentalized WGA reactions. Extracted DNA fragments from a lysed cell in MDA mixture are divided into 10^5 droplets (67 pL) within minutes via flow through simple microfluidic channels. Compartmentalized genome fragments can be individually amplified in these droplets without the risk of encounter with reagent-borne or environmental contaminants. Following quality assessment of WGA products from single Escherichia coli cells, we showed that droplet MDA minimized unexpected amplification and improved the percentage of genome recovery from 59% to 89%. Our results demonstrate that microfluidic-generated droplets show potential as an efficient tool for effective amplification of low-input DNA for single-cell genomics and greatly reduce the cost and labor investment required for determination of nearly complete genome sequences of uncultured bacteria from environmental samples. PMID:26389587
Riley, D E; Wagner, B; Polley, L; Krieger, J N
1995-01-01
The protozoan parasite Tritrichomonas foetus causes infertility and spontaneous abortion in cattle. In Saskatchewan, Canada, the culture prevalence of trichomonads was 65 of 1,048 (6%) bulls tested within a 1-year period ending in April 1994. Saskatchewan was previously thought to be free of the parasite. To confirm the culture results, possible T. foetus DNA presence was determined by the PCR. All of the 16 culture-positive isolates tested were PCR positive by a single-band test, but one PCR product was weak. DNA fingerprinting by both T17 PCR and randomly amplified polymorphic DNA PCR revealed genetic variation or polymorphism among the T. foetus isolates. T17 PCR also revealed conserved loci that distinguished these T. foetus isolates from Trichomonas vaginalis, from a variety of other protozoa, and from prokaryotes. TCO-1 PCR, a PCR test designed to sample DNA sequence homologous to the 5' flank of a highly conserved cell division control gene, detected genetic polymorphism at low stringency and a conserved, single locus at higher stringency. These findings suggested that T. foetus isolates exhibit both conserved genetic loci and polymorphic loci detectable by independent PCR methods. Both conserved and polymorphic genetic loci may prove useful for improved clinical diagnosis of T. foetus. The polymorphic loci detected by PCR suggested either a long history of infection or multiple lines of T. foetus infection in Saskatchewan. Polymorphic loci detected by PCR may provide data for epidemiologic studies of T. foetus. PMID:7615746
The Airborne Metagenome in an Indoor Urban Environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tringe, Susannah; Zhang, Tao; Liu, Xuguo
2008-02-12
The indoor atmosphere is an ecological unit that impacts on public health. To investigate the composition of organisms in this space, we applied culture-independent approaches to microbes harvested from the air of two densely populated urban buildings, from which we analyzed 80 megabases genomic DNA sequence and 6000 16S rDNA clones. The air microbiota is primarily bacteria, including potential opportunistic pathogens commonly isolated from human-inhabited environments such as hospitals, but none of the data contain matches to virulent pathogens or bioterror agents. Comparison of air samples with each other and nearby environments suggested that the indoor air microbes are not random transients from surrounding outdoor environments, but rather originate from indoor niches. Sequence annotation by gene function revealed specific adaptive capabilities enriched in the air environment, including genes potentially involved in resistance to desiccation and oxidative damage. This baseline index of air microbiota will be valuable for improving designs of surveillance for natural or man-made release of virulent pathogens.
In vitro selection of functional nucleic acids
NASA Technical Reports Server (NTRS)
Wilson, D. S.; Szostak, J. W.
1999-01-01
In vitro selection allows rare functional RNA or DNA molecules to be isolated from pools of over 10^15 different sequences. This approach has been used to identify RNA and DNA ligands for numerous small molecules, and recent three-dimensional structure solutions have revealed the basis for ligand recognition in several cases. By selecting high-affinity and -specificity nucleic acid ligands for proteins, promising new therapeutic and diagnostic reagents have been identified. Selection experiments have also been carried out to identify ribozymes that catalyze a variety of chemical transformations, including RNA cleavage, ligation, and synthesis, as well as alkylation and acyl-transfer reactions and N-glycosidic and peptide bond formation. The existence of such RNA enzymes supports the notion that ribozymes could have directed a primitive metabolism before the evolution of protein synthesis. New in vitro protein selection techniques should allow for a direct comparison of the frequency of ligand binding and catalytic structures in pools of random sequence polynucleotides versus polypeptides.
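To put the pool size of 10^15 sequences in perspective, a fully random region of N nucleotides has 4^N possible sequences. The short calculation below is plain arithmetic and is not taken from the review; the function names are hypothetical.

```python
def sequence_space(n_random_positions):
    """Number of distinct sequences for a fully randomized region of this length."""
    return 4 ** n_random_positions


def shortest_region_covering(pool_size):
    """Smallest N for which 4**N is at least pool_size."""
    n = 0
    while sequence_space(n) < pool_size:
        n += 1
    return n


print(shortest_region_covering(10 ** 15))   # 25
print(sequence_space(25))                   # 1125899906842624, about 1.1e15
```

In other words, a pool of roughly 10^15 molecules can contain essentially every sequence of a 25-nucleotide random region, while pools built on longer random regions necessarily sample only a small fraction of the possible sequences.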
The Airborne Metagenome in an Indoor Urban Environment
Liu, Xuguo; Yu, Yiting; Lee, Wah Heng; Yap, Jennifer; Yao, Fei; Suan, Sim Tiow; Ing, Seah Keng; Haynes, Matthew; Rohwer, Forest; Wei, Chia Lin; Tan, Patrick; Bristow, James; Rubin, Edward M.; Ruan, Yijun
2008-01-01
The indoor atmosphere is an ecological unit that impacts on public health. To investigate the composition of organisms in this space, we applied culture-independent approaches to microbes harvested from the air of two densely populated urban buildings, from which we analyzed 80 megabases genomic DNA sequence and 6000 16S rDNA clones. The air microbiota is primarily bacteria, including potential opportunistic pathogens commonly isolated from human-inhabited environments such as hospitals, but none of the data contain matches to virulent pathogens or bioterror agents. Comparison of air samples with each other and nearby environments suggested that the indoor air microbes are not random transients from surrounding outdoor environments, but rather originate from indoor niches. Sequence annotation by gene function revealed specific adaptive capabilities enriched in the air environment, including genes potentially involved in resistance to desiccation and oxidative damage. This baseline index of air microbiota will be valuable for improving designs of surveillance for natural or man-made release of virulent pathogens. PMID:18382653
A Simulation of DNA Sequencing Utilizing 3M Post-It[R] Notes
ERIC Educational Resources Information Center
Christensen, Doug
2009-01-01
An inexpensive and equipment-free approach to teaching the technical aspects of DNA sequencing is described. The activity requires an instructor familiar with DNA sequencing technology but provides a straightforward method of teaching the technical aspects of sequencing in the absence of expensive sequencing equipment. The final sequence…
Lee, James W.; Thundat, Thomas G.
2005-06-14
An apparatus and method for performing nucleic acid (DNA and/or RNA) sequencing on a single molecule. The genetic sequence information is obtained by probing through a DNA or RNA molecule base by base at nanometer scale as though looking through a strip of movie film. This DNA sequencing nanotechnology has the theoretical capability of performing DNA sequencing at a maximal rate of about 1,000,000 bases per second. This enhanced performance is made possible by a series of innovations including: novel applications of a fine-tuned nanometer gap for passage of a single DNA or RNA molecule; thin layer microfluidics for sample loading and delivery; and programmable electric fields for precise control of DNA or RNA movement. Detection methods include nanoelectrode-gated tunneling current measurements, dielectric molecular characterization, and atomic force microscopy/electrostatic force microscopy (AFM/EFM) probing for nanoscale reading of the nucleic acid sequences.
The sequence specificity of UV-induced DNA damage in a systematically altered DNA sequence.
Khoe, Clairine V; Chung, Long H; Murray, Vincent
2018-06-01
The sequence specificity of UV-induced DNA damage was investigated in a specifically designed DNA plasmid using two procedures: end-labelling and linear amplification. Absorption of UV photons by DNA leads to dimerisation of pyrimidine bases and produces two major photoproducts, cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). A previous study had determined that two hexanucleotide sequences, 5'-GCTC*AC and 5'-TATT*AA, were high intensity UV-induced DNA damage sites. The UV clone plasmid was constructed by systematically altering each nucleotide of these two hexanucleotide sequences. One of the main goals of this study was to determine the influence of single nucleotide alterations on the intensity of UV-induced DNA damage. The sequence 5'-GCTC*AC was designed to examine the sequence specificity of 6-4PPs and the highest intensity 6-4PP damage sites were found at 5'-GTTC*CC nucleotides. The sequence 5'-TATT*AA was devised to investigate the sequence specificity of CPDs and the highest intensity CPD damage sites were found at 5'-TTTT*CG nucleotides. It was proposed that the tetranucleotide DNA sequence, 5'-YTC*Y (where Y is T or C), was the consensus sequence for the highest intensity UV-induced 6-4PP adduct sites; while it was 5'-YTT*C for the highest intensity UV-induced CPD damage sites. These consensus tetranucleotides are composed entirely of consecutive pyrimidines and must have a DNA conformation that is highly productive for the absorption of UV photons. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
Richardson, David S; Westerdahl, Helena
2003-12-01
The great reed warbler (GRW) and the Seychelles warbler (SW) are congeners with markedly different demographic histories. The GRW is a normal outbred bird species while the SW population remains isolated and inbred after undergoing a severe population bottleneck. We examined variation at Major Histocompatibility Complex (MHC) class I exon 3 using restriction fragment length polymorphism, denaturing gradient gel electrophoresis and DNA sequencing. Although genetic variation was higher in the GRW, considerable variation has been maintained in the SW. The ten exon 3 sequences found in the SW were as diverged from each other as was a random sub-sample of the 67 sequences from the GRW. There was evidence for balancing selection in both species, and the phylogenetic analysis, which showed that the exon 3 sequences did not separate according to species, was consistent with trans-species evolution of the MHC.
Valdez-Velázquez, Laura L; Quintero-Hernández, Verónica; Romero-Gutiérrez, Maria Teresa; Coronas, Fredy I V; Possani, Lourival D
2013-01-01
Centruroides tecomanus is a Mexican scorpion endemic to the State of Colima that causes human fatalities. This communication describes a proteome analysis obtained from milked venom and a transcriptome analysis from a cDNA library constructed from two pairs of venom glands of this scorpion. High performance liquid chromatography separation of soluble venom produced 80 fractions, from which at least 104 individual components were identified by mass spectrometry analysis, with molecular masses from 259 to 44,392 Da. Most of these components are within the expected molecular masses for Na(+)- and K(+)-channel specific toxic peptides, supporting the clinical findings of intoxication when humans are stung by this scorpion. From the cDNA library 162 clones were randomly chosen, from which 130 sequences of good quality were identified and clustered into 28 contigs, each containing two or more expressed sequence tags (ESTs), and 49 singlets with only one EST. Deduced amino acid sequence analysis from 53% of the total ESTs showed that 81% (24 sequences) are similar to known toxic peptides that affect Na(+)-channel activity, and 19% (7 unique sequences) are similar to K(+)-channel specific toxins. Out of the 31 sequences, at least 8 peptides were confirmed by direct Edman degradation, using components isolated directly from the venom. The remaining 19%, 4%, 4%, 15% and 5% of the ESTs correspond respectively to proteins involved in cellular processes, antimicrobial peptides, venom components, proteins without defined function and sequences without similarity in databases. Among the cloned genes are those similar to metalloproteinases.
FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.
El-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant
2016-01-01
A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission.
Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein-DNA interfaces.
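The uniform random sampling step mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the FastRNABindR implementation: the function name `sample_uniform`, the `seed` parameter, and the toy identifiers are assumptions made for illustration only.

```python
import random

def sample_uniform(records, fraction, seed=0):
    """Return a uniform random subset holding about `fraction` of `records`.

    Hypothetical helper: every record has the same chance of being kept,
    and no record is drawn twice (sampling without replacement).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    records = list(records)
    if not records:
        return []
    # Keep at least one record so that tiny inputs still yield a database.
    k = max(1, round(len(records) * fraction))
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    return rng.sample(records, k)

# Toy stand-in for sequence identifiers; a real database would be far larger.
ids = [f"seq{i}" for i in range(1000)]
subset = sample_uniform(ids, 0.01)
```

For an input too large to hold in memory, a streaming method such as reservoir sampling would be a more natural choice; the in-memory version is used here only to keep the sketch short.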
Torque measurements reveal sequence-specific cooperative transitions in supercoiled DNA
Oberstrass, Florian C.; Fernandes, Louis E.; Bryant, Zev
2012-01-01
B-DNA becomes unstable under superhelical stress and is able to adopt a wide range of alternative conformations including strand-separated DNA and Z-DNA. Localized sequence-dependent structural transitions are important for the regulation of biological processes such as DNA replication and transcription. To directly probe the effect of sequence on structural transitions driven by torque, we have measured the torsional response of a panel of DNA sequences using single molecule assays that employ nanosphere rotational probes to achieve high torque resolution. The responses of Z-forming d(pGpC)n sequences match our predictions based on a theoretical treatment of cooperative transitions in helical polymers. “Bubble” templates containing 50–100 bp mismatch regions show cooperative structural transitions similar to B-DNA, although less torque is required to disrupt strand–strand interactions. Our mechanical measurements, including direct characterization of the torsional rigidity of strand-separated DNA, establish a framework for quantitative predictions of the complex torsional response of arbitrary sequences in their biological context. PMID:22474350
NASA Astrophysics Data System (ADS)
Yang, Hong
Until recently, recovery and analysis of genetic information encoded in ancient DNA sequences from Pleistocene fossils were impossible. Recent advances in molecular biology offered technical tools to obtain ancient DNA sequences from well-preserved Quaternary fossils and opened the possibilities to directly study genetic changes in fossil species to address various biological and paleontological questions. Ancient DNA studies involving Pleistocene fossil material and ancient DNA degradation and preservation in Quaternary deposits are reviewed. The molecular technology applied to isolate, amplify, and sequence ancient DNA is also presented. Authentication of ancient DNA sequences and technical problems associated with modern and ancient DNA contamination are discussed. As illustrated in recent studies on ancient DNA from proboscideans, it is apparent that fossil DNA sequence data can shed light on many aspects of Quaternary research such as systematics and phylogeny, conservation biology, evolutionary theory, molecular taphonomy, and forensic sciences. Improvement of molecular techniques and a better understanding of DNA degradation during fossilization are likely to build on current strengths and to overcome existing problems, making fossil DNA data a unique source of information for Quaternary scientists.
Prophagic DNA Fragments in Streptococcus agalactiae Strains and Association with Neonatal Meningitis
van der Mee-Marquet, Nathalie; Domelier, Anne-Sophie; Mereghetti, Laurent; Lanotte, Philippe; Rosenau, Agnès; van Leeuwen, Willem; Quentin, Roland
2006-01-01
We identified—by randomly amplified polymorphic DNA (RAPD) analysis at the population level followed by DNA differential display, cloning, and sequencing—three prophage DNA fragments (F5, F7, and F10) in Streptococcus agalactiae that displayed significant sequence similarity to the DNA of S. agalactiae and Streptococcus pyogenes. The F5 sequence aligned with a prophagic gene encoding the large subunit of a terminase, F7 aligned with a phage-associated cell wall hydrolase and a phage-associated lysin, and F10 aligned with a transcriptional regulator (ArpU family) and a phage-associated endonuclease. We first determined the prevalence of F5, F7, and F10 by PCR in a collection of 109 strains isolated in the 1980s and divided into two populations: one with a high risk of causing meningitis (HR group) and the other with a lower risk of causing meningitis (LR group). These fragments were significantly more prevalent in the HR group than in the LR group (P < 0.001). Our findings suggest that lysogeny has increased the ability of some S. agalactiae strains to invade the neonatal brain endothelium. We then determined the prevalence of F5, F7, and F10 by PCR in a collection of 40 strains recently isolated from neonatal meningitis cases for comparison with the cerebrospinal fluid (CSF) strains isolated in the 1980s. The prevalence of the three prophage DNA fragments was similar in these two populations isolated 15 years apart. We suggest that the prophage DNA fragments identified have remained stable in many CSF S. agalactiae strains, possibly due to their importance in virulence or fitness. PMID:16517893
Enantiospecific recognition of DNA sequences by a proflavine Tröger base.
Bailly, C; Laine, W; Demeunynck, M; Lhomme, J
2000-07-05
The DNA interaction of a chiral Tröger base derived from proflavine was investigated by DNA melting temperature measurements and complementary biochemical assays. DNase I footprinting experiments demonstrate that the binding of the proflavine-based Tröger base is both enantio- and sequence-specific. The (+)-isomer interacts poorly with DNA in a non-sequence-selective fashion. In sharp contrast, the corresponding (-)-isomer preferentially recognizes certain DNA sequences containing both A·T and G·C base pairs, such as the motifs 5'-GTT·AAC and 5'-ATGA·TCAT. This is the first experimental demonstration that acridine-type Tröger bases can be used for enantiospecific recognition of DNA sequences. Copyright 2000 Academic Press.
NASA Astrophysics Data System (ADS)
Peng, Jun; Ling, Jian; Zhang, Xiu-Qing; Bai, Hui-Ping; Zheng, Liyan; Cao, Qiu-E.; Ding, Zhong-Tao
2015-02-01
In this work, we designed a new fluorescent oligonucleotide-stabilized silver nanocluster (DNA/AgNCs) probe for sensitive detection of mercury and copper ions. This probe contains two tailored DNA sequences. One is a signal probe that contains a cytosine-rich sequence template for AgNCs synthesis and a link sequence at both ends. The other contains a guanine-rich sequence for signal enhancement and a link sequence complementary to the link sequence of the signal probe. After hybridization, the fluorescence of the hybridized double-strand DNA/AgNCs is enhanced 200-fold, based on the fluorescence enhancement effect of DNA/AgNCs in proximity to a guanine-rich DNA sequence. The double-strand DNA/AgNCs probe is brighter and more stable than the single-strand DNA/AgNCs and, more importantly, can be used as a novel fluorescent probe for detecting mercury and copper ions. Mercury and copper ions in the ranges of 6.0-160.0 and 6-240 nM, respectively, can be linearly detected, with detection limits of 2.1 and 3.4 nM, respectively. Our results indicate that the analytical parameters of the method for mercury and copper ion detection are much better than those obtained using single-strand DNA/AgNCs.
Shao, Zhiyong; Graf, Shannon; Chaga, Oleg Y; Lavrov, Dennis V
2006-10-15
The 16,937-nucleotide sequence of the linear mitochondrial DNA (mtDNA) molecule of the moon jelly Aurelia aurita (Cnidaria, Scyphozoa) - the first mtDNA sequence from the class Scyphozoa and the first sequence of a linear mtDNA from Metazoa - has been determined. This sequence contains genes for 13 energy pathway proteins, small and large subunit rRNAs, and methionine and tryptophan tRNAs. In addition, two open reading frames of 324 and 969 base pairs in length have been found. The deduced amino-acid sequence of one of them, ORF969, displays extensive sequence similarity with the polymerase [but not the exonuclease] domain of family B DNA polymerases, and this ORF has been tentatively identified as dnab. This is the first report of dnab in animal mtDNA. The genes in A. aurita mtDNA are arranged in two clusters with opposite transcriptional polarities, with transcription proceeding toward the ends of the molecule. The determined sequences at the ends of the molecule are nearly identical but inverted and lack any obvious potential secondary structures or telomere-like repeat elements. The acquisition of mitochondrial genomic data for the second class of Cnidaria allows us to reconstruct characteristic features of mitochondrial evolution in this animal phylum.
Recent patents of nanopore DNA sequencing technology: progress and challenges.
Zhou, Jianfeng; Xu, Bingqian
2010-11-01
DNA sequencing techniques have developed rapidly in recent decades, primarily driven by the Human Genome Project. Among the proposed new techniques, the nanopore has been considered a suitable candidate for single-molecule DNA sequencing with ultrahigh speed and very low cost. Several fabrication and modification techniques have been developed to produce robust and well-defined nanopore devices. Many efforts have also been made to apply nanopores to analyze the properties of DNA molecules. Compared with traditional sequencing techniques, nanopores have demonstrated distinct advantages in the main practical issues, such as sample preparation, sequencing speed, cost-effectiveness and read length. Although challenges remain, recent research on improving the capabilities of nanopores has shed light on how to achieve the ultimate goal: sequencing individual DNA strands at the single-nucleotide level. This patent review briefly highlights recent developments and technological achievements for DNA analysis and sequencing at the single-molecule level, focusing on nanopore-based methods.
Small tandemly repeated DNA sequences of higher plants likely originate from a tRNA gene ancestor.
Benslimane, A A; Dron, M; Hartmann, C; Rode, A
1986-01-01
Several monomers (177 bp) of a tandemly arranged repetitive nuclear DNA sequence of Brassica oleracea have been cloned and sequenced. They share up to 95% homology between one another and up to 80% with other satellite DNA sequences of Cruciferae, suggesting a common ancestor. Both strands of these monomers show more than 50% homology with many tRNA genes; the best homologies have been obtained with Lys and His yeast mitochondrial tRNA genes (respectively 64% and 60%). These results suggest that small tandemly repeated DNA sequences of plants may have evolved from a tRNA gene ancestor. These tandem repeats have probably arisen via a process involving reverse transcription of polymerase III RNA intermediates, as is the case for interspersed DNA sequences of mammals. A model is proposed to explain the formation of such small tandemly repeated DNA sequences. PMID:3774553
Species classifier choice is a key consideration when analysing low-complexity food microbiome data.
Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D
2018-03-20
The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R² = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R² = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample.
Again, the outputs from functional profiling analysis using SUPER-FOCUS were generally accordant between the platforms at different sequencing depths. Finally, and expectedly, metagenome assembly completeness was significantly lower on the MiSeq than either on the NextSeq (p = 0.03) or the Proton (p = 0.011), and it improved with increased sequencing depth. Our results demonstrate a remarkable similarity in the results generated by the three sequencing platforms at different sequencing depths, and, in fact, the choice of bioinformatics methodology had a more evident impact on results than the choice of sequencer did.
In vitro selection of high temperature Zn(2+)-dependent DNAzymes.
Nelson, Kevin E; Bruesehoff, Peter J; Lu, Yi
2005-08-01
In vitro selection of Zn(2+)-dependent RNA-cleaving DNAzymes with activity at 90 °C has yielded a diverse pool of selected sequences. The RNA cleavage efficiency was found in all cases to be specific for Zn(2+) over Pb(2+), Ca(2+), Cd(2+), Co(2+), Hg(2+), and Mg(2+). The Zn(2+)-dependent activity assay of the most active sequence showed that the DNAzyme possesses an apparent Zn(2+)-binding dissociation constant of 234 µM and that its activity increases with increasing temperature from 50 to 90 °C. A fit of the Arrhenius plot data gave E(a) = 15.3 kcal mol(-1). Surprisingly, the selected Zn(2+)-dependent DNAzymes showed only a modest (approximately 3-fold) activity enhancement over the background rate of cleavage of random sequences containing a single embedded ribonucleotide within an otherwise DNA oligonucleotide. The result is attributable to the ability of DNA to sustain cleavage activity at high temperature with minimal secondary structure when Zn(2+) is present. Since this effect is highly specific for Zn(2+), this metal ion may play a special role in molecular evolution of nucleic acids at high temperature.
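The Arrhenius fit mentioned in the abstract above can be illustrated with a short Python sketch. It assumes the standard Arrhenius relation ln k = ln A - E(a)/(RT), so that E(a) is the negative slope of ln k against 1/T multiplied by R. The rate constants below are synthetic values generated from E(a) = 15.3 kcal/mol with an arbitrary pre-exponential factor; they are not data from the study, and the function name `arrhenius_ea` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def arrhenius_ea(temps_c, rates):
    """Estimate E_a (kcal/mol) from a least-squares line through ln(k) vs 1/T."""
    xs = [1.0 / (t + 273.15) for t in temps_c]  # 1/T in K^-1
    ys = [math.log(k) for k in rates]           # ln k
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                           # slope equals -E_a / R
    return -slope * R_KCAL

# Synthetic illustration only: rates generated from an assumed E_a and A.
temps = [50, 60, 70, 80, 90]                    # degrees Celsius
A = 1.0e6                                       # arbitrary pre-exponential factor
rates = [A * math.exp(-15.3 / (R_KCAL * (t + 273.15))) for t in temps]
ea = arrhenius_ea(temps, rates)
```

Because the synthetic points lie exactly on the Arrhenius line, the fit recovers the assumed activation energy; real kinetic data would scatter around the line and yield an estimate with uncertainty.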
Next-Generation Sequencing Platforms
NASA Astrophysics Data System (ADS)
Mardis, Elaine R.
2013-06-01
Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.
Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus.
Hansen, Peter; Hecht, Jochen; Ibn-Salem, Jonas; Menkuec, Benjamin S; Roskosch, Sebastian; Truss, Matthias; Robinson, Peter N
2016-11-04
ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution, requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, which have not yet been systematically tested and compared on ChIP-nexus data. Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal of PCR duplicates and for quality control. Furthermore, we developed bespoke methods to estimate the width of the protected region resulting from protein-DNA binding and to infer binding positions from ChIP-nexus data. Finally, we applied our peak calling method as well as the two other methods MACE and MACS2 to the available ChIP-nexus data. The Q-nexus software is efficient and easy to use. Novel statistics about duplication rates in consideration of random barcodes are calculated. Our method for the estimation of the width of the protected region yields unbiased signatures that are highly reproducible for biological replicates and at the same time very specific for the respective factors analyzed. As judged by the irreproducible discovery rate (IDR), our peak calling algorithm shows a substantially better reproducibility. An implementation of Q-nexus is available at http://charite.github.io/Q/ .
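The idea of barcode-aware duplicate removal described in the abstract above can be sketched in Python. This is a simplified illustration, not the Q-nexus implementation: the `Read` record, its fields, and the function name are assumptions. Reads that share a mapping position and strand are treated as PCR duplicates only if they also carry the same random barcode.

```python
from typing import Iterable, List, NamedTuple

class Read(NamedTuple):
    """Hypothetical minimal record for one mapped read."""
    chrom: str
    pos: int      # mapped 5' end position
    strand: str   # "+" or "-"
    barcode: str  # random barcode attached before PCR

def remove_pcr_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read for each (chrom, pos, strand, barcode) combination."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.barcode)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept

reads = [
    Read("chr1", 100, "+", "ACGTA"),
    Read("chr1", 100, "+", "ACGTA"),  # same position and barcode: PCR duplicate
    Read("chr1", 100, "+", "TTGCA"),  # same position, different barcode: kept
    Read("chr1", 250, "-", "ACGTA"),
]
unique_reads = remove_pcr_duplicates(reads)
```

Without barcodes, the second and third reads would be indistinguishable and both would be discarded as duplicates of the first; the barcode lets independent fragments that happen to map to the same position survive.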
Regulatory link between DNA methylation and active demethylation in Arabidopsis
Lei, Mingguang; Zhang, Huiming; Julian, Russell; Tang, Kai; Xie, Shaojun; Zhu, Jian-Kang
2015-01-01
De novo DNA methylation through the RNA-directed DNA methylation (RdDM) pathway and active DNA demethylation play important roles in controlling genome-wide DNA methylation patterns in plants. Little is known about how cells manage the balance between DNA methylation and active demethylation activities. Here, we report the identification of a unique RdDM target sequence, where DNA methylation is required for maintaining proper active DNA demethylation of the Arabidopsis genome. In a genetic screen for cellular antisilencing factors, we isolated several REPRESSOR OF SILENCING 1 (ros1) mutant alleles, as well as many RdDM mutants, which showed drastically reduced ROS1 gene expression and, consequently, transcriptional silencing of two reporter genes. A helitron transposon element (TE) in the ROS1 gene promoter negatively controls ROS1 expression, whereas DNA methylation of an RdDM target sequence between ROS1 5′ UTR and the promoter TE region antagonizes this helitron TE in regulating ROS1 expression. This RdDM target sequence is also targeted by ROS1, and defective DNA demethylation in loss-of-function ros1 mutant alleles causes DNA hypermethylation of this sequence and concomitantly causes increased ROS1 expression. Our results suggest that this sequence in the ROS1 promoter region serves as a DNA methylation monitoring sequence (MEMS) that senses DNA methylation and active DNA demethylation activities. Therefore, the ROS1 promoter functions like a thermostat (i.e., methylstat) to sense DNA methylation levels and regulates DNA methylation by controlling ROS1 expression. PMID:25733903
Attomole-level Genomics with Single-molecule Direct DNA, cDNA and RNA Sequencing Technologies.
Ozsolak, Fatih
2016-01-01
With the introduction of next-generation sequencing (NGS) technologies in 2005, the domination of microarrays in genomics quickly came to an end due to NGS's superior technical performance and cost advantages. By enabling genetic analysis capabilities that were not possible previously, NGS technologies have started to play an integral role in all areas of biomedical research. This chapter outlines the low-quantity DNA and cDNA sequencing capabilities and applications developed with the Helicos single molecule DNA sequencing technology.
Walker, M D; Park, C W; Rosen, A; Aronheim, A
1990-01-01
Cell-specific expression of the insulin gene is achieved through transcriptional mechanisms operating on multiple DNA sequence elements located in the 5' flanking region of the gene. Of particular importance in the rat insulin I gene are two closely similar 9 bp sequences (IEB1 and IEB2): mutation of either of these leads to 5-10 fold reduction in transcriptional activity. We have screened an expression cDNA library derived from mouse pancreatic endocrine beta cells with a radioactive DNA probe containing multiple copies of the IEB1 sequence. A cDNA clone (A1) isolated by this procedure encodes a protein which shows efficient binding to the IEB1 probe, but much weaker binding to either an unrelated DNA probe or to a probe bearing a single base pair insertion within the recognition sequence. DNA sequence analysis indicates a protein belonging to the helix-loop-helix family of DNA-binding proteins. The ability of the protein encoded by clone A1 to recognize a number of wild type and mutant DNA sequences correlates closely with the ability of each sequence element to support transcription in vivo in the context of the insulin 5' flanking DNA. We conclude that the isolated cDNA may encode a transcription factor that participates in control of insulin gene expression. PMID:2181401
Highly multiplexed targeted DNA sequencing from single nuclei.
Leung, Marco L; Wang, Yong; Kim, Charissa; Gao, Ruli; Jiang, Jerry; Sei, Emi; Navin, Nicholas E
2016-02-01
Single-cell DNA sequencing methods are challenged by poor physical coverage, high technical error rates and low throughput. To address these issues, we developed a single-cell DNA sequencing protocol that combines flow-sorting of single nuclei, time-limited multiple-displacement amplification (MDA), low-input library preparation, DNA barcoding, targeted capture and next-generation sequencing (NGS). This approach represents a major improvement over our previous single nucleus sequencing (SNS) Nature Protocols paper in terms of generating higher-coverage data (>90%), thereby enabling the detection of genome-wide variants in single mammalian cells at base-pair resolution. Furthermore, by pooling 48-96 single-cell libraries together for targeted capture, this approach can be used to sequence many single-cell libraries in parallel in a single reaction. This protocol greatly reduces the cost of single-cell DNA sequencing, and it can be completed in 5-6 d by advanced users. This single-cell DNA sequencing protocol has broad applications for studying rare cells and complex populations in diverse fields of biological research and medicine.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.
This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower-than-average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, which included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.; ...
2017-07-18
This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower-than-average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, which included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Richard A.; Brown, Steven D.
2017-01-01
This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower-than-average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, which included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences. PMID:28769883
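One of the gap-sequence properties named in the abstract above, GC content, is simple to compute. The following is a minimal sketch only, not the authors' pipeline; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).

    Assumption for this sketch: ambiguous symbols such as N are ignored,
    and an input with no unambiguous bases yields 0.0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical gap sequence, for illustration only.
    print(gc_content("GGCCAATT"))
```

A gap sequence whose GC content departs strongly from the genome average would be one candidate for closer inspection, alongside the coverage and repeat properties the abstract lists.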
Chromatin accessibility prediction via a hybrid deep convolutional neural network.
Liu, Qiao; Xia, Fei; Yin, Qijin; Jiang, Rui
2018-03-01
A majority of known genetic variants associated with human-inherited diseases lie in non-coding regions that lack adequate interpretation, making it indispensable to systematically discover functional sites at the whole genome level and precisely decipher their implications in a comprehensive manner. Although computational approaches have been complementing high-throughput biological experiments towards the annotation of the human genome, it still remains a big challenge to accurately annotate regulatory elements in the context of a specific cell type via automatic learning of the DNA sequence code from large-scale sequencing data. Indeed, the development of an accurate and interpretable model to learn the DNA sequence signature and further enable the identification of causative genetic variants has become essential in both genomic and genetic studies. We proposed Deopen, a hybrid framework mainly based on a deep convolutional neural network, to automatically learn the regulatory code of DNA sequences and predict chromatin accessibility. In a series of comparisons with existing methods, we show the superior performance of our model in not only the classification of accessible regions against background sequences sampled at random, but also the regression of DNase-seq signals. Besides, we further visualize the convolutional kernels and show the match of identified sequence signatures and known motifs. We finally demonstrate the sensitivity of our model in finding causative noncoding variants in the analysis of a breast cancer dataset. We expect to see wide applications of Deopen with either public or in-house chromatin accessibility data in the annotation of the human genome and the identification of non-coding variants associated with diseases. Deopen is freely available at https://github.com/kimmo1019/Deopen. Contact: ruijiang@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press.
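Convolutional models of DNA sequence, such as the one described in the abstract above, typically take a one-hot encoding of the sequence as input, and a single convolutional kernel can be read as a motif detector slid along it. The sketch below illustrates that general idea in plain Python; it is not taken from the Deopen code base, and the names `one_hot` and `scan` and the A, C, G, T column order are assumptions.

```python
BASES = "ACGT"  # assumed column order for this sketch


def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values (A, C, G, T).

    Unknown symbols such as N become an all-zero row.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def scan(seq, kernel):
    """Slide a (width x 4) kernel along the sequence and return one score per window."""
    x = one_hot(seq)
    width = len(kernel)
    return [
        sum(kernel[j][c] * x[i + j][c] for j in range(width) for c in range(4))
        for i in range(len(x) - width + 1)
    ]


if __name__ == "__main__":
    # A kernel that is itself the one-hot pattern of "GA" scores highest where "GA" occurs.
    print(scan("TGAG", one_hot("GA")))
```

In a trained network the kernel weights are real-valued and learned from data; comparing them with known motifs is the kind of visualization the abstract mentions.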
DOE Office of Scientific and Technical Information (OSTI.GOV)
Benasutti, M.; Ejadi, S.; Whitlow, M.D.
The mutagenic and carcinogenic chemical aflatoxin B1 (AFB1) reacts almost exclusively at the N(7)-position of guanine following activation to its reactive form, the 8,9-epoxide (AFB1 oxide). In general, N(7)-guanine adducts yield DNA strand breaks when heated in base, a property that serves as the basis for the Maxam-Gilbert DNA sequencing reaction specific for guanine. Using DNA sequencing methods, other workers have shown that AFB1 oxide gives strand breaks at positions of guanines; however, the guanine bands varied in intensity. This phenomenon has been used to infer that AFB1 oxide prefers to react with guanines in some sequence contexts more than in others and has been referred to as sequence specificity of binding. Herein, data on the reaction of AFB1 oxide with several synthetic DNA polymers with different sequences are presented, and (following hydrolysis) adduct levels are determined by high-pressure liquid chromatography. These results reveal that for AFB1 oxide (1) the N(7)-guanine adduct is the major adduct found in all of the DNA polymers, (2) adduct levels vary in different sequences, and, thus, sequence specificity is also observed by this more direct method, and (3) the intensity of bands in DNA sequencing gels is likely to reflect adduct levels formed at the N(7)-position of guanine. Knowing this, a reinvestigation of the reactivity of guanines in different DNA sequences using DNA sequencing methods was undertaken. Methods are developed to determine that the X (5'-side) base and the Y (3'-side) base are most influential in determining guanine reactivity. These rules in conjunction with molecular modeling studies were used to assess the binding sites that might be utilized by AFB1 oxide in its reaction with DNA.
Chromosome specific repetitive DNA sequences
Moyzis, Robert K.; Meyne, Julianne
1991-01-01
A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong, and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members... This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).
Aggarwal, Pooja; Das Gupta, Mainak; Joseph, Agnel Praveen; Chatterjee, Nirmalya; Srinivasan, N.; Nath, Utpal
2010-01-01
The TCP transcription factors control multiple developmental traits in diverse plant species. Members of this family share an ∼60-residue-long TCP domain that binds to DNA. The TCP domain is predicted to form a basic helix-loop-helix (bHLH) structure but shares little sequence similarity with canonical bHLH domain. This classifies the TCP domain as a novel class of DNA binding domain specific to the plant kingdom. Little is known about how the TCP domain interacts with its target DNA. We report biochemical characterization and DNA binding properties of a TCP member in Arabidopsis thaliana, TCP4. We have shown that the 58-residue domain of TCP4 is essential and sufficient for binding to DNA and possesses DNA binding parameters comparable to canonical bHLH proteins. Using a yeast-based random mutagenesis screen and site-directed mutants, we identified the residues important for DNA binding and dimer formation. Mutants defective in binding and dimerization failed to rescue the phenotype of an Arabidopsis line lacking the endogenous TCP4 activity. By combining structure prediction, functional characterization of the mutants, and molecular modeling, we suggest a possible DNA binding mechanism for this class of transcription factors. PMID:20363772
ERIC Educational Resources Information Center
Shah, Kushani; Thomas, Shelby; Stein, Arnold
2013-01-01
In this report, we describe a 5-week laboratory exercise for undergraduate biology and biochemistry students in which students learn to sequence DNA and to genotype their DNA for selected single nucleotide polymorphisms (SNPs). Students use miniaturized DNA sequencing gels that require approximately 8 min to run. The students perform G, A, T, C…
DNA Barcode Goes Two-Dimensions: DNA QR Code Web Server
Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin
2012-01-01
The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, “DNA barcode” actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113
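Coding capacity and compression efficiency matter because a barcode symbol holds a limited payload. As a rough illustration of why compression helps, a four-letter DNA alphabet needs only 2 bits per base, so four bases fit in one byte. The sketch below shows such packing; it is not the scheme used by the web server described above, and the names `pack` and `unpack` and the A=0, C=1, G=2, T=3 assignment are assumptions.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit assignment
BASES = "ACGT"


def pack(seq):
    """Pack an ACGT string into bytes, four bases per byte (last byte zero-padded)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4].upper()
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


if __name__ == "__main__":
    packed = pack("ACGTAC")
    print(len(packed), unpack(packed, 6))
```

The sequence length has to be stored alongside the packed bytes, since padding in the final byte is otherwise indistinguishable from trailing A bases.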
Spring-Connell, Alexander M.; Evich, Marina G.; Debelak, Harald; Seela, Frank; Germann, Markus W.
2016-01-01
A truly universal nucleobase enables a host of novel applications such as simplified templates for PCR primers, randomized sequencing and DNA based devices. A universal base must pair indiscriminately to each of the canonical bases with little or preferably no destabilization of the overall duplex. In reality, many candidates either destabilize the duplex or do not base pair indiscriminately. The novel base 8-aza-7-deazaadenine (pyrazolo[3,4-d]pyrimidin-4-amine) N8-(2′deoxyribonucleoside), a deoxyadenosine analog (UB), pairs with each of the natural DNA bases with little sequence preference. We have utilized NMR complemented with molecular dynamics calculations to characterize the structure and dynamics of a UB incorporated into a DNA duplex. The UB participates in base stacking with little to no perturbation of the local structure yet forms an unusual base pair that samples multiple conformations. These local dynamics result in the complete disappearance of a single UB proton resonance under native conditions. Accommodation of the UB is additionally stabilized via heightened backbone conformational sampling. NMR combined with various computational techniques has allowed for a comprehensive characterization of both structural and dynamic effects of the UB in a DNA duplex and underlines the UB as a strong candidate for universal base applications. PMID:27566150
Downes, Julia; Vartoukian, Sonia R; Dewhirst, Floyd E; Izard, Jacques; Chen, Tsute; Yu, Wen-Han; Sutcliffe, Iain C; Wade, William G
2009-05-01
Four strains of anaerobic, Gram-negative bacilli isolated from the human oral cavity were subjected to a comprehensive range of phenotypic and genotypic tests and were found to comprise a homogeneous group distinct from any species with validly published names. 16S rRNA and 23S rRNA gene sequence analyses and DNA-DNA reassociation data revealed that the strains constituted a novel group within the phylum 'Synergistetes' and were most closely related to Jonquetella anthropi. Two libraries of randomly cloned DNA were prepared from strain W5455(T) and were sequenced to provide a genome survey as a resource for metagenomic studies. A new genus and novel species, Pyramidobacter piscolens gen. nov., sp. nov., is proposed to accommodate these strains. The genus Pyramidobacter comprises strains that are anaerobic, non-motile, asaccharolytic bacilli that produce acetic and isovaleric acids and minor to trace amounts of propionic, isobutyric, succinic and phenylacetic acids as end products of metabolism. P. piscolens gen. nov., sp. nov. produced hydrogen sulphide but was otherwise largely biochemically unreactive. Growth was stimulated by the addition of glycine to broth media. The G+C content of the DNA of the type strain was 59 mol%. The type strain of Pyramidobacter piscolens sp. nov. is W5455(T) (=DSM 21147(T)=CCUG 55836(T)).
Analysis of DNA Sequences by An Optical Time-Integrating Correlator: Proof-Of-Concept Experiments.
1992-05-01
Report front matter (excerpt): ... Tables; List of Abbreviations; 1.0 Introduction; 2.0 DNA Analysis Strategy; 2.1 Representation of DNA Bases; 2.2 DNA Analysis Strategy; 3.0 ... Zehnder architecture. Figure 3: Short representations of the DNA bases where each base is represented by a 7-bit pseudorandom sequence. ... DNA bases where each base is represented by 7-bit pseudorandom sequences. Table 2: Long representations of the DNA bases with 255-bits maximum ...
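The excerpt above mentions representing each DNA base by a 7-bit pseudorandom sequence so that sequences can be compared by correlation. The report's actual codes are not given here, so the following is a digital sketch under stated assumptions: each base is assigned a different cyclic shift of one hypothetical 7-chip +1/-1 sequence (`M_SEQ`), and a query is correlated against a reference one base position at a time. The function names are illustrative, and an optical time-integrating correlator would perform the correlation in analog hardware rather than in a loop.

```python
# Hypothetical 7-chip +1/-1 code; each base uses a different cyclic shift of it.
M_SEQ = [1, 1, 1, -1, 1, -1, -1]


def base_code(base):
    """Return the assumed 7-chip code for one base (A, C, G or T)."""
    shift = "ACGT".index(base)
    return M_SEQ[shift:] + M_SEQ[:shift]


def encode(seq):
    """Concatenate the chip codes of every base in the sequence."""
    chips = []
    for base in seq.upper():
        chips.extend(base_code(base))
    return chips


def correlate(reference, query):
    """Return the correlation score of the query at each base offset in the reference."""
    ref, qry = encode(reference), encode(query)
    n = len(M_SEQ)
    scores = []
    for offset in range(len(reference) - len(query) + 1):
        window = ref[offset * n: offset * n + len(qry)]
        scores.append(sum(r * q for r, q in zip(window, qry)))
    return scores


if __name__ == "__main__":
    scores = correlate("ACGTTGCA", "TTG")
    print(scores.index(max(scores)), max(scores))
```

With these assumed codes, a matching base contributes 7 to the score and a mismatching base contributes -1, so an exact match of a query produces a sharp correlation peak at its offset.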
SNP discovery through de novo deep sequencing using the next generation of DNA sequencers
USDA-ARS's Scientific Manuscript database
The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....
A simple procedure for parallel sequence analysis of both strands of 5'-labeled DNA.
Razvi, F; Gargiulo, G; Worcel, A
1983-08-01
Ligation of a 5'-labeled DNA restriction fragment results in a circular DNA molecule carrying the two 32Ps at the reformed restriction site. Double digestions of the circular DNA with the original enzyme and a second restriction enzyme that cleaves near the labeled site allow direct chemical sequencing of one 5'-labeled DNA strand. Similar double digestions, using an isoschizomer that cleaves differently at the 32P-labeled site, allow direct sequencing of the now 3'-labeled complementary DNA strand. It is possible to directly sequence both strands of cloned DNA inserts by using the above protocol and a multiple cloning site vector that provides the necessary restriction sites. The simultaneous and parallel visualization of both DNA strands eliminates sequence ambiguities. In addition, the labeled circular molecules are particularly useful for single-hit DNA cleavage studies and DNA footprint analysis. As an example, we show here an analysis of the micrococcal nuclease-induced breaks on the two strands of the somatic 5S RNA gene of Xenopus borealis, which suggests that the enzyme may recognize and cleave small AT-containing palindromes along the DNA helix.
A Glimpse into the Satellite DNA Library in Characidae Fish (Teleostei, Characiformes)
Utsunomia, Ricardo; Ruiz-Ruano, Francisco J.; Silva, Duílio M. Z. A.; Serrano, Érica A.; Rosa, Ivana F.; Scudeler, Patrícia E. S.; Hashimoto, Diogo T.; Oliveira, Claudio; Camacho, Juan Pedro M.; Foresti, Fausto
2017-01-01
Satellite DNA (satDNA) is an abundant fraction of repetitive DNA in eukaryotic genomes and plays an important role in genome organization and evolution. In general, satDNA sequences follow a concerted evolutionary pattern through the intragenomic homogenization of different repeat units. In addition, the satDNA library hypothesis predicts that related species share a series of satDNA variants descended from a common ancestor species, with differential amplification of different satDNA variants. The finding of the same satDNA family in species belonging to different genera within Characidae fish provided the opportunity to test both the concerted evolution and library hypotheses. For this purpose, we analyzed here sequence variation and abundance of this satDNA family in ten species, by a combination of next generation sequencing (NGS), PCR and Sanger sequencing, and fluorescence in situ hybridization (FISH). We found extensive between-species variation for the number and size of pericentromeric FISH signals. At the genomic level, the analysis of thousands of DNA sequences obtained by Illumina sequencing and PCR amplification allowed the definition of 150 haplotypes which were linked in a common minimum spanning tree, where different patterns of concerted evolution were apparent. This also provided a glimpse into the satDNA library of this group of species. Consistent with the library hypothesis, different variants of this satDNA showed large differences in abundance between species, from highly abundant to simply relictual variants. PMID:28855916
Short, interspersed, and repetitive DNA sequences in Spiroplasma species.
Nur, I; LeBlanc, D J; Tully, J G
1987-03-01
Small fragments of DNA from an 8-kbp plasmid, pRA1, from a plant pathogenic strain of Spiroplasma citri were shown previously to be present in the chromosomal DNA of at least two species of Spiroplasma. We describe here the shot-gun cloning of chromosomal DNA from S. citri Maroc and the identification of two distinct sequences exhibiting homology to pRA1. Further subcloning experiments provided specific molecular probes for the identification of these two sequences in chromosomal DNA from three distinct plant pathogenic species of Spiroplasma. The results of Southern blot hybridization indicated that each of the pRA1-associated sequences is present as multiple copies in short, dispersed, and repetitive sequences in the chromosomes of these three strains. None of the sequences was detectable in chromosomal DNA from an additional nine Spiroplasma strains examined.