base pair sequences: Topics by Science.gov

Sample records for base pair sequences

Widespread Transient Hoogsteen Base-Pairs in Canonical Duplex DNA with Variable Energetics

PubMed Central

Alvey, Heidi S.; Gottardo, Federico L.; Nikolova, Evgenia N.; Al-Hashimi, Hashim M.

2015-01-01

Hoogsteen base-pairing involves a 180 degree rotation of the purine base relative to Watson-Crick base-pairing within DNA duplexes, creating alternative DNA conformations that can play roles in recognition, damage induction, and replication. Here, using Nuclear Magnetic Resonance R1ρ relaxation dispersion, we show that transient Hoogsteen base-pairs occur across more diverse sequence and positional contexts than previously anticipated. We observe sequence-specific variations in Hoogsteen base-pair energetic stabilities that are comparable to variations in Watson-Crick base-pair stability, with Hoogsteen base-pairs being more abundant for energetically less favorable Watson-Crick base-pairs. Our results suggest that the variations in Hoogsteen stabilities and rates of formation are dominated by variations in Watson-Crick base pair stability, suggesting a late transition state for the Watson-Crick to Hoogsteen conformational switch. The occurrence of sequence and position-dependent Hoogsteen base-pairs provides a new potential mechanism for achieving sequence-dependent DNA transactions. PMID:25185517
Sequence dependency of canonical base pair opening in the DNA double helix

PubMed Central

Villa, Alessandra

2017-01-01

The flipping-out of a DNA base from the double helical structure is a key step of many cellular processes, such as DNA replication, modification and repair. Base pair opening is the first step of base flipping and the exact mechanism is still not well understood. We investigate sequence effects on base pair opening using extensive classical molecular dynamics simulations targeting the opening of 11 different canonical base pairs in two DNA sequences. Two popular biomolecular force fields are applied. To enhance sampling and calculate free energies, we bias the simulation along a simple distance coordinate using a newly developed adaptive sampling algorithm. The simulation is guided back and forth along the coordinate, allowing for multiple opening pathways. We compare the calculated free energies with those from an NMR study and check assumptions of the model used for interpreting the NMR data. Our results further show that the neighboring sequence is an important factor for the opening free energy, but also indicate that other sequence effects may play a role. All base pairs are observed to have a propensity for opening toward the major groove. The preferred opening base is cytosine for GC base pairs, while for AT there is sequence dependent competition between the two bases. For AT opening, we identify two non-canonical base pair interactions contributing to a local minimum in the free energy profile. For both AT and CG we observe long-lived interactions with water and with sodium ions at specific sites on the open base pair. PMID:28369121
Base pairing among three cis-acting sequences contributes to template switching during hepadnavirus reverse transcription.

PubMed

Liu, Ning; Tian, Ru; Loeb, Daniel D

2003-02-18

Synthesis of the relaxed-circular (RC) DNA genome of hepadnaviruses requires two template switches during plus-strand DNA synthesis: primer translocation and circularization. Although primer translocation and circularization use different donor and acceptor sequences, and are distinct temporally, they share the common theme of switching from one end of the minus-strand template to the other end. Studies of duck hepatitis B virus have indicated that, in addition to the donor and acceptor sequences, three other cis-acting sequences, named 3E, M, and 5E, are required for the synthesis of RC DNA by contributing to primer translocation and circularization. The mechanism by which 3E, M, and 5E act was not known. We present evidence that these sequences function by base pairing with each other within the minus-strand template. 3E base-pairs with one portion of M (M3) and 5E base-pairs with an adjacent portion of M (M5). We found that disrupting base pairing between 3E and M3 and between 5E and M5 inhibited primer translocation and circularization. More importantly, restoring base pairing with mutant sequences restored the production of RC DNA. These results are consistent with the model that, within duck hepatitis B virus capsids, the ends of the minus-strand template are juxtaposed via base pairing to facilitate the two template switches during plus-strand DNA synthesis.
Genetic and DNA sequence analysis of the kanamycin resistance transposon Tn903.

PubMed Central

Grindley, N D; Joyce, C M

1980-01-01

The kanamycin resistance transposon Tn903 consists of a unique region of about 1000 base pairs bounded by a pair of 1050-base-pair inverted repeat sequences. Each repeat contains two Pvu II endonuclease cleavage sites separated by 520 base pairs. We have constructed derivatives of Tn903 in which this 520-base-pair fragment is deleted from one or both repeats. Those derivatives that lack both 520-base-pair fragments cannot transpose, whereas those that lack just one remain transposition proficient. One such transposable derivative, Tn903 delta 1, has been selected for further study. We have determined the sequence of the intact inverted repeat. The 18 base pairs at each end are identical and inverted relative to one another, a structure characteristic of insertion sequences. Additional experiments indicate that a single inverted repeat from Tn903 can, in fact, transpose; we propose that this element be called IS903. To correlate the DNA sequence with genetic activities, we have created mutations by inserting a 10-base-pair DNA fragment at several sites within the intact repeat of Tn903 delta 1, and we have examined the effect of such insertions on transposability. The results suggest that IS903 encodes a 307-amino-acid polypeptide (a "transposase") that is absolutely required for transposition of IS903 or Tn903. PMID:6261245
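The terminal structure described above (ends that are identical and inverted relative to one another) can be checked computationally. The following is a minimal sketch, not the authors' method: the function names are hypothetical, and the toy sequence is invented for illustration and is not the IS903 sequence.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def terminal_inverted_repeat_length(seq: str) -> int:
    """Count how many bases at the 5' end match the reverse complement of the 3' end.

    If the two ends are inverted repeats of length n, the first n bases of the
    sequence equal the first n bases of its reverse complement.
    """
    rc = reverse_complement(seq)
    length = 0
    for left, right in zip(seq.upper(), rc):
        if left != right:
            break
        length += 1
    return length


# Invented toy element: an 8-base end, a unique interior, and the inverted copy of the end.
left_end = "ACCGTTAG"
toy_element = left_end + "TTTTGGGG" + reverse_complement(left_end)
print(terminal_inverted_repeat_length(toy_element))
```

Applied to a real insertion sequence, the same comparison would report the length of the perfect terminal inverted repeat; tolerating mismatches would need an extra parameter.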
The repeating nucleotide sequence in the repetitive mitochondrial DNA from a "low-density" petite mutant of yeast.

PubMed Central

Van Kreijl, C F; Bos, J L

1977-01-01

The repeating nucleotide sequence of 68 base pairs in the mtDNA from an ethidium-induced cytoplasmic petite mutant of yeast has been determined. For sequence analysis, specifically primed and terminated RNA copies, obtained by in vitro transcription of the separated strands, were used. The sequence consists of 66 consecutive AT base pairs flanked by two GC pairs and comprises nearly all of the mutant mitochondrial genome. The sequence, moreover, also represents the first part of wild-type mtDNA sequenced so far. PMID:198740
Base pairing among three cis-acting sequences contributes to template switching during hepadnavirus reverse transcription

PubMed Central

Liu, Ning; Tian, Ru; Loeb, Daniel D.

2003-01-01

Synthesis of the relaxed-circular (RC) DNA genome of hepadnaviruses requires two template switches during plus-strand DNA synthesis: primer translocation and circularization. Although primer translocation and circularization use different donor and acceptor sequences, and are distinct temporally, they share the common theme of switching from one end of the minus-strand template to the other end. Studies of duck hepatitis B virus have indicated that, in addition to the donor and acceptor sequences, three other cis-acting sequences, named 3E, M, and 5E, are required for the synthesis of RC DNA by contributing to primer translocation and circularization. The mechanism by which 3E, M, and 5E act was not known. We present evidence that these sequences function by base pairing with each other within the minus-strand template. 3E base-pairs with one portion of M (M3) and 5E base-pairs with an adjacent portion of M (M5). We found that disrupting base pairing between 3E and M3 and between 5E and M5 inhibited primer translocation and circularization. More importantly, restoring base pairing with mutant sequences restored the production of RC DNA. These results are consistent with the model that, within duck hepatitis B virus capsids, the ends of the minus-strand template are juxtaposed via base pairing to facilitate the two template switches during plus-strand DNA synthesis. PMID:12578983
Sequence-similar, structure-dissimilar protein pairs in the PDB.

PubMed

Kosloff, Mickey; Kolodny, Rachel

2008-05-01

It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs

PubMed Central

2017-01-01

Prediction of RNA tertiary structure from sequence is an important problem, but generating accurate structure models for even short sequences remains difficult. Predictions of RNA tertiary structure tend to be least accurate in loop regions, where non-canonical pairs are important for determining the details of structure. Non-canonical pairs can be predicted using a knowledge-based model of structure that scores nucleotide cyclic motifs, or NCMs. In this work, a partition function algorithm is introduced that allows the estimation of base pairing probabilities for both canonical and non-canonical interactions. Pairs that are predicted to be probable are more likely to be found in the true structure than pairs of lower probability. Pair probability estimates can be further improved by predicting the structure conserved across multiple homologous sequences using the TurboFold algorithm. These pairing probabilities, used in concert with prior knowledge of the canonical secondary structure, allow accurate inference of non-canonical pairs, an important step towards accurate prediction of the full tertiary structure. Software to predict non-canonical base pairs and pairing probabilities is now provided as part of the RNAstructure software package. PMID:29107980
1,8-Naphthyridine-2,7-diamine: a potential universal reader of Watson-Crick base pairs for DNA sequencing by electron tunneling.

PubMed

Liang, Feng; Lindsay, Stuart; Zhang, Peiming

2012-11-21

With the aid of Density Functional Theory (DFT), we designed 1,8-naphthyridine-2,7-diamine as a recognition molecule to read DNA base pairs for genomic sequencing by electron tunneling. NMR studies show that it can form stable triplets with both A:T and G:C base pairs through hydrogen bonding. Our results suggest that the naphthyridine molecule should be able to function as a universal base pair reader in a tunneling gap, generating distinguishable signatures under electrical bias for each of the DNA base pairs.
1,8-Naphthyridine-2,7-diamine: A Potential Universal Reader of the Watson-Crick Base Pairs for DNA Sequencing by Electron Tunneling

PubMed Central

Liang, Feng; Lindsay, Stuart; Zhang, Peiming

2013-01-01

With the aid of Density Functional Theory (DFT), we designed 1,8-naphthyridine-2,7-diamine as a recognition molecule to read the DNA base pairs for genomic sequencing by electron tunneling. NMR studies show that it can form stable triplets with both A:T and G:C base pairs through hydrogen bonding. Our results suggest that the naphthyridine molecule should be able to function as a universal base pair reader in a tunneling gap, generating distinguishable signatures under electrical bias for each of the DNA base pairs. PMID:23038027
A systematic molecular dynamics study of nearest-neighbor effects on base pair and base pair step conformations and fluctuations in B-DNA

PubMed Central

Lavery, Richard; Zakrzewska, Krystyna; Beveridge, David; Bishop, Thomas C.; Case, David A.; Cheatham, Thomas; Dixit, Surjit; Jayaram, B.; Lankas, Filip; Laughton, Charles; Maddocks, John H.; Michon, Alexis; Osman, Roman; Orozco, Modesto; Perez, Alberto; Singh, Tanya; Spackova, Nada; Sponer, Jiri

2010-01-01

It is well recognized that base sequence exerts a significant influence on the properties of DNA and plays a significant role in protein–DNA interactions vital for cellular processes. Understanding and predicting base sequence effects requires an extensive structural and dynamic dataset which is currently unavailable from experiment. A consortium of laboratories was consequently formed to obtain this information using molecular simulations. This article describes results providing information not only on all 10 unique base pair steps, but also on all possible nearest-neighbor effects on these steps. These results are derived from simulations of 50–100 ns on 39 different DNA oligomers in explicit solvent and using a physiological salt concentration. We demonstrate that the simulations are converged in terms of helical and backbone parameters. The results show that nearest-neighbor effects on base pair steps are very significant, implying that dinucleotide models are insufficient for predicting sequence-dependent behavior. Flanking base sequences can notably lead to base pair step parameters in dynamic equilibrium between two conformational sub-states. Although this study only provides limited data on next-nearest-neighbor effects, we suggest that such effects should be analyzed before attempting to predict the sequence-dependent behavior of DNA. PMID:19850719
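The count of 10 unique base pair steps follows from treating a duplex step and its reverse complement as the same object. The sketch below enumerates them and applies the same reasoning to tetranucleotides (a step plus its two nearest neighbors). The function names are hypothetical, and the tetranucleotide count is computed here rather than quoted from the abstract.

```python
from itertools import product

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(seq: str) -> str:
    """Pick one representative for a duplex: the smaller of seq and its reverse complement."""
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    return min(seq, reverse_complement)


def unique_duplex_sequences(length: int) -> list[str]:
    """List every distinct double-stranded sequence of the given length."""
    return sorted({canonical("".join(p)) for p in product("ACGT", repeat=length)})


steps = unique_duplex_sequences(2)      # base pair steps
tetramers = unique_duplex_sequences(4)  # steps with both flanking base pairs
print(len(steps), steps)
print(len(tetramers))
```

Self-complementary sequences such as AT or CG are counted once, which is why the totals are not simply half of 16 or 256.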
AfterQC: automatic filtering, trimming, error removing and quality control for fastq data.

PubMed

Chen, Shifu; Huang, Tanxiao; Zhou, Yanqing; Han, Yue; Xu, Mingyan; Gu, Jia

2017-03-14

Some applications, especially those clinical applications requiring high accuracy of sequencing data, usually have to face the troubles caused by unavoidable sequencing errors. Several tools have been proposed to profile the sequencing quality, but few of them can quantify or correct the sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool with functions to profile sequencing errors and correct most of them, plus highly automated quality control and data filtering features. Different from most tools, AfterQC analyses the overlapping of paired sequences for pair-end sequencing data. Based on overlapping analysis, AfterQC can detect and cut adapters, and furthermore it gives a novel function to correct wrong bases in the overlapping regions. Another new feature is to detect and visualise sequencing bubbles, which can be commonly found on the flowcell lanes and may raise sequencing errors. Besides normal per cycle quality and base content plotting, AfterQC also provides features like polyX (a long sub-sequence of a same base X) filtering, automatic trimming and K-MER based strand bias profiling. For each single or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates sequencer's bubble effects, trims reads at front and tail, detects the sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support, it can run with a single FastQ file, a single pair of FastQ files (for pair-end sequencing), or a folder for all included FastQ files to be processed automatically. Based on overlapping analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent. 
Much more than just another new quality control (QC) tool, AfterQC is able to perform quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help to eliminate the sequencing errors for pair-end sequencing data to provide much cleaner outputs, and consequently help to reduce the false-positive variants, especially for the low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all the options automatically and require no argument in most cases.
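The overlap-based correction idea can be illustrated with a short sketch. This is not AfterQC's actual algorithm or API: the function names, the mismatch threshold, and the rule of keeping the higher-quality base are assumptions made for illustration, and the reads are invented.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[b] for b in reversed(seq))


def find_overlap(r1: str, r2_rc: str, min_overlap: int = 10, max_mismatches: int = 1):
    """Return the offset in r1 where r2_rc starts to overlap, or None if no overlap is found."""
    for offset in range(len(r1) - min_overlap + 1):
        overlap_len = min(len(r1) - offset, len(r2_rc))
        if overlap_len < min_overlap:
            continue
        pairs = zip(r1[offset:offset + overlap_len], r2_rc[:overlap_len])
        if sum(a != b for a, b in pairs) <= max_mismatches:
            return offset
    return None


def correct_overlap(r1: str, q1: list[int], r2: str, q2: list[int]):
    """At each mismatch inside the overlap, keep the base with the higher quality score."""
    r2_rc, q2_rev = reverse_complement(r2), q2[::-1]
    offset = find_overlap(r1, r2_rc)
    if offset is None:
        return r1, r2  # no usable overlap, leave both reads untouched
    fixed1, fixed2_rc = list(r1), list(r2_rc)
    for i in range(min(len(r1) - offset, len(r2_rc))):
        a, b = r1[offset + i], r2_rc[i]
        if a != b:
            if q1[offset + i] >= q2_rev[i]:
                fixed2_rc[i] = a
            else:
                fixed1[offset + i] = b
    return "".join(fixed1), reverse_complement("".join(fixed2_rc))


# Invented 25-base fragment sequenced from both ends with 18-base reads.
fragment = "ACGTTGCAAGGCTTAACCGGTTAGC"
read1 = fragment[:12] + "G" + fragment[13:18]  # simulated error at position 12
qual1 = [30] * 18
qual1[12] = 5                                  # the erroneous base has low quality
read2 = reverse_complement(fragment[7:])
qual2 = [30] * 18
fixed1, fixed2 = correct_overlap(read1, qual1, read2, qual2)
print(fixed1 == fragment[:18], fixed2 == read2)
```

A production tool would also have to handle adapters, reads that overlap completely, and ambiguous bases; the sketch covers only the simplest case, where the tail of read 1 meets the head of the reverse-complemented read 2.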
Switch wear leveling

DOEpatents

Wu, Hunter; Sealy, Kylee; Gilchrist, Aaron

2015-09-01

An apparatus for switch wear leveling includes a switching module that controls switching for two or more pairs of switches in a switching power converter. The switching module controls switches based on a duty cycle control technique and closes and opens each switch in a switching sequence. The pairs of switches connect to a positive and negative terminal of a DC voltage source. For a first switching sequence a first switch of a pair of switches has a higher switching power loss than a second switch of the pair of switches. The apparatus includes a switch rotation module that changes the switching sequence of the two or more pairs of switches from the first switching sequence to a second switching sequence. The second switch of a pair of switches has a higher switching power loss than the first switch of the pair of switches during the second switching sequence.
Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA.

PubMed

Xu, Weijia; Ozer, Stuart; Gutell, Robin R

2009-01-01

With an increasingly large amount of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large scale alignment and less effective with the sequences from diversified phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with a Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times bigger and 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure.
Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA

PubMed Central

Xu, Weijia; Ozer, Stuart; Gutell, Robin R.

2010-01-01

With an increasingly large amount of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large scale alignment and less effective with the sequences from diversified phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with a Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times bigger and 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure. PMID:20502534
Hybrid-denovo: a de novo OTU-picking pipeline integrating single-end and paired-end 16S sequence tags.

PubMed

Chen, Xianfeng; Johnson, Stephen; Jeraldo, Patricio; Wang, Junwen; Chia, Nicholas; Kocher, Jean-Pierre A; Chen, Jun

2018-03-01

Illumina paired-end sequencing has been increasingly popular for 16S rRNA gene-based microbiota profiling. It provides higher phylogenetic resolution than single-end reads due to a longer read length. However, the reverse read (R2) often has significantly lower base quality, and a large proportion of R2s will be discarded after quality control, resulting in a mixture of paired-end and single-end reads. A typical 16S analysis pipeline usually processes either paired-end or single-end reads but not a mixture. Thus, the quantification accuracy and statistical power will be reduced due to the loss of a large number of reads. As a result, rare taxa may not be detectable with the paired-end approach, or low taxonomic resolution will result in a single-end approach. To have both the higher phylogenetic resolution provided by paired-end reads and the higher sequence coverage by single-end reads, we propose a novel OTU-picking pipeline, hybrid-denovo, that can process a hybrid of single-end and paired-end reads. Using high-quality paired-end reads as a gold standard, we show that hybrid-denovo achieved the highest correlation with the gold standard and performed better than the approaches based on paired-end or single-end reads in terms of quantifying the microbial diversity and taxonomic abundances. By applying our method to a rheumatoid arthritis (RA) data set, we demonstrated that hybrid-denovo captured more microbial diversity and identified more RA-associated taxa than a paired-end or single-end approach. Hybrid-denovo utilizes both paired-end and single-end 16S sequencing reads and is recommended for 16S rRNA gene targeted paired-end sequencing data.
Intervening sequences in a plant gene-comparison of the partial sequence of cDNA and genomic DNA of French bean phaseolin

NASA Astrophysics Data System (ADS)

Sun, S. M.; Slightom, J. L.; Hall, T. C.

1981-01-01

A plant gene coding for the major storage protein (phaseolin, G1-globulin) of the French bean was isolated from a genomic library constructed in the phage vector Charon 24A. Comparison of the nucleotide sequence of part of the gene with that of the cloned messenger RNA (cDNA) revealed the presence of three intervening sequences, all beginning with GT and ending with AG. The 5' and 3' boundaries of intervening sequences IVS-A (88 base pairs) and IVS-B (124 base pairs) are similar to those described for animal and viral genes, but the 3' boundary of IVS-C (129 base pairs) shows some differences. A sequence of 185 amino acids deduced from the cloned DNAs represents about 40% of a phaseolin polypeptide.
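The boundary pattern reported above (intervening sequences beginning with GT and ending with AG) is easy to express as a check. The sketch below is illustrative only: the function name is hypothetical, and the placeholder introns reuse the three lengths from the abstract but contain filler bases, not the phaseolin sequence.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """Return True if an intervening sequence begins with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Lengths (base pairs) of the three intervening sequences, as given in the abstract.
intron_lengths = {"A": 88, "B": 124, "C": 129}

# Placeholder introns of the right length; the interior "N" bases are filler, not real data.
placeholders = {name: "GT" + "N" * (n - 4) + "AG" for name, n in intron_lengths.items()}

for name, seq in placeholders.items():
    print(name, len(seq), follows_gt_ag_rule(seq))
```

The GT and AG dinucleotides alone do not capture the 3' boundary differences the abstract notes for the third intervening sequence; that would need the surrounding bases as well.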
All gene-sized DNA molecules in four species of hypotrichs have the same terminal sequence and an unusual 3' terminus.

PubMed Central

Klobutcher, L A; Swanton, M T; Donini, P; Prescott, D M

1981-01-01

In hypotrichous ciliates, all of the macronuclear DNA is in the form of low molecular weight molecules with an average size of approximately 2200 base pairs. Total macronuclear DNA from four hypotrichs has been shown to have inverted terminal repeats by direct sequence analysis. In Oxytricha nova, Oxytricha sp., and Stylonychia pustulata, this terminal sequence may be written as 5'-C4A4C4A4C4 ... on one strand and 3'-G4T4G4T4G4T4G4T4G4 ... on the complementary strand. In Euplotes aediculatus, the sequence is similar but differs in the lengths of the duplex region (28 base pairs) and of the putative 3' extension (14 base pairs). Also in Euplotes, a second common sequence of 5 base pairs (T-T-G-A-A paired with A-A-C-T-T) occurs internal to the terminal repeat and a 17-base-pair heterogeneous region: 5'-C4A4C4A4C4A4C4(X)17T-T-G-A-A ... on one strand and 3'-G2T4G4T4G4T4G4T4G4T4G4(X)17A-A-C-T-T ... on the complementary strand. The length of the terminal repeat sequence for O. nova was confirmed in cloned macronuclear DNA molecules. PMID:6265931
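The compact notation used for the terminal sequences (C4A4 standing for CCCCAAAA) can be expanded to verify the duplex and 3' extension lengths quoted for Euplotes. This is a minimal sketch with hypothetical function names; it assumes the two strands are written position-aligned, with the 3' extension on the left of the lower strand.

```python
import re


def expand(notation: str) -> str:
    """Expand run-length notation such as 'C4A4' into 'CCCCAAAA'."""
    return "".join(base * int(count) for base, count in re.findall(r"([ACGT])(\d+)", notation))


def duplex_and_extension(top_5_to_3: str, bottom_3_to_5: str) -> tuple[int, int]:
    """Return (duplex length, 3' extension length) for two strands in run-length notation."""
    top, bottom = expand(top_5_to_3), expand(bottom_3_to_5)
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    # The bottom strand is written 3'->5', so pairing is position by position (no reversal).
    paired_region = bottom[len(bottom) - len(top):]
    if any(complement[a] != b for a, b in zip(top, paired_region)):
        raise ValueError("strands are not complementary over the duplex region")
    return len(top), len(bottom) - len(top)


oxytricha = duplex_and_extension("C4A4C4A4C4", "G4T4G4T4G4T4G4T4G4")
euplotes = duplex_and_extension("C4A4C4A4C4A4C4", "G2T4G4T4G4T4G4T4G4T4G4")
print(oxytricha, euplotes)
```

For the Euplotes strands the result agrees with the 28-base-pair duplex and 14-base extension stated in the abstract; the Oxytricha values are derived from the notation alone.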
Molecular switching behavior in isosteric DNA base pairs.

PubMed

Jissy, A K; Konar, Sukanya; Datta, Ayan

2013-04-15

The structures and proton-coupled behavior of adenine-thymine (A-T) and a modified base pair containing a thymine isostere, adenine-difluorotoluene (A-F), are studied in different solvents by dispersion-corrected density functional theory. The stability of the canonical Watson-Crick base pair and the mismatched pair in various solvents with low and high dielectric constants is analyzed. It is demonstrated that A-F base pairing is favored in solvents with low dielectric constant. The stabilization and conformational changes induced by protonation are also analyzed for the natural as well as the mismatched base pair. DNA sequences capable of changing their sequence conformation on protonation are used in the construction of pH-based molecular switches. An acidic medium has a profound influence in stabilizing the isostere base pair. Such a large gain in stability on protonation leads to an interesting pH-controlled molecular switch, which can be incorporated in a natural DNA tract.
Extending the language of DNA molecular recognition by polyamides: unexpected influence of imidazole and pyrrole arrangement on binding affinity and specificity.

PubMed

Buchmueller, Karen L; Staples, Andrew M; Howard, Cameron M; Horick, Sarah M; Uthe, Peter B; Le, N Minh; Cox, Kari K; Nguyen, Binh; Pacheco, Kimberly A O; Wilson, W David; Lee, Moses

2005-01-19

Pyrrole (Py) and imidazole (Im) polyamides can be designed to target specific DNA sequences. The effect that the pyrrole and imidazole arrangement, plus DNA sequence, have on sequence specificity and binding affinity has been investigated using DNA melting (ΔT_M), circular dichroism (CD), and surface plasmon resonance (SPR) studies. SPR results obtained from a complete set of triheterocyclic polyamides show a dramatic difference in the affinity of f-ImPyIm for its cognate DNA (K_eq = 1.9 × 10^8 M^-1) and f-PyPyIm for its cognate DNA (K_eq = 5.9 × 10^5 M^-1), which could not have been anticipated prior to characterization of these compounds. Moreover, f-ImPyIm has a 10-fold greater affinity for CGCG than distamycin A has for its cognate, AATT. To understand this difference, the triamide dimers are divided into two structural groupings: central and terminal pairings. The four possible central pairings show decreasing selectivity and affinity for their respective cognate sequences: -ImPy- > -PyPy- > -PyIm- ≈ -ImIm-. These results extend the language of current design motifs for polyamide sequence recognition to include the use of "words" for recognizing two adjacent base pairs, rather than "letters" for binding to single base pairs. Thus, polyamides designed to target Watson-Crick base pairs should utilize the strength of -ImPy- and -PyPy- central pairings. The f/Im and f/Py terminal groups yielded no advantage for their respective C/G or T/A base pairs. The exception is with the -ImPy- central pairing, for which f/Im has a 10-fold greater affinity for C/G than f/Py has for T/A.
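The two equilibrium constants quoted above differ by a factor of roughly 300, which can be converted into a binding free energy difference using the standard relation ΔG° = -RT ln K_eq. The sketch below does that arithmetic; the temperature of 298.15 K is an assumption (the abstract does not state it), and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3    # gas constant in kcal/(mol*K)
TEMPERATURE_K = 298.15  # assumed; not stated in the abstract


def binding_free_energy(k_eq: float, temperature_k: float = TEMPERATURE_K) -> float:
    """Standard binding free energy in kcal/mol from an association constant in M^-1."""
    return -R_KCAL * temperature_k * math.log(k_eq)


k_impyim = 1.9e8  # f-ImPyIm with its cognate DNA, from the abstract
k_pypyim = 5.9e5  # f-PyPyIm with its cognate DNA, from the abstract

affinity_ratio = k_impyim / k_pypyim
delta_delta_g = binding_free_energy(k_impyim) - binding_free_energy(k_pypyim)
print(f"affinity ratio: {affinity_ratio:.0f}")
print(f"free energy difference: {delta_delta_g:.2f} kcal/mol")
```

A negative difference means f-ImPyIm binds more favorably; at the assumed temperature the gap is a few kcal/mol, the scale on which individual hydrogen bonds and contacts matter.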

Method for sequencing DNA base pairs

DOEpatents

Sessler, Andrew M.; Dawson, John

1993-01-01

The base pairs of a DNA structure are sequenced with the use of a scanning tunneling microscope (STM). The DNA structure is scanned by the STM probe tip, and, as it is being scanned, the DNA structure is separately subjected to a sequence of infrared radiation from four different sources, each source being selected to preferentially excite one of the four different bases in the DNA structure. Each particular base being scanned is subjected to such sequence of infrared radiation from the four different sources as that particular base is being scanned. The DNA structure as a whole is separately imaged for each exposure to radiation from only one of the sources.
The tolerance to exchanges of the Watson–Crick base pair in the hammerhead ribozyme core is determined by surrounding elements

PubMed Central

Przybilski, Rita; Hammann, Christian

2007-01-01

Tertiary interacting elements are important features of functional RNA molecules, for example, in all small nucleolytic ribozymes. The recent crystal structure of a tertiary stabilized type I hammerhead ribozyme revealed a conventional Watson–Crick base pair in the catalytic core, formed between nucleotides C3 and G8. We show that any Watson–Crick base pair between these positions retains cleavage competence in two type III ribozymes. In the Arabidopsis thaliana sequence, only moderate differences in cleavage rates are observed for the different base pairs, while the peach latent mosaic viroid (PLMVd) ribozyme exhibits a preference for a pyrimidine at position 3 and a purine at position 8. To understand these differences, we created a series of chimeric ribozymes in which we swapped sequence elements that surround the catalytic core. The kinetic characterization of the resulting ribozymes revealed that the tertiary interacting loop sequences of the PLMVd ribozyme are sufficient to induce the preference for Y3–R8 base pairs in the A. thaliana hammerhead ribozyme. In contrast to this, only when the entire stem–loops I and II of the A. thaliana sequences are grafted on the PLMVd ribozyme is any Watson–Crick base pair similarly tolerated. The data provide evidence for a complex interplay of secondary and tertiary structure elements that lead, mediated by long-range effects, to an individual modulation of the local structure in the catalytic core of different hammerhead ribozymes. PMID:17666711
Molecular cloning and nucleotide sequence of the alpha and beta subunits of allophycocyanin from the cyanelle genome of Cyanophora paradoxa.

PubMed Central

Bryant, D A; de Lorimier, R; Lambert, D H; Dubbs, J M; Stirewalt, V L; Stevens, S E; Porter, R D; Tam, J; Jay, E

1985-01-01

The genes for the alpha- and beta-subunit apoproteins of allophycocyanin (AP) were isolated from the cyanelle genome of Cyanophora paradoxa and subjected to nucleotide sequence analysis. The AP beta-subunit apoprotein gene was localized to a 7.8-kilobase-pair Pst I restriction fragment from cyanelle DNA by hybridization with a tetradecameric oligonucleotide probe. Sequence analysis using that oligonucleotide and its complement as primers for the dideoxy chain-termination sequencing method confirmed the presence of both AP alpha- and beta-subunit genes on this restriction fragment. Additional oligonucleotide primers were synthesized as sequencing progressed and were used to determine rapidly the nucleotide sequence of a 1336-base-pair region of this cloned fragment. This strategy allowed the sequencing to be completed without a detailed restriction map and without extensive and time-consuming subcloning. The sequenced region contains two open reading frames whose deduced amino acid sequences are 81-85% homologous to cyanobacterial and red algal AP subunits whose amino acid sequences have been determined. The two open reading frames are in the same orientation and are separated by 39 base pairs. AP alpha is 5' to AP beta and both coding sequences are preceded by a polypurine, Shine-Dalgarno-type sequence. Sequences upstream from AP alpha closely resemble the Escherichia coli consensus promoter sequences and also show considerable homology to promoter sequences for several chloroplast-encoded psbA genes. A 56-base-pair palindromic sequence downstream from the AP beta gene could play a role in the termination of transcription or translation. The allophycocyanin apoprotein subunit genes are located on the large single-copy region of the cyanelle genome. PMID:2987916
Molecular dynamics study of some non-hydrogen-bonding base pair DNA strands

NASA Astrophysics Data System (ADS)

Tiwari, Rakesh K.; Ojha, Rajendra P.; Tiwari, Gargi; Pandey, Vishnudatt; Mall, Vijaysree

2018-05-01

To elucidate the structural activity of hydrophobic modified DNA, the DMMO2-D5SICS base pair was introduced as a constituent in different sets of 12-mer and 14-mer DNA sequences for molecular dynamics (MD) simulation in explicit water solvent. The AMBER 14 force field was employed for each set of duplexes during the 200 ns production-dynamics simulation in an orthogonal water box, using the Particle-Mesh-Ewald (PME) method under infinite periodic boundary conditions (PBC), to determine conformational parameters of the complex. The force-field parameters of the modified base pair were calculated with the Gaussian code using Hartree-Fock ab initio methodology. RMSD results reveal that the conformation of the duplex is sequence dependent and that the binding energy of the complex depends on the position of the modified base pair in the nucleic acid strand. We found that non-bonding energy made a significant contribution to stabilising this type of duplex in comparison to electrostatic energy. The distortion produced within the strands by this type of base pair was local and destabilised duplex integrity near the substitution. Moreover, the binding energy of the duplex depends on the position of the hydrophobic base-pair substitution and on the DNA sequence, which strongly supports the corresponding experimental study.
PHYLOGENETIC RELATIONSHIP OF ALEXANDRIUM MONILATUM (DINOPHYCEAE) TO OTHER ALEXANDRIUM SPECIES BASED ON 18S RIBOSOMAL RNA GENE SEQUENCES

EPA Science Inventory

The phylogenetic relationship of Alexandrium monilatum to other Alexandrium spp. was explored using 18S rDNA sequences. Maximum likelihood phylogenetic analysis of the combined rDNA sequences established that A. monilatum paired with Alexandrium taylori and that the pair was the...
Sequence of retrovirus provirus resembles that of bacterial transposable elements

NASA Astrophysics Data System (ADS)

Shimotohno, Kunitada; Mizutani, Satoshi; Temin, Howard M.

1980-06-01

The nucleotide sequences of the terminal regions of an infectious integrated retrovirus cloned in the modified λ phage cloning vector Charon 4A have been elucidated. There is a 569-base pair direct repeat at both ends of the viral DNA. The cell-virus junctions at each end consist of a 5-base pair direct repeat of cell DNA next to a 3-base pair inverted repeat of viral DNA. This structure resembles that of a transposable element and is consistent with the protovirus hypothesis that retroviruses evolved from the cell genome.
Method for sequencing DNA base pairs

DOEpatents

Sessler, A.M.; Dawson, J.

1993-12-14

The base pairs of a DNA structure are sequenced with the use of a scanning tunneling microscope (STM). The DNA structure is scanned by the STM probe tip, and, as it is being scanned, the DNA structure is separately subjected to a sequence of infrared radiation from four different sources, each source being selected to preferentially excite one of the four different bases in the DNA structure. Each particular base being scanned is subjected to such sequence of infrared radiation from the four different sources as that particular base is being scanned. The DNA structure as a whole is separately imaged for each subjection thereof to radiation from one only of each source. 6 figures.
Differential stabilities and sequence-dependent base pair opening dynamics of Watson-Crick base pairs with 5-hydroxymethylcytosine, 5-formylcytosine, or 5-carboxylcytosine.

PubMed

Szulik, Marta W; Pallan, Pradeep S; Nocek, Boguslaw; Voehler, Markus; Banerjee, Surajit; Brooks, Sonja; Joachimiak, Andrzej; Egli, Martin; Eichman, Brandt F; Stone, Michael P

2015-02-10

5-Hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) form during active demethylation of 5-methylcytosine (5mC) and are implicated in epigenetic regulation of the genome. They are differentially processed by thymine DNA glycosylase (TDG), an enzyme involved in active demethylation of 5mC. Three modified Dickerson-Drew dodecamer (DDD) sequences, amenable to crystallographic and spectroscopic analyses and containing the 5'-CG-3' sequence associated with genomic cytosine methylation, containing 5hmC, 5fC, or 5caC placed site-specifically into the 5'-T(8)X(9)G(10)-3' sequence of the DDD, were compared. The presence of 5caC at the X(9) base increased the stability of the DDD, whereas 5hmC or 5fC did not. Both 5hmC and 5fC increased imino proton exchange rates and calculated rate constants for base pair opening at the neighboring base pair A(5):T(8), whereas 5caC did not. At the oxidized base pair G(4):X(9), 5fC exhibited an increase in the imino proton exchange rate and the calculated kop. In all cases, minimal effects to imino proton exchange rates occurred at the neighboring base pair C(3):G(10). No evidence was observed for imino tautomerization, accompanied by wobble base pairing, for 5hmC, 5fC, or 5caC when positioned at base pair G(4):X(9); each favored Watson-Crick base pairing. However, both 5fC and 5caC exhibited intranucleobase hydrogen bonding between their formyl or carboxyl oxygens, respectively, and the adjacent cytosine N(4) exocyclic amines. The lesion-specific differences observed in the DDD may be implicated in recognition of 5hmC, 5fC, or 5caC in DNA by TDG. However, they do not correlate with differential excision of 5hmC, 5fC, or 5caC by TDG, which may be mediated by differences in transition states of the enzyme-bound complexes.
Differential stabilities and sequence-dependent base pair opening dynamics of Watson–Crick base pairs with 5-hydroxymethylcytosine, 5-formylcytosine, or 5-carboxylcytosine

DOE PAGES

Szulik, Marta W.; Pallan, Pradeep S.; Nocek, Boguslaw; ...

2015-01-29

5-Hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) form during active demethylation of 5-methylcytosine (5mC) and are implicated in epigenetic regulation of the genome. They are differentially processed by thymine DNA glycosylase (TDG), an enzyme involved in active demethylation of 5mC. Three modified Dickerson–Drew dodecamer (DDD) sequences, amenable to crystallographic and spectroscopic analyses and containing the 5'-CG-3' sequence associated with genomic cytosine methylation, containing 5hmC, 5fC, or 5caC placed site-specifically into the 5'-T8X9G10-3' sequence of the DDD, were compared. The presence of 5caC at the X9 base increased the stability of the DDD, whereas 5hmC or 5fC did not. Both 5hmC and 5fC increased imino proton exchange rates and calculated rate constants for base pair opening at the neighboring base pair A5:T8, whereas 5caC did not. At the oxidized base pair G4:X9, 5fC exhibited an increase in the imino proton exchange rate and the calculated kop. In all cases, minimal effects to imino proton exchange rates occurred at the neighboring base pair C3:G10. No evidence was observed for imino tautomerization, accompanied by wobble base pairing, for 5hmC, 5fC, or 5caC when positioned at base pair G4:X9; each favored Watson–Crick base pairing. However, both 5fC and 5caC exhibited intranucleobase hydrogen bonding between their formyl or carboxyl oxygens, respectively, and the adjacent cytosine N4 exocyclic amines. The lesion-specific differences observed in the DDD may be implicated in recognition of 5hmC, 5fC, or 5caC in DNA by TDG. However, they do not correlate with differential excision of 5hmC, 5fC, or 5caC by TDG, which may be mediated by differences in transition states of the enzyme-bound complexes.
Differential Stabilities and Sequence-Dependent Base Pair Opening Dynamics of Watson–Crick Base Pairs with 5-Hydroxymethylcytosine, 5-Formylcytosine, or 5-Carboxylcytosine

PubMed Central

2016-01-01

5-Hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) form during active demethylation of 5-methylcytosine (5mC) and are implicated in epigenetic regulation of the genome. They are differentially processed by thymine DNA glycosylase (TDG), an enzyme involved in active demethylation of 5mC. Three modified Dickerson–Drew dodecamer (DDD) sequences, amenable to crystallographic and spectroscopic analyses and containing the 5′-CG-3′ sequence associated with genomic cytosine methylation, containing 5hmC, 5fC, or 5caC placed site-specifically into the 5′-T8X9G10-3′ sequence of the DDD, were compared. The presence of 5caC at the X9 base increased the stability of the DDD, whereas 5hmC or 5fC did not. Both 5hmC and 5fC increased imino proton exchange rates and calculated rate constants for base pair opening at the neighboring base pair A5:T8, whereas 5caC did not. At the oxidized base pair G4:X9, 5fC exhibited an increase in the imino proton exchange rate and the calculated kop. In all cases, minimal effects to imino proton exchange rates occurred at the neighboring base pair C3:G10. No evidence was observed for imino tautomerization, accompanied by wobble base pairing, for 5hmC, 5fC, or 5caC when positioned at base pair G4:X9; each favored Watson–Crick base pairing. However, both 5fC and 5caC exhibited intranucleobase hydrogen bonding between their formyl or carboxyl oxygens, respectively, and the adjacent cytosine N4 exocyclic amines. The lesion-specific differences observed in the DDD may be implicated in recognition of 5hmC, 5fC, or 5caC in DNA by TDG. However, they do not correlate with differential excision of 5hmC, 5fC, or 5caC by TDG, which may be mediated by differences in transition states of the enzyme-bound complexes. PMID:25632825
ATP hydrolysis provides functions that promote rejection of pairings between different copies of long repeated sequences

PubMed Central

Danilowicz, Claudia; Hermans, Laura; Coljee, Vincent; Prévost, Chantal

2017-01-01

During DNA recombination and repair, RecA family proteins must promote rapid joining of homologous DNA. Repeated sequences with >100 base pair lengths occupy more than 1% of bacterial genomes; however, commitment to strand exchange was believed to occur after testing ∼20–30 bp. If that were true, pairings between different copies of long repeated sequences would usually become irreversible. Our experiments reveal that in the presence of ATP hydrolysis even 75 bp sequence-matched strand exchange products remain quite reversible. Experiments also indicate that when ATP hydrolysis is present, flanking heterologous dsDNA regions increase the reversibility of sequence matched strand exchange products with lengths up to ∼75 bp. Results of molecular dynamics simulations provide insight into how ATP hydrolysis destabilizes strand exchange products. These results inspired a model that shows how pairings between long repeated sequences could be efficiently rejected even though most homologous pairings form irreversible products. PMID:28854739
Sequence analysis of Leukemia DNA

NASA Astrophysics Data System (ADS)

Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad Isa

2018-03-01

Cancer is a deadly disease; one form is leukemia, better known as blood cancer. Cancer cells can be detected through laboratory DNA tests. This study focused on the local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm. The Smith-Waterman algorithm was introduced by T. F. Smith and M. S. Waterman in 1981. The algorithm seeks the region of greatest similarity between a pair of sequences by assigning a negative value to unequal base pairs (mismatches) and a positive value to identical base pairs (matches). The maximum positive value marks the end of the alignment, and the minimum value marks its start. This study uses sequences of leukemia and 3 sequences of non-leukemia.
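The scoring scheme described in this abstract can be sketched in a few lines of Python. This is a minimal illustration of Smith-Waterman local alignment, not the authors' implementation: the function name and the match, mismatch, and gap values (2, -1, -2, with a linear gap penalty) are assumptions chosen for the example, since the abstract does not state the parameters used.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for a local alignment.

    The scoring values are illustrative assumptions, not taken from the study.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Score matrix; row 0 and column 0 stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero is what makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    print(smith_waterman("TTTACGTTT", "GGACGGG"))
```

The traceback starts at the maximum cell and stops at the first zero, which corresponds to the "maximum as end, minimum as start" description above.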
A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities.

PubMed

Bastien, Olivier; Ortet, Philippe; Roy, Sylvaine; Maréchal, Eric

2005-03-10

Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction. We have built up a spatial representation of protein sequences using concepts from particle physics (configuration space) and respecting a frame of constraints deduced from pair-wise alignment score properties in information theory. The obtained configuration space of homologous proteins (CSHP) allows the representation of real and shuffled sequences, and thereupon an expression of the TULIP theorem for Z-score probabilities. Based on the CSHP, we propose a phylogeny reconstruction using Z-scores. Deduced trees, called TULIP trees, are consistent with multiple-alignment based trees. Furthermore, the TULIP tree reconstruction method provides a solution for some previously reported incongruent results, such as the apicomplexan enolase phylogeny. The CSHP is a unified model that conserves mutual information between proteins in the way physical models conserve energy. Applications include the reconstruction of evolutionary consistent and robust trees, the topology of which is based on a spatial representation that is not reordered after addition or removal of sequences. The CSHP and its assigned phylogenetic topology, provide a powerful and easily updated representation for massive pair-wise genome comparisons based on Z-score computations.
Detecting cooperative sequences in the binding of RNA Polymerase-II

NASA Astrophysics Data System (ADS)

Glass, Kimberly; Rozenberg, Julian; Girvan, Michelle; Losert, Wolfgang; Ott, Ed; Vinson, Charles

2008-03-01

Regulation of the expression level of genes is a key biological process controlled largely by the 1000 base pair (bp) sequence preceding each gene (the promoter region). Within that region, transcription factor binding sites (TFBS), 5-10 bp long sequences, act individually or cooperatively in the recruitment of, and therefore subsequent gene transcription by, RNA Polymerase-II (RNAP). We have measured the binding of RNAP to promoters on a genome-wide basis using Chromatin Immunoprecipitation (ChIP-on-Chip) microarray assays. Using all 8-base pair long sequences as a test set, we have identified the DNA sequences that are enriched in promoters with high RNAP binding values. We are able to demonstrate that virtually all sequences enriched in such promoters contain a CpG dinucleotide, indicating that TFBS that contain the CpG dinucleotide are involved in RNAP binding to promoters. Further analysis shows that pairs of CpG-containing sequences cooperate to enhance the binding of RNAP to the promoter.
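As a rough illustration of the "all 8-base pair long sequences" test set mentioned above, the sketch below enumerates every 8-mer, counts the k-mers occurring in a sequence, and flags those that contain a CpG dinucleotide. The function names and the toy promoter string are hypothetical; the enrichment statistics used in the study are not described in the abstract and are not reproduced here.

```python
from collections import Counter
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Enumerate every possible k-mer over the DNA alphabet (4**k strings)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def kmer_counts(seq, k):
    """Count overlapping k-mers along one strand of a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def has_cpg(kmer):
    """True if the k-mer contains a CpG dinucleotide (C followed by G)."""
    return "CG" in kmer


if __name__ == "__main__":
    test_set = all_kmers(8)
    print(len(test_set), "possible 8-mers")

    # Hypothetical promoter fragment, made up for illustration only.
    promoter = "TTGACGCGTACGGCTAAT"
    counts = kmer_counts(promoter, 8)
    with_cpg = sum(n for kmer, n in counts.items() if has_cpg(kmer))
    print(with_cpg, "of", sum(counts.values()), "observed 8-mers contain CpG")
```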
Comparison of the conformation of an oligonucleotide containing a central G-T base pair with the non-mismatch sequence by proton NMR.

PubMed Central

Quignard, E; Fazakerley, G V; van der Marel, G; van Boom, J H; Guschlbauer, W

1987-01-01

We have recorded NOESY spectra of two non-selfcomplementary undecanucleotide duplexes. From the observed NOEs we do not detect any significant distortion of the helix when a G-C pair is replaced by a G-T pair and the normal interresidue connectivities can be followed through the mismatch site. We conclude that the 2D spectra of the non-exchangeable protons do not allow differentiation between a wobble or rare tautomer form for the mismatch. NOE measurements in H2O, however, clearly show that the mismatch adopts a wobble structure and give information on the hydration in the minor groove for the G-T base pair which is embedded between two A-T base pairs in the sequence. PMID:3033602
Sequence-dependent DNA deformability studied using molecular dynamics simulations.

PubMed

Fujii, Satoshi; Kono, Hidetoshi; Takenaka, Shigeori; Go, Nobuhiro; Sarai, Akinori

2007-01-01

Proteins recognize specific DNA sequences not only through direct contact between amino acids and bases, but also indirectly based on the sequence-dependent conformation and deformability of the DNA (indirect readout). We used molecular dynamics simulations to analyze the sequence-dependent DNA conformations of all 136 possible tetrameric sequences sandwiched between CGCG sequences. The deformability of dimeric steps obtained by the simulations is consistent with that by the crystal structures. The simulation results further showed that the conformation and deformability of the tetramers can highly depend on the flanking base pairs. The conformations of xATx tetramers show the most rigidity and are not affected by the flanking base pairs and the xYRx show by contrast the greatest flexibility and change their conformations depending on the base pairs at both ends, suggesting tetramers with the same central dimer can show different deformabilities. These results suggest that analysis of dimeric steps alone may overlook some conformational features of DNA and provide insight into the mechanism of indirect readout during protein-DNA recognition. Moreover, the sequence dependence of DNA conformation and deformability may be used to estimate the contribution of indirect readout to the specificity of protein-DNA recognition as well as nucleosome positioning and large-scale behavior of nucleic acids.
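The figure of 136 tetrameric sequences follows from treating a duplex sequence and its reverse complement as the same double-stranded tetramer. The short sketch below checks that count by enumeration; the function names are illustrative and not taken from the study.

```python
from itertools import product

# Translation table for Watson-Crick complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def unique_duplex_sequences(k):
    """All k-mers, keeping one representative per reverse-complement pair."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(s, reverse_complement(s)) for s in kmers})


if __name__ == "__main__":
    # 256 tetramers, 16 of them self-complementary: (256 - 16) / 2 + 16 = 136.
    print(len(unique_duplex_sequences(4)))
    # The same reasoning gives the 10 unique dimeric steps.
    print(len(unique_duplex_sequences(2)))
```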
RNAHelix: computational modeling of nucleic acid structures with Watson-Crick and non-canonical base pairs.

PubMed

Bhattacharyya, Dhananjay; Halder, Sukanya; Basu, Sankar; Mukherjee, Debasish; Kumar, Prasun; Bansal, Manju

2017-02-01

Comprehensive analyses of structural features of non-canonical base pairs within a nucleic acid double helix are limited by the availability of a small number of three dimensional structures. Therefore, a procedure for model building of double helices containing any given nucleotide sequence and base pairing information, either canonical or non-canonical, is seriously needed. Here we describe a program RNAHelix, which is an updated version of our widely used software, NUCGEN. The program can regenerate duplexes using the dinucleotide step and base pair orientation parameters for a given double helical DNA or RNA sequence with defined Watson-Crick or non-Watson-Crick base pairs. The original structure and the corresponding regenerated structure of double helices were found to be very close, as indicated by the small RMSD values between positions of the corresponding atoms. Structures of several usual and unusual double helices have been regenerated and compared with their original structures in terms of base pair RMSD, torsion angles and electrostatic potentials and very high agreements have been noted. RNAHelix can also be used to generate a structure with a sequence completely different from an experimentally determined one or to introduce single to multiple mutation, but with the same set of parameters and hence can also be an important tool in homology modeling and study of mutation induced structural changes.
Unique Thermal Stability of Unnatural Hydrophobic Ds Bases in Double-Stranded DNAs.

PubMed

Kimoto, Michiko; Hirao, Ichiro

2017-10-20

Genetic alphabet expansion technology, the introduction of unnatural bases or base pairs into replicable DNA, has rapidly advanced as a new synthetic biology area. A hydrophobic unnatural base pair between 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and 2-nitro-4-propynylpyrrole (Px) exhibited high fidelity as a third base pair in PCR. SELEX methods using the Ds-Px pair enabled high-affinity DNA aptamer generation, and introducing a few Ds bases into DNA aptamers extremely augmented their affinities and selectivities to target proteins. Here, to further scrutinize the functions of this highly hydrophobic Ds base, the thermal stabilities of double-stranded DNAs (dsDNA) containing a noncognate Ds-Ds or G-Ds pair were examined. The thermal stability of the Ds-Ds self-pair was as high as that of the natural G-C pair, and apart from the generally higher stability of the G-C pair than that of the A-T pair, most of the 5'-pyrimidine-Ds-purine-3' sequences, such as CDsA and TDsA, exhibited higher stability than the 5'-purine-Ds-pyrimidine-3' sequences, such as GDsC and ADsC, in dsDNAs. This trait enabled the GC-content-independent control of the thermal stability of the designed dsDNA fragments. The melting temperatures of dsDNA fragments containing the Ds-Ds pair can be predicted from the nearest-neighbor parameters including the Ds base. In addition, the noncognate G-Ds pair can efficiently distinguish its neighboring cognate natural base pairs from noncognate pairs. We demonstrated that real-time PCR using primers containing Ds accurately detected a single-nucleotide mismatch in target DNAs. These unique properties of the Ds base that affect the stabilities of the neighboring base pairs could impart new functions to DNA molecules and technologies.
Structural landscape of base pairs containing post-transcriptional modifications in RNA

PubMed Central

Seelam, Preethi P.; Sharma, Purshotam

2017-01-01

Base pairs involving post-transcriptionally modified nucleobases are believed to play important roles in a wide variety of functional RNAs. Here we present our attempts toward understanding the structural and functional role of naturally occurring modified base pairs using a combination of X-ray crystal structure database analysis, sequence analysis, and advanced quantum chemical methods. Our bioinformatics analysis reveals that despite their presence in all major secondary structural elements, modified base pairs are most prevalent in tRNA crystal structures and most commonly involve guanine or uridine modifications. Further, analysis of tRNA sequences reveals additional examples of modified base pairs at structurally conserved tRNA regions and highlights the conservation patterns of these base pairs in three domains of life. Comparison of structures and binding energies of modified base pairs with their unmodified counterparts, using quantum chemical methods, allowed us to classify the base modifications in terms of the nature of their electronic structure effects on base-pairing. Analysis of specific structural contexts of modified base pairs in RNA crystal structures revealed several interesting scenarios, including those at the tRNA:rRNA interface, antibiotic-binding sites on the ribosome, and the three-way junctions within tRNA. These scenarios, when analyzed in the context of available experimental data, allowed us to correlate the occurrence and strength of modified base pairs with their specific functional roles. Overall, our study highlights the structural importance of modified base pairs in RNA and points toward the need for greater appreciation of the role of modified bases and their interactions, in the context of many biological processes involving RNA. PMID:28341704

Novel primers for complete mitochondrial cytochrome b gene sequencing in mammals

USGS Publications Warehouse

Naidu, Ashwin; Fitak, Robert R.; Munguia-Vega, Adrian; Culver, Melanie

2011-01-01

Sequence-based species identification relies on the extent and integrity of sequence data available in online databases such as GenBank. When identifying species from a sample of unknown origin, partial DNA sequences obtained from the sample are aligned against existing sequences in databases. When the sequence from the matching species is not present in the database, high-scoring alignments with closely related sequences might produce unreliable results on species identity. For species identification in mammals, the cytochrome b (cyt b) gene has been identified to be highly informative; thus, large amounts of reference sequence data from the cyt b gene are much needed. To enhance availability of cyt b gene sequence data on a large number of mammalian species in GenBank and other such publicly accessible online databases, we identified a primer pair for complete cyt b gene sequencing in mammals. Using this primer pair, we successfully PCR amplified and sequenced the complete cyt b gene from 40 of 44 mammalian species representing 10 orders of mammals. We submitted 40 complete, correctly annotated, cyt b protein coding sequences to GenBank. To our knowledge, this is the first single primer pair to amplify the complete cyt b gene in a broad range of mammalian species. This primer pair can be used for the addition of new cyt b gene sequences and to enhance data available on species represented in GenBank. The availability of novel and complete gene sequences as high-quality reference data can improve the reliability of sequence-based species identification.
Introducing a model of pairing based on base pair specific interactions between identical DNA sequences

NASA Astrophysics Data System (ADS)

(O') Lee, Dominic J.

2018-02-01

Two types of physical mechanism have so far been suggested that may facilitate preferential pairing between DNA molecules with identical or similar base pair texts, without separation of base pairs. One mechanism, discussed extensively in the past, relies solely on base pair specific patterns of helix distortion being the same on the two molecules. The other mechanism proposes that there are preferential interactions between base pairs of the same composition. We introduce a model, built on this second mechanism, in which both thermal stretching and twisting fluctuations are included, as well as the base pair specific helix distortions. First, we consider an approximation for weak pairing interactions, or short molecules. This yields a dependence of the energy on the square root of the molecular length, which could explain recent experimental data. However, analysis suggests that this approximation is no longer valid at large DNA lengths. In a second approximation, for long molecules, we define two adaptation lengths for twisting and stretching, over which the pairing interaction can limit the accumulation of helix disorder. When the pairing interaction is sufficiently strong, both adaptation lengths are finite; however, as we reduce pairing strength, the stretching adaptation length remains finite but the torsional one becomes infinite. This second state persists to arbitrarily weak values of the pairing strength, suggesting that, if the molecules are long enough, the pairing energy scales as length. To probe differences between the two pairing mechanisms, we also construct a model of similar form in which pairing between identical sequences relies solely on the intrinsic helix distortion patterns. Between the two models, we see interesting qualitative differences. We discuss our findings and suggest new work to distinguish between the two mechanisms.
Case Study Projects for College Mathematics Courses Based on a Particular Function of Two Variables

ERIC Educational Resources Information Center

Shi, Y.

2007-01-01

Based on a sequence of number pairs, a recent paper (Mauch, E. and Shi, Y., 2005, Using a sequence of number pairs as an example in teaching mathematics, "Mathematics and Computer Education," 39(3), 198-205) presented some interesting examples that can be used in teaching high school and college mathematics classes such as algebra, geometry,…
A two-dimensional 1H-NMR study of the dam methylase site: comparison between the hemimethylated GATC sequence, its unmethylated analogue and a hemimethylated CATG sequence. The sequence dependence of methylation upon base-pair lifetimes.

PubMed

Fazakerley, G V; Quignard, E; Teoule, R; Guy, A; Guschlbauer, W

1987-09-15

We report two-dimensional NOE (NOESY) spectra on the sequence d(GCGATCATGG).d(CCATGATCGC) which contains the unmethylated dam site. As expected the DNA adopts a B-form conformation but appears to be distorted at the TG step of the second strand. This distortion, probably bending, is not seen on the opposite strand. When the first strand is methylated on adenine in the GATC or CATG sequence the NOESY spectra indicate little or no change in the conformation. However, the single strand-duplex exchange is slowed down to the slow-exchange region on a proton NMR time scale. We have assigned the exchangeable imino and cytidine amino resonances of the three duplexes. From the imino linewidths as a function of temperature, we observe that the unmethylated and the hemimethylated Gm6ATC duplexes melt normally from the ends. However, this is not so for the hemimethylated Cm6ATG duplex which, apart from the terminal base pairs, melts cooperatively and at higher temperature. In spectra recorded in H2O a second duplex is observed, for the Gm6ATC sequence, which we have not been able to identify. It is, however, unlikely to be a hairpin structure. Ultraviolet-melting curves also indicate the presence of two transitions for this duplex. The effect of methylation upon base-pair lifetimes has been studied by comparing the above three duplexes. Little effect is observed upon methylation in the GATC sequence but a drastic increase in the lifetimes of all base pairs is observed upon methylation in the CATG sequence.
pKa shifting in double-stranded RNA is highly dependent upon nearest neighbors and bulge positioning.

PubMed

Wilcox, Jennifer L; Bevilacqua, Philip C

2013-10-22

Shifting of pKa's in RNA is important for many biological processes; however, the driving forces responsible for shifting are not well understood. Herein, we determine how structural environments surrounding protonated bases affect pKa shifting in double-stranded RNA (dsRNA). Using (31)P NMR, we determined the pKa of the adenine in an A(+)·C base pair in various sequence and structural environments. We found a significant dependence of pKa on the base pairing strength of nearest neighbors and the location of a nearby bulge. Increasing nearest neighbor base pairing strength shifted the pKa of the adenine in an A(+)·C base pair higher by an additional 1.6 pKa units, from 6.5 to 8.1, which is well above neutrality. The addition of a bulge two base pairs away from a protonated A(+)·C base pair shifted the pKa by only ~0.5 units less than a perfectly base paired hairpin; however, positioning the bulge just one base pair away from the A(+)·C base pair prohibited formation of the protonated base pair as well as several flanking base pairs. Comparison of data collected at 25 °C and 100 mM KCl to biological temperature and Mg(2+) concentration revealed only slight pKa changes, suggesting that similar sequence contexts in biological systems have the potential to be protonated at biological pH. We present a general model to aid in the determination of the roles protonated bases may play in various dsRNA-mediated processes including ADAR editing, miRNA processing, programmed ribosomal frameshifting, and general acid-base catalysis in ribozymes.
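To see why a pKa shift from 6.5 to 8.1 matters at near-neutral pH, the Henderson-Hasselbalch relation can be used to estimate the protonated fraction of the adenine. The sketch below assumes simple single-site titration and ignores any cooperativity with folding; the function name and the choice of pH 7.0 are illustrative assumptions, not taken from the study.

```python
def fraction_protonated(pka, ph):
    """Fraction of a single titratable site that is protonated at a given pH.

    Henderson-Hasselbalch for one site: f = 1 / (1 + 10 ** (pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # The two pKa values quoted in the abstract, evaluated at an assumed pH of 7.0.
    for pka in (6.5, 8.1):
        print(f"pKa {pka}: {fraction_protonated(pka, 7.0):.2f} protonated at pH 7.0")
```

Under these assumptions the site goes from roughly one quarter protonated at pKa 6.5 to more than nine tenths protonated at pKa 8.1.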
Overcoming Sequence Misalignments with Weighted Structural Superposition

PubMed Central

Khazanov, Nickolay A.; Damm-Ganamet, Kelly L.; Quang, Daniel X.; Carlson, Heather A.

2012-01-01

An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian-weighted RMSD (wRMSD) tool with a sequence aligner and seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD overlay. SE can generate a corrected sequence alignment from the improved structural superposition obtained by wRMSD. HwRMSD’s robust performance and its superiority over standard RMSD are demonstrated over a range of homologous proteins. Its better overlay results in corrected sequence alignments with good agreement to HOMSTRAD. Finally, HwRMSD is compared to established structural alignment methods: FATCAT, SSM, CE, and DaliLite. Most methods are comparable at placing residue pairs within 2 Å, but HwRMSD places many more residue pairs within 1 Å, providing a clear advantage. Such high accuracy is essential in drug design, where small distances can have a large impact on computational predictions. This level of accuracy is also needed to correct sequence alignments in an automated fashion, especially for omics-scale analysis. HwRMSD can align homologs with low sequence identity and large conformational differences, cases where both sequence-based and structure-based methods may fail. The HwRMSD pipeline overcomes the dependency of structural overlays on initial sequence pairing and removes the need to determine the best sequence-alignment method, substitution matrix, and gap parameters for each unique pair of homologs. PMID:22733542
Sequence-dependent base pair stepping dynamics in XPD helicase unwinding

PubMed Central

Qi, Zhi; Pugh, Robert A; Spies, Maria; Chemla, Yann R

2013-01-01

Helicases couple the chemical energy of ATP hydrolysis to directional translocation along nucleic acids and transient duplex separation. Understanding helicase mechanism requires that the basic physicochemical process of base pair separation be understood. This necessitates monitoring helicase activity directly, at high spatio-temporal resolution. Using optical tweezers with single base pair (bp) resolution, we analyzed DNA unwinding by XPD helicase, a Superfamily 2 (SF2) DNA helicase involved in DNA repair and transcription initiation. We show that monomeric XPD unwinds duplex DNA in 1-bp steps, yet exhibits frequent backsteps and undergoes conformational transitions manifested in 5-bp backward and forward steps. Quantifying the sequence dependence of XPD stepping dynamics with near base pair resolution, we provide the strongest and most direct evidence thus far that forward, single-base pair stepping of a helicase utilizes the spontaneous opening of the duplex. The proposed unwinding mechanism may be a universal feature of DNA helicases that move along DNA phosphodiester backbones. DOI: http://dx.doi.org/10.7554/eLife.00334.001 PMID:23741615
Guide-substrate base-pairing requirement for box H/ACA RNA-guided RNA pseudouridylation.

PubMed

De Zoysa, Meemanage D; Wu, Guowei; Katz, Raviv; Yu, Yi-Tao

2018-06-05

Box H/ACA RNAs are a group of small RNAs found in abundance in eukaryotes (as well as in archaea). Although their sequences differ, eukaryotic box H/ACA RNAs all share the same unique hairpin-hinge-hairpin-tail structure. Almost all of them function as guides that primarily direct pseudouridylation of rRNAs and spliceosomal snRNAs at specific sites. Although box H/ACA RNA-guided pseudouridylation has been extensively studied, the detailed rules governing this reaction, especially those concerning the guide RNA-substrate RNA base-pairing interactions that determine the specificity and efficiency of pseudouridylation, are still not exactly clear. This is particularly relevant given that the lengths of the guide sequences involved in base-pairing vary from one box H/ACA RNA to another. Here, we carry out a detailed investigation into guide-substrate base-pairing interactions, and identify the minimum number of base-pairs (8), required for RNA-guided pseudouridylation. In addition, we find that the pseudouridylation pocket, present in each hairpin of box H/ACA RNA, exhibits flexibility in fitting slightly different substrate sequences. Our results are consistent across three independent pseudouridylation pockets tested, suggesting that our findings are generally applicable to box H/ACA RNA-guided RNA pseudouridylation. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Silver(I)-Mediated Base Pairs in DNA Sequences Containing 7-Deazaguanine/Cytosine: towards DNA with Entirely Metallated Watson-Crick Base Pairs.

PubMed

Méndez-Arriaga, José M; Maldonado, Carmen R; Dobado, José A; Galindo, Miguel A

2018-03-26

DNA sequences comprising noncanonical 7-deazaguanine ((7C)G) and canonical cytosine (C) are capable of forming Watson-Crick base pairs via hydrogen bonds as well as silver(I)-mediated base pairs by coordination to central silver(I) ions. Duplexes I and II containing (7C)G and C have been synthesized and characterized. The incorporation of silver(I) ions into these duplexes has been studied by means of temperature-dependent UV spectroscopy, circular dichroism, and DFT calculations. The results suggest the formation of DNA molecules comprising contiguous metallated (7C)G-Ag(I)-C Watson-Crick base pairs that preserve the original B-type conformation. Furthermore, additional studies performed on duplex III indicated that, in the presence of Ag(I) ions, (7C)G-C and (7C)A-T Watson-Crick base pairs ((7C)A, 7-deazaadenine; T, thymine) can be converted to metallated (7C)G-Ag(I)-C and (7C)A-Ag(I)-T base pairs inside the same DNA molecule whilst maintaining its initial double helix conformation. These findings are very important for the development of customized silver-DNA nanostructures based on a Watson-Crick complementarity pattern. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Using variable rate models to identify genes under selection in sequence pairs: their validity and limitations for EST sequences.

PubMed

Church, Sheri A; Livingstone, Kevin; Lai, Zhao; Kozik, Alexander; Knapp, Steven J; Michelmore, Richard W; Rieseberg, Loren H

2007-02-01

Using likelihood-based variable selection models, we determined if positive selection was acting on 523 EST sequence pairs from two lineages of sunflower and lettuce. Variable rate models are generally not used for comparisons of sequence pairs due to the limited information and the inaccuracy of estimates of specific substitution rates. However, previous studies have shown that the likelihood ratio test (LRT) is reliable for detecting positive selection, even with low numbers of sequences. These analyses identified 56 genes that show a signature of selection, of which 75% were not identified by simpler models that average selection across codons. Subsequent mapping studies in sunflower show four of five of the positively selected genes identified by these methods mapped to domestication QTLs. We discuss the validity and limitations of using variable rate models for comparisons of sequence pairs, as well as the limitations of using ESTs for identification of positively selected genes.
NMR studies on the structure and dynamics of lac operator DNA

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, S.C.

Nuclear Magnetic Resonance spectroscopy was used to elucidate the relationships between structure, dynamics and function of the gene regulatory sequence corresponding to the lactose operon operator of Escherichia coli. The length of the DNA fragments examined varied from 13 to 36 base pairs, containing all or part of the operator sequence. These DNA fragments are either derived genetically or synthesized chemically. Resonances of the imino protons were assigned by one dimensional inter-base pair nuclear Overhauser enhancement (NOE) measurements. Imino proton exchange rates were measured by saturation recovery methods. Results from the kinetic measurements show an interesting dynamic heterogeneity with a maximum opening rate centered about a GTG/CAC sequence which correlates with the biological function of the operator DNA. This particular three base pair sequence occurs frequently and often symmetrically in prokaryotic and eukaryotic DNA sites where one anticipates specific protein interaction for gene regulation. The observed sequence dependent imino proton exchange rate may be a reflection of variation of the local structure of regulatory DNA. The results also indicate that the observed imino proton exchange rates are length dependent.
Complete plastid genome sequence of goosegrass (Eleusine indica) and comparison with other Poaceae.

PubMed

Zhang, Hui; Hall, Nathan; McElroy, J Scott; Lowe, Elijah K; Goertzen, Leslie R

2017-02-05

Eleusine indica, also known as goosegrass, is a serious weed in at least 42 countries. In this paper we report the complete plastid genome sequence of goosegrass obtained by de novo assembly of paired-end and mate-paired reads generated by Illumina sequencing of total genomic DNA. The goosegrass plastome is a circular molecule of 135,151 bp in length, consisting of two single-copy regions separated by a pair of inverted repeats (IRs) of 20,919 bases. The large (LSC) and the small (SSC) single-copy regions span 80,667 bases and 12,646 bases, respectively. The plastome of goosegrass has 38.19% GC content and includes 108 unique genes, of which 76 are protein-coding, 28 are transfer RNA, and 4 are ribosomal RNA. The goosegrass plastome sequence was compared to eight other species of Poaceae. Although generally conserved with respect to Poaceae, this genomic resource will be useful for evolutionary studies within this weed species and the genus Eleusine. Copyright © 2016. Published by Elsevier B.V.
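The region sizes quoted above can be cross-checked against the reported plastome length, since a quadripartite plastid genome is made up of the LSC, the SSC, and two copies of the inverted repeat. The following is a small consistency check, not part of the paper; the constant and function names are hypothetical.

```python
# Region lengths in bases, as reported in the abstract.
LSC = 80_667      # large single-copy region
SSC = 12_646      # small single-copy region
IR = 20_919       # one inverted repeat (the plastome carries two copies)
TOTAL = 135_151   # reported plastome length

# Gene counts by class, as reported in the abstract.
GENES = {"protein_coding": 76, "tRNA": 28, "rRNA": 4}


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Length of a circular plastome built from LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


if __name__ == "__main__":
    print(quadripartite_length(LSC, SSC, IR) == TOTAL)  # region sizes add up
    print(sum(GENES.values()))                          # total unique genes
```

Both checks agree with the abstract: the four regions sum to 135,151 and the gene classes sum to 108.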
Genome analysis and identification of gelatinase encoded gene in Enterobacter aerogenes

NASA Astrophysics Data System (ADS)

Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat

2016-11-01

In this study, bioinformatic analysis of the genome sequence of Enterobacter aerogenes was carried out to identify genes encoding gelatinase. The E. aerogenes strain was isolated from hot spring water and produces gelatinase that is species-specific to porcine and fish gelatin. This bacterium therefore offers the possibility of producing enzymes specific to the gelatin of each of these two species. Partial genome sequencing of E. aerogenes yielded a total sequence size of 5.0 megabase pairs (Mbp). The pre-processing pipeline reported 87.6 Mbp of total reads and 68.8 Mbp of high-quality reads, a high-quality percentage of 78.58%. Genome assembly produced 120 contigs, with 67.5% of contigs longer than 1 kilobase pair (kbp), an N50 contig length of 124,856 bp, and a GC content of 55.17%. About 4705 protein-coding genes were identified by protein prediction analysis. The two candidate genes selected had the highest percentage identity to gelatinase enzymes available in the Swiss-Prot and NCBI online databases. They were NODE_9_length_26866_cov_148.013245_12, containing a 1029 base pair (bp) sequence encoding 342 amino acids, and NODE_24_length_155103_cov_177.082458_62, containing a 717 bp sequence encoding 238 amino acids. Two pairs of primers (forward and reverse) were then designed based on the open reading frames (ORFs) of the selected genes. Genome analysis of E. aerogenes thus identified genes encoding gelatinase.
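Two of the assembly figures above follow simple rules that can be sketched in a few lines: the N50 contig length, and the relationship between ORF length in base pairs and protein length in amino acids. The code below is an illustrative sketch, not the authors' pipeline. The function names are hypothetical, the toy contig lengths are invented, and the assumption that the quoted ORF lengths include the stop codon is inferred from the numbers rather than stated in the abstract.

```python
def n50(contig_lengths):
    """Smallest contig length such that contigs at least this long
    together cover at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


def protein_length(orf_bp: int) -> int:
    """Amino acids encoded by an ORF whose length includes the stop codon (assumed)."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # the stop codon encodes no amino acid


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))   # toy contig lengths, not from the paper
    print(protein_length(1029))   # ORF length quoted in the abstract
    print(protein_length(717))    # ORF length quoted in the abstract
```

With the stop codon counted in the ORF, 1029 bp and 717 bp correspond to 342 and 238 amino acids, matching the values in the abstract.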
Base pairing between the 3' exon and an internal guide sequence increases 3' splice site specificity in the Tetrahymena self-splicing rRNA intron.

PubMed Central

Suh, E R; Waring, R B

1990-01-01

It has been proposed that recognition of the 3' splice site in many group I introns involves base pairing between the start of the 3' exon and a region of the intron known as the internal guide sequence (R. W. Davies, R. B. Waring, J. Ray, T. A. Brown, and C. Scazzocchio, Nature [London] 300:719-724, 1982). We have examined this hypothesis, using the self-splicing rRNA intron from Tetrahymena thermophila. Mutations in the 3' exon that weaken this proposed pairing increased use of a downstream cryptic 3' splice site. Compensatory mutations in the guide sequence that restore this pairing resulted in even stronger selection of the normal 3' splice site. These changes in 3' splice site usage were more pronounced in the background of a mutation (414A) which resulted in an adenine instead of a guanine being the last base of the intron. These results show that the proposed pairing (P10) plays an important role in ensuring that cryptic 3' splice sites are selected against. Surprisingly, the 414A mutation alone did not result in activation of the cryptic 3' splice site. PMID:2342465
Loblolly pine SSR markers for shortleaf pine genetics

Treesearch

C. Dana Nelson; Sedley Josserand; Craig S. Echt; Jeff Koppelman

2007-01-01

Simple sequence repeats (SSR) are highly informative DNA-based markers widely used in population genetic and linkage mapping studies. We have been developing PCR primer pairs for amplifying SSR markers for loblolly pine (Pinus taeda L.) using loblolly pine DNA and EST sequence data as starting materials. Fifty primer pairs known to reliably amplify...
Exploring the Limits of DNA Size: Naphtho-homologated DNA Bases and Pairs

PubMed Central

Lee, Alex H. F.; Kool, Eric T.

2008-01-01

A new design for DNA bases and base pairs is described in which the pyrimidine bases are widened by naphtho-homologation. Two naphtho-homologated deoxyribosides, dyyT (1) and dyyC (2) were synthesized and could be incorporated into oligonucleotides as suitably protected phosphoramidite derivatives. The deoxyribosides were found to be fluorescent, with emission maxima at 446 and 433 nm, respectively. Studies with single substitutions of 1 and 2 in the natural DNA context revealed exceptionally strong base stacking propensity for both. Sequences containing multiple substitutions of 1 and 2 paired opposite adenine and guanine were subsequently mixed and studied by several analytical methods. Data from UV mixing experiments, FRET measurements, fluorescence quenching experiments, and hybridizations on beads suggest that complementary “doublewide DNA” (yyDNA) strands may self-assemble into helical complexes with 1:1 stoichiometry. Data from thermal denaturation plots and CD spectra were less conclusive. Control experiments in one sequence context gave evidence that yyDNA helices, if formed, are preferentially antiparallel and are sequence selective. Hypothesized base pairing schemes are analogous to Watson-Crick pairing, but with glycosidic C1′-C1′ distances widened by over 45%, to ca. 15.2 Å. The possible self-assembly of the double-wide DNA helix establishes a new limit for the size of information-encoding, DNA-like molecules, and the fluorescence of yyDNA bases suggests uses as reporters in monomeric and oligomeric forms. PMID:16834396
Sequence-based evidence for major histocompatibility complex-disassortative mating in a colonial seabird.

PubMed

Juola, Frans A; Dearborn, Donald C

2012-01-07

The major histocompatibility complex (MHC) is a polymorphic gene family associated with immune defence, and it can play a role in mate choice. Under the genetic compatibility hypothesis, females choose mates that differ genetically from their own MHC genotypes, avoiding inbreeding and/or enhancing the immunocompetence of their offspring. We tested this hypothesis of disassortative mating based on MHC genotypes in a population of great frigatebirds (Fregata minor) by sequencing the second exon of MHC class II B. Extensive haploid cloning yielded two to four alleles per individual, suggesting the amplification of two genes. MHC similarity between mates was not significantly different between pairs that did (n = 4) or did not (n = 42) exhibit extra-pair paternity. Comparing all 46 mated pairs to a distribution based on randomized re-pairings, we observed the following: (i) no evidence for mate choice based on maximal or intermediate levels of MHC allele sharing, (ii) significantly disassortative mating based on similarity of MHC amino acid sequences, and (iii) no evidence for mate choice based on microsatellite alleles, as measured by either allele sharing or similarity in allele size. This suggests that females choose mates that differ genetically from themselves at MHC loci, but not as an inbreeding-avoidance mechanism.
Charge transport through DNA based electronic barriers

NASA Astrophysics Data System (ADS)

Patil, Sunil R.; Chawda, Vivek; Qi, Jianqing; Anantram, M. P.; Sinha, Niraj

2018-05-01

We report charge transport in electronic 'barriers' constructed by sequence engineering in DNA. Considering the ionization potentials of adenine-thymine (AT) and guanine-cytosine (GC) base pairs, we treat AT as 'barriers'. The effect of DNA conformation (A and B form) on charge transport is also investigated. In particular, the effect of barrier width on hole transport is investigated. Density functional theory (DFT) calculations are performed on energy-minimized DNA structures to obtain the electronic Hamiltonian. The quantum transport calculations are performed using the Landauer-Büttiker framework. Our main findings are contrary to previous studies. We find that a longer A-DNA with more AT base pairs can conduct better than a shorter A-DNA with fewer AT base pairs. We also find that some sequences of A-DNA can conduct better than the corresponding B-DNA with the same sequence. Counterion-mediated charge transport and long-range interactions are speculated to be responsible for the counterintuitive length and AT-content dependence of the conductance of A-DNA.
Contacts between the factor TUF and RPG sequences.

PubMed

Vignais, M L; Huet, J; Buhler, J M; Sentenac, A

1990-08-25

The yeast TUF factor binds specifically to RPG-like sequences involved in multiple functions at enhancers, silencers, and telomeres. We have characterized the interaction of TUF with its optimal binding sequence, rpg-1 (1-ACACCCATACATTT-14), using a gel DNA-binding assay in combination with methylation protection and mutagenesis experiments. As many as 10 base pairs appear to be engaged in factor binding. Analysis of a collection of 30 different RPG mutants demonstrated the importance of 8 base pairs at positions 2, 3, 4, 5, 6, 7, 10, and 12 and the critical role of the central GC pair at position 5. Methylation protection data on four different natural sites confirmed a close contact at positions 4, 5, 6, and 10 and suggested additional contacts at base pairs 8, 12, and 13. The derived consensus sequence was RCAAYCCRYNCAYY. A quantitative band shift analysis was used to determine the equilibrium dissociation constant for the complex of TUF and its optimal binding site rpg-1. The specific dissociation constant (Ks) was found to be 1.3 x 10(-11) M. The comparison of the Ks value with the dissociation constant obtained for nonspecific DNA sites (Kns = 8.7 x 10(-6) M) shows the high binding selectivity of TUF for its specific RPG target.
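The binding selectivity mentioned at the end of the abstract can be made concrete as the ratio of the nonspecific to the specific dissociation constant. The sketch below simply divides the two reported values; the constant and function names are hypothetical, and the ratio itself is not quoted in the abstract.

```python
import math

# Equilibrium dissociation constants in mol/L, as reported in the abstract.
K_SPECIFIC = 1.3e-11     # TUF bound to its optimal site rpg-1
K_NONSPECIFIC = 8.7e-6   # TUF bound to nonspecific DNA sites


def selectivity(k_nonspecific: float, k_specific: float) -> float:
    """Fold preference for the specific site (a ratio of dissociation constants)."""
    return k_nonspecific / k_specific


if __name__ == "__main__":
    ratio = selectivity(K_NONSPECIFIC, K_SPECIFIC)
    print(f"selectivity ~ {ratio:.2e} (about 10^{math.log10(ratio):.1f})")
```

On these numbers the specific site is favored by a factor of several hundred thousand, which is what the abstract describes as high binding selectivity.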
Molecular basis of length polymorphism in the human zeta-globin gene complex.

PubMed Central

Goodbourn, S E; Higgs, D R; Clegg, J B; Weatherall, D J

1983-01-01

The length polymorphism between the human zeta-globin gene and its pseudogene is caused by an allele-specific variation in the copy number of a tandemly repeating 36-base-pair sequence. This sequence is related to a tandemly repeated 14-base-pair sequence in the 5' flanking region of the human insulin gene, which is known to cause length polymorphism, and to a repetitive sequence in intervening sequence (IVS) 1 of the pseudo-zeta-globin gene. Evidence is presented that the latter is also of variable length, probably because of differences in the copy number of the tandem repeat. The homology between the three length polymorphisms may be an indication of the presence of a more widespread group of related sequences in the human genome, which might be useful for generalized linkage studies. PMID:6308667

An algebraic hypothesis about the primeval genetic code architecture.

PubMed

Sánchez, Robersy; Grau, Ricardo

2009-09-01

A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G≡C and A=U and the non-specific base pairing of the hypothetical ancestral base D, used to define the sum and product operations, are sufficient to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. In addition, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of present coding DNA sequences could also have existed in ancient coding DNA sequences. The phylogenetic analyses performed with metrics defined in the N-dimensional vector space (B(3))(N) of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
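The period-3 property referred to above is usually detected as a peak in the Fourier power spectrum of a coding sequence at one-third of the sequence length. The following is a minimal sketch of that general technique using per-base indicator sequences over the ordinary four-letter DNA alphabet. It is not the authors' method, it does not use their extended five-letter alphabet, and the function names are hypothetical.

```python
import cmath


def spectral_power(seq: str, base: str, k: int) -> float:
    """Squared magnitude of the DFT, at frequency index k, of the 0/1
    indicator sequence marking where `base` occurs in `seq`."""
    n = len(seq)
    total = sum(
        cmath.exp(-2j * cmath.pi * k * pos / n)
        for pos, ch in enumerate(seq)
        if ch == base
    )
    return abs(total) ** 2


def period3_power(seq: str) -> float:
    """Total spectral power at frequency N/3, summed over the four DNA bases."""
    n = len(seq)
    if n == 0 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    return sum(spectral_power(seq, base, n // 3) for base in "ACGT")


if __name__ == "__main__":
    periodic = "ATG" * 20     # toy sequence with perfect codon periodicity
    uniform = "A" * 60        # toy sequence with no period-3 structure
    print(period3_power(periodic))
    print(period3_power(uniform))
```

The perfectly periodic toy sequence concentrates power at N/3, while the homopolymer has essentially none; real coding sequences fall between these extremes.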
Retention of nucleic acids in ion-pair reversed-phase high-performance liquid chromatography depends not only on base composition but also on base sequence.

PubMed

Qiao, Jun-Qin; Liang, Chao; Wei, Lan-Chun; Cao, Zhao-Ming; Lian, Hong-Zhen

2016-12-01

Studies of nucleic acid retention in ion-pair reversed-phase high-performance liquid chromatography have mainly focused on size dependence; other factors influencing retention behavior have not been comprehensively clarified to date. In the present work, the retention behaviors of oligonucleotides and double-stranded DNAs were investigated on a silica-based C18 stationary phase by ion-pair reversed-phase high-performance liquid chromatography. The retention of oligonucleotides was found to be influenced by base composition and base sequence as well as size, and oligonucleotides prone to self-dimerization were retained more weakly than those of the same base composition that are not prone to self-dimerization. Homo-oligonucleotides, however, are a special case that is suitable for size-dependent separation. For double-stranded DNAs, retention is likewise influenced by base composition and base sequence as well as size. This may be attributed to the interaction of exposed bases in the major or minor grooves with the hydrophobic alkyl chains of the stationary phase. In addition, no specific influence of guanine and cytosine content on the retention of double-stranded DNAs was confirmed. Notably, the spatial effect resulting from the stereostructure of nucleic acids also influences retention behavior in ion-pair reversed-phase high-performance liquid chromatography. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A rapid, generally applicable method to engineer zinc fingers illustrated by targeting the HIV-1 promoter.

PubMed

Isalan, M; Klug, A; Choo, Y

2001-07-01

DNA-binding domains with predetermined sequence specificity are engineered by selection of zinc finger modules using phage display, allowing the construction of customized transcription factors. Despite remarkable progress in this field, the available protein-engineering methods are deficient in many respects, thus hampering the applicability of the technique. Here we present a rapid and convenient method that can be used to design zinc finger proteins against a variety of DNA-binding sites. This is based on a pair of pre-made zinc finger phage-display libraries, which are used in parallel to select two DNA-binding domains each of which recognizes given 5 base pair sequences, and whose products are recombined to produce a single protein that recognizes a composite (9 base pair) site of predefined sequence. Engineering using this system can be completed in less than two weeks and yields proteins that bind sequence-specifically to DNA with Kd values in the nanomolar range. To illustrate the technique, we have selected seven different proteins to bind various regions of the human immunodeficiency virus 1 (HIV-1) promoter.
Accelerating calculations of RNA secondary structure partition functions using GPUs

PubMed Central

2013-01-01

Background: RNA performs many diverse functions in the cell in addition to its role as a messenger of genetic information. These functions depend on its ability to fold to a unique three-dimensional structure determined by the sequence. The conformation of RNA is in part determined by its secondary structure, or the particular set of contacts between pairs of complementary bases. Prediction of the secondary structure of RNA from its sequence is therefore of great interest, but can be computationally expensive. In this work we accelerate computations of base-pair probabilities using parallel graphics processing units (GPUs). Results: Calculation of the probabilities of base pairs in RNA secondary structures using nearest-neighbor standard free energy change parameters has been implemented using CUDA to run on hardware with multiprocessor GPUs. A modified set of recursions was introduced, which reduces memory usage by about 25%. GPUs are fastest in single precision, and for some hardware, restricted to single precision. This may introduce significant roundoff error. However, deviations in base-pair probabilities calculated using single precision were found to be negligible compared to those resulting from shifting the nearest-neighbor parameters by a random amount of magnitude similar to their experimental uncertainties. For large sequences running on our particular hardware, the GPU implementation reduces execution time by a factor of close to 60 compared with an optimized serial implementation, and by a factor of 116 compared with the original code. Conclusions: Using GPUs can greatly accelerate computation of RNA secondary structure partition functions, allowing calculation of base-pair probabilities for large sequences in a reasonable amount of time, with a negligible compromise in accuracy due to working in single precision. The source code is integrated into the RNAstructure software package and available for download at http://rna.urmc.rochester.edu. PMID:24180434
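For readers unfamiliar with partition-function calculations, the idea can be illustrated with a heavily simplified CPU-only model. In the sketch below every allowed base pair contributes the same Boltzmann weight, which stands in for the nearest-neighbor free energy parameters used in the paper, and the outside contribution for each pair is obtained by a simple but inefficient re-run of the recursion. None of this is the RNAstructure code or its CUDA recursions; the function names, the uniform `weight`, and the minimum hairpin loop of 3 are assumptions made for illustration.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def partition_function(seq, weight):
    """q[i][j] = sum of Boltzmann weights over all non-crossing structures
    on seq[i:j], where each base pair contributes a factor `weight`."""
    n = len(seq)
    q = [[1.0] * (n + 1) for _ in range(n + 1)]  # empty structure has weight 1
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            last = j - 1
            total = q[i][last]                      # case: last base unpaired
            for k in range(i, last - MIN_LOOP):     # case: k pairs with last
                if (seq[k], seq[last]) in PAIRS:
                    total += q[i][k] * weight * q[k + 1][last]
            q[i][j] = total
    return q


def pair_probabilities(seq, weight):
    """Probability of each allowed pair (i, j) under the simplified model."""
    n = len(seq)
    q = partition_function(seq, weight)
    z = q[0][n]
    probs = {}
    for i in range(n):
        for j in range(i + MIN_LOOP + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            # Outside weight: replace seq[i..j] by unpairable placeholders so
            # that enclosing pairs still satisfy the minimum loop size.
            outer = seq[:i] + "N" * MIN_LOOP + seq[j + 1:]
            z_out = partition_function(outer, weight)[0][len(outer)]
            probs[(i, j)] = weight * q[i + 1][j] * z_out / z
    return probs


if __name__ == "__main__":
    for pair, p in sorted(pair_probabilities("GGGAAAUCCC", 2.0).items()):
        print(pair, round(p, 3))
```

The inner recursion is cubic in sequence length, which is the cost that motivates GPU acceleration; the outside step here is deliberately naive and would be replaced by a proper outside recursion in any practical implementation.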
Cloning and characterization of an abalone (Haliotis discus hannai) actin gene

NASA Astrophysics Data System (ADS)

Ma, Hongming; Xu, Wei; Mai, Kangsen; Liufu, Zhiguo; Chen, Hong

2004-10-01

An actin-encoding gene was cloned from the abalone Haliotis discus hannai by using RT-PCR, 3′ RACE and 5′ RACE. The full length of the gene is 1532 base pairs, which contains a long 3′ untranslated region of 307 base pairs and 79 base pairs of 5′ untranslated sequence. The open reading frame encodes 376 amino acid residues. Sequence comparison with those of human and other mollusks showed high conservation among species at the amino acid level. The identities were 96%, 97% and 96%, respectively, compared with Aplysia californica, Biomphalaria glabrata and Homo sapiens β-actin. The comparison also indicates that this actin is more similar to the human cytoplasmic actin (β-actin) than to human muscle actin.
2-Methoxypyridine as a Thymidine Mimic in Watson-Crick Base Pairs of DNA and PNA: Synthesis, Thermal Stability, and NMR Structural Studies.

PubMed

Novosjolova, Irina; Kennedy, Scott D; Rozners, Eriks

2017-11-02

The development of nucleic acid base-pair analogues that use new modes of molecular recognition is important both for fundamental research and practical applications. The goal of this study was to evaluate 2-methoxypyridine as a cationic thymidine mimic in the A-T base pair. The hypothesis was that including protonation in the Watson-Crick base pairing scheme would enhance the thermal stability of the DNA double helix without compromising the sequence selectivity. DNA and peptide nucleic acid (PNA) sequences containing the new 2-methoxypyridine nucleobase (P) were synthesized and studied by using UV thermal melting and NMR spectroscopy. Introduction of P nucleobase caused a loss of thermal stability of ≈10 °C in DNA-DNA duplexes and ≈20 °C in PNA-DNA duplexes over a range of mildly acidic to neutral pH. Despite the decrease in thermal stability, the NMR structural studies showed that P-A formed the expected protonated base pair at pH 4.3. Our study demonstrates the feasibility of cationic unnatural base pairs; however, future optimization of such analogues will be required. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Analyzing ion distributions around DNA: sequence-dependence of potassium ion distributions from microsecond molecular dynamics

PubMed Central

Pasi, Marco; Maddocks, John H.; Lavery, Richard

2015-01-01

Microsecond molecular dynamics simulations of B-DNA oligomers carried out in an aqueous environment with a physiological salt concentration enable us to perform a detailed analysis of how potassium ions interact with the double helix. The oligomers studied contain all 136 distinct tetranucleotides and we are thus able to make a comprehensive analysis of base sequence effects. Using a recently developed curvilinear helicoidal coordinate method we are able to analyze the details of ion populations and densities within the major and minor grooves and in the space surrounding DNA. The results show higher ion populations than have typically been observed in earlier studies and sequence effects that go beyond the nature of individual base pairs or base pair steps. We also show that, in some special cases, ion distributions converge very slowly and, on a microsecond timescale, do not reflect the symmetry of the corresponding base sequence. PMID:25662221
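The figure of 136 distinct tetranucleotides follows from treating a duplex sequence and its reverse complement as the same object: of the 256 possible four-base sequences, 16 are their own reverse complement and the remaining 240 collapse into 120 pairs. A short enumeration confirms the count; the function names below are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def distinct_duplex_kmers(k: int) -> int:
    """Number of k-mers that remain distinct when each k-mer is identified
    with its reverse complement (i.e. distinct double-stranded k-mers)."""
    seen = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        seen.add(min(kmer, reverse_complement(kmer)))  # canonical representative
    return len(seen)


if __name__ == "__main__":
    print(distinct_duplex_kmers(2))  # distinct base pair steps
    print(distinct_duplex_kmers(4))  # distinct tetranucleotides
```

The same function gives 10 for dinucleotide steps and 136 for tetranucleotides.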
Applications of a Sequence of Points in Teaching Linear Algebra, Numerical Methods and Discrete Mathematics

ERIC Educational Resources Information Center

Shi, Yixun

2009-01-01

Based on a sequence of points and a particular linear transformation generalized from this sequence, two recent papers (E. Mauch and Y. Shi, "Using a sequence of number pairs as an example in teaching mathematics". Math. Comput. Educ., 39 (2005), pp. 198-205; Y. Shi, "Case study projects for college mathematics courses based on a particular…
Genetic diversity of the human head lice, Pediculus humanus capitis, among primary school girls in Saudi Arabia, with reference to their prevalence.

PubMed

Al-Shahrani, Sarah A; Alajmi, Reem A; Ayaad, Tahany H; Al-Shahrani, Mohammed A; Shaurub, El-Sayed H

2017-10-01

The present work aimed to investigate the genetic diversity of the head louse Pediculus humanus capitis (P. humanus capitis) among infested primary school girls in Bisha governorate, Saudi Arabia, based on the sequence of the mitochondrial cytochrome b (mt cyt b) gene of 121 P. humanus capitis adults. Additionally, the prevalence of pediculosis capitis was surveyed. The sequencing results were compared with sequences of previously genotyped human head lice. Phylogenetic tree analysis showed 100% identity of louse specimens (n = 26) with the clade A reference (prevalent worldwide) in the GenBank database. Louse individuals (n = 50) showed 99.8% similarity with the same clade A reference, with a single base pair difference. Also, 22 louse individuals revealed 99.8% identity with the clade B reference (prevalent in North and Central America, Europe, and Australia), differing in two base pairs. Moreover, 14 louse individual sequences revealed 99.4% identity, with three base pair differences. It was concluded that moderate pediculosis (~13%) prevailed among the female students of the primary schools and that it was dependent on age and hair texture (straight or curly). The P. humanus capitis specimens genotyped belonged to clades A and B.
Poincaré recurrences of DNA sequences

NASA Astrophysics Data System (ADS)

Frahm, K. M.; Shepelyansky, D. L.

2012-01-01

We analyze the statistical properties of Poincaré recurrences of Homo sapiens, mammalian, and other DNA sequences taken from the Ensembl Genome database with up to 15 billion base pairs. We show that the probability of Poincaré recurrences decays in an algebraic way with the Poincaré exponent β≈4 even if the oscillatory dependence is well pronounced. The correlations between recurrences decay with an exponent ν≈0.6 that leads to an anomalous superdiffusive walk. However, for Homo sapiens sequences, with the largest available statistics, the diffusion coefficient converges to a finite value on distances larger than one million base pairs. We argue that the approach based on Poincaré recurrences determines new proximity features between different species and sheds new light on their evolutionary history.
Tobacco chloroplast tRNALys(UUU) gene contains a 2.5-kilobase-pair intron: An open reading frame and a conserved boundary sequence in the intron

PubMed Central

Sugita, Mamoru; Shinozaki, Kazuo; Sugiura, Masahiro

1985-01-01

The nucleotide sequence of a tRNALys(UUU) gene on tobacco (Nicotiana tabacum) chloroplast DNA has been determined. This gene is located 215 base pairs upstream from the gene for the 32,000-dalton thylakoid membrane protein on the same DNA strand and has a 2526-base-pair intron in the anticodon loop. The intron boundary sequence does not follow the G-U/A-G rule but is similar to those of tobacco chloroplast split genes for tRNAGly(UCC) and ribosomal proteins L2 and S12. The intron contains one major open reading frame of 509 codons. The codon usage in the open reading frame resembles those observed in the genes for tobacco chloroplast proteins so far analyzed. The primary transcript of this tRNA gene is 2.7 kilobases long. PMID:16593561
Tobacco chloroplast tRNA(UUU) gene contains a 2.5-kilobase-pair intron: An open reading frame and a conserved boundary sequence in the intron.

PubMed

Sugita, M; Shinozaki, K; Sugiura, M

1985-06-01

The nucleotide sequence of a tRNA(Lys)(UUU) gene on tobacco (Nicotiana tabacum) chloroplast DNA has been determined. This gene is located 215 base pairs upstream from the gene for the 32,000-dalton thylakoid membrane protein on the same DNA strand and has a 2526-base-pair intron in the anticodon loop. The intron boundary sequence does not follow the G-U/A-G rule but is similar to those of tobacco chloroplast split genes for tRNA(Gly)(UCC) and ribosomal proteins L2 and S12. The intron contains one major open reading frame of 509 codons. The codon usage in the open reading frame resembles those observed in the genes for tobacco chloroplast proteins so far analyzed. The primary transcript of this tRNA gene is 2.7 kilobases long.
Twin hydroxymethyluracil-A base pair steps define the binding site for the DNA-binding protein TF1.

PubMed

Grove, A; Figueiredo, M L; Galeone, A; Mayol, L; Geiduschek, E P

1997-05-16

The DNA-bending protein TF1 is the Bacillus subtilis bacteriophage SPO1-encoded homolog of the bacterial HU proteins and the Escherichia coli integration host factor. We recently proposed that TF1, which binds with high affinity (Kd ≈ 3 nM) to preferred sites within the hydroxymethyluracil (hmU)-containing phage genome, identifies its binding sites based on sequence-dependent DNA flexibility. Here, we show that two hmU-A base pair steps coinciding with two previously proposed sites of DNA distortion are critical for complex formation. The affinity of TF1 is reduced 10-fold when both of these hmU-A base pair steps are replaced with A-hmU, G-C, or C-G steps; only modest changes in affinity result when substitutions are made at other base pairs of the TF1 binding site. Replacement of all hmU residues with thymine decreases the affinity of TF1 greatly; remarkably, the high affinity is restored when the two hmU-A base pair steps corresponding to previously suggested sites of distortion are reintroduced into otherwise T-containing DNA. T-DNA constructs with 3-base bulges spaced apart by 9 base pairs of duplex also generate nM affinity of TF1. We suggest that twin hmU-A base pair steps located at the proposed sites of distortion are key to target site selection by TF1 and that recognition is based largely, if not entirely, on sequence-dependent DNA flexibility.
Free energy landscape and transition pathways from Watson–Crick to Hoogsteen base pairing in free duplex DNA

PubMed Central

Yang, Changwon; Kim, Eunae; Pak, Youngshang

2015-01-01

Hoogsteen (HG) base pairing plays a central role in the DNA binding of proteins and small ligands. Probing the detailed transition mechanism from Watson–Crick (WC) to HG base pair (bp) formation in duplex DNAs is of fundamental importance in terms of revealing intrinsic functions of double helical DNAs beyond their sequence-determined functions. We investigated a free energy landscape of a free B-DNA with an adenosine–thymine (A–T) rich sequence to probe its conformational transition pathways from WC to HG base pairing. The free energy landscape was computed with a state-of-the-art two-dimensional umbrella molecular dynamics simulation at the all-atom level. The present simulation showed that in an isolated duplex DNA, the spontaneous transition from WC to HG bp takes place via multiple pathways. Notably, base flipping into the major and minor grooves was found to play an important role in forming these multiple transition pathways. This finding suggests that naked B-DNA under normal conditions has an inherent ability to form HG bps via spontaneous base opening events. PMID:26250116
Community detection in sequence similarity networks based on attribute clustering

DOE PAGES

Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.

2017-07-24

Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. In this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments
Community detection in sequence similarity networks based on attribute clustering

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.

Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. In this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments
Study of base pair mutations in proline-rich homeodomain (PRH)-DNA complexes using molecular dynamics.

PubMed

Jalili, Seifollah; Karami, Leila; Schofield, Jeremy

2013-06-01

Proline-rich homeodomain (PRH) is a regulatory protein controlling transcription and gene expression processes by binding to the specific sequence of DNA, especially to the sequence 5'-TAATNN-3'. The impact of base pair mutations on the binding between the PRH protein and DNA is investigated using molecular dynamics and free energy simulations to identify DNA sequences that form stable complexes with PRH. Three 20-ns molecular dynamics simulations (PRH-TAATTG, PRH-TAATTA and PRH-TAATGG complexes) in explicit solvent water were performed to investigate three complexes structurally. Structural analysis shows that the native TAATTG sequence forms a complex that is more stable than complexes with base pair mutations. It is also observed that upon mutation, the number and occupancy of the direct and water-mediated hydrogen bonds decrease. Free energy calculations performed with the thermodynamic integration method predict relative binding free energies of 0.64 and 2 kcal/mol for GC to AT and TA to GC mutations, respectively, suggesting that among the three DNA sequences, the PRH-TAATTG complex is more stable than the two mutated complexes. In addition, it is demonstrated that the stability of the PRH-TAATTA complex is greater than that of the PRH-TAATGG complex.
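To give a feel for the size of the relative binding free energies quoted above, the following minimal sketch converts a ΔΔG value into the ratio of dissociation constants it would imply, using the standard relation ratio = exp(ΔΔG / RT). The temperature of 298.15 K and the function name `affinity_fold_change` are assumptions made for illustration; the abstract does not state the simulation temperature, and this is not the authors' analysis.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def affinity_fold_change(ddg_kcal_per_mol, temperature_k=298.15):
    """Ratio of dissociation constants implied by a relative binding free energy.

    temperature_k is an assumed value, not one taken from the abstract.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


# The two relative binding free energies quoted in the abstract (kcal/mol).
for label, ddg in [("GC to AT", 0.64), ("TA to GC", 2.0)]:
    print(f"{label}: Kd ratio ~ {affinity_fold_change(ddg):.1f}")
```

Under the assumed temperature, the two values correspond to roughly 3-fold and 29-fold weaker binding relative to the native sequence, which is consistent with the stability ordering reported in the abstract.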
Determinants of Base-Pair Substitution Patterns Revealed by Whole-Genome Sequencing of DNA Mismatch Repair Defective Escherichia coli.

PubMed

Foster, Patricia L; Niccum, Brittany A; Popodi, Ellen; Townes, Jesse P; Lee, Heewook; MohammedIsmail, Wazim; Tang, Haixu

2018-06-15

Mismatch repair (MMR) is a major contributor to replication fidelity, but its impact varies with sequence context and the nature of the mismatch. Mutation accumulation experiments followed by whole-genome sequencing of MMR-defective E. coli strains yielded ≈30,000 base-pair substitutions, revealing mutational patterns across the entire chromosome. The base-pair substitution spectrum was dominated by A:T > G:C transitions, which occurred predominantly at the center base of 5'N A C3'+5'G T N3' triplets. Surprisingly, growth on minimal medium or at low temperature attenuated these mutations. Mononucleotide runs were also hotspots for base-pair substitutions, and the rate at which these occurred increased with run length. Comparison with ≈2000 base-pair substitutions accumulated in MMR-proficient strains revealed that both kinds of hotspots appeared in the wild-type spectrum and so are likely to be sites of frequent replication errors. In MMR-defective strains transitions were strand biased, occurring twice as often when A and C rather than T and G were on the lagging-strand template. Loss of nucleotide diphosphate kinase increases the cellular concentration of dCTP, which resulted in increased rates of mutations due to misinsertion of C opposite A and T. In an mmr ndk double mutant strain, these mutations were more frequent when the template A and T were on the leading strand, suggesting that lagging-strand synthesis was more error-prone or less well corrected by proofreading than was leading strand synthesis. Copyright © 2018, Genetics.
Statistical and linguistic features of DNA sequences

NASA Technical Reports Server (NTRS)

Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

1995-01-01

We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
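The detrending idea behind DFA can be sketched in a few lines: map the sequence to a walk, integrate it, cut the profile into boxes, subtract a least-squares line in each box, and take the root-mean-square residual F(n). The code below is a minimal illustration under stated assumptions, not the authors' implementation: the purine/pyrimidine step rule (+1 for C or T, -1 for A or G), the use of non-overlapping boxes, and the function names `dna_walk` and `dfa_fluctuation` are all choices made here for the example.

```python
import math


def dna_walk(seq):
    """Map a DNA string to +1/-1 steps (assumed rule: pyrimidine +1, purine -1)."""
    return [1 if base in "CT" else -1 for base in seq.upper()]


def dfa_fluctuation(steps, box):
    """Root-mean-square fluctuation F(box) of the linearly detrended walk profile."""
    if box < 2:
        raise ValueError("box must contain at least two points")
    # Integrate the steps to obtain the walk profile.
    profile, total = [], 0
    for step in steps:
        total += step
        profile.append(total)
    n_boxes = len(profile) // box
    if n_boxes == 0:
        raise ValueError("sequence shorter than one box")
    x_mean = (box - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(box))
    squared_residuals = 0.0
    for b in range(n_boxes):
        ys = profile[b * box:(b + 1) * box]
        y_mean = sum(ys) / box
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
        slope = sxy / sxx
        # Subtract the local least-squares line and accumulate squared residuals.
        squared_residuals += sum(
            (y - (y_mean + slope * (x - x_mean))) ** 2 for x, y in enumerate(ys)
        )
    return math.sqrt(squared_residuals / (n_boxes * box))


if __name__ == "__main__":
    toy = "ACGTTGCAAGCTTAGC" * 8  # hypothetical toy sequence, not real data
    for box in (4, 8, 16, 32):
        print(box, dfa_fluctuation(dna_walk(toy), box))
```

In practice one would plot log F(n) against log n over many box sizes and read the scaling exponent from the slope; a perfectly linear profile, such as a run of a single purine, gives F(n) = 0 because the local trend removes it entirely.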
Recognition of T·G mismatched base pairs in DNA by stacked imidazole-containing polyamides: surface plasmon resonance and circular dichroism studies

PubMed Central

Lacy, Eilyn R.; Cox, Kari K.; Wilson, W. David; Lee, Moses

2002-01-01

An imidazole-containing polyamide trimer, f-ImImIm, where f is a formamido group, was recently found using NMR methods to recognize T·G mismatched base pairs. In order to characterize in detail the T·G recognition affinity and specificity of imidazole-containing polyamides, f-ImIm, f-ImImIm and f-PyImIm were synthesized. The kinetics and thermodynamics for the polyamides binding to Watson–Crick and mismatched (containing one or two T·G, A·G or G·G mismatched base pairs) hairpin oligonucleotides were determined by surface plasmon resonance and circular dichroism (CD) methods. f-ImImIm binds significantly more strongly to the T·G mismatch-containing oligonucleotides than to the sequences with other mismatched or with Watson–Crick base pairs. Compared with the Watson–Crick CCGG sequence, f-ImImIm associates more slowly with DNAs containing T·G mismatches in place of one or two C·G base pairs and, more importantly, the dissociation rate from the T·G oligonucleotides is very slow (small kd). These results clearly demonstrate the binding selectivity and enhanced affinity of side-by-side imidazole/imidazole pairings for T·G mismatches and show that the affinity and specificity increase arise from much lower kd values with the T·G mismatched duplexes. CD titration studies of f-ImImIm complexes with T·G mismatched sequences produce strong induced bands at ∼330 nm with clear isodichroic points, in support of a single minor groove complex. CD DNA bands suggest that the complexes remain in the B conformation. PMID:11937638

A rule of seven in Watson-Crick base-pairing of mismatched sequences.

PubMed

Cisse, Ibrahim I; Kim, Hajin; Ha, Taekjip

2012-05-13

Sequence recognition through base-pairing is essential for DNA repair and gene regulation, but the basic rules governing this process remain elusive. In particular, the kinetics of annealing between two imperfectly matched strands is not well characterized, despite its potential importance in nucleic acid-based biotechnologies and gene silencing. Here we use single-molecule fluorescence to visualize the multiple annealing and melting reactions of two untethered strands inside a porous vesicle, allowing us to precisely quantify the annealing and melting rates. The data as a function of mismatch position suggest that seven contiguous base pairs are needed for rapid annealing of DNA and RNA. This phenomenological rule of seven may underlie the requirement for seven nucleotides of complementarity to seed gene silencing by small noncoding RNA and may help guide performance improvement in DNA- and RNA-based bio- and nanotechnologies, in which off-target effects can be detrimental.
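As a rough illustration of how the phenomenological rule could be applied to a pair of sequences, the sketch below counts the longest run of contiguous Watson-Crick pairs between two DNA strands and compares it with a threshold of seven. The assumptions are loud ones: both strands are written 5' to 3', have equal length, and are paired in full antiparallel register; the function names are hypothetical; and the check says nothing about measured annealing rates.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def longest_paired_run(strand, partner):
    """Longest run of contiguous Watson-Crick pairs between two antiparallel strands.

    Both strands are given 5'->3' and are assumed to pair in full register.
    """
    if len(strand) != len(partner):
        raise ValueError("this sketch assumes strands of equal length")
    best = run = 0
    # Reverse the partner so that position i of strand faces its pairing position.
    for base, opposite in zip(strand, reversed(partner)):
        run = run + 1 if COMPLEMENT[base] == opposite else 0
        best = max(best, run)
    return best


def meets_rule_of_seven(strand, partner, threshold=7):
    """True if at least `threshold` contiguous base pairs can form."""
    return longest_paired_run(strand, partner) >= threshold


if __name__ == "__main__":
    print(meets_rule_of_seven("ACGTACGTAC", "GTACGTACGT"))  # perfect complement
    print(meets_rule_of_seven("ACGTACGTAC", "GTACGAACGT"))  # one central mismatch
```

A single central mismatch in a 10-mer splits the duplex into runs of four and five pairs, so the second pair falls below the threshold even though nine of ten positions are complementary.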
Understanding the differences between genome sequences of Escherichia coli B strains REL606 and BL21(DE3), and comparison of the closely related E. coli B and K-12 genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Studier, F.W.; Daegelen, P.; Lenski, R. E.

2009-12-01

Each difference between the genome sequences of Escherichia coli B strains REL606 and BL21(DE3) can be interpreted in light of known laboratory manipulations plus a gene conversion between ribosomal RNA operons. Two treatments with 1-methyl-3-nitro-1-nitrosoguanidine in the REL606 lineage produced at least 93 single-base-pair mutations (≈90% GC-to-AT transitions) and 3 single-base-pair GC deletions. Two UV treatments in the BL21(DE3) lineage produced only 4 single-base-pair mutations but 16 large deletions. P1 transductions from K-12 into the two B lineages produced 317 single-base-pair differences and 9 insertions or deletions, reflecting differences between B DNA in BL21(DE3) and integrated restriction fragments of K-12 DNA inherited by REL606. Two sites showed selective enrichment of spontaneous mutations. No unselected spontaneous single-base-pair mutations were evident. The genome sequences revealed that a progenitor of REL606 had been misidentified, explaining initially perplexing differences. Limited sequencing of other B strains defined characteristic properties of B and allowed assembly of the inferred genome of the ancestral B of Delbrueck and Luria. Comparison of the B and K-12 genomes shows that more than half of the 3793 proteins of their basic genomes are predicted to be identical, although ≈310 appear to be functional in either B or K-12 but not in both. The ancestral basic genome appears to have had ≈4039 coding sequences occupying ≈4.0 Mbp. Repeated horizontal transfer from diverged Escherichia coli genomes and homologous recombination may explain the observed variable distribution of single-base-pair differences. Fifteen sites are occupied by phage-related elements, but only six by comparable elements at the same site. More than 50 sites are occupied by IS elements in both B and K, 16 in common, and likely founding IS elements are identified. A signature of widespread cryptic phage P4-type mobile elements was identified. Complex deletions (dense clusters of small deletions and substitutions) apparently removed nonessential genes from ≈30 sites in the basic genomes.
Use of low-coverage, large-insert, short-read data for rapid and accurate generation of enhanced-quality draft Pseudomonas genome sequences.

PubMed

O'Brien, Heath E; Gong, Yunchen; Fung, Pauline; Wang, Pauline W; Guttman, David S

2011-01-01

Next-generation genomic technology has both greatly accelerated the pace of genome research and increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards, there is still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies with 100x paired-end coverage and mate-pair sequencing with as low as 2-5x coverage are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.
A gp41-based heteroduplex mobility assay provides rapid and accurate assessment of intrasubtype epidemiological linkage in HIV type 1 heterosexual transmission Pairs.

PubMed

Manigart, Olivier; Boeras, Debrah I; Karita, Etienne; Hawkins, Paulina A; Vwalika, Cheswa; Makombe, Nathan; Mulenga, Joseph; Derdeyn, Cynthia A; Allen, Susan; Hunter, Eric

2012-12-01

A critical step in HIV-1 transmission studies is the rapid and accurate identification of epidemiologically linked transmission pairs. To date, this has been accomplished by comparison of polymerase chain reaction (PCR)-amplified nucleotide sequences from potential transmission pairs, which can be cost-prohibitive for use in resource-limited settings. Here we describe a rapid, cost-effective approach to determine transmission linkage based on the heteroduplex mobility assay (HMA), and validate this approach by comparison to nucleotide sequencing. A total of 102 HIV-1-infected Zambian and Rwandan couples, with known linkage, were analyzed by gp41-HMA. A 400-base pair fragment within the envelope gp41 region of the HIV proviral genome was PCR amplified and HMA was applied to both partners' amplicons separately (autologous) and as a mixture (heterologous). If the diversity between gp41 sequences was low (<5%), a homoduplex was observed upon gel electrophoresis and the transmission was characterized as having occurred between partners (linked). If a new heteroduplex formed within the heterologous migration, the transmission was determined to be unlinked. Initial blind validation of gp41-HMA demonstrated 90% concordance between HMA and sequencing with 100% concordance in the case of linked transmissions. Following validation, 25 newly infected partners in Kigali and 12 in Lusaka were evaluated prospectively using both HMA and nucleotide sequences. Concordant results were obtained in all but one case (97.3%). The gp41-HMA technique is a reliable and feasible tool to detect linked transmissions in the field. All identified unlinked results should be confirmed by sequence analyses.
Early diagnosis of a Mexican variant of Papaya meleira virus (PMeV-Mx) by RT-PCR.

PubMed

Zamudio-Moreno, E; Ramirez-Prado, J H; Moreno-Valenzuela, O A; Lopez-Ochoa, L A

2015-02-06

Papaya meleira disease was identified in Brazil in the 1980s. The disease is caused by a double-stranded RNA virus known as Papaya meleira virus (PMeV), which has also been recently reported in Mexico. However, previously reported PMeV primers failed to diagnose the Mexican form of the disease. A genomic approach was used to identify sequences of the Mexican virus isolate, referred to here as PMeV-Mx, to develop a diagnostic method. A mini cDNA library was generated using total RNA from the latex of fruits; this RNA was also sequenced using the Illumina platform. Sequences corresponding to the previously reported 669-base pair sequence for PMeV from Brazil (PMeV-Br) were identified within the PMeV-Mx genome, exhibiting 79-92% identity with PMeV-Br. In addition, a new sequence of 1154 base pairs encoding a putative RNA-dependent RNA polymerase was identified in PMeV-Mx. Primers designed against this sequence detected both virus isolates; the 2 amplicons of 173 and 491 base pairs from PMeV-Br and PMeV-Mx shared 100 and 98% identity, respectively. PMeV-Mx was found in the latex of fruits, in seedlings, and in the leaves, flowers, petioles, and seeds of mature plants. PMeV-Mx was more abundant in the latex of fruits than in the leaves. The limit of detection of the CB38/CB39 primer pair was 1 fg and 1 pg using total RNA extracted from the latex of fruits and from seedlings, respectively. A sensitive and early diagnosis protocol was developed; this method will enable the certification of seeds and seedlings prior to transplantation to the field.
ComplexContact: a web server for inter-protein contact prediction using deep learning.

PubMed

Zeng, Hong; Wang, Sheng; Zhou, Tianming; Zhao, Feifeng; Li, Xiufeng; Wu, Qing; Xu, Jinbo

2018-05-22

ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complexes and interact at the residue level. When receiving a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSA), then it applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.
Energy hyperspace for stacking interaction in AU/AU dinucleotide step: Dispersion-corrected density functional theory study.

PubMed

Mukherjee, Sanchita; Kailasam, Senthilkumar; Bansal, Manju; Bhattacharyya, Dhananjay

2014-01-01

Double helical structures of DNA and RNA are mostly determined by base pair stacking interactions, which give them the base sequence-directed features, such as small roll values for the purine-pyrimidine steps. Earlier attempts to characterize stacking interactions were mostly restricted to calculations on fiber diffraction geometries or optimized structures using ab initio calculations, lacking the variation in geometry needed to comment on the rather unusual large roll values observed in the AU/AU base pair step in crystal structures of RNA double helices. We have generated a stacking energy hyperspace by modeling geometries with variations along the important degrees of freedom, roll and slide, which were chosen via statistical analysis as maximally sequence dependent. Corresponding energy contours were constructed by several quantum chemical methods including dispersion corrections. This analysis established the most suitable methods for stacked base pair systems despite the limitation imposed by the number of atoms in a base pair step on employing a very high level of theory. All the methods predict a negative roll value and near-zero slide to be most favorable for the purine-pyrimidine steps, in agreement with Calladine's steric clash based rule. Successive base pairs in RNA are always linked by a sugar-phosphate backbone with C3'-endo sugars, and this demands a C1'-C1' distance of about 5.4 Å along the chains. Adding an energy penalty term for deviation of the C1'-C1' distance from the mean value to the recent DFT-D functionals, specifically ωB97X-D, appears to predict a reliable energy contour for the AU/AU step. Such a distance-based penalty also improves energy contours for the other purine-pyrimidine sequences. © 2013 Wiley Periodicals, Inc. Biopolymers 101: 107-120, 2014.
Mismatch and G-Stack Modulated Probe Signals on SNP Microarrays

PubMed Central

Binder, Hans; Fasold, Mario; Glomb, Torsten

2009-01-01

Background: Single nucleotide polymorphism (SNP) arrays are important tools widely used for genotyping and copy number estimation. This technology utilizes the specific affinity of fragmented DNA for binding to surface-attached oligonucleotide DNA probes. We analyze the variability of the probe signals of Affymetrix GeneChip SNP arrays as a function of the probe sequence to identify relevant sequence motifs which potentially cause systematic biases of genotyping and copy number estimates. Methodology/Principal Findings: The probe design of GeneChip SNP arrays enables us to disentangle different sources of intensity modulations such as the number of mismatches per duplex, matched and mismatched base pairings including nearest and next-nearest neighbors and their position along the probe sequence. The effect of probe sequence was estimated in terms of triple-motifs with central matches and mismatches which include all 256 combinations of possible base pairings. The probe/target interactions on the chip can be decomposed into nearest neighbor contributions which correlate well with free energy terms of DNA/DNA-interactions in solution. The effect of mismatches is about twice as large as that of canonical pairings. Runs of guanines (G) and the particular type of mismatched pairings formed in cross-allelic probe/target duplexes constitute sources of systematic biases of the probe signals with consequences for genotyping and copy number estimates. The poly-G effect seems to be related to the crowded arrangement of probes which facilitates complex formation of neighboring probes with at minimum three adjacent G's in their sequence. Conclusions: The applied method of “triple-averaging” represents a model-free approach to estimate the mean intensity contributions of different sequence motifs which can be applied in calibration algorithms to correct signal values for sequence effects. Rules for appropriate sequence corrections are suggested. PMID:19924253
Statistical Evaluation of the Rodin–Ohno Hypothesis: Sense/Antisense Coding of Ancestral Class I and II Aminoacyl-tRNA Synthetases

PubMed Central

Chandrasekaran, Srinivas Niranj; Yardimci, Galip Gürkan; Erdogan, Ozgün; Roach, Jeffrey; Carter, Charles W.

2013-01-01

We tested the idea that ancestral class I and II aminoacyl-tRNA synthetases arose on opposite strands of the same gene. We assembled excerpted 94-residue Urgenes for class I tryptophanyl-tRNA synthetase (TrpRS) and class II Histidyl-tRNA synthetase (HisRS) from a diverse group of species, by identifying and catenating three blocks coding for secondary structures that position the most highly conserved, active-site residues. The codon middle-base pairing frequency was 0.35 ± 0.0002 in all-by-all sense/antisense alignments for 211 TrpRS and 207 HisRS sequences, compared with frequencies between 0.22 ± 0.0009 and 0.27 ± 0.0005 for eight different representations of the null hypothesis. Clustering algorithms demonstrate further that profiles of middle-base pairing in the synthetase antisense alignments are correlated along the sequences from one species-pair to another, whereas this is not the case for similar operations on sets representing the null hypothesis. Most probable reconstructed sequences for ancestral nodes of maximum likelihood trees show that middle-base pairing frequency increases to approximately 0.42 ± 0.002 as bacterial trees approach their roots; ancestral nodes from trees including archaeal sequences show a less pronounced increase. Thus, contemporary and reconstructed sequences all validate important bioinformatic predictions based on descent from opposite strands of the same ancestral gene. They further provide novel evidence for the hypothesis that bacteria lie closer than archaea to the origin of translation. Moreover, the inverse polarity of genetic coding, together with a priori α-helix propensities suggest that in-frame coding on opposite strands leads to similar secondary structures with opposite polarity, as observed in TrpRS and HisRS crystal structures. PMID:23576570
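The quantity at the centre of this test, the codon middle-base pairing frequency, can be illustrated with a short sketch. It assumes two in-frame coding sequences of equal length, one read as the sense strand and the other imagined on the opposite strand in the same register, so that codon i of the first faces codon n-1-i of the second and their middle bases lie opposite each other. The function names and the toy sequences are hypothetical, and this is not the authors' alignment or scoring pipeline.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def middle_bases(coding_seq):
    """Second base of every complete codon in an in-frame coding sequence."""
    return [coding_seq[i + 1] for i in range(0, len(coding_seq) - 2, 3)]


def middle_base_pairing_frequency(sense_seq, antisense_seq):
    """Fraction of codons whose middle bases are Watson-Crick complementary.

    Both sequences are given 5'->3' in frame; the second is treated as lying on
    the opposite strand, so its codon order is reversed before comparison.
    """
    if len(sense_seq) != len(antisense_seq) or len(sense_seq) % 3:
        raise ValueError("this sketch assumes equal lengths that are multiples of 3")
    sense_mid = middle_bases(sense_seq)
    anti_mid = middle_bases(antisense_seq)[::-1]
    paired = sum(1 for a, b in zip(sense_mid, anti_mid) if COMPLEMENT[a] == b)
    return paired / len(sense_mid)


if __name__ == "__main__":
    gene = "ATGGCTAAA"          # hypothetical toy gene
    opposite = "TTTAGCCAT"      # its exact reverse complement
    print(middle_base_pairing_frequency(gene, opposite))
    print(middle_base_pairing_frequency(gene, "AAAAAAAAA"))
```

For unrelated sequences with uniform base composition the expected value is 0.25, which gives a sense of scale for the reported contrast between 0.35 and the null-hypothesis range of 0.22 to 0.27.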
The cyc1-11 mutation in yeast reverts by recombination with a nonallelic gene: composite genes determining the iso-cytochromes c.

PubMed Central

Ernst, J F; Stewart, J W; Sherman, F

1981-01-01

DNA sequence analysis of a cloned fragment directly established that the cyc1-11 mutation of iso-1-cytochrome c in the yeast Saccharomyces cerevisiae is a two-base-pair substitution that changes the CCA proline codon at amino acid position 76 to a UAA nonsense codon. Analysis of 11 revertant proteins and one cloned revertant gene showed that reversion of the cyc1-11 mutation can occur in three ways: a single base-pair substitution, which produces a serine replacement at position 76; recombination with the nonallelic CYC7 gene of iso-2-cytochrome c, which causes replacement of a segment in the cyc1-11 gene by the corresponding segment of the CYC7 gene; and either a two-base-pair substitution or recombination with the CYC7 gene, which causes the formation of the normal iso-1-cytochrome c sequence. These results demonstrate the occurrence of low frequencies of recombination between nonallelic genes having extensive but not complete homology. The formation of composite genes that share sequences from nonallelic genes may be an evolutionary mechanism for producing protein diversities and for maintaining identical sequences at different loci. PMID:6273865
Elliptic net and its cryptographic application

NASA Astrophysics Data System (ADS)

Muslim, Norliana; Said, Mohamad Rushdan Md

2017-11-01

An elliptic net is a generalization of an elliptic divisibility sequence. In cryptography, most pairings based on elliptic curves, such as the Tate pairing, can be improved by applying the elliptic net algorithm. An elliptic net is constructed from an n-dimensional array of rational values satisfying nonlinear recurrence relations that arise from elliptic divisibility sequences. Two main properties hold for an elliptic divisibility sequence $(h_n)$: for all positive integers $m > n$, $h_{m+n}h_{m-n} = h_{m+1}h_{m-1}h_n^2 - h_{n+1}h_{n-1}h_m^2$, and $h_n$ divides $h_m$ whenever $n$ divides $m$. In this research, we discuss elliptic divisibility sequences associated with elliptic nets from a cryptographic perspective and outline possible research directions.
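The two properties of an elliptic divisibility sequence can be checked numerically. The sketch below is not taken from the paper: it generates a sequence from h_0 = 0, h_1 = 1 and chosen h_2, h_3, h_4 using the usual doubling formulas (the special cases n = m - 1 and n = m - 2 of the recurrence), then tests the general recurrence and the divisibility property. Function names are hypothetical, and the integer division assumes h_2 divides h_4 so that every term is an integer.

```python
def elliptic_divisibility_sequence(h2, h3, h4, count):
    """Return [h_0, ..., h_count] with h_0 = 0 and h_1 = 1."""
    h = [0, 1, h2, h3, h4]
    for k in range(5, count + 1):
        n = k // 2
        if k % 2:   # k = 2n + 1
            h.append(h[n + 2] * h[n] ** 3 - h[n - 1] * h[n + 1] ** 3)
        else:       # k = 2n; exact division assumed (h2 divides h4)
            num = h[n] * (h[n + 2] * h[n - 1] ** 2 - h[n - 2] * h[n + 1] ** 2)
            h.append(num // h2)
    return h[: count + 1]

def satisfies_recurrence(h):
    """Check h[m+n]h[m-n] == h[m+1]h[m-1]h[n]^2 - h[n+1]h[n-1]h[m]^2 for m > n >= 1."""
    top = len(h) - 1
    return all(
        h[m + n] * h[m - n]
        == h[m + 1] * h[m - 1] * h[n] ** 2 - h[n + 1] * h[n - 1] * h[m] ** 2
        for m in range(2, top + 1)
        for n in range(1, m)
        if m + n <= top
    )

def has_divisibility_property(h):
    """Check that h[n] divides h[m] whenever n divides m (nonzero h[n] only)."""
    top = len(h) - 1
    return all(
        h[m] % h[n] == 0
        for n in range(1, top + 1) if h[n] != 0
        for m in range(n, top + 1, n)
    )
```

With the initial values h_2 = 1, h_3 = -1, h_4 = 1, the generated terms begin 0, 1, 1, -1, 1, 2, -1, -3, -5, 7, -4.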
Crystal structure and sequence-dependent conformation of the A.G mispaired oligonucleotide d(CGCAAGCTGGCG).

PubMed Central

Webster, G D; Sanderson, M R; Skelly, J V; Neidle, S; Swann, P F; Li, B F; Tickle, I J

1990-01-01

The crystal structure of the dodecanucleotide d(CGCAAGCTGGCG) has been determined to a resolution of 2.5 Å and refined to an R factor of 19.3% for 1710 reflections. The sequence crystallizes as a B-type double helix, with two G(anti).A(syn) base pairs. These are stabilized by three-center hydrogen bonds to pyrimidines that induce perturbations in base-pair geometry. The central AGCT region of the helix has a wide (greater than 6 Å) minor groove. PMID:2395870
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.

PubMed

Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

2004-09-09

Parallel computing is frequently used to speed up computationally expensive tasks in bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be substantially reduced. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
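The independence of the pair-wise alignments in (a) is what makes them easy to distribute. The sketch below illustrates the idea with a worker pool; it is not DIALIGN's code. A plain Needleman-Wunsch score stands in for DIALIGN's segment-based pairwise step, and the function names and scoring parameters are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman-Wunsch); a stand-in for the pairwise step."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def all_pairwise_scores(seqs, workers=4):
    """Score every sequence pair; each pair is an independent task for the pool."""
    pairs = list(combinations(range(len(seqs)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(lambda p: alignment_score(seqs[p[0]], seqs[p[1]]), pairs)
        return dict(zip(pairs, scores))
```

Threads keep the sketch short, but for CPU-bound pure-Python scoring they give little real speedup; a process pool or separate machines would be the closer analogue of running on different processors. Either way the result is identical to the serial computation, which is the property the abstract relies on.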
Nucleotide sequence and transcriptional start site of the Methylobacterium organophilum XX methanol dehydrogenase structural gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Machlin, S.M.; Hanson, R.S.

The nucleotide sequence of a cloned 2.5-kilobase-pair SmaI fragment containing the methanol dehydrogenase (MDH) structural gene from Methylobacterium organophilum XX was determined. A single open reading frame with a coding capacity of 626 amino acids (molecular weight, 66,000) was identified on one strand, and N-terminal sequencing of purified MDH revealed that 27 of these residues constituted a putative signal peptide. Primer extension mapping of in vivo transcripts indicated that the start of mRNA synthesis was 160 to 170 base pairs upstream of the ATG codon. Northern (RNA) blot analysis further demonstrated that the transcript was 2.1 kilobase pairs in length and therefore appeared to encode only MDH.
MRPrimer: a MapReduce-based method for the thorough design of valid and ranked primers for PCR

PubMed Central

Kim, Hyerin; Kang, NaNa; Chon, Kang-Wook; Kim, Seonho; Lee, NaHye; Koo, JaeHyung; Kim, Min-Soo

2015-01-01

Primer design is a fundamental technique that is widely used for polymerase chain reaction (PCR). Although many methods have been proposed for primer design, they require a great deal of manual effort to generate feasible and valid primers, including homology tests on off-target sequences using BLAST-like tools. That approach is inconvenient when many target sequences for quantitative PCR (qPCR) must all satisfy the same stringent, allele-invariant constraints. To address this issue, we propose an entirely new method called MRPrimer that can design all feasible and valid primer pairs existing in a DNA database at once, while simultaneously checking a multitude of filtering constraints and validating primer specificity. Furthermore, MRPrimer suggests the best primer pair for each target sequence, based on a ranking method. Through qPCR analysis using 343 primer pairs and the corresponding sequencing and comparative analyses, we showed that the primer pairs designed by MRPrimer are very stable and effective for qPCR. In addition, MRPrimer is computationally efficient and scalable and therefore useful for quickly constructing an entire collection of feasible and valid primers for frequently updated databases like RefSeq. Furthermore, we suggest that MRPrimer can be utilized conveniently for experiments requiring primer design, especially real-time qPCR. PMID:26109350
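To make "filtering constraints" concrete, the sketch below applies three common single-primer checks: length, GC content, and a melting temperature estimated with the Wallace rule, Tm = 2(A+T) + 4(G+C). The thresholds and function names are illustrative assumptions; the abstract does not list MRPrimer's actual constraints, and this is not its MapReduce implementation.

```python
def gc_content(primer):
    """GC content of a primer as a percentage."""
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)

def wallace_tm(primer):
    """Rule-of-thumb melting temperature in degrees C: 2 per A/T, 4 per G/C."""
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc

def passes_filters(primer, length=(18, 25), gc=(40.0, 60.0), tm=(50.0, 65.0)):
    """True if the primer satisfies all three (assumed) single-primer constraints."""
    return (
        length[0] <= len(primer) <= length[1]
        and gc[0] <= gc_content(primer) <= gc[1]
        and tm[0] <= wallace_tm(primer) <= tm[1]
    )

def filter_candidates(candidates, **limits):
    """Keep only the candidate primers that pass every filter."""
    return [p for p in candidates if passes_filters(p, **limits)]
```

A fuller pipeline would add pair-level checks (for example, a maximum Tm difference between forward and reverse primers) and a specificity test against off-target sequences, which is the step the abstract describes as costly to do by hand.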
Sequence-dependent nucleosome sliding in rotation-coupled and uncoupled modes revealed by molecular simulations

PubMed Central

Tan, Cheng; Takada, Shoji

2017-01-01

While nucleosome positioning on eukaryotic genomes plays important roles in genetic regulation, the molecular mechanisms of nucleosome positioning and sliding along DNA are not well understood. Here we investigated thermally activated spontaneous nucleosome sliding mechanisms by developing and applying a coarse-grained molecular simulation method that incorporates both long-range electrostatic and short-range hydrogen-bond interactions between the histone octamer and DNA. The simulations revealed two distinct sliding modes depending on the nucleosomal DNA sequence. A uniform DNA sequence showed frequent sliding with one base pair step in a rotation-coupled manner, akin to screw-like motions. In contrast, a strong positioning sequence, the so-called 601 sequence, exhibited rare, abrupt transitions of five and ten base pair steps without rotation. Moreover, we evaluated the importance of hydrogen bond interactions for the sliding mode, finding that strong and weak bonds favor the rotation-coupled and -uncoupled sliding movements, respectively. PMID:29194442
PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data.

PubMed

Chiu, Kuo Ping; Wong, Chee-Hong; Chen, Qiongyu; Ariyaratne, Pramila; Ooi, Hong Sain; Wei, Chia-Lin; Sung, Wing-Kin Ken; Ruan, Yijun

2006-08-25

We recently developed the Paired End diTag (PET) strategy for efficient characterization of mammalian transcriptomes and genomes. The paired-end nature of short PET sequences derived from long DNA fragments raised a new set of bioinformatics challenges, including how to extract PETs from raw sequence reads and how to map PETs to reference genome sequences correctly yet efficiently. To accommodate and streamline data analysis of the large volume of PET sequences generated from each PET experiment, an automated PET data-processing pipeline is desirable. We designed an integrated computation program package, PET-Tool, to automatically process PET sequences and map them to the genome sequences. The Tool was implemented as a web-based application composed of four modules: the Extractor module for PET extraction; the Examiner module for analytic evaluation of PET sequence quality; the Mapper module for locating PET sequences in the genome sequences; and the Project Manager module for data organization. The performance of PET-Tool was evaluated through the analyses of 2.7 million PET sequences. It was demonstrated that PET-Tool is accurate and efficient in extracting PET sequences and removing artifacts from large-volume datasets. Using optimized mapping criteria, over 70% of quality PET sequences were mapped specifically to the genome sequences. With a 2.4 GHz LINUX machine, it takes approximately six hours to process one million PETs from extraction to mapping. The speed, accuracy, and comprehensiveness have proved that PET-Tool is an important and useful component in PET experiments, and can be extended to accommodate other related analyses of paired-end sequences. The Tool also provides user-friendly functions for data quality checks and a system for multi-layer data management.
Nucleotide sequence analysis establishes the role of endogenous murine leukemia virus DNA segments in formation of recombinant mink cell focus-forming murine leukemia viruses.

PubMed Central

Khan, A S

1984-01-01

The sequence of 363 nucleotides near the 3' end of the pol gene and 564 nucleotides from the 5' terminus of the env gene in an endogenous murine leukemia viral (MuLV) DNA segment, cloned from AKR/J mouse DNA and designated as A-12, was obtained. For comparison, the nucleotide sequence in an analogous portion of AKR mink cell focus-forming (MCF) 247 MuLV provirus was also determined. Sequence features unique to MCF247 MuLV DNA in the 3' pol and 5' env regions were identified by comparison with nucleotide sequences in analogous regions of NFS-Th-1 xenotropic and AKR ecotropic MuLV proviruses. These included (i) an insertion of 12 base pairs encoding four amino acids located 60 base pairs from the 3' terminus of the pol gene and immediately preceding the env gene, (ii) the deletion of 12 base pairs (encoding four amino acids) and the insertion of 3 base pairs (encoding one amino acid) in the 5' portion of the env gene, and (iii) single base substitutions resulting in 2 MCF247-specific amino acids in the 3' pol and 23 in the 5' env regions. Nucleotide sequence comparison involving the 3' pol and 5' env regions of AKR MCF247, NFS xenotropic, and AKR ecotropic MuLV proviruses with the cloned endogenous MuLV DNA indicated that MCF247 proviral DNA sequences were conserved in the cloned endogenous MuLV proviral segment. In fact, total nucleotide sequence identity existed between the endogenous MuLV DNA and the MCF247 MuLV provirus in the 3' portion of the pol gene. In the 5' env region, only 4 of 564 nucleotides were different, resulting in three amino acid changes between AKR MCF247 MuLV DNA and the endogenous MuLV DNA present in clone A-12. In addition, nucleotide sequence comparison indicated that Moloney- and Friend-MCF MuLVs were also highly related in the 3' pol and 5' env regions to the cloned endogenous MuLV DNA. These results establish the role of endogenous MuLV DNA segments in generation of recombinant MCF viruses. PMID:6328017
End-to-end distance and contour length distribution functions of DNA helices

NASA Astrophysics Data System (ADS)

Zoli, Marco

2018-06-01

I present a computational method to evaluate the end-to-end and the contour length distribution functions of short DNA molecules described by a mesoscopic Hamiltonian. The method generates a large statistical ensemble of possible configurations for each dimer in the sequence, selects the global equilibrium twist conformation for the molecule, and determines the average base pair distances along the molecule backbone. Integrating over the base pair radial and angular fluctuations, I derive the room temperature distribution functions as a function of the sequence length. The obtained values for the most probable end-to-end distance and contour length distance, providing a measure of the global molecule size, are used to examine the DNA flexibility at short length scales. It is found that, also in molecules with fewer than ~60 base pairs, coiled configurations maintain a large statistical weight and, consistently, the persistence lengths may be much smaller than in kilo-base DNA.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Szulik, Marta W.; Pallan, Pradeep S.; Nocek, Boguslaw

5-Hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) form during active demethylation of 5-methylcytosine (5mC) and are implicated in epigenetic regulation of the genome. They are differentially processed by thymine DNA glycosylase (TDG), an enzyme involved in active demethylation of 5mC. Three modified Dickerson–Drew dodecamer (DDD) sequences, amenable to crystallographic and spectroscopic analyses and containing the 5'-CG-3' sequence associated with genomic cytosine methylation, containing 5hmC, 5fC, or 5caC placed site-specifically into the 5'-T8X9G10-3' sequence of the DDD, were compared. The presence of 5caC at the X9 base increased the stability of the DDD, whereas 5hmC or 5fC did not. Both 5hmC and 5fC increased imino proton exchange rates and calculated rate constants for base pair opening at the neighboring base pair A5:T8, whereas 5caC did not. At the oxidized base pair G4:X9, 5fC exhibited an increase in the imino proton exchange rate and the calculated k_op. In all cases, minimal effects to imino proton exchange rates occurred at the neighboring base pair C3:G10. No evidence was observed for imino tautomerization, accompanied by wobble base pairing, for 5hmC, 5fC, or 5caC when positioned at base pair G4:X9; each favored Watson–Crick base pairing. However, both 5fC and 5caC exhibited intranucleobase hydrogen bonding between their formyl or carboxyl oxygens, respectively, and the adjacent cytosine N4 exocyclic amines. The lesion-specific differences observed in the DDD may be implicated in recognition of 5hmC, 5fC, or 5caC in DNA by TDG. Furthermore, they do not correlate with differential excision of 5hmC, 5fC, or 5caC by TDG, which may be mediated by differences in transition states of the enzyme-bound complexes.

Sequence Effect on the Formation of DNA Minidumbbells.

PubMed

Liu, Yuan; Lam, Sik Lok

2017-11-16

The DNA minidumbbell (MDB) is a recently identified non-B structure. The reported MDBs contain two TTTA, CCTG, or CTTG type II loops. At present, the knowledge and understanding of the sequence criteria for MDB formation are still limited. In this study, we performed a systematic high-resolution nuclear magnetic resonance (NMR) and native gel study to investigate the effect of sequence variations in tandem repeats on the formation of MDBs. Our NMR results reveal the importance of hydrogen bonds, base-base stacking, and hydrophobic interactions from each of the participating residues. We conclude that in the MDBs formed by tandem repeats, C-G loop-closing base pairs are more stabilizing than T-A loop-closing base pairs, and thymine residues in both the second and third loop positions are more stabilizing than cytosine residues. The results from this study enrich our knowledge on the sequence criteria for the formation of MDBs, paving a path for better exploring their potential roles in biological systems and DNA nanotechnology.
Characterization of a tandemly repeated DNA sequence family originally derived by retroposition of tRNA(Glu) in the newt.

PubMed

Nagahashi, S; Endoh, H; Suzuki, Y; Okada, N

1991-11-20

A previous report from this laboratory showed that in vitro transcription of total genomic DNA of the newt Cynops pyrrhogaster resulted in a discrete sized 8 S RNA, which represented highly repetitive and transcribable sequences with a glutamic acid tRNA-like structure in the newt genome. We isolated four independent clones from a newt genomic library and determined the complete sequences of three 2000 to 2400 base-pair PstI fragments spanning the 8 S RNA gene. The glutamic acid tRNA-related segment in the 8 S RNA gene contains the CCA sequence expected as the 3' terminus of a tRNA molecule. Further, the 11 nucleotides located 13 nucleotides upstream from one of the two transcription initiation sites of the 8 S RNA were found to be repeated in the region upstream from the termination site, suggesting that the original unit, which is shorter than the 8 S RNA, was retrotransposed via cDNA intermediates from the PolIII transcript. In the upstream region of the 8 S RNA gene, a 360 nucleotide unit containing the glutamic acid tRNA-related segment was found to be duplicated (clones NE1 and NE10) or triplicated (clone NE3). Except for the difference in the number of the 360 nucleotide unit, the three sequences of the 2000 to 2400 base-pair PstI fragment were essentially the same with only a few mutations and minor deletions. Inverse polymerase chain reaction and sequence determination of the products, together with a Southern hybridization experiment, demonstrated that the family consists of a tandemly repeated unit of 3300, 3700 or 4100 base-pairs. Thus during evolution, this family in the newt was created by retroposition via cDNA intermediates, followed by duplication or triplication of the 360 nucleotide unit and multiplication of the 3300 to 4100 base-pair region at the DNA level.
Reversed-phase ion-pair liquid chromatography method for purification of duplex DNA with single base pair resolution

PubMed Central

Wysoczynski, Christina L.; Roemer, Sarah C.; Dostal, Vishantie; Barkley, Robert M.; Churchill, Mair E. A.; Malarkey, Christopher S.

2013-01-01

Obtaining quantities of highly pure duplex DNA is a bottleneck in the biophysical analysis of protein–DNA complexes. In traditional DNA purification methods, the individual cognate DNA strands are purified separately before annealing to form DNA duplexes. This approach works well for palindromic sequences, in which top and bottom strands are identical and duplex formation is typically complete. However, in cases where the DNA is non-palindromic, excess of single-stranded DNA must be removed through additional purification steps to prevent it from interfering in further experiments. Here we describe and apply a novel reversed-phase ion-pair liquid chromatography purification method for double-stranded DNA ranging in lengths from 17 to 51 bp. Both palindromic and non-palindromic DNA can be readily purified. This method has the unique ability to separate blunt double-stranded DNA from pre-attenuated (n-1, n-2, etc) synthesis products, and from DNA duplexes with single base pair overhangs. Additionally, palindromic DNA sequences with only minor differences in the central spacer sequence of the DNA can be separated, and the purified DNA is suitable for co-crystallization of protein–DNA complexes. Thus, double-stranded ion-pair liquid chromatography is a useful approach for duplex DNA purification for many applications. PMID:24013567
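The distinction between palindromic and non-palindromic duplexes can be checked directly: a sequence is palindromic in this sense when it equals its own reverse complement, so the top and bottom strands are identical. The sketch below is a minimal illustration with hypothetical function names; it handles only unambiguous A, C, G, T bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(seq):
    """True if the sequence is self-complementary (top strand == bottom strand)."""
    return seq == reverse_complement(seq)
```

For example, CGCGAATTCGCG is self-complementary, whereas a sequence such as CGCAAGCTGGCG is not, so annealing it would require a distinct second strand.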
Estimating Genomic Distance from DNA Sequence Location in Cell Nuclei by a Random Walk Model

NASA Astrophysics Data System (ADS)

van den Engh, Ger; Sachs, Rainer; Trask, Barbara J.

1992-09-01

The folding of chromatin in interphase cell nuclei was studied by fluorescent in situ hybridization with pairs of unique DNA sequence probes. The sites of DNA sequences separated by 100 to 2000 kilobase pairs (kbp) are distributed in interphase chromatin according to a random walk model. This model provides the basis for calculating the spacing of sequences along the linear DNA molecule from interphase distance measurements. An interphase mapping strategy based on this model was tested with 13 probes from a 4-megabase pair (Mbp) region of chromosome 4 containing the Huntington disease locus. The results confirmed the locations of the probes and showed that the remaining gap in the published maps of this region is negligible in size. Interphase distance measurements should facilitate construction of chromosome maps with an average marker density of one per 100 kbp, approximately ten times greater than that achieved by hybridization to metaphase chromosomes.
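In a random walk, the mean-square end-to-end distance grows in proportion to the number of steps, so the mean-square interphase distance between two probes should scale linearly with their genomic separation. The sketch below uses that relation to calibrate a slope from probe pairs of known separation and then invert it for an unknown pair. The function names, units, and any numbers fed to it are hypothetical; the abstract does not give the calibration values used in the study.

```python
def fit_random_walk_slope(separations_kbp, mean_sq_distances):
    """Least-squares slope c for <r^2> = c * L, with the line forced through the origin."""
    num = sum(L * r2 for L, r2 in zip(separations_kbp, mean_sq_distances))
    den = sum(L * L for L in separations_kbp)
    return num / den

def estimate_separation_kbp(mean_sq_distance, slope):
    """Invert <r^2> = c * L to estimate genomic separation from a measured <r^2>."""
    return mean_sq_distance / slope
```

Note that the quantity averaged is the squared distance over many nuclei, not the distance in a single nucleus, since individual measurements in a random walk scatter widely around the mean.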
Brain cDNA clone for human cholinesterase

DOE Office of Scientific and Technical Information (OSTI.GOV)

McTiernan, C.; Adkins, S.; Chatonnet, A.

1987-10-01

A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum. The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes, coded for cholinesterase.
Sequence-dependent effects in drug-DNA interaction: the crystal structure of Hoechst 33258 bound to the d(CGCAAATTTGCG)2 duplex.

PubMed Central

Spink, N; Brown, D G; Skelly, J V; Neidle, S

1994-01-01

The bis-benzimidazole drug Hoechst 33258 has been co-crystallized with the dodecanucleotide sequence d(CGCAAATTTGCG)2. The structure has been solved by molecular replacement and refined to an R factor of 18.5% for 2125 reflections collected on a Xentronics area detector. The drug is bound in the minor groove, at the five base-pair site 5'-ATTTG and is in a unique orientation. This is displaced by one base pair in the 5' direction compared to previously-determined structures of this drug with the sequence d(CGCGAATTCGCG)2. Reasons for this difference in behaviour are discussed in terms of several sequence-dependent structural features of the DNA, with particular reference to differences in propeller twist and minor-groove width. PMID:7515488
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fleischmann, R.D.; Adams, M.D.; White, O.

1995-07-28

An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism. 46 refs., 4 figs., 4 tabs.
Compressive Sensing for Radar and Radar Sensor Networks

DTIC Science & Technology

2013-12-02

Zero Correlation Zone Sequence Pair Sets for MIMO Radar Inspired by recent advances in MIMO radar, we apply orthogonal phase coded waveforms to MIMO ...radar system in order to gain better range resolution and target direction finding performance [2]. We provide and investigate a generalized MIMO radar...ZCZ) sequence-Pair Set (ZCZPS). We also study the MIMO radar ambiguity function of the system using phase coded waveforms, based on which we analyze
Synthetic Biology Parts for the Storage of Increased Genetic Information in Cells.

PubMed

Morris, Sydney E; Feldman, Aaron W; Romesberg, Floyd E

2017-10-20

To bestow cells with novel forms and functions, the goal of synthetic biology, we have developed the unnatural nucleoside triphosphates dNaMTP and dTPT3TP, which form an unnatural base pair (UBP) and expand the genetic alphabet. While the UBP may be retained in the DNA of a living cell, its retention is sequence-dependent. We now report a steady-state kinetic characterization of the rate with which the Klenow fragment of E. coli DNA polymerase I synthesizes the UBP and its mispairs in a variety of sequence contexts. Correct UBP synthesis is as efficient as for a natural base pair, except in one sequence context, and in vitro performance is correlated with in vivo performance. The data elucidate the determinants of efficient UBP synthesis, show that the dNaM-dTPT3 UBP is the first generally recognized natural-like base pair, and importantly, demonstrate that dNaMTP and dTPT3TP are well optimized and standardized parts for the expansion of the genetic alphabet.
NMR studies of DNA oligomers and their interactions with minor groove binding ligands

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fagan, Patricia A.

1996-05-01

The cationic peptide ligands distamycin and netropsin bind noncovalently to the minor groove of DNA. The binding site, orientation, stoichiometry, and qualitative affinity of distamycin binding to several short DNA oligomers were investigated by NMR spectroscopy. The oligomers studied contain A,T-rich or I,C-rich binding sites, where I = 2-desaminodeoxyguanosine. I•C base pairs are functional analogs of A•T base pairs in the minor groove. The different behaviors exhibited by distamycin and netropsin binding to various DNA sequences suggested that these ligands are sensitive probes of DNA structure. For sites of five or more base pairs, distamycin can form 1:1 or 2:1 ligand:DNA complexes. Cooperativity in distamycin binding is low in sites such as AAAAA, which has narrow minor grooves, and is higher in sites with wider minor grooves such as ATATAT. The distamycin binding and base pair opening lifetimes of I,C-containing DNA oligomers suggest that the I,C minor groove is structurally different from the A,T minor groove. Molecules which direct chemistry to a specific DNA sequence could be used as antiviral compounds, diagnostic probes, or molecular biology tools. The author studied two ligands in which reactive groups were tethered to a distamycin to increase the sequence specificity of the reactive agent.
Free energy landscape and transition pathways from Watson-Crick to Hoogsteen base pairing in free duplex DNA.

PubMed

Yang, Changwon; Kim, Eunae; Pak, Youngshang

2015-09-18

Hoogsteen (HG) base pairing plays a central role in the DNA binding of proteins and small ligands. Probing the detailed transition mechanism from Watson-Crick (WC) to HG base pair (bp) formation in duplex DNAs is of fundamental importance in terms of revealing intrinsic functions of double helical DNAs beyond their sequence-determined functions. We investigated the free energy landscape of a free B-DNA with an adenosine-thymine (A-T) rich sequence to probe its conformational transition pathways from WC to HG base pairing. The free energy landscape was computed with a state-of-the-art two-dimensional umbrella molecular dynamics simulation at the all-atom level. The present simulation showed that in an isolated duplex DNA, the spontaneous transition from WC to HG bp takes place via multiple pathways. Notably, base flipping into the major and minor grooves was found to play an important role in forming these multiple transition pathways. This finding suggests that naked B-DNA under normal conditions has an inherent ability to form HG bps via spontaneous base opening events. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Platinum(II)-Oligonucleotide Coordination Based Aptasensor for Simple and Selective Detection of Platinum Compounds.

PubMed

Cai, Sheng; Tian, Xueke; Sun, Lianli; Hu, Haihong; Zheng, Shirui; Jiang, Huidi; Yu, Lushan; Zeng, Su

2015-10-20

Wide use of platinum-based chemotherapeutic regimens for the treatment of carcinoma calls for simple and selective detection of platinum compounds in biological samples. On the basis of platinum(II)-base pair coordination, a novel type of aptameric platform for platinum detection has been introduced. This chemiluminescence (CL) aptasensor consists of a designed streptavidin (SA) aptamer sequence in which several base pairs were replaced by G-G mismatches. Only in the presence of platinum does coordination occur between the platinum and G-G base pairs, as opposed to the hydrogen-bonded G-C base pairs, which leads to SA aptamer sequence activation and results in binding to SA-coated magnetic beads. These Pt-DNA coordination events were monitored by a simple and direct luminol-peroxide CL reaction through horseradish peroxidase (HRP) catalysis with a strong chemiluminescence emission. The validated range of quantification was 0.12-240 μM, with a limit of detection of 60 nM and selectivity over other metal ions. This assay was also successfully used in urine sample determination. It is a promising candidate for the detection of platinum in biomedical and environmental samples.
Analysis of HIV-1 intersubtype recombination breakpoints suggests region with high pairing probability may be a more fundamental factor than sequence similarity affecting HIV-1 recombination.

PubMed

Jia, Lei; Li, Lin; Gui, Tao; Liu, Siyang; Li, Hanping; Han, Jingwan; Guo, Wei; Liu, Yongjian; Li, Jingyun

2016-09-21

With increasing data on HIV-1, a more relevant molecular model describing mechanism details of HIV-1 genetic recombination usually requires upgrades. Currently an incomplete structural understanding of the copy choice mechanism along with several other issues in the field that lack elucidation led us to perform an analysis of the correlation between breakpoint distributions and (1) the probability of base pairing, and (2) intersubtype genetic similarity to further explore structural mechanisms. Near full length sequences of URFs from Asia, Europe, and Africa (one sequence/patient), and representative sequences of worldwide CRFs were retrieved from the Los Alamos HIV database. Their recombination patterns were analyzed by jpHMM in detail. Then the relationships between breakpoint distributions and (1) the probability of base pairing, and (2) intersubtype genetic similarities were investigated. Pearson correlation test showed that all URF groups and the CRF group exhibit the same breakpoint distribution pattern. Additionally, the Wilcoxon two-sample test indicated a significant and inexplicable limitation of recombination in regions with high pairing probability. These regions have been found to be strongly conserved across distinct biological states (i.e., strong intersubtype similarity), and genetic similarity has been determined to be a very important factor promoting recombination. Thus, the results revealed an unexpected disagreement between intersubtype similarity and breakpoint distribution, which were further confirmed by genetic similarity analysis. Our analysis reveals a critical conflict between results from natural HIV-1 isolates and those from HIV-1-based assay vectors in which genetic similarity has been shown to be a very critical factor promoting recombination. These results indicate the region with high-pairing probabilities may be a more fundamental factor affecting HIV-1 recombination than sequence similarity in natural HIV-1 infections. Our findings will be relevant in furthering the understanding of HIV-1 recombination mechanisms.
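The comparison of breakpoint distributions between groups can be sketched as two steps: bin breakpoint positions into fixed-size genome windows, then compute the Pearson correlation between the per-window counts of two groups. The code below is an assumed illustration only; the window size, function names, and inputs are hypothetical, and the study's own binning and its Wilcoxon test are not reproduced.

```python
from math import sqrt

def window_counts(positions, genome_length, window):
    """Count breakpoints per fixed-size window along the genome (0-based positions)."""
    n_bins = -(-genome_length // window)      # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // window] += 1
    return counts

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

A coefficient near 1 for two groups would correspond to the abstract's statement that they share the same breakpoint distribution pattern; the function is undefined when either group has identical counts in every window.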
DNA sequence alignment by microhomology sampling during homologous recombination

PubMed Central

Qi, Zhi; Redding, Sy; Lee, Ja Yil; Gibb, Bryan; Kwon, YoungHo; Niu, Hengyao; Gaines, William A.; Sung, Patrick

2015-01-01

Homologous recombination (HR) mediates the exchange of genetic information between sister or homologous chromatids. During HR, members of the RecA/Rad51 family of recombinases must somehow search through vast quantities of DNA sequence to align and pair ssDNA with a homologous dsDNA template. Here we use single-molecule imaging to visualize Rad51 as it aligns and pairs homologous DNA sequences in real-time. We show that Rad51 uses a length-based recognition mechanism while interrogating dsDNA, enabling robust kinetic selection of 8-nucleotide (nt) tracts of microhomology, which kinetically confines the search to sites with a high probability of being a homologous target. Successful pairing with a 9th nucleotide coincides with an additional reduction in binding free energy and subsequent strand exchange occurs in precise 3-nt steps, reflecting the base triplet organization of the presynaptic complex. These findings provide crucial new insights into the physical and evolutionary underpinnings of DNA recombination. PMID:25684365
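A sequence-level analogue of the 8-nt microhomology criterion is to list every site in a template that shares an exact 8-nt tract with the searching strand. The sketch below does this with a k-mer index. It is only a string-matching illustration with hypothetical names; it compares the ssDNA with the template strand of identical sequence and does not model Rad51 kinetics or binding energies.

```python
def microhomology_sites(ssdna, template, tract=8):
    """Return (ssdna_start, template_start) pairs sharing an exact `tract`-nt match."""
    index = {}
    for i in range(len(template) - tract + 1):
        index.setdefault(template[i:i + tract], []).append(i)
    hits = []
    for j in range(len(ssdna) - tract + 1):
        for i in index.get(ssdna[j:j + tract], []):
            hits.append((j, i))
    return hits
```

Lowering `tract` increases the number of candidate sites sharply, which gives some intuition for why a minimum tract length confines the search to a small set of likely targets.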
Evaluation of six primer pairs targeting the nuclear rRNA operon for characterization of arbuscular mycorrhizal fungal (AMF) communities using 454 pyrosequencing.

PubMed

Van Geel, Maarten; Busschaert, Pieter; Honnay, Olivier; Lievens, Bart

2014-11-01

In the last few years, 454 pyrosequencing-based analysis of arbuscular mycorrhizal fungal (AMF; Glomeromycota) communities has tremendously increased our knowledge of the distribution and diversity of AMF. Nonetheless, comparing results between different studies is difficult, as different target genes (or regions thereof) and primer combinations, with potentially dissimilar specificities and efficacies, are being utilized. In this study we evaluated six primer pairs that have previously been used in AMF studies (NS31-AM1, AMV4.5NF-AMDGR, AML1-AML2, NS31-AML2, FLR3-LSUmBr and Glo454-NDL22) for their use in 454 pyrosequencing based on both an in silico approach and 454 pyrosequencing of AMF communities from apple tree roots. Primers were evaluated in terms of (i) in silico coverage of Glomeromycota fungi, (ii) the number of high-quality sequences obtained, (iii) selectivity for AMF species, (iv) reproducibility and (v) ability to accurately describe AMF communities. We show that primer pairs AMV4.5NF-AMDGR, AML1-AML2 and NS31-AML2 outperformed the other tested primer pairs in terms of number of Glomeromycota reads (AMF specificity and coverage). Additionally, these primer pairs were found to have no or only few mismatches to AMF sequences and were able to consistently describe AMF communities from apple roots. However, whereas most high-quality AMF sequences were obtained for AMV4.5NF-AMDGR, our results also suggest that this primer pair favored amplification of Glomeraceae sequences at the expense of Ambisporaceae, Claroideoglomeraceae and Paraglomeraceae sequences. Furthermore, we demonstrate the complementary specificity of AMV4.5NF-AMDGR with AML1-AML2, and of AMV4.5NF-AMDGR with NS31-AML2, making these primer combinations highly suitable for tandem use in covering the diversity of AMF communities. Copyright © 2014 Elsevier B.V. All rights reserved.
Detailed analysis of stem I and its 5' and 3' neighbor regions in the trans-acting HDV ribozyme.

PubMed Central

Nishikawa, F; Roy, M; Fauzi, H; Nishikawa, S

1999-01-01

To determine the stem I structure of the human hepatitis delta virus (HDV) ribozyme, which is related to the substrate sequence in the trans-acting system, we kinetically studied stem I length and sequences. Stem I extension from 7 to 8 or 9 bp caused a loss of activity and a low amount of active complex with 9 bp in the trans-acting system. In a previous report, we presented cleavage in a 6 bp stem I. The observed reaction rates indicate that the original 7 bp stem I is in the most favorable location for catalytic reaction among the possible 6-8 bp stems. To test base specificity, we replaced the original GC-rich sequence in stem I with AU-rich sequences containing six AU or UA base pairs with the natural +1 G·U wobble base pair at the cleavage site. The cis-acting AU-rich molecules demonstrated similar catalytic activity to that of the wild-type. In trans-acting molecules, due to stem I instability, reaction efficiency strongly depended on the concentration of the ribozyme-substrate complex and reaction temperature. Multiple turnover was observed at 37 °C, strongly suggesting that stem I has no base specificity and more efficient activity can be expected under multiple turnover conditions by substituting several UA or AU base pairs into stem I. We also studied the substrate damaging sequences linked to both ends of stem I for its development in therapeutic applications and confirmed the functions of the unique structure. PMID:9862958
Quantitation of base substitutions in eukaryotic 5S rRNA: selection for the maintenance of RNA secondary structure.

PubMed

Curtiss, W C; Vournakis, J N

1984-01-01

Eukaryotic 5S rRNA sequences from 34 diverse species were compared by the following method: (1) The sequences were aligned; (2) the positions of substitutions were located by comparison of all possible pairs of sequences; (3) the substitution sites were mapped to an assumed general base pairing model; and (4) the R-Y model of base stacking was used to study stacking pattern relationships in the structure. An analysis of the sequence and structure variability in each region of the molecule is presented. It was found that the degree of base substitution varies over a wide range, from absolute conservation to occurrence of over 90% of the possible observable substitutions. The substitutions are located primarily in stem regions of the 5S rRNA secondary structure. More than 88% of the substitutions in helical regions maintain base pairing. The disruptive substitutions are primarily located at the edges of helical regions, resulting in shortening of the helical regions and lengthening of the adjacent nonpaired regions. Base stacking patterns determined by the R-Y model are mapped onto the general secondary structure. Intrastrand and interstrand stacking could stabilize alternative coaxial structures and limit the conformational flexibility of nonpaired regions. Two short contiguous regions are 100% conserved in all species. This may reflect evolutionary constraints imposed at the DNA level by the requirement for binding of a 5S gene transcription initiation factor during gene expression.
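The question of whether a substitution in a helical region maintains base pairing can be sketched as below. This is only an illustration under stated assumptions: the helix is given as a list of 0-based paired positions, G·U wobble pairs are counted as paired, and the sequences are invented rather than real 5S rRNA.

```python
# Pairs treated as "paired": Watson-Crick plus G-U wobble (an assumption).
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_maintained(ref, var, pairs):
    """Count helix pairs touched by a substitution and how many still pair.

    ref, var: aligned RNA sequences of equal length.
    pairs: list of (i, j) paired positions in the assumed structure model.
    Returns (changed_pairs, changed_pairs_still_paired).
    """
    changed = kept = 0
    for i, j in pairs:
        if ref[i] != var[i] or ref[j] != var[j]:
            changed += 1
            if (var[i], var[j]) in ALLOWED_PAIRS:
                kept += 1
    return changed, kept


if __name__ == "__main__":
    helix = [(0, 9), (1, 8), (2, 7), (3, 6)]
    # Compensatory change C-G -> U-A at pair (1, 8).
    print(pairing_maintained("GCAUAAAUGC", "GUAUAAAUAC", helix))
```

Summing the two counts over all sequence pairs would give a percentage comparable in spirit to the figure quoted in the abstract, though the original counting scheme may differ.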
Database of non-canonical base pairs found in known RNA structures

NASA Technical Reports Server (NTRS)

Nagaswamy, U.; Voss, N.; Zhang, Z.; Fox, G. E.

2000-01-01

Atomic resolution RNA structures are being published at an increasing rate. It is common to find a modest number of non-canonical base pairs in these structures in addition to the usual Watson-Crick pairs. This database summarizes the occurrence of these rare base pairs in accordance with standard nomenclature. The database, http://prion.bchs.uh.edu/, contains information such as sequence context, sugar pucker conformation, anti/syn base conformations, chemical shift, pKa values, melting temperature and free energy. Of the 29 anticipated pairs with two or more hydrogen bonds, 20 have been encountered to date. In addition, four unexpected pairs with two hydrogen bonds have been reported bringing the total to 24. Single hydrogen bond versions of five of the expected geometries have been encountered among the single hydrogen bond interactions. In addition, 18 different types of base triplets have been encountered, each of which involves three to six hydrogen bonds. The vast majority of the rare base pairs are antiparallel with the bases in the anti configuration relative to the ribose. The most common are the GU wobble, the Sheared GA pair, the Reverse Hoogsteen pair and the GA imino pair.
Small nuclear RNA U2 is base-paired to heterogeneous nuclear RNA.

PubMed

Calvet, J P; Meyer, L M; Pederson, T

1982-07-30

Eukaryotic cells contain a set of low molecular weight nuclear RNA's. One of the more abundant of these is termed U2 RNA. The possibility that U2 RNA is hydrogen-bonded to complementary sequences in other nuclear RNA's was investigated. Cultured human (HeLa) cells were treated with a psoralen derivative that cross-links RNA chains that are base-paired with one another. High molecular weight heterogeneous nuclear RNA was isolated under denaturing conditions, and the psoralen cross-links were reversed. Electrophoresis of the released RNA and hybridization with a human cloned U2 DNA probe revealed that U2 is hydrogen-bonded to complementary sequences in heterogeneous nuclear RNA in vivo. In contrast, U2 RNA is not base-paired with nucleolar RNA, which contains the precursors of ribosomal RNA. The results suggest that U2 RNA participates in messenger RNA processing in the nucleus.
Negatively supercoiled simian virus 40 DNA contains Z-DNA segments within transcriptional enhancer sequences

NASA Technical Reports Server (NTRS)

Nordheim, A.; Rich, A.

1983-01-01

Three 8-base pair (bp) segments of alternating purine-pyrimidine from the simian virus 40 enhancer region form Z-DNA on negative supercoiling; minichromosome DNase I-hypersensitive sites determined by others bracket these three segments. A survey of transcriptional enhancer sequences reveals a pattern of potential Z-DNA-forming regions which occur in pairs 50-80 bp apart. This may influence local chromatin structure and may be related to transcriptional activation.
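A survey for potential Z-DNA-forming regions of this kind starts by locating stretches of strictly alternating purine and pyrimidine. The sketch below does only that; it assumes an input restricted to A, C, G and T, and it says nothing about whether a given stretch actually adopts the Z conformation under supercoiling.

```python
PURINES = set("AG")


def alternating_runs(seq, min_len=8):
    """Return (start, end) half-open intervals of alternating purine/pyrimidine
    runs that are at least min_len bases long."""
    seq = seq.upper()
    runs = []
    start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the sequence end or where two neighbours share a class.
        if i == len(seq) or (seq[i] in PURINES) == (seq[i - 1] in PURINES):
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


if __name__ == "__main__":
    example = "TTTGCATGCATTT"  # invented sequence, not the SV40 enhancer
    for s, e in alternating_runs(example):
        print(s, e, example[s:e])
```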

MRPrimer: a MapReduce-based method for the thorough design of valid and ranked primers for PCR.

PubMed

Kim, Hyerin; Kang, NaNa; Chon, Kang-Wook; Kim, Seonho; Lee, NaHye; Koo, JaeHyung; Kim, Min-Soo

2015-11-16

Primer design is a fundamental technique that is widely used for polymerase chain reaction (PCR). Although many methods have been proposed for primer design, they require a great deal of manual effort to generate feasible and valid primers, including homology tests on off-target sequences using BLAST-like tools. That approach is inconvenient for many target sequences of quantitative PCR (qPCR) due to considering the same stringent and allele-invariant constraints. To address this issue, we propose an entirely new method called MRPrimer that can design all feasible and valid primer pairs existing in a DNA database at once, while simultaneously checking a multitude of filtering constraints and validating primer specificity. Furthermore, MRPrimer suggests the best primer pair for each target sequence, based on a ranking method. Through qPCR analysis using 343 primer pairs and the corresponding sequencing and comparative analyses, we showed that the primer pairs designed by MRPrimer are very stable and effective for qPCR. In addition, MRPrimer is computationally efficient and scalable and therefore useful for quickly constructing an entire collection of feasible and valid primers for frequently updated databases like RefSeq. Furthermore, we suggest that MRPrimer can be utilized conveniently for experiments requiring primer design, especially real-time qPCR. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
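The idea of checking a candidate primer against several filtering constraints can be illustrated with a small single-primer filter. This is not MRPrimer's implementation or its constraint set: the thresholds are placeholder assumptions, and the melting temperature uses the rough Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a coarse estimate for short oligonucleotides.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C by the Wallace rule."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def passes_filters(primer, length=(18, 25), gc=(0.4, 0.6), tm=(55, 65)):
    """Apply illustrative length, GC-content and Tm windows (all assumed)."""
    primer = primer.upper()
    if not length[0] <= len(primer) <= length[1]:
        return False
    if not gc[0] <= gc_fraction(primer) <= gc[1]:
        return False
    return tm[0] <= wallace_tm(primer) <= tm[1]


if __name__ == "__main__":
    for p in ("ACGTACGTACGTACGTACGT", "AAAAAAAAAAAAAAAAAAAA"):
        print(p, passes_filters(p))
```

A full design pipeline would add pair-level checks (Tm difference, dimers) and specificity tests against off-target sequences, which this sketch leaves out.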
Assessment of replicate bias in 454 pyrosequencing and a multi-purpose read-filtering tool.

PubMed

Jérôme, Mariette; Noirot, Céline; Klopp, Christophe

2011-05-26

The Roche 454 pyrosequencing platform is often considered the most versatile of the Next Generation Sequencing technology platforms, permitting the sequencing of large genomes, the analysis of variations or the study of transcriptomes. A recently reported bias leads to the production of multiple reads for a unique DNA fragment in a random manner within a run. This bias has a direct impact on the quality of the measurement of the representation of the fragments using the reads. Other cleaning steps are usually performed on the reads before assembly or alignment. PyroCleaner is a software module intended to clean 454 pyrosequencing reads in order to ease the assembly process. The program is free software distributed under the terms of the GNU General Public License as published by the Free Software Foundation. It implements several filters using criteria such as read duplication, length, complexity, base-pair quality and number of undetermined bases. It can also clean flowgram files (.sff) of paired-end sequences, generating a validated paired-end file on the one hand and a single-read file on the other. Read cleaning has always been an important step in sequence analysis. The pyrocleaner Python module is a Swiss Army knife dedicated to cleaning 454 reads. It includes commonly used filters as well as specialised ones such as duplicated read removal and paired-end read verification.
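Three of the filter criteria named above (length, undetermined bases, duplication) can be sketched in a few lines. This is a simplified stand-in, not the PyroCleaner API: the function name and thresholds are hypothetical, and duplicates are detected by exact sequence identity, which is cruder than a real duplicate filter.

```python
def clean_reads(reads, min_len=50, max_n_fraction=0.02):
    """Filter (name, sequence) reads by length, N content and exact duplication.

    Keeps the first occurrence of each distinct sequence that passes the
    length and undetermined-base filters.
    """
    seen = set()
    kept = []
    for name, seq in reads:
        seq = seq.upper()
        if not seq or len(seq) < min_len:
            continue  # too short
        if seq.count("N") / len(seq) > max_n_fraction:
            continue  # too many undetermined bases
        if seq in seen:
            continue  # exact duplicate of an earlier read
        seen.add(seq)
        kept.append((name, seq))
    return kept


if __name__ == "__main__":
    reads = [("r1", "ACGTACGT"), ("r2", "ACG"),
             ("r3", "ACGTACGT"), ("r4", "ACNNACGT")]
    print(clean_reads(reads, min_len=5))
```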
The CGTCA sequence motif is essential for biological activity of the vasoactive intestinal peptide gene cAMP-regulated enhancer.

PubMed Central

Fink, J S; Verhave, M; Kasper, S; Tsukada, T; Mandel, G; Goodman, R H

1988-01-01

cAMP-regulated transcription of the human vasoactive intestinal peptide gene is dependent upon a 17-base-pair DNA element located 70 base pairs upstream from the transcriptional initiation site. This element is similar to sequences in other genes known to be regulated by cAMP and to sequences in several viral enhancers. We have demonstrated that the vasoactive intestinal peptide regulatory element is an enhancer that depends upon the integrity of two CGTCA sequence motifs for biological activity. Mutations in either of the CGTCA motifs diminish the ability of the element to respond to cAMP. Enhancers containing the CGTCA motif from the somatostatin and adenovirus genes compete for binding of nuclear proteins from C6 glioma and PC12 cells to the vasoactive intestinal peptide enhancer, suggesting that CGTCA-containing enhancers interact with similar transacting factors. PMID:2842787
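Locating copies of a short motif such as CGTCA on either strand is a simple scan. The sketch below reports each hit with its strand; the test sequence is invented for illustration and is not the 17-base-pair element described in the abstract.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif="CGTCA"):
    """Return (position, strand) for each occurrence of motif on either strand.

    Position is the 0-based start on the given (top) strand; "-" means the
    motif reads 5'->3' on the complementary strand.
    """
    seq = seq.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    print(find_motif("AATGACGTCATT"))
```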
Evaluating the accuracy of SHAPE-directed RNA secondary structure predictions

PubMed Central

Sükösd, Zsuzsanna; Swenson, M. Shel; Kjems, Jørgen; Heitsch, Christine E.

2013-01-01

Recent advances in RNA structure determination include using data from high-throughput probing experiments to improve thermodynamic prediction accuracy. We evaluate the extent and nature of improvements in data-directed predictions for a diverse set of 16S/18S ribosomal sequences using a stochastic model of experimental SHAPE data. The average accuracy for 1000 data-directed predictions always improves over the original minimum free energy (MFE) structure. However, the amount of improvement varies with the sequence, exhibiting a correlation with MFE accuracy. Further analysis of this correlation shows that accurate MFE base pairs are typically preserved in a data-directed prediction, whereas inaccurate ones are not. Thus, the positive predictive value of common base pairs is consistently higher than the directed prediction accuracy. Finally, we confirm sequence dependencies in the directability of thermodynamic predictions and investigate the potential for greater accuracy improvements in the worst performing test sequence. PMID:23325843
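Accuracy comparisons of this kind are usually expressed as sensitivity and positive predictive value over sets of base pairs. The sketch below computes both from a predicted and a reference set of (i, j) pairs; the exact scoring conventions of the study (for example, tolerance for slipped pairs) are not given in the abstract, so strict identity of pairs is an assumption here.

```python
def pair_accuracy(predicted, reference):
    """Return (sensitivity, ppv) for two sets of (i, j) base pairs.

    sensitivity = correct predicted pairs / reference pairs
    ppv         = correct predicted pairs / predicted pairs
    """
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    predicted = {(1, 10), (2, 9), (3, 8), (4, 7)}
    reference = {(1, 10), (2, 9), (3, 8), (12, 20), (13, 19)}
    print(pair_accuracy(predicted, reference))
```

Applying the same function to the pairs shared by an MFE structure and a data-directed structure would give the "common base pairs" PPV discussed in the abstract.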
Enantiospecific recognition of DNA sequences by a proflavine Tröger base.

PubMed

Bailly, C; Laine, W; Demeunynck, M; Lhomme, J

2000-07-05

The DNA interaction of a chiral Tröger base derived from proflavine was investigated by DNA melting temperature measurements and complementary biochemical assays. DNase I footprinting experiments demonstrate that the binding of the proflavine-based Tröger base is both enantio- and sequence-specific. The (+)-isomer poorly interacts with DNA in a non-sequence-selective fashion. In sharp contrast, the corresponding (-)-isomer recognizes preferentially certain DNA sequences containing both A·T and G·C base pairs, such as the motifs 5'-GTT·AAC and 5'-ATGA·TCAT. This is the first experimental demonstration that acridine-type Tröger bases can be used for enantiospecific recognition of DNA sequences. Copyright 2000 Academic Press.
New powerful statistics for alignment-free sequence comparison under a pattern transfer model.

PubMed

Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S; Sun, Fengzhu

2011-09-07

Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D2* and D2S showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D2* and D2S by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. Copyright © 2011 Elsevier Ltd. All rights reserved.
New Powerful Statistics for Alignment-free Sequence Comparison Under a Pattern Transfer Model

PubMed Central

Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S.; Sun, Fengzhu

2011-01-01

Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D2∗ and D2s showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D2∗ and D2s by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. PMID:21723298
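The basic D2 statistic is the inner product of the two sequences' k-mer count vectors, that is, the number of k-word matches between them. The sketch below computes only this basic form; the centred and standardised variants and the new local statistics proposed in the paper are not reproduced.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x, y, k):
    """D2 statistic: sum over words w of count_x(w) * count_y(w)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(count * cy[word] for word, count in cx.items())


if __name__ == "__main__":
    # AC occurs 2x in each sequence, CG 1x and 2x: 2*2 + 1*2
    print(d2("ACGTAC", "ACGACG", 2))  # → 6
```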
Whole genome sequencing reveals mycobacterial microevolution among concurrent isolates from sputum and blood in HIV infected TB patients.

PubMed

Ssengooba, Willy; de Jong, Bouke C; Joloba, Moses L; Cobelens, Frank G; Meehan, Conor J

2016-08-05

In the context of advanced immunosuppression, M. tuberculosis is known to cause detectable mycobacteremia. However, little is known about the intra-patient mycobacterial microevolution and the direction of seeding between the sputum and blood compartments. From a diagnostic study of HIV-infected TB patients, 51 pairs of concurrent blood and sputum M. tuberculosis isolates from the same patient were available. In a previous analysis, we identified a subset with genotypic concordance, based on spoligotyping and 24 locus MIRU-VNTR. These paired isolates with identical genotypes were analyzed by whole genome sequencing and phylogenetic analysis. Of the 25 concordant pairs (49% of the 51 paired isolates), 15 (60%) remained viable for extraction of high quality DNA for whole genome sequencing. Two patient pairs were excluded due to poor quality sequence reads. The median CD4 cell count was 32 (IQR 16-101)/mm³ and ten (77%) patients were on ART. No drug resistance mutations were identified in any of the sequences analyzed. Three (23.1%) of 13 patients had SNPs separating paired isolates from blood and sputum compartments, indicating evidence of microevolution. Using a phylogenetic approach to identify the ancestral compartment, in two (15%) patients the blood isolate was ancestral to the sputum isolate, in one (8%) it was the opposite, and ten (77%) of the pairs were identical. Among HIV-infected patients with poor cellular immunity, infection with multiple strains of M. tuberculosis was found in half of the patients. In those patients with identical strains, whole genome sequencing indicated that M. tuberculosis intra-patient microevolution does occur in a few patients, yet did not reveal a consistent direction of spread between sputum and blood. This suggests that these compartments are highly connected and potentially seed each other repeatedly.
Phenotypic H-Antigen Typing by Mass Spectrometry Combined with Genetic Typing of H Antigens, O Antigens, and Toxins by Whole-Genome Sequencing Enhances Identification of Escherichia coli Isolates.

PubMed

Cheng, Keding; Chui, Huixia; Domish, Larissa; Sloan, Angela; Hernandez, Drexler; McCorrister, Stuart; Robinson, Alyssia; Walker, Matthew; Peterson, Lorea A M; Majcher, Miles; Ratnam, Sam; Haldane, David J M; Bekal, Sadjia; Wylie, John; Chui, Linda; Tyler, Shaun; Xu, Bianli; Reimer, Aleisha; Nadon, Celine; Knox, J David; Wang, Gehua

2016-08-01

Mass spectrometry-based phenotypic H-antigen typing (MS-H) combined with whole-genome-sequencing-based genetic identification of H antigens, O antigens, and toxins (WGS-HOT) was used to type 60 clinical Escherichia coli isolates, 43 of which were previously identified as nonmotile, H type undetermined, or O rough by serotyping or having shown discordant MS-H and serotyping results. Whole-genome sequencing confirmed that MS-H was able to provide more accurate data regarding H antigen expression than serotyping. Further, enhanced and more confident O antigen identification resulted from gene cluster based typing in combination with conventional typing based on the gene pair comprising wzx and wzy and that comprising wzm and wzt. The O antigen was identified in 94.6% of the isolates when the two genetic O typing approaches (gene pair and gene cluster) were used in conjunction, in comparison to 78.6% when the gene pair database was used alone. In addition, 98.2% of the isolates showed the existence of genes for various toxins and/or virulence factors, among which verotoxins (Shiga toxin 1 and/or Shiga toxin 2) were 100% concordant with conventional PCR based testing results. With more applications of mass spectrometry and whole-genome sequencing in clinical microbiology laboratories, this combined phenotypic and genetic typing platform (MS-H plus WGS-HOT) should be ideal for pathogenic E. coli typing. Copyright © 2016 Cheng et al.
Estimating Exceptionally Rare Germline and Somatic Mutation Frequencies via Next Generation Sequencing

PubMed Central

Yoon, Song-Ro; Arnheim, Norman; Calabrese, Peter

2016-01-01

We used targeted next generation deep-sequencing (Safe Sequencing System) to measure ultra-rare de novo mutation frequencies in the human male germline by attaching a unique identifier code to each target DNA molecule. Segments from three different human genes (FGFR3, MECP2 and PTPN11) were studied. Regardless of the gene segment, the particular testis donor or the 73 different testis pieces used, the frequencies for any one of the six different mutation types were consistent. Averaging over the C>T/G>A and G>T/C>A mutation types the background mutation frequency was 2.6×10⁻⁵ per base pair, while for the four other mutation types the average background frequency was lower at 1.5×10⁻⁶ per base pair. These rates far exceed the well documented human genome average frequency per base pair (~10⁻⁸) suggesting a non-biological explanation for our data. By computational modeling and a new experimental procedure to distinguish between pre-mutagenic lesion base mismatches and a fully mutated base pair in the original DNA molecule, we argue that most of the base-dependent variation in background frequency is due to a mixture of deamination and oxidation during the first two PCR cycles. Finally, we looked at a previously studied disease mutation in the PTPN11 gene and could easily distinguish true mutations from the SSS background. We also discuss the limits and possibilities of this and other methods to measure exceptionally rare mutation frequencies, and we present calculations for other scientists seeking to design their own such experiments. PMID:27341568
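The role of the unique identifier can be sketched as follows: reads that share an identifier derive from one original molecule, so a base is called for that molecule only when the read family is large enough and nearly unanimous. The family-size and agreement thresholds below are illustrative assumptions, not the parameters used in the study.

```python
from collections import Counter, defaultdict


def uid_consensus(reads, min_family=3, min_agreement=0.95):
    """Collapse (uid, base) reads at one position into one call per molecule."""
    families = defaultdict(list)
    for uid, base in reads:
        families[uid].append(base)
    calls = {}
    for uid, bases in families.items():
        if len(bases) < min_family:
            continue  # too few reads to trust this molecule
        base, n = Counter(bases).most_common(1)[0]
        if n / len(bases) >= min_agreement:
            calls[uid] = base
    return calls


def mutation_frequency(calls, ref_base):
    """Fraction of called molecules whose consensus differs from ref_base."""
    if not calls:
        return 0.0
    mutants = sum(1 for base in calls.values() if base != ref_base)
    return mutants / len(calls)


if __name__ == "__main__":
    reads = ([("u1", "C")] * 4 + [("u2", "T")] * 3
             + [("u3", "C"), ("u3", "C"), ("u3", "T")] + [("u4", "T")])
    calls = uid_consensus(reads)
    print(calls, mutation_frequency(calls, "C"))
```

Note that errors introduced in the first PCR cycles, which the abstract identifies as the main background source, can still dominate a family and so survive this kind of consensus step.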
Sequencing and assembly of the 22-gb loblolly pine genome.

PubMed

Zimin, Aleksey; Stevens, Kristian A; Crepeau, Marc W; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L; de Jong, Pieter J; Neale, David B; Salzberg, Steven L; Yorke, James A; Langley, Charles H

2014-03-01

Conifers are the predominant gymnosperm. The size and complexity of their genomes has presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data were aided by condensing the enormous number of paired-end reads into a much smaller set of longer "super-reads," rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
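The N50 figure quoted above is the length L such that scaffolds of length L or longer contain at least half of the assembled bases. A minimal sketch of that calculation, with made-up scaffold lengths:

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # total is 20; the two longest (6 + 5 = 11) already cover half
    print(n50([2, 3, 4, 5, 6]))  # → 5
```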
Development of Pineapple Microsatellite Markers and Germplasm Genetic Diversity Analysis

PubMed Central

Tong, Helin; Chen, You; Wang, Jingyi; Chen, Yeyuan; Sun, Guangming; He, Junhu; Wu, Yaoting

2013-01-01

Two methods were used to develop pineapple microsatellite markers. Genomic library-based SSR development: using selectively amplified microsatellite assay, 86 sequences were generated from pineapple genomic library. 91 (96.8%) of the 94 Simple Sequence Repeat (SSR) loci were dinucleotide repeats (39 AC/GT repeats and 52 GA/TC repeats, accounting for 42.9% and 57.1%, resp.), and the other three were mononucleotide repeats. Thirty-six pairs of SSR primers were designed; 24 of them generated clear bands of expected sizes, and 13 of them showed polymorphism. EST-based SSR development: 5659 pineapple EST sequences obtained from NCBI were analyzed; among 1397 nonredundant EST sequences, 843 were found containing 1110 SSR loci (217 of them contained more than one SSR locus). Frequency of SSRs in pineapple EST sequences is 1SSR/3.73 kb, and 44 types were found. Mononucleotide, dinucleotide, and trinucleotide repeats dominate, accounting for 95.6% in total. AG/CT and AGC/GCT were the dominant type of dinucleotide and trinucleotide repeats, accounting for 83.5% and 24.1%, respectively. Thirty pairs of primers were designed for each of randomly selected 30 sequences; 26 of them generated clear and reproducible bands, and 22 of them showed polymorphism. Eighteen pairs of primers obtained by the one or the other of the two methods above that showed polymorphism were selected to carry out germplasm genetic diversity analysis for 48 breeds of pineapple; similarity coefficients of these breeds were between 0.59 and 1.00, and they can be divided into four groups accordingly. Amplification products of five SSR markers were extracted and sequenced, corresponding repeat loci were found and locus mutations are mainly in copy number of repeats and base mutations in the flanking region. PMID:24024187
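Screening sequences for SSR loci amounts to finding short units repeated in tandem. The sketch below uses a regular-expression back-reference for one unit length at a time; the minimum repeat count is an assumed threshold, since the abstract does not state the criteria used.

```python
import re


def find_ssrs(seq, unit_len=2, min_repeats=5):
    """Return (start, unit, repeat_count) for tandem repeats of a given unit length."""
    # One captured unit followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if unit_len > 1 and len(set(unit)) == 1:
            continue  # a homopolymer run, not a true di-/trinucleotide repeat
        hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


if __name__ == "__main__":
    print(find_ssrs("TTGAGAGAGAGAGACC"))
```

Dividing the total length of sequence screened by the number of loci found gives a density of the form "1 SSR per x kb", as reported in the abstract.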
Elucidating the 16S rRNA 3' boundaries and defining optimal SD/aSD pairing in Escherichia coli and Bacillus subtilis using RNA-Seq data.

PubMed

Wei, Yulong; Silke, Jordan R; Xia, Xuhua

2017-12-15

Bacterial translation initiation is influenced by base pairing between the Shine-Dalgarno (SD) sequence in the 5' UTR of mRNA and the anti-SD (aSD) sequence at the free 3' end of the 16S rRNA (3' TAIL) due to: 1) the SD/aSD sequence binding location and 2) SD/aSD binding affinity. In order to understand what makes an SD/aSD interaction optimal, we must define: 1) terminus of the 3' TAIL and 2) extent of the core aSD sequence within the 3' TAIL. Our approach to characterize these components in Escherichia coli and Bacillus subtilis involves 1) mapping the 3' boundary of the mature 16S rRNA using high-throughput RNA sequencing (RNA-Seq), and 2) identifying the segment within the 3' TAIL that is strongly preferred in SD/aSD pairing. Using RNA-Seq data, we resolve previous discrepancies in the reported 3' TAIL in B. subtilis and recovered the established 3' TAIL in E. coli. Furthermore, we extend previous studies to suggest that both highly and lowly expressed genes favor SD sequences with intermediate binding affinity, but this trend is exclusive to SD sequences that complement the core aSD sequences defined herein.
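The extent of SD/aSD pairing for one gene can be approximated by the longest stretch of the 5' UTR that is Watson-Crick complementary to the 3' TAIL. In the sketch below the default tail `ACCUCCUUA` is the commonly cited E. coli 16S rRNA 3' end and is an assumption, not a value taken from this abstract; G·U wobble pairs and binding free energies are ignored, and the UTR is invented.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def longest_sd_match(utr, asd="ACCUCCUUA"):
    """Return (length, utr_start) of the longest UTR stretch complementary to asd.

    Both sequences are RNA written 5'->3'. Returns (0, -1) if nothing pairs.
    """
    # Antiparallel pairing: the UTR must match the reverse complement of the aSD.
    target = asd.translate(RNA_COMPLEMENT)[::-1]
    best = (0, -1)
    for i in range(len(utr)):
        for j in range(len(target)):
            n = 0
            while (i + n < len(utr) and j + n < len(target)
                   and utr[i + n] == target[j + n]):
                n += 1
            if n > best[0]:
                best = (n, i)
    return best


if __name__ == "__main__":
    print(longest_sd_match("CCAGGAGGAACAUG"))
```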
Stacked-unstacked equilibrium at the nick site of DNA.

PubMed

Protozanova, Ekaterina; Yakovchuk, Peter; Frank-Kamenetskii, Maxim D

2004-09-17

Stability of duplex DNA with respect to separation of complementary strands is crucial for DNA executing its major functions in the cell and it also plays a central role in major biotechnology applications of DNA: DNA sequencing, polymerase chain reaction, and DNA microarrays. Two types of interaction are well known to contribute to DNA stability: stacking between adjacent base-pairs and pairing between complementary bases. However, their contribution into the duplex stability is yet to be determined. Now we fill this fundamental gap in our knowledge of the DNA double helix. We have prepared a series of 32, 300 bp-long DNA fragments with solitary nicks in the same position differing only in base-pairs flanking the nick. Electrophoretic mobility of these fragments in the gel has been studied. Assuming the equilibrium between stacked and unstacked conformations at the nick site, all 32 stacking free energy parameters have been obtained. Only ten of them are essential and they govern the stacking interactions between adjacent base-pairs in intact DNA double helix. A full set of DNA stacking parameters has been determined for the first time. From these data and from a well-known dependence of DNA melting temperature on G.C content, the contribution of base-pairing into duplex stability has been estimated. The obtained energy parameters of the DNA double helix are of paramount importance for understanding sequence-dependent DNA flexibility and for numerous biotechnology applications.
Line segment confidence region-based string matching method for map conflation

NASA Astrophysics Data System (ADS)

Huh, Yong; Yang, Sungchul; Ga, Chillo; Yu, Kiyun; Shi, Wenzhong

2013-04-01

In this paper, a method to detect corresponding point pairs between polygon object pairs with a string matching method based on a confidence region model of a line segment is proposed. The optimal point edit sequence to convert the contour of a target object into that of a reference object was found by the string matching method which minimizes its total error cost, and the corresponding point pairs were derived from the edit sequence. Because a significant amount of apparent positional discrepancies between corresponding objects are caused by spatial uncertainty and their confidence region models of line segments are therefore used in the above matching process, the proposed method obtained a high F-measure for finding matching pairs. We applied this method for built-up area polygon objects in a cadastral map and a topographical map. Regardless of their different mapping and representation rules and spatial uncertainties, the proposed method with a confidence level at 0.95 showed a matching result with an F-measure of 0.894.
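String matching over contour points can be sketched as a dynamic-programming edit distance in which substituting one point for another costs their Euclidean distance and skipping a point costs a fixed gap penalty. This swaps in a much simpler cost than the paper's confidence-region model, which the abstract does not specify; the gap value and the coordinates are illustrative.

```python
import math


def align_cost(a, b, gap=1.0):
    """Minimum total cost of editing point sequence a into point sequence b."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap
    for j in range(1, m + 1):
        d[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + math.dist(a[i - 1], b[j - 1]),  # match points
                d[i - 1][j] + gap,                                # drop a point of a
                d[i][j - 1] + gap,                                # insert a point of b
            )
    return d[n][m]


if __name__ == "__main__":
    target = [(0, 0), (1, 0), (1, 1)]
    reference = [(0, 0), (1, 1)]
    print(align_cost(target, reference, gap=0.5))
```

Tracing back through the table would recover the edit sequence itself, from which corresponding point pairs can be read off.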
Molecular recognition of DNA base pairs by the formamido/pyrrole and formamido/imidazole pairings in stacked polyamides.

PubMed

Buchmueller, Karen L; Staples, Andrew M; Uthe, Peter B; Howard, Cameron M; Pacheco, Kimberly A O; Cox, Kari K; Henry, James A; Bailey, Suzanna L; Horick, Sarah M; Nguyen, Binh; Wilson, W David; Lee, Moses

2005-01-01

Polyamides containing an N-terminal formamido (f) group bind to the minor groove of DNA as staggered, antiparallel dimers in a sequence-specific manner. The formamido group increases the affinity and binding site size, and it promotes the molecules to stack in a staggered fashion thereby pairing itself with either a pyrrole (Py) or an imidazole (Im). There has not been a systematic study on the DNA recognition properties of the f/Py and f/Im terminal pairings. These pairings were analyzed here in the context of f-ImPyPy, f-ImPyIm, f-PyPyPy and f-PyPyIm, which contain the central pairing modes, -ImPy- and -PyPy-. The specificity of these triamides towards symmetrical recognition sites allowed for the f/Py and f/Im terminal pairings to be directly compared by SPR, CD and ΔTm experiments. The f/Py pairing, when placed next to the -ImPy- or -PyPy- central pairings, prefers A/T and T/A base pairs to G/C base pairs, suggesting that f/Py has similar DNA recognition specificity to Py/Py. With -ImPy- central pairings, f/Im prefers C/G base pairs (>10 times) to the other Watson-Crick base pairs; therefore, f/Im behaves like the Py/Im pair. However, the f/Im pairing is not selective for the C/G base pair when placed next to the -PyPy- central pairings.
Statistical properties of DNA sequences

NASA Technical Reports Server (NTRS)

Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

1995-01-01

We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
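The detrended fluctuation analysis mentioned in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a purine/pyrimidine "DNA walk" (+1 for C or T, -1 for anything else), non-overlapping boxes with a least-squares line removed from each, and a hypothetical default set of box sizes; the function names are illustrative only. The exponent is the slope of log F(n) against log n, and a value near 0.5 is what an uncorrelated sequence would be expected to give.

```python
import math
import random


def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C/T), -1 for any other symbol (assumed mapping)."""
    walk, total = [], 0
    for base in seq.upper():
        total += 1 if base in "CT" else -1
        walk.append(total)
    return walk


def fluctuation(walk, n):
    """RMS deviation of the walk from a least-squares line fitted in each non-overlapping box of length n."""
    xs = range(n)
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sq_sum, count = 0.0, 0
    for start in range(0, len(walk) - n + 1, n):
        box = walk[start:start + n]
        y_mean = sum(box) / n
        slope = sum((x - x_mean) * (v - y_mean) for x, v in zip(xs, box)) / sxx
        for x, v in zip(xs, box):
            resid = v - (y_mean + slope * (x - x_mean))
            sq_sum += resid * resid
            count += 1
    return math.sqrt(sq_sum / count)


def dfa_exponent(seq, box_sizes=(8, 16, 32, 64, 128)):
    """Slope of log F(n) versus log n, estimated by ordinary least squares."""
    walk = dna_walk(seq)
    pts = [(math.log(n), math.log(fluctuation(walk, n))) for n in box_sizes]
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    num = sum((px - mx) * (py - my) for px, py in pts)
    den = sum((px - mx) ** 2 for px, _ in pts)
    return num / den


def random_sequence(length, seed=0):
    """Uncorrelated test sequence with equal base frequencies (for illustration only)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))
```

Calling `dfa_exponent(random_sequence(20000))` should return a value close to 0.5; long-range correlated sequences would be expected to give larger exponents.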
Using NMR and molecular dynamics to link structure and dynamics effects of the universal base 8-aza, 7-deaza, N8 linked adenosine analog

PubMed Central

Spring-Connell, Alexander M.; Evich, Marina G.; Debelak, Harald; Seela, Frank; Germann, Markus W.

2016-01-01

A truly universal nucleobase enables a host of novel applications such as simplified templates for PCR primers, randomized sequencing and DNA based devices. A universal base must pair indiscriminately to each of the canonical bases with little or preferably no destabilization of the overall duplex. In reality, many candidates either destabilize the duplex or do not base pair indiscriminately. The novel base 8-aza-7-deazaadenine (pyrazolo[3,4-d]pyrimidin-4-amine) N8-(2′-deoxyribonucleoside), a deoxyadenosine analog (UB), pairs with each of the natural DNA bases with little sequence preference. We have utilized NMR complemented with molecular dynamics calculations to characterize the structure and dynamics of a UB incorporated into a DNA duplex. The UB participates in base stacking with little to no perturbation of the local structure yet forms an unusual base pair that samples multiple conformations. These local dynamics result in the complete disappearance of a single UB proton resonance under native conditions. Accommodation of the UB is additionally stabilized via heightened backbone conformational sampling. NMR combined with various computational techniques has allowed for a comprehensive characterization of both structural and dynamic effects of the UB in a DNA duplex and underlines the UB as a strong candidate for universal base applications. PMID:27566150
Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

PubMed

Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

2013-01-01

Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).
Enhancing genome assemblies by integrating non-sequence based data

PubMed Central

2011-01-01

Introduction Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies. Methods The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. This data was used to construct pseudo paired-end libraries with intervals ranging from 5-10 MB. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold our libraries. To determine how map data compares to sequence based approaches to enhance assemblies, we repeated the experiment using a 0.5× coverage of unique reads from 4 KB and 8 KB Illumina paired-end libraries. Finally, we combined both the sequence and non-sequence-based data to determine how a combined approach could further enhance the quality of the low coverage de novo reconstruction of the tammar wallaby genome. Results Using the map data alone, we were able to order 2.2% of the initial contigs into scaffolds, and increase the N50 scaffold size to 39 KB (36 KB in the original assembly). Using only the 0.5× paired-end sequence based data, 53% of the initial contigs were assigned to scaffolds. 
Combining both data sets resulted in a further 2% increase in the number of initial contigs integrated into a scaffold (55% total) but a 35% increase in N50 scaffold size over the use of sequence-based data alone. Conclusions We provide a relatively simple pipeline utilizing existing bioinformatics tools to integrate map data into a genome assembly which is available at http://www.mcb.uconn.edu/fac.php?name=paska. While the map data only contributed minimally to assigning the initial contigs to scaffolds in the new assembly, it greatly increased the N50 size. This process added structure to our low coverage assembly, greatly increasing its utility in further analyses. PMID:21554765
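The N50 scaffold size reported in this abstract is a standard assembly statistic. As a minimal sketch, assuming the usual definition (the largest length L such that scaffolds of length at least L together cover at least half of the total assembled bases), it can be computed as follows. The function name and the example lengths are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Illustrative lengths only (not data from the tammar wallaby assembly).
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

Because N50 depends only on how the total length is distributed, joining a small fraction of contigs into long scaffolds can raise it noticeably, which is consistent with the pattern the abstract describes.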

Enhancing genome assemblies by integrating non-sequence based data.

PubMed

Heider, Thomas N; Lindsay, James; Wang, Chenwei; O'Neill, Rachel J; Pask, Andrew J

2011-05-28

Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies. The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. This data was used to construct pseudo paired-end libraries with intervals ranging from 5-10 MB. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold our libraries. To determine how map data compares to sequence based approaches to enhance assemblies, we repeated the experiment using a 0.5× coverage of unique reads from 4 KB and 8 KB Illumina paired-end libraries. Finally, we combined both the sequence and non-sequence-based data to determine how a combined approach could further enhance the quality of the low coverage de novo reconstruction of the tammar wallaby genome. Using the map data alone, we were able to order 2.2% of the initial contigs into scaffolds, and increase the N50 scaffold size to 39 KB (36 KB in the original assembly). Using only the 0.5× paired-end sequence based data, 53% of the initial contigs were assigned to scaffolds. 
Combining both data sets resulted in a further 2% increase in the number of initial contigs integrated into a scaffold (55% total) but a 35% increase in N50 scaffold size over the use of sequence-based data alone. We provide a relatively simple pipeline utilizing existing bioinformatics tools to integrate map data into a genome assembly which is available at http://www.mcb.uconn.edu/fac.php?name=paska. While the map data only contributed minimally to assigning the initial contigs to scaffolds in the new assembly, it greatly increased the N50 size. This process added structure to our low coverage assembly, greatly increasing its utility in further analyses.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Okutsu, N.; Shimamura, K.; Shimizu, E.

To elucidate the effect of radicals on DNA base pairs, we investigated the mechanism of attack by OH and H radicals on the G-C and A-T base pairs, using density functional theory (DFT) calculations in water approximated by the continuum solvation model. The DFT calculations revealed that the OH radical abstracts the hydrogen atom of an NH2 group of the G or A base and induces a tautomeric reaction for an A-T base pair more significantly than for a G-C base pair. On the other hand, the H radical prefers to bind to the cytosine NH2 group of the G-C base pair and induce a tautomeric reaction from G-C to G*-C*, whose activation free energy is considerably smaller (−0.1 kcal/mol) than that (42.9 kcal/mol) for the reaction of an A-T base pair. Accordingly, our DFT calculations elucidated that OH and H radicals have a significant effect on A-T and G-C base pairs, respectively. This finding will be useful for predicting the effect of radiation on the genetic information recorded in the base sequences of DNA duplexes.
Genetic analysis of 430 Chinese Cynodon dactylon accessions using sequence-related amplified polymorphism markers.

PubMed

Huang, Chunqiong; Liu, Guodao; Bai, Changjun; Wang, Wenqiang

2014-10-21

Although Cynodon dactylon (C. dactylon) is widely distributed in China, information on its genetic diversity within the germplasm pool is limited. The objective of this study was to reveal the genetic variation and relationships of 430 C. dactylon accessions collected from 22 Chinese provinces using sequence-related amplified polymorphism (SRAP) markers. Fifteen primer pairs were used to amplify specific C. dactylon genomic sequences. A total of 481 SRAP fragments were generated, with fragment sizes ranging from 260 to 1800 base pairs (bp). Genetic similarity coefficients (GSC) among the 430 accessions averaged 0.72 and ranged from 0.53 to 0.96. Cluster analysis conducted by two methods, namely the unweighted pair-group method with arithmetic averages (UPGMA) and principal coordinate analysis (PCoA), separated the accessions into eight distinct groups. Our findings verify that Chinese C. dactylon germplasms have rich genetic diversity, which is an excellent basis for C. dactylon breeding for new cultivars.
Microbial Metagenomics: Beyond the Genome

NASA Astrophysics Data System (ADS)

Gilbert, Jack A.; Dupont, Christopher L.

2011-01-01

Metagenomics literally means “beyond the genome.” Marine microbial metagenomic databases presently comprise ~400 billion base pairs of DNA, only ~3% of that found in 1 ml of seawater. Very soon a trillion-base-pair sequence run will be feasible, so it is time to reflect on what we have learned from metagenomics. We review the impact of metagenomics on our understanding of marine microbial communities. We consider the studies facilitated by data generated through the Global Ocean Sampling expedition, as well as the revolution wrought at the individual laboratory level through next generation sequencing technologies. We review recent studies and discoveries since 2008, provide a discussion of bioinformatic analyses, including conceptual pipelines and sequence annotation, and predict the future of metagenomics, with suggestions of collaborative community studies tailored toward answering some of the fundamental questions in marine microbial ecology.
MSuPDA: A Memory Efficient Algorithm for Sequence Alignment.

PubMed

Khan, Mohammad Ibrahim; Kamal, Md Sarwar; Chowdhury, Linkon

2016-03-01

Space complexity is a million dollar question in DNA sequence alignments. In this regard, memory saving under pushdown automata can help to reduce the occupied spaces in computer memory. Our proposed process is that anchor seed (AS) will be selected from given data set of nucleotide base pairs for local sequence alignment. Quick splitting techniques will separate the AS from all the DNA genome segments. Selected AS will be placed to pushdown automata's (PDA) input unit. Whole DNA genome segments will be placed into PDA's stack. AS from input unit will be matched with the DNA genome segments from stack of PDA. Match, mismatch and indel of nucleotides will be popped from the stack under the control unit of pushdown automata. During the POP operation on stack, it will free the memory cell occupied by the nucleotide base pair.
Molecular recognition of DNA base pairs by the formamido/pyrrole and formamido/imidazole pairings in stacked polyamides

PubMed Central

Buchmueller, Karen L.; Staples, Andrew M.; Uthe, Peter B.; Howard, Cameron M.; Pacheco, Kimberly A. O.; Cox, Kari K.; Henry, James A.; Bailey, Suzanna L.; Horick, Sarah M.; Nguyen, Binh; Wilson, W. David; Lee, Moses

2005-01-01

Polyamides containing an N-terminal formamido (f) group bind to the minor groove of DNA as staggered, antiparallel dimers in a sequence-specific manner. The formamido group increases the affinity and binding site size, and it promotes the molecules to stack in a staggered fashion thereby pairing itself with either a pyrrole (Py) or an imidazole (Im). There has not been a systematic study on the DNA recognition properties of the f/Py and f/Im terminal pairings. These pairings were analyzed here in the context of f-ImPyPy, f-ImPyIm, f-PyPyPy and f-PyPyIm, which contain the central pairing modes, –ImPy– and –PyPy–. The specificity of these triamides towards symmetrical recognition sites allowed for the f/Py and f/Im terminal pairings to be directly compared by SPR, CD and ΔTM experiments. The f/Py pairing, when placed next to the –ImPy– or –PyPy– central pairings, prefers A/T and T/A base pairs to G/C base pairs, suggesting that f/Py has similar DNA recognition specificity to Py/Py. With –ImPy– central pairings, f/Im prefers C/G base pairs (>10 times) to the other Watson–Crick base pairs; therefore, f/Im behaves like the Py/Im pair. However, the f/Im pairing is not selective for the C/G base pair when placed next to the –PyPy– central pairings. PMID:15703305
The use of molecular dynamics simulations to evaluate the DNA sequence-selectivity of G-A cross-linking PBD-duocarmycin dimers.

PubMed

Jackson, Paul J M; Rahman, Khondaker M; Thurston, David E

2017-01-01

The pyrrolobenzodiazepine (PBD) and duocarmycin families are DNA-interactive agents that covalently bond to guanine (G) and adenine (A) bases, respectively, and that have been joined together to create synthetic dimers capable of cross-linking G-G, A-A, and G-A bases. Three G-A alkylating dimers have been reported in publications to date, with defined DNA-binding sites proposed for two of them. In this study we have used molecular dynamics simulations to elucidate preferred DNA-binding sites for the three published molecular types. For the PBD-CPI dimer UTA-6026 (1), our simulations correctly predicted its favoured binding site (i.e., 5'-C(G)AATTA-3') as identified by DNA cleavage studies. However, for the PBD-CI molecule ('Compound 11', 3), we were unable to reconcile the results of our simulations with the reported preferred cross-linking sequence (5'-ATTTTCC(G)-3'). We found that the molecule is too short to span the five base pairs between the A and G bases as claimed, but should target instead a sequence such as 5'-ATTTC(G)-3' with two fewer base pairs between the reacting G and A residues. Our simulation results for this hybrid dimer are also in accord with the very low interstrand cross-linking and in vitro cytotoxicity activities reported for it. Although a preferred cross-linking sequence was not reported for the third hybrid dimer ('27eS', 2), our simulations predict that it should span two base pairs between covalently reacting G and A bases (e.g., 5'-GTAT(A)-3'). Copyright © 2016. Published by Elsevier Ltd.
Apolipoprotein B-52 mutation associated with hypobetalipoproteinemia is compatible with a misaligned pairing deletion mechanism.

PubMed

Groenewegen, W A; Krul, E S; Schonfeld, G

1993-06-01

We have identified a new truncation of apoB in a large kindred with hypobetalipoproteinemia that arose by an ambiguous deletion of one of four different groups of base-pairs. Eleven affected members of the kindred had total cholesterols (C) of 114 +/- 28, LDL-Cs of 46 +/- 21, and apoBs of 47 +/- 25 (all in mg/dl, mean +/- SD). These levels were lower (P < 0.0001) than in 15 unaffected relatives. On Western blotting, apoB-100 and a second major band corresponding to apoB-52 were seen in the affected individuals. The majority of the plasma apoB-52 was associated with a smaller than normal low density lipoprotein (LDL) particle. The molecular basis for this apoB-52 truncation is a 5-bp deletion, converting the sequence between cDNA nucleotide 7276 and 7283 from 5'-AAGTTAAG-3' into the mutant sequence 5'-AAG-3'. This results in a frameshift starting at amino acid residue 2357 and a termination codon at amino acid residue 2362. Deletion of one of four different groups of five consecutive bases, i.e., AAGTT, AGTTA, GTTAA, and TTAAG, all result in the same mutant sequence. Thus, the precise deletion is ambiguous. We propose that a misaligned pairing mechanism involving repeat sequences is compatible with this deletion mutation. We have noted similar ambiguous deletions associated with apoB-37, apoB-40, and a number of single base deletions and some may also be explained by a misaligned pairing mechanism. Small ambiguous deletions appear to constitute a major proportion of the apoB gene mutation spectrum suggesting that it may be a suitable model for studying the mechanisms of such mutations.
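The ambiguity described in this abstract can be checked directly: deleting any window of five consecutive bases from 5'-AAGTTAAG-3' leaves the same product. The short sketch below enumerates every 5-base deletion and groups the results by the sequence they produce; the function name is illustrative and not from the paper.

```python
def deletion_products(seq, k):
    """Map each post-deletion sequence to the (start index, deleted k-mer) windows that produce it."""
    products = {}
    for start in range(len(seq) - k + 1):
        deleted = seq[start:start + k]
        remaining = seq[:start] + seq[start + k:]
        products.setdefault(remaining, []).append((start, deleted))
    return products


# Sequence between cDNA nucleotides 7276 and 7283, as given in the abstract.
wild_type = "AAGTTAAG"
products = deletion_products(wild_type, 5)
for mutant, windows in products.items():
    print(mutant, [kmer for _, kmer in windows])
```

All four possible windows (AAGTT, AGTTA, GTTAA and TTAAG) map to the single product AAG, so the mutant sequence alone cannot identify which five bases were lost.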
The 5S ribosomal RNAs of Paracoccus denitrificans and Prochloron

NASA Technical Reports Server (NTRS)

Mackay, R. M.; Salgado, D.; Bonen, L.; Doolittle, W. F.; Stackebrandt, E.

1982-01-01

The nucleotide sequences of the 5S rRNAs of Paracoccus denitrificans and Prochloron sp. are presented, along with the demonstrated phylogenetic relationships of P. denitrificans with purple nonsulfur bacteria, and of Prochloron with cyanobacteria. Structural findings include the following: (1) helix II in both models is much shorter than in other eubacteria, (2) a base-pair has been deleted from helix IV of P. denitrificans 5S, and (3) Prochloron 5S has the potential to form four base-pairs between residues. Also covered are the differences between pairs of sequences in P. denitrificans, Prochloron, wheat mitochondrion, spinach chloroplast, and nine diverse eubacteria. Findings include the observation that Prochloron 5S rRNA is much more similar to the 5S of the cyanobacterium Anacystis nidulans (25 percent difference) than either is to any of the other nine eubacterial 5S rRNAs.
Accelerated probabilistic inference of RNA structure evolution

PubMed Central

Holmes, Ian

2005-01-01

Background Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered. Results We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database. Conclusion A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License. PMID:15790387
MSeq-CNV: accurate detection of Copy Number Variation from Sequencing of Multiple samples.

PubMed

Malekpour, Seyed Amir; Pezeshk, Hamid; Sadeghi, Mehdi

2018-03-05

Currently a few tools are capable of detecting genome-wide Copy Number Variations (CNVs) based on sequencing of multiple samples. Although aberrations in mate pair insertion sizes provide additional hints for the CNV detection based on multiple samples, the majority of the current tools rely only on the depth of coverage. Here, we propose a new algorithm (MSeq-CNV) which allows detecting common CNVs across multiple samples. MSeq-CNV applies a mixture density for modeling aberrations in depth of coverage and abnormalities in the mate pair insertion sizes. Each component in this mixture density applies a Binomial distribution for modeling the number of mate pairs with aberration in the insertion size and also a Poisson distribution for emitting the read counts, in each genomic position. MSeq-CNV is applied on simulated data and also on real data of six HapMap individuals with high-coverage sequencing, in the 1000 Genomes Project. These individuals include a CEU trio of European ancestry and a YRI trio of Nigerian ethnicity. Ancestry of these individuals is studied by clustering the identified CNVs. MSeq-CNV is also applied for detecting CNVs in two samples with low-coverage sequencing in the 1000 Genomes Project and six samples from the Simons Genome Diversity Project.
Base pairing and base mis-pairing in nucleic acids

NASA Technical Reports Server (NTRS)

Wang, A. H. J.; Rich, A.

1986-01-01

In recent years we have learned that DNA is conformationally active. It can exist in a number of different stable conformations including both right-handed and left-handed forms. Using single crystal X-ray diffraction analysis we are able to discover not only additional conformations of the nucleic acids but also different types of hydrogen bonded base-base interactions. Although Watson-Crick base pairings are the predominant type of interaction in double helical DNA, they are not the only types. Recently, we have been able to examine mismatching of guanine-thymine base pairs in left-handed Z-DNA at atomic resolution (1 Å). A minimum amount of distortion of the sugar phosphate backbone is found in the G x T pairing in which the bases are held together by two hydrogen bonds in the wobble pairing interaction. Because of the high resolution of the analysis we can visualize water molecules which fill in to accommodate the other hydrogen bonding positions in the bases which are not used in the base-base interactions. Studies on other DNA oligomers have revealed that other types of non-Watson-Crick hydrogen bonding interactions can occur. In the structure of a DNA octamer with the sequence d(GCGTACGC) complexed to the antibiotic triostin A, it was found that the two central AT base pairs are held together by Hoogsteen rather than Watson-Crick base pairs. Similarly, the G x C base pairs at the ends are also Hoogsteen rather than Watson-Crick pairing. Hoogsteen base pairs make a modified helix which is distinct from the Watson-Crick double helix.
Environmental assessment for the proposed construction and operation of a Genome Sequencing Facility in Building 64 at Lawrence Berkeley Laboratory, Berkeley, California

DOE Office of Scientific and Technical Information (OSTI.GOV)

NONE

1995-04-01

This document is an Environmental Assessment (EA) for a proposed project to modify 14,900 square feet of an existing building (Building 64) at Lawrence Berkeley Laboratory (LBL) to operate as a Genome Sequencing Facility. This EA addresses the potential environmental impacts from the proposed modifications to Building 64 and operation of the Genome Sequencing Facility. The proposed action is to modify Building 64 to provide space and equipment allowing LBL to demonstrate that the Directed DNA Sequencing Strategy can be scaled up from the current level of 750,000 base pairs per year to a facility that produces over 6,000,000 base pairs per year, while still retaining its efficiency.
Differentiating founder and chronic HIV envelope sequences

PubMed Central

Maher, Stephen; Mota, Talia; Suzuki, Kazuo; Kelleher, Anthony D.

2017-01-01

Significant progress has been made in characterizing broadly neutralizing antibodies against the HIV envelope glycoprotein Env, but an effective vaccine has proven elusive. Vaccine development would be facilitated if common features of early founder virus required for transmission could be identified. Here we employ a combination of bioinformatic and operations research methods to determine the most prevalent features that distinguish 78 subtype B and 55 subtype C founder Env sequences from an equal number of chronic sequences. There were a number of equivalent optimal networks (based on the fewest covarying amino acid (AA) pairs or a measure of maximal covariance) that separated founders from chronics: 13 pairs for subtype B and 75 for subtype C. Every subtype B optimal solution contained the founder pairs 178–346 Asn-Val, 232–236 Thr-Ser, 240–340 Lys-Lys, 279–315 Asp-Lys, 291–792 Ala-Ile, 322–347 Asp-Thr, 535–620 Leu-Asp, 742–837 Arg-Phe, and 750–836 Asp-Ile; the most common optimal pairs for subtype C were 644–781 Lys-Ala (74 of 75 networks), 133–287 Ala-Gln (73/75) and 307–337 Ile-Gln (73/75). No pair was present in all optimal subtype C solutions highlighting the difficulty in targeting transmission with a single vaccine strain. Relative to the size of its domain (0.35% of Env), the α4β7 binding site occurred most frequently among optimal pairs, especially for subtype C: 4.2% of optimal pairs (1.2% for subtype B). Early sequences from 5 subtype B pre-seroconverters each exhibited at least one clone containing an optimal feature 553–624 (Ser-Asn), 724–747 (Arg-Arg), or 46–293 (Arg-Glu). PMID:28187204
The crystal structure of an oligo(U):pre-mRNA duplex from a trypanosome RNA editing substrate

PubMed Central

Mooers, Blaine H.M.; Singh, Amritanshu

2011-01-01

Guide RNAs bind antiparallel to their target pre-mRNAs to form editing substrates in reaction cycles that insert or delete uridylates (Us) in most mitochondrial transcripts of trypanosomes. The 5′ end of each guide RNA has an anchor sequence that binds to the pre-mRNA by base-pair complementarity. The template sequence in the middle of the guide RNA directs the editing reactions. The 3′ ends of most guide RNAs have ∼15 contiguous Us that bind to the purine-rich unedited pre-mRNA upstream of the editing site. The resulting U-helix is rich in G·U wobble base pairs. To gain insights into the structure of the U-helix, we crystallized 8 bp of the U-helix in one editing substrate for the A6 mRNA of Trypanosoma brucei. The fragment provides three samples of the 5′-AGA-3′/5′-UUU-3′ base-pair triple. The fusion of two identical U-helices head-to-head promoted crystallization. We obtained X-ray diffraction data with a resolution limit of 1.37 Å. The U-helix had low and high twist angles before and after each G·U wobble base pair; this variation was partly due to shearing of the wobble base pairs as revealed in comparisons with a crystal structure of a 16-nt RNA with all Watson–Crick base pairs. Both crystal structures had wider major grooves at the junction between the poly(U) and polypurine tracts. This junction mimics the junction between the template helix and the U-helix in RNA-editing substrates and may be a site of major groove invasion by RNA editing proteins. PMID:21878548
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

PubMed

Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

2012-01-01

RDNAnalyzer is an innovative computer based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate the DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information such as the total number of nucleotide bases and ATGC base contents along with their respective percentages, and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available. http://www.cemb.edu.pk/sw.html RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
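The Nussinov algorithm named in this abstract maximizes the number of nested base pairs by dynamic programming. The sketch below is the textbook form for DNA, not RDNAnalyzer's extended version. It assumes that only A-T and G-C pairs count and that a hairpin loop must contain at least `min_loop` unpaired bases; both the parameter and its default are assumptions made for illustration.

```python
def nussinov(seq, min_loop=3):
    """Maximum number of nested A-T / G-C pairs in seq (textbook Nussinov recursion)."""
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with some k, leaving at least min_loop bases between them
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


# A simple hairpin: three G-C pairs closing a loop of three A residues.
hairpin_pairs = nussinov("GGGAAACCC")
```

The recursion fills an n-by-n table in O(n^3) time. A traceback over the same table would recover the actual pairings; it is omitted here to keep the sketch short.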
DNA barcoding Indian freshwater fishes.

PubMed

Lakra, Wazir Singh; Singh, M; Goswami, Mukunda; Gopalakrishnan, A; Lal, K K; Mohindra, V; Sarkar, U K; Punia, P P; Singh, K V; Bhatt, J P; Ayyappan, S

2016-11-01

DNA barcoding is a promising technique for species identification using a short mitochondrial DNA sequence of the cytochrome c oxidase I (COI) gene. In the present study, DNA barcodes were generated from 72 species of freshwater fish covering the Orders Cypriniformes, Siluriformes, Perciformes, Synbranchiformes, and Osteoglossiformes representing 50 genera and 19 families. All the samples were collected from diverse sites except the species endemic to a particular location. Species were represented by multiple specimens in the great majority of the barcoded species. A total of 284 COI sequences were generated. After amplification and sequencing of a 700 base pair fragment of COI, primers were trimmed, which invariably generated a 655 base pair barcode sequence. The average Kimura two-parameter (K2P) distances within species, genera, families, and orders were 0.40%, 9.60%, 13.10%, and 17.16%, respectively. DNA barcodes discriminated congeneric species without any confusion. The study strongly validated the efficiency of COI as an ideal marker for DNA barcoding of Indian freshwater fishes.
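The Kimura two-parameter distance used in this abstract has a closed form, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of sites that differ by a transition and by a transversion. The sketch below applies that formula to two pre-aligned sequences. Skipping sites that contain gaps or ambiguous symbols is an assumption of this example, not something stated in the study, and the function name is illustrative.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumed handling: ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent for the model.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Multiplying the returned value by 100 gives a percentage on the same scale as the within-species and within-genus figures quoted in the abstract.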
Comparative Genomics of Interreplichore Translocations in Bacteria: A Measure of Chromosome Topology?

PubMed

Khedkar, Supriya; Seshasayee, Aswin Sai Narain

2016-06-01

Genomes evolve not only in base sequence but also in terms of their architecture, defined by gene organization and chromosome topology. Whereas genome sequence data inform us about the changes in base sequences for a large variety of organisms, the study of chromosome topology is restricted to a few model organisms studied using microscopy and chromosome conformation capture techniques. Here, we exploit whole genome sequence data to study the link between gene organization and chromosome topology in bacteria. Using comparative genomics across ∼250 pairs of closely related bacteria we show that: (a) many organisms show a high degree of interreplichore translocations throughout the chromosome and not limited to the inversion-prone terminus (ter) or the origin of replication (oriC); (b) translocation maps may reflect chromosome topologies; and (c) symmetric interreplichore translocations do not disrupt the distance of a gene from oriC or affect gene expression states or strand biases in gene densities. In summary, we suggest that translocation maps might be a first line in defining a gross chromosome topology given a pair of closely related genome sequences. Copyright © 2016 Khedkar and Seshasayee.
SELEX and SHAPE reveal that sequence motifs and an extended hairpin in the 5' portion of Turnip crinkle virus satellite RNA C mediate fitness in plants.

PubMed

Bayne, Charlie F; Widawski, Max E; Gao, Feng; Masab, Mohammed H; Chattopadhyay, Maitreyi; Murawski, Allison M; Sansevere, Robert M; Lerner, Bryan D; Castillo, Rinaldys J; Griesman, Trevor; Fu, Jiantao; Hibben, Jennifer K; Garcia-Perez, Alma D; Simon, Anne E; Kushner, David B

2018-07-01

Noncoding RNAs use their sequence and/or structure to mediate function(s). The 5' portion (166 nt) of the 356-nt noncoding satellite RNA C (satC) of Turnip crinkle virus (TCV) was previously modeled to contain a central region with two stem-loops (H6 and H7) and a large connecting hairpin (H2). We now report that in vivo functional selection (SELEX) experiments assessing sequence/structure requirements in H2, H6, and H7 reveal that H6 loop sequence motifs were recovered at nonrandom rates and only some residues are proposed to base-pair with accessible complementary sequences within the 5' central region. In vitro SHAPE of SELEX winners indicates that the central region is heavily base-paired, such that along with the lower stem and H2 region, one extensive hairpin exists composing the entire 5' region. As these SELEX winners are highly fit, these characteristics facilitate satRNA amplification in association with TCV in plants. Copyright © 2018 Elsevier Inc. All rights reserved.
Comparative Genomics of Interreplichore Translocations in Bacteria: A Measure of Chromosome Topology?

PubMed Central

Khedkar, Supriya; Seshasayee, Aswin Sai Narain

2016-01-01

Genomes evolve not only in base sequence but also in terms of their architecture, defined by gene organization and chromosome topology. Whereas genome sequence data inform us about the changes in base sequences for a large variety of organisms, the study of chromosome topology is restricted to a few model organisms studied using microscopy and chromosome conformation capture techniques. Here, we exploit whole genome sequence data to study the link between gene organization and chromosome topology in bacteria. Using comparative genomics across ∼250 pairs of closely related bacteria we show that: (a) many organisms show a high degree of interreplichore translocations throughout the chromosome and not limited to the inversion-prone terminus (ter) or the origin of replication (oriC); (b) translocation maps may reflect chromosome topologies; and (c) symmetric interreplichore translocations do not disrupt the distance of a gene from oriC or affect gene expression states or strand biases in gene densities. In summary, we suggest that translocation maps might be a first line in defining a gross chromosome topology given a pair of closely related genome sequences. PMID:27172194

Effect of genome sequence on the force-induced unzipping of a DNA molecule.

PubMed

Singh, N; Singh, Y

2006-02-01

We considered a dsDNA polymer in which the distribution of bases is random at the base pair level but ordered at a length of 18 base pairs, and calculated its force-elongation behaviour in the constant extension ensemble. The unzipping force F(y) vs. extension y is found to have a series of maxima and minima. By changing base pairs at selected places in the molecule we calculated the change in the F(y) curve and found that the change in the value of the force is of the order of a few pN and that the range of the effect, depending on the temperature, can spread over several base pairs. We have also discussed briefly how to calculate, in the constant force ensemble, a pause or a jump in the extension-time curve from the knowledge of F(y).
Chemical probes of the conformation of DNA modified by cis-diamminedichloroplatinum(II)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Marrot, L.; Leng, M.

The purpose of this work was to analyze at the nucleotide level the distortions induced by the binding of cis-diamminedichloroplatinum(II) (cis-DDP) to DNA by means of chemical probes. In order to test the chemical probes, experiments were first carried out on two platinated oligonucleotides. It has been verified by circular dichroism and gel electrophoresis that the binding of cis-DDP to an AG or to a GTG site within a double-stranded oligonucleotide distorts the double helix. The reactivity of the oligonucleotide platinated at the GTG site with chloroacetaldehyde, diethyl pyrocarbonate, and osmium tetraoxide, respectively, suggests a local denaturation of the double helix. The 5'G residue and the T residue within the adduct are no longer paired, while the 3'G residue is paired. The double helix is more distorted (but not denatured) at the 5' side of the adduct than at the 3' side. The reactivities of the chemical probes with six platinated DNA restriction fragments show that even at a relatively high level of platination only a few base pairs are unpaired but the double helix is largely distorted. No local denaturation has been detected at the GG sites separated from the nearest GG or AG sites by at least three base pairs. The AG sites separated from the nearest AG or GG sites by at least three base pairs do not denature the double helix locally when they are in the sequences puAG/pyTC. It is suggested that the distortion within these sequences is induced by adducts located further away along the DNA fragments, these sequences not being the major sites for the binding of cis-DDP.
Next generation sequencing applications for microRNA biomarker discovery in toxicological studies

EPA Science Inventory

Next Generation Sequencing (NGS) technology will be reviewed for its base pair resolution, wide dynamic range, and insights into the genome and transcriptome, with special focus upon the biomarker potential of microRNAs (miRNAs). The first part of this presentation reviews commo...
Crystallization and preliminary X-ray diffraction analysis of the Bacillus subtilis replication termination protein in complex with the 37-base-pair TerI-binding site

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vivian, J. P.; Porter, C.; Wilce, J. A.

2006-11-01

A preparation of replication terminator protein (RTP) of B. subtilis and a 37-base-pair TerI sequence (comprising two binding sites for RTP) has been purified and crystallized. The replication terminator protein (RTP) of Bacillus subtilis binds to specific DNA sequences that halt the progression of the replisome in a polar manner. These terminator complexes flank a defined region of the chromosome into which they allow replication forks to enter but not exit. Forcing the fusion of replication forks in a specific zone is thought to allow the coordination of post-replicative processes. The functional terminator complex comprises two homodimers each of 29 kDa bound to overlapping binding sites. A preparation of RTP and a 37-base-pair TerI sequence (comprising two binding sites for RTP) has been purified and crystallized. A data set to 3.9 Å resolution with 97.0% completeness and an R_sym of 12% was collected from a single flash-cooled crystal using synchrotron radiation. The diffraction data are consistent with space group P622, with unit-cell parameters a = b = 118.8, c = 142.6 Å.
Optical properties and electronic transitions of DNA oligonucleotides as a function of composition and stacking sequence.

PubMed

Schimelman, Jacob B; Dryden, Daniel M; Poudel, Lokendra; Krawiec, Katherine E; Ma, Yingfang; Podgornik, Rudolf; Parsegian, V Adrian; Denoyer, Linda K; Ching, Wai-Yim; Steinmetz, Nicole F; French, Roger H

2015-02-14

The role of base pair composition and stacking sequence in the optical properties and electronic transitions of DNA is of fundamental interest. We present and compare the optical properties of DNA oligonucleotides (AT)10, (AT)5(GC)5, and (AT-GC)5 using both ab initio methods and UV-vis molar absorbance measurements. Our data indicate a strong dependence of both the position and intensity of UV absorbance features on oligonucleotide composition and stacking sequence. The partial densities of states for each oligonucleotide indicate that the valence band edge arises from a feature associated with the PO4(3-) complex anion, and the conduction band edge arises from anti-bonding states in DNA base pairs. The results show a strong correspondence between the ab initio and experimentally determined optical properties. These results highlight the benefit of full spectral analysis of DNA, as opposed to reductive methods that consider only the 260 nm absorbance (A260) or simple purity ratios, such as A260/A230 or A260/A280, and suggest that the slope of the absorption edge onset may provide a useful metric for the degree of base pair stacking in DNA. These insights may prove useful for applications in biology, bioelectronics, and mesoscale self-assembly.
Breathing dynamics based parameter sensitivity analysis of hetero-polymeric DNA

DOE Office of Scientific and Technical Information (OSTI.GOV)

Talukder, Srijeeta; Sen, Shrabani; Chaudhury, Pinaki, E-mail: pinakc@rediffmail.com

We study the parameter sensitivity of hetero-polymeric DNA within the purview of DNA breathing dynamics. The degree of correlation between the mean bubble size and the model parameters is estimated for this purpose for three different DNA sequences. The analysis leads us to a better understanding of the sequence dependent nature of the breathing dynamics of hetero-polymeric DNA. Out of the 14 model parameters for DNA stability in the statistical Poland-Scheraga approach, the hydrogen bond interaction ε_hb(AT) for an AT base pair and the ring factor ξ turn out to be the most sensitive parameters. In addition, the stacking interaction ε_st(TA-TA) for a TA-TA nearest neighbor pair of base-pairs is found to be the most sensitive one among all stacking interactions. Moreover, we also establish that the nature of stacking interaction has a deciding effect on the DNA breathing dynamics, not the number of times a particular stacking interaction appears in a sequence. We show that the sensitivity analysis can be used as an effective measure to guide a stochastic optimization technique to find the kinetic rate constants related to the dynamics as opposed to the case where the rate constants are measured using the conventional unbiased way of optimization.
MSuPDA: A memory efficient algorithm for sequence alignment.

PubMed

Khan, Mohammad Ibrahim; Kamal, Md Sarwar; Chowdhury, Linkon

2015-01-16

Space complexity is a million dollar question in DNA sequence alignments. In this regard, MSuPDA (Memory Saving under Pushdown Automata) can help to reduce the space occupied in computer memory. In our proposed process, an Anchor Seed (AS) is selected from a given data set of nucleotide base pairs for local sequence alignment. Quick Splitting (QS) techniques separate the Anchor Seed from all the DNA genome segments. The selected Anchor Seed is placed in the input unit of the pushdown automaton (PDA), and the whole DNA genome segments are placed on the PDA's stack. The Anchor Seed from the input unit is matched against the DNA genome segments on the stack. Whatever the outcome (match, mismatch, or indel), the nucleotides are popped from the stack under the control of the PDA's control unit. Each POP operation on the stack frees the memory cell occupied by the nucleotide base pair.
Genetic code, hamming distance and stochastic matrices.

PubMed

He, Matthew X; Petoukhov, Sergei V; Ricci, Paolo E

2004-09-01

In this paper we use the Gray code representation of the genetic code C=00, U=10, G=11 and A=01 (C pairs with G, A pairs with U) to generate a sequence of genetic code-based matrices. In connection with these code-based matrices, we use the Hamming distance to generate a sequence of numerical matrices. We then further investigate the properties of the numerical matrices and show that they are doubly stochastic and symmetric. We determine the frequency distributions of the Hamming distances, building blocks of the matrices, decomposition and iterations of matrices. We present an explicit decomposition formula for the genetic code-based matrix in terms of permutation matrices, which provides a hypercube representation of the genetic code. It is also observed that there is a Hamiltonian cycle in a genetic code-based hypercube.
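The Gray code assignment in this record (C=00, U=10, G=11, A=01) can be explored with a short Python sketch. It builds the matrix of pairwise Hamming distances between all n-letter words and checks that, after dividing by the common row sum, the matrix is symmetric and doubly stochastic. Indexing the matrix by all n-plets in lexicographic order is an assumption made here for illustration. The paper's own matrix arrangement is not given in the abstract and may differ, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

GRAY = {"C": "00", "U": "10", "G": "11", "A": "01"}  # encoding from the abstract


def encode(word):
    """Concatenate the 2-bit Gray codes of each nucleotide in the word."""
    return "".join(GRAY[base] for base in word)


def hamming(x, y):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(a != b for a, b in zip(x, y))


def distance_matrix(n):
    """Hamming distances between all 4**n words of length n (assumed layout)."""
    words = ["".join(p) for p in product("CUGA", repeat=n)]
    codes = [encode(w) for w in words]
    return words, [[hamming(a, b) for b in codes] for a in codes]


def normalise(matrix):
    """Divide every entry by the first row sum, using exact fractions."""
    total = sum(matrix[0])
    return [[Fraction(v, total) for v in row] for row in matrix]


def is_symmetric(matrix):
    size = len(matrix)
    return all(matrix[i][j] == matrix[j][i]
               for i in range(size) for j in range(size))


def is_doubly_stochastic(matrix):
    """Non-negative entries with every row and every column summing to 1."""
    rows_ok = all(sum(row) == 1 for row in matrix)
    cols_ok = all(sum(col) == 1 for col in zip(*matrix))
    return rows_ok and cols_ok and all(v >= 0 for row in matrix for v in row)
```

For single nucleotides the complementary pairs C-G and A-U sit at the maximum distance of 2, and every row of the 4 x 4 matrix sums to 4, which is why dividing by the row sum yields a doubly stochastic matrix under this layout.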
Spectra library assisted de novo peptide sequencing for HCD and ETD spectra pairs.

PubMed

Yan, Yan; Zhang, Kaizhong

2016-12-23

De novo peptide sequencing via tandem mass spectrometry (MS/MS) has developed rapidly in recent years. With the use of spectra pairs from the same peptide under different fragmentation modes, the performance of de novo sequencing is greatly improved. Currently, with a large number of spectra sequenced every day, spectra libraries containing tens of thousands of annotated experimental MS/MS spectra have become available. These libraries provide information on the spectra properties and thus have the potential to be used with de novo sequencing to improve its performance. In this study, an improved de novo sequencing method assisted by a spectra library is proposed. It uses spectra libraries as training datasets and introduces significant scores of the features used in our previous de novo sequencing method for HCD and ETD spectra pairs. Two pairs of HCD and ETD spectral datasets were used to test the performance of the proposed method and our previous method. The results show that the proposed method achieves better sequencing accuracy, with higher ranked correct sequences and less computational time. This paper proposes an advanced de novo sequencing method for HCD and ETD spectra pairs that uses information from spectra libraries and significantly improves on previous similar methods.
SOPRA: Scaffolding algorithm for paired reads via statistical optimization.

PubMed

Dayarian, Adel; Michael, Todd P; Sengupta, Anirvan M

2010-06-24

High throughput sequencing (HTS) platforms produce gigabases of short read (<100 bp) data per run. While these short reads are adequate for resequencing applications, de novo assembly of moderate size genomes from such reads remains a significant challenge. These limitations could be partially overcome by utilizing mate pair technology, which provides pairs of short reads separated by a known distance along the genome. We have developed SOPRA, a tool designed to exploit the mate pair/paired-end information for assembly of short reads. The main focus of the algorithm is selecting a sufficiently large subset of simultaneously satisfiable mate pair constraints to achieve a balance between the size and the quality of the output scaffolds. Scaffold assembly is presented as an optimization problem for variables associated with vertices and with edges of the contig connectivity graph. Vertices of this graph are individual contigs with edges drawn between contigs connected by mate pairs. Similar graph problems have been invoked in the context of shotgun sequencing and scaffold building for previous generation of sequencing projects. However, given the error-prone nature of HTS data and the fundamental limitations from the shortness of the reads, the ad hoc greedy algorithms used in the earlier studies are likely to lead to poor quality results in the current context. SOPRA circumvents this problem by treating all the constraints on equal footing for solving the optimization problem, the solution itself indicating the problematic constraints (chimeric/repetitive contigs, etc.) to be removed. The process of solving and removing of constraints is iterated till one reaches a core set of consistent constraints. For SOLiD sequencer data, SOPRA uses a dynamic programming approach to robustly translate the color-space assembly to base-space. For assessing the quality of an assembly, we report the no-match/mismatch error rate as well as the rates of various rearrangement errors. 
Applying SOPRA to real data from bacterial genomes, we were able to assemble contigs into scaffolds of significant length (N50 up to 200 Kb) with very few errors introduced in the process. In general, the methodology presented here will allow better scaffold assemblies of any type of mate pair sequencing data.
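The N50 statistic quoted for the SOPRA scaffolds is commonly defined as the length L such that scaffolds of length L or longer contain at least half of the total assembled bases. A minimal Python sketch of that definition follows. It is not taken from SOPRA, and the function name is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For example, scaffold lengths of 80, 70, 50, 40, 30 and 20 kb total 290 kb, and the two longest scaffolds already cover 150 kb, so `n50` returns 70.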
PRISE2: software for designing sequence-selective PCR primers and probes.

PubMed

Huang, Yu-Ting; Yang, Jiue-in; Chrobak, Marek; Borneman, James

2014-09-25

PRISE2 is a new software tool for designing sequence-selective PCR primers and probes. To achieve high level of selectivity, PRISE2 allows the user to specify a collection of target sequences that the primers are supposed to amplify, as well as non-target sequences that should not be amplified. The program emphasizes primer selectivity on the 3' end, which is crucial for selective amplification of conserved sequences such as rRNA genes. In PRISE2, users can specify desired properties of primers, including length, GC content, and others. They can interactively manipulate the list of candidate primers, to choose primer pairs that are best suited for their needs. A similar process is used to add probes to selected primer pairs. More advanced features include, for example, the capability to define a custom mismatch penalty function. PRISE2 is equipped with a graphical, user-friendly interface, and it runs on Windows, Macintosh or Linux machines. PRISE2 has been tested on two very similar strains of the fungus Dactylella oviparasitica, and it was able to create highly selective primers and probes for each of them, demonstrating the ability to create useful sequence-selective assays. PRISE2 is a user-friendly, interactive software package that can be used to design high-quality selective primers for PCR experiments. In addition to choosing primers, users have an option to add a probe to any selected primer pair, enabling design of Taqman and other primer-probe based assays. PRISE2 can also be used to design probes for FISH and other hybridization-based assays.
Complete chloroplast genome sequence and comparative analysis of loblolly pine (Pinus taeda L.) with related species

PubMed Central

Khan, Abdul Latif; Khan, Muhammad Aaqil; Shahzad, Raheem; Lubna; Kang, Sang Mo; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

2018-01-01

Pinaceae, the largest family of conifers, has a diversified organization of chloroplast (cp) genomes with two typical highly reduced inverted repeats (IRs). In the current study, we determined the complete sequence of the cp genome of an economically and ecologically important conifer tree, the loblolly pine (Pinus taeda L.), using Illumina paired-end sequencing and compared the sequence with those of other pine species. The results revealed a genome size of 121,531 base pairs (bp) containing a pair of 830-bp IR regions, distinguished by a small single copy (42,258 bp) and large single copy (77,614 bp) region. The chloroplast genome of P. taeda encodes 120 genes, comprising 81 protein-coding genes, four ribosomal RNA genes, and 35 tRNA genes, with 151 randomly distributed microsatellites. Approximately 6 palindromic, 34 forward, and 22 tandem repeats were found in the P. taeda cp genome. Whole cp genome comparison with those of other Pinus species exhibited an overall high degree of sequence similarity, with some divergence in intergenic spacers. Higher and lower numbers of indels and single-nucleotide polymorphism substitutions were observed relative to P. contorta and P. monophylla, respectively. Phylogenomic analyses based on the complete genome sequence revealed that 60 shared genes generated trees with the same topologies, and P. taeda was closely related to P. contorta in the subgenus Pinus. Thus, the complete P. taeda genome provided valuable resources for population and evolutionary studies of gymnosperms and can be used to identify related species. PMID:29596414
Sequence specificity of mutagen-nucleic acid complexes in solution: intercalation and mutagen-base pair overlap geometries for proflavine binding to dC-dC-dG-dG and dG-dG-dC-dC self-complementary duplexes.

PubMed

Patel, D J; Canuel, L L

1977-07-01

The complex formed between the mutagen proflavine and the dC-dC-dG-dG and dG-dG-dC-dC self-complementary tetranucleotide duplexes has been monitored by proton high resolution nuclear magnetic resonance spectroscopy in 0.1 M phosphate solution at high nucleotide/drug ratios. The large upfield shifts (0.5 to 0.85 ppm) observed at all the proflavine ring nonexchangeable protons on complex formation are consistent with intercalation of the mutagen between base pairs of the tetranucleotide duplex. We have proposed an approximate overlap geometry between the proflavine ring and nearest neighbor base pairs at the intercalation site from a comparison between experimental shifts and those calculated for various stacking orientations. We have compared the binding of actinomycin D, propidium diiodide, and proflavine to self-complementary tetranucleotide sequences dC-dC-dG-dG and dG-dG-dC-dC by UV absorbance changes in the drug bands between 400 and 500 nm. Actinomycin D exhibits a pronounced specificity for sequences with dG-dC sites (dG-dG-dC-dC), while propidium diiodide and proflavine exhibit a specificity for sequences with dC-dG sites (dC-dC-dG-dG). Actinomycin D binds more strongly than propidium diiodide and proflavine to dC-dG-dC-dG (contains dC-dG and dG-dC binding sites), indicative of the additional stabilization from hydrogen bonding and hydrophobic interactions between the pentapeptide lactone rings of actinomycin D and the base pair edges and sugar-phosphate backbone of the tetranucleotide duplex.
Sequence specificity of mutagen-nucleic acid complexes in solution: Intercalation and mutagen-base pair overlap geometries for proflavine binding to dC-dC-dG-dG and dG-dG-dC-dC self-complementary duplexes

PubMed Central

Patel, Dinshaw J.; Canuel, Lita L.

1977-01-01

The complex formed between the mutagen proflavine and the dC-dC-dG-dG and dG-dG-dC-dC self-complementary tetranucleotide duplexes has been monitored by proton high resolution nuclear magnetic resonance spectroscopy in 0.1 M phosphate solution at high nucleotide/drug ratios. The large upfield shifts (0.5 to 0.85 ppm) observed at all the proflavine ring nonexchangeable protons on complex formation are consistent with intercalation of the mutagen between base pairs of the tetranucleotide duplex. We have proposed an approximate overlap geometry between the proflavine ring and nearest neighbor base pairs at the intercalation site from a comparison between experimental shifts and those calculated for various stacking orientations. We have compared the binding of actinomycin D, propidium diiodide, and proflavine to self-complementary tetranucleotide sequences dC-dC-dG-dG and dG-dG-dC-dC by UV absorbance changes in the drug bands between 400 and 500 nm. Actinomycin D exhibits a pronounced specificity for sequences with dG-dC sites (dG-dG-dC-dC), while propidium diiodide and proflavine exhibit a specificity for sequences with dC-dG sites (dC-dC-dG-dG). Actinomycin D binds more strongly than propidium diiodide and proflavine to dC-dG-dC-dG (contains dC-dG and dG-dC binding sites), indicative of the additional stabilization from hydrogen bonding and hydrophobic interactions between the pentapeptide lactone rings of actinomycin D and the base pair edges and sugar-phosphate backbone of the tetranucleotide duplex. PMID:268613
Phylogenetic species identification in Rattus highlights rapid radiation and morphological similarity of New Guinean species.

PubMed

Robins, Judith H; Tintinger, Vernon; Aplin, Ken P; Hingston, Melanie; Matisoo-Smith, Elizabeth; Penny, David; Lavery, Shane D

2014-01-01

The genus Rattus is highly speciose, the taxonomy is complex, and individuals are often difficult to identify to the species level. Previous studies have demonstrated the usefulness of phylogenetic approaches to identification in Rattus but some species, especially among the endemics of the New Guinean region, showed poor resolution. Possible reasons for this are simple misidentification, incomplete gene lineage sorting, hybridization, and phylogenetically distinct lineages that are unrecognised taxonomically. To assess these explanations we analysed 217 samples, representing nominally 25 Rattus species, collected in New Guinea, Asia, Australia and the Pacific. To reduce misidentification problems we sequenced museum specimens from earlier morphological studies and recently collected tissues from samples with associated voucher specimens. We also reassessed vouchers from previously sequenced specimens. We inferred combined and separate phylogenies from two mitochondrial DNA regions comprising 550 base pair D-loop sequences and both long (655 base pair) and short (150 base pair) cytochrome oxidase I sequences. Our phylogenetic species identification for 17 species was consistent with morphological designations and current taxonomy thus reinforcing the usefulness of this approach. We reduced misidentifications and consequently the number of polyphyletic species in our phylogenies but the New Guinean Rattus clades still exhibited considerable complexity. Only three of our eight New Guinean species were monophyletic. We found good evidence for either incomplete mitochondrial lineage sorting or hybridization between species within two pairs, R. leucopus/R. cf. verecundus and R. steini/R. praetor. Additionally, our results showed that R. praetor, R. niobe and R. verecundus each likely encompass more than one species. 
Our study clearly points to the need for a revised taxonomy of the rats of New Guinea, based on broader sampling and informed by both morphology and phylogenetics. The remaining taxonomic complexity highlights the recent and rapid radiation of Rattus in the Australo-Papuan region.
Phylogenetic Species Identification in Rattus Highlights Rapid Radiation and Morphological Similarity of New Guinean Species

PubMed Central

Robins, Judith H.; Tintinger, Vernon; Aplin, Ken P.; Hingston, Melanie; Matisoo-Smith, Elizabeth; Penny, David; Lavery, Shane D.

2014-01-01

The genus Rattus is highly speciose, the taxonomy is complex, and individuals are often difficult to identify to the species level. Previous studies have demonstrated the usefulness of phylogenetic approaches to identification in Rattus but some species, especially among the endemics of the New Guinean region, showed poor resolution. Possible reasons for this are simple misidentification, incomplete gene lineage sorting, hybridization, and phylogenetically distinct lineages that are unrecognised taxonomically. To assess these explanations we analysed 217 samples, representing nominally 25 Rattus species, collected in New Guinea, Asia, Australia and the Pacific. To reduce misidentification problems we sequenced museum specimens from earlier morphological studies and recently collected tissues from samples with associated voucher specimens. We also reassessed vouchers from previously sequenced specimens. We inferred combined and separate phylogenies from two mitochondrial DNA regions comprising 550 base pair D-loop sequences and both long (655 base pair) and short (150 base pair) cytochrome oxidase I sequences. Our phylogenetic species identification for 17 species was consistent with morphological designations and current taxonomy thus reinforcing the usefulness of this approach. We reduced misidentifications and consequently the number of polyphyletic species in our phylogenies but the New Guinean Rattus clades still exhibited considerable complexity. Only three of our eight New Guinean species were monophyletic. We found good evidence for either incomplete mitochondrial lineage sorting or hybridization between species within two pairs, R. leucopus/R. cf. verecundus and R. steini/R. praetor. Additionally, our results showed that R. praetor, R. niobe and R. verecundus each likely encompass more than one species. 
Our study clearly points to the need for a revised taxonomy of the rats of New Guinea, based on broader sampling and informed by both morphology and phylogenetics. The remaining taxonomic complexity highlights the recent and rapid radiation of Rattus in the Australo-Papuan region. PMID:24865350
A Three-Dimensional RNA Motif in Potato spindle tuber viroid Mediates Trafficking from Palisade Mesophyll to Spongy Mesophyll in Nicotiana benthamiana

PubMed Central

Takeda, Ryuta; Petrov, Anton I.; Leontis, Neocles B.; Ding, Biao

2011-01-01

Cell-to-cell trafficking of RNA is an emerging biological principle that integrates systemic gene regulation, viral infection, antiviral response, and cell-to-cell communication. A key mechanistic question is how an RNA is specifically selected for trafficking from one type of cell into another type. Here, we report the identification of an RNA motif in Potato spindle tuber viroid (PSTVd) required for trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana leaves. This motif, called loop 6, has the sequence 5′-CGA-3′...5′-GAC-3′ flanked on both sides by cis Watson-Crick G/C and G/U wobble base pairs. We present a three-dimensional (3D) structural model of loop 6 that specifies all non-Watson-Crick base pair interactions, derived by isostericity-based sequence comparisons with 3D RNA motifs from the RNA x-ray crystal structure database. The model is supported by available chemical modification patterns, natural sequence conservation/variations in PSTVd isolates and related species, and functional characterization of all possible mutants for each of the loop 6 base pairs. Our findings and approaches have broad implications for studying the 3D RNA structural motifs mediating trafficking of diverse RNA species across specific cellular boundaries and for studying the structure-function relationships of RNA motifs in other biological processes. PMID:21258006
A three-dimensional RNA motif in Potato spindle tuber viroid mediates trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana.

PubMed

Takeda, Ryuta; Petrov, Anton I; Leontis, Neocles B; Ding, Biao

2011-01-01

Cell-to-cell trafficking of RNA is an emerging biological principle that integrates systemic gene regulation, viral infection, antiviral response, and cell-to-cell communication. A key mechanistic question is how an RNA is specifically selected for trafficking from one type of cell into another type. Here, we report the identification of an RNA motif in Potato spindle tuber viroid (PSTVd) required for trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana leaves. This motif, called loop 6, has the sequence 5'-CGA-3'...5'-GAC-3' flanked on both sides by cis Watson-Crick G/C and G/U wobble base pairs. We present a three-dimensional (3D) structural model of loop 6 that specifies all non-Watson-Crick base pair interactions, derived by isostericity-based sequence comparisons with 3D RNA motifs from the RNA x-ray crystal structure database. The model is supported by available chemical modification patterns, natural sequence conservation/variations in PSTVd isolates and related species, and functional characterization of all possible mutants for each of the loop 6 base pairs. Our findings and approaches have broad implications for studying the 3D RNA structural motifs mediating trafficking of diverse RNA species across specific cellular boundaries and for studying the structure-function relationships of RNA motifs in other biological processes.
Pseudomonas sp. strain CA5 (a selenite-reducing bacterium) 16S rRNA gene complete sequence. National Institute of Health, National Center for Biotechnology Information, GenBank sequence. Accession FJ422810.1.

USDA-ARS's Scientific Manuscript database

This study used 1321 base pair 16S rRNA gene sequence methods to confirm the phylogenetic position of a soil isolate as a bacterium belonging to the genus Pseudomonas. Morphological, biochemical characteristics, and fatty acid profiles are consistent with the 16S rRNA gene sequence identification...
Prediction of RNA secondary structures: from theory to models and real molecules

NASA Astrophysics Data System (ADS)

Schuster, Peter

2006-05-01

RNA secondary structures are derived from RNA sequences, which are strings built from the natural four letter nucleotide alphabet, {AUGC}. These coarse-grained structures, in turn, are tantamount to constrained strings over a three letter alphabet. Hence, the secondary structures are discrete objects and the number of sequences always exceeds the number of structures. The sequences built from two letter alphabets form perfect structures when the nucleotides can form a base pair, as is the case with {GC} or {AU}, but the relation between the sequences and structures differs strongly from the four letter alphabet. A comprehensive theory of RNA structure is presented, which is based on the concepts of sequence space and shape space, being a space of structures. It sets the stage for modelling processes in ensembles of RNA molecules like evolutionary optimization or kinetic folding as dynamical phenomena guided by mappings between the two spaces. The number of minimum free energy (mfe) structures is always smaller than the number of sequences, even for two letter alphabets. Folding of RNA molecules into mfe structures constitutes a non-invertible mapping from sequence space onto shape space. The preimage of a structure in sequence space is defined as its neutral network. Similarly the set of suboptimal structures is the preimage of a sequence in shape space. This set represents the conformation space of a given sequence. The evolutionary optimization of structures in populations is a process taking place in sequence space, whereas kinetic folding occurs in molecular ensembles that optimize free energy in conformation space. Efficient folding algorithms based on dynamic programming are available for the prediction of secondary structures for given sequences. The inverse problem, the computation of sequences for predefined structures, is an important tool for the design of RNA molecules with tailored properties.
Simultaneous folding or cofolding of two or more RNA molecules can be modelled readily at the secondary structure level and allows prediction of the most stable (mfe) conformations of complexes together with suboptimal states. Cofolding algorithms are important tools for efficient and highly specific primer design in the polymerase chain reaction (PCR) and help to explain the mechanisms of small interference RNA (si-RNA) molecules in gene regulation. The evolutionary optimization of RNA structures is illustrated by the search for a target structure and mimics aptamer selection in evolutionary biotechnology. It occurs typically in steps consisting of short adaptive phases interrupted by long epochs of little or no obvious progress in optimization. During these quasi-stationary epochs the populations are essentially confined to neutral networks where they search for sequences that allow a continuation of the adaptive process. Modelling RNA evolution as a simultaneous process in sequence and shape space provides answers to questions of the optimal population size and mutation rates. Kinetic folding is a stochastic process in conformation space. Exact solutions are derived by direct simulation in the form of trajectory sampling or by solving the master equation. The exact solutions can be approximated straightforwardly by Arrhenius kinetics on barrier trees, which represent simplified versions of conformational energy landscapes. The existence of at least one sequence forming any arbitrarily chosen pair of structures is granted by the intersection theorem. Folding kinetics is the key to understanding and designing multistable RNA molecules or RNA switches. These RNAs form two or more long lived conformations, and conformational changes occur either spontaneously or are induced through binding of small molecules or other biopolymers. RNA switches are found in nature where they act as elements in genetic and metabolic regulation. 
The reliability of RNA secondary structure prediction is limited by the accuracy with which the empirical parameters can be determined and by principal deficiencies, for example by the lack of energy contributions resulting from tertiary interactions. In addition, native structures may be determined by folding kinetics rather than by thermodynamics. We address the first problem by considering base pair probabilities or base pairing entropies, which are derived from the partition function of conformations. A high base pair probability corresponding to a low pairing entropy is taken as an indicator of a high reliability of prediction. Pseudoknots are discussed as an example of a tertiary interaction that is highly important for RNA function. Moreover, pseudoknot formation is readily incorporated into structure prediction algorithms. Some examples of experimental data on RNA secondary structures that are readily explained using the landscape concept are presented. They deal with (i) properties of RNA molecules with random sequences, (ii) RNA molecules from restricted alphabets, (iii) existence of neutral networks, (iv) shape space covering, (v) riboswitches and (vi) evolution of non-coding RNAs as an example of evolution restricted to neutral networks.
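As a small illustration of the "constrained strings over a three letter alphabet" mentioned in the abstract above, the sketch below assumes the widely used dot-bracket notation, in which "(" and ")" mark the two partners of a base pair and "." marks an unpaired nucleotide. The function name `parse_dot_bracket` is hypothetical and is not taken from the paper; the constraint it checks is simply that brackets must be balanced.

```python
def parse_dot_bracket(structure):
    """Return the base pairs (i, j), 0-based, encoded by a dot-bracket string.

    Assumption: the three-letter alphabet is '(', ')' and '.', and a valid
    structure is one whose brackets are balanced (no pseudoknots).
    """
    open_positions = []  # stack of positions of unmatched '('
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unmatched ')' at position %d" % pos)
            pairs.append((open_positions.pop(), pos))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r at position %d" % (symbol, pos))
    if open_positions:
        raise ValueError("unmatched '(' at position %d" % open_positions[-1])
    return sorted(pairs)


if __name__ == "__main__":
    # A hairpin: four stacked pairs closing a three-nucleotide loop.
    print(parse_dot_bracket("((((...))))"))
```

Many different sequences can be compatible with the same bracket string, which is one way to picture why the mapping from sequences to structures is many-to-one.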

Electrostatics Explains the Position-Dependent Effect of G⋅U Wobble Base Pairs on the Affinity of RNA Kissing Complexes.

PubMed

Abi-Ghanem, Josephine; Rabin, Clémence; Porrini, Massimiliano; Dausse, Eric; Toulmé, Jean-Jacques; Gabelica, Valérie

2017-10-06

In the RNA realm, non-Watson-Crick base pairs are abundant and can affect both the RNA 3D structure and its function. Here, we investigated the formation of RNA kissing complexes in which the loop-loop interaction is modulated by non-Watson-Crick pairs. Mass spectrometry, surface plasmon resonance, and UV-melting experiments show that the G⋅U wobble base pair favors kissing complex formation only when placed at specific positions. We tried to rationalize this effect by molecular modeling, including molecular mechanics Poisson-Boltzmann surface area (MMPBSA) thermodynamics calculations and PBSA calculations of the electrostatic potential surfaces. Modeling reveals that the G⋅U stabilization is due to a specific electrostatic environment defined by the base pairs of the entire loop-loop region. The loop is not symmetric, and therefore the identity and position of each base pair matters. Predicting and visualizing the electrostatic environment created by a given sequence can help to design specific kissing complexes with high affinity, for potential therapeutic, nanotechnology or analytical applications. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
De novo peptide sequencing using CID and HCD spectra pairs.

PubMed

Yan, Yan; Kusalik, Anthony J; Wu, Fang-Xiang

2016-10-01

In tandem mass spectrometry (MS/MS), there are several different fragmentation techniques possible, including collision-induced dissociation (CID), higher energy collisional dissociation (HCD), electron-capture dissociation (ECD), and electron transfer dissociation (ETD). When using pairs of spectra for de novo peptide sequencing, the most popular methods are designed for CID (or HCD) and ECD (or ETD) spectra because of the complementarity between them. Less attention has been paid to the use of CID and HCD spectra pairs. In this study, a new de novo peptide sequencing method is proposed for these spectra pairs. This method includes a CID and HCD spectra merging criterion and a parent mass correction step, along with improvements to our previously proposed algorithm for sequencing merged spectra. Three pairs of spectral datasets were used to investigate and compare the performance of the proposed method with other existing methods designed for single spectrum (HCD or CID) sequencing. Experimental results showed that full-length peptide sequencing accuracy was increased significantly by using spectra pairs in the proposed method, with the highest accuracy reaching 81.31%. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Highly Stable Double-Stranded DNA Containing Sequential Silver(I)-Mediated 7-Deazaadenine/Thymine Watson-Crick Base Pairs.

PubMed

Santamaría-Díaz, Noelia; Méndez-Arriaga, José M; Salas, Juan M; Galindo, Miguel A

2016-05-17

The oligonucleotide d(TX)9, which consists of an octadecamer sequence with alternating non-canonical 7-deazaadenine (X) and canonical thymine (T) as the nucleobases, was synthesized and shown to hybridize into double-stranded DNA through the formation of hydrogen-bonded Watson-Crick base pairs. dsDNA with metal-mediated base pairs was then obtained by selectively replacing W-C hydrogen bonds by coordination bonds to central silver(I) ions. The oligonucleotide I adopts a duplex structure in the absence of Ag(+) ions, and its stability is significantly enhanced in the presence of Ag(+) ions while its double-helix structure is retained. Temperature-dependent UV spectroscopy, circular dichroism spectroscopy, and ESI mass spectrometry were used to confirm the selective formation of the silver(I)-mediated base pairs. This strategy could become useful for preparing stable metallo-DNA-based nanostructures. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Sequence periodicity in nucleosomal DNA and intrinsic curvature.

PubMed

Nair, T Murlidharan

2010-05-17

Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy the more rigid the DNA helix, thus it is natural to expect that sequences that are involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understand their sequence-dependent structures. Higher order DNA structures and the distribution of molecular bend loci associated with 146 base nucleosome core DNA sequence from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. The higher order structures associated with nucleosomes of C. elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results point to the possibility of context dependent curvature of varying degrees to be associated with nucleosomal DNA.
Sensitivity to structure in action sequences: An infant event-related potential study.

PubMed

Monroy, Claire D; Gerson, Sarah A; Domínguez-Martínez, Estefanía; Kaduk, Katharina; Hunnius, Sabine; Reid, Vincent

2017-05-06

Infants are sensitive to structure and patterns within continuous streams of sensory input. This sensitivity relies on statistical learning, the ability to detect predictable regularities in spatial and temporal sequences. Recent evidence has shown that infants can detect statistical regularities in action sequences they observe, but little is known about the neural processes that give rise to this ability. In the current experiment, we combined electroencephalography (EEG) with eye-tracking to identify electrophysiological markers that indicate whether 8-11-month-old infants detect violations to learned regularities in action sequences, and to relate these markers to behavioral measures of anticipation during learning. In a learning phase, infants observed an actor performing a sequence featuring two deterministic pairs embedded within an otherwise random sequence. Thus, the first action of each pair was predictive of what would occur next. One of the pairs caused an action-effect, whereas the second did not. In a subsequent test phase, infants observed another sequence that included deviant pairs, violating the previously observed action pairs. Event-related potential (ERP) responses were analyzed and compared between the deviant and the original action pairs. Findings reveal that infants demonstrated a greater Negative central (Nc) ERP response to the deviant actions for the pair that caused the action-effect, which was consistent with their visual anticipations during the learning phase. Findings are discussed in terms of the neural and behavioral processes underlying perception and learning of structured action sequences. Copyright © 2017 Elsevier Ltd. All rights reserved.
Nearly complete rRNA genes assembled from across the metazoan animals: effects of more taxa, a structure-based alignment, and paired-sites evolutionary models on phylogeny reconstruction.

PubMed

Mallatt, Jon; Craig, Catherine Waggoner; Yoder, Matthew J

2010-04-01

This study (1) uses nearly complete rRNA-gene sequences from across Metazoa (197 taxa) to reconstruct animal phylogeny; (2) presents a highly annotated, manual alignment of these sequences with special reference to rRNA features including paired sites (http://purl.oclc.org/NET/rRNA/Metazoan_alignment) and (3) tests, after eliminating as few disruptive, rogue sequences as possible, if a likelihood framework can recover the main metazoan clades. We found that systematic elimination of approximately 6% of the sequences, including the divergent or unstably placed sequences of cephalopods, arrowworm, symphylan and pauropod myriapods, and of myzostomid and nemertodermatid worms, led to a tree that supported Ecdysozoa, Lophotrochozoa, Protostomia, and Bilateria. Deuterostomia, however, was never recovered, because the rRNA of urochordates goes (nonsignificantly) near the base of the Bilateria. Counterintuitively, when we modeled the evolution of the paired sites, phylogenetic resolution was not increased over traditional tree-building models that assume all sites in rRNA evolve independently. The rRNA genes of non-bilaterians contain a higher % AT than do those of most bilaterians. The rRNA genes of Acoela and Myzostomida were found to be secondarily shortened, AT-enriched, and highly modified, throwing some doubt on the location of these worms at the base of Bilateria in the rRNA tree--especially myzostomids, which other evidence suggests are annelids instead. Other findings are marsupial-with-placental mammals, arrowworms in Ecdysozoa (well supported here but contradicted by morphology), and Placozoa as sister to Cnidaria. Finally, despite the difficulties, the rRNA-gene trees are in strong concordance with trees derived from multiple protein-coding genes in supporting the new animal phylogeny. (c) 2009 Elsevier Inc. All rights reserved.
Identification and characterization of gene-based SSR markers in date palm (Phoenix dactylifera L.).

PubMed

Zhao, Yongli; Williams, Roxanne; Prakash, C S; He, Guohao

2012-12-15

Date palm (Phoenix dactylifera L.) is an important tree in the Middle East and North Africa due to the nutritional value of its fruit. Molecular breeding would accelerate genetic improvement of fruit trees through marker assisted selection. However, the lack of molecular markers in date palm restricts the application of molecular breeding. In this study, we analyzed 28,889 EST sequences from the date palm genome database to identify simple-sequence repeats (SSRs) and to develop gene-based markers, i.e. expressed sequence tag-SSRs (EST-SSRs). We identified 4,609 ESTs as containing SSRs, among which trinucleotide motifs (69.7%) were the most common, followed by tetranucleotide (10.4%) and dinucleotide motifs (9.6%). The motif AG (85.7%) was most abundant in dinucleotides, while motifs AGG (26.8%), AAG (19.3%), and AGC (16.1%) were most common among trinucleotides. A total of 4,967 primer pairs were designed for EST-SSR markers from the computational data. In a follow-up laboratory study, we tested a sample of 20 randomly selected primer pairs for amplification and polymorphism detection using genomic DNA from date palm cultivars. Nearly one-third of these primer pairs detected DNA polymorphism to differentiate the twelve date palm cultivars used. Functional categorization of EST sequences containing SSRs revealed that 3,108 (67.4%) of such ESTs had homology with known proteins. Date palm EST sequences are a good resource for developing gene-based markers. The genic markers identified in our study may provide a valuable genetic and genomic tool for further genetic research and varietal development in date palm, such as diversity study, QTL mapping, and molecular breeding.
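The SSR screening step described in the abstract above can be pictured with a short regular-expression sketch. This is only an illustration under stated assumptions: the abstract does not name the software or the repeat thresholds used, so the function `find_ssrs` and the minimum repeat counts (6 for dinucleotides, 5 for trinucleotides, 4 for tetranucleotides) are hypothetical choices.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def _is_shorter_repeat(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AGAG', 'AAA')."""
    length = len(motif)
    for unit in range(1, length):
        if length % unit == 0 and motif == motif[:unit] * (length // unit):
            return True
    return False


def find_ssrs(sequence, min_repeats=None):
    """Return (start, motif, repeat_count) for each perfect tandem repeat found."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if _is_shorter_repeat(motif):
                continue  # counted under its shorter unit, or a homopolymer run
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not real date palm data: an (AG)7 and an (AAG)5 repeat.
    est = "TTT" + "AG" * 7 + "CCGT" + "AAG" * 5 + "T"
    for start, motif, count in find_ssrs(est):
        print(start, motif, count)
```

Tallying the motifs returned over a set of ESTs would give motif frequencies of the kind reported in the abstract, although the exact percentages depend on the thresholds and on how compound or imperfect repeats are treated.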
Thermodynamics of triple helix formation: spectrophotometric studies on the d(A)10.2d(T)10 and d(C+3T4C+3).d(G3A4G3).d(C3T4C3) triple helices.

PubMed Central

Pilch, D S; Brousseau, R; Shafer, R H

1990-01-01

We have stabilized the d(A)10.2d(T)10 and d(C+3T4C+3).d(G3A4G3).d(C3T4C3) triple helices with either NaCl or MgCl2 at pH 5.5. UV mixing curves demonstrate a 1:2 stoichiometry of purine to pyrimidine strands under the appropriate conditions of pH and ionic strength. Circular dichroic titrations suggest a possible sequence-independent spectral signature for triplex formation. Thermal denaturation profiles indicate the initial loss of the third strand followed by dissociation of the underlying duplex with increasing temperature. Depending on the base sequence and ionic conditions, the binding affinity of the third strand for the duplex at 25 degrees C is two to five orders of magnitude lower than that of the two strands forming the duplex. Thermodynamic parameters for triplex formation were determined for both sequences in the presence of 50 mM MgCl2 and/or 2.0 M NaCl. Hoogsteen base pairs are 0.22-0.64 kcal/mole less stable than Watson-Crick base pairs, depending on ionic conditions and base composition. C+.G and T.A Hoogsteen base pairs appear to have similar stability in the presence of Mg2+ ions at low pH. PMID:2216768
Development of cleaved amplified polymorphic sequence markers and a CAPS-based genetic linkage map in watermelon (Citrullus lanatus [Thunb.] Matsum. and Nakai) constructed using whole-genome re-sequencing data

PubMed Central

Liu, Shi; Gao, Peng; Zhu, Qianglong; Luan, Feishi; Davis, Angela R.; Wang, Xiaolu

2016-01-01

Cleaved amplified polymorphic sequence (CAPS) markers are useful tools for detecting single nucleotide polymorphisms (SNPs). This study detected and converted SNP sites into CAPS markers based on high-throughput re-sequencing data in watermelon, for linkage map construction and quantitative trait locus (QTL) analysis. Two inbred lines, Cream of Saskatchewan (COS) and LSW-177, were re-sequenced and analyzed with a custom Perl script for CAPS marker development. 88.7% and 78.5% of the assembled sequences of the two parental materials could map to the reference watermelon genome, respectively. Comparative assembled genome data analysis provided 225,693 and 19,268 SNPs and indels between the two materials. 532 pairs of CAPS markers were designed with 16 restriction enzymes, among which 271 pairs of primers gave distinct bands of the expected length and polymorphic bands, via PCR and enzyme digestion, with a polymorphic rate of 50.94%. Using the new CAPS markers, an initial CAPS-based genetic linkage map was constructed with the F2 population, spanning 1836.51 cM with 11 linkage groups and 301 markers. 12 QTLs were detected related to fruit flesh color, length, width, shape index, and brix content. These new CAPS markers will be a valuable resource for breeding programs and genetic studies of watermelon. PMID:27162496
Genetic Analysis of 430 Chinese Cynodon dactylon Accessions Using Sequence-Related Amplified Polymorphism Markers

PubMed Central

Huang, Chunqiong; Liu, Guodao; Bai, Changjun; Wang, Wenqiang

2014-01-01

Although Cynodon dactylon (C. dactylon) is widely distributed in China, information on its genetic diversity within the germplasm pool is limited. The objective of this study was to reveal the genetic variation and relationships of 430 C. dactylon accessions collected from 22 Chinese provinces using sequence-related amplified polymorphism (SRAP) markers. Fifteen primer pairs were used to amplify specific C. dactylon genomic sequences. A total of 481 SRAP fragments were generated, with fragment sizes ranging from 260–1800 base pairs (bp). Genetic similarity coefficients (GSC) among the 430 accessions averaged 0.72 and ranged from 0.53–0.96. Cluster analysis conducted by two methods, namely the unweighted pair-group method with arithmetic averages (UPGMA) and principal coordinate analysis (PCoA), separated the accessions into eight distinct groups. Our findings verify that Chinese C. dactylon germplasms have rich genetic diversity, which is an excellent basis for C. dactylon breeding for new cultivars. PMID:25338051
Genetic differentiation in proboscis monkeys--A reanalysis.

PubMed

Nijman, Vincent

2016-01-01

Ogata and Seino [Zoo Biol, 2015, 34:76-79] sequenced the mitochondrial D-loop of five proboscis monkeys Nasalis larvatus from Yokohama Zoo, Japan, that were imported from Surabaya Zoo, Indonesia. They compared their sequences with those of 16 proboscis monkeys from Sabah, Malaysia, and on the basis of a haplotype network analysis of 256 base pairs concluded that the northern Malaysian and southern Indonesian populations of proboscis monkeys are genetically differentiated. I provide information on the origin of the Indonesian proboscis monkeys, showing that they were the first-generation offspring of wild-caught individuals from the Pulau Kaget Strict Nature Reserve in the province of South Kalimantan. Using a phylogenetic approach and adding additional sequences from Indonesia and Malaysia, I reanalyzed their data, and found no support for a north-south divide. Instead, the resulting tree based on sequences of 433 base pairs shows two strongly supported clades, both containing individuals from Indonesia and Malaysia. Work on captive individuals, as reported by Ogata and Seino, can aid in developing appropriate markers and techniques, but to obtain a more complete understanding of the genetic diversity and differentiation of wild proboscis monkeys, more detailed geographic sampling from all over Borneo is needed. © 2015 Wiley Periodicals, Inc.
A tale of two sequences: microRNA-target chimeric reads.

PubMed

Broughton, James P; Pasquinelli, Amy E

2016-04-04

In animals, a functional interaction between a microRNA (miRNA) and its target RNA requires only partial base pairing. The limited number of base pair interactions required for miRNA targeting provides miRNAs with broad regulatory potential and also makes target prediction challenging. Computational approaches to target prediction have focused on identifying miRNA target sites based on known sequence features that are important for canonical targeting and may miss non-canonical targets. Current state-of-the-art experimental approaches, such as CLIP-seq (cross-linking immunoprecipitation with sequencing), PAR-CLIP (photoactivatable-ribonucleoside-enhanced CLIP), and iCLIP (individual-nucleotide resolution CLIP), require inference of which miRNA is bound at each site. Recently, the development of methods to ligate miRNAs to their target RNAs during the preparation of sequencing libraries has provided a new tool for the identification of miRNA target sites. The chimeric, or hybrid, miRNA-target reads that are produced by these methods unambiguously identify the miRNA bound at a specific target site. The information provided by these chimeric reads has revealed extensive non-canonical interactions between miRNAs and their target mRNAs, and identified many novel interactions between miRNAs and noncoding RNAs.
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family.

PubMed

Danisman, Selahattin; van Dijk, Aalt D J; Bimbo, Andrea; van der Wal, Froukje; Hennig, Lars; de Folter, Stefan; Angenent, Gerco C; Immink, Richard G H

2013-12-01

Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein-protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein-protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family.
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family

PubMed Central

Danisman, Selahattin; de Folter, Stefan; Immink, Richard G. H.

2013-01-01

Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein–protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein–protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family. PMID:24129704
Sequence-dependent response of DNA to torsional stress: a potential biological regulation mechanism.

PubMed

Reymer, Anna; Zakrzewska, Krystyna; Lavery, Richard

2018-02-28

Torsional restraints on DNA change in time and space during the life of the cell and are an integral part of processes such as gene expression, DNA repair and packaging. The mechanical behavior of DNA under torsional stress has been studied on a mesoscopic scale, but little is known concerning its response at the level of individual base pairs and the effects of base pair composition. To answer this question, we have developed a geometrical restraint that can accurately control the total twist of a DNA segment during all-atom molecular dynamics simulations. By applying this restraint to four different DNA oligomers, we are able to show that DNA responds to both under- and overtwisting in a very heterogeneous manner. Certain base pair steps, in specific sequence environments, are able to absorb most of the torsional stress, leaving other steps close to their relaxed conformation. This heterogeneity also affects the local torsional modulus of DNA. These findings suggest that modifying torsional stress on DNA could act as a modulator for protein binding via the heterogeneous changes in local DNA structure.
Sequence-dependent response of DNA to torsional stress: a potential biological regulation mechanism

PubMed Central

Reymer, Anna; Zakrzewska, Krystyna; Lavery, Richard

2018-01-01

Torsional restraints on DNA change in time and space during the life of the cell and are an integral part of processes such as gene expression, DNA repair and packaging. The mechanical behavior of DNA under torsional stress has been studied on a mesoscopic scale, but little is known concerning its response at the level of individual base pairs and the effects of base pair composition. To answer this question, we have developed a geometrical restraint that can accurately control the total twist of a DNA segment during all-atom molecular dynamics simulations. By applying this restraint to four different DNA oligomers, we are able to show that DNA responds to both under- and overtwisting in a very heterogeneous manner. Certain base pair steps, in specific sequence environments, are able to absorb most of the torsional stress, leaving other steps close to their relaxed conformation. This heterogeneity also affects the local torsional modulus of DNA. These findings suggest that modifying torsional stress on DNA could act as a modulator for protein binding via the heterogeneous changes in local DNA structure. PMID:29267977
HIV-1 sequence variation between isolates from mother-infant transmission pairs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wike, C.M.; Daniels, M.R.; Furtado, M.

1991-12-31

To examine the sequence diversity of human immunodeficiency virus type 1 (HIV-1) between known transmission sets, sequences from the V3 and V4-V5 region of the env gene from 4 mother-infant pairs were analyzed. The mean interpatient sequence variation between isolates from linked mother-infant pairs was comparable to the sequence diversity found between isolates from other close contacts. The mean intrapatient variation was significantly less in the infants' isolates than in the isolates from both their mothers and other characterized intrapatient sequence sets. In addition, a distinct and characteristic difference in the glycosylation pattern preceding the V3 loop was found between each linked transmission pair. These findings indicate that selection of specific genotypic variants, which may play a role in some direct transmission sets, and the duration of infection are important factors in the degree of diversity seen between the sequence sets.
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis

PubMed Central

Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

2012-01-01

RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information (total number of nucleotide bases and ATGC base contents along with their respective percentages) and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation, and it provides a user-friendly environment for sequence analysis. It is freely available. Availability http://www.cemb.edu.pk/sw.html Abbreviations RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language. PMID:23055611
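The Nussinov dynamic programming algorithm named in the abstract above can be sketched as follows. This is the textbook base-pair-maximization recurrence, not RDNAnalyzer's extended version (which the abstract does not describe); the function names, the A-T/G-C pairing rule and the `min_loop` parameter are assumptions made for illustration.

```python
def can_pair(a, b):
    # Assumption: only Watson-Crick DNA pairs (A-T, G-C) are allowed.
    return {a, b} == {"A", "T"} or {a, b} == {"G", "C"}


def nussinov(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (a hypothetical default of 3 is used here).
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if can_pair(seq[i], seq[j]):
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov("GGGAAACCC"))
```

The sketch returns only the pair count; recovering the actual pairings would need a traceback through the same table.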
Charge transport properties of DNA aperiodic molecule: The role of interbase hopping in Watson-Crick base pair

NASA Astrophysics Data System (ADS)

Sinurat, E. N.; Yudiarsah, E.

2017-07-01

The charge transport properties of an aperiodic DNA molecule have been studied by considering various interbase hopping parameters on the Watson-Crick base pair. A 32-base-pair-long double-stranded aperiodic DNA model with the sequence GCTAGTACGTGACGTAGCTAGGATATGCCTGA on one chain and its complement on the other chain is used. The transfer matrix method has been used to calculate transmission probabilities, from which the I-V characteristic is determined using the Landauer-Büttiker formula. The DNA molecule is modeled using a tight-binding Hamiltonian combined with Slater-Koster theory. The results show that increasing the Watson-Crick hopping value increases the transmission probabilities and current of the aperiodic DNA molecule.
Nucleic acid constructs containing orthogonal site selective recombinases (OSSRs)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gilmore, Joshua M.; Anderson, J. Christopher; Dueber, John E.

The present invention provides for a recombinant nucleic acid comprising a nucleotide sequence comprising a plurality of constructs, wherein each construct independently comprises a nucleotide sequence of interest flanked by a pair of recombinase recognition sequences. Each pair of recombinase recognition sequences is recognized by a distinct recombinase. Optionally, each construct can, independently, further comprise one or more genes encoding a recombinase capable of recognizing the pair of recombinase recognition sequences of the construct. The recombinase can be an orthogonal (non-cross reacting), site-selective recombinase (OSSR).

Mitochondrial genome sequence and expression profiling for the legume pod borer Maruca vitrata (Lepidoptera: Crambidae)

USDA-ARS's Scientific Manuscript database

We report on the assembly of the 14,146 base pairs (bp) near complete mitochondrial sequencing of the legume pod borer (LPB), Maruca vitrata (Lepidoptera: Crambidae), which was used to estimate divergence and relationships within the lepidopteran lineage. Arrangement and orientation of 13 protein c...
Transposon Tn10 contains two structural genes with opposite polarity between tetA and IS10R.

PubMed Central

Schollmeier, K; Hillen, W

1984-01-01

The nucleotide sequence of the central part of Tn10 has been determined from the rightmost HindIII site to IS10R. This sequence contains two open reading frames with opposite polarity. The in vivo transcription start points in this sequence have been determined by S1 mapping. These results define one minor and two major promoters. The transcription starts of the two major promoters are only 18 base pairs apart, and the transcripts show different polarity and overlap by 18 base pairs. The nucleotide sequence reveals two regions with palindromic symmetry which may serve as operators. Their possible involvement in the regulation of transcription of both genes is discussed. Taken together these results allow for a maximal coding capacity of 138 amino acids directed toward IS10R and 197 amino acids directed toward tetA. The possible function of these gene products is discussed. The accompanying article (Braus et al., J. Bacteriol. 160:504-509, 1984) presents evidence that these genes are expressed. Images PMID:6094471
Mesoscopic modeling of DNA denaturation rates: Sequence dependence and experimental comparison

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dahlen, Oda, E-mail: oda.dahlen@ntnu.no; Erp, Titus S. van, E-mail: titus.van.erp@ntnu.no

Using rare event simulation techniques, we calculated DNA denaturation rate constants for a range of sequences and temperatures for the Peyrard-Bishop-Dauxois (PBD) model with two different parameter sets. We studied a larger variety of sequences compared to previous studies that only consider DNA homopolymers and DNA sequences containing an equal amount of weak AT- and strong GC-base pairs. Our results show that, contrary to previous findings, an even distribution of the strong GC-base pairs does not always result in the fastest possible denaturation. In addition, we applied an adaptation of the PBD model to study hairpin denaturation for which experimental data are available. This is the first quantitative study in which dynamical results from the mesoscopic PBD model have been compared with experiments. Our results show that present parameterized models, although giving good results regarding thermodynamic properties, overestimate denaturation rates by orders of magnitude. We believe that our dynamical approach is, therefore, an important tool for verifying DNA models and for developing next generation models that have higher predictive power than present ones.
Evidence That Intergenic Spacer Repeats of Drosophila melanogaster rRNA Genes Function as X-Y Pairing Sites in Male Meiosis, and a General Model for Achiasmatic Pairing

PubMed Central

McKee, B. D.; Habera, L.; Vrana, J. A.

1992-01-01

In Drosophila melanogaster males, X-Y meiotic chromosome pairing is mediated by the nucleolus organizers (NOs) which are located in the X heterochromatin (Xh) and near the Y centromere. Deficiencies for Xh disrupt X-Y meiotic pairing and cause high frequencies of X-Y nondisjunction. Insertion of cloned rRNA genes on an Xh(-) chromosome partially restores normal X-Y pairing and disjunction. To map the sequences within an inserted, X-linked rRNA gene responsible for stimulating X-Y pairing, partial deletions were generated by P element-mediated destabilization of the insert. Complete deletions of the rRNA transcription unit did not interfere with the ability to stimulate X-Y pairing as long as most of the intergenic spacer (IGS) remained. Within groups of deletions that lacked the entire transcription unit and differed only in length of residual IGS material, pairing ability was proportional to the dose of 240-bp intergenic spacer repeats. Deletions of the complete rRNA transcription unit or of the 28S sequences alone blocked nucleolus formation, as determined by binding of an antinucleolar antibody, yet did not interfere with pairing ability, suggesting that X-Y pairing may not be mechanistically related to nucleolus formation. A model for achiasmatic pairing in Drosophila males based upon the combined action of topoisomerase I and a strand transferase is proposed. PMID:1330825
SP-Designer: a user-friendly program for designing species-specific primer pairs from DNA sequence alignments.

PubMed

Villard, Pierre; Malausa, Thibaut

2013-07-01

SP-Designer is an open-source program providing a user-friendly tool for the design of specific PCR primer pairs from a DNA sequence alignment containing sequences from various taxa. SP-Designer selects PCR primer pairs for the amplification of DNA from a target species on the basis of several criteria: (i) primer specificity, as assessed by interspecific sequence polymorphism in the annealing regions, (ii) the biochemical characteristics of the primers and (iii) the intended PCR conditions. SP-Designer generates tables, detailing the primer pair and PCR characteristics, and a FASTA file locating the primer sequences in the original sequence alignment. SP-Designer is Windows-compatible and freely available from http://www2.sophia.inra.fr/urih/sophia_mart/sp_designer/info_sp_designer.php. © 2013 John Wiley & Sons Ltd.
Solid-phase proximity ligation assays for individual or parallel protein analyses with readout via real-time PCR or sequencing.

PubMed

Nong, Rachel Yuan; Wu, Di; Yan, Junhong; Hammond, Maria; Gu, Gucci Jijuan; Kamali-Moghaddam, Masood; Landegren, Ulf; Darmanis, Spyros

2013-06-01

Solid-phase proximity ligation assays share properties with the classical sandwich immunoassays for protein detection. The proteins captured via antibodies on solid supports are, however, detected not by single antibodies with detectable functions, but by pairs of antibodies with attached DNA strands. Upon recognition by these sets of three antibodies, pairs of DNA strands brought in proximity are joined by ligation. The ligated reporter DNA strands are then detected via methods such as real-time PCR or next-generation sequencing (NGS). We describe how to construct assays that can offer improved detection specificity by virtue of recognition by three antibodies, as well as enhanced sensitivity owing to reduced background and amplified detection. Finally, we also illustrate how the assays can be applied for parallel detection of proteins, taking advantage of the oligonucleotide ligation step to avoid background problems that might arise with multiplexing. The protocol for the singleplex solid-phase proximity ligation assay takes ~5 h. The multiplex version of the assay takes 7-8 h depending on whether quantitative PCR (qPCR) or sequencing is used as the readout. The time for the sequencing-based protocol includes the library preparation but not the actual sequencing, as times may vary based on the choice of sequencing platform.
Morphological and molecular differentiation of Staphylocystis clydesengeri n. sp. (Cestoda, Hymenolepididae) from the vagrant shrew, Sorex vagrans (Soricomorpha, Soricidae), in North America.

PubMed

Tkach, Vasyl V; Makarikov, Arseny A; Kinsella, John M

2013-01-01

Staphylocystis clydesengeri n. sp. is described from shrews Sorex vagrans in Montana and Washington, United States. It differs from the only previously known North American representative of the genus, S. schilleri, in having more numerous (37-42 vs. 22-30) and larger (39-44 µm vs. 27-30 µm) rostellar hooks. The two species also differ in several other important characters such as relative length of the cirrus pouch, position of gonads and shape of mature proglottides. Morphological differentiation of the new species from all previously known Palearctic species of Staphylocystis from Sorex is also provided. Differentiation from Staphylocystis parasitic in crocidurine shrews is not provided due to the high level of specificity among shrew hymenolepidids to the host genera and much greater levels of sequence divergence between Staphylocystis from the two groups of shrews. Molecular differentiation based on 2,800 base pair long sequences of nuclear ribosomal RNA (complete ITS region and partial 28S region), 663 base pair long sequences of the mitochondrial nad1 gene and 542 base pair long sequences of the mitochondrial ribosomal 16S gene strongly supports the status of Staphylocystis clydesengeri n. sp. The relative utility of the DNA fragments used in this study for reliable differentiation among closely related species of mammalian hymenolepidids is discussed. The nuclear ribosomal RNA region appears to be too conserved for this purpose. Use of at least one mitochondrial gene, in addition to nuclear ribosomal RNA or without it, is recommended. Vampirolepis novosibirskiensis Sawada & Kobayashi, 1994 is transferred to Staphylocystis as a junior synonym of S. furcara (Stieda, 1862). Rodentolepis gnoskei Greiman & Tkach, 2012 is transferred to Pararodentolepis Makarikov and Gulyaev, 2009 as a new combination Pararodentolepis gnoskei (Greiman & Tkach, 2012) n. comb.
PARTS: Probabilistic Alignment for RNA joinT Secondary structure prediction

PubMed Central

Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H.

2008-01-01

A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by the newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is utilized for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that utilize a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is better for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu. PMID:18304945
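The evaluation described above, scoring the base pairs predicted above a confidence threshold by sensitivity and positive predictive value (PPV), can be illustrated with a minimal sketch. It is not taken from the PARTS source code; the function names, the probability values and the reference structure below are hypothetical.

```python
def pairs_above_threshold(pair_probs, threshold):
    # Keep only the base pairs whose posterior probability meets the threshold.
    return {pair for pair, p in pair_probs.items() if p >= threshold}


def sensitivity_ppv(predicted, known):
    """Sensitivity = TP / known pairs; PPV = TP / predicted pairs."""
    predicted, known = set(predicted), set(known)
    true_pos = len(predicted & known)
    sensitivity = true_pos / len(known) if known else 0.0
    ppv = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    # Hypothetical posterior probabilities for (i, j) base pairs.
    probs = {(1, 10): 0.95, (2, 9): 0.80, (5, 6): 0.60, (3, 8): 0.30}
    # Hypothetical reference (known) structure.
    known = {(1, 10), (2, 9), (3, 8), (4, 7)}
    predicted = pairs_above_threshold(probs, 0.5)
    print(sensitivity_ppv(predicted, known))
```

Raising the threshold typically trades sensitivity for PPV, which is why the abstract reports the two quantities together.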
Modeling DNA bubble formation at the atomic scale

DOE Office of Scientific and Technical Information (OSTI.GOV)

Beleva, V; Rasmussen, K. O.; Garcia, A. E.

We describe the fluctuations of double stranded DNA molecules using a minimalist Go model over a wide range of temperatures. Minimalist models allow us to describe, at the atomic level, the opening and formation of bubbles in DNA double helices. This model includes all the geometrical constraints in helix melting imposed by the 3D structure of the molecule. The DNA forms melted bubbles within double helices. These bubbles form and break as a function of time. The equilibrium average number of broken base pairs shows a sharp change as a function of T. We observe a temperature profile of sequence dependent bubble formation similar to those measured by Zeng et al. Long nucleic acid molecules melt partially through the formation of bubbles. It is known that CG rich sequences melt at higher temperatures than AT rich sequences. The melting temperature, however, is not solely determined by the CG content, but by the sequence through base stacking and solvent interactions. Recently, models that incorporate the sequence and nonlinear dynamics of DNA double strands have shown that DNA exhibits very rich dynamics. Recent extensions of the Bishop-Peyrard model show that fluctuations in the DNA structure lead to opening in localized regions, and that these regions in the DNA are associated with transcription initiation sites. 1D and 2D models of DNA may contain enough information about stacking and base pairing interactions, but lack the coupling between twisting, bending and base pair opening imposed by the double helical structure of DNA that all-atom models easily describe. However, the complexity of the energy function used in all-atom simulations (including solvent, ions, etc.) does not allow for the description of DNA folding/unfolding events that occur on the microsecond time scale.
Conservation of an Intact vif Gene of Human Immunodeficiency Virus Type 1 during Maternal-Fetal Transmission

PubMed Central

Yedavalli, Venkat R. K.; Chappey, Colombe; Matala, Erik; Ahmad, Nafees

1998-01-01

The human immunodeficiency virus type 1 (HIV-1) vif gene is conserved among most lentiviruses, suggesting that vif is important for natural infection. To determine whether an intact vif gene is positively selected during mother-to-infant transmission, we analyzed vif sequences from five infected mother-infant pairs following perinatal transmission. The coding potential of the vif open reading frame directly derived from uncultured peripheral blood mononuclear cell DNA was maintained in most of the 78,912 bp sequenced. We found that 123 of the 137 clones analyzed showed an 89.8% frequency of intact vif open reading frames. There was a low degree of heterogeneity of vif genes within mothers, within infants, and between epidemiologically linked mother-infant pairs. The distances between vif sequences were greater in epidemiologically unlinked individuals than in epidemiologically linked mother-infant pairs. Furthermore, the epidemiologically linked mother-infant pair vif sequences displayed similar patterns that were not seen in vif sequences from epidemiologically unlinked individuals. The functional domains, including the two cysteines at positions 114 and 133, a serine phosphorylation site at position 144, and the C-terminal basic amino acids essential for vif protein function, were highly conserved in most of the sequences. Phylogenetic analyses of 137 mother-infant pair vif sequences and 187 other available vif sequences from HIV-1 databases revealed distinct clusters for vif sequences from each mother-infant pair and for other vif sequences. Taken together, these findings suggest that vif plays an important role in HIV-1 infection and replication in mothers and their perinatally infected infants. PMID:9445004
Easy design of colorimetric logic gates based on nonnatural base pairing and controlled assembly of gold nanoparticles.

PubMed

Zhang, Li; Wang, Zhong-Xia; Liang, Ru-Ping; Qiu, Jian-Ding

2013-07-16

Utilizing the principles of metal-ion-mediated base pairs (C-Ag-C and T-Hg-T), the pH-sensitive conformational transition of C-rich DNA strands, and the ligand-exchange process triggered by DL-dithiothreitol (DTT), a system of colorimetric logic gates (YES, AND, INHIBIT, and XOR) can be rationally constructed based on the aggregation of the DNA-modified Au NPs. The proposed logic operation system is simple, consisting of only T-/C-rich DNA-modified Au NPs, and it is unnecessary to exquisitely design and alter the DNA sequence for the different molecular logic operations. The nonnatural base pairing combined with the unique optical properties of Au NPs promises great potential in multiplexed ion sensing, molecular-scale computers, and other computational logic devices.
Synthesis and Properties of Size-expanded DNAs: Toward Designed, Functional Genetic Systems

PubMed Central

Krueger, Andrew T.; Lu, Haige; Lee, Alex H. F.; Kool, Eric T.

2008-01-01

We describe the design, synthesis, and properties of DNA-like molecules in which the base pairs are expanded by benzo homologation. The resulting size-expanded genetic helices are called xDNA (“expanded DNA”) and yDNA (“wide DNA”). The large component bases are fluorescent, and they display high stacking affinity. When singly substituted into natural DNA, they are destabilizing because the benzo-expanded base pair size is too large for the natural helix. However, when all base pairs are expanded, xDNA and yDNA form highly stable, sequence-selective double helices. The size-expanded DNAs are candidates for components of new, functioning genetic systems. In addition, the fluorescence of expanded DNA bases makes them potentially useful in probing nucleic acids. PMID:17309194
A high-throughput assay for the comprehensive profiling of DNA ligase fidelity

PubMed Central

Lohman, Gregory J. S.; Bauer, Robert J.; Nichols, Nicole M.; Mazzola, Laurie; Bybee, Joanna; Rivizzigno, Danielle; Cantin, Elizabeth; Evans, Thomas C.

2016-01-01

DNA ligases have broad application in molecular biology, from traditional cloning methods to modern synthetic biology and molecular diagnostics protocols. Ligation-based detection of polynucleotide sequences can be achieved by the ligation of probe oligonucleotides when annealed to a complementary target sequence. In order to achieve a high sensitivity and low background, the ligase must efficiently join correctly base-paired substrates, while discriminating against the ligation of substrates containing even one mismatched base pair. In the current study, we report the use of capillary electrophoresis to rapidly generate mismatch fidelity profiles that interrogate all 256 possible base-pair combinations at a ligation junction in a single experiment. Rapid screening of ligase fidelity in a 96-well plate format has allowed the study of ligase fidelity in unprecedented depth. As an example of this new method, herein we report the ligation fidelity of Thermus thermophilus DNA ligase at a range of temperatures, buffer pH and monovalent cation strength. This screen allows the selection of reaction conditions that maximize fidelity without sacrificing activity, while generating a profile of specific mismatches that ligate detectably under each set of conditions. PMID:26365241
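The figure of 256 base-pair combinations quoted above can be reproduced by a short enumeration. The sketch assumes one reading of the count: two base-pair positions flank the ligation junction, and each position can hold any of the 4 x 4 probe/template base combinations, giving 16 x 16 = 256. The variable names are illustrative and are not taken from the study.

```python
from itertools import product

BASES = "ACGT"
# Correctly matched (Watson-Crick) probe/template combinations.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}

# Every (probe base, template base) combination at one position: 4 * 4 = 16.
pairs = list(product(BASES, repeat=2))

# Two positions flank the junction (assumed), so 16 * 16 combinations.
junctions = list(product(pairs, repeat=2))

# Junctions in which both flanking positions are correctly paired.
fully_matched = [j for j in junctions if all(p in WATSON_CRICK for p in j)]

if __name__ == "__main__":
    print(len(pairs), len(junctions), len(fully_matched))
```

Under this reading, 16 of the 256 junctions are fully matched and the remaining 240 contain at least one mismatch.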
A high-throughput assay for the comprehensive profiling of DNA ligase fidelity.

PubMed

Lohman, Gregory J S; Bauer, Robert J; Nichols, Nicole M; Mazzola, Laurie; Bybee, Joanna; Rivizzigno, Danielle; Cantin, Elizabeth; Evans, Thomas C

2016-01-29

DNA ligases have broad application in molecular biology, from traditional cloning methods to modern synthetic biology and molecular diagnostics protocols. Ligation-based detection of polynucleotide sequences can be achieved by the ligation of probe oligonucleotides when annealed to a complementary target sequence. In order to achieve a high sensitivity and low background, the ligase must efficiently join correctly base-paired substrates, while discriminating against the ligation of substrates containing even one mismatched base pair. In the current study, we report the use of capillary electrophoresis to rapidly generate mismatch fidelity profiles that interrogate all 256 possible base-pair combinations at a ligation junction in a single experiment. Rapid screening of ligase fidelity in a 96-well plate format has allowed the study of ligase fidelity in unprecedented depth. As an example of this new method, herein we report the ligation fidelity of Thermus thermophilus DNA ligase at a range of temperatures, buffer pH and monovalent cation strength. This screen allows the selection of reaction conditions that maximize fidelity without sacrificing activity, while generating a profile of specific mismatches that ligate detectably under each set of conditions. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Capturing RNA Folding Free Energy with Coarse-Grained Molecular Dynamics Simulations

PubMed Central

Bell, David R.; Cheng, Sara Y.; Salazar, Heber; Ren, Pengyu

2017-01-01

We introduce a coarse-grained RNA model for molecular dynamics simulations, RACER (RnA CoarsE-gRained). RACER achieves accurate native structure prediction for a number of RNAs (average RMSD of 2.93 Å) and the sequence-specific variation of free energy is in excellent agreement with experimentally measured stabilities (R² = 0.93). Using RACER, we identified hydrogen-bonding (or base pairing), base stacking, and electrostatic interactions as essential driving forces for RNA folding. Also, we found that separating pairing vs. stacking interactions allowed RACER to distinguish folded vs. unfolded states. In RACER, base pairing and stacking interactions each provide an approximate stability of 3–4 kcal/mol for an A-form helix. RACER was developed based on PDB structural statistics and experimental thermodynamic data. In contrast with previous work, RACER implements a novel effective vdW potential energy function, which led us to re-parameterize hydrogen bond and electrostatic potential energy functions. Further, RACER is validated and optimized using a simulated annealing protocol to generate potential energy vs. RMSD landscapes. Finally, RACER is tested using extensive equilibrium pulling simulations (0.86 ms total) on eleven RNA sequences (hairpins and duplexes). PMID:28393861
Distortions induced in double-stranded oligonucleotides by the binding of cis- or trans-diammine-dichloroplatinum(II) to the d(GTG) sequence.

PubMed Central

Anin, M F; Leng, M

1990-01-01

Conformational changes induced in double-stranded oligonucleotides by the binding of trans- or cis-diamminedichloro platinum(II) to the d(GTG) sequence have been characterized by means of melting temperatures, electrophoretic migrations in non-denaturing polyacrylamide gels, reactivities with the artificial nuclease Phenanthroline-copper and with chemical probes. The cis-platinum adduct behaves more as a centre of directed bend than as a hinge joint, the induced bend angle being of the order of 25-30 degrees. The double helix is locally denatured over 2 base pairs (corresponding to the platinated 5'G residue and the central T residue) and is distorted over 4-5 base pairs. The trans-platinum adduct behaves also more as a centre of directed bend than as a hinge joint, the induced bend angle being of the order of 60 degrees. The double helix is locally denatured over 4 base pairs (corresponding to the immediately 5'T residue adjacent to the adduct and to the three base residues of the adduct). Both the cis- and trans-platinum adducts decrease the thermal stability of the double helix. Images PMID:2388824
Automatic Configuration of Programmable Logic Controller Emulators

DTIC Science & Technology

2015-03-01

... Example tree generated using UPGMA [Edw13] ... Example sequence alignment for two... UPGMA: Unweighted Pair Group Method with Arithmetic Mean; URL: uniform resource locator; VM: virtual machine; XML: Extensible Markup Language ... appearance in the session, and then they are clustered again using Unweighted Pair Group Method with Arithmetic Mean (UPGMA) with a distance matrix based
The presence of ancient human T-cell lymphotropic virus type I provirus DNA in an Andean mummy.

PubMed

Li, H C; Fujiyoshi, T; Lou, H; Yashiki, S; Sonoda, S; Cartier, L; Nunez, L; Munoz, I; Horai, S; Tajima, K

1999-12-01

The worldwide geographic and ethnic clustering of patients with diseases related to human T-cell lymphotropic virus type I (HTLV-I) may be explained by the natural history of HTLV-I infection. The genetic characteristics of indigenous people in the Andes are similar to those of the Japanese, and HTLV-I is generally detected in both groups. To clarify the common origin of HTLV-I in Asia and the Andes, we analyzed HTLV-I provirus DNA from Andean mummies about 1,500 years old. Two of 104 mummy bone marrow specimens yielded a band of human beta-globin gene DNA 110 base pairs in length, and one of these two produced bands of HTLV-I-pX (open reading frame encoding p40x, p27x) and HTLV-I-LTR (long terminal repeat) gene DNA 159 base pairs and 157 base pairs in length, respectively. The nucleotide sequences of ancient HTLV-I-pX and HTLV-I-LTR clones isolated from mummy bone marrow were similar to those in contemporary Andeans and Japanese, although there was microheterogeneity in the sequences of some mummy DNA clones. This result provides evidence that HTLV-I was carried with ancient Mongoloids to the Andes before the Colonial era. Analysis of ancient HTLV-I sequences could be a useful tool for studying the history of human retroviral infection as well as human prehistoric migration.
Ancient HTLV type 1 provirus DNA of Andean mummy.

PubMed

Sonoda, S; Li, H C; Cartier, L; Nunez, L; Tajima, K

2000-11-01

The worldwide geographic and ethnic clustering of patients with diseases related to human T cell lymphotropic virus type 1 (HTLV-1) may be explained by the natural history of HTLV-1 infection. The genetic characteristics of indigenous people in the Andes are similar to those of the Japanese, and HTLV-1 is generally detected in both groups. To clarify the common origin of HTLV-1 in Asia and the Andes, we analyzed HTLV-1 provirus DNA from Andean mummies about 1500 years old. Two of 104 mummy bone marrow specimens yielded a band of human beta-globin gene DNA 110 base pairs in length, and one of these two produced bands of HTLV-1-pX (open reading frame encoding p(40x), p(27x)) and HTLV-1-LTR (long terminal repeat) gene DNA 159 base pairs and 157 base pairs in length, respectively. The nucleotide sequences of ancient HTLV-1-pX and HTLV-1-LTR clones isolated from mummy bone marrow were similar to those in contemporary Andeans and Japanese, although there was microheterogeneity in the sequences of some mummy DNA clones. This result provides evidence that HTLV-1 was carried with ancient Mongoloids to the Andes before the Colonial era. Analysis of ancient HTLV-1 sequences could be a useful tool for studying the history of human retroviral infection as well as human prehistoric migration.
Eye movements reflect and shape strategies in fraction comparison.

PubMed

Ischebeck, Anja; Weilharter, Marina; Körner, Christof

2016-01-01

The comparison of fractions is a difficult task that can often be facilitated by separately comparing components (numerators and denominators) of the fractions--that is, by applying so-called component-based strategies. The usefulness of such strategies depends on the type of fraction pair to be compared. We investigated the temporal organization and the flexibility of strategy deployment in fraction comparison by evaluating sequences of eye movements in 20 young adults. We found that component-based strategies could account for the response times and the overall number of fixations observed for the different fraction pairs. The analysis of eye movement sequences showed that the initial eye movements in a trial were characterized by stereotypical scanning patterns indicative of an exploratory phase that served to establish the kind of fraction pair presented. Eye movements that followed this phase adapted to the particular type of fraction pair and indicated the deployment of specific comparison strategies. These results demonstrate that participants employ eye movements systematically to support strategy use in fraction comparison. Participants showed a remarkable flexibility to adapt to the most efficient strategy on a trial-by-trial basis. Our results confirm the value of eye movement measurements in the exploration of strategic adaptation in complex tasks.

Accurate multiplex polony sequencing of an evolved bacterial genome.

PubMed

Shendure, Jay; Porreca, Gregory J; Reppas, Nikos B; Lin, Xiaoxia; McCutcheon, John P; Rosenbaum, Abraham M; Wang, Michael D; Zhang, Kun; Mitra, Robi D; Church, George M

2005-09-09

We describe a DNA sequencing technology in which a commonly available, inexpensive epifluorescence microscope is converted to rapid nonelectrophoretic DNA sequencing automation. We apply this technology to resequence an evolved strain of Escherichia coli at less than one error per million consensus bases. A cell-free, mate-paired library provided single DNA molecules that were amplified in parallel to 1-micrometer beads by emulsion polymerase chain reaction. Millions of beads were immobilized in a polyacrylamide gel and subjected to automated cycles of sequencing by ligation and four-color imaging. Cost per base was roughly one-ninth as much as that of conventional sequencing. Our protocols were implemented with off-the-shelf instrumentation and reagents.
A molecular model for illegitimate recombination in Bacillus subtilis.

PubMed

Temeyer, K B; Hopkins, K M; Chapman, L F

1991-01-01

The recombinant DNA junctions at which pUB110 and Bacillus subtilis chromosomal DNA were joined to form the plasmid pKBT1 were cloned and sequenced. From the sequencing data we conclude that the pUB110 sequence is intact in the pair of cloned pKBT1 fragments and pTL12 sequences are not present. A molecular model for the formation of pKBT1 based on structural motifs characteristic of the joint sites is presented.
Structator: fast index-based search for RNA sequence-structure patterns

PubMed Central

2011-01-01

Background The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. Results We present a novel method and readily applicable software for time-efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This allows base pairing information to be exploited for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. Conclusions The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator. PMID:21619640
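The inside-out matching order described in this abstract (loop first, then the pairing bases outward) can be illustrated with a minimal Python sketch. It is not the Structator implementation: a plain linear scan stands in for the affix array index, so it does not reproduce the sublinear running time. The function name find_stem_loops, the fixed loop string, and the choice to allow G-U wobble pairs are assumptions made for illustration only.

```python
# Allowed RNA pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(seq, loop, stem_len):
    """Return half-open (start, end) spans of stem-loops in seq.

    A hit has the exact loop sequence `loop` closed by `stem_len`
    consecutive base pairs. The loop is located first, then the stem
    is checked pair by pair from the loop outward.
    """
    hits = []
    start = seq.find(loop)
    while start != -1:
        left, right = start - 1, start + len(loop)
        paired = 0
        # Extend outward one base pair at a time; stop at the first mismatch.
        while (paired < stem_len and left >= 0 and right < len(seq)
               and (seq[left], seq[right]) in PAIRS):
            paired += 1
            left -= 1
            right += 1
        if paired == stem_len:
            hits.append((left + 1, right))
        start = seq.find(loop, start + 1)
    return hits


if __name__ == "__main__":
    rna = "AAGGCGAAAGCCUU"
    for begin, end in find_stem_loops(rna, "GAAA", 3):
        print(begin, end, rna[begin:end])
```

Because a mismatch in the stem ends the extension immediately, most loop occurrences are discarded after very few comparisons, which is the search space reduction the abstract refers to.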
Mutation load in melanoma is affected by MC1R genotype.

PubMed

Johansson, Peter A; Pritchard, Antonia L; Patch, Ann-Marie; Wilmott, James S; Pearson, John V; Waddell, Nicola; Scolyer, Richard A; Mann, Graham J; Hayward, Nicholas K

2017-03-01

Whole-genome sequencing of matched germline and tumour pairs in a well-characterized cohort of melanoma patients allowed investigation of associations between melanoma body site, age at melanoma onset and MC1R variant status with overall mutation burden and specific base pair changes observed in the corresponding melanoma. We observed statistically significant associations between mutation burden in melanoma and body site, age at onset and MC1R genotype, for both ultraviolet radiation (UVR) signature changes (C>T and CC>TT) and non-UVR base pair substitutions, as well as with overall variant load. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Metal-mediated DNA base pairing: alternatives to hydrogen-bonded Watson-Crick base pairs.

PubMed

Takezawa, Yusuke; Shionoya, Mitsuhiko

2012-12-18

With its capacity to store and transfer the genetic information within a sequence of monomers, DNA forms its central role in chemical evolution through replication and amplification. This elegant behavior is largely based on highly specific molecular recognition between nucleobases through the specific hydrogen bonds in the Watson-Crick base pairing system. While the native base pairs have been amazingly sophisticated through the long history of evolution, synthetic chemists have devoted considerable efforts to create alternative base pairing systems in recent decades. Most of these new systems were designed based on the shape complementarity of the pairs or the rearrangement of hydrogen-bonding patterns. We wondered whether metal coordination could serve as an alternative driving force for DNA base pairing and why hydrogen bonding was selected on Earth in the course of molecular evolution. Therefore, we envisioned an alternative design strategy: we replaced hydrogen bonding with another important scheme in biological systems, metal-coordination bonding. In this Account, we provide an overview of the chemistry of metal-mediated base pairing including basic concepts, molecular design, characteristic structures and properties, and possible applications of DNA-based molecular systems. We describe several examples of artificial metal-mediated base pairs, such as Cu(2+)-mediated hydroxypyridone base pair, H-Cu(2+)-H (where H denotes a hydroxypyridone-bearing nucleoside), developed by us and other researchers. To design the metallo-base pairs we carefully chose appropriate combinations of ligand-bearing nucleosides and metal ions. As expected from their stronger bonding through metal coordination, DNA duplexes possessing metallo-base pairs exhibited higher thermal stability than natural hydrogen-bonded DNAs. Furthermore, we could also use metal-mediated base pairs to construct or induce other high-order structures. 
These features could lead to metal-responsive functional DNA molecules such as artificial DNAzymes and DNA machines. In addition, the metallo-base pairing system is a powerful tool for the construction of homogeneous and heterogeneous metal arrays, which can lead to DNA-based nanomaterials such as electronic wires and magnetic devices. Recently researchers have investigated these systems as enzyme replacements, which may offer an additional contribution to chemical biology and synthetic biology through the expansion of the genetic alphabet.
The electrostatic characteristics of G·U wobble base pairs

PubMed Central

Xu, Darui; Landon, Theresa; Greenbaum, Nancy L.; Fenley, Marcia O.

2007-01-01

G·U wobble base pairs are the most common and highly conserved non-Watson–Crick base pairs in RNA. Previous surface maps imply uniformly negative electrostatic potential at the major groove of G·U wobble base pairs embedded in RNA helices, suitable for entrapment of cationic ligands. In this work, we have used a Poisson–Boltzmann approach to gain a more detailed and accurate characterization of the electrostatic profile. We found that the major groove edge of an isolated G·U wobble displays distinctly enhanced negativity compared with standard GC or AU base pairs; however, in the context of different helical motifs, the electrostatic pattern varies. G·U wobbles with distinct widening have similar major groove electrostatic potentials to their canonical counterparts, whereas those with minimal widening exhibit significantly enhanced electronegativity, ranging from 0.8 to 2.5 kT/e, depending upon structural features. We propose that the negativity at the major groove of G·U wobble base pairs is determined by the combined effect of the base atoms and the sugar-phosphate backbone, which is impacted by stacking pattern and groove width as a result of base sequence. These findings are significant in that they provide predictive power with respect to which G·U sites in RNA are most likely to bind cationic ligands. PMID:17526525
MSP-HTPrimer: a high-throughput primer design tool to improve assay design for DNA methylation analysis in epigenetics.

PubMed

Pandey, Ram Vinay; Pulverer, Walter; Kallmeyer, Rainer; Beikircher, Gabriel; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas

2016-01-01

Bisulfite (BS) conversion-based and methylation-sensitive restriction enzyme (MSRE)-based PCR methods have been the most commonly used techniques for locus-specific DNA methylation analysis. However, both methods have advantages and limitations. Thus, an integrated approach would be extremely useful to quantify the DNA methylation status successfully with great sensitivity and specificity. Designing specific and optimized primers for target regions is the most critical and challenging step in obtaining adequate DNA methylation results using PCR-based methods. Currently, no integrated, optimized, and high-throughput methylation-specific primer design software methods are available for both BS- and MSRE-based methods. Therefore an integrated, powerful, and easy-to-use methylation-specific primer design pipeline with great accuracy and success rate will be very useful. We have developed a new web-based pipeline, called MSP-HTPrimer, to design primer pairs for MSP, BSP, pyrosequencing, COBRA, and MSRE assays on both genomic strands. First, our pipeline converts all target sequences into bisulfite-treated templates for both the forward and reverse strands and designs all possible primer pairs, followed by filtering for single nucleotide polymorphisms (SNPs) and known repeat regions. Next, each primer pair is annotated with the upstream and downstream RefSeq genes, CpG island, and cut sites (for COBRA and MSRE). Finally, MSP-HTPrimer selects specific primers from both strands based on custom and user-defined hierarchical selection criteria. MSP-HTPrimer produces a primer pair summary output table in TXT and HTML format for display and UCSC custom tracks for resulting primer pairs in GTF format. MSP-HTPrimer is an integrated, web-based, and high-throughput pipeline and has no limitation on the number and size of target sequences and designs MSP, BSP, pyrosequencing, COBRA, and MSRE assays. It is the only pipeline that automatically designs primers on both genomic strands to increase the success rate. It is a standalone web-based pipeline, which is fully configured within a virtual machine and thus can be readily used without any configuration. We have experimentally validated primer pairs designed by our pipeline and shown a very high success rate of primer pairs: out of 66 BSP primer pairs, 63 were successfully validated without any further optimization step and using the same qPCR conditions. The MSP-HTPrimer pipeline is freely available from http://sourceforge.net/p/msp-htprimer.
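The first pipeline step mentioned above, converting a target into bisulfite-treated templates for both strands, can be sketched in Python as follows. This is an illustration, not MSP-HTPrimer's code: the function names and the methylated_cpg switch are hypothetical, and the sketch assumes the usual in silico convention that unmethylated cytosines read as T after conversion while methylated CpG cytosines stay C.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def bisulfite_convert(seq, methylated_cpg=True):
    """Simulate bisulfite treatment of one strand, read 5' to 3'.

    Every C becomes T, except that a C followed by G is kept when
    methylated_cpg is True (assumed fully methylated CpG sites).
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        in_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        if base == "C" and not (methylated_cpg and in_cpg):
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def bisulfite_templates(seq, methylated_cpg=True):
    """Return converted templates for the forward and reverse strands."""
    return {
        "forward": bisulfite_convert(seq, methylated_cpg),
        "reverse": bisulfite_convert(reverse_complement(seq), methylated_cpg),
    }


if __name__ == "__main__":
    print(bisulfite_templates("ACGTCCA"))
    print(bisulfite_templates("ACGTCCA", methylated_cpg=False))
```

After conversion the two strands are no longer complementary to each other, which is why each strand needs its own template and primer design.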
Alignment of RNA molecules: Binding energy and statistical properties of random sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Valba, O. V., E-mail: valbaolga@gmail.com; Nechaev, S. K., E-mail: sergei.nechaev@gmail.com; Tamm, M. V., E-mail: thumm.m@gmail.com

2012-02-15

A new statistical approach to the problem of pairwise alignment of RNA sequences is proposed. The problem is analyzed for a pair of interacting polymers forming RNA-like hierarchical cloverleaf structures. An alignment is characterized by the numbers of matches, mismatches, and gaps. A weight function is assigned to each alignment; this function is interpreted as a free energy taking into account both direct monomer-monomer interactions and a combinatorial contribution due to formation of various cloverleaf secondary structures. The binding free energy is determined for a pair of RNA molecules. Statistical properties are discussed, including fluctuations of the binding energy between a pair of RNA molecules and loop length distribution in a complex. Based on an analysis of the free energy per nucleotide pair complexes of random RNAs as a function of the number of nucleotide types c, a hypothesis is put forward about the exclusivity of the alphabet c = 4 used by nature.
Characterization of a highly polymorphic region 5′ to JH in the human immunoglobulin heavy chain

PubMed Central

Silva, Alcino J.; Johnson, John P.; White, Raymond L.

1987-01-01

A cloned DNA segment 1.25 kilobases (kb) upstream from the joining segments of the human heavy chain immunoglobulin gene revealed extensive polymorphic variation at this locus, and the polymorphic pattern was stably transmitted to the next generation. Genomic restriction analysis showed that the polymorphism was caused by insertions/deletions within an MspI/BamHI fragment. Sequencing of one allele, 848 base pairs (bp) long, revealed eleven 50-base-pair tandem repeats. A second allele, 648 bp long, was cloned from a human genomic cosmid library, sequenced, and found to contain four fewer repeats than the first allele. A survey of 186 chromosomes from unrelated individuals of primarily northern European descent revealed at least six alleles. PMID:2884636
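The allele sizes quoted in this abstract are internally consistent, as a short Python check shows. The figures (848 bp, 648 bp, eleven 50-bp repeats) come from the abstract; the helper name repeats_difference and the assumption that both alleles share the same non-repeat flanking sequence are illustrative only.

```python
REPEAT_LEN = 50          # length of one tandem repeat, in bp
ALLELE_1_LEN = 848       # first sequenced allele, in bp
ALLELE_1_REPEATS = 11    # repeats reported for the first allele
ALLELE_2_LEN = 648       # second sequenced allele, in bp


def repeats_difference(len_a, len_b, repeat_len=REPEAT_LEN):
    """Number of whole repeats separating two alleles of the given lengths."""
    diff = len_a - len_b
    if diff % repeat_len != 0:
        raise ValueError("length difference is not a whole number of repeats")
    return diff // repeat_len


# Non-repeat sequence in allele 1, assumed identical in allele 2.
flank = ALLELE_1_LEN - ALLELE_1_REPEATS * REPEAT_LEN
allele_2_repeats = (ALLELE_2_LEN - flank) // REPEAT_LEN

if __name__ == "__main__":
    print(repeats_difference(ALLELE_1_LEN, ALLELE_2_LEN))  # repeats lost
    print(flank, allele_2_repeats)
```

The 200 bp difference corresponds to exactly four 50-bp repeats, matching the "four fewer repeats" stated for the second allele.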
Structure-affinity relationships for the binding of actinomycin D to DNA

NASA Astrophysics Data System (ADS)

Gallego, José; Ortiz, Angel R.; de Pascual-Teresa, Beatriz; Gago, Federico

1997-03-01

Molecular models of the complexes between actinomycin D and 14 different DNA hexamers were built based on the X-ray crystal structure of the actinomycin-d(GAAGCTTC)2 complex. The DNA sequences included the canonical GpC binding step flanked by different base pairs, nonclassical binding sites such as GpG and GpT, and sites containing 2,6-diaminopurine. A good correlation was found between the intermolecular interaction energies calculated for the refined complexes and the relative preferences of actinomycin binding to standard and modified DNA. A detailed energy decomposition into van der Waals and electrostatic components for the interactions between the DNA base pairs and either the chromophore or the peptidic part of the antibiotic was performed for each complex. The resulting energy matrix was then subjected to principal component analysis, which showed that actinomycin D discriminates among different DNA sequences by an interplay of hydrogen bonding and stacking interactions. The structure-affinity relationships for this important antitumor drug are thus rationalized and may be used to advantage in the design of novel sequence-specific DNA-binding agents.
GRIL-seq provides a method for identifying direct targets of bacterial small regulatory RNA by in vivo proximity ligation.

PubMed

Han, Kook; Tjaden, Brian; Lory, Stephen

2016-12-22

The first step in the post-transcriptional regulatory function of most bacterial small non-coding RNAs (sRNAs) is base pairing with partially complementary sequences of targeted transcripts. We present a simple method for identifying sRNA targets in vivo and defining processing sites of the regulated transcripts. The technique, referred to as global small non-coding RNA target identification by ligation and sequencing (GRIL-seq), is based on preferential ligation of sRNAs to the ends of base-paired targets in bacteria co-expressing T4 RNA ligase, followed by sequencing to identify the chimaeras. In addition to the RNA chaperone Hfq, the GRIL-seq method depends on the activity of the pyrophosphorylase RppH. Using PrrF1, an iron-regulated sRNA in Pseudomonas aeruginosa, we demonstrated that direct regulatory targets of this sRNA can readily be identified. Therefore, GRIL-seq represents a powerful tool not only for identifying direct targets of sRNAs in a variety of environments, but also for uncovering novel roles for sRNAs and their targets in complex regulatory networks.
Sequence periodicity in nucleosomal DNA and intrinsic curvature

PubMed Central

2010-01-01

Background Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy the more rigid the DNA helix, thus it is natural to expect that sequences that are involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understand their sequence-dependent structures. Results Higher order DNA structures and the distribution of molecular bend loci associated with 146 base nucleosome core DNA sequence from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. Conclusions The higher order structures associated with nucleosomes of C. elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results point to the possibility of context dependent curvature of varying degrees to be associated with nucleosomal DNA. PMID:20487515
Model for an RNA tertiary interaction from the structure of an intermolecular complex between a GAAA tetraloop and an RNA helix.

PubMed

Pley, H W; Flaherty, K M; McKay, D B

1994-11-03

In large structured RNAs, RNA hairpins in which the strands of the duplex stem are connected by a tetraloop of the consensus sequence 5'-GNRA (where N is any nucleotide, and R is either G or A) are unusually frequent. In group I introns there is a covariation in sequence between nucleotides in the third and fourth positions of the loop with specific distant base pairs in putative RNA duplex stems: GNAA loops correlate with successive 5'-C-C.G-C base pairs in stems, whereas GNGA loops correlate with 5'-C-U.G-A. This has led to the suggestion that GNRA tetraloops may be involved in specific long-range tertiary interactions, with each A in position 3 or 4 of the loop interacting with a C-G base pair in the duplex, and G in position 3 interacting with a U-A base pair. This idea is supported experimentally for the GAAA loop of the P5b extension of the group I intron of Tetrahymena thermophila and the L9 GUGA terminal loop of the td intron of bacteriophage T4 (ref. 4). NMR has revealed the overall structure of the tetraloop for 12-nucleotide hairpins with GCAA and GAAA loops and models have been proposed for the interaction of GNRA tetraloops with base pairs in the minor groove of A-form RNA. Here we describe the crystal structure of an intermolecular complex between a GAAA tetraloop and an RNA helix. The interactions we observe correlate with the specificity of GNRA tetraloops inferred from phylogenetic studies, suggesting that this complex is a legitimate model for intramolecular tertiary interactions mediated by GNRA tetraloops in large structured RNAs.
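The GNRA consensus and the loop-to-stem covariation described in this abstract can be expressed as a small Python sketch. The function names are hypothetical; the two stem motifs in the lookup table are copied from the abstract's description of group I introns, and the code only scans for the four-nucleotide motif without checking that it actually sits in a hairpin loop.

```python
import re

# G, any nucleotide (N), purine G or A (R), then A. The lookahead lets
# overlapping candidates be reported as well.
GNRA_PATTERN = re.compile(r"(?=(G[ACGU][AG]A))")

# Stem motifs reported to covary with the third loop position.
STEM_PARTNER = {
    "A": "5'-C-C.G-C",  # GNAA loops
    "G": "5'-C-U.G-A",  # GNGA loops
}


def find_gnra(seq):
    """Return (position, tetranucleotide) for every GNRA match in an RNA sequence."""
    return [(m.start(), m.group(1)) for m in GNRA_PATTERN.finditer(seq.upper())]


def expected_stem_partner(tetraloop):
    """Return the covarying stem motif for a GNRA tetraloop, or None if not GNRA."""
    tetraloop = tetraloop.upper()
    if len(tetraloop) != 4 or not GNRA_PATTERN.match(tetraloop):
        return None
    return STEM_PARTNER[tetraloop[2]]


if __name__ == "__main__":
    print(find_gnra("CCGAAAGG"))
    print(expected_stem_partner("GAAA"), expected_stem_partner("GUGA"))
```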
Interaction of influenza virus polymerase with viral RNA in the 'corkscrew' conformation.

PubMed

Flick, R; Hobom, G

1999-10-01

The influenza virus RNA (vRNA) promoter structure is known to consist of the 5'- and 3'-terminal sequences of the RNA, within very narrow boundaries of 16 and 15 nucleotides, respectively. A complete set of single nucleotide substitutions led to the previously proposed model of a binary hooked or 'corkscrew' conformation for the vRNA promoter when it interacts with the viral polymerase. This functional structure is confirmed here with a complete set of complementary double substitutions, of both the regular A:U and G:C type and also the G:U type of base-pair exchanges. The proposed structure consists of a six base-pair RNA rod in the distal element in conjunction with two stem-loop structures of two short-range base-pairs (positions 2-9; 3-8). These support an exposed tetranucleotide loop within each branch of the proximal element, in an overall oblique organization due to a central unpaired A residue at position 10 in the 5' sequence. Long-range base-pairing between the entire 5' and 3' branches, as required for an unmodified 'panhandle' model, has been excluded for the proximal element, while it is known to represent the mode of interaction within the distal element. A large number of short-range base-pair exchanges in the proximal element constitute promoter-up mutations, which show activities several times above that of the wild-type in reporter gene assays. The unique overall conformation and rather few invariant nucleotides appear to be the core elements in vRNA recognition by polymerase and also in viral ribonucleoprotein packaging, to allow discrimination against the background of other RNA molecules in the cell.
The structure of the human interferon alpha/beta receptor gene.

PubMed

Lutfalla, G; Gardiner, K; Proudhon, D; Vielh, E; Uzé, G

1992-02-05

Using the cDNA coding for the human interferon alpha/beta receptor (IFNAR), the IFNAR gene has been physically mapped relative to the other loci of the chromosome 21q22.1 region. 32,906 base pairs covering the IFNAR gene have been cloned and sequenced. Primer extension and solution hybridization-ribonuclease protection have been used to determine that the transcription of the gene is initiated in a broad region of 20 base pairs. Some aspects of the polymorphism of the gene, including noncoding sequences, have been analyzed; some are allelic differences in the coding sequence that induce amino acid variations in the resulting protein. The exon structure of the IFNAR gene and of that of the available genes for the receptors of the cytokine/growth hormone/prolactin/interferon receptor family have been compared with the predictions for the secondary structure of those receptors. From this analysis, we postulate a common origin and propose an hypothesis for the divergence from the immunoglobulin superfamily.
Evaluation of Cytokine Synthesis in Human Whole Blood by Enzyme Linked Immunoassay (ELISA), Reverse Transcriptase Polymerase Chain Reaction (RT-PCR), and Flow Cytometry

DTIC Science & Technology

2007-05-08

deoxynucleotide triphosphates, from Sigma. Sequences for glyceraldehyde-3-phosphate dehydrogenase (G3PDH), IL-8, and TNF-α were amplified with primer...This was accomplished by normalizing all samples to the mRNA for the moderately expressed housekeeping function glyceraldehyde-3-phosphate...without and with isolation of cells before reverse transcription and PCR. G3PDH mRNA target amplifies at 983 base pairs. The 630 base pair band is the
Modified Amber Force Field Correctly Models the Conformational Preference for Tandem GA pairs in RNA

PubMed Central

2015-01-01

Molecular mechanics with all-atom models was used to understand the conformational preference of tandem guanine-adenine (GA) noncanonical pairs in RNA. These tandem GA pairs play important roles in determining stability, flexibility, and structural dynamics of RNA tertiary structures. Previous solution structures showed that these tandem GA pairs adopt either imino (cis Watson–Crick/Watson–Crick A-G) or sheared (trans Hoogsteen/sugar edge A-G) conformations depending on the sequence and orientation of the adjacent closing base pairs. The solution structures (GCGGACGC)2 [Biochemistry, 1996, 35, 9677–9689] and (GCGGAUGC)2 [Biochemistry, 2007, 46, 1511–1522] demonstrate imino and sheared conformations for the two central GA pairs, respectively. These systems were studied using molecular dynamics and free energy change calculations for conformational changes, using umbrella sampling. For the structures to maintain their native conformations during molecular dynamics simulations, a modification to the standard Amber ff10 force field was required, which allowed the amino group of guanine to leave the plane of the base [J. Chem. Theory Comput., 2009, 5, 2088–2100] and form out-of-plane hydrogen bonds with a cross-strand cytosine or uracil. The requirement for this modification suggests the importance of out-of-plane hydrogen bonds in stabilizing the native structures. Free energy change calculations for each sequence demonstrated the correct conformational preference when the force field modification was used, but the extent of the preference is underestimated. PMID:24803859
The interaction between vocabulary size and phonotactic probability effects on children's production accuracy and fluency in nonword repetition.

PubMed

Edwards, Jan; Beckman, Mary E; Munson, Benjamin

2004-04-01

Adults' performance on a variety of tasks suggests that phonological processing of nonwords is grounded in generalizations about sublexical patterns over all known words. A small body of research suggests that children's phonological acquisition is similarly based on generalizations over the lexicon. To test this account, production accuracy and fluency were examined in nonword repetitions by 104 children and 22 adults. Stimuli were 22 pairs of nonwords, in which one nonword contained a low-frequency or unattested two-phoneme sequence and the other contained a high-frequency sequence. For a subset of these nonword pairs, segment durations were measured. The same sound was produced with a longer duration (less fluently) when it appeared in a low-frequency sequence, as compared to a high-frequency sequence. Low-frequency sequences were also repeated with lower accuracy than high-frequency sequences. Moreover, children with smaller vocabularies showed a larger influence of frequency on accuracy than children with larger vocabularies. Taken together, these results provide support for a model of phonological acquisition in which knowledge of sublexical units emerges from generalizations made over lexical items.
Repairing the sickle cell mutation. I. Specific covalent binding of a photoreactive third strand to the mutated base pair.

PubMed

Broitman, S; Amosova, O; Dolinnaya, N G; Fresco, J R

1999-07-30

A DNA third strand with a 3'-psoralen substituent was designed to form a triplex with the sequence downstream of the T.A mutant base pair of the human sickle cell beta-globin gene. Triplex-mediated psoralen modification of the mutant T residue was sought as an approach to gene repair. The 24-nucleotide purine-rich target sequence switches from one strand to the other and has four pyrimidine interruptions. Therefore, a third strand sequence favorable to two triplex motifs was used, one parallel and the other antiparallel to it. To cope with the pyrimidine interruptions, which weaken third strand binding, 5-methylcytosine and 5-propynyluracil were used in the third strand. Further, a six residue "hook" complementary to an overhang of a linear duplex target was added to the 5'-end of the third strand via a T(4) linker. In binding to the overhang by Watson-Crick pairing, the hook facilitates triplex formation. This third strand also binds specifically to the target within a supercoiled plasmid. The psoralen moiety at the 3'-end of the third strand forms photoadducts to the targeted T with high efficiency. Such monoadducts are known to preferentially trigger reversion of the mutation by DNA repair enzymes.
The cDNA sequence of mouse Pgp-1 and homology to human CD44 cell surface antigen and proteoglycan core/link proteins.

PubMed

Wolffe, E J; Gause, W C; Pelfrey, C M; Holland, S M; Steinberg, A D; August, J T

1990-01-05

We describe the isolation and sequencing of a cDNA encoding mouse Pgp-1. An oligonucleotide probe corresponding to the NH2-terminal sequence of the purified protein was synthesized by the polymerase chain reaction and used to screen a mouse macrophage lambda gt11 library. A cDNA clone with an insert of 1.2 kilobases was selected and sequenced. In Northern blot analysis, only cells expressing Pgp-1 contained mRNA species that hybridized with this Pgp-1 cDNA. The nucleotide sequence of the cDNA has a single open reading frame that yields a protein-coding sequence of 1076 base pairs followed by a 132-base pair 3'-untranslated sequence that includes a putative polyadenylation signal but no poly(A) tail. The translated sequence comprises a 13-amino acid signal peptide followed by a polypeptide core of 345 residues corresponding to an Mr of 37,800. Portions of the deduced amino acid sequence were identical to those obtained by amino acid sequence analysis from the purified glycoprotein, confirming that the cDNA encodes Pgp-1. The predicted structure of Pgp-1 includes an NH2-terminal extracellular domain (residues 14-265), a transmembrane domain (residues 266-286), and a cytoplasmic tail (residues 287-358). Portions of the mouse Pgp-1 sequence are highly similar to that of the human CD44 cell surface glycoprotein implicated in cell adhesion. The protein also shows sequence similarity to the proteoglycan tandem repeat sequences found in cartilage link protein and cartilage proteoglycan core protein which are thought to be involved in binding to hyaluronic acid.

Maintenance of an Intact Human Immunodeficiency Virus Type 1 vpr Gene following Mother-to-Infant Transmission

PubMed Central

Yedavalli, Venkat R. K.; Chappey, Colombe; Ahmad, Nafees

1998-01-01

The vpr sequences from six human immunodeficiency virus type 1 (HIV-1)-infected mother-infant pairs following perinatal transmission were analyzed. We found that 153 of the 166 clones analyzed from uncultured peripheral blood mononuclear cell DNA samples showed a 92.17% frequency of intact vpr open reading frames. There was a low degree of heterogeneity of vpr genes within mothers, within infants, and between epidemiologically linked mother-infant pairs. The distances between vpr sequences were greater in epidemiologically unlinked individuals than in epidemiologically linked mother-infant pairs. Moreover, the infants’ sequences displayed patterns similar to those seen in their mothers. The functional domains essential for Vpr activity, including virion incorporation, nuclear import, and cell cycle arrest and differentiation were highly conserved in most of the sequences. Phylogenetic analyses of 166 mother-infant pairs and 195 other available vpr sequences from HIV databases formed distinct clusters for each mother-infant pair and for other vpr sequences and grouped the six mother-infant pairs’ sequences with subtype B sequences. A high degree of conservation of intact and functional vpr supports the notion that vpr plays an important role in HIV-1 infection and replication in mother-infant isolates that are involved in perinatal transmission. PMID:9658150
Hybridization and sequencing of nucleic acids using base pair mismatches

DOEpatents

Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

2001-01-01

Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.
Conversion of amino-acid sequence in proteins to classical music: search for auditory patterns

PubMed Central

2007-01-01

We have converted genome-encoded protein sequences into musical notes to reveal auditory patterns without compromising musicality. We derived a reduced range of 13 base notes by pairing similar amino acids and distinguishing them using variations of three-note chords and codon distribution to dictate rhythm. The conversion will help make genomic coding sequences more approachable for the general public, young children, and vision-impaired scientists. PMID:17477882
Leuconostoc pseudomesenteroides WCFur3 partial 16S rRNA gene

USDA-ARS's Scientific Manuscript database

This study used a partial 535 base pair 16S rRNA gene sequence to identify a bacterial isolate. Fatty acid profiles are consistent with the 16S rRNA gene sequence identification of this bacterium. The isolate was obtained from a compost bin in Fort Collins, Colorado, USA. The 16S rRNA gene sequen...
Aquaporin 2 of Rhipicephalus (Boophilus) microplus as a potential target to control ticks and tick-borne parasites

USDA-ARS's Scientific Manuscript database

In a collaboration with Washington State University and ARS-Pullman, WA researchers, we identified and sequenced a 1,059 base pair Rhipicephalus microplus transcript that contained the coding region for a water channel protein, Aquaporin 2 (RmAQP2). The clone sequencing resulted in the production of...
Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78

Treesearch

Diego Martinez; Luis Larrondo; Nik Putnam; Maarten D. Sollewijn Gelpke; Katherine Huang; Jarrod Chapman; Kevin G. Helfenbein; Preethi Ramaiya; J. Chris Detter; Frank Larimer; Pedro M. Coutinho; Bernard Henrissat; Randy Berka; Dan Cullen; Daniel Rokhsar

2004-01-01

White rot fungi efficiently degrade lignin, a complex aromatic polymer in wood that is among the most abundant natural materials on earth. These fungi use extracellular oxidative enzymes that are also able to transform related aromatic compounds found in explosive contaminants, pesticides and toxic waste. We have sequenced the 30-million base-pair genome of...
Child Development and Structural Variation in the Human Genome

ERIC Educational Resources Information Center

Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.

2013-01-01

Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…
Acetylcholinesterase 1 in populations of organophosphate resistant North American strains of the cattle tick, Rhipicephalus microplus (Acari: Ixodidae)

USDA-ARS's Scientific Manuscript database

In a collaboration with Purdue University researchers, we sequenced a 143,606 base pair Rhipicephalus microplus BAC library clone that contained the coding region for acetylcholinesterase 1 (AChE1). Sequencing was by Sanger protocols and the final assembly resulted in 15 contigs of varying length, e...
Image correlation method for DNA sequence alignment.

PubMed

Curilem Saldías, Millaray; Villarroel Sassarini, Felipe; Muñoz Poblete, Carlos; Vargas Vásquez, Asticio; Maureira Butler, Iván

2012-01-01

The complexity of searches and the volume of genomic data make sequence alignment one of the most active research areas in bioinformatics. New alignment approaches have incorporated digital signal processing techniques. Among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a fixed gray intensity pixel. Query and known database sequences are coded to their pixel representation and sequence alignment is handled as an object-recognition-in-a-scene problem. Query and database become object and scene, respectively. An image correlation process is carried out in order to search for the best match between them. Given that this procedure can be implemented in an optical correlator, the correlation could eventually be accomplished at light speed. This paper shows an initial research stage where results were "digitally" obtained by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (variable lengths from 50 to 4500 base pairs) and 100 scenes represented by 100 x 100 images each (in total, a one-million-base-pair database) were considered for the image correlation analysis. The results showed that correlations reached very high sensitivity (99.01%) and specificity (98.99%) and outperformed BLAST when mutation numbers increased. However, digital correlation processes were a hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed of a real experimental optical correlator. By doing this, we expect to fully exploit the light properties of optical correlation. As the optical correlator works jointly with the computer, digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods for sequence alignment.
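The correlation idea in this record can be illustrated with a minimal sketch. It is a one-dimensional normalized cross-correlation, a simplification of the two-dimensional image correlation the abstract describes, and it is not the authors' implementation. The gray levels in `GRAY` and the function names are assumptions made for illustration; the abstract does not state the intensity values used.

```python
import math

# Hypothetical gray levels; the abstract does not give the actual values.
GRAY = {"A": 0.2, "C": 0.4, "G": 0.6, "T": 0.8}


def encode(seq):
    """Map each base to a fixed gray intensity."""
    return [GRAY[base] for base in seq.upper()]


def ncc(x, y):
    """Normalized cross-correlation of two equal-length intensity vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    # A constant window has zero variance; treat it as uncorrelated.
    return num / den if den else 0.0


def best_match(query, scene):
    """Slide the query along the scene; return (offset, score) of the peak."""
    q = encode(query)
    s = encode(scene)
    scores = [ncc(q, s[i:i + len(q)]) for i in range(len(s) - len(q) + 1)]
    offset = max(range(len(scores)), key=scores.__getitem__)
    return offset, scores[offset]
```

Calling `best_match("CGATAC", "TTGACCGATACGGA")` locates the query at the offset where the correlation peaks. A query carrying a few substitutions would still produce a peak there, only a lower one, which is the property the abstract relies on when comparing against BLAST under increasing mutation numbers.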
Unusual target site disruption by the rare-cutting HNH restriction endonuclease PacI

PubMed Central

Shen, Betty; Heiter, Daniel F.; Chan, Siu-Hong; Wang, Hua; Xu, Shuang-Yong; Morgan, Richard D.; Wilson, Geoffrey G.; Stoddard, Barry L.

2010-01-01

The crystal structure of the rare-cutting HNH restriction endonuclease PacI in complex with its eight base pair target recognition sequence 5'-TTAATTAA-3' has been determined to 1.9 Å resolution. The enzyme forms an extended homodimer, with each subunit containing two zinc-bound motifs surrounding a ββα-metal catalytic site. The latter is unusual in that a tyrosine residue likely initiates strand-cleavage. PacI dramatically distorts its target sequence from Watson-Crick duplex DNA basepairing, with every base separated from its original partner. Two bases on each strand are unpaired, four are engaged in non-canonical A:A and T:T base pairs, and the remaining two bases are matched with new Watson-Crick partners. This represents a highly unusual DNA binding mechanism for a restriction endonuclease, and implies that initial recognition of the target site might involve significantly different contacts from those visualized in the DNA-bound cocrystal structures. PMID:20541511
Cloning, sequencing, and expression of cDNA for human β-glucuronidase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oshima, A.; Kyle, J.W.; Miller, R.D.

1987-02-01

The authors report here the cDNA sequence for human placental β-glucuronidase (β-D-glucuronoside glucuronosohydrolase, EC 3.2.1.31) and demonstrate expression of the human enzyme in transfected COS cells. They also sequenced a partial cDNA clone from human fibroblasts that contained a 153-base-pair deletion within the coding sequence and found a second type of cDNA clone from placenta that contained the same deletion. Nuclease S1 mapping studies demonstrated two types of mRNAs in human placenta that corresponded to the two types of cDNA clones isolated. The NH2-terminal amino acid sequence determined for human spleen β-glucuronidase agreed with that inferred from the DNA sequence of the two placental clones, beginning at amino acid 23, suggesting a cleaved signal sequence of 22 amino acids. When transfected into COS cells, plasmids containing either placental clone expressed an immunoprecipitable protein that contained N-linked oligosaccharides as evidenced by sensitivity to endoglycosidase F. However, only transfection with the clone containing the 153-base-pair segment led to expression of human β-glucuronidase activity. These studies provide the sequence for the full-length cDNA for human β-glucuronidase, demonstrate the existence of two populations of mRNA for β-glucuronidase in human placenta, only one of which specifies a catalytically active enzyme, and illustrate the importance of expression studies in verifying that a cDNA is functionally full-length.
Synthesis and triplex forming properties of pyrimidine derivative containing extended functionality.

PubMed

Gianolio, D A; McLaughlin, L W

1999-08-01

Two pyrimidine nucleosides have been synthesized containing extended hydrogen bonding functionality. In one case the side chain is based upon semicarbazide and in the second monoacetylated carbohydrazide was employed. DNA sequences could be prepared using both analogue nucleosides in a reverse coupling protocol, provided that the normal capping step was eliminated and that the iodine-based oxidizing solution was replaced with one based upon 10-camphorsulfonyl oxaziridine. Both derivatives exhibited moderate effects in selectively targeting C-G base pairs embedded within a polypurine target sequence.
Extended Closed-form Expressions for the Robust Symmetrical Number System Dynamic Range and an Efficient Algorithm for its Computation

DTIC Science & Technology

2014-01-01

and distance between all of the vector ambiguity pairs for the combined N-sequences. To simplify our derivation, we define the center of ambiguity (COA...modulo N. The resulting structure of the N sequences ensures that two successive RSNS vectors (paired terms from all N sequences) when considered...represented by a vector, X_h = [x_{1,h}, x_{2,h}, ..., x_{N,h}]^T, of N paired integers from each sequence at h. For example, a left-shifted, three-sequence
Reactivity of cytosine and thymine in single-base-pair mismatches with hydroxylamine and osmium tetroxide and its application to the study of mutations.

PubMed Central

Cotton, R G; Rodrigues, N R; Campbell, R D

1988-01-01

The chemical reactivity of thymine (T), when mismatched with the bases cytosine, guanine, and thymine, and of cytosine (C), when mismatched with thymine, adenine, and cytosine, has been examined. Heteroduplex DNAs containing such mismatched base pairs were first incubated with osmium tetroxide (for T and C mismatches) or hydroxylamine (for C mismatches) and then incubated with piperidine to cleave the DNA at the modified mismatched base. This cleavage was studied with an internally labeled strand containing the mismatched T or C, such that DNA cleavage and thus reactivity could be detected by gel electrophoresis. Cleavage was identified at a total of 13 T and 21 C single-base-pair mismatches, each isolated by at least three properly paired bases on both sides. All T or C mismatches studied were cleaved. By using end-labeled DNA probes containing T or C single-base-pair mismatches and conditions for limited cleavage, we were able to show that cleavage was at the base predicted by sequence analysis and that mismatches in a length of DNA could be readily detected by such an approach. This procedure may enable detection of all single-base-pair mismatches by use of sense and antisense probes and thus may be used to identify the mutated base and its position in a heteroduplex. PMID:3260032
Phytoplasma-specific PCR primers based on sequences of the 16S-23S rRNA spacer region.

PubMed Central

Smart, C D; Schneider, B; Blomquist, C L; Guerra, L J; Harrison, N A; Ahrens, U; Lorenz, K H; Seemüller, E; Kirkpatrick, B C

1996-01-01

In order to develop a diagnostic tool to identify phytoplasmas and classify them according to their phylogenetic group, we took advantage of the sequence diversity of the 16S-23S intergenic spacer regions (SRs) of phytoplasmas. Ten PCR primers were developed from the SR sequences and were shown to amplify in a group-specific fashion. For some groups of phytoplasmas, such as elm yellows, ash yellows, and pear decline, the SR primer was paired with a specific primer from within the 16S rRNA gene. Each of these primer pairs was specific for a particular phytoplasma group, and they did not produce PCR products of the correct size from any other phytoplasma group. One primer was designed to anneal within the conserved tRNA(Ile) and, when paired with a universal primer, amplified all phytoplasmas tested. None of the primers produced PCR amplification products of the correct size from healthy plant DNA. These primers can serve as effective tools for identifying particular phytoplasmas in field samples. PMID:8702291
Plastid primers for angiosperm phylogenetics and phylogeography.

PubMed

Prince, Linda M

2015-06-01

PCR primers are available for virtually every region of the plastid genome. Selection of which primer pairs to use is second only to selection of the genic region. This is particularly true for research at the species/population interface. Primer pairs for 130 regions of the chloroplast genome were evaluated in 12 species distributed across the angiosperms. Likelihood of amplification success was inferred based upon number and location of mismatches to target sequence. Intraspecific sequence variability was evaluated under three different criteria in four species. Many published primer pairs should work across all taxa sampled, with the exception of failure due to genomic reorganization events. Universal barcoding primers were the least likely to work (65% success). The list of most variable regions for use within species has little in common with the lists identified in prior studies. Published primer sequences should amplify a diversity of flowering plant DNAs, even those designed for specific taxonomic groups. "Universal" primers may have extremely limited utility. There was little consistency in likelihood of amplification success for any given publication across lineages or within lineage across publications.
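The inference step in this record, judging amplification success from the number and location of primer-target mismatches, can be sketched as follows. This is an illustrative sketch only: the function names, the thresholds `max_total` and `three_prime_window`, and the decision rule itself are hypothetical, since the abstract does not state the criteria that were applied.

```python
def primer_mismatches(primer, target_site):
    """Return, for each mismatch, its 0-based distance from the primer 3' end.

    Both strings are written 5'->3' in the primer's orientation and must
    have the same length.
    """
    if len(primer) != len(target_site):
        raise ValueError("primer and target site must be the same length")
    n = len(primer)
    pairs = zip(primer.upper(), target_site.upper())
    return [n - 1 - i for i, (p, t) in enumerate(pairs) if p != t]


def likely_to_amplify(primer, target_site, max_total=3, three_prime_window=5):
    """Hypothetical rule: few mismatches overall and none near the 3' end."""
    distances = primer_mismatches(primer, target_site)
    return len(distances) <= max_total and all(
        d >= three_prime_window for d in distances
    )
```

Under these assumed thresholds, a single mismatch at the 5' end of a 10-mer is tolerated, while a mismatch at the 3'-terminal base is not, reflecting the general observation that mismatches near the 3' end interfere most with polymerase extension.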
SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

PubMed

Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

2008-05-01

Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.
SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

PubMed Central

Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

2008-01-01

Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. Conclusion: SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616
Mapping Structurally Defined Guanine Oxidation Products along DNA Duplexes: Influence of Local Sequence Context and Endogenous Cytosine Methylation

PubMed Central

2015-01-01

DNA oxidation by reactive oxygen species is nonrandom, potentially leading to accumulation of nucleobase damage and mutations at specific sites within the genome. We now present the first quantitative data for sequence-dependent formation of structurally defined oxidative nucleobase adducts along p53 gene-derived DNA duplexes using a novel isotope labeling-based approach. Our results reveal that local nucleobase sequence context differentially alters the yields of 2,2,4-triamino-2H-oxazol-5-one (Z) and 8-oxo-7,8-dihydro-2′-deoxyguanosine (OG) in double stranded DNA. While both lesions are overproduced within endogenously methylated MeCG dinucleotides and at 5′ Gs in runs of several guanines, the formation of Z (but not OG) is strongly preferred at solvent-exposed guanine nucleobases at duplex ends. Targeted oxidation of MeCG sequences may be caused by a lowered ionization potential of guanine bases paired with MeC and the preferential intercalation of riboflavin photosensitizer adjacent to MeC:G base pairs. Importantly, some of the most frequently oxidized positions coincide with the known p53 lung cancer mutational “hotspots” at codons 245 (GGC), 248 (CGG), and 158 (CGC), respectively, supporting a possible role of oxidative degradation of DNA in the initiation of lung cancer. PMID:24571128
T box transcription antitermination riboswitch: Influence of nucleotide sequence and orientation on tRNA binding by the antiterminator element

PubMed Central

Fauzi, Hamid; Agyeman, Akwasi; Hines, Jennifer V.

2008-01-01

Many bacteria utilize riboswitch transcription regulation to monitor and appropriately respond to cellular levels of important metabolites or effector molecules. The T box transcription antitermination riboswitch responds to cognate uncharged tRNA by specifically stabilizing an antiterminator element in the 5′-untranslated mRNA leader region and precluding formation of a thermodynamically more stable terminator element. Stabilization occurs when the tRNA acceptor end base pairs with the first four nucleotides in the seven nucleotide bulge of the highly conserved antiterminator element. The significance of the conservation of the antiterminator bulge nucleotides that do not base pair with the tRNA is unknown, but they are required for optimal function. In vitro selection was used to determine if the isolated antiterminator bulge context alone dictates the mode in which the tRNA acceptor end binds the bulge nucleotides. No sequence conservation beyond complementarity was observed and the location was not constrained to the first four bases of the bulge. The results indicate that formation of a structure that recognizes the tRNA acceptor end in isolation is not the determinant driving force for the high phylogenetic sequence conservation observed within the antiterminator bulge. Additional factors or T box leader features more likely influenced the phylogenetic sequence conservation. PMID:19152843

Proximity to AGCT sequences dictates MMR-independent versus MMR-dependent mechanisms for AID-induced mutation via UNG2

PubMed Central

Thientosapol, Eddy Sanchai; Sharbeen, George; Lau, K.K. Edwin; Bosnjak, Daniel; Durack, Timothy; Stevanovski, Igor; Weninger, Wolfgang

2017-01-01

AID deaminates C to U in either strand of Ig genes, exclusively producing C:G/G:C to T:A/A:T transition mutations if U is left unrepaired. Error-prone processing by UNG2 or mismatch repair diversifies mutation, predominantly at C:G or A:T base pairs, respectively. Here, we show that transversions at C:G base pairs occur by two distinct processing pathways that are dictated by sequence context. Within and near AGCT mutation hotspots, transversion mutation at C:G was driven by UNG2 without requirement for mismatch repair. Deaminations in AGCT were refractive both to processing by UNG2 and to high-fidelity base excision repair (BER) downstream of UNG2, regardless of mismatch repair activity. We propose that AGCT sequences resist faithful BER because they bind BER-inhibitory protein(s) and/or because hemi-deaminated AGCT motifs innately form a BER-resistant DNA structure. Distal to AGCT sequences, transversions at G were largely co-dependent on UNG2 and mismatch repair. We propose that AGCT-distal transversions are produced when apyrimidinic sites are exposed in mismatch excision patches, because completion of mismatch repair would require bypass of these sites. PMID:28039326
Predicting DNA hybridization kinetics from sequence

NASA Astrophysics Data System (ADS)

Zhang, Jinny X.; Fang, John Z.; Duan, Wei; Wu, Lucia R.; Zhang, Angela W.; Dalchau, Neil; Yordanov, Boyan; Petersen, Rasmus; Phillips, Andrew; Zhang, David Yu

2018-01-01

Hybridization is a key molecular process in biology and biotechnology, but so far there is no predictive model for accurately determining hybridization rate constants based on sequence information. Here, we report a weighted neighbour voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on similar reactions with known rate constants. To construct this algorithm we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt sub-sequences of the CYCS and VEGF genes) at temperatures ranging from 28 to 55 °C. Automated feature selection and weighting optimization resulted in a final six-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ∼91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows the design of efficient probe sequences for genomics research.
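The general shape of a weighted neighbour voting predictor can be sketched as below. This is a minimal sketch under stated assumptions, not the published model: the feature vectors, the feature weights, the weighted Euclidean distance, and the `2 ** (-distance)` vote decay are all hypothetical choices made for illustration. The abstract specifies only that a six-feature weighted model predicts rate constants from reactions with known rate constants.

```python
import math


def wnv_predict(query, known, weights):
    """Predict log10(rate constant) as a distance-weighted average.

    query   -- feature vector of the new reaction
    known   -- list of (feature_vector, log10_rate_constant) pairs
    weights -- per-feature weights (assumed, e.g. from an optimizer)
    """
    num = 0.0
    den = 0.0
    for features, log_k in known:
        distance = math.sqrt(
            sum(w * (q - f) ** 2 for w, q, f in zip(weights, query, features))
        )
        vote = 2.0 ** (-distance)  # closer neighbours get larger votes
        num += vote * log_k
        den += vote
    return num / den


def within_factor(predicted_log_k, observed_log_k, factor=3.0):
    """True if the prediction is within the given fold-error of the observation."""
    return abs(predicted_log_k - observed_log_k) <= math.log10(factor)
```

A leave-one-out evaluation in the spirit of the abstract would call `wnv_predict` once per reaction with that reaction removed from `known`, then report the fraction of predictions for which `within_factor` is true.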
Independent assessment and improvement of wheat genome sequence assemblies using Fosill jumping libraries.

PubMed

Lu, Fu-Hao; McKenzie, Neil; Kettleborough, George; Heavens, Darren; Clark, Matthew D; Bevan, Michael W

2018-05-01

The accurate sequencing and assembly of very large, often polyploid, genomes remains a challenging task, limiting long-range sequence information and phased sequence variation for applications such as plant breeding. The 15-Gb hexaploid bread wheat (Triticum aestivum) genome has been particularly challenging to sequence, and several different approaches have recently generated long-range assemblies. Mapping and understanding the types of assembly errors are important for optimising future sequencing and assembly approaches and for comparative genomics. Here we use a Fosill 38-kb jumping library to assess medium and longer-range order of different publicly available wheat genome assemblies. Modifications to the Fosill protocol generated longer Illumina sequences and enabled comprehensive genome coverage. Analyses of two independent Bacterial Artificial Chromosome (BAC)-based chromosome-scale assemblies, two independent Illumina whole genome shotgun assemblies, and a hybrid Single Molecule Real Time (SMRT-PacBio) and short read (Illumina) assembly were carried out. We revealed a surprising scale and variety of discrepancies using Fosill mate-pair mapping and validated several of each class. In addition, Fosill mate-pairs were used to scaffold a whole genome Illumina assembly, leading to a 3-fold increase in N50 values. Our analyses, using an independent means to validate different wheat genome assemblies, show that whole genome shotgun assemblies based solely on Illumina sequences are significantly more accurate by all measures compared to BAC-based chromosome-scale assemblies and hybrid SMRT-Illumina approaches. Although current whole genome assemblies are reasonably accurate and useful, additional improvements will be needed to generate complete assemblies of wheat genomes using open-source, computationally efficient, and cost-effective methods.
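For readers unfamiliar with the N50 statistic cited in this record, the following sketch shows how it is commonly computed from a list of contig or scaffold lengths. The function name and the example lengths are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together account for at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")
```

Scaffolding raises N50 by joining sequences without changing the total length. For example, `n50([2, 3, 4, 5, 6])` is 5, and after joining the pieces into `[11, 7, 2]` it is 11.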
Isosteric And Non-Isosteric Base Pairs In RNA Motifs: Molecular Dynamics And Bioinformatics Study Of The Sarcin-Ricin Internal Loop

PubMed Central

Havrila, Marek; Réblová, Kamila; Zirbel, Craig L.; Leontis, Neocles B.; Šponer, Jiří

2013-01-01

The Sarcin-Ricin RNA motif (SR motif) is one of the most prominent recurrent RNA building blocks that occurs in many different RNA contexts and folds autonomously, i.e., in a context-independent manner. In this study, we combined bioinformatics analysis with explicit-solvent molecular dynamics (MD) simulations to better understand the relation between the RNA sequence and the evolutionary patterns of SR motif. SHAPE probing experiment was also performed to confirm fidelity of MD simulations. We identified 57 instances of the SR motif in a non-redundant subset of the RNA X-ray structure database and analyzed their basepairing, base-phosphate, and backbone-backbone interactions. We extracted sequences aligned to these instances from large ribosomal RNA alignments to determine frequency of occurrence for different sequence variants. We then used a simple scoring scheme based on isostericity to suggest 10 sequence variants with highly variable expected degree of compatibility with the SR motif 3D structure. We carried out MD simulations of SR motifs with these base substitutions. Non isosteric base substitutions led to unstable structures, but so did isosteric substitutions which were unable to make key base-phosphate interactions. MD technique explains why some potentially isosteric SR motifs are not realized during evolution. We also found that inability to form stable cWW geometry is an important factor in case of the first base pair of the flexible region of the SR motif. Comparison of structural, bioinformatics, SHAPE probing and MD simulation data reveals that explicit solvent MD simulations neatly reflect viability of different sequence variants of the SR motif. Thus, MD simulations can efficiently complement bioinformatics tools in studies of conservation patterns of RNA motifs and provide atomistic insight into the role of their different signature interactions. PMID:24144333
Sequencing of adenine in DNA by scanning tunneling microscopy

NASA Astrophysics Data System (ADS)

Tanaka, Hiroyuki; Taniguchi, Masateru

2017-08-01

The development of DNA sequencing technology utilizing the detection of a tunnel current is important for next-generation sequencer technologies based on single-molecule analysis technology. Using a scanning tunneling microscope, we previously reported that dI/dV measurements and dI/dV mapping revealed that the guanine base (purine base) of DNA adsorbed onto the Cu(111) surface has a characteristic peak at Vs = -1.6 V. If, in addition to guanine, the other purine base of DNA, namely, adenine, can be distinguished, then by reading all the purine bases of each single strand of a DNA double helix, the entire base sequence of the original double helix can be determined due to the complementarity of the DNA base pair. Therefore, the ability to read adenine is important from the viewpoint of sequencing. Here, we report on the identification of adenine by STM topographic and spectroscopic measurements using a synthetic DNA oligomer and viral DNA.
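The complementarity argument in this record can be made concrete with a short sketch. It assumes idealized, error-free purine reads of both strands; the read format ('A' or 'G' at purine positions, '-' elsewhere, each strand written 5'->3') and the function name are hypothetical conventions chosen for illustration.

```python
# A purine on the bottom strand implies this pyrimidine on the top strand.
PARTNER = {"A": "T", "G": "C"}


def reconstruct(purines_top, purines_bottom):
    """Rebuild the top strand (5'->3') from purine-only reads of both strands."""
    if len(purines_top) != len(purines_bottom):
        raise ValueError("both reads must have the same length")
    # The strands are antiparallel, so reverse the bottom read to align it.
    bottom_aligned = purines_bottom[::-1]
    out = []
    for top, bottom in zip(purines_top, bottom_aligned):
        if top in PARTNER and bottom == "-":
            out.append(top)  # purine read directly on the top strand
        elif top == "-" and bottom in PARTNER:
            out.append(PARTNER[bottom])  # pyrimidine inferred from its partner
        else:
            raise ValueError("reads are not consistent with a Watson-Crick duplex")
    return "".join(out)
```

For the duplex whose top strand is GATTC, the purine reads are `"GA---"` for the top strand and `"GAA--"` for the bottom strand, and `reconstruct` recovers the full top-strand sequence from those two reads alone.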
Using distances between Top-n-gram and residue pairs for protein remote homology detection.

PubMed

Liu, Bin; Xu, Jinghao; Zou, Quan; Xu, Ruifeng; Wang, Xiaolong; Chen, Qingcai

2014-01-01

Protein remote homology detection is one of the central problems in bioinformatics, which is important for both basic research and practical application. Currently, discriminative methods based on Support Vector Machines (SVMs) achieve the state-of-the-art performance. Exploring feature vectors incorporating the position information of amino acids or other protein building blocks is a key step to improve the performance of the SVM-based methods. Two new methods for protein remote homology detection were proposed, called SVM-DR and SVM-DT. SVM-DR is a sequence-based method, in which the feature vector representation for protein is based on the distances between residue pairs. SVM-DT is a profile-based method, which considers the distances between Top-n-gram pairs. Top-n-gram can be viewed as a profile-based building block of proteins, which is calculated from the frequency profiles. These two methods are position-dependent approaches incorporating the sequence-order information of protein sequences. Various experiments were conducted on a benchmark dataset containing 54 families and 23 superfamilies. Experimental results showed that these two new methods are very promising. Compared with the position-independent methods, the performance improvement is obvious. Furthermore, the proposed methods can also provide useful insights for studying the features of protein families. The better performance of the proposed methods demonstrates that the position-dependent approaches are efficient for protein remote homology detection. Another advantage of our methods arises from the explicit feature space representation, which can be used to analyze the characteristic features of protein families. The source code of SVM-DT and SVM-DR is available at http://bioinformatics.hitsz.edu.cn/DistanceSVM/index.jsp.
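One plausible reading of a distance-based residue-pair representation is sketched below: count how often each ordered residue pair occurs at each separation along the sequence. This is an assumption-laden illustration and not the SVM-DR code. The function name, the `max_distance` parameter, and the choice of raw counts as features are hypothetical; the authors' actual implementation is at the URL given in the abstract.

```python
from collections import Counter


def distance_pair_features(seq, max_distance=3):
    """Count ordered residue pairs (a, b) separated by d positions.

    Returns a Counter keyed by (a, b, d) for d = 1..max_distance.
    """
    counts = Counter()
    for d in range(1, max_distance + 1):
        for i in range(len(seq) - d):
            counts[(seq[i], seq[i + d], d)] += 1
    return counts
```

Because each key records the separation `d`, the representation retains sequence-order information that a plain composition vector would discard, which is the distinction the abstract draws between position-dependent and position-independent approaches.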
A common anchor facilitated GO-DNA nano-system for multiplex microRNA analysis in live cells.

PubMed

Yu, Jiantao; He, Sihui; Shao, Chen; Zhao, Haoran; Li, Jing; Tian, Leilei

2018-04-19

The design of a nano-system for the detection of intracellular microRNAs is challenging as it must fulfill complex requirements, i.e., it must have a high sensitivity to determine the dynamic expression level, a good reliability for multiplex and simultaneous detection, and a satisfactory biostability to work in biological environments. Instead of employing a commonly used physisorption or a full-conjugation strategy, here, a GO-DNA nano-system was developed under graft/base-pairing construction. The common anchor sequence was chemically grafted to GO to base-pair with various microRNA probes; and the hybridization with miRNAs drives the dyes on the probes to leave away from GO, resulting in "turned-on" fluorescence. This strategy not only simplifies the synthesis but also efficiently balances the loading yields of different probes. Moreover, the conjugation yield of GO with a base-paired hybrid has been improved by more than two-fold compared to that of the conjugation with a single strand. We demonstrated that base-paired DNA probes could be efficiently delivered into cells along with GO and are properly stabilized by the conjugated anchor sequence. The resultant GO-DNA nano-system exhibited high stability in a complex biological environment and good resistance to nucleases, and was able to accurately discriminate various miRNAs without cross-reaction. With all of these positive features, the GO-DNA nano-system can simultaneously detect three miRNAs and monitor their dynamic expression levels.
DNA hybridization kinetics: zippering, internal displacement and sequence dependence.

PubMed

Ouldridge, Thomas E; Sulc, Petr; Romano, Flavio; Doye, Jonathan P K; Louis, Ard A

2013-10-01

Although the thermodynamics of DNA hybridization is generally well established, the kinetics of this classic transition is less well understood. Providing such understanding has new urgency because DNA nanotechnology often depends critically on binding rates. Here, we explore DNA oligomer hybridization kinetics using a coarse-grained model. Strand association proceeds through a complex set of intermediate states, with successful binding events initiated by a few metastable base-pairing interactions, followed by zippering of the remaining bonds. But despite reasonably strong interstrand interactions, initial contacts frequently dissociate because typical configurations in which they form differ from typical states of similar enthalpy in the double-stranded equilibrium ensemble. Initial contacts must be stabilized by two or three base pairs before full zippering is likely, resulting in negative effective activation enthalpies. Non-Arrhenius behavior arises because the number of base pairs required for nucleation increases with temperature. In addition, we observe two alternative pathways (pseudoknot and inchworm internal displacement) through which misaligned duplexes can rearrange to form duplexes. These pathways accelerate hybridization. Our results explain why experimentally observed association rates of GC-rich oligomers are higher than rates of AT-rich equivalents, and more generally demonstrate how association rates can be modulated by sequence choice.
Stability of non-Watson-Crick G-A/A-G base pair in synthetic DNA and RNA oligonucleotides.

PubMed

Ito, Yuko; Sone, Yumiko; Mizutani, Takaharu

2004-03-01

A non-Watson-Crick G-A/A-G base pair is found in SECIS (selenocysteine-insertion sequence) element in the 3'-untranslated region of Se-protein mRNAs and in the functional site of the hammerhead ribozyme. We studied the stability of G-A/A-G base pair (bold) in 17mer GT(U)GACGGAAACCGGAAC synthetic DNA and RNA oligonucleotides by thermal melting experiments and gel electrophoresis. The measured Tm value of DNA oligonucleotide having G-A/A-G pair showed an intermediate value (58 degrees C) between that of Watson-Crick G-C/C-G base pair (75 degrees C) and that of G-G/A-A of non-base-pair (40 degrees C). Similar thermal melting patterns were obtained with RNA oligonucleotides. This result indicates that the secondary structure of oligonucleotide having G-A/A-G base pair is looser than that of the G-C type Watson-Crick base pair. In the comparison between RNA and DNA having G-A/A-G base pair, the Tm value of the RNA oligonucleotide was 11 degrees C lower than that of DNA, indicating that DNA has a more rigid structure than RNA. The stained pattern of oligonucleotide on polyacrylamide gel clarified that the mobility of the DNA oligonucleotide G-A/A-G base pair changed according to the urea concentration from the rigid state (near the mobility of G-C/C-G oligonucleotide) in the absence of urea to the random state (near the mobility of G-G/A-A oligonucleotide) in 7 M urea. However, the RNA oligonucleotide with G-A/A-G pair moved at an intermediate mobility between that of oligonucleotide with G-C/C-G and of the oligonucleotide with G-G/A-A, and the mobility pattern did not depend on urea concentration. Thus, DNA and RNA oligonucleotides with the G-A/A-G base pair showed a pattern indicating an intermediate structure between the rigid Watson-Crick base pair and the random structure of non-base pair. RNA with G-A/A-G base pair has the intermediate structure not influenced by urea concentration. 
Finally, this study indicated that the intermediate rigidity imparted by the non-Watson-Crick base pair in the SECIS element plays an important role in selenocysteine expression by the UGA codon.
A Next Generation Semiconductor Based Sequencing Approach for the Identification of Meat Species in DNA Mixtures

PubMed Central

Bertolini, Francesca; Ghionda, Marco Ciro; D’Alessandro, Enrico; Geraci, Claudia; Chiofalo, Vincenzo; Fontanesi, Luca

2015-01-01

The identification of the species of origin of meat and meat products is an important issue to prevent and detect frauds that might have economic, ethical and health implications. In this paper we evaluated the potential of the next generation semiconductor based sequencing technology (Ion Torrent Personal Genome Machine) for the identification of DNA from meat species (pig, horse, cattle, sheep, rabbit, chicken, turkey, pheasant, duck, goose and pigeon) as well as from human and rat in DNA mixtures through the sequencing of PCR products obtained from different pairs of universal primers that amplify 12S and 16S rRNA mitochondrial DNA genes. Six libraries were produced including PCR products obtained separately from 13 species or from DNA mixtures containing DNA from all species or only avian or only mammalian species at equimolar concentration or at 1:10 or 1:50 ratios for pig and horse DNA. Sequencing obtained a total of 33,294,511 called nucleotides, of which 29,109,688 were Q20 (87.43%), in a total of 215,944 reads. Different alignment algorithms were used to assign the species based on sequence data. The error rate calculated after confirmation of the obtained sequences by Sanger sequencing ranged from 0.0003 to 0.02 for the different species. Correlation of the number of reads per species between different libraries was high for mammalian species (0.97) and lower for avian species (0.70). PCR competition limited the efficiency of amplification and sequencing for avian species for some primer pairs. Detection of low levels of pig and horse DNA was possible with reads obtained from different primer pairs. The sequencing of the products obtained from different universal PCR primers could be a useful strategy to overcome potential problems of amplification. Based on these results, the Ion Torrent technology can be applied for the identification of meat species in DNA mixtures. PMID:25923709
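The reported Q20 share can be checked with a few lines of arithmetic. The sketch below assumes that "Q20" refers to bases with a Phred quality score of at least 20, where the standard Phred definition Q = -10 log10(P) gives an error probability of 1% at Q = 20; the function names are illustrative only.

```python
def phred_to_error_probability(q):
    """Standard Phred relation: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def fraction_at_or_above(qualities, threshold=20):
    """Share of base calls whose quality score meets the threshold."""
    if not qualities:
        return 0.0
    return sum(q >= threshold for q in qualities) / len(qualities)


# Counts taken from the abstract above.
total_called = 33_294_511
q20_called = 29_109_688
q20_percent = round(100 * q20_called / total_called, 2)  # 87.43
```

The computed percentage agrees with the 87.43% stated in the abstract.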
Screening the sequence selectivity of DNA-binding molecules using a gold nanoparticle-based colorimetric approach.

PubMed

Hurst, Sarah J; Han, Min Su; Lytton-Jean, Abigail K R; Mirkin, Chad A

2007-09-15

We have developed a novel competition assay that uses a gold nanoparticle (Au NP)-based, high-throughput colorimetric approach to screen the sequence selectivity of DNA-binding molecules. This assay hinges on the observation that the melting behavior of DNA-functionalized Au NP aggregates is sensitive to the concentration of the DNA-binding molecule in solution. When short, oligomeric hairpin DNA sequences are added to a reaction solution consisting of DNA-functionalized Au NP aggregates and DNA-binding molecules, these molecules may bind either to the Au NP aggregate interconnects or to the hairpin stems, based on their relative affinity for each. This relative affinity can be measured as a change in the melting temperature (Tm) of the DNA-modified Au NP aggregates in solution. As a proof of concept, we evaluated the selectivity of 4',6-diamidino-2-phenylindole (an AT-specific binder), ethidium bromide (a nonspecific binder), and chromomycin A (a GC-specific binder) for six sequences of hairpin DNA having different numbers of AT pairs in a five-base pair variable stem region. Our assay accurately and easily confirmed the known trends in selectivity for the DNA binders in question without the use of complicated instrumentation. This novel assay will be useful in assessing large libraries of potential drug candidates that work by binding DNA to form a drug/DNA complex.
Mitochondrial tRNA 5'-editing in Dictyostelium discoideum and Polysphondylium pallidum.

PubMed

Abad, Maria G; Long, Yicheng; Kinchen, R Dimitri; Schindel, Elinor T; Gray, Michael W; Jackman, Jane E

2014-05-30

Mitochondrial tRNA (mt-tRNA) 5'-editing was first described more than 20 years ago; however, the first candidates for 5'-editing enzymes were only recently identified in a eukaryotic microbe (protist), the slime mold Dictyostelium discoideum. In this organism, eight of 18 mt-tRNAs are predicted to be edited based on the presence of genomically encoded mismatched nucleotides in their aminoacyl-acceptor stem sequences. Here, we demonstrate that mt-tRNA 5'-editing occurs at all predicted sites in D. discoideum as evidenced by changes in the sequences of isolated mt-tRNAs compared with the expected sequences encoded by the mitochondrial genome. We also identify two previously unpredicted editing events in which G-U base pairs are edited in the absence of any other genomically encoded mismatches. A comparison of 5'-editing in D. discoideum with 5'-editing in another slime mold, Polysphondylium pallidum, suggests organism-specific idiosyncrasies in the treatment of U-G/G-U pairs. In vitro activities of putative D. discoideum editing enzymes are consistent with the observed editing reactions and suggest an overall lack of tRNA substrate specificity exhibited by the repair component of the editing enzyme. Although the presence of terminal mismatches in mt-tRNA sequences is highly predictive of the occurrence of mt-tRNA 5'-editing, the variability in treatment of U-G/G-U base pairs observed here indicates that direct experimental evidence of 5'-editing must be obtained to understand the complete spectrum of mt-tRNA editing events in any species. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.
Employment of Near Full-Length Ribosome Gene TA-Cloning and Primer-Blast to Detect Multiple Species in a Natural Complex Microbial Community Using Species-Specific Primers Designed with Their Genome Sequences.

PubMed

Zhang, Huimin; He, Hongkui; Yu, Xiujuan; Xu, Zhaohui; Zhang, Zhizhou

2016-11-01

It remains an unsolved problem to quantify a natural microbial community by rapidly and conveniently measuring multiple species with functional significance. Most widely used high throughput next-generation sequencing methods can only generate information mainly for genus-level taxonomic identification and quantification, and detection of multiple species in a complex microbial community is still heavily dependent on approaches based on near full-length ribosome RNA gene or genome sequence information. In this study, we used near full-length rRNA gene library sequencing plus Primer-Blast to design species-specific primers based on whole microbial genome sequences. The primers were intended to be specific at the species level within relevant microbial communities, i.e., a defined genomics background. The primers were tested with samples collected from the Daqu (also called fermentation starters) and pit mud of a traditional Chinese liquor production plant. Sixteen pairs of primers were found to be suitable for identification of individual species. Among them, seven pairs were chosen to measure the abundance of microbial species through quantitative PCR. The combination of near full-length ribosome RNA gene library sequencing and Primer-Blast may represent a broadly useful protocol to quantify multiple species in complex microbial population samples with species-specific primers.
GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.

PubMed

Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de

2006-03-31

Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.
Repertoire of novel sequence signatures for the detection of Candidatus Liberibacter asiaticus by quantitative real-time PCR

PubMed Central

2014-01-01

Background: Huanglongbing (HLB) or citrus greening is a devastating disease of citrus. The gram-negative bacterium Candidatus Liberibacter asiaticus (Las) belonging to the α-proteobacteria is responsible for HLB in North America as well as in Asia. Currently, there is no cure for this disease. Early detection and quarantine of Las-infected trees are important management strategies used to prevent HLB from invading HLB-free citrus producing regions. Quantitative real-time PCR (qRT-PCR) based molecular diagnostic assays have been routinely used in the detection and diagnosis of Las. The oligonucleotide primer pairs based on conserved genes or regions, which include 16S rDNA and the β-operon, have been widely employed in the detection of Las by qRT-PCR. The availability of the whole genome sequence of Las now allows the design of primers beyond the conserved regions for the detection of Las explicitly. Results: We took a complementary approach by systematically screening the genes in a genome-wide fashion, to identify the unique signatures that are only present in Las by an exhaustive sequence based similarity search against the nucleotide sequence database. Our search resulted in 34 probable unique signatures. Furthermore, by designing the primer pair specific to the identified signatures, we showed that most of our primer sets are able to detect Las from the infected plant and psyllid materials collected from the USA and China by qRT-PCR. Overall, 18 primer pairs of the 34 are found to be highly specific to Las with no cross reactivity to the closely related species Ca. L. americanus (Lam) and Ca. L. africanus (Laf). Conclusions: We have designed qRT-PCR primers based on Las specific genes. Among them, 18 are suitable for the detection of Las from Las-infected plant and psyllid samples. The repertoire of primers that we have developed and characterized in this study enhanced the qRT-PCR based molecular diagnosis of HLB. PMID:24533511
An unusual mode of DNA duplex association: Watson-Crick interaction of all-purine deoxyribonucleic acids.

PubMed

Battersby, Thomas R; Albalos, Maria; Friesenhahn, Michel J

2007-05-01

Nucleic acid duplexes associating through purine-purine base pairing have been constructed and characterized in a remarkable demonstration of nucleic acids with mixed sequence and a natural backbone in an alternative duplex structure. The antiparallel deoxyribose all-purine duplexes associate specifically through Watson-Crick pairing, violating the nucleobase size-complementarity pairing convention found in Nature. Sequence-specific recognition displayed by these structures makes the duplexes suitable, in principle, for information storage and replication fundamental to molecular evolution in all living organisms. All-purine duplexes can be formed through association of purines found in natural ribonucleosides. Key to the formation of these duplexes is the N(3)-H tautomer of isoguanine, preferred in the duplex, but not in aqueous solution. The duplexes have relevance to evolution of the modern genetic code and can be used for molecular recognition of natural nucleic acids.
Molecular phylogeny and SNP variation of polar bears (Ursus maritimus), brown bears (U. arctos), and black bears (U. americanus) derived from genome sequences.

PubMed

Cronin, Matthew A; Rincon, Gonzalo; Meredith, Robert W; MacNeil, Michael D; Islas-Trejo, Alma; Cánovas, Angela; Medrano, Juan F

2014-01-01

We assessed the relationships of polar bears (Ursus maritimus), brown bears (U. arctos), and black bears (U. americanus) with high throughput genomic sequencing data with an average coverage of 25× for each species. A total of 1.4 billion 100-bp paired-end reads were assembled using the polar bear and annotated giant panda (Ailuropoda melanoleuca) genome sequences as references. We identified 13.8 million single nucleotide polymorphisms (SNP) in the 3 species aligned to the polar bear genome. These data indicate that polar bears and brown bears share more SNP with each other than either does with black bears. Concatenation and coalescence-based analysis of consensus sequences of approximately 1 million base pairs of ultraconserved elements in the nuclear genome resulted in a phylogeny with black bears as the sister group to brown and polar bears, and all brown bears are in a separate clade from polar bears. Genotypes for 162 SNP loci of 336 bears from Alaska and Montana showed that the species are genetically differentiated and there is geographic population structure of brown and black bears but not polar bears.
OrthoANI: An improved algorithm and software for calculating average nucleotide identity.

PubMed

Lee, Imchang; Ouk Kim, Yeong; Park, Sang-Cheol; Chun, Jongsik

2016-02-01

Species demarcation in Bacteria and Archaea is mainly based on overall genome relatedness, which serves as a framework for modern microbiology. Current practice for obtaining these measures between two strains is shifting from experimentally determined similarity obtained by DNA-DNA hybridization (DDH) to genome-sequence-based similarity. Average nucleotide identity (ANI) is a simple algorithm that mimics DDH. Like DDH, ANI values between two genome sequences may be different from each other when reciprocal calculations are compared. We compared 63,690 pairs of genome sequences and found that the differences in reciprocal ANI values are significantly high, exceeding 1% in some cases. To resolve this problem of not being symmetrical, a new algorithm, named OrthoANI, was developed to accommodate the concept of orthology, for which both genome sequences were fragmented and only orthologous fragment pairs were taken into consideration for calculating nucleotide identities. OrthoANI is highly correlated with ANI (using BLASTn), and the former showed approximately 0.1% higher values than the latter. In conclusion, OrthoANI provides a more robust and faster means of calculating average nucleotide identity for taxonomic purposes. The standalone software tools are freely available at http://www.ezbiocloud.net/sw/oat.
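The idea of restricting the average to orthologous fragment pairs can be sketched as follows. This is a toy illustration rather than the OrthoANI software: it assumes that fragment-versus-fragment percent identities have already been computed by some aligner, treats reciprocal best hits as the orthology criterion, and averages the identities from both search directions. All three choices, along with the function name and input layout, are assumptions made for this example.

```python
def reciprocal_average_identity(hits_ab, hits_ba):
    """Average identity over reciprocal-best-hit fragment pairs.

    hits_ab[a][b] is the percent identity found when fragment a of genome A
    is the query and fragment b of genome B is the hit; hits_ba is the
    reverse search. Both tables are hypothetical precomputed inputs.
    """
    best_ab = {a: max(h, key=h.get) for a, h in hits_ab.items() if h}
    best_ba = {b: max(h, key=h.get) for b, h in hits_ba.items() if h}
    identities = []
    for a, b in best_ab.items():
        if best_ba.get(b) == a:  # keep only reciprocal best hits
            identities.append(hits_ab[a][b])
            identities.append(hits_ba[b][a])
    if not identities:
        return None
    return sum(identities) / len(identities)


hits_ab = {"a1": {"b1": 98.0, "b2": 90.0}, "a2": {"b1": 95.0}}
hits_ba = {"b1": {"a1": 97.0, "a2": 94.0}, "b2": {"a1": 89.0}}
forward = reciprocal_average_identity(hits_ab, hits_ba)
reverse = reciprocal_average_identity(hits_ba, hits_ab)
```

Because the same set of fragment pairs and the same identities enter the average regardless of which genome is named first, the two calls return the same value, which is the symmetry property the abstract describes.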
Scaling features of noncoding DNA

NASA Technical Reports Server (NTRS)

Stanley, H. E.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.

1999-01-01

We review evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range; indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene, and utilize this fact to build a Coding Sequence Finder Algorithm, which uses statistical ideas to locate the coding regions of an unknown DNA sequence. Finally, we describe briefly some recent work adapting to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function, and reporting that noncoding regions in eukaryotes display a larger redundancy than coding regions. Specifically, we consider the possibility that this result is solely a consequence of nucleotide concentration differences as first noted by Bonhoeffer and his collaborators. We find that cytosine-guanine (CG) concentration does have a strong "background" effect on redundancy. However, we find that for the purine-pyrimidine binary mapping rule, which is not affected by the difference in CG concentration, the Shannon redundancy for the set of analyzed sequences is larger for noncoding regions compared to coding regions.
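A minimal sketch of a Shannon redundancy calculation under the purine-pyrimidine binary mapping is given below. It uses one common definition, R = 1 - H_n / (n log2 k), where H_n is the entropy of length-n blocks and k is the alphabet size; the abstract does not state the exact estimator or block lengths used, so the definition, function names, and parameters here are assumptions.

```python
from collections import Counter
from math import log2

# Binary mapping: purines (A, G) -> "R", pyrimidines (C, T) -> "Y".
PURINE_PYRIMIDINE = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def to_purine_pyrimidine(seq):
    """Map a DNA string to the two-letter R/Y alphabet, skipping other symbols."""
    return "".join(PURINE_PYRIMIDINE[b] for b in seq.upper() if b in PURINE_PYRIMIDINE)


def block_entropy(symbols, n):
    """Shannon entropy (bits) of overlapping length-n blocks."""
    counts = Counter(symbols[i:i + n] for i in range(len(symbols) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def redundancy(symbols, n, alphabet_size):
    """1 minus block entropy divided by its maximum, n * log2(alphabet_size)."""
    return 1 - block_entropy(symbols, n) / (n * log2(alphabet_size))


mapped = to_purine_pyrimidine("AGCTAGCT")
r_single = redundancy(mapped, n=1, alphabet_size=2)
```

Since every purine maps to the same symbol, swapping A for G (or C for T) leaves the mapped sequence unchanged, so a redundancy computed on it cannot be driven by base composition differences within each class.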

Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

DOEpatents

Studier, F. William

1995-04-18

Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.
Working the kinks out of nucleosomal DNA

PubMed Central

Olson, Wilma K.; Zhurkin, Victor B.

2011-01-01

Condensation of DNA in the nucleosome takes advantage of its double-helical architecture. The DNA deforms at sites where the base pairs face the histone octamer. The largest so-called kink-and-slide deformations occur in the vicinity of arginines that penetrate the minor groove. Nucleosome structures formed from the 601 positioning sequence differ subtly from those incorporating an AT-rich human α-satellite DNA. Restraints imposed by the histone arginines on the displacement of base pairs can modulate the sequence-dependent deformability of DNA and potentially contribute to the unique features of the different nucleosomes. Steric barriers mimicking constraints found in the nucleosome induce the simulated large-scale rearrangement of canonical B-DNA to kink-and-slide states. The pathway to these states shows non-harmonic behavior consistent with bending profiles inferred from AFM measurements. PMID:21482100
Stable loop in the crystal structure of the intercalated four-stranded cytosine-rich metazoan telomere

NASA Technical Reports Server (NTRS)

Kang, C.; Berger, I.; Lockshin, C.; Ratliff, R.; Moyzis, R.; Rich, A.

1995-01-01

In most metazoans, the telomeric cytosine-rich strand repeating sequence is d(TAACCC). The crystal structure of this sequence was solved to 1.9-Å resolution. Four strands associate via the cytosine-containing parts to form a four-stranded intercalated structure held together by C.C+ hydrogen bonds. The base-paired strands are parallel to each other, and the two duplexes are intercalated into each other in opposite orientations. One TAA end forms a highly stabilized loop with the 5' thymine Hoogsteen-base-paired to the third adenine. The 5' end of this loop is in close proximity to the 3' end of one of the other intercalated cytosine strands. Instead of being entirely in a DNA duplex, this structure suggests the possibility of an alternative conformation for the cytosine-rich telomere strands.
Nearest-neighbor thermodynamics of deoxyinosine pairs in DNA duplexes

PubMed Central

Watkins, Norman E.; SantaLucia, John

2005-01-01

Nearest-neighbor thermodynamic parameters of the ‘universal pairing base’ deoxyinosine were determined for the pairs I·C, I·A, I·T, I·G and I·I adjacent to G·C and A·T pairs. Ultraviolet absorbance melting curves were measured and non-linear regression performed on 84 oligonucleotide duplexes with 9 or 12 bp lengths. These data were combined with data for 13 inosine containing duplexes from the literature. Multiple linear regression was used to solve for the 32 nearest-neighbor unknowns. The parameters predict the Tm for all sequences within 1.2°C on average. The general trend in decreasing stability is I·C > I·A > I·T ≈ I·G > I·I. The stability trend for the base pair 5′ of the I·X pair is G·C > C·G > A·T > T·A. The stability trend for the base pair 3′ of I·X is the same. These trends indicate a complex interplay between H-bonding, nearest-neighbor stacking, and mismatch geometry. A survey of 14 tandem inosine pairs and 8 tandem self-complementary inosine pairs is also provided. These results may be used in the design of degenerate PCR primers and for degenerate microarray probes. PMID:16264087
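To show how nearest-neighbor parameters translate into a predicted Tm, the sketch below sums per-step enthalpy and entropy contributions along a sequence and applies the standard two-state relation Tm = ΔH° / (ΔS° + R ln(C_T / x)) - 273.15, with x = 4 for non-self-complementary duplexes and x = 1 for self-complementary ones. The parameter table passed in is a hypothetical placeholder supplied by the caller; no values from the study are reproduced here, and the function names are illustrative.

```python
from math import log

GAS_CONSTANT = 1.987  # cal / (K * mol)


def sum_nearest_neighbors(seq, params, initiation=(0.0, 0.0)):
    """Sum (dH in kcal/mol, dS in cal/(K*mol)) over dinucleotide steps.

    params maps a two-letter step such as "AC" to a (dH, dS) tuple;
    the table and the initiation term are caller-supplied assumptions.
    """
    d_h, d_s = initiation
    for i in range(len(seq) - 1):
        step_h, step_s = params[seq[i:i + 2]]
        d_h += step_h
        d_s += step_s
    return d_h, d_s


def melting_temperature_celsius(d_h_kcal, d_s_cal, total_strand_conc,
                                self_complementary=False):
    """Two-state Tm in degrees Celsius for a given total strand concentration (M)."""
    x = 1 if self_complementary else 4
    kelvin = d_h_kcal * 1000 / (d_s_cal + GAS_CONSTANT * log(total_strand_conc / x))
    return kelvin - 273.15


# Made-up inputs purely to exercise the functions.
toy_params = {"AC": (-1.0, -2.0), "CG": (-3.0, -4.0)}
toy_sum = sum_nearest_neighbors("ACG", toy_params)
tm = melting_temperature_celsius(-60.0, -160.0, 1e-4)
tm_self = melting_temperature_celsius(-60.0, -160.0, 1e-4, self_complementary=True)
```

In practice the step table would be filled from published parameter sets, with extra entries for steps that contain an I·X pair.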
A label-free DNA hairpin biosensor for colorimetric detection of target with suitable functional DNA partners.

PubMed

Nie, Ji; Zhang, De-Wen; Tie, Cai; Zhou, Ying-Lin; Zhang, Xin-Xiang

2013-11-15

The combination of aptamer and peroxidase-mimicking DNAzyme within a hairpin structure can form a functional DNA probe. The activities of both aptamer (as biorecognition element) and DNAzyme (as signal amplification element) are blocked via base pairing in the hairpin structure. The presence of target triggers the opening of the hairpin to form target/aptamer complex and releases G-quadruplex sequence which can generate amplified colorimetric signals. In this work, we elaborated a universal and simple procedure to design an efficient and sensitive hairpin probe with suitable functional DNA partners. A fill-in-the-blank process was developed for sequence design, and two key points including the pretreatment of the hairpin probe and the selection of suitable signal transducer sequence were proved to enhance the detection sensitivity. Cocaine was chosen as a model target for a proof of concept. A series of hairpins with different numbers of base pairs in the stem region were prepared. Hairpin-C10 with ten base pairs was screened out and a lowest detectable cocaine concentration of 5 μM by colorimetry was obtained. The proposed functional DNA hairpin showed good selectivity and satisfactory analysis in spiked biologic fluid. The whole "mix-and-measure" detection based on DNA hairpin without the need of immobilization and labeling was indicated to be time and labor saving. The strategy has potential to be transplanted into more smart hairpins toward other targets for general application in bioanalytical chemistry. Copyright © 2013 Elsevier B.V. All rights reserved.
Four base recognition by triplex-forming oligonucleotides at physiological pH

PubMed Central

Rusling, David A.; Powers, Vicki E. C.; Ranasinghe, Rohan T.; Wang, Yang; Osborne, Sadie D.; Brown, Tom; Fox, Keith R.

2005-01-01

We have achieved recognition of all four base pairs by triple helix formation at physiological pH, using triplex-forming oligonucleotides that contain four different synthetic nucleotides. BAU [2′-aminoethoxy-5-(3-aminoprop-1-ynyl)uridine] recognizes AT base pairs with high affinity, MeP (3-methyl-2-aminopyridine) binds to GC at higher pHs than cytosine, while APP (6-(3-aminopropyl)-7-methyl-3H-pyrrolo[2,3-d]pyrimidin-2(7H)-one) and S [N-(4-(3-acetamidophenyl)thiazol-2-yl-acetamide)] bind to CG and TA base pairs, respectively. Fluorescence melting and DNase I footprinting demonstrate successful triplex formation at a 19mer oligopurine sequence that contains two CG and two TA interruptions. The complexes are pH dependent, but are still stable at pH 7.0. BAU, MeP and APP retain considerable selectivity, and single base pair changes opposite these residues cause a large reduction in affinity. In contrast, S is less selective and tolerates CG pairs as well as TA. PMID:15911633
Eye movements reflect and shape strategies in fraction comparison

PubMed Central

Ischebeck, Anja; Weilharter, Marina; Körner, Christof

2016-01-01

The comparison of fractions is a difficult task that can often be facilitated by separately comparing components (numerators and denominators) of the fractions—that is, by applying so-called component-based strategies. The usefulness of such strategies depends on the type of fraction pair to be compared. We investigated the temporal organization and the flexibility of strategy deployment in fraction comparison by evaluating sequences of eye movements in 20 young adults. We found that component-based strategies could account for the response times and the overall number of fixations observed for the different fraction pairs. The analysis of eye movement sequences showed that the initial eye movements in a trial were characterized by stereotypical scanning patterns indicative of an exploratory phase that served to establish the kind of fraction pair presented. Eye movements that followed this phase adapted to the particular type of fraction pair and indicated the deployment of specific comparison strategies. These results demonstrate that participants employ eye movements systematically to support strategy use in fraction comparison. Participants showed a remarkable flexibility to adapt to the most efficient strategy on a trial-by-trial basis. Our results confirm the value of eye movement measurements in the exploration of strategic adaptation in complex tasks. PMID:26039819
Solution structure and base pair opening kinetics of the i-motif dimer of d(5mCCTTTACC): a noncanonical structure with possible roles in chromosome stability.

PubMed

Nonin, S; Phan, A T; Leroy, J L

1997-09-15

Repetitive cytosine-rich DNA sequences have been identified in telomeres and centromeres of eukaryotic chromosomes. These sequences play a role in maintaining chromosome stability during replication and may be involved in chromosome pairing during meiosis. The C-rich repeats can fold into an 'i-motif' structure, in which two parallel-stranded duplexes with hemiprotonated C.C+ pairs are intercalated. Previous NMR studies of naturally occurring repeats have produced poor NMR spectra. This led us to investigate oligonucleotides, based on natural sequences, to produce higher quality spectra and thus provide further information as to the structure and possible biological function of the i-motif. NMR spectroscopy has shown that d(5mCCTTTACC) forms an i-motif dimer of symmetry-related and intercalated folded strands. The high-definition structure is computed on the basis of the build-up rates of 29 intraresidue and 35 interresidue nuclear Overhauser effect (NOE) connectivities. The i-motif core includes intercalated interstrand C.C+ pairs stacked in the order 2*.8/1.7*/1*.7/2.8* (where one strand is distinguished by an asterisk and the numbers relate to the base positions within the repeat). The TTTA sequences form two loops which span the two wide grooves on opposite sides of the i-motif core; the i-motif core is extended at both ends by the stacking of A6 onto C2.C8+. The lifetimes of pairs C2.C8+ and 5mC1.C7+ are 1 ms and 1 s, respectively, at 15 degrees C. Anomalous exchange properties of the T3 imino proton indicate hydrogen bonding to A6 N7 via a water bridge. The d(5mCCTTTTCC) deoxyoligonucleotide, in which position 6 is occupied by a thymidine instead of an adenine, also forms a symmetric i-motif dimer. However, in this structure the two TTTT loops are located on the same side of the i-motif core and the C.C+ pairs are formed by equivalent cytidines stacked in the order 8*.8/1.1*/7*.7/2.2*. 
Oligodeoxynucleotides containing two C-rich repeats can fold and dimerize into an i-motif. The change of folding topology resulting from the substitution of a single nucleoside emphasizes the influence of the loop residues on the i-motif structure formed by two folded strands.
RNA-ID, a highly sensitive and robust method to identify cis-regulatory sequences using superfolder GFP and a fluorescence-based assay.

PubMed

Dean, Kimberly M; Grayhack, Elizabeth J

2012-12-01

We have developed a robust and sensitive method, called RNA-ID, to screen for cis-regulatory sequences in RNA using fluorescence-activated cell sorting (FACS) of yeast cells bearing a reporter in which expression of both superfolder green fluorescent protein (GFP) and yeast codon-optimized mCherry red fluorescent protein (RFP) is driven by the bidirectional GAL1,10 promoter. This method recapitulates previously reported progressive inhibition of translation mediated by increasing numbers of CGA codon pairs, and restoration of expression by introduction of a tRNA with an anticodon that base pairs exactly with the CGA codon. This method also reproduces effects of paromomycin and context on stop codon read-through. Five key features of this method contribute to its effectiveness as a selection for regulatory sequences: The system exhibits greater than a 250-fold dynamic range, a quantitative and dose-dependent response to known inhibitory sequences, exquisite resolution that allows nearly complete physical separation of distinct populations, and a reproducible signal between different cells transformed with the identical reporter, all of which are coupled with simple methods involving ligation-independent cloning, to create large libraries. Moreover, we provide evidence that there are sequences within a 9-nt library that cause reduced GFP fluorescence, suggesting that there are novel cis-regulatory sequences to be found even in this short sequence space. This method is widely applicable to the study of both RNA-mediated and codon-mediated effects on expression.
Non-random distribution and co-localization of purine/pyrimidine-encoded information and transcriptional regulatory domains.

PubMed

Povinelli, C M

1992-01-01

In order to detect sequence-based information predictive for the location of eukaryotic transcriptional regulatory domains, the frequencies and distributions of the 36 possible purine/pyrimidine reverse complement hexamer pairs were determined for test sets of real and random sequences. The distribution of one of the hexamer pairs (RRYYRR/YYRRYY, referred to as M1) was further examined in a larger set of sequences (> 32 genes, 230 kb). Predominant clusters of M1 and the locations of eukaryotic transcriptional regulatory domains were found to be associated and non-randomly distributed along the DNA consistent with a periodicity of approximately 1.2 kb. In the context of higher ordered chromatin this would align promoters, enhancers and the predominant clusters of M1 longitudinally along one face of a 30 nm fiber. Using only information about the distribution of the M1 motif, 50-70% of a sequence could be eliminated as being unlikely to contain transcriptional regulatory domains with an 87% recovery of the regulatory domains present.
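The figure of 36 hexamer pairs can be checked by enumeration. The sketch below is an illustration, not the author's code: it lists all 64 purine (R) / pyrimidine (Y) hexamers and groups each one with its reverse complement. The function names are hypothetical.

```python
from itertools import product

# In the two-letter alphabet, a purine (R) always pairs with a pyrimidine (Y).
COMPLEMENT = {"R": "Y", "Y": "R"}


def reverse_complement(word: str) -> str:
    """Reverse the word and swap R <-> Y at every position."""
    return "".join(COMPLEMENT[b] for b in reversed(word))


def reverse_complement_pairs(k: int = 6) -> list:
    """Return the distinct {word, reverse complement} pairs for length-k R/Y words."""
    pairs = set()
    for letters in product("RY", repeat=k):
        word = "".join(letters)
        # Sorting makes (w, rc) and (rc, w) collapse into one entry.
        pairs.add(tuple(sorted((word, reverse_complement(word)))))
    return sorted(pairs)


PAIRS = reverse_complement_pairs()
```

Of the 64 hexamers, 8 are their own reverse complement (for example RRRYYY) and the remaining 56 form 28 pairs, which gives 8 + 28 = 36. The M1 pair RRYYRR/YYRRYY is one of the 28.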
Fosmid Cre-LoxP Inverse PCR Paired-End (Fosmid CLIP-PE), a Novel Method for Constructing Fosmid Pair-End Library (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Peng, Ze

2012-06-01

Ze Peng from DOE JGI presents "Fosmid Cre-LoxP Inverse PCR Paired-End (Fosmid CLIP-PE), a Novel Method for Constructing Fosmid Pair-End Library" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.
Fosmid Cre-LoxP Inverse PCR Paired-End (Fosmid CLIP-PE), a Novel Method for Constructing Fosmid Pair-End Library (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

ScienceCinema

Peng, Ze

2018-01-24

Ze Peng from DOE JGI presents "Fosmid Cre-LoxP Inverse PCR Paired-End (Fosmid CLIP-PE), a Novel Method for Constructing Fosmid Pair-End Library" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.
The Teaching of Protein Synthesis--A Microcomputer Based Method.

ERIC Educational Resources Information Center

Goodridge, Frank

1983-01-01

Describes two computer programs (BASIC for 32K Commodore PET) for teaching protein synthesis. The first is an interactive test of base-pairing knowledge, and the second generates random DNA nucleotide sequences, with instructions for substitution, insertion, and deletion printed out for each student. (JN)
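The original programs were written in BASIC for the Commodore PET and are not reproduced in this record. As a rough modern sketch of the same two ideas (a base-pairing lookup, and a random DNA sequence with substitution, insertion, and deletion applied), one might write the following in Python. All names here are hypothetical.

```python
import random

BASES = "ACGT"
# Watson-Crick partners for DNA.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def random_dna(length: int, rng: random.Random) -> str:
    """Generate a random DNA sequence of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def complementary_strand(seq: str) -> str:
    """Return the base-paired partner of each nucleotide, in order."""
    return "".join(PARTNER[b] for b in seq)


def substitute(seq: str, pos: int, base: str) -> str:
    """Replace the nucleotide at pos with base."""
    return seq[:pos] + base + seq[pos + 1:]


def insert_base(seq: str, pos: int, base: str) -> str:
    """Insert base immediately before position pos."""
    return seq[:pos] + base + seq[pos:]


def delete_base(seq: str, pos: int) -> str:
    """Remove the nucleotide at pos."""
    return seq[:pos] + seq[pos + 1:]


# A seeded generator gives each student a reproducible sequence.
student_sequence = random_dna(12, random.Random(1))
```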
16S rDNA-based metagenomic analysis of dental plaque and lung bacteria in patients with severe acute exacerbations of chronic obstructive pulmonary disease.

PubMed

Tan, L; Wang, H; Li, C; Pan, Y

2014-12-01

Acute exacerbations of chronic obstructive pulmonary disease (AE-COPD) are leading causes of mortality in hospital intensive care units. We sought to determine whether dental plaque biofilms might harbor pathogenic bacteria that can eventually cause lung infections in patients with severe AE-COPD. Paired samples of subgingival plaque biofilm and tracheal aspirate were collected from 53 patients with severe AE-COPD. Total bacterial DNA was extracted from each sample individually for polymerase chain reaction amplification and/or generation of bacterial 16S rDNA sequences and cDNA libraries. We used a metagenomic approach, based on bacterial 16S rDNA sequences, to compare the distribution of species present in dental plaque and lung. Analysis of 1060 sequences (20 clones per patient) revealed a wide range of aerobic, anaerobic, pathogenic, opportunistic, novel and uncultivable bacterial species. Species indistinguishable between the paired subgingival plaque and tracheal aspirate samples (97-100% similarity in 16S rDNA sequence) were dental plaque pathogens (Aggregatibacter actinomycetemcomitans, Capnocytophaga sputigena, Porphyromonas gingivalis, Tannerella forsythia and Treponema denticola) and lung pathogens (Acinetobacter baumannii, Klebsiella pneumoniae, Pseudomonas aeruginosa and Streptococcus pneumoniae). Real-time polymerase chain reaction of 16S rDNA indicated lower levels of Pseudomonas aeruginosa and Porphyromonas gingivalis colonizing the dental plaques compared with the paired tracheal aspirate samples. These results support the hypothesis that dental bacteria may contribute to the pathology of severe AE-COPD. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Intrinsic flexibility of B-DNA: the experimental TRX scale.

PubMed

Heddi, Brahim; Oguey, Christophe; Lavelle, Christophe; Foloppe, Nicolas; Hartmann, Brigitte

2010-01-01

B-DNA flexibility, crucial for DNA-protein recognition, is sequence dependent. Free DNA in solution would in principle be the best reference state to uncover the relation between base sequences and their intrinsic flexibility; however, this has long been hampered by a lack of suitable experimental data. We investigated this relationship by compiling and analyzing a large dataset of NMR ³¹P chemical shifts in solution. These measurements reflect the BI ↔ BII equilibrium in DNA, intimately correlated to helicoidal descriptors of the curvature, winding and groove dimensions. Comparing the ten complementary DNA dinucleotide steps indicates that some steps are much more flexible than others. This malleability is primarily controlled at the dinucleotide level, modulated by the tetranucleotide environment. Our analyses provide an experimental scale called TRX that quantifies the intrinsic flexibility of the ten dinucleotide steps in terms of Twist, Roll, and X-disp (base pair displacement). Applying the TRX scale to DNA sequences optimized for nucleosome formation reveals a 10 base-pair periodic alternation of stiff and flexible regions. Thus, DNA flexibility captured by the TRX scale is relevant to nucleosome formation, suggesting that this scale may be of general interest to better understand protein-DNA recognition.
Nucleotide sequence of a complementary DNA encoding pea cytosolic copper/zinc superoxide dismutase. [Pisum sativum L

DOE Office of Scientific and Technical Information (OSTI.GOV)

White, D.A.; Zilinskas, B.A.

1991-08-01

The authors now report the nucleotide sequence of the cytosolic Cu/Zn SOD cloned from a λgt11 cDNA library constructed from mRNA extracted from leaves of 7- to 10-d pea seedlings (Pisum sativum L.). The clone was isolated using a 22-base synthetic oligonucleotide complementary to the amino acid sequence CGIIGLQG. This sequence, found at the protein's carboxy terminus, is highly conserved among plant cytosolic Cu/Zn SODs but not chloroplastic Cu/Zn SODs. The 738-base pair sequence contains an open reading frame specifying 152 codons and a predicted M_r of 18,024 D. The deduced amino acid sequence is highly homologous (79-82% identity) with the sequences of other known plant cytosolic Cu/Zn SODs but less highly conserved (63-65%) when compared with several chloroplastic Cu/Zn SODs including pea (10).
LookSeq: a browser-based viewer for deep sequencing data.

PubMed

Manske, Heinrich Magnus; Kwiatkowski, Dominic P

2009-11-01

Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.
SSR_pipeline--computer software for the identification of microsatellite sequences from paired-end Illumina high-throughput DNA sequence data

USGS Publications Warehouse

Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

2013-01-01

SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.
Transcriptome analysis of Houttuynia cordata Thunb. by Illumina paired-end RNA sequencing and SSR marker discovery.

PubMed

Wei, Lin; Li, Shenghua; Liu, Shenggui; He, Anna; Wang, Dan; Wang, Jie; Tang, Yulian; Wu, Xianjin

2014-01-01

Houttuynia cordata Thunb. is an important traditional medical herb in China and other Asian countries, with high medicinal and economic value. However, a lack of available genomic information has become a limitation for research on this species. Thus, we carried out high-throughput transcriptomic sequencing of H. cordata to generate an enormous transcriptome sequence dataset for gene discovery and molecular marker development. Illumina paired-end sequencing technology produced over 56 million sequencing reads from H. cordata mRNA. Subsequent de novo assembly yielded 63,954 unigenes, 39,982 (62.52%) and 26,122 (40.84%) of which had significant similarity to proteins in the NCBI nonredundant protein and Swiss-Prot databases (E-value < 10^-5), respectively. Of these annotated unigenes, 30,131 and 15,363 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. In addition, 24,434 (38.21%) unigenes were mapped onto 128 pathways using the KEGG pathway database and 17,964 (44.93%) unigenes showed homology to Vitis vinifera (Vitaceae) genes in BLASTx analysis. Furthermore, 4,800 cDNA SSRs were identified as potential molecular markers. Fifty primer pairs were randomly selected to detect polymorphism among 30 samples of H. cordata; 43 (86%) produced fragments of expected size, suggesting that the unigenes were suitable for specific primer design and of high quality, and the SSR marker could be widely used in marker-assisted selection and molecular breeding of H. cordata in the future. This is the first application of Illumina paired-end sequencing technology to investigate the whole transcriptome of H. cordata and to assemble RNA-seq reads without a reference genome. These data should help researchers investigating the evolution and biological processes of this species. The SSR markers developed can be used for construction of high-resolution genetic linkage maps and for gene-based association analyses in H. cordata.
This work will enable future functional genomic research and research into the distinctive active constituents of this genus.

Ancestral sequence reconstruction in primate mitochondrial DNA: compositional bias and effect on functional inference.

PubMed

Krishnan, Neeraja M; Seligmann, Hervé; Stewart, Caro-Beth; De Koning, A P Jason; Pollock, David D

2004-10-01

Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. 
The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and flexibility. To determine whether biased reconstructions using optimization methods might affect inferences of functional properties, ancestral primate mitochondrial tRNA sequences were inferred and helix-forming propensities for conserved pairs were evaluated in silico. For ambiguously reconstructed nucleotides at sites with high base composition variability, ancestral tRNA sequences from Bayesian analyses were more compatible with canonical base pairing than were those inferred by other methods. Thus, nucleotide bias in reconstructed sequences apparently can lead to serious bias and inaccuracies in functional predictions.
Fluorescence Competition Assay Measurements of Free Energy Changes for RNA Pseudoknots

PubMed Central

2009-01-01

RNA pseudoknots have important functions, and thermodynamic stability is a key to predicting pseudoknots in RNA sequences and to understanding their functions. Traditional methods, such as UV melting and differential scanning calorimetry, for measuring RNA thermodynamics are restricted to temperature ranges around the melting temperature for a pseudoknot. Here, we report RNA pseudoknot free energy changes at 37 °C measured by fluorescence competition assays. Sequence-dependent studies for the loop 1−stem 2 region reveal (1) the individual nearest-neighbor hydrogen bonding (INN-HB) model provides a reasonable estimate for the free energy change when a Watson−Crick base pair in stem 2 is changed, (2) the loop entropy can be estimated by a statistical polymer model, although some penalty for certain loop sequences is necessary, and (3) tertiary interactions can significantly stabilize pseudoknots and extending the length of stem 2 may alter tertiary interactions such that the INN-HB model does not predict the net effect of adding a base pair. The results can inform writing of algorithms for predicting and/or designing RNA secondary structures. PMID:19921809
The ChIP-exo Method: Identifying Protein-DNA Interactions with Near Base Pair Precision.

PubMed

Perreault, Andrea A; Venters, Bryan J

2016-12-23

Chromatin immunoprecipitation (ChIP) is an indispensable tool in the fields of epigenetics and gene regulation that isolates specific protein-DNA interactions. ChIP coupled to high throughput sequencing (ChIP-seq) is commonly used to determine the genomic location of proteins that interact with chromatin. However, ChIP-seq is hampered by relatively low mapping resolution of several hundred base pairs and high background signal. The ChIP-exo method is a refined version of ChIP-seq that substantially improves upon both resolution and noise. The key distinction of the ChIP-exo methodology is the incorporation of lambda exonuclease digestion in the library preparation workflow to effectively footprint the left and right 5' DNA borders of the protein-DNA crosslink site. The ChIP-exo libraries are then subjected to high throughput sequencing. The resulting data can be leveraged to provide unique and ultra-high resolution insights into the functional organization of the genome. Here, we describe the ChIP-exo method that we have optimized and streamlined for mammalian systems and a next-generation sequencing-by-synthesis platform.
Heterogeneity to Homogeneity: Synthesis, Base Pairing, and Ligation Studies of 4',3'-XyluloNA/RNA and TNA/RNA Chimeric Sequences

NASA Astrophysics Data System (ADS)

Bhowmik, S.; Stoop, M.; Krishnamurthy, R.

2017-07-01

Based on the reality of "prebiotic clutter," we herein present an alternate model for pre-RNA to RNA transition, which starts, not with homogeneous-backbone system, but rather with mixtures of heterogeneous-backbone of chimeric "pre-RNA/RNA."
J Genes for Heavy Chain Immunoglobulins of Mouse

NASA Astrophysics Data System (ADS)

Newell, Nanette; Richards, Julia E.; Tucker, Philip W.; Blattner, Frederick R.

1980-09-01

A 15.8-kilobase pair fragment of BALB/c mouse liver DNA, cloned in the Charon 4A λ phage vector system, was shown to contain the μ heavy chain constant region (CHμ) gene for the mouse immunoglobulin M. In addition, this fragment of DNA contains at least two J genes, used to code for the carboxyl terminal portion of heavy chain variable regions. These genes are located in genomic DNA about eight kilobase pairs to the 5' side of the CHμ gene. The complete nucleotide sequence of a 1120-base pair stretch of DNA that includes the two J genes has been determined.
Genome Editing Tools in Plants

PubMed Central

Mohanta, Tapan Kumar; Bashir, Tufail; Hashem, Abeer; Bae, Hanhong

2017-01-01

Genome editing tools have the potential to change the genomic architecture of a genome at precise locations, with desired accuracy. These tools have been efficiently used for trait discovery and for the generation of plants with high crop yields and resistance to biotic and abiotic stresses. Due to complex genomic architecture, it is challenging to edit all of the genes/genomes using a particular genome editing tool. Therefore, to overcome this challenging task, several genome editing tools have been developed to facilitate efficient genome editing. Some of the major genome editing tools used to edit plant genomes are: Homologous recombination (HR), zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), pentatricopeptide repeat proteins (PPRs), the CRISPR/Cas9 system, RNA interference (RNAi), cisgenesis, and intragenesis. In addition, site-directed sequence editing and oligonucleotide-directed mutagenesis have the potential to edit the genome at the single-nucleotide level. Recently, adenine base editors (ABEs) have been developed to mutate A-T base pairs to G-C base pairs. ABEs use deoxyadeninedeaminase (TadA) with catalytically impaired Cas9 nickase to mutate A-T base pairs to G-C base pairs. PMID:29257124
Synthesis and monitored selection of nucleotide surrogates for binding T:A base pairs in homopurine-homopyrimidine DNA triple helices.

PubMed

Mokhir, A A; Connors, W H; Richert, C

2001-09-01

A total of 16 oligodeoxyribonucleotides of general sequence 5'-TCTTCTZTCTTTCT-3', where Z denotes an N-acyl-N-(2-hydroxyethyl)glycine residue, were prepared via solid phase synthesis. The ability of these oligonucleotides to form triplexes with the duplex 5'-AGAAGATAGAAAGA-HEG-TCTTTCTATCTTCT-3', where HEG is a hexaethylene glycol linker, was tested. In these triplexes, an 'interrupting' T:A base pair faces the Z residue in the third strand. Among the acyl moieties of Z tested, an anthraquinone carboxylic acid residue linked via a glycinyl group gave the most stable triplex, whose UV melting point was 8.4 degrees C higher than that of the triplex with 5'-TCTTCTGTCTTTCT-3' as the third strand. The results from exploratory nuclease selection experiments suggest that a combinatorial search for strands capable of recognizing mixed sequences by triple helix formation is feasible.
Stability of miRNA 5′terminal and seed regions is correlated with experimentally observed miRNA-mediated silencing efficacy

PubMed Central

Hibio, Naoki; Hino, Kimihiro; Shimizu, Eigo; Nagata, Yoshiro; Ui-Tei, Kumiko

2012-01-01

MicroRNAs (miRNAs) are key regulators of sequence-specific gene silencing. However, crucial factors that determine the efficacy of miRNA-mediated target gene silencing are poorly understood. Here we mathematized base-pairing stability and showed that miRNAs with an unstable 5′ terminal duplex and stable seed-target duplex exhibit strong silencing activity. The results are consistent with the previous findings that an RNA strand with unstable 5′ terminal in miRNA duplex easily loads onto the RNA-induced silencing complex (RISC), and miRNA recognizes target mRNAs with seed-complementary sequences to direct posttranscriptional repression. Our results suggested that both the unwinding and target recognition processes of miRNAs could be proficiently controlled by the thermodynamics of base-pairing in protein-free condition. Interestingly, such thermodynamic parameters might be evolutionarily well adapted to the body temperatures of various species. PMID:23251782
Improved Model for Predicting the Free Energy Contribution of Dinucleotide Bulges to RNA Duplex Stability.

PubMed

Tomcho, Jeremy C; Tillman, Magdalena R; Znosko, Brent M

2015-09-01

Predicting the secondary structure of RNA is an intermediate in predicting RNA three-dimensional structure. Commonly, determining RNA secondary structure from sequence uses free energy minimization and nearest neighbor parameters. Current algorithms utilize a sequence-independent model to predict free energy contributions of dinucleotide bulges. To determine if a sequence-dependent model would be more accurate, short RNA duplexes containing dinucleotide bulges with different sequences and nearest neighbor combinations were optically melted to derive thermodynamic parameters. These data suggested energy contributions of dinucleotide bulges were sequence-dependent, and a sequence-dependent model was derived. This model assigns free energy penalties based on the identity of nucleotides in the bulge (3.06 kcal/mol for two purines, 2.93 kcal/mol for two pyrimidines, 2.71 kcal/mol for 5'-purine-pyrimidine-3', and 2.41 kcal/mol for 5'-pyrimidine-purine-3'). The predictive model also includes a 0.45 kcal/mol penalty for an A-U pair adjacent to the bulge and a -0.28 kcal/mol bonus for a G-U pair adjacent to the bulge. The new sequence-dependent model results in predicted values within, on average, 0.17 kcal/mol of experimental values, a significant improvement over the sequence-independent model. This model and new experimental values can be incorporated into algorithms that predict RNA stability and secondary structure from sequence.
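The penalties quoted in this abstract can be written as a small lookup. The sketch below is a reading of the abstract only, not the authors' published algorithm. In particular, it assumes the A-U and G-U terms are added once for each such pair adjacent to the bulge, which the abstract does not spell out. The function name and argument format are hypothetical.

```python
PURINES = {"A", "G"}

# Base penalties in kcal/mol from the abstract, keyed by
# (5' nucleotide is a purine, 3' nucleotide is a purine).
BASE_PENALTY = {
    (True, True): 3.06,    # two purines
    (False, False): 2.93,  # two pyrimidines
    (True, False): 2.71,   # 5'-purine-pyrimidine-3'
    (False, True): 2.41,   # 5'-pyrimidine-purine-3'
}

# Adjustments for pairs next to the bulge (assumed to apply per pair).
ADJACENT_TERM = {"AU": 0.45, "GU": -0.28}


def bulge_free_energy(bulge: str, adjacent_pairs=()) -> float:
    """Estimate the free energy contribution (kcal/mol) of a dinucleotide bulge.

    bulge is written 5'->3', e.g. "AG"; adjacent_pairs lists the pairs
    flanking the bulge, e.g. ["UA", "GC"].
    """
    bulge = bulge.upper()
    if len(bulge) != 2 or any(b not in "ACGU" for b in bulge):
        raise ValueError("expected two RNA nucleotides, e.g. 'AG'")
    total = BASE_PENALTY[(bulge[0] in PURINES, bulge[1] in PURINES)]
    for pair in adjacent_pairs:
        # Sort so that "UA" and "AU" (or "UG" and "GU") share one key.
        total += ADJACENT_TERM.get("".join(sorted(pair.upper())), 0.0)
    return round(total, 2)
```

Under these assumptions, a CU bulge next to one A-U pair would score 2.93 + 0.45 = 3.38 kcal/mol, and a UA bulge next to one G-U pair would score 2.41 - 0.28 = 2.13 kcal/mol.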
Identification and correction of systematic error in high-throughput sequence data

PubMed Central

2011-01-01

Background: A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. Results: We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that they are highly replicable across experiments. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq), and can be used with single-end datasets. Conclusions: Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments. PMID:22099972
Use of wavelet-packet transforms to develop an engineering model for multifractal characterization of mutation dynamics in pathological and nonpathological gene sequences

NASA Astrophysics Data System (ADS)

Walker, David Lee

1999-12-01

This study uses dynamical analysis to examine in a quantitative fashion the information coding mechanism in DNA sequences. This exceeds the simple dichotomy of either modeling the mechanism by comparing DNA sequence walks as Fractal Brownian Motion (fbm) processes. The 2-D mappings of the DNA sequences for this research are from Iterated Function System (IFS) (also known as the "Chaos Game Representation" (CGR)) mappings of the DNA sequences. This technique converts a 1-D sequence into a 2-D representation that preserves subsequence structure and provides a visual representation. The second step of this analysis involves the application of Wavelet Packet Transforms, a recently developed technique from the field of signal processing. A multi-fractal model is built by using wavelet transforms to estimate the Hurst exponent, H. The Hurst exponent is a non-parametric measurement of the dynamism of a system. This procedure is used to evaluate gene-coding events in the DNA sequence of cystic fibrosis mutations. The H exponent is calculated for various mutation sites in this gene. The results of this study indicate the presence of anti-persistent, random walks and persistent "sub-periods" in the sequence. This indicates the hypothesis of a multi-fractal model of DNA information encoding warrants further consideration. This work examines the model's behavior in both pathological (mutations) and non-pathological (healthy) base pair sequences of the cystic fibrosis gene. These mutations both natural and synthetic were introduced by computer manipulation of the original base pair text files. The results show that disease severity and system "information dynamics" correlate. These results have implications for genetic engineering as well as in mathematical biology. They suggest that there is scope for more multi-fractal models to be developed.
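The Chaos Game Representation step can be sketched in a few lines. The code below follows the usual textbook construction (start at the centre of the unit square and move halfway toward the corner assigned to each base). The corner assignment is one common convention and is an assumption here, since the abstract does not state which mapping the study used.

```python
# One common corner assignment for the unit square (assumed, not from the study).
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def chaos_game(seq: str) -> list:
    """Map a DNA sequence to 2-D points, one point per nucleotide."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


trace = chaos_game("AG")
```

Because each step halves the distance to a corner, all sequences ending in the same suffix fall in the same sub-square, which is the sense in which the mapping preserves subsequence structure.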
Genetic dissection of the consensus sequence for the class 2 and class 3 flagellar promoters

PubMed Central

Wozniak, Christopher E.; Hughes, Kelly T.

2008-01-01

Summary Computational searches for DNA binding sites often utilize consensus sequences. These search models make assumptions that the frequency of a base pair in an alignment relates to the base pair’s importance in binding and presume that base pairs contribute independently to the overall interaction with the DNA binding protein. These two assumptions have generally been found to be accurate for DNA binding sites. However, these assumptions are often not satisfied for promoters, which are involved in additional steps in transcription initiation after RNA polymerase has bound to the DNA. To test these assumptions for the flagellar regulatory hierarchy, class 2 and class 3 flagellar promoters were randomly mutagenized in Salmonella. Important positions were then saturated for mutagenesis and compared to scores calculated from the consensus sequence. Double mutants were constructed to determine how mutations combined for each promoter type. Mutations in the binding site for FlhD4C2, the activator of class 2 promoters, better satisfied the assumptions for the binding model than did mutations in the class 3 promoter, which is recognized by the σ28 transcription factor. These in vivo results indicate that the activator sites within flagellar promoters can be modeled using simple assumptions but that the DNA sequences recognized by the flagellar sigma factor require more complex models. PMID:18486950
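The two assumptions described in this abstract (base frequency reflects importance, and positions contribute independently) correspond to an additive scoring model. The sketch below is a generic illustration with made-up toy sites; the function names, the pseudocount, and the uniform background are assumptions, not the authors' scoring method.

```python
import math


def position_frequencies(sites, pseudocount=1.0):
    """Per-position base frequencies from equal-length aligned sites."""
    freqs = []
    for i in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        freqs.append({base: c / total for base, c in counts.items()})
    return freqs


def additive_score(seq, freqs, background=0.25):
    """Sum of per-position log-odds: each position contributes independently."""
    return sum(math.log2(freqs[i][b] / background) for i, b in enumerate(seq))


if __name__ == "__main__":
    toy_sites = ["TAAA", "TAAT", "TACA", "TAAA"]  # hypothetical alignment
    freqs = position_frequencies(toy_sites)
    for candidate in ("TAAA", "GAAA", "TAAC", "GAAC"):
        print(candidate, round(additive_score(candidate, freqs), 3))
```

Under this model the score loss of a double mutant equals the sum of the losses of the two single mutants, which is the kind of prediction the double-mutant experiments in the abstract can test.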
On the Power and the Systematic Biases of the Detection of Chromosomal Inversions by Paired-End Genome Sequencing

PubMed Central

Lucas Lledó, José Ignacio; Cáceres, Mario

2013-01-01

One of the most used techniques to study structural variation at a genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base-qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is very appropriate to detect a wide range of inversions, even with low coverage data. However, % of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far better than increments of the coverage depth or the read length. Finally, we review the performance of three algorithms to detect inversions —SVDetect, GRIAL, and VariationHunter—, identify common pitfalls, and reveal important differences in their breakpoint precisions. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects. PMID:23637806
HIV drug resistance testing among patients failing second line antiretroviral therapy. Comparison of in-house and commercial sequencing.

PubMed

Chimukangara, Benjamin; Varyani, Bhavini; Shamu, Tinei; Mutsvangwa, Junior; Manasa, Justen; White, Elizabeth; Chimbetete, Cleophas; Luethy, Ruedi; Katzenstein, David

2017-05-01

HIV genotyping is often unavailable in low and middle-income countries due to infrastructure requirements and cost. We compared genotype resistance testing in patients with virologic failure, by amplification of HIV pol gene, followed by "in-house" sequencing and commercial sequencing. Remnant plasma samples from adults and children failing second-line ART were amplified and sequenced using in-house and commercial di-deoxysequencing, and analyzed in Harare, Zimbabwe and at Stanford, U.S.A, respectively. HIV drug resistance mutations were determined using the Stanford HIV drug resistance database. Twenty-six of 28 samples were amplified and 25 were successfully genotyped. Comparison of average percent nucleotide and amino acid identities between 23 pairs sequenced in both laboratories were 99.51 (±0.56) and 99.11 (±0.95), respectively. All pairs clustered together in phylogenetic analysis. Sequencing analysis identified 6/23 pairs with mutation discordances resulting in differences in phenotype, but these did not impact future regimens. The results demonstrate our ability to produce good quality drug resistance data in-house. Despite discordant mutations in some sequence pairs, the phenotypic predictions were not clinically significant. Copyright © 2016 Elsevier B.V. All rights reserved.
In silico cloning and B/T cell epitope prediction of triosephosphate isomerase from Echinococcus granulosus.

PubMed

Wang, Fen; Ye, Bin

2016-10-01

Cystic echinococcosis is a worldwide zoonosis caused by Echinococcus granulosus. Because the methods of diagnosis and treatment for cystic echinococcosis are limited, it is still necessary to screen target proteins for the development of a new anti-hydatidosis vaccine. In this study, the triosephosphate isomerase gene of E. granulosus was cloned in silico. The B cell and T cell epitopes were predicted by bioinformatics methods. The cDNA sequence of EgTIM was composed of 1094 base pairs, with an open reading frame of 753 base pairs. The deduced amino acid sequence was composed of 250 amino acids. Five cross-reactive epitopes, located at 21aa-35aa, 43aa-57aa, 94aa-107aa, 115aa-129aa, and 164aa-183aa, could be expected to serve as candidate epitopes in the development of a vaccine against E. granulosus. These results could provide a basis for gene cloning, recombinant expression, and the design of an anti-hydatidosis vaccine.
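The relationship between the 753 base pair open reading frame and the 250 deduced amino acids can be checked with a short calculation. The sketch assumes the stated ORF length includes the stop codon, which the abstract does not say explicitly; `protein_length` is a hypothetical helper.

```python
def protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of the given length in bp."""
    if orf_bp % 3 != 0:
        raise ValueError("an ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # Assumption: the final codon is a stop codon and encodes no residue.
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    print(protein_length(753))  # 251 codons, one of them a stop codon
```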
SlideSort: all pairs similarity search for short reads

PubMed Central

Shimizu, Kana; Tsuda, Koji

2011-01-01

Motivation: Recent progress in DNA sequencing technologies calls for fast and accurate algorithms that can evaluate sequence similarity for a huge amount of short reads. Searching similar pairs from a string pool is a fundamental process of de novo genome assembly, genome-wide alignment and other important analyses. Results: In this study, we designed and implemented an exact algorithm SlideSort that finds all similar pairs from a string pool in terms of edit distance. Using an efficient pattern growth algorithm, SlideSort discovers chains of common k-mers to narrow down the search. Compared to existing methods based on single k-mers, our method is more effective in reducing the number of edit distance calculations. In comparison to backtracking methods such as BWA, our method is much faster in finding remote matches, scaling easily to tens of millions of sequences. Our software has an additional function of single link clustering, which is useful in summarizing short reads for further processing. Availability: Executable binary files and C++ libraries are available at http://www.cbrc.jp/~shimizu/slidesort/ for Linux and Windows. Contact: slidesort@m.aist.go.jp; shimizu-kana@aist.go.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21148542
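To make the problem statement concrete, the sketch below finds all pairs of reads within a given edit distance by brute force. It is deliberately the naive quadratic baseline, not SlideSort's k-mer chain algorithm, and the names `edit_distance` and `similar_pairs` are illustrative only.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance using a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def similar_pairs(reads, max_dist):
    """All index pairs (i, j), i < j, whose reads are within max_dist edits."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(reads), 2)
            if edit_distance(a, b) <= max_dist]


if __name__ == "__main__":
    print(similar_pairs(["ACGT", "ACGA", "TTTT"], max_dist=1))
```

This baseline performs one edit distance calculation per pair of reads, which is the cost that filtering on shared k-mers is meant to reduce.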
Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs.

PubMed

Saunders, Christopher T; Wong, Wendy S W; Swamy, Sajani; Becq, Jennifer; Murray, Lisa J; Cheetham, R Keira

2012-07-15

Whole genome and exome sequencing of matched tumor-normal sample pairs is becoming routine in cancer research. The consequent increased demand for somatic variant analysis of paired samples requires methods specialized to model this problem so as to sensitively call variants at any practical level of tumor impurity. We describe Strelka, a method for somatic SNV and small indel detection from sequencing data of matched tumor-normal samples. The method uses a novel Bayesian approach which represents continuous allele frequencies for both tumor and normal samples, while leveraging the expected genotype structure of the normal. This is achieved by representing the normal sample as a mixture of germline variation with noise, and representing the tumor sample as a mixture of the normal sample with somatic variation. A natural consequence of the model structure is that sensitivity can be maintained at high tumor impurity without requiring purity estimates. We demonstrate that the method has superior accuracy and sensitivity on impure samples compared with approaches based on either diploid genotype likelihoods or general allele-frequency tests. The Strelka workflow source code is available at ftp://strelka@ftp.illumina.com/. csaunders@illumina.com
A single-molecule sequencing assay for the comprehensive profiling of T4 DNA ligase fidelity and bias during DNA end-joining.

PubMed

Potapov, Vladimir; Ong, Jennifer L; Langhorst, Bradley W; Bilotti, Katharina; Cahoon, Dan; Canton, Barry; Knight, Thomas F; Evans, Thomas C; Lohman, Gregory Js

2018-05-08

DNA ligases are key enzymes in molecular and synthetic biology that catalyze the joining of breaks in duplex DNA and the end-joining of DNA fragments. Ligation fidelity (discrimination against the ligation of substrates containing mismatched base pairs) and bias (preferential ligation of particular sequences over others) have been well-studied in the context of nick ligation. However, almost no data exist for fidelity and bias in end-joining ligation contexts. In this study, we applied Pacific Biosciences Single-Molecule Real-Time sequencing technology to directly sequence the products of a highly multiplexed ligation reaction. This method has been used to profile the ligation of all three-base 5'-overhangs by T4 DNA ligase under typical ligation conditions in a single experiment. We report the relative frequency of all ligation products with or without mismatches, the position-dependent frequency of each mismatch, and the surprising observation that 5'-TNA overhangs ligate extremely inefficiently compared to all other Watson-Crick pairings. The method can easily be extended to profile other ligases, end-types (e.g. blunt ends and overhangs of different lengths), and the effect of adjacent sequence on the ligation results. Further, the method has the potential to provide new insights into the thermodynamics of annealing and the kinetics of end-joining reactions.
A Coalescent-Based Estimator of Admixture From DNA Sequences

PubMed Central

Wang, Jinliang

2006-01-01

A variety of estimators have been developed to use genetic marker information in inferring the admixture proportions (parental contributions) of a hybrid population. The majority of these estimators used allele frequency data, ignored molecular information that is available in markers such as microsatellites and DNA sequences, and assumed that mutations are absent since the admixture event. As a result, these estimators may fail to deliver an estimate or give rather poor estimates when admixture is ancient and thus mutations are not negligible. A previous molecular estimator based its inference of admixture proportions on the average coalescent times between pairs of genes taken from within and between populations. In this article I propose an estimator that considers the entire genealogy of all of the sampled genes and infers admixture proportions from the numbers of segregating sites in DNA sequence samples. By considering the genealogy of all sequences rather than pairs of sequences, this new estimator also allows the joint estimation of other interesting parameters in the admixture model, such as admixture time, divergence time, population size, and mutation rate. Comparative analyses of simulated data indicate that the new coalescent estimator generally yields better estimates of admixture proportions than the previous molecular estimator, especially when the parental populations are not highly differentiated. It also gives reasonably accurate estimates of other admixture parameters. A human mtDNA sequence data set was analyzed to demonstrate the method, and the analysis results are discussed and compared with those from previous studies. PMID:16624918
Genome sequence of the olive tree, Olea europaea.

PubMed

Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

2016-06-27

The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.
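The scaffold N50 quoted in this abstract is a standard assembly statistic: the length of the scaffold at which the cumulative sum of scaffold lengths, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch follows, using made-up lengths rather than the olive assembly itself.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no scaffold lengths given")


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical scaffold lengths
    print(n50(toy_lengths))
    # The coverage figure in the abstract follows from the two stated sizes.
    print(round(100 * 1.31 / 1.38))
```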

Ultrahigh-resolution Fourier transform ion cyclotron resonance mass spectrometry and tandem mass spectrometry for peptide de novo amino acid sequencing for a seven-protein mixture by paired single-residue transposed Lys-N and Lys-C digestion.

PubMed

Guan, Xiaoyan; Brownstein, Naomi C; Young, Nicolas L; Marshall, Alan G

2017-01-30

Bottom-up tandem mass spectrometry (MS/MS) is regularly used in proteomics to identify proteins from a sequence database. De novo sequencing is also available for sequencing peptides with relatively short sequence lengths. We recently showed that paired Lys-C and Lys-N proteases produce peptides of identical mass and similar retention time, but different tandem mass spectra. Such parallel experiments provide complementary information, and allow for up to 100% MS/MS sequence coverage. Here, we report digestion by paired Lys-C and Lys-N proteases of a seven-protein mixture: human hemoglobin alpha, bovine carbonic anhydrase 2, horse skeletal muscle myoglobin, hen egg white lysozyme, bovine pancreatic ribonuclease, bovine rhodanese, and bovine serum albumin, followed by reversed-phase nanoflow liquid chromatography, collision-induced dissociation, and 14.5 T Fourier transform ion cyclotron resonance mass spectrometry. Matched pairs of product peptide ions of equal precursor mass and similar retention times from each digestion are compared, leveraging single-residue transposed information with independent interferences to confidently identify fragment ion types, residues, and peptides. Selected pairs of product ion mass spectra for de novo sequenced protein segments from each member of the mixture are presented. Pairs of the transposed product ions as well as complementary information from the parallel experiments allow for both high MS/MS coverage for long peptide sequences and high confidence in the amino acid identification. Moreover, the parallel experiments in the de novo sequencing reduce false-positive matches of product ions from the single-residue transposed peptides from the same segment, and thereby further improve the confidence in protein identification. Copyright © 2016 John Wiley & Sons, Ltd.
Characterization of the Kenaf (Hibiscus cannabinus) Global Transcriptome Using Illumina Paired-End Sequencing and Development of EST-SSR Markers

PubMed Central

Li, Hui; Li, Defang; Chen, Anguo; Tang, Huijuan; Li, Jianjun; Huang, Siqi

2016-01-01

Kenaf (Hibiscus cannabinus L.) is an economically important natural fiber crop grown worldwide. However, only 20 expressed sequence tags (ESTs) for kenaf are available in public databases. The aim of this study was to develop large-scale simple sequence repeat (SSR) markers to lay a solid foundation for the construction of genetic linkage maps and marker-assisted breeding in kenaf. We used Illumina paired-end sequencing technology to generate new EST-simple sequences and MISA software to mine SSR markers. We identified 71,318 unigenes with an average length of 1143 nt and annotated these unigenes using four different protein databases. Overall, 9324 complementary pairs were designated as EST-SSR markers, and their quality was validated using 100 randomly selected SSR markers. In total, 72 primer pairs reproducibly amplified target amplicons, and 61 of these primer pairs detected significant polymorphism among 28 kenaf accessions. Thus, in this study, we have developed large-scale SSR markers for kenaf, and this new resource will facilitate construction of genetic linkage maps, investigation of fiber growth and development in kenaf, and also be of value to novel gene discovery and functional genomic studies. PMID:26960153
Inherited Creutzfeldt-Jakob disease in a British family associated with a novel 144 base pair insertion of the prion protein gene.

PubMed Central

Nicholl, D; Windl, O; de Silva, R; Sawcer, S; Dempster, M; Ironside, J W; Estibeiro, J P; Yuill, G M; Lathe, R; Will, R G

1995-01-01

A case of familial Creutzfeldt-Jakob disease associated with a 144 base pair insertion in the open reading frame of the prion protein gene is described. Sequencing of the mutated allele showed an arrangement of six octapeptide repeats, distinct from that of a recently described British family with an insertion of similar size. Thirteen years previously the brother of the proband had died from "Huntington's disease", but re-examination of his neuropathology revealed spongiform encephalopathy and anti-prion protein immunocytochemistry gave a positive result. The independent evolution of at least two distinct pathological 144 base pair insertions in Britain is proposed. The importance of maintaining a high index of suspicion of inherited Creutzfeldt-Jakob disease in cases of familial neurodegenerative disease is stressed. PMID:7823070
Unlocking Short Read Sequencing for Metagenomics

DOE PAGES

Rodrigue, Sébastien; Materna, Arne C.; Timberlake, Sonia C.; ...

2010-07-28

We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.
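The idea of joining overlapping paired reads into one composite read can be sketched as follows. This is a simplified exact-match illustration, not the SHERA algorithm (which the abstract does not detail); the function names and the `min_overlap` parameter are assumptions, and real tools would also weigh base qualities and tolerate mismatches.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Join two paired reads if the end of read1 exactly overlaps the mate.

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented first. Returns None when no overlap is found.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACG"  # hypothetical 30 bp insert
    r1 = fragment[:20]
    r2 = reverse_complement(fragment[10:])
    print(merge_pair(r1, r2, min_overlap=5))
```

Trying the longest overlap first favors the shortest composite read consistent with the data, which is one possible design choice among several.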
Comprehensive thermodynamic analysis of 3′ double-nucleotide overhangs neighboring Watson–Crick terminal base pairs

PubMed Central

O'Toole, Amanda S.; Miller, Stacy; Haines, Nathan; Zink, M. Coleen; Serra, Martin J.

2006-01-01

Thermodynamic parameters are reported for duplex formation of 48 self-complementary RNA duplexes containing Watson–Crick terminal base pairs (GC, AU and UA) with all 16 possible 3′ double-nucleotide overhangs, mimicking the structures of short interfering RNAs (siRNA) and microRNAs (miRNA). Based on nearest-neighbor analysis, the addition of a second dangling nucleotide to a single 3′ dangling nucleotide increases stability of duplex formation up to 0.8 kcal/mol in a sequence dependent manner. Results from this study in conjunction with data from a previous study [A. S. O'Toole, S. Miller and M. J. Serra (2005) RNA, 11, 512.] allow for the development of a refined nearest-neighbor model to predict the influence of 3′ double-nucleotide overhangs on the stability of duplex formation. The model improves the prediction of free energy and melting temperature when tested against five oligomers with various core duplex sequences. Phylogenetic analysis of naturally occurring miRNAs was performed to support our results. Selection of the effector miR strand of the mature miRNA duplex appears to be dependent upon the identity of the 3′ double-nucleotide overhang. Thermodynamic parameters for 3′ single terminal overhangs adjacent to a UA pair are also presented. PMID:16820533
cgDNAweb: a web interface to the cgDNA sequence-dependent coarse-grain model of double-stranded DNA.

PubMed

De Bruin, Lennart; Maddocks, John H

2018-06-14

The sequence-dependent statistical mechanical properties of fragments of double-stranded DNA are believed to be pertinent to its biological function at length scales from a few base pairs (or bp) to a few hundreds of bp, e.g. indirect read-out protein binding sites, nucleosome positioning sequences, phased A-tracts, etc. In turn, the equilibrium statistical mechanics behaviour of DNA depends upon its ground state configuration, or minimum free energy shape, as well as on its fluctuations as governed by its stiffness (in an appropriate sense). We here present cgDNAweb, which provides browser-based interactive visualization of the sequence-dependent ground states of double-stranded DNA molecules, as predicted by the underlying cgDNA coarse-grain rigid-base model of fragments with arbitrary sequence. The cgDNAweb interface is specifically designed to facilitate comparison between ground state shapes of different sequences. The server is freely available at cgDNAweb.epfl.ch with no login requirement.
IVisTMSA: Interactive Visual Tools for Multiple Sequence Alignments.

PubMed

Pervez, Muhammad Tariq; Babar, Masroor Ellahi; Nadeem, Asif; Aslam, Naeem; Naveed, Nasir; Ahmad, Sarfraz; Muhammad, Shah; Qadri, Salman; Shahid, Muhammad; Hussain, Tanveer; Javed, Maryam

2015-01-01

IVisTMSA is a software package of seven graphical tools for multiple sequence alignments. MSApad is an editing and analysis tool. It can load 409% more data than Jalview, STRAP, CINEMA, and Base-by-Base. MSA comparator allows the user to visualize consistent and inconsistent regions of reference and test alignments of more than 21-MB size in less than 12 seconds. MSA comparator is 5,200% efficient and more than 40% efficient as compared to BALiBASE c program and FastSP, respectively. MSA reconstruction tool provides graphical user interfaces for four popular aligners and allows the user to load several sequence files at a time. FASTA generator converts seven formats of alignments of unlimited size into FASTA format in a few seconds. MSA ID calculator calculates identity matrix of more than 11,000 sequences with a sequence length of 2,696 base pairs in less than 100 seconds. Tree and Distance Matrix calculation tools generate phylogenetic tree and distance matrix, respectively, using neighbor joining, % identity and BLOSUM 62 matrix.
Sequence specificity, statistical potentials, and three-dimensional structure prediction with self-correcting distance geometry calculations of beta-sheet formation in proteins.

PubMed Central

Zhu, H.; Braun, W.

1999-01-01

A statistical analysis of a representative data set of 169 known protein structures was used to analyze the specificity of residue interactions between spatial neighboring strands in beta-sheets. Pairwise potentials were derived from the frequency of residue pairs in nearest contact, second nearest and third nearest contacts across neighboring beta-strands compared to the expected frequency of residue pairs in a random model. A pseudo-energy function based on these statistical pairwise potentials recognized native beta-sheets among possible alternative pairings. The native pairing was found within the three lowest energies in 73% of the cases in the training data set and in 63% of beta-sheets in a test data set of 67 proteins, which were not part of the training set. The energy function was also used to detect tripeptides, which occur frequently in beta-sheets of native proteins. The majority of native partners of tripeptides were distributed in a low energy range. Self-correcting distance geometry (SECODG) calculations using distance constraints sets derived from possible low energy pairing of beta-strands uniquely identified the native pairing of the beta-sheet in pancreatic trypsin inhibitor (BPTI). These results will be useful for predicting the structure of proteins from their amino acid sequence as well as for the design of proteins containing beta-sheets. PMID:10048326
Simultaneous master-slave Omega pairs. [navigation system featuring low cost receiver]

NASA Technical Reports Server (NTRS)

Burhans, R. W.

1974-01-01

Master-slave sequence ordering of the Omega system is suggested as a method of improving the pair geometry for low-cost receiver user benefit. The sequence change will not affect present sophisticated processor users other than require new labels for some pair combinations, but may require worldwide transmitter operators to slightly alter their long-range synchronizing techniques.
In vitro excision of adeno-associated virus DNA from recombinant plasmids: Isolation of an enzyme fraction from HeLa cells that cleaves DNA at poly(G) sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gottlieb, J.; Muzyczka, N.

1988-06-01

When circular recombinant plasmids containing adeno-associated virus (AAV) DNA sequences are transfected into human cells, the AAV provirus is rescued. Using these circular AAV plasmids as substrates, the authors isolated an enzyme fraction from HeLa cell nuclear extracts that excises intact AAV DNA in vitro from vector DNA and produces linear DNA products. The recognition signal for the enzyme is a polypurine-polypyrimidine sequence which is at least 9 residues long and rich in G·C base pairs. Such sequences are present in AAV recombinant plasmids as part of the first 15 base pairs of the AAV terminal repeat and in some cases as the result of cloning the AAV genome by G·C tailing. The isolated enzyme fraction does not have significant endonucleolytic activity on single-stranded or double-stranded DNA. Plasmid DNA that is transfected into tissue culture cells is cleaved in vivo to produce a pattern of DNA fragments similar to that seen with purified enzyme in vitro. The activity has been called endo R for rescue, and its behavior suggests that it may have a role in recombination of cellular chromosomes.
bpRNA: large-scale automated annotation and analysis of RNA secondary structure.

PubMed

Danaee, Padideh; Rouches, Mason; Wiley, Michelle; Deng, Dezhong; Huang, Liang; Hendrix, David

2018-05-09

While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here, we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, easily-interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, 'bpRNA-1m', of over 100 000 single-molecule, known secondary structures; this is both more fully and accurately annotated and over 20-times larger than existing databases. We use a subset of the database with highly similar (≥90% identical) sequences filtered out to report on statistical trends in sequence, flanking base pairs, and length. Both the bpRNA method and the bpRNA-1m database will be valuable resources both for specific analysis of individual RNA molecules and large-scale analyses such as are useful for updating RNA energy parameters for computational thermodynamic predictions, improving machine learning models for structure prediction, and for benchmarking structure-prediction algorithms.
Comparative study of the effectiveness and limitations of current methods for detecting sequence coevolution.

PubMed

Mao, Wenzhi; Kaya, Cihan; Dutta, Anindita; Horovitz, Amnon; Bahar, Ivet

2015-06-15

With rapid accumulation of sequence data on several species, extracting rational and systematic information from multiple sequence alignments (MSAs) is becoming increasingly important. Currently, there is a plethora of computational methods for investigating coupled evolutionary changes in pairs of positions along the amino acid sequence, and making inferences on structure and function. Yet, the significance of coevolution signals remains to be established. Also, a large number of false positives (FPs) arise from insufficient MSA size, phylogenetic background and indirect couplings. Here, a set of 16 pairs of non-interacting proteins is thoroughly examined to assess the effectiveness and limitations of different methods. The analysis shows that recent computationally expensive methods designed to remove biases from indirect couplings outperform others in detecting tertiary structural contacts as well as eliminating intermolecular FPs; whereas traditional methods such as mutual information benefit from refinements such as shuffling, while being highly efficient. Computations repeated with 2,330 pairs of protein families from the Negatome database corroborated these results. Finally, using a training dataset of 162 families of proteins, we propose a combined method that outperforms existing individual methods. Overall, the study provides simple guidelines towards the choice of suitable methods and strategies based on available MSA size and computing resources. Software is freely available through the Evol component of ProDy API. © The Author 2015. Published by Oxford University Press.
SMM-system: A mining tool to identify specific markers in Salmonella enterica.

PubMed

Yu, Shuijing; Liu, Weibing; Shi, Chunlei; Wang, Dapeng; Dan, Xianlong; Li, Xiao; Shi, Xianming

2011-03-01

This report presents SMM-system, a software package that implements various personalized pre- and post-BLASTN tasks for mining specific markers of microbial pathogens. The main functionalities of SMM-system are summarized as follows: (i) converting multi-FASTA file, (ii) cutting interesting genomic sequence, (iii) automatic high-throughput BLASTN searches, and (iv) screening target sequences. The utility of SMM-system was demonstrated by using it to identify 214 Salmonella enterica-specific protein-coding sequences (CDSs). Eighteen primer pairs were designed based on eighteen S. enterica-specific CDSs, respectively. Seven of these primer pairs were validated with PCR assay, which showed 100% inclusivity for the 101 S. enterica genomes and 100% exclusivity of 30 non-S. enterica genomes. Three specific primer pairs were chosen to develop a multiplex PCR assay, which generated specific amplicons with a size of 180bp (SC1286), 238bp (SC1598) and 405bp (SC4361), respectively. This study demonstrates that SMM-system is a high-throughput specific marker generation tool that can be used to identify genus-, species-, serogroup- and even serovar-specific DNA sequences of microbial pathogens, which has a potential to be applied in food industries, diagnostics and taxonomic studies. SMM-system is freely available and can be downloaded from http://foodsafety.sjtu.edu.cn/SMM-system.html. Copyright © 2011 Elsevier B.V. All rights reserved.
Amplification of the Gp41 gene for detection of mutations conferring resistance to HIV-1 fusion inhibitors on genotypic assay

NASA Astrophysics Data System (ADS)

Tanumihardja, J.; Bela, B.

2017-08-01

Fusion inhibitors have potential for future use in HIV control programs in Indonesia, so the capacity to test resistance to such drugs needs to be developed. Resistance-detection with a genotypic assay began with amplification of the target gene, gp41. Based on the sequence of the two most common HIV subtypes in Indonesia, AE and B, a primer pair was designed. Plasma samples containing both subtypes were extracted to obtain HIV RNA. Using PCR, the primer pair was used to produce the amplification product, the identity of which was checked based on length under electrophoresis. Eleven plasma samples were included in this study. One-step PCR using the primer pair was able to amplify gp41 from 54.5% of the samples, and an unspecific amplification product was seen in 9.1% of the samples. Amplification failed in 36.4% of the samples, which may be due to an inappropriate primer sequence. It was also found that the optimal annealing temperature for producing the single expected band was 57.2 °C. With one-step PCR, the designed primer pair amplified the HIV-1 gp41 gene from subtypes AE and B. However, further research should be done to determine the conditions that will increase the sensitivity and specificity of the amplification process.
Size-based molecular diagnostics using plasma DNA for noninvasive prenatal testing.

PubMed

Yu, Stephanie C Y; Chan, K C Allen; Zheng, Yama W L; Jiang, Peiyong; Liao, Gary J W; Sun, Hao; Akolekar, Ranjit; Leung, Tak Y; Go, Attie T J I; van Vugt, John M G; Minekawa, Ryoko; Oudejans, Cees B M; Nicolaides, Kypros H; Chiu, Rossa W K; Lo, Y M Dennis

2014-06-10

Noninvasive prenatal testing using fetal DNA in maternal plasma is an actively researched area. The current generation of tests using massively parallel sequencing is based on counting plasma DNA sequences originating from different genomic regions. In this study, we explored a different approach that is based on the use of DNA fragment size as a diagnostic parameter. This approach is dependent on the fact that circulating fetal DNA molecules are generally shorter than the corresponding maternal DNA molecules. First, we performed plasma DNA size analysis using paired-end massively parallel sequencing and microchip-based capillary electrophoresis. We demonstrated that the fetal DNA fraction in maternal plasma could be deduced from the overall size distribution of maternal plasma DNA. The fetal DNA fraction is a critical parameter affecting the accuracy of noninvasive prenatal testing using maternal plasma DNA. Second, we showed that fetal chromosomal aneuploidy could be detected by observing an aberrant proportion of short fragments from an aneuploid chromosome in the paired-end sequencing data. Using this approach, we detected fetal trisomy 21 and trisomy 18 with 100% sensitivity (T21: 36/36; T18: 27/27) and 100% specificity (non-T21: 88/88; non-T18: 97/97). For trisomy 13, the sensitivity and specificity were 95.2% (20/21) and 99% (102/103), respectively. For monosomy X, the sensitivity and specificity were both 100% (10/10 and 8/8). Thus, this study establishes the principle of size-based molecular diagnostics using plasma DNA. This approach has potential applications beyond noninvasive prenatal testing to areas such as oncology and transplantation monitoring.
Size-based molecular diagnostics using plasma DNA for noninvasive prenatal testing

PubMed Central

Yu, Stephanie C. Y.; Chan, K. C. Allen; Zheng, Yama W. L.; Jiang, Peiyong; Liao, Gary J. W.; Sun, Hao; Akolekar, Ranjit; Leung, Tak Y.; Go, Attie T. J. I.; van Vugt, John M. G.; Minekawa, Ryoko; Oudejans, Cees B. M.; Nicolaides, Kypros H.; Chiu, Rossa W. K.; Lo, Y. M. Dennis

2014-01-01

Noninvasive prenatal testing using fetal DNA in maternal plasma is an actively researched area. The current generation of tests using massively parallel sequencing is based on counting plasma DNA sequences originating from different genomic regions. In this study, we explored a different approach that is based on the use of DNA fragment size as a diagnostic parameter. This approach is dependent on the fact that circulating fetal DNA molecules are generally shorter than the corresponding maternal DNA molecules. First, we performed plasma DNA size analysis using paired-end massively parallel sequencing and microchip-based capillary electrophoresis. We demonstrated that the fetal DNA fraction in maternal plasma could be deduced from the overall size distribution of maternal plasma DNA. The fetal DNA fraction is a critical parameter affecting the accuracy of noninvasive prenatal testing using maternal plasma DNA. Second, we showed that fetal chromosomal aneuploidy could be detected by observing an aberrant proportion of short fragments from an aneuploid chromosome in the paired-end sequencing data. Using this approach, we detected fetal trisomy 21 and trisomy 18 with 100% sensitivity (T21: 36/36; T18: 27/27) and 100% specificity (non-T21: 88/88; non-T18: 97/97). For trisomy 13, the sensitivity and specificity were 95.2% (20/21) and 99% (102/103), respectively. For monosomy X, the sensitivity and specificity were both 100% (10/10 and 8/8). Thus, this study establishes the principle of size-based molecular diagnostics using plasma DNA. This approach has potential applications beyond noninvasive prenatal testing to areas such as oncology and transplantation monitoring. PMID:24843150
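
The percentages quoted in the abstract above follow directly from the reported counts. A minimal sketch of that arithmetic, assuming each pair is (correct calls, total cases); the dictionary layout and the helper name `percent` are illustrative only:

```python
def percent(correct, total):
    """Return correct/total as a percentage rounded to one decimal place."""
    return round(100.0 * correct / total, 1)

# Counts copied from the abstract: (correctly classified, total).
reported = {
    "trisomy 21": {"sensitivity": (36, 36), "specificity": (88, 88)},
    "trisomy 18": {"sensitivity": (27, 27), "specificity": (97, 97)},
    "trisomy 13": {"sensitivity": (20, 21), "specificity": (102, 103)},
    "monosomy X": {"sensitivity": (10, 10), "specificity": (8, 8)},
}

summary = {
    condition: {metric: percent(*counts) for metric, counts in metrics.items()}
    for condition, metrics in reported.items()
}
```

Here `summary["trisomy 13"]` reproduces the 95.2% sensitivity and the roughly 99% specificity stated in the text.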
Formation and Repair of Mismatches Containing Ribonucleotides and Oxidized Bases at Repeated DNA Sequences

PubMed Central

Cilli, Piera; Minoprio, Anna; Bossa, Cecilia; Bignami, Margherita; Mazzei, Filomena

2015-01-01

The cellular pool of ribonucleotide triphosphates (rNTPs) is higher than that of deoxyribonucleotide triphosphates. To ensure genome stability, DNA polymerases must discriminate against rNTPs and incorporated ribonucleotides must be removed by ribonucleotide excision repair (RER). We investigated DNA polymerase β (POL β) capacity to incorporate ribonucleotides into trinucleotide repeated DNA sequences and the efficiency of base excision repair (BER) and RER enzymes (OGG1, MUTYH, and RNase H2) when presented with an incorrect sugar and an oxidized base. POL β incorporated rAMP and rCMP opposite 7,8-dihydro-8-oxoguanine (8-oxodG) and extended both mispairs. In addition, POL β was able to insert and elongate an oxidized rGMP when paired with dA. We show that RNase H2 always preserves the capacity to remove a single ribonucleotide when paired to an oxidized base or to incise an oxidized ribonucleotide in a DNA duplex. In contrast, BER activity is affected by the presence of a ribonucleotide opposite an 8-oxodG. In particular, MUTYH activity on 8-oxodG:rA mispairs is fully inhibited, although its binding capacity is retained. This results in the reduction of RNase H2 incision capability of this substrate. Thus complex mispairs formed by an oxidized base and a ribonucleotide can compromise BER and RER in repeated sequences. PMID:26338705
Vfold: a web server for RNA structure and folding thermodynamics prediction.

PubMed

Xu, Xiaojun; Zhao, Peinan; Chen, Shi-Jie

2014-01-01

The ever increasing discovery of non-coding RNAs leads to unprecedented demand for the accurate modeling of RNA folding, including the predictions of two-dimensional (base pair) and three-dimensional all-atom structures and folding stabilities. Accurate modeling of RNA structure and stability has far-reaching impact on our understanding of RNA functions in human health and our ability to design RNA-based therapeutic strategies. The Vfold server offers a web interface to predict (a) RNA two-dimensional structure from the nucleotide sequence, (b) three-dimensional structure from the two-dimensional structure and the sequence, and (c) folding thermodynamics (heat capacity melting curve) from the sequence. To predict the two-dimensional structure (base pairs), the server generates an ensemble of structures, including loop structures with the different intra-loop mismatches, and evaluates the free energies using the experimental parameters for the base stacks and the loop entropy parameters given by a coarse-grained RNA folding model (the Vfold model) for the loops. To predict the three-dimensional structure, the server assembles the motif scaffolds using structure templates extracted from the known PDB structures and refines the structure using all-atom energy minimization. The Vfold-based web server provides a user friendly tool for the prediction of RNA structure and stability. The web server and the source codes are freely accessible for public use at "http://rna.physics.missouri.edu".
Whole exome sequencing to estimate alloreactivity potential between donors and recipients in stem cell transplantation

PubMed Central

Sampson, Juliana K.; Sheth, Nihar U.; Koparde, Vishal N.; Scalora, Allison F.; Serrano, Myrna G.; Lee, Vladimir; Roberts, Catherine H.; Jameson-Lee, Max; Ferreira-Gonzalez, Andrea; Manjili, Masoud H.; Buck, Gregory A.; Neale, Michael C.; Toor, Amir A.

2016-01-01

Whole exome sequencing (WES) was performed on stem cell transplant donor-recipient (D-R) pairs to determine the extent of potential antigenic variation at a molecular level. In a small cohort of D-R pairs, a high frequency of sequence variation was observed between the donor and recipient exomes independent of human leucocyte antigen (HLA) matching. Nonsynonymous, nonconservative single nucleotide polymorphisms were approximately twice as frequent in HLA-matched unrelated, compared with related D-R pairs. When mapped to individual chromosomes, these polymorphic nucleotides were uniformly distributed across the entire exome. In conclusion, WES reveals extensive nucleotide sequence variation in the exomes of HLA-matched donors and recipients. PMID:24749631
Single-Molecule Electrical Random Resequencing of DNA and RNA

NASA Astrophysics Data System (ADS)

Ohshiro, Takahito; Matsubara, Kazuki; Tsutsui, Makusu; Furuhashi, Masayuki; Taniguchi, Masateru; Kawai, Tomoji

2012-07-01

Two paradigm shifts in DNA sequencing technologies--from bulk to single molecules and from optical to electrical detection--are expected to realize label-free, low-cost DNA sequencing that does not require PCR amplification. It will lead to development of high-throughput third-generation sequencing technologies for personalized medicine. Although nanopore devices have been proposed as third-generation DNA-sequencing devices, a significant milestone in these technologies has been attained by demonstrating a novel technique for resequencing DNA using electrical signals. Here we report single-molecule electrical resequencing of DNA and RNA using a hybrid method of identifying single-base molecules via tunneling currents and random sequencing. Our method reads sequences of nine types of DNA oligomers. The complete sequence of 5'-UGAGGUA-3' from the let-7 microRNA family was also identified by creating a composite of overlapping fragment sequences, which was randomly determined using tunneling current conducted by single-base molecules as they passed between a pair of nanoelectrodes.
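
The idea of building a full sequence from a composite of overlapping fragment reads can be sketched with a simple greedy merge. This is only an illustration under stated assumptions: the three fragments are hypothetical, the function names are invented here, and the abstract does not say that the authors used this particular algorithm.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_a, best_b = 0, None, None
        for a in frags:
            for b in frags:
                if a != b:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_a, best_b = k, a, b
        if best_k == 0:
            break  # nothing overlaps; leave the remaining pieces unmerged
        frags.remove(best_a)
        frags.remove(best_b)
        frags.append(best_a + best_b[best_k:])
    return frags

# Hypothetical overlapping reads covering the let-7 fragment from the abstract.
reads = ["UGAG", "GAGGU", "GGUA"]
assembled = greedy_assemble(reads)
```

Greedy merging can fail on repetitive sequences, so real resequencing pipelines typically add read-quality weighting and consensus voting on top of a step like this.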

In trans paired nicking triggers seamless genome editing without double-stranded DNA cutting.

PubMed

Chen, Xiaoyu; Janssen, Josephine M; Liu, Jin; Maggio, Ignazio; 't Jong, Anke E J; Mikkers, Harald M M; Gonçalves, Manuel A F V

2017-09-22

Precise genome editing involves homologous recombination between donor DNA and chromosomal sequences subjected to double-stranded DNA breaks made by programmable nucleases. Ideally, genome editing should be efficient, specific, and accurate. However, besides constituting potential translocation-initiating lesions, double-stranded DNA breaks (targeted or otherwise) are mostly repaired through unpredictable and mutagenic non-homologous recombination processes. Here, we report that the coordinated formation of paired single-stranded DNA breaks, or nicks, at donor plasmids and chromosomal target sites by RNA-guided nucleases based on CRISPR-Cas9 components, triggers seamless homology-directed gene targeting of large genetic payloads in human cells, including pluripotent stem cells. Importantly, in addition to significantly reducing the mutagenicity of the genome modification procedure, this in trans paired nicking strategy achieves multiplexed, single-step, gene targeting, and yields higher frequencies of accurately edited cells when compared to the standard double-stranded DNA break-dependent approach. CRISPR-Cas9-based gene editing involves double-strand breaks at target sequences, which are often repaired by mutagenic non-homologous end-joining. Here the authors use Cas9 nickases to generate coordinated single-strand breaks in donor and target DNA for precise homology-directed gene editing.
Dissociation of single-strand DNA: single-walled carbon nanotube hybrids by Watson-Crick base-pairing.

PubMed

Jung, Seungwon; Cha, Misun; Park, Jiyong; Jeong, Namjo; Kim, Gunn; Park, Changwon; Ihm, Jisoon; Lee, Junghoon

2010-08-18

It has been known that single-strand DNA wraps around a single-walled carbon nanotube (SWNT) by pi-stacking. In this paper it is demonstrated that such DNA is dissociated from the SWNT by Watson-Crick base-pairing with a complementary sequence. Measurement of field effect transistor characteristics indicates a shift of the electrical properties as a result of this "unwrapping" event. We further confirm the suggested process through Raman spectroscopy and gel electrophoresis. Experimental results are verified in view of atomistic mechanisms with molecular dynamics simulations and binding energy analyses.
Virial Coefficients for the Liquid Argon

NASA Astrophysics Data System (ADS)

Korth, Micheal; Kim, Saesun

2014-03-01

We begin with a geometric model of hard colliding spheres and calculate probability densities in an iterative sequence of calculations that lead to the pair correlation function. The model is based on a kinetic theory approach developed by Shinomoto, to which we added an interatomic potential for argon based on the model from Aziz. From values of the pair correlation function at various values of density, we were able to find virial coefficients of liquid argon. The low order coefficients are in good agreement with theoretical hard sphere coefficients, but appropriate data for argon to which these results might be compared is difficult to find.
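
For reference, the theoretical hard-sphere coefficients used as the comparison in the abstract above have closed forms at low order. The sketch below evaluates the second and third coefficients and the standard hard-sphere pressure relation that links the contact value of the pair correlation function to the compressibility factor. The function names and the reduced units (sphere diameter sigma = 1) are assumptions made here; this is the textbook hard-sphere result, not the authors' argon calculation with the Aziz potential.

```python
from math import pi

def hard_sphere_b2(sigma):
    """Second virial coefficient of hard spheres of diameter sigma."""
    return 2.0 * pi * sigma**3 / 3.0

def hard_sphere_b3(sigma):
    """Third virial coefficient of hard spheres: (5/8) * B2**2."""
    return 0.625 * hard_sphere_b2(sigma) ** 2

def compressibility_factor(rho, sigma, g_contact):
    """Z = P / (rho * k * T) from the contact value of g(r) for hard spheres."""
    return 1.0 + hard_sphere_b2(sigma) * rho * g_contact

# Reduced units: sigma = 1. In the dilute limit g(sigma+) tends to 1,
# so Z tends to 1 + B2 * rho, the first term of the virial series.
b2 = hard_sphere_b2(1.0)
b3 = hard_sphere_b3(1.0)
z_dilute = compressibility_factor(0.01, 1.0, 1.0)
```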
Unconventional P-35S sequence identified in genetically modified maize

PubMed Central

Al-Hmoud, Nisreen; Al-Husseini, Nawar; Ibrahim-Alobaide, Mohammed A; Kübler, Eric; Farfoura, Mahmoud; Alobydi, Hytham; Al-Rousan, Hiyam

2014-01-01

The Cauliflower Mosaic Virus 35S promoter sequence, CaMV P-35S, is one of several commonly used genetic targets to detect genetically modified maize and is found in most GMOs. In this research we report the finding of an alternative P-35S sequence and its incidence in GM maize marketed in Jordan. The primer pair normally used to amplify a 123 bp DNA fragment of the CaMV P-35S promoter in GMOs also amplified a previously undetected alternative sequence of CaMV P-35S in GM maize samples, which we term V3. The amplified V3 sequence comprises 386 base pairs and was not found in the standard wild-type maize or in MON810 and MON 863 GM maize. The identified GM maize samples carrying the V3 sequence were found to be free of CaMV when compared with a CaMV-infected brown mustard sample. Sequence alignment analysis of the V3 genetic element showed 90% similarity with the matching P-35S sequence of the cauliflower mosaic virus isolate CabbB-JI and 99% similarity with matching P-35S sequences found in several binary plant vectors, of which the binary vector locus JQ693018 is one example. The current study showed an increase of 44% in the incidence of the identified 386 bp sequence in GM maize sold in Jordan’s markets between 2009 and 2012. PMID:24495911
Genetic analysis of human immunodeficiency virus type 1 envelope V3 region isolates from mothers and infants after perinatal transmission.

PubMed Central

Ahmad, N; Baroudy, B M; Baker, R C; Chappey, C

1995-01-01

The human immunodeficiency virus type 1 (HIV-1) sequences from variable region 3 (V3) of the envelope gene were analyzed from seven infected mother-infant pairs following perinatal transmission. The V3 region sequences directly derived from the DNA of the uncultured peripheral blood mononuclear cells from infected mothers displayed a heterogeneous population. In contrast, the infants' sequences were less diverse than those of their mothers. In addition, the sequences from the younger infants' peripheral blood mononuclear cell DNA were more homogeneous than the older infants' sequences. All infants' sequences were different but displayed patterns similar to those seen in their mothers. In the mother-infant pair sequences analyzed, a minor genotype or subtype found in the mothers predominated in their infants. The conserved N-linked glycosylation site proximal to the first cysteine of the V3 loop was absent only in one infant's sequence set and in some variants of two other infants' sequences. Furthermore, the HIV-1 sequences of the epidemiologically linked mother-infant pairs were closer than the sequences of epidemiologically unlinked individuals, suggesting that the sequence comparison of mother-infant pairs done in order to identify genetic variants transmitted from mother to infant could be performed even in older infants. There was no evidence for transmission of a major genotype or multiple genotypes from mother to infant. In conclusion, a minor genotype of maternal virus is transmitted to the infants, and this finding could be useful in developing strategies to prevent maternal transmission of HIV-1 by means of perinatal interventions. PMID:7815476
Exploiting rice-sorghum synteny for targeted development of EST-SSRs to enrich the sorghum genetic linkage map.

PubMed

Ramu, P; Kassahun, B; Senthilvel, S; Ashok Kumar, C; Jayashree, B; Folkertsma, R T; Reddy, L Ananda; Kuruvinashetti, M S; Haussmann, B I G; Hash, C T

2009-11-01

The sequencing and detailed comparative functional analysis of genomes of a number of select botanical models open new doors into comparative genomics among the angiosperms, with potential benefits for improvement of many orphan crops that feed large populations. In this study, a set of simple sequence repeat (SSR) markers was developed by mining the expressed sequence tag (EST) database of sorghum. Among the SSR-containing sequences, only those sharing considerable homology with rice genomic sequences across the lengths of the 12 rice chromosomes were selected. Thus, 600 SSR-containing sorghum EST sequences (50 homologous sequences on each of the 12 rice chromosomes) were selected, with the intention of providing coverage for corresponding homologous regions of the sorghum genome. Primer pairs were designed and polymorphism detection ability was assessed using parental pairs of two existing sorghum mapping populations. About 28% of these new markers detected polymorphism in this 4-entry panel. A subset of 55 polymorphic EST-derived SSR markers were mapped onto the existing skeleton map of a recombinant inbred population derived from cross N13 x E 36-1, which is segregating for Striga resistance and the stay-green component of terminal drought tolerance. These new EST-derived SSR markers mapped across all 10 sorghum linkage groups, mostly to regions expected based on prior knowledge of rice-sorghum synteny. The ESTs from which these markers were derived were then mapped in silico onto the aligned sorghum genome sequence, and 88% of the best hits corresponded to linkage-based positions. This study demonstrates the utility of comparative genomic information in targeted development of markers to fill gaps in linkage maps of related crop species for which sufficient genomic tools are not available.
An inversion of 25 base pairs causes feline GM2 gangliosidosis variant.

PubMed

Martin, Douglas R; Krum, Barbara K; Varadarajan, G S; Hathcock, Terri L; Smith, Bruce F; Baker, Henry J

2004-05-01

In G(M2) gangliosidosis variant 0, a defect in the beta-subunit of lysosomal beta-N-acetylhexosaminidase (EC 3.2.1.52) causes abnormal accumulation of G(M2) ganglioside and severe neurodegeneration. Distinct feline models of G(M2) gangliosidosis variant 0 have been described in both domestic shorthair and Korat cats. In this study, we determined that the causative mutation of G(M2) gangliosidosis in the domestic shorthair cat is a 25-base-pair inversion at the extreme 3' end of the beta-subunit (HEXB) coding sequence, which introduces three amino acid substitutions at the carboxyl terminus of the protein and a translational stop that is eight amino acids premature. Cats homozygous for the 25-base-pair inversion express levels of beta-subunit mRNA approximately 190% of normal and protein levels only 10-20% of normal. Because the 25-base-pair inversion is similar to mutations in the terminal exon of human HEXB, the domestic shorthair cat should serve as an appropriate model to study the molecular pathogenesis of human G(M2) gangliosidosis variant 0 (Sandhoff disease).
Systematic Characteristic Exploration of the Chimeras Generated in Multiple Displacement Amplification through Next Generation Sequencing Data Reanalysis

PubMed Central

Gao, Shen; Yao, Bei; Lu, Zuhong

2015-01-01

Background: The chimeric sequences produced by phi29 DNA polymerase, termed chimeras, impair the performance of multiple displacement amplification (MDA) and complicate sequence data processing. Although several articles have reported the existence of chimeric sequences, only one study has focused on the structure and generation mechanism of chimeras, and it was based on merely hundreds of chimeras found in sequence data from the E. coli genome. Method: We mined a set of Next Generation Sequencing (NGS) reads that had been used for whole genome haplotype assembly in a previous study. We established a bioinformatics pipeline based on a subsection alignment strategy to discover all the chimeras in the data and visualize their structure. We then defined two statistical indexes (the chimeric distance and the overlap length), whose regular abundance distributions helped illustrate the structural characteristics of the chimeras. Finally, we analyzed the relationship between the chimera type and the average insertion size in order to illustrate a method for decreasing the proportion of wasted data in DNA library construction. Results/Conclusion: 131.4 Gb of paired-end (PE) sequence data was reanalyzed for chimeras. In total, 40,259,438 read pairs (6.19%) with chimerism were discovered among 650,430,811 read pairs. The chimeric sequences consist of two or more parts that are located non-consecutively but adjacently on the chromosome. The chimeric distance between the locations of adjacent parts on the chromosome followed an approximate bimodal distribution ranging from 0 to over 5,000 nt, whose peak was at about 250 to 300 nt. The overlap length of adjacent parts followed an approximate Poisson distribution and revealed a peak at 6 nt. Moreover, a linear correlation analysis showed that unmapped chimeras, which were classified as wasted data, could be reduced by properly increasing the insertion segment size. Significance: This study profiled the phi29 MDA chimeras using tens of millions of chimeric sequences, and helped clarify the amplification mechanism of the phi29 DNA polymerase. Our work also illustrated the importance of NGS data reanalysis, not only for improving data utilization efficiency, but also for uncovering more potential genomic information. PMID:26440104
Control of box C/D snoRNP assembly by N6-methylation of adenine.

PubMed

Huang, Lin; Ashraf, Saira; Wang, Jia; Lilley, David Mj

2017-09-01

N6-methyladenine is the most widespread mRNA modification. A subset of human box C/D snoRNA species have target GAC sequences that lead to formation of N6-methyladenine at a key trans Hoogsteen-sugar A·G base pair, of which half are methylated in vivo. The GAC target is conserved only in those that are methylated. Methylation prevents binding of the 15.5-kDa protein and the induced folding of the RNA. Thus, the assembly of the box C/D snoRNP could in principle be regulated by RNA methylation at its critical first stage. Crystallography reveals that N6-methylation of adenine prevents the formation of trans Hoogsteen-sugar A·G base pairs, explaining why the box C/D RNA cannot adopt its kinked conformation. More generally, our data indicate that sheared A·G base pairs (but not Watson-Crick base pairs) are more susceptible to disruption by N6mA methylation and are therefore possible regulatory sites. The human signal recognition particle RNA and many related Alu retrotransposon RNA species are also methylated at N6 of an adenine that forms a sheared base pair with guanine and mediates a key tertiary interaction. © 2017 The Authors. Published under the terms of the CC BY 4.0 license.
Effect of base sequence on the DNA cross-linking properties of pyrrolobenzodiazepine (PBD) dimers

PubMed Central

Rahman, Khondaker M.; James, Colin H.; Thurston, David E.

2011-01-01

Pyrrolo[2,1-c][1,4]benzodiazepine (PBD) dimers are synthetic sequence-selective DNA minor-groove cross-linking agents that possess two electrophilic imine moieties (or their equivalent) capable of forming covalent aminal linkages with guanine C2-NH2 functionalities. The PBD dimer SJG-136, which has a C8–O–(CH2)3–O–C8′ central linker joining the two PBD moieties, is currently undergoing phase II clinical trials and current research is focused on developing analogues of SJG-136 with different linker lengths and substitution patterns. Using a reversed-phase ion pair HPLC/MS method to evaluate interaction with oligonucleotides of varying length and sequence, we recently reported (JACS, 2009, 131, 13756) that SJG-136 can form three different types of adducts: inter- and intrastrand cross-linked adducts, and mono-alkylated adducts. These studies have now been extended to include PBD dimers with a longer central linker (C8–O–(CH2)5–O–C8′), demonstrating that the type and distribution of adducts appear to depend on (i) the length of the C8/C8′-linker connecting the two PBD units, (ii) the positioning of the two reactive guanine bases on the same or opposite strands, and (iii) their separation (i.e. the number of base pairs, usually ATs, between them). Based on these data, a set of rules is emerging that can be used to predict the DNA–interaction behaviour of a PBD dimer of particular C8–C8′ linker length towards a given DNA sequence. These observations suggest that it may be possible to design PBD dimers to target specific DNA sequences. PMID:21427082
A fast method for searching for repeating earthquakes, applied to the northern San Francisco Bay area

NASA Astrophysics Data System (ADS)

Shakibay Senobari, N.; Funning, G.

2016-12-01

Repeating earthquakes (REs) are the regular or semi-regular failures of the same patch on a fault, producing near-identical waveforms at a given station. Sequences of REs are commonly interpreted as slip on small locked patches surrounded by large areas of fault that are creeping (Nadeau and McEvilly, 1999). Detecting them, therefore, places important constraints on the extent of fault creep at depth. In addition, the magnitude and recurrence interval of these RE sequences can be related to the creep rate and used as constraints on slip models. In this study we search for REs in northern California fault systems upon which creep is suspected, but not well constrained, including the Rodgers Creek, Maacama, Bartlett Springs, Concord-Green Valley, West Napa and Greenville faults, targeting events recorded at stations where the instrument was not changed for 10 years or more. A pair of events can be identified as REs based on a high cross-correlation coefficient (CCC) between their waveforms. Thus a fundamental step in RE searches is calculating the CCC for all event waveform pairs recorded at common stations. This becomes computationally expensive for large data sets. To expedite our search, we use a fast and accurate similarity search algorithm developed by the computer science community (Mueen et al., 2015; Zhu et al., 2016). Our initial tests on a data set including 1500 waveforms suggest it is around 40 times faster than the algorithm that we used previously (Shakibay Senobari and Funning, AGU Fall Meeting 2014). We search for event pairs with CCC>0.85 and cluster them based on their similarity. A second, location-based filter, based on the differential S-P times for each event pair at 5 or more stations, is used as an independent check. We consider a cluster of events a RE sequence if the source location separation distance for each pair is less than the estimated circular size of the source (e.g. Chen et al., 2008); these are gathered into an RE catalogue. In future, we plan to use this information in combination with geodetic data to produce a robust creep distribution model for all of the faults in this region.
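
The pairwise screening step described in the abstract above can be sketched as a brute-force normalized cross-correlation. This is the slow baseline that a fast similarity search is meant to replace, not the algorithm of Mueen et al. or Zhu et al. The synthetic wavelet, the 10-sample shift and the function name `max_ccc` are hypothetical; only the 0.85 threshold comes from the text.

```python
from math import exp, pi, sin, sqrt

def max_ccc(x, y):
    """Maximum normalized cross-correlation of two waveforms over all lags."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    xs = [v - mean_x for v in x]
    ys = [v - mean_y for v in y]
    denom = sqrt(sum(v * v for v in xs) * sum(v * v for v in ys))
    if denom == 0.0:
        return 0.0  # a flat trace carries no shape information
    best = -1.0
    for lag in range(-(len(ys) - 1), len(xs)):
        total = 0.0
        for i, y_val in enumerate(ys):
            k = i + lag
            if 0 <= k < len(xs):
                total += xs[k] * y_val
        best = max(best, total / denom)
    return best

# Synthetic decaying wavelet and a copy delayed by 10 samples (illustrative).
wavelet = [sin(2 * pi * 5 * t / 100) * exp(-t / 30) for t in range(100)]
delayed = [0.0] * 10 + wavelet[:-10]

CCC_THRESHOLD = 0.85  # threshold quoted in the abstract
is_candidate_pair = max_ccc(wavelet, delayed) > CCC_THRESHOLD
```

The double loop costs O(n^2) per pair, and the number of pairs grows quadratically with the catalogue size, which is why the abstract calls the all-pairs computation expensive.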
A unique chromatin complex occupies young α-satellite arrays of human centromeres

PubMed Central

Henikoff, Jorja G.; Thakur, Jitendra; Kasinathan, Sivakanthan; Henikoff, Steven

2015-01-01

The intractability of homogeneous α-satellite arrays has impeded understanding of human centromeres. Artificial centromeres are produced from higher-order repeats (HORs) present at centromere edges, although the exact sequences and chromatin conformations of centromere cores remain unknown. We use high-resolution chromatin immunoprecipitation (ChIP) of centromere components followed by clustering of sequence data as an unbiased approach to identify functional centromere sequences. We find that specific dimeric α-satellite units shared by multiple individuals dominate functional human centromeres. We identify two recently homogenized α-satellite dimers that are occupied by precisely positioned CENP-A (cenH3) nucleosomes with two ~100–base pair (bp) DNA wraps in tandem separated by a CENP-B/CENP-C–containing linker, whereas pericentromeric HORs show diffuse positioning. Precise positioning is largely maintained, whereas abundance decreases exponentially with divergence, which suggests that young α-satellite dimers with paired ~100-bp particles mediate evolution of functional human centromeres. Our unbiased strategy for identifying functional centromeric sequences should be generally applicable to tandem repeat arrays that dominate the centromeres of most eukaryotes. PMID:25927077
Analysis of the genome sequence of the pathogenic Muscovy duck parvovirus strain YY reveals a 14-nucleotide-pair deletion in the inverted terminal repeats.

PubMed

Wang, Jianye; Huang, Yu; Zhou, Mingxu; Zhu, Guoqiang

2016-09-01

Genomic information about Muscovy duck parvovirus is still limited. In this study, the genome of the pathogenic MDPV strain YY was sequenced. The full-length genome of YY is 5075 nucleotides (nt) long, 57 nt shorter than that of strain FM. Sequence alignment indicates that the 5' and 3' inverted terminal repeats (ITR) of strain YY contain a 14-nucleotide-pair deletion in the stem of the palindromic hairpin structure in comparison to strain FM and FZ91-30. The deleted region contains one "E-box" site and one repeated motif with the sequence "TTCCGGT" or "ACCGGAA". Phylogenetic trees constructed based on the protein coding genes concordantly showed that YY, together with nine other MDPV isolates from various places, clustered in a separate branch, distinct from the branch formed by goose parvovirus (GPV) strains. These results demonstrate that, despite the distinctive deletion, the YY strain still belongs to the classical MDPV group. Moreover, the deletion of ITR may contribute to the genome evolution of MDPV under immunization pressure.
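
The two forms of the repeated motif quoted in the abstract above are reverse complements of each other, which is what one would expect for a motif sitting on both arms of a palindromic hairpin stem. A minimal check, with the helper name `reverse_complement` chosen here for illustration:

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

motif = "TTCCGGT"
partner = reverse_complement(motif)  # the sequence read on the opposite strand
```

Applying the function twice returns the original motif, a quick sanity check when scanning both strands for a site.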
Chromosome-based survey sequencing reveals the genome organization of wild wheat progenitor Triticum dicoccoides.

PubMed

Akpinar, Bala Ani; Biyiklioglu, Sezgi; Alptekin, Burcu; Havránková, Miroslava; Vrána, Jan; Doležel, Jaroslav; Distelfeld, Assaf; Hernandez, Pilar; Budak, Hikmet

2018-05-04

Wild emmer wheat (Triticum turgidum ssp. dicoccoides) is the progenitor of wheat. We performed chromosome-based survey sequencing of the 14 chromosomes, examining repetitive sequences, protein-coding genes, miRNA/target pairs and tRNA genes, as well as syntenic relationships with related grasses. We found considerable differences in the content and distribution of repetitive sequences between the A and B subgenomes. The gene contents of individual chromosomes varied widely, not necessarily correlating with chromosome size. We catalogued candidate agronomically important loci, along with new alleles and flanking sequences that can be used to design exome sequencing. Syntenic relationships and virtual gene orders revealed several small-scale evolutionary rearrangements, in addition to providing evidence for the 4AL-5AL-7BS translocation in wild emmer wheat. Chromosome-based sequence assemblies contained five novel miRNA families, among 59 families putatively encoded in the entire genome which provide insight into the domestication of wheat and an overview of the genome content and organization.
Using a Sequence of Number Pairs as an Example in Teaching Mathematics

ERIC Educational Resources Information Center

Mauch, Elizabeth; Shi, Yixun

2005-01-01

A sequence of number pairs can be used to generate many interesting examples in teaching mathematics subjects at various levels. It is often used in elementary or middle school mathematics classes to illustrate the concept of "patterns." In this paper the authors present a few interesting ways of using this sequence to form examples for high…
Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis

PubMed Central

2012-01-01

Background: Multiplexing has become a major limitation of next-generation sequencing (NGS) when applied to low-complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach permits simultaneous analysis of only up to several dozen samples. Results: Here we introduce pair-barcode sequencing (PBS), an economical and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated, and about 64% of them were assigned to both barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. Conclusions: By employing the PBS approach in NGS, large-scale multiplexed pooled samples can be practically analyzed in parallel, so that high-throughput sequencing economically meets the requirements of samples with low sequencing-throughput demand. PMID:22276739
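The combinatorial idea in this abstract (4 forward barcodes times 8 reverse barcodes giving 32 distinguishable libraries) can be sketched as a lookup over barcode pairs. The barcode sequences and function names below are hypothetical placeholders, since the abstract does not list the actual barcodes.

```python
from itertools import product

# Hypothetical barcode sequences; the real barcodes are not given in the abstract.
FORWARD = ["ACGT", "CGTA", "GTAC", "TACG"]
REVERSE = ["AACC", "CCAA", "GGTT", "TTGG", "ACAC", "CACA", "GTGT", "TGTG"]

# Every (forward, reverse) combination labels one library.
SAMPLE_OF = {pair: index for index, pair in enumerate(product(FORWARD, REVERSE))}

def assign_sample(fwd_tag, rev_tag):
    """Return the library index for a read's two barcodes, or None if unrecognised."""
    return SAMPLE_OF.get((fwd_tag, rev_tag))

print(len(SAMPLE_OF))  # 32
```

A read is assigned only when both tags are recognised, which mirrors the abstract's remark that about 64% of reads were assigned to both barcodes.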
Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis.

PubMed

Tu, Jing; Ge, Qinyu; Wang, Shengqin; Wang, Lei; Sun, Beili; Yang, Qi; Bai, Yunfei; Lu, Zuhong

2012-01-25

Multiplexing has become a major limitation of next-generation sequencing (NGS) when applied to low-complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach permits simultaneous analysis of only up to several dozen samples. Here we introduce pair-barcode sequencing (PBS), an economical and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated, and about 64% of them were assigned to both barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. By employing the PBS approach in NGS, large-scale multiplexed pooled samples can be practically analyzed in parallel, so that high-throughput sequencing economically meets the requirements of samples with low sequencing-throughput demand.
FlyPrimerBank: An Online Database for Drosophila melanogaster Gene Expression Analysis and Knockdown Evaluation of RNAi Reagents

PubMed Central

Hu, Yanhui; Sopko, Richelle; Foos, Marianna; Kelley, Colleen; Flockhart, Ian; Ammeux, Noemie; Wang, Xiaowei; Perkins, Lizabeth; Perrimon, Norbert; Mohr, Stephanie E.

2013-01-01

The evaluation of specific endogenous transcript levels is important for understanding transcriptional regulation. More specifically, it is useful for independent confirmation of results obtained by the use of microarray analysis or RNA-seq and for evaluating RNA interference (RNAi)-mediated gene knockdown. Designing specific and effective primers for high-quality, moderate-throughput evaluation of transcript levels, i.e., quantitative, real-time PCR (qPCR), is nontrivial. To meet community needs, predefined qPCR primer pairs for mammalian genes have been designed and sequences made available, e.g., via PrimerBank. In this work, we adapted and refined the algorithms used for the mammalian PrimerBank to design 45,417 primer pairs for 13,860 Drosophila melanogaster genes, with three or more primer pairs per gene. We experimentally validated primer pairs for ~300 randomly selected genes expressed in early Drosophila embryos, using SYBR Green-based qPCR and sequence analysis of products derived from conventional PCR. All relevant information, including primer sequences, isoform specificity, spatial transcript targeting, and any available validation results and/or user feedback, is available from an online database (www.flyrnai.org/flyprimerbank). At FlyPrimerBank, researchers can retrieve primer information for fly genes either one gene at a time or in batch mode. Importantly, we included the overlap of each predicted amplified sequence with RNAi reagents from several public resources, making it possible for researchers to choose primers suitable for knockdown evaluation of RNAi reagents (i.e., to avoid amplification of the RNAi reagent itself). We demonstrate the utility of this resource for validation of RNAi reagents in vivo. PMID:23893746
Partial gene sequences for the A subunit of methyl-coenzyme M reductase (mcrI) as a phylogenetic tool for the family Methanosarcinaceae

NASA Technical Reports Server (NTRS)

Springer, E.; Sachs, M. S.; Woese, C. R.; Boone, D. R.

1995-01-01

Representatives of the family Methanosarcinaceae were analyzed phylogenetically by comparing partial sequences of their methyl-coenzyme M reductase (mcrI) genes. A 490-bp fragment from the A subunit of the gene was selected, amplified by the PCR, cloned, and sequenced for each of 25 strains belonging to the Methanosarcinaceae. The sequences obtained were aligned with the corresponding portions of five previously published sequences, and all of the sequences were compared to determine phylogenetic distances by Fitch distance matrix methods. We prepared analogous trees based on 16S rRNA sequences; these trees corresponded closely to the mcrI trees, although the mcrI sequences of pairs of organisms had 3.01 +/- 0.541 times more changes than the respective pairs of 16S rRNA sequences, suggesting that the mcrI fragment evolved about three times more rapidly than the 16S rRNA gene. The qualitative similarity of the mcrI and 16S rRNA trees suggests that transfer of genetic information between dissimilar organisms has not significantly affected these sequences, although we found inconsistencies between some mcrI distances that we measured and previously published DNA reassociation data. It is unlikely that multiple mcrI isogenes were present in the organisms that we examined, because we found no major discrepancies in multiple determinations of mcrI sequences from the same organism. Our primers for the PCR also match analogous sites in the previously published mcrII sequences, but all of the sequences that we obtained from members of the Methanosarcinaceae were more closely related to mcrI sequences than to mcrII sequences, suggesting that members of the Methanosarcinaceae do not have distinct mcrII genes.
Incorporating a guanidine-modified cytosine base into triplex-forming PNAs for the recognition of a C-G pyrimidine–purine inversion site of an RNA duplex

PubMed Central

Toh, Desiree-Faye Kaixin; Devi, Gitali; Patil, Kiran M.; Qu, Qiuyu; Maraswami, Manikantha; Xiao, Yunyun; Loh, Teck Peng; Zhao, Yanli; Chen, Gang

2016-01-01

RNA duplex regions are often involved in tertiary interactions and protein binding and thus there is great potential in developing ligands that sequence-specifically bind to RNA duplexes. We have developed a convenient synthesis method for a modified peptide nucleic acid (PNA) monomer with a guanidine-modified 5-methyl cytosine base. We demonstrated by gel electrophoresis, fluorescence and thermal melting experiments that short PNAs incorporating the modified residue show high binding affinity and sequence specificity in the recognition of an RNA duplex containing an internal inverted Watson-Crick C-G base pair. Remarkably, the relatively short PNAs show no appreciable binding to DNA duplexes or single-stranded RNAs. The attached guanidine group stabilizes the base triple through hydrogen bonding with the G base in a C-G pair. Selective binding towards an RNA duplex over a single-stranded RNA can be rationalized by the fact that alkylation of the amine of a 5-methyl C base blocks the Watson–Crick edge. PNAs incorporating multiple guanidine-modified cytosine residues are able to enter HeLa cells without any transfection agent. PMID:27596599

Manipulative interplay of two adozelesin molecules with d(ATTAAT)₂ achieving ligand-stacked Watson-Crick and Hoogsteen base-paired duplex adducts.

PubMed

Hopton, Suzanne R; Thompson, Andrew S

2011-05-17

Previous structural studies of the cyclopropapyrroloindole (CPI) antitumor antibiotics have shown that these ligands bind covalently edge-on into the minor groove of double-stranded DNA. Reversible covalent modification of the DNA via N3 of adenine occurs in a sequence-specific fashion. Early nuclear magnetic resonance and molecular modeling studies with both mono- and bis-alkylating ligands indicated that the ligands fit tightly within the minor groove, causing little distortion of the helix. In this study, we propose a new binding model for several of the CPI-based analogues, in which the aromatic secondary rings form π-stacked complexes within the minor groove. One of the adducts, formed with adozelesin and the d(ATTAAT)(2) sequence, also demonstrates the ability of these ligands to manipulate the DNA of the binding site, resulting in a Hoogsteen base-paired adduct. Although this type of base pairing has been previously observed with the bisfunctional CPI analogue bizelesin, this is the first time that such an observation has been made with a monoalkylating nondimeric analogue. Together, these results provide a new model for the design of CPI-based antitumor antibiotics, which also has a significant bearing on other structurally related and structurally unrelated minor groove-binding ligands. They indicate the dynamic nature of ligand-DNA interactions, demonstrating both DNA conformational flexibility and the ability of two DNA-bound ligands to interact to form stable covalent modified complexes.
AlignMe—a membrane protein sequence alignment web server

PubMed Central

Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

2014-01-01

We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
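The HP mode described above averages a hydrophobicity value over each column of a multiple sequence alignment. The sketch below shows only that averaging step, under stated assumptions: it uses the Kyte-Doolittle scale (the abstract does not say which scale the server uses), skips gaps, and leaves out any window smoothing and the subsequent profile-to-profile alignment. The function name is illustrative and is not part of AlignMe.

```python
# Kyte-Doolittle hydropathy values; choosing this scale is an assumption.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def family_profile(msa):
    """Average hydropathy per alignment column, ignoring gaps and unknown symbols.

    msa is a list of equal-length aligned sequences; columns made up only of
    gaps yield None.
    """
    profile = []
    for column in zip(*msa):
        values = [KD[residue] for residue in column if residue in KD]
        profile.append(sum(values) / len(values) if values else None)
    return profile

# Toy alignment of three hypothetical homologs.
print(family_profile(["LIV-K", "LLVGK", "IIA-R"]))
```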
Technical Report on Modeling for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

McLoughlin, K.

2016-01-11

The overall aim of this project is to develop a software package, called MetaQuant, that can determine the constituents of a complex microbial sample and estimate their relative abundances by analysis of metagenomic sequencing data. The goal for Task 1 is to create a generative model describing the stochastic process underlying the creation of sequence read pairs in the data set. The stages in this generative process include the selection of a source genome sequence for each read pair, with probability dependent on its abundance in the sample. The other stages describe the evolution of the source genome from its nearest common ancestor with a reference genome, breakage of the source DNA into short fragments, and the errors in sequencing the ends of the fragments to produce read pairs.
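Three of the stages named in this abstract (abundance-weighted choice of a source genome, fragmentation, and error-prone sequencing of both fragment ends) can be illustrated with a toy simulator. This is a sketch under simplifying assumptions, not MetaQuant's model: fragment length is fixed, errors are uniform random substitutions, and the evolutionary divergence stage is omitted. All names and default values are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def simulate_read_pair(genomes, abundances, frag_len=300, read_len=50,
                       error_rate=0.01, rng=random):
    """Draw one read pair from a toy generative process.

    genomes maps name -> sequence (each at least frag_len long);
    abundances maps name -> relative abundance (any positive weights).
    """
    names = list(genomes)
    # Stage 1: pick the source genome with probability proportional to abundance.
    source = rng.choices(names, weights=[abundances[n] for n in names], k=1)[0]
    genome = genomes[source]
    # Stage 2: break out a fragment at a uniformly random position.
    start = rng.randrange(len(genome) - frag_len + 1)
    fragment = genome[start:start + frag_len]

    # Stage 3: read both ends; each base is replaced by a random base with
    # probability error_rate (the replacement may coincide with the original).
    def with_errors(read):
        return "".join(rng.choice("ACGT") if rng.random() < error_rate else base
                       for base in read)

    forward = with_errors(fragment[:read_len])
    reverse = with_errors(reverse_complement(fragment[-read_len:]))
    return source, forward, reverse
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible.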
Anticancer property of sediment actinomycetes against MCF-7 and MDA-MB-231 cell lines.

PubMed

Ravikumar, S; Fredimoses, M; Gnanadesigan, M

2012-02-01

To investigate the anticancer property of marine sediment actinomycetes against two different breast cancer cell lines. In vitro anticancer activity was carried out against breast (MCF-7 and MDA-MB-231) cancer cell lines. Partial sequences of the 16S rRNA gene, phylogenetic tree construction, multiple sequence analysis and secondary structure analysis were also carried out with the actinomycetes isolates. Of the selected five actinomycete isolates, ACT01 and ACT02 showed the IC50 value with (10.13±0.92) and (22.34±5.82) µg/mL concentrations, respectively for MCF-7 cell line at 48 h, but ACT01 showed the minimum (18.54±2.49 µg/mL) level of IC50 value with MDA-MB-231 cell line. Further, the 16S rRNA partial sequences of ACT01, ACT02, ACT03, ACT04 and ACT05 isolates were also deposited in NCBI data bank with the accession numbers of GQ478246, GQ478247, GQ478248, GQ478249 and GQ478250, respectively. The phylogenetic tree analysis showed that, the isolates of ACT02 and ACT03 were represented in group I and III, respectively, but ACT01 and ACT02 were represented in group II. The multiple sequence alignment of the actinomycete isolates showed that, the maximum identical conserved regions were identified with the nucleotide regions of 125 to 221st base pairs, 65 to 119th base pairs and 55, 48 and 31st base pairs. Secondary structure prediction of the 16S rRNA showed that, the maximum free energy was consumed with ACT03 isolate (-45.4 kcal/mol) and the minimum free energy was consumed with ACT04 isolate (-57.6 kcal/mol). The actinomycete isolates of ACT01 and ACT02 (GQ478246 and GQ478247) which are isolated from sediment sample can be further used as anticancer agents against breast cancer cell lines.
Cryptic breakpoint identified by whole-genome mate-pair sequencing in a rare paternally inherited complex chromosomal rearrangement.

PubMed

Aristidou, Constantia; Theodosiou, Athina; Ketoni, Andria; Bak, Mads; Mehrjouy, Mana M; Tommerup, Niels; Sismani, Carolina

2018-01-01

Precise characterization of apparently balanced complex chromosomal rearrangements in non-affected individuals is crucial as they may result in reproductive failure, recurrent miscarriages or affected offspring. We present a family, where the non-affected father and daughter were found, using FISH and karyotyping, to be carriers of a three-way complex chromosomal rearrangement [t(6;7;10)(q16.2;q34;q26.1), de novo in the father]. The family suffered from two stillbirths, one miscarriage, and has a son with severe intellectual disability. In the present study, the family was revisited using whole-genome mate-pair sequencing. Interestingly, whole-genome mate-pair sequencing revealed a cryptic breakpoint on derivative (der) chromosome 6 rendering the rearrangement even more complex. FISH using a chromosome (chr) 6 custom-designed probe and a chr10 control probe confirmed that the interstitial chr6 segment, created by the two chr6 breakpoints, was translocated onto der(10). Breakpoints were successfully validated with Sanger sequencing, and small imbalances as well as microhomology were identified. Finally, the complex chromosomal rearrangement breakpoints disrupted the SIM1, GRIK2, CNTNAP2, and PTPRE genes without causing any phenotype development. In contrast to the majority of maternally transmitted complex chromosomal rearrangement cases, our study investigated a rare case where a complex chromosomal rearrangement, which most probably resulted from a Type IV hexavalent during the pachytene stage of meiosis I, was stably transmitted from a fertile father to his non-affected daughter. Whole-genome mate-pair sequencing proved highly successful in identifying cryptic complexity, which consequently provided further insight into the meiotic segregation of chromosomes and the increased reproductive risk in individuals carrying the specific complex chromosomal rearrangement. We propose that such complex rearrangements should be characterized in detail using a combination of conventional cytogenetic and NGS-based approaches to aid in better prenatal preimplantation genetic diagnosis and counseling in couples with reproductive problems.
Robust prediction of consensus secondary structures using averaged base pairing probability matrices.

PubMed

Kiryu, Hisanori; Kin, Taishin; Asai, Kiyoshi

2007-02-15

Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. Supplementary data are available at Bioinformatics online.
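The two steps described in this abstract (averaging per-sequence base pairing probability matrices, then choosing a consensus structure from the averaged matrix) can be sketched as follows. This is a simplified stand-in, not the authors' C++ implementation: it assumes the matrices are already computed and mapped to alignment columns, and it replaces the paper's expected-accuracy objective with a Nussinov-style dynamic program that selects nested pairs maximising the sum of (p - theta). Here `theta` is a hypothetical threshold that plays the role of a sensitivity/specificity knob.

```python
def average_matrices(matrices):
    """Element-wise mean of equally sized square matrices (lists of lists)."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
            for i in range(n)]

def mea_structure(p, theta=0.5, min_loop=3):
    """Nested pairs (i, j), i < j, maximising the sum of p[i][j] - theta.

    Only the upper triangle of p is read; a pair must enclose at least
    min_loop unpaired positions.
    """
    n = len(p)
    best = [[0.0] * n for _ in range(n)]
    choice = [[None] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position i stays unpaired.
            best[i][j], choice[i][j] = best[i + 1][j], None
            # Case 2: position i pairs with some k in (i + min_loop, j].
            for k in range(i + min_loop + 1, j + 1):
                gain = p[i][k] - theta
                if gain <= 0:
                    continue
                inner = best[i + 1][k - 1]
                outer = best[k + 1][j] if k < j else 0.0
                if gain + inner + outer > best[i][j]:
                    best[i][j], choice[i][j] = gain + inner + outer, k
    # Trace back the chosen pairs without recursion.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i + 1, j))
        else:
            pairs.append((i, k))
            stack.append((i + 1, k - 1))
            stack.append((k + 1, j))
    return sorted(pairs)
```

Raising `theta` keeps only pairs that are well supported across the aligned sequences, while lowering it admits more tentative pairs.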
Structural basis of DNA bending and oriented heterodimer binding by the basic leucine zipper domains of Fos and Jun.

PubMed

Leonard, D A; Rajaram, N; Kerppola, T K

1997-05-13

Interactions among transcription factors that bind to separate sequence elements require bending of the intervening DNA and juxtaposition of interacting molecular surfaces in an appropriate orientation. Here, we examine the effects of single amino acid substitutions adjacent to the basic regions of Fos and Jun as well as changes in sequences flanking the AP-1 site on DNA bending. Substitution of charged amino acid residues at positions adjacent to the basic DNA-binding domains of Fos and Jun altered DNA bending. The change in DNA bending was directly proportional to the change in net charge for all heterodimeric combinations between these proteins. Fos and Jun induced distinct DNA bends at different binding sites. Exchange of a single base pair outside of the region contacted in the x-ray crystal structure altered DNA bending. Substitution of base pairs flanking the AP-1 site had converse effects on the opposite directions of DNA bending induced by homodimers and heterodimers. These results suggest that Fos and Jun induce DNA bending in part through electrostatic interactions between amino acid residues adjacent to the basic region and base pairs flanking the AP-1 site. DNA bending by Fos and Jun at inverted binding sites indicated that heterodimers bind to the AP-1 site in a preferred orientation. Mutation of a conserved arginine within the basic regions of Fos and transversion of the central C:G base pair in the AP-1 site to G:C had complementary effects on the orientation of heterodimer binding and DNA bending. The conformational variability of the Fos-Jun-AP-1 complex may contribute to its functional versatility at different promoters.
Whole exome sequencing to estimate alloreactivity potential between donors and recipients in stem cell transplantation.

PubMed

Sampson, Juliana K; Sheth, Nihar U; Koparde, Vishal N; Scalora, Allison F; Serrano, Myrna G; Lee, Vladimir; Roberts, Catherine H; Jameson-Lee, Max; Ferreira-Gonzalez, Andrea; Manjili, Masoud H; Buck, Gregory A; Neale, Michael C; Toor, Amir A

2014-08-01

Whole exome sequencing (WES) was performed on stem cell transplant donor-recipient (D-R) pairs to determine the extent of potential antigenic variation at a molecular level. In a small cohort of D-R pairs, a high frequency of sequence variation was observed between the donor and recipient exomes independent of human leucocyte antigen (HLA) matching. Nonsynonymous, nonconservative single nucleotide polymorphisms were approximately twice as frequent in HLA-matched unrelated, compared with related D-R pairs. When mapped to individual chromosomes, these polymorphic nucleotides were uniformly distributed across the entire exome. In conclusion, WES reveals extensive nucleotide sequence variation in the exomes of HLA-matched donors and recipients. © 2014 John Wiley & Sons Ltd.
How Mg2+ ion and water network affect the stability and structure of non-Watson-Crick base pairs in E. coli loop E of 5S rRNA: a molecular dynamics and reference interaction site model (RISM) study.

PubMed

Shanker, Sudhanshu; Bandyopadhyay, Pradipta

2017-08-01

The non-Watson-Crick (non-WC) base pairs of Escherichia coli loop E of 5S rRNA are stabilized by Mg2+ ions through water-mediated interaction. It is important to know the synergic role of Mg2+ and the water network surrounding Mg2+ in stabilizing the non-WC base pairs of RNA. For this purpose, free energy change of the system is calculated using molecular dynamics (MD) simulation as Mg2+ is pulled from RNA, which causes disturbance of the water network. It was found that Mg2+ remains hexahydrated unless it is close to or far from RNA. In the pentahydrated form, Mg2+ interacts directly with RNA. Water network has been identified by two complementary methods: MD followed by a density-based clustering algorithm, and the three-dimensional reference interaction site model. These two methods gave similar results. Identification of water network around Mg2+ and non-WC base pairs gives a clue to the strong effect of water network on the stability of this RNA. Based on sequence analysis of all Eubacteria 5S rRNA, we propose that hexahydrated Mg2+ is an integral part of this RNA and geometry of base pairs surrounding it adjust to accommodate the [Formula: see text]. Overall the findings from this work can help in understanding the basis of the complex structure and stability of RNA with non-WC base pairs.
SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

USGS Publications Warehouse

Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

2013-01-01

SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
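The third module's task, finding perfect tandem repeats whose motif length falls in a user-specified range, can be illustrated with a backreference regular expression. This is a minimal sketch and not SSR_pipeline's actual search algorithm; the function name, the default minimum of four repeats, and the decision to skip homopolymer runs are assumptions made for the example.

```python
import re

def find_ssrs(seq, min_motif=2, max_motif=25, min_repeats=4):
    """Yield (start, motif, repeat_count) for perfect tandem repeats in seq.

    The lazy quantifier makes the shortest qualifying motif win at each
    position, so "ACACACAC" is reported as (AC)x4 rather than (ACAC)x2.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # a run of a single base is not treated as a microsatellite here
        yield match.start(), motif, len(match.group(0)) // len(motif)

print(list(find_ssrs("GGTT" + "AC" * 6 + "TTGACC")))  # [(4, 'AC', 6)]
```

Compound microsatellites, which the abstract also mentions, would need an extra pass that merges adjacent hits; that step is left out here.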
SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data.

PubMed

Miller, Mark P; Knaus, Brian J; Mullins, Thomas D; Haig, Susan M

2013-01-01

SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
Digital analyzer for point processes based on first-in-first-out memories

NASA Astrophysics Data System (ADS)

Basano, Lorenzo; Ottonello, Pasquale; Schiavi, Enore

1992-06-01

We present an entirely new version of a multipurpose instrument designed for the statistical analysis of point processes, especially those characterized by high bunching. A long sequence of pulses can be recorded in the RAM bank of a personal computer via a suitably designed front end which employs a pair of first-in-first-out (FIFO) memories; these allow one to build an analyzer that, besides being simpler from the electronic point of view, is capable of sustaining much higher intensity fluctuations of the point process. The overflow risk of the device is evaluated by treating the FIFO pair as a queueing system. The apparatus was tested using both a deterministic signal and a sequence of photoelectrons obtained from laser light scattered by random surfaces.
Formation of rings from segments of HeLa-cell nuclear deoxyribonucleic acid

PubMed Central

Hardman, Norman

1974-01-01

Duplex segments of HeLa-cell nuclear DNA were generated by cleavage with DNA restriction endonuclease from Haemophilus influenzae. About 20–25% of the DNA segments produced, when partly degraded with exonuclease III and annealed, were found to form rings visible in the electron microscope. A further 5% of the DNA segments formed structures that were branched in configuration. Similar structures were generated from HeLa-cell DNA, without prior treatment with restriction endonuclease, when the complementary polynucleotide chains were exposed by exonuclease III action at single-chain nicks. After exposure of an average single-chain length of 1400 nucleotides per terminus at nicks in HeLa-cell DNA by exonuclease III, followed by annealing, the physical length of ring closures was estimated and found to be 0.02–0.1 μm, or 50–300 base pairs. An almost identical distribution of lengths was recorded for the regions of complementary base sequence responsible for branch formation. It is proposed that most of the rings and branches are formed from classes of reiterated base sequence with an average length of 180 base pairs arranged intermittently in HeLa-cell DNA. From the rate of formation of branched structures when HeLa-cell DNA segments were heat-denatured and annealed, it is estimated that the reiterated sequences are in families containing approximately 2400–24000 copies. PMID:4462738
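The conversion between contour length and base pairs quoted in this abstract can be checked with the textbook rise of roughly 0.34 nm per base pair for B-form DNA. That figure is an assumption brought in for this check; the abstract does not state which conversion factor the author used.

```python
RISE_NM_PER_BP = 0.34  # approximate B-form rise per base pair (assumed, not from the abstract)

def micrometres_to_bp(length_um):
    """Convert a duplex contour length in micrometres to an approximate bp count."""
    return length_um * 1000.0 / RISE_NM_PER_BP

for length_um in (0.02, 0.1):
    print(f"{length_um} um is about {micrometres_to_bp(length_um):.0f} bp")
```

Under this assumption the limits come out near 59 and 294 bp, consistent with the rounded 50–300 base pair range reported.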
Architecture of a Fur Binding Site: a Comparative Analysis

PubMed Central

Lavrrar, Jennifer L.; McIntosh, Mark A.

2003-01-01

Fur is an iron-binding transcriptional repressor that recognizes a 19-bp consensus site of the sequence 5′-GATAATGATAATCATTATC-3′. This site can be defined as three adjacent hexamers of the sequence 5′-GATAAT-3′, with the third being slightly imperfect (an F-F-F configuration), or as two hexamers in the forward orientation separated by one base pair from a third hexamer in the reverse orientation (an F-F-x-R configuration). Although Fur can bind synthetic DNA sequences containing the F-F-F arrangement, most natural binding sites are variations of the F-F-x-R arrangement. The studies presented here compared the ability of Fur to recognize synthetic DNA sequences containing two to four adjacent hexamers with binding to sequences containing variations of the F-F-x-R arrangement (including natural operator sequences from the entS and fepB promoter regions of Escherichia coli). Gel retardation assays showed that the F-F-x-R architecture was necessary for high-affinity Fur-DNA interactions and that contiguous hexamers were not recognized as effectively. In addition, the stoichiometry of Fur at each binding site was determined, showing that Fur interacted with its minimal 19-bp binding site as two overlapping dimers. These data confirm the proposed overlapping-dimer binding model, where the unit of interaction with a single Fur dimer is two inverted hexamers separated by a C:G base pair, with two overlapping units comprising the 19-bp consensus binding site required for the high-affinity interaction with two Fur dimers. PMID:12644489
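The two readings of the 19-bp consensus described above can be made concrete with a small script. This is only an illustrative sketch: the helper names are hypothetical, and the code uses nothing beyond the consensus sequence and the hexamer given in the abstract.

```python
# Illustrative sketch: decompose the 19-bp Fur consensus into hexamers.
# Helper names are hypothetical; sequences are taken from the abstract.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

CONSENSUS = "GATAATGATAATCATTATC"  # 5'->3' consensus site
HEXAMER = "GATAAT"                 # forward (F) hexamer


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def split_ffxr(site: str) -> tuple:
    """Split a 19-bp site into forward, forward, spacer, reverse parts."""
    if len(site) != 19:
        raise ValueError("expected a 19-bp site")
    return site[0:6], site[6:12], site[12], site[13:19]


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


f1, f2, spacer, rev = split_ffxr(CONSENSUS)

# F-F-x-R reading: two forward hexamers, one spacer base, one inverted hexamer.
is_ffxr = f1 == HEXAMER and f2 == HEXAMER and rev == reverse_complement(HEXAMER)

# F-F-F reading: the third hexamer starts right after the second one.
third_forward = CONSENSUS[12:18]
ffr_mismatches = mismatches(third_forward, HEXAMER)

if __name__ == "__main__":
    print("F-F-x-R parts:", f1, f2, spacer, rev, "| exact match:", is_ffxr)
    print("F-F-F third hexamer:", third_forward, "| mismatches:", ffr_mismatches)
```

Read this way, the spacer is the single C:G base pair mentioned in the overlapping-dimer model, and the third hexamer is an exact match only in the inverted orientation.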
Meeting Report: The Terabase Metagenomics Workshop and the Vision of an Earth Microbiome Project

PubMed Central

Gilbert, Jack A.; Meyer, Folker; Antonopoulos, Dion; Balaji, Pavan; Brown, C. Titus; Brown, Christopher T.; Desai, Narayan; Eisen, Jonathan A; Evers, Dirk; Field, Dawn; Feng, Wu; Huson, Daniel; Jansson, Janet; Knight, Rob; Knight, James; Kolker, Eugene; Konstantindis, Kostas; Kostka, Joel; Kyrpides, Nikos; Mackelprang, Rachel; McHardy, Alice; Quince, Christopher; Raes, Jeroen; Sczyrba, Alexander; Shade, Ashley; Stevens, Rick

2010-01-01

Between July 18th and 24th 2010, 26 leading microbial ecology, computation, bioinformatics and statistics researchers came together in Snowbird, Utah (USA) to discuss the challenge of how to best characterize the microbial world using next-generation sequencing technologies. The meeting was entitled “Terabase Metagenomics” and was sponsored by the Institute for Computing in Science (ICiS) summer 2010 workshop program. The aim of the workshop was to explore the fundamental questions relating to microbial ecology that could be addressed using advances in sequencing potential. Technological advances in next-generation sequencing platforms such as the Illumina HiSeq 2000 can generate in excess of 250 billion base pairs of genetic information in 8 days. Thus, the generation of a trillion base pairs of genetic information is becoming a routine matter. The main outcome from this meeting was the birth of a concept and practical approach to exploring microbial life on earth, the Earth Microbiome Project (EMP). Here we briefly describe the highlights of this meeting and provide an overview of the EMP concept and how it can be applied to exploration of the microbiome of each ecosystem on this planet. PMID:21304727
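As a rough check on the throughput figures quoted above, the following sketch estimates how long a single instrument would need to produce one trillion base pairs. It assumes a constant rate equal to the quoted 250 billion base pairs per 8 days; because the abstract says "in excess of", the result is an upper bound on the time, and the function name is hypothetical.

```python
def days_to_reach(target_bp: float, run_bp: float = 250e9, run_days: float = 8.0) -> float:
    """Days needed to sequence target_bp at a constant rate of run_bp per run_days."""
    bp_per_day = run_bp / run_days
    return target_bp / bp_per_day


if __name__ == "__main__":
    terabase = 1e12  # one trillion base pairs
    print(f"About {days_to_reach(terabase):.0f} days per terabase on one instrument")
```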
Homology between DNA polymerases of poxviruses, herpesviruses, and adenoviruses: nucleotide sequence of the vaccinia virus DNA polymerase gene.

PubMed Central

Earl, P L; Jones, E V; Moss, B

1986-01-01

A 5400-base-pair segment of the vaccinia virus genome was sequenced and an open reading frame of 938 codons was found precisely where the DNA polymerase had been mapped by transfer of a phosphonoacetate-resistance marker. A single nucleotide substitution changing glycine at position 347 to aspartic acid accounts for the drug resistance of the mutant vaccinia virus. The 5' end of the DNA polymerase mRNA was located 80 base pairs before the methionine codon initiating the open reading frame. Correspondence between the predicted Mr 108,577 polypeptide and the 110,000 purified enzyme indicates that little or no proteolytic processing occurs. Extensive homology, extending over 435 amino acids, was found upon comparing the DNA polymerase of vaccinia virus and DNA polymerase of Epstein-Barr virus. A highly conserved sequence of 14 amino acids in the carboxyl-terminal regions of the above DNA polymerases is also present at a similar location in adenovirus DNA polymerase. This structure, which is predicted to form a turn flanked by beta-pleated sheets, may form part of an essential binding or catalytic site that accounts for its presence in DNA polymerases of poxviruses, herpesviruses, and adenoviruses. PMID:3012524
Identification of a new Apscaviroid from Japanese persimmon.

PubMed

Nakaune, Ryoji; Nakano, Masaaki

2008-01-01

Three viroid-like sequences were detected from Japanese persimmon (Diospyros kaki Thunb.) by RT-PCR using primers specific for members of the genus Apscaviroid. Based on the sequences, we determined the complete genomic sequences. Two had 92.1-94.3% sequence identity with citrus viroid OS (CVd-OS) and 91.4-96.3% identity with apple fruit crinkle viroid (AFCVd), respectively. The third, tentatively named persimmon viroid (PVd), had 396 nucleotides and less than 70% sequence identity with known viroids. The secondary structure of PVd is proposed to be rod-like with extensive base pairing and contains the terminal conserved region and the central conserved region characteristic of the genus Apscaviroid. Moreover, we confirmed that the viroids, including PVd, are graft transmissible from persimmon to persimmon and that persimmon is a natural host of these viroids. According to its molecular and biological properties, PVd should be considered a member of a new species in the genus Apscaviroid.
Bovine papilloma virus contains an activator of gene expression at the distal end of the early transcription unit.

PubMed Central

Lusky, M; Berg, L; Weiher, H; Botchan, M

1983-01-01

Bovine papilloma virus (BPV) contains a cis-acting DNA element which can enhance transcription of distal promoters. Utilizing both direct and indirect transient transfection assays, we showed that a 59-base-pair DNA sequence from the BPV genome could activate the simian virus 40 promoter from distances exceeding 2.5 kilobases and in an orientation-independent manner. In contrast to the promoter 5'-proximal localization of other known viral activators, this element was located immediately 3' to the early polyadenylation signal in the BPV genome. Deletion of these sequences from the BPV genome inactivated the transforming ability of BPV recombinant plasmids. Orientation-independent reinsertion of this 59-base-pair sequence, or alternatively of activator DNA sequences from simian virus 40 or polyoma virus, restored the transforming activity of the BPV recombinant plasmids. Furthermore, the stable transformation frequency of the herpes simplex virus type 1 thymidine kinase gene was enhanced when linked to restriction fragments of BPV DNA which included the defined activator element. This enhancement was orientation independent with respect to the thymidine kinase promoter. The enhancement also appeared to be unrelated to the establishment of the recombinant plasmids as episomes, since in transformed cells these sequences are found linked to high-molecular-weight DNA. We propose that the enhancement of stable transformation frequencies and the activation of transcription units are in this case alternate manifestations of the same biochemical events. PMID:6308425
Structural analysis of the human U3 ribonucleoprotein particle reveal a conserved sequence available for base pairing with pre-rRNA.

PubMed Central

Parker, K A; Steitz, J A

1987-01-01

The human U3 ribonucleoprotein (RNP) has been analyzed to determine its protein constituents, sites of protein-RNA interaction, and RNA secondary structure. By using anti-U3 RNP antibodies and extracts prepared from HeLa cells labeled in vivo, the RNP was found to contain four nonphosphorylated proteins of 36, 30, 13, and 12.5 kilodaltons and two phosphorylated proteins of 74 and 59 kilodaltons. U3 nucleotides 72-90, 106-121, 154-166, and 190-217 must contain sites that interact with proteins since these regions are immunoprecipitated after treatment of the RNP with RNase A or T1. The secondary structure was probed with specific nucleases and by chemical modification with single-strand-specific reagents that block subsequent reverse transcription. Regions that are single stranded (and therefore potentially able to interact with a substrate RNA) include an evolutionarily conserved sequence at nucleotides 104-112 and nonconserved sequences at nucleotides 65-74, 80-84, and 88-93. Nucleotides 159-168 do not appear to be highly accessible, thus making it unlikely that this U3 sequence base pairs with sequences near the 5.8S rRNA-internal transcribed spacer II junction, as previously proposed. Alternative functions of the U3 RNP are discussed, including the possibility that U3 may participate in a processing event near the 3' end of 28S rRNA. PMID:2959855
Molecular epidemiology of Plum pox virus in Japan.

PubMed

Maejima, Kensaku; Himeno, Misako; Komatsu, Ken; Takinami, Yusuke; Hashimoto, Masayoshi; Takahashi, Shuichiro; Yamaji, Yasuyuki; Oshima, Kenro; Namba, Shigetou

2011-05-01

For a molecular epidemiological study based on complete genome sequences, 37 Plum pox virus (PPV) isolates were collected from the Kanto region in Japan. Pair-wise analyses revealed that all 37 Japanese isolates belong to the PPV-D strain, with low genetic diversity (less than 0.8%). In phylogenetic analysis of the PPV-D strain based on complete nucleotide sequences, the relationships of the PPV-D strain were reconstructed with high resolution: at the global level, the American, Canadian, and Japanese isolates formed their own distinct monophyletic clusters, suggesting that the routes of viral entry into these countries were independent; at the local level, the actual transmission histories of PPV were precisely reconstructed with high bootstrap support. This is the first description of the molecular epidemiology of PPV based on complete genome sequences.

Designing heteropolymers to fold into unique structures via water-mediated interactions.

PubMed

Jamadagni, Sumanth N; Bosoy, Christian; Garde, Shekhar

2010-10-28

Hydrophobic homopolymers collapse into globular structures in water driven by hydrophobic interactions. Here we employ extensive molecular dynamics simulations to study the collapse of heteropolymers containing one or two pairs of oppositely charged monomers. We show that charging a pair of monomers can dramatically alter the most stable conformations from compact globular to more open hairpin-like. We systematically explore a subset of the sequence space of one- and two-charge-pair polymers, focusing on the locations of the charge pairs. Conformational stability is governed by a balance of hydrophobic interactions, hydration and interactions of charge groups, water-mediated charged-hydrophobic monomer repulsions, and other factors. As a result, placing charge pairs in the middle, away from the hairpin ends, leads to stable hairpin-like structures. Turning off the monomer-water attractions enhances hydrophobic interactions significantly leading to a collapse into compact globular structures even for two-charge-pair heteropolymers. In contrast, the addition of salt leads to open and extended structures, suggesting that solvation of charged monomer sites by salt ions dominates the salt-induced enhancement of hydrophobic interactions. We also test the ability of a predictive scheme based on the additivity of free energy of contact formation. The success of the scheme for symmetric two-charge-pair sequences and the failure for their flipped versions highlight the complexity of the heteropolymer conformation space and of the design problem. Collectively, our results underscore the ability of tuning water-mediated interactions to design stable nonglobular structures in water and present model heteropolymers for further studies in the extended thermodynamic space and in inhomogeneous environments.
Structural Requirement in Clostridium perfringens Collagenase mRNA 5′ Leader Sequence for Translational Induction through Small RNA-mRNA Base Pairing

PubMed Central

Nomura, Nobuhiko; Nakamura, Kouji

2013-01-01

The Gram-positive anaerobic bacterium Clostridium perfringens is pathogenic to humans and animals, and the production of its toxins is strictly regulated during the exponential phase. We recently found that the 5′ leader sequence of the colA transcript encoding collagenase, which is a major toxin of this organism, is processed and stabilized in the presence of the small RNA VR-RNA. The primary colA 5′-untranslated region (5′UTR) forms a long stem-loop structure containing an internal bulge and masks its own ribosomal binding site. Here we found that VR-RNA directly regulates colA expression through base pairing with colA mRNA in vivo. However, when the internal bulge structure was closed by point mutations in colA mRNA, translation ceased despite the presence of VR-RNA. In addition, a mutation disrupting the colA stem-loop structure induced mRNA processing and ColA-FLAG translational activation in the absence of VR-RNA, indicating that the stem-loop and internal bulge structure of the colA 5′ leader sequence is important for regulation by VR-RNA. On the other hand, processing was required for maximal ColA expression but was not essential for VR-RNA-dependent colA regulation. Finally, colA processing and translational activation were induced at a high temperature without VR-RNA. These results suggest that inhibition of the colA 5′ leader structure through base pairing is the primary role of VR-RNA in colA regulation and that the colA 5′ leader structure is a possible thermosensor. PMID:23585542
pH-Modulated Watson-Crick duplex-quadruplex equilibria of guanine-rich and cytosine-rich DNA sequences 140 base pairs upstream of the c-kit transcription initiation site.

PubMed

Bucek, Pavel; Jaumot, Joaquim; Aviñó, Anna; Eritja, Ramon; Gargallo, Raimundo

2009-11-23

Guanine-rich regions of DNA are sequences capable of forming G-quadruplex structures. The formation of a G-quadruplex structure in a region 140 base pairs (bp) upstream of the c-kit transcription initiation site was recently proposed (Fernando et al., Biochemistry, 2006, 45, 7854). In the present study, the acid-base equilibria and the thermally induced unfolding of the structures formed by a guanine-rich region and by its complementary cytosine-rich strand in c-kit were studied by means of circular dichroism and molecular absorption spectroscopies. In addition, competition between the Watson-Crick duplex and the isolated structures was studied as a function of pH value and temperature. Multivariate data analysis methods based on both hard and soft modeling were used to allow accurate quantification of the various acid-base species present in the mixtures. Results showed that the G-quadruplex and i-motif coexist with the Watson-Crick duplex over the pH range from 3.0 to 6.5, approximately, under the experimental conditions tested in this study. At pH 7.0, the duplex is practically the only species present.
Transcriptome Analysis of Houttuynia cordata Thunb. by Illumina Paired-End RNA Sequencing and SSR Marker Discovery

PubMed Central

Wei, Lin; Li, Shenghua; Liu, Shenggui; He, Anna; Wang, Dan; Wang, Jie; Tang, Yulian; Wu, Xianjin

2014-01-01

Background: Houttuynia cordata Thunb. is an important traditional medical herb in China and other Asian countries, with high medicinal and economic value. However, a lack of available genomic information has become a limitation for research on this species. Thus, we carried out high-throughput transcriptomic sequencing of H. cordata to generate an enormous transcriptome sequence dataset for gene discovery and molecular marker development. Principal Findings: Illumina paired-end sequencing technology produced over 56 million sequencing reads from H. cordata mRNA. Subsequent de novo assembly yielded 63,954 unigenes, 39,982 (62.52%) and 26,122 (40.84%) of which had significant similarity to proteins in the NCBI nonredundant protein and Swiss-Prot databases (E-value < 10⁻⁵), respectively. Of these annotated unigenes, 30,131 and 15,363 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. In addition, 24,434 (38.21%) unigenes were mapped onto 128 pathways using the KEGG pathway database and 17,964 (44.93%) unigenes showed homology to Vitis vinifera (Vitaceae) genes in BLASTx analysis. Furthermore, 4,800 cDNA SSRs were identified as potential molecular markers. Fifty primer pairs were randomly selected to detect polymorphism among 30 samples of H. cordata; 43 (86%) produced fragments of expected size, suggesting that the unigenes were suitable for specific primer design and of high quality, and the SSR marker could be widely used in marker-assisted selection and molecular breeding of H. cordata in the future. Conclusions: This is the first application of Illumina paired-end sequencing technology to investigate the whole transcriptome of H. cordata and to assemble RNA-seq reads without a reference genome. These data should help researchers investigating the evolution and biological processes of this species. The SSR markers developed can be used for construction of high-resolution genetic linkage maps and for gene-based association analyses in H. cordata. This work will enable future functional genomic research and research into the distinctive active constituents of this genus. PMID:24392108
The promises and pitfalls of RNA-interference-based therapeutics

PubMed Central

Castanotto, Daniela; Rossi, John J.

2009-01-01

The discovery that gene expression can be controlled by the Watson–Crick base-pairing of small RNAs with messenger RNAs containing complementary sequence — a process known as RNA interference — has markedly advanced our understanding of eukaryotic gene regulation and function. The ability of short RNA sequences to modulate gene expression has provided a powerful tool with which to study gene function and is set to revolutionize the treatment of disease. Remarkably, despite being just one decade from its discovery, the phenomenon is already being used therapeutically in human clinical trials, and biotechnology companies that focus on RNA-interference-based therapeutics are already publicly traded. PMID:19158789
The role of the hippocampus in transitive inference

PubMed Central

Zalesak, Martin; Heckers, Stephan

2009-01-01

Transitive inference (TI) is the ability to infer the relationship between items (e.g., A>C) after having learned a set of premise pairs (e.g., A>B and B>C). Previous studies in humans have identified a distributed neural network, including cortex, hippocampus, and thalamus, during TI judgments. We studied two aspects of TI using fMRI of subjects who had acquired the 6-item sequence (A>B>C>D>E>F) of visual stimuli. First, the identification of novel pairs not containing end items (i.e., B>D, C>E, B>E) was associated with greater left hippocampal activation when compared to the identification of novel pairs containing end items A and F. This demonstrates that the identification of stimulus pairs requiring the flexible representation of a sequence is associated with hippocampal activation. Second, for the three novel pairs devoid of end items we found greater right hippocampal activation for pairs B>D and C>E compared with pair B>E. This indicates that TI decisions on pairs derived from more adjacent items in the sequence are associated with greater hippocampal activation. Hippocampal activation thus scales with the degree of relational processing necessary for TI judgments. Both findings confirm a role of the hippocampus in transitive inference in humans. PMID:19216061
Deriving Heterospecific Self-Assembling Protein-Protein Interactions Using a Computational Interactome Screen.

PubMed

Crooks, Richard O; Baxter, Daniel; Panek, Anna S; Lubben, Anneke T; Mason, Jody M

2016-01-29

Interactions between naturally occurring proteins are highly specific, with protein-network imbalances associated with numerous diseases. For designed protein-protein interactions (PPIs), required specificity can be notoriously difficult to engineer. To accelerate this process, we have derived peptides that form heterospecific PPIs when combined. This is achieved using software that generates large virtual libraries of peptide sequences and searches within the resulting interactome for preferentially interacting peptides. To demonstrate feasibility, we have (i) generated 1536 peptide sequences based on the parallel dimeric coiled-coil motif and varied residues known to be important for stability and specificity, (ii) screened the 1,180,416 member interactome for predicted Tm values and (iii) used predicted Tm cutoff points to isolate eight peptides that form four heterospecific PPIs when combined. This required that all 32 hypothetical off-target interactions within the eight-peptide interactome be disfavoured and that the four desired interactions pair correctly. Lastly, we have verified the approach by characterising all 36 pairs within the interactome. In analysing the output, we hypothesised that several sequences are capable of adopting antiparallel orientations. We subsequently improved the software by removing sequences where doing so led to fully complementary electrostatic pairings. Our approach can be used to derive increasingly large and therefore complex sets of heterospecific PPIs with a wide range of potential downstream applications from disease modulation to the design of biomaterials and peptides in synthetic biology. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
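The interactome sizes quoted in this abstract follow from counting unordered peptide pairs with self-pairs (homodimers) included. The sketch below reproduces those counts; the peptide labels and function name are placeholders, not identifiers from the paper or its software.

```python
from itertools import combinations_with_replacement


def interactome_size(n_peptides: int) -> int:
    """Number of unordered pairs among n peptides, homodimers included."""
    return n_peptides * (n_peptides + 1) // 2


# Placeholder labels for the eight selected peptides (hypothetical names).
peptides = [f"P{i}" for i in range(1, 9)]
pairs = list(combinations_with_replacement(peptides, 2))

desired = 4                        # heterospecific pairs sought
off_target = len(pairs) - desired  # every other pairing must be disfavoured

if __name__ == "__main__":
    print("Library interactome:", interactome_size(1536))
    print("Eight-peptide interactome:", len(pairs), "pairs,", off_target, "off-target")
```

Counting homodimers is what makes the eight-peptide set yield 36 pairs rather than 28, which matches the 4 desired plus 32 off-target interactions described above.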
Bi-PROF

PubMed Central

Gries, Jasmin; Schumacher, Dirk; Arand, Julia; Lutsik, Pavlo; Markelova, Maria Rivera; Fichtner, Iduna; Walter, Jörn; Sers, Christine; Tierling, Sascha

2013-01-01

The use of next generation sequencing has expanded our view on whole mammalian methylome patterns. In particular, it provides a genome-wide insight of local DNA methylation diversity at single nucleotide level and enables the examination of single chromosome sequence sections at a sufficient statistical power. We describe a bisulfite-based sequence profiling pipeline, Bi-PROF, which is based on the 454 GS-FLX Titanium technology and allows one to obtain up to one million sequence stretches at single base pair resolution without laborious subcloning. To illustrate the performance of the experimental workflow connected to a bioinformatics program pipeline (BiQ Analyzer HT) we present a test analysis set of 68 different epigenetic marker regions (amplicons) in five individual patient-derived xenograft tissue samples of colorectal cancer and one healthy colon epithelium sample as a control. After the 454 GS-FLX Titanium run, sequence read processing and sample decoding, the obtained alignments are quality controlled and statistically evaluated. Comprehensive methylation pattern interpretation (profiling) assessed by analyzing 10²–10⁴ sequence reads per amplicon allows an unprecedented deep view on pattern formation and methylation marker heterogeneity in tissues affected by complex diseases like cancer. PMID:23803588
A Thiazole Coumarin (TC) Turn-On Fluorescence Probe for AT-Base Pair Detection and Multipurpose Applications in Different Biological Systems

NASA Astrophysics Data System (ADS)

Narayanaswamy, Nagarjun; Kumar, Manoj; Das, Sadhan; Sharma, Rahul; Samanta, Pralok K.; Pati, Swapan K.; Dhar, Suman K.; Kundu, Tapas K.; Govindaraju, T.

2014-09-01

Sequence-specific recognition of DNA by small turn-on fluorescence probes is a promising tool for bioimaging, bioanalytical and biomedical applications. Here, the authors report a novel cell-permeable and red fluorescent hemicyanine-based thiazole coumarin (TC) probe for DNA recognition, nuclear staining and cell cycle analysis. TC exhibited strong fluorescence enhancement in the presence of DNA containing AT-base pairs, but did not fluoresce with GC sequences, single-stranded DNA, RNA and proteins. The fluorescence staining of HeLa S3 and HEK 293 cells by TC followed by DNase and RNase digestion studies depicted the selective staining of DNA in the nucleus over the cytoplasmic region. Fluorescence-activated cell sorting (FACS) analysis by flow cytometry demonstrated the potential application of TC in cell cycle analysis in HEK 293 cells. Metaphase chromosome and malaria parasite DNA imaging studies further confirmed the in vivo diagnostic and therapeutic applications of probe TC. Probe TC may find multiple applications in fluorescence spectroscopy, diagnostics, bioimaging and molecular and cell biology.
A Thiazole Coumarin (TC) Turn-On Fluorescence Probe for AT-Base Pair Detection and Multipurpose Applications in Different Biological Systems

PubMed Central

Narayanaswamy, Nagarjun; Kumar, Manoj; Das, Sadhan; Sharma, Rahul; Samanta, Pralok K.; Pati, Swapan K.; Dhar, Suman K.; Kundu, Tapas K.; Govindaraju, T.

2014-01-01

Sequence-specific recognition of DNA by small turn-on fluorescence probes is a promising tool for bioimaging, bioanalytical and biomedical applications. Here, the authors report a novel cell-permeable and red fluorescent hemicyanine-based thiazole coumarin (TC) probe for DNA recognition, nuclear staining and cell cycle analysis. TC exhibited strong fluorescence enhancement in the presence of DNA containing AT-base pairs, but did not fluoresce with GC sequences, single-stranded DNA, RNA and proteins. The fluorescence staining of HeLa S3 and HEK 293 cells by TC followed by DNase and RNase digestion studies depicted the selective staining of DNA in the nucleus over the cytoplasmic region. Fluorescence-activated cell sorting (FACS) analysis by flow cytometry demonstrated the potential application of TC in cell cycle analysis in HEK 293 cells. Metaphase chromosome and malaria parasite DNA imaging studies further confirmed the in vivo diagnostic and therapeutic applications of probe TC. Probe TC may find multiple applications in fluorescence spectroscopy, diagnostics, bioimaging and molecular and cell biology. PMID:25252596
GRIL-Seq, a method for identifying direct targets of bacterial small regulatory RNA by in vivo proximity ligation

PubMed Central

Han, Kook; Tjaden, Brian; Lory, Stephen

2017-01-01

The first step in the post-transcriptional regulatory function of most bacterial small non-coding RNAs (sRNAs) is base-pairing with partially complementary sequences of targeted transcripts. We present a simple method for identifying sRNA targets in vivo and defining processing sites of the regulated transcripts. The technique (referred to as GRIL-Seq) is based on preferential ligation of sRNAs to ends of base-paired targets in bacteria co-expressing T4 RNA ligase, followed by sequencing to identify the chimeras. In addition to the RNA chaperone Hfq, the GRIL-Seq method depends on the activity of the pyrophosphorylase RppH. Using PrrF1, an iron-regulated sRNA in Pseudomonas aeruginosa, we demonstrate that direct regulatory targets of this sRNA can be readily identified. Therefore, GRIL-Seq represents a powerful tool not only for identifying direct targets of sRNAs in a variety of environments but also for uncovering novel roles for sRNAs and their targets in complex regulatory networks. PMID:28005055
Bacillus Strains Most Closely Related to Bacillus nealsonii Are Not Effectively Circumscribed within the Taxonomic Species Definition

PubMed Central

Peak, K. Kealy; Duncan, Kathleen E.; Luna, Vicki A.; King, Debra S.; McCarthy, Peter J.; Cannons, Andrew C.

2011-01-01

Bacillus strains with >99.7% 16S rRNA gene sequence similarity were characterized with DNA:DNA hybridization, cellular fatty acid (CFA) analysis, and testing of 100 phenotypic traits. When paired with the most closely related type strain, percent DNA:DNA similarities (% S) for six Bacillus strains were all far below the recommended 70% threshold value for species circumscription with Bacillus nealsonii. An apparent genomic group of four Bacillus strain pairings with 94%–70% S was contradicted by the failure of the strains to cluster in CFA- and phenotype-based dendrograms as well as by their differentiation with 9–13 species level discriminators such as nitrate reduction, temperature range, and acid production from carbohydrates. The novel Bacillus strains were monophyletic and very closely related based on 16S rRNA gene sequence. Coherent genomic groups were not however supported by similarly organized phenotypic clusters. Therefore, the strains were not effectively circumscribed within the taxonomic species definition. PMID:22046187
Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm.

PubMed

Khaled, Heba; Faheem, Hossam El Deen Mostafa; El Gohary, Rania

2015-01-01

This paper presents a novel hybrid model for solving the multiple pair-wise sequence alignment problem, combining the message passing interface (MPI) and CUDA, the parallel computing platform and programming model invented by NVIDIA. The proposed model targets homogeneous cluster nodes equipped with similar Graphics Processing Unit (GPU) cards. The model consists of the Master Node Dispatcher (MND) and the Worker GPU Nodes (WGN). The MND distributes the workload among the cluster working nodes and then aggregates the results. The WGN performs the multiple pair-wise sequence alignments using the Smith-Waterman algorithm. We also propose a modified implementation of the Smith-Waterman algorithm based on computing the alignment matrices row-wise. The experimental results demonstrate a considerable reduction in the running time as the number of working GPU nodes increases. The proposed model achieved a performance of about 12 giga cell updates per second (GCUPS) when tested against the SWISS-PROT protein knowledge base running on four nodes.
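To illustrate what computing the alignment matrix row by row means, here is a minimal single-threaded sketch of Smith-Waterman local alignment scoring that keeps only the previous row in memory. It is not the authors' MPI-CUDA implementation: the scoring values (match 2, mismatch -1, linear gap -2) and the function name are assumptions chosen for the example, whereas protein searches would normally use a substitution matrix and often affine gaps.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b, filling the matrix row by row.

    Scoring parameters are illustrative assumptions, not values from the paper.
    Only two rows are stored, so memory use is O(len(b)).
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(0, diag, up, left)  # 0 restarts the local alignment
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so a full row can be produced from the one before it; a throughput figure in cell updates per second is simply the number of such cells computed divided by the elapsed time.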
The complete mitogenome of the river blackfish, Gadopsis marmoratus (Richardson, 1848) (Teleostei: Percichthyidae).

PubMed

Gan, Han Ming; Tan, Mun Hua; Lee, Yin Peng; Austin, Christopher M

2016-05-01

The mitogenome of the Australian freshwater blackfish, Gadopsis marmoratus, was recovered by genome skimming using the MiSeq sequencer (GenBank Accession Number: NC_024436). The blackfish mitogenome has 16,407 base pairs made up of 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and an 819 bp non-coding AT-rich region. This is the fifth mitogenome sequence to be reported for the family Percichthyidae.
A Bayesian mixture model for chromatin interaction data.

PubMed

Niu, Liang; Lin, Shili

2015-02-01

Chromatin interactions mediated by a particular protein are of interest for studying gene regulation, especially the regulation of genes that are associated with, or known to be causative of, a disease. A recent molecular technique, Chromatin interaction analysis by paired-end tag sequencing (ChIA-PET), that uses chromatin immunoprecipitation (ChIP) and high throughput paired-end sequencing, is able to detect such chromatin interactions genomewide. However, ChIA-PET may generate noise (i.e., pairings of DNA fragments by random chance) in addition to true signal (i.e., pairings of DNA fragments by interactions). In this paper, we propose MC_DIST based on a mixture modeling framework to identify true chromatin interactions from ChIA-PET count data (counts of DNA fragment pairs). The model is cast into a Bayesian framework to take into account the dependency among the data and the available information on protein binding sites and gene promoters to reduce false positives. A simulation study showed that MC_DIST outperforms the previously proposed hypergeometric model in terms of both power and type I error rate. A real data study showed that MC_DIST may identify potential chromatin interactions between protein binding sites and gene promoters that may be missed by the hypergeometric model. An R package implementing the MC_DIST model is available at http://www.stat.osu.edu/~statgen/SOFTWARE/MDM.
Charge transport and ac response under light illumination in gate-modulated DNA molecular junctions.

PubMed

Zhang, Yan; Zhu, Wen-Huan; Ding, Guo-Hui; Dong, Bing; Wang, Xue-Feng

2015-05-22

Using a two-strand tight-binding model within the nonequilibrium Green's function approach, we study charge transport through the DNA sequences (GC)_{N_GC} and (GC)_1(TA)_{N_TA}(GC)_3 sandwiched between two Pt electrodes. We show that at low temperature the DNA sequence (GC)_{N_GC} exhibits coherent charge carrier transport at very small bias, since the highest occupied molecular orbital in the GC base pair can be aligned with the Fermi energy of the metallic electrodes by a gate voltage. A weakly distance-dependent conductance is found in the DNA sequence (GC)_1(TA)_{N_TA}(GC)_3 with large N_TA. In contrast to the mechanism of thermally induced hopping of charges proposed in previous experiments, we find that this phenomenon is dominated by quantum tunnelling through discrete quantum well states in the TA base pairs. In addition, the ac response of this DNA junction under light illumination is also investigated. The suppression of ac conductances of the left and right lead of DNA sequences at some particular frequencies is attributed to the excitation of electrons in the DNA to the lead Fermi surface by ac potential, or the excitation of electrons in deep DNA energy levels to partially occupied energy levels in the transport window. Therefore, measuring the ac response of DNA junctions can reveal a wealth of information about the intrinsic dynamics of DNA molecules.
Identifying N6-methyladenosine sites using multi-interval nucleotide pair position specificity and support vector machine

NASA Astrophysics Data System (ADS)

Xing, Pengwei; Su, Ran; Guo, Fei; Wei, Leyi

2017-04-01

N6-methyladenosine (m6A) refers to methylation of the adenosine nucleotide at the nitrogen-6 position. It plays an important role in a series of biological processes, such as splicing events, mRNA export, nascent mRNA synthesis, nuclear translocation and translation. Numerous experiments have successfully characterized m6A sites within sequences since high-resolution mapping of m6A sites was established. However, with the explosive growth of genomic sequences, using experimental methods to identify m6A sites is time-consuming and expensive. Thus, it is highly desirable to develop fast and accurate computational identification methods. In this study, we propose a sequence-based predictor called RAM-NPPS for identifying m6A sites within RNA sequences, in which we present a novel feature representation algorithm based on multi-interval nucleotide pair position specificity, and use a support vector machine classifier to construct the prediction model. Comparison results show that our proposed method outperforms the state-of-the-art predictors on three benchmark datasets across the three species, indicating the effectiveness and robustness of our method. Moreover, an online webserver implementing the proposed predictor has been established at http://server.malab.cn/RAM-NPPS/. It is anticipated to be a useful prediction tool to help biologists reveal the mechanisms of m6A site functions.
Drastic stability change of X-X mismatch in d(CXG) trinucleotide repeat disorders under molecular crowding condition.

PubMed

Teng, Ye; Pramanik, Smritimoy; Tateishi-Karimata, Hisae; Ohyama, Tatsuya; Sugimoto, Naoki

2018-02-05

The trinucleotide repeat d(CXG) (X = A, C, G or T) is the most common sequence causing repeat expansion disorders. The formation of non-canonical structures, such as hairpin structures with X-X mismatches, has been proposed to affect gene expression and regulation, which are important in pathological studies of these devastating neurological diseases. However, little information is available regarding the thermodynamics of the repeat sequence under crowded cellular conditions where many non-canonical structures such as G-quadruplexes are highly stabilized, while duplexes are destabilised. In this study, we investigated the different stabilities of X-X mismatches in the context of internal d(CXG) self-complementary sequences in an environment with a high concentration of cosolutes to mimic the crowding conditions in cells. The stabilities of full-matched duplexes and duplexes with A-A, G-G, and T-T mismatched base pairs under molecular crowding conditions were notably decreased compared to under dilute conditions. However, the stability of the DNA duplex with a C-C mismatch base pair was only slightly destabilised. Investigating different stabilities of X-X mismatches in d(CXG) sequences is important for improving our understanding of the formation and transition of multiple non-canonical structures in trinucleotide repeat diseases, and may provide insights for pathological studies and drug development. Copyright © 2018 Elsevier Inc. All rights reserved.
Formation and Repair of Mismatches Containing Ribonucleotides and Oxidized Bases at Repeated DNA Sequences.

PubMed

Cilli, Piera; Minoprio, Anna; Bossa, Cecilia; Bignami, Margherita; Mazzei, Filomena

2015-10-23

The cellular pool of ribonucleotide triphosphates (rNTPs) is higher than that of deoxyribonucleotide triphosphates. To ensure genome stability, DNA polymerases must discriminate against rNTPs and incorporated ribonucleotides must be removed by ribonucleotide excision repair (RER). We investigated DNA polymerase β (POL β) capacity to incorporate ribonucleotides into trinucleotide repeated DNA sequences and the efficiency of base excision repair (BER) and RER enzymes (OGG1, MUTYH, and RNase H2) when presented with an incorrect sugar and an oxidized base. POL β incorporated rAMP and rCMP opposite 7,8-dihydro-8-oxoguanine (8-oxodG) and extended both mispairs. In addition, POL β was able to insert and elongate an oxidized rGMP when paired with dA. We show that RNase H2 always preserves the capacity to remove a single ribonucleotide when paired to an oxidized base or to incise an oxidized ribonucleotide in a DNA duplex. In contrast, BER activity is affected by the presence of a ribonucleotide opposite an 8-oxodG. In particular, MUTYH activity on 8-oxodG:rA mispairs is fully inhibited, although its binding capacity is retained. This results in the reduction of RNase H2 incision capability of this substrate. Thus complex mispairs formed by an oxidized base and a ribonucleotide can compromise BER and RER in repeated sequences. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Activation energies for dissociation of double strand oligonucleotide anions: evidence for Watson-Crick base pairing in vacuo.

PubMed

Schnier, P D; Klassen, J S; Strittmatter, E F; Williams, E R

1998-09-23

The dissociation kinetics of a series of complementary and noncomplementary DNA duplexes, (TGCA)_2^{3-}, (CCGG)_2^{3-}, (AATTAAT)_2^{3-}, (CCGGCCG)_2^{3-}, (A_7·T_7)^{3-}, (A_7·A_7)^{3-}, (T_7·T_7)^{3-}, and (A_7·C_7)^{3-}, were investigated using blackbody infrared radiative dissociation in a Fourier transform mass spectrometer. From the temperature dependence of the unimolecular dissociation rate constants, Arrhenius activation parameters in the zero-pressure limit are obtained. Activation energies range from 1.2 to 1.7 eV, and preexponential factors range from 10^13 to 10^19 s^-1. Dissociation of the duplexes results in cleavage of the noncovalent bonds and/or cleavage of covalent bonds leading to loss of a neutral nucleobase followed by backbone cleavage producing sequence-specific (a - base) and w ions. Four pieces of evidence are presented which indicate that Watson-Crick (WC) base pairing is preserved in complementary DNA duplexes in the gas phase: i. the activation energy for dissociation of the complementary dimer, (A_7·T_7)^{3-}, to the single strands is significantly higher than that for the related noncomplementary (A_7·A_7)^{3-} and (T_7·T_7)^{3-} dimers, indicating a stronger interaction between strands with a specific base sequence, ii. extensive loss of neutral adenine occurs for (A_7·A_7)^{3-} and (A_7·C_7)^{3-} but not for (A_7·T_7)^{3-}, consistent with this process being shut down by WC hydrogen bonding, iii. a correlation is observed between the measured activation energy for dissociation to single strands and the dimerization enthalpy (-ΔH_d) in solution, and iv. molecular dynamics carried out at 300 and 400 K indicate that WC base pairing is preserved for the (A_7·T_7)^{3-} duplex, although the helical structure is essentially lost. In combination, these results provide strong evidence that WC base pairing can exist in the complete absence of solvent.
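As an illustration of how Arrhenius activation parameters can be extracted from the temperature dependence of rate constants, the following is a minimal sketch that fits ln k = ln A - Ea/(kB·T) by ordinary least squares. The function name `arrhenius_fit` and the synthetic data are assumptions made for this example; they are not the authors' analysis or data.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_fit(temps_k, rate_constants):
    """Least-squares fit of ln k against 1/T.

    Returns (activation energy in eV, preexponential factor in s^-1).
    """
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in rate_constants]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -Ea / kB
    intercept = mean_y - slope * mean_x  # equals ln A
    return -slope * K_B_EV, math.exp(intercept)


# Synthetic, noise-free rate constants generated from assumed parameters
# (Ea = 1.4 eV, A = 1e16 s^-1), used only to check that the fit recovers them.
true_ea, true_a = 1.4, 1e16
temps = [400.0, 410.0, 420.0, 430.0, 440.0]
rates = [true_a * math.exp(-true_ea / (K_B_EV * t)) for t in temps]
ea, prefactor = arrhenius_fit(temps, rates)
```

With measured data, the scatter around the fitted line would also give uncertainty estimates for both parameters, which this sketch omits.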

Frnakenstein: multiple target inverse RNA folding.

PubMed

Lyngsø, Rune B; Anderson, James W J; Sizikova, Elena; Badugu, Amarendra; Hyland, Tomas; Hein, Jotun

2012-10-09

RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard. In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development were to address the hitherto mostly ignored extension of solving the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods and several data sets totalling 769 real and predicted single structure targets, and on 292 two structure targets. It performed as well as or better at finding sequences which folded in silico into the target structure than all existing methods, without the heavy bias towards CG base pairs that was observed for all other top performing methods. On the two structure targets it also performed well, generating a perfect design for about 80% of the targets. Our method illustrates that successful designs for the inverse RNA folding problem do not necessarily have to rely on heavy biases in base pair and unpaired base distributions.
The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein.
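To make the forward folding problem concrete, here is a minimal sketch of a classic dynamic-programming approach, Nussinov-style base-pair maximization. It is a simplified stand-in chosen for illustration: it is not Frnakenstein's genetic algorithm and not an energy-based folding method. The function name `max_base_pairs` and the `min_loop` default are assumptions of this example.

```python
# Pairs allowed in this toy model: Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq`.

    `min_loop` is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j is unpaired
            for k in range(i, j - min_loop):  # case: k pairs with j
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


# A short hairpin: three stacked pairs closing a three-base loop.
hairpin_pairs = max_base_pairs("GGGAAAUCC")
```

The table has O(n^2) entries and each takes O(n) work, so the sketch runs in O(n^3) time. The inverse problem has no comparably simple recurrence, which is why heuristic searches such as genetic algorithms are used for it.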
Frnakenstein: multiple target inverse RNA folding

PubMed Central

2012-01-01

Background: RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard. Results: In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development were to address the hitherto mostly ignored extension of solving the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods and several data sets totalling 769 real and predicted single structure targets, and on 292 two structure targets. It performed as well as or better at finding sequences which folded in silico into the target structure than all existing methods, without the heavy bias towards CG base pairs that was observed for all other top performing methods. On the two structure targets it also performed well, generating a perfect design for about 80% of the targets. Conclusions: Our method illustrates that successful designs for the inverse RNA folding problem do not necessarily have to rely on heavy biases in base pair and unpaired base distributions.
The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein. PMID:23043260
Exact method for numerically analyzing a model of local denaturation in superhelically stressed DNA

NASA Astrophysics Data System (ADS)

Fye, Richard M.; Benham, Craig J.

1999-03-01

Local denaturation, the separation at specific sites of the two strands comprising the DNA double helix, is one of the most fundamental processes in biology, required to allow the base sequence to be read both in DNA transcription and in replication. In living organisms this process can be mediated by enzymes which regulate the amount of superhelical stress imposed on the DNA. We present a numerically exact technique for analyzing a model of denaturation in superhelically stressed DNA. This approach is capable of predicting the locations and extents of transition in circular superhelical DNA molecules of kilobase lengths and specified base pair sequences. It can also be used for closed loops of DNA which are typically found in vivo to be kilobases long. The analytic method consists of an integration over the DNA twist degrees of freedom followed by the introduction of auxiliary variables to decouple the remaining degrees of freedom, which allows the use of the transfer matrix method. The algorithm implementing our technique requires O(N^2) operations and O(N) memory to analyze a DNA domain containing N base pairs. However, to analyze kilobase length DNA molecules it must be implemented in high precision floating point arithmetic. An accelerated algorithm is constructed by imposing an upper bound M on the number of base pairs that can simultaneously denature in a state. This accelerated algorithm requires O(MN) operations, and has an analytically bounded error. Sample calculations show that it achieves high accuracy (greater than 15 decimal digits) with relatively small values of M (M < 0.05N) for kilobase length molecules under physiologically relevant conditions. Calculations are performed on the superhelical pBR322 DNA sequence to test the accuracy of the method. With no free parameters in the model, the locations and extents of local denaturation predicted by this analysis are in quantitatively precise agreement with in vitro experimental measurements.
Calculations performed on the fructose-1,6-bisphosphatase gene sequence from yeast show that this approach can also accurately treat in vivo denaturation.
Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences

DOEpatents

Eisenberg, David; Marcotte, Edward M.; Pellegrini, Matteo; Thompson, Michael J.; Yeates, Todd O.

2002-10-15

A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
Determining protein function and interaction from genome analysis

DOEpatents

Eisenberg, David; Marcotte, Edward M.; Thompson, Michael J.; Pellegrini, Matteo; Yeates, Todd O.

2004-08-03

A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
Assigning protein functions by comparative genome analysis protein phylogenetic profiles

DOEpatents

Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

2003-05-13

A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
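The phylogenetic-profile idea can be sketched in a few lines: each protein is assigned a presence/absence vector across a set of genomes, and proteins with identical vectors are grouped as candidates for a functional link. The genome names, protein identifiers, and function names below are hypothetical, and membership in a set stands in for a real homology search. This is an illustrative sketch, not the patented implementation.

```python
def phylogenetic_profile(protein, genomes):
    """Tuple of 0/1 flags, one per genome, marking whether a homolog is present."""
    return tuple(int(protein in members) for members in genomes.values())


def group_by_profile(proteins, genomes):
    """Group proteins that share exactly the same presence/absence profile."""
    groups = {}
    for protein in proteins:
        profile = phylogenetic_profile(protein, genomes)
        groups.setdefault(profile, []).append(protein)
    return groups


# Hypothetical data: which protein families have a homolog in each genome.
genomes = {
    "genome_1": {"P1", "P2", "P3"},
    "genome_2": {"P3"},
    "genome_3": {"P1", "P2"},
    "genome_4": {"P1", "P2", "P3"},
}
groups = group_by_profile(["P1", "P2", "P3"], genomes)
```

Here P1 and P2 share the profile (1, 0, 1, 1) and would be flagged as possibly linked, while P3 has a different profile. A practical version would likely tolerate near-identical profiles rather than require exact matches.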
Identification of Forensic Samples via Mitochondrial DNA in the Undergraduate Biochemistry Laboratory

NASA Astrophysics Data System (ADS)

Millard, Julie T.; Pilon, André M.

2003-04-01

A recent forensic approach for identification of unknown biological samples is mitochondrial DNA (mtDNA) sequencing. We describe a laboratory exercise suitable for an undergraduate biochemistry course in which the polymerase chain reaction is used to amplify a 440 base pair hypervariable region of human mtDNA from a variety of "crime scene" samples (e.g., teeth, hair, nails, cigarettes, envelope flaps, toothbrushes, and chewing gum). Amplification is verified via agarose gel electrophoresis and then samples are subjected to cycle sequencing. Sequence alignments are made via the program CLUSTAL W, allowing students to compare samples and solve the "crime."
Nucleotide sequence of the gene determining plasmid-mediated citrate utilization.

PubMed Central

Ishiguro, N; Sato, G

1985-01-01

The citrate utilization determinant from transposon Tn3411 has been cloned and sequenced, and its polypeptide products have been characterized in minicell experiments. The nucleotide sequence was determined for a 2,047-base-pair BglII restriction endonuclease fragment that includes the citrate determinant. This region contains an open reading frame that would encode a 431-amino-acid very hydrophobic polypeptide and which is preceded by a reasonable ribosomal binding site. However, the single polypeptide found in minicell experiments had an apparent molecular weight of 35,000 on sodium dodecyl sulfate-polyacrylamide gel electrophoresis. PMID:2999087
Nucleotide sequence of the Varkud mitochondrial plasmid of Neurospora and synthesis of a hybrid transcript with a 5' leader derived from mitochondrial RNA.

PubMed

Akins, R A; Grant, D M; Stohl, L L; Bottorff, D A; Nargang, F E; Lambowitz, A M

1988-11-05

The Mauriceville and Varkud mitochondrial plasmids of Neurospora are closely related, closed circular DNAs (3.6 and 3.7 kb, respectively; 1 kb = 10^3 bases or base-pairs), whose characteristics suggest relationships to mitochondrial DNA introns and retrotransposons. Here, we characterized the structure of the Varkud plasmid, determined its complete nucleotide sequence and mapped its major transcripts. The Mauriceville and Varkud plasmids have more than 97% positional identity. Both plasmids contain a 710 amino acid open reading frame that encodes a reverse transcriptase-like protein. The amino acid sequence of this open reading frame is strongly conserved between the two plasmids (701/710 amino acids) as expected for a functionally important protein. Both plasmids have a 0.4 kb region that contains five PstI palindromes and a direct repeat of approximately 160 base-pairs. Comparison of sequences in this region suggests that the Varkud plasmid has diverged less from a common ancestor than has the Mauriceville plasmid. Two major transcripts of the Varkud plasmid were detected by Northern hybridization experiments: a full-length linear RNA of 3.7 kb and an additional prominent transcript of 4.9 kb, 1.2 kb longer than monomer plasmid. Remarkably, we find that the 4.9 kb transcript is a hybrid RNA consisting of the full-length 3.7 kb Varkud plasmid transcript plus a 5' leader of 1.2 kb that is derived from the 5' end of the mitochondrial small rRNA. This and other findings suggest that the Varkud plasmid, like certain RNA viruses, has a mechanism for joining heterologous RNAs to the 5' end of its major transcript, and that, under some circumstances, nucleotide sequences in mitochondria may be recombined at the RNA level.
Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

PubMed

Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

2014-01-01

Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.
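For readers unfamiliar with SSR mining, the following minimal sketch shows one way to locate di- to hexanucleotide repeats in a sequence using regular-expression backreferences. The minimum repeat counts in `MIN_REPEATS`, the function name `find_ssrs`, and the test sequence are assumptions of this example; the abstract does not state which tool or thresholds the authors used.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """True if `unit` is not itself a tandem repeat of a shorter motif."""
    size = len(unit)
    return not any(
        size % d == 0 and unit == unit[:d] * (size // d)
        for d in range(1, size)
    )


def find_ssrs(seq):
    """Return (start, motif, copy_count) for each simple sequence repeat found."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # Capture one motif, then require at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_primitive(unit):  # skips e.g. AAA or ATAT as motifs
                hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


# Toy sequence containing an (AAG)6 and an (AT)7 repeat.
toy = "GGC" + "AAG" * 6 + "TTC" + "AT" * 7 + "GCA"
ssrs = sorted(find_ssrs(toy))
```

Primer pairs would then be designed in the flanking sequence on either side of each reported repeat, a step this sketch does not attempt.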
Diversity Analysis in Cannabis sativa Based on Large-Scale Development of Expressed Sequence Tag-Derived Simple Sequence Repeat Markers

PubMed Central

Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

2014-01-01

Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis. PMID:25329551
Sequence Variation in the Small-Subunit rRNA Gene of Plasmodium malariae and Prevalence of Isolates with the Variant Sequence in Sichuan, China

PubMed Central

Liu, Qing; Zhu, Shenghua; Mizuno, Sahoko; Kimura, Masatsugu; Liu, Peina; Isomura, Shin; Wang, Xingzhen; Kawamoto, Fumihiko

1998-01-01

By two PCR-based diagnostic methods, Plasmodium malariae infections have been rediscovered at two foci in the Sichuan province of China, a region where no cases of P. malariae have been officially reported for the last 2 decades. In addition, a variant form of P. malariae which has a deletion of 19 bp and seven substitutions of base pairs in the target sequence of the small-subunit (SSU) rRNA gene was detected with high frequency. Alignment analysis of Plasmodium sp. SSU rRNA gene sequences revealed that the 5′ region of the variant sequence is identical to that of P. vivax or P. knowlesi and its 3′ region is identical to that of P. malariae. The same sequence variations were also found in P. malariae isolates collected along the Thai-Myanmar border, suggesting a wide distribution of this variant form from southern China to Southeast Asia. PMID:9774600
Recognition of platinum-DNA adducts by HMGB1a.

PubMed

Ramachandran, Srinivas; Temple, Brenda; Alexandrova, Anastassia N; Chaney, Stephen G; Dokholyan, Nikolay V

2012-09-25

Cisplatin (CP) and oxaliplatin (OX), platinum-based drugs used widely in chemotherapy, form adducts on intrastrand guanines (5'GG) in genomic DNA. DNA damage recognition proteins, transcription factors, mismatch repair proteins, and DNA polymerases discriminate between CP- and OX-GG DNA adducts, which could partly account for differences in the efficacy, toxicity, and mutagenicity of CP and OX. In addition, differential recognition of CP- and OX-GG adducts is highly dependent on the sequence context of the Pt-GG adduct. In particular, DNA binding protein domain HMGB1a binds to CP-GG DNA adducts with up to 53-fold greater affinity than to OX-GG adducts in the TGGA sequence context but shows much smaller differences in binding in the AGGC or TGGT sequence contexts. Here, simulations of the HMGB1a-Pt-DNA complex in the three sequence contexts revealed a higher number of interface contacts for the CP-DNA complex in the TGGA sequence context than in the OX-DNA complex. However, the number of interface contacts was similar in the TGGT and AGGC sequence contexts. The higher number of interface contacts in the CP-TGGA sequence context corresponded to a larger roll of the Pt-GG base pair step. Furthermore, geometric analysis of stacking of phenylalanine 37 in HMGB1a (Phe37) with the platinated guanines revealed more favorable stacking modes correlated with a larger roll of the Pt-GG base pair step in the TGGA sequence context. These data are consistent with our previous molecular dynamics simulations showing that the CP-TGGA complex was able to sample larger roll angles than the OX-TGGA complex or either CP- or OX-DNA complexes in the AGGC or TGGT sequences. We infer that the high binding affinity of HMGB1a for CP-TGGA is due to the greater flexibility of CP-TGGA compared to OX-TGGA and other Pt-DNA adducts. 
This increased flexibility is reflected in the ability of CP-TGGA to sample larger roll angles, which allows for a higher number of interface contacts between the Pt-DNA adduct and HMGB1a.
High-speed all-optical DNA local sequence alignment based on a three-dimensional artificial neural network.

PubMed

Maleki, Ehsan; Babashah, Hossein; Koohi, Somayyeh; Kavehvash, Zahra

2017-07-01

This paper presents an optical processing approach for exploring a large number of genome sequences. Specifically, we propose an optical correlator for global alignment and an extended moiré matching technique for local analysis of spatially coded DNA, whose output is fed to a novel three-dimensional artificial neural network for local DNA alignment. All-optical implementation of the proposed 3D artificial neural network is developed and its accuracy is verified in Zemax. Thanks to its parallel processing capability, the proposed structure performs local alignment of 4 million sequences of 150 base pairs in a few seconds, which is much faster than its electrical counterparts, such as the basic local alignment search tool.
Sequetyping: Serotyping Streptococcus pneumoniae by a Single PCR Sequencing Strategy

PubMed Central

Leung, Marcus H.; Bryson, Kevin; Freystatter, Kathrin; Pichon, Bruno; Edwards, Giles; Gillespie, Stephen H.

2012-01-01

The introduction of pneumococcal conjugate vaccines necessitates continued monitoring of circulating strains to assess vaccine efficacy and replacement serotypes. Conventional serological methods are costly, labor-intensive, and prone to misidentification, while current DNA-based methods have limited serotype coverage requiring multiple PCR primers. In this study, a computer algorithm was developed to interrogate the capsulation locus (cps) of vaccine serotypes to locate primer pairs in conserved regions that border variable regions and could differentiate between serotypes. In silico analysis of cps from 92 serotypes indicated that a primer pair spanning the regulatory gene cpsB could putatively amplify 84 serotypes and differentiate 46. This primer set was specific to Streptococcus pneumoniae, with no amplification observed for other species, including S. mitis, S. oralis, and S. pseudopneumoniae. One hundred thirty-eight pneumococcal strains covering 48 serotypes were tested. Of 23 vaccine serotypes included in the study, most (19/22, 86%) were identified correctly at least to the serogroup level, including all of the 13-valent conjugate vaccine and other replacement serotypes. Reproducibility was demonstrated by the correct sequetyping of different strains of a serotype. This novel sequence-based method employing a single PCR primer pair is cost-effective and simple. Furthermore, it has the potential to identify new serotypes that may evolve in the future. PMID:22553238
Expressed Sequence Reference Standards for Evaluating Stage-specific Gene Expression in Southern Green Lacewings, Chrysoperla rufilabris

USDA-ARS's Scientific Manuscript database

Five developmental stages of Chrysoperla rufilabris were tested using nine primer pairs. Three sequences were highly expressed at all life stages and six were differentially expressed. These primer pairs may be used as standards to quantitate functional gene expression associated with physiological ...
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gao, X.; Patel, D.J.

The authors report on two-dimensional proton NMR studies of echinomycin complexes with the self-complementary d(A1-C2-G3-T4) and d(T1-C2-G3-A4) duplexes in aqueous solution. The exchangeable and nonexchangeable antibiotic and nucleic acid protons in the 1 echinomycin per tetranucleotide duplex complexes have been assigned from analyses of scalar coupling and distance connectivities in two-dimensional data sets recorded in H2O and D2O solution. An analysis of the intermolecular NOE patterns for both complexes combined with large upfield imino proton and large downfield phosphorus complexation chemical shift changes demonstrates that the two quinoxaline chromophores of echinomycin bisintercalate into the minor groove surrounding the dC-dG step of each tetranucleotide duplex. Further, the quinoxaline rings selectively stack between A1 and C2 bases in the d(ACGT) complex and between T1 and C2 bases in the d(TCGA) complex. The intermolecular NOE patterns and the base and sugar proton chemical shifts for residues C2 and G3 are virtually identical for the d(ACGT) and d(TCGA) complexes. A large set of intermolecular contacts established from nuclear Overhauser effects (NOEs) between antibiotic and nucleic acid protons in the echinomycin-tetranucleotide complexes in solution are consistent with corresponding contacts reported for echinomycin-oligonucleotide complexes in the crystalline state. The authors demonstrate that the G x C base pairs adopt Watson-Crick pairing in both d(ACGT) and d(TCGA) complexes in solution. By contrast, the A1 x T4 base pairs adopt Hoogsteen pairing for the echinomycin-d(A1-C2-G3-T4) complex while the T1 x A4 base pairs adopt Watson-Crick pairing for the echinomycin-d(T1-C2-G3-A4) complex in aqueous solution. These results emphasize the role of sequence in discriminating between Watson-Crick and Hoogsteen pairs at base pairs flanking the echinomycin bisintercalation site in solution.
Sequencing of the amylopullulanase (apu) gene of Thermoanaerobacter ethanolicus 39E, and identification of the active site by site-directed mutagenesis.

PubMed

Mathupala, S P; Lowe, S E; Podkovyrov, S M; Zeikus, J G

1993-08-05

The complete nucleotide sequence of the gene encoding the dual active amylopullulanase of Thermoanaerobacter ethanolicus 39E (formerly Clostridium thermohydrosulfuricum) was determined. The structural gene (apu) contained a single open reading frame 4443 base pairs in length, corresponding to 1481 amino acids, with an estimated molecular weight of 162,780. Analysis of the deduced sequence of apu with sequences of alpha-amylases and alpha-1,6 debranching enzymes enabled the identification of four conserved regions putatively involved in substrate binding and in catalysis. The conserved regions were localized within a 2.9-kilobase pair gene fragment, which encoded a M(r) 100,000 protein that maintained the dual activities and thermostability of the native enzyme. The catalytic residues of amylopullulanase were tentatively identified by using hydrophobic cluster analysis for comparison of amino acid sequences of amylopullulanase and other amylolytic enzymes. Asp597, Glu626, and Asp703 were individually modified to their respective amide form, or the alternate acid form, and in all cases both alpha-amylase and pullulanase activities were lost, suggesting the possible involvement of 3 residues in a catalytic triad, and the presence of a putative single catalytic site within the enzyme. These findings substantiate amylopullulanase as a new type of amylosaccharidase.
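The figures quoted above can be cross-checked with simple arithmetic: 4443 base pairs divided into three-base codons gives 1481 codons, matching the 1481 amino acids reported, and the estimated molecular weight implies an average residue mass of roughly 110. The sketch below is illustrative only; the function names are hypothetical, and the abstract does not say whether a stop codon is counted within the 4443 base pairs.

```python
def codon_count(orf_length_bp: int) -> int:
    """Return the number of complete codons in an ORF of the given length."""
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_length_bp // 3


def mean_residue_mass(molecular_weight: float, residues: int) -> float:
    """Average mass per amino acid residue, in the same units as the input."""
    return molecular_weight / residues


residues = codon_count(4443)  # values taken from the abstract
print(residues)  # 1481
print(round(mean_residue_mass(162780, residues), 1))  # 109.9
```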
Next-Generation Sequencing of the Chrysanthemum nankingense (Asteraceae) Transcriptome Permits Large-Scale Unigene Assembly and SSR Marker Discovery

PubMed Central

Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

2013-01-01

Background: Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only a few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings: Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) unigenes showed similarity to the sequences in the NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the database of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers have amplicons and 20 are polymorphic for genotype analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance: SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799
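To illustrate what identifying putative SSRs involves, the following minimal sketch scans a sequence for perfect tandem repeats with a regular expression. It is not the MISA algorithm; the function name and the minimum-repeat thresholds are assumptions chosen for the example only.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    min_repeats maps motif length -> minimum number of tandem copies.
    The defaults are hypothetical thresholds, not those of any tool.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 5}
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in min_repeats.items():
        # A motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (AAA, AGAG, ...).
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


example = "ACGT" + "AG" * 8 + "CATG" + "CTT" * 5 + "GACA"
print(find_ssrs(example))  # [(4, 'AG', 8), (24, 'CTT', 5)]
```

Primer pairs would then be designed in the sequence flanking each reported repeat; that step is outside the scope of this sketch.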
A SSR-based composite genetic linkage map for the cultivated peanut (Arachis hypogaea L.) genome

PubMed Central

2010-01-01

Background: The construction of genetic linkage maps for cultivated peanut (Arachis hypogaea L.) has been and continues to be an important research goal to facilitate quantitative trait locus (QTL) analysis and gene tagging for use in marker-assisted selection in breeding. Even though a few maps have been developed, they were constructed using diploid or interspecific tetraploid populations. The most recently published intra-specific map was constructed from a cross of cultivated peanuts, in which only 135 simple sequence repeat (SSR) markers were sparsely populated in 22 linkage groups. A more detailed linkage map with sufficient markers is needed for QTL identification and marker-assisted selection. The objective of this study was to construct a genetic linkage map of cultivated peanut using simple sequence repeat (SSR) markers derived primarily from peanut genomic sequences, expressed sequence tags (ESTs), and by "data mining" sequences released in GenBank. Results: Three recombinant inbred line (RIL) populations were constructed from three crosses with one common female parental line, Yueyou 13, a high yielding Spanish market type. The four parents were screened with 1044 primer pairs designed to amplify SSRs, and 901 primer pairs produced clear PCR products. Of the 901 primer pairs, 146, 124 and 64 primer pairs (markers) were polymorphic in these populations, respectively, and were used in genotyping these RIL populations. Individual linkage maps were constructed from each of the three populations, and a composite map based on 93 common loci was created using JoinMap. The composite linkage map consists of 22 composite linkage groups (LG) with 175 SSR markers (including 47 SSRs on the published AA genome maps), representing the 20 chromosomes of A. hypogaea. The total composite map length is 885.4 cM, with an average marker density of 5.8 cM. Segregation distortion in the 3 populations was 23.0%, 13.5% and 7.8% of the markers, respectively. These distorted loci tended to cluster on LG1, LG3, LG4 and LG5. Only 15 EST-SSR markers were mapped, owing to low polymorphism. Comparison showed potential synteny, collinear order of some markers, and conservation of collinear linkage groups among the maps and with the AA genome, although conservation was not complete. Conclusion: A composite linkage map was constructed from three individual mapping populations with 175 SSR markers in 22 composite linkage groups. This composite genetic linkage map is among the first "true" tetraploid peanut maps produced. This map also includes 47 SSRs that have been used in the published AA genome maps, and could be used in comparative mapping studies. The primers described in this study are PCR-based markers, which are easy to share for genetic mapping in peanuts. All 1044 primer pairs are provided as additional files, and the three RIL populations will be made available to the public upon request for quantitative trait loci (QTL) analysis and linkage map improvement. PMID:20105299

Problem-Solving Test: Conditional Gene Targeting Using the Cre/loxP Recombination System

ERIC Educational Resources Information Center

Szeberényi, József

2013-01-01

Terms to be familiar with before you start to solve the test: gene targeting, knock-out mutation, bacteriophage, complementary base-pairing, homologous recombination, deletion, transgenic organisms, promoter, polyadenylation element, transgene, DNA replication, RNA polymerase, Shine-Dalgarno sequence, restriction endonuclease, polymerase chain…
Large scale DNA microsequencing device

DOEpatents

Foote, Robert S.

1997-01-01

A microminiature sequencing apparatus and method provide means for simultaneously obtaining sequences of plural polynucleotide strands. The apparatus comprises a microchip into which plural channels have been etched using standard lithographic procedures and chemical wet etching. The channels include a reaction well and a separating section. Enclosing the channels is accomplished by bonding a transparent cover plate over the apparatus. A first oligonucleotide strand is chemically affixed to the apparatus through an alkyl chain. Subsequent nucleotides are selected by complementary base pair bonding. A target nucleotide strand is used to produce a family of labelled sequencing strands in each channel which are separated in the separating section. During or following separation the sequences are determined using appropriate detection means.
Large scale DNA microsequencing device

DOEpatents

Foote, Robert S.

1999-01-01

A microminiature sequencing apparatus and method provide means for simultaneously obtaining sequences of plural polynucleotide strands. The apparatus comprises a microchip into which plural channels have been etched using standard lithographic procedures and chemical wet etching. The channels include a reaction well and a separating section. Enclosing the channels is accomplished by bonding a transparent cover plate over the apparatus. A first oligonucleotide strand is chemically affixed to the apparatus through an alkyl chain. Subsequent nucleotides are selected by complementary base pair bonding. A target nucleotide strand is used to produce a family of labelled sequencing strands in each channel which are separated in the separating section. During or following separation the sequences are determined using appropriate detection means.
Large scale DNA microsequencing device

DOEpatents

Foote, R.S.

1999-08-31

A microminiature sequencing apparatus and method provide means for simultaneously obtaining sequences of plural polynucleotide strands. The apparatus comprises a microchip into which plural channels have been etched using standard lithographic procedures and chemical wet etching. The channels include a reaction well and a separating section. Enclosing the channels is accomplished by bonding a transparent cover plate over the apparatus. A first oligonucleotide strand is chemically affixed to the apparatus through an alkyl chain. Subsequent nucleotides are selected by complementary base pair bonding. A target nucleotide strand is used to produce a family of labelled sequencing strands in each channel which are separated in the separating section. During or following separation the sequences are determined using appropriate detection means. 11 figs.
Nucleotide sequence of a cluster of early and late genes in a conserved segment of the vaccinia virus genome.

PubMed Central

Plucienniczak, A; Schroeder, E; Zettlmeissl, G; Streeck, R E

1985-01-01

The nucleotide sequence of a 7.6 kb vaccinia DNA segment from a genomic region conserved among different orthopox viruses has been determined. This segment contains a tight cluster of 12 partly overlapping open reading frames most of which can be correlated with previously identified early and late proteins and mRNAs. Regulatory signals used by vaccinia virus have been studied. Presumptive promoter regions are rich in A, T and carry the consensus sequences TATA and AATAA spaced at 20-24 base pairs. Tandem repeats of a CTATTC consensus sequence are proposed to be involved in the termination of early transcription. PMID:2987815
A reference human genome dataset of the BGISEQ-500 sequencer.

PubMed

Huang, Jie; Liang, Xinming; Xuan, Yuankai; Geng, Chunyu; Li, Yuxiang; Lu, Haorong; Qu, Shoufang; Mei, Xianglin; Chen, Hongbo; Yu, Ting; Sun, Nan; Rao, Junhua; Wang, Jiahao; Zhang, Wenwei; Chen, Ying; Liao, Sha; Jiang, Hui; Liu, Xin; Yang, Zhaopeng; Mu, Feng; Gao, Shangxian

2017-05-01

BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinational probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated the accuracy of the variations, and compared it to that of the variations identified from similar amounts of publicly available HiSeq2500 data. We found similar single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) compared to the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%), and better SNP detection accuracy than the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). For insertions and deletions (indels), however, we found lower accuracy for BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50 respectively, sensitivity = 88.52% and 70.93%) than the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform. © The Authors 2017. Published by Oxford University Press.
BiQ Analyzer HT: locus-specific analysis of DNA methylation by high-throughput bisulfite sequencing

PubMed Central

Lutsik, Pavlo; Feuerbach, Lars; Arand, Julia; Lengauer, Thomas; Walter, Jörn; Bock, Christoph

2011-01-01

Bisulfite sequencing is a widely used method for measuring DNA methylation in eukaryotic genomes. The assay provides single-base pair resolution and, given sufficient sequencing depth, its quantitative accuracy is excellent. High-throughput sequencing of bisulfite-converted DNA can be applied either genome wide or targeted to a defined set of genomic loci (e.g. using locus-specific PCR primers or DNA capture probes). Here, we describe BiQ Analyzer HT (http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de/), a user-friendly software tool that supports locus-specific analysis and visualization of high-throughput bisulfite sequencing data. The software facilitates the shift from time-consuming clonal bisulfite sequencing to the more quantitative and cost-efficient use of high-throughput sequencing for studying locus-specific DNA methylation patterns. In addition, it is useful for locus-specific visualization of genome-wide bisulfite sequencing data. PMID:21565797
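The principle behind the assay can be sketched in a few lines: bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines remain C, so the methylation level at a CpG site is the fraction of reads that still show C there. The code below is a simplified illustration and not BiQ Analyzer HT's implementation; it assumes reads that are already aligned to the reference, gap-free, on the same strand and of equal length, and the function name is hypothetical.

```python
def cpg_methylation(reference, reads):
    """Return {CpG position: fraction of informative reads that are methylated}.

    Assumes each read is aligned base-for-base to the reference (same strand,
    same length, no gaps). A retained C counts as methylated, a T as
    unmethylated; any other base at the site is ignored.
    """
    reference = reference.upper()
    cpg_positions = [
        i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"
    ]
    levels = {}
    for pos in cpg_positions:
        methylated = unmethylated = 0
        for read in reads:
            base = read[pos].upper()
            if base == "C":
                methylated += 1
            elif base == "T":
                unmethylated += 1
        total = methylated + unmethylated
        if total:
            levels[pos] = methylated / total
    return levels


reference = "ATCGTACGA"  # CpG sites at positions 2 and 6 (0-based)
reads = ["ATCGTATGA", "ATCGTATGA", "ATTGTACGA", "ATCGTATGA"]
print(cpg_methylation(reference, reads))  # {2: 0.75, 6: 0.25}
```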
Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs.

PubMed

Auch, Alexander F; Klenk, Hans-Peter; Göker, Markus

2010-01-28

DNA-DNA hybridization (DDH) is a widely applied wet-lab technique to obtain an estimate of the overall similarity between the genomes of two organisms. Basing the species concept for prokaryotes ultimately on DDH was chosen by microbiologists as a pragmatic approach for deciding on the recognition of novel species, and it also allowed a relatively high degree of standardization compared to other areas of taxonomy. However, DDH is tedious and error-prone and, first and foremost, cannot be used to incrementally establish a comparative database. Recent studies have shown that in-silico methods for the comparison of genome sequences can be used to replace DDH. Considering the ongoing rapid technological progress of sequencing methods, genome-based prokaryote taxonomy is coming into reach. However, calculating distances between genomes depends on multiple choices of software and program settings. We here provide an overview of the modifications that can be applied to distance methods based on high-scoring segment pairs (HSPs) or maximally unique matches (MUMs) and that need to be documented. General recommendations on determining HSPs using BLAST or other algorithms are also provided. As a reference implementation, we introduce the GGDC web server (http://ggdc.gbdp.org).
A comparative study of sequence- and structure-based features of small RNAs and other RNAs of bacteria.

PubMed

Barik, Amita; Das, Santasabuj

2018-01-02

Small RNAs (sRNAs) in bacteria have emerged as key players in transcriptional and post-transcriptional regulation of gene expression. Here, we present a statistical analysis of different sequence- and structure-related features of bacterial sRNAs to identify the descriptors that could discriminate sRNAs from other bacterial RNAs. We investigated a comprehensive and heterogeneous collection of 816 sRNAs, identified by northern blotting across 33 bacterial species and compared their various features with other classes of bacterial RNAs, such as tRNAs, rRNAs and mRNAs. We observed that sRNAs differed significantly from the rest with respect to G+C composition, normalized minimum free energy of folding, motif frequency and several RNA-folding parameters like base-pairing propensity, Shannon entropy and base-pair distance. Based on the selected features, we developed a predictive model using Random Forests (RF) method to classify the above four classes of RNAs. Our model displayed an overall predictive accuracy of 89.5%. These findings would help to differentiate bacterial sRNAs from other RNAs and further promote prediction of novel sRNAs in different bacterial species.
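Of the descriptors listed above, G+C composition is the simplest to compute and serves as an example of a sequence-based feature. The sketch below is only an illustration of that one feature; the function name is hypothetical, and the folding-related descriptors (minimum free energy, base-pairing propensity and so on) would require an RNA structure prediction package that is not shown here.

```python
def gc_fraction(seq):
    """Return the fraction of G and C among the unambiguous bases of seq.

    Works for DNA or RNA; ambiguous symbols such as N are excluded from
    both the numerator and the denominator.
    """
    counted = [base for base in seq.upper() if base in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "GC" for base in counted) / len(counted)


print(gc_fraction("GGCCAAUU"))  # 0.5
```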
Enlightenment of Yeast Mitochondrial Homoplasmy: Diversified Roles of Gene Conversion

PubMed Central

Ling, Feng; Mikawa, Tsutomu; Shibata, Takehiko

2011-01-01

Mitochondria have their own genomic DNA. Unlike the nuclear genome, each cell contains hundreds to thousands of copies of mitochondrial DNA (mtDNA). The copies of mtDNA tend to have heterogeneous sequences, due to the high frequency of mutagenesis, but are quickly homogenized within a cell (“homoplasmy”) during vegetative cell growth or through a few sexual generations. Heteroplasmy is strongly associated with mitochondrial diseases, diabetes and aging. Recent studies revealed that the yeast cell has the machinery to homogenize mtDNA, using a common DNA processing pathway with gene conversion; i.e., both genetic events are initiated by a double-stranded break, which is processed into 3′ single-stranded tails. One of the tails is base-paired with the complementary sequence of the recipient double-stranded DNA to form a D-loop (homologous pairing), in which repair DNA synthesis is initiated to restore the sequence lost by the breakage. Gene conversion generates sequence diversity, depending on the divergence between the donor and recipient sequences, especially when it occurs among a number of copies of a DNA sequence family with some sequence variations, such as in immunoglobulin diversification in chicken. MtDNA can be regarded as a sequence family, in which the members tend to be diversified by a high frequency of spontaneous mutagenesis. Thus, it would be interesting to determine why and how double-stranded breakage and D-loop formation induce sequence homogenization in mitochondria and sequence diversification in nuclear DNA. We will review the mechanisms and roles of mtDNA homoplasmy, in contrast to nuclear gene conversion, which diversifies gene and genome sequences, to provide clues toward understanding how the common DNA processing pathway results in such divergent outcomes. PMID:24710143
High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.

PubMed

Sealfon, Rachel; Gire, Stephen; Ellis, Crystal; Calderwood, Stephen; Qadri, Firdausi; Hensley, Lisa; Kellis, Manolis; Ryan, Edward T; LaRocque, Regina C; Harris, Jason B; Sabeti, Pardis C

2012-09-11

Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
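The relationship between read count and depth of coverage used above is the usual approximation: mean depth = (number of reads x read length) / genome size. The sketch below works through the 50x figure for 100 base pair reads; the genome size of 4,000,000 bp is an assumed round number for illustration, not a value from the abstract, and the function names are hypothetical.

```python
import math


def mean_depth(num_reads, read_length_bp, genome_size_bp):
    """Approximate mean depth of coverage from read count and read length."""
    return num_reads * read_length_bp / genome_size_bp


def reads_for_coverage(genome_size_bp, read_length_bp, coverage):
    """Smallest number of reads giving at least the requested mean depth."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


GENOME_SIZE = 4_000_000  # assumed round genome size, for illustration only
reads = reads_for_coverage(GENOME_SIZE, read_length_bp=100, coverage=50)
print(reads)       # 2000000 reads, i.e. 1000000 read pairs
print(mean_depth(reads, 100, GENOME_SIZE))  # 50.0
```

This ignores unmapped reads, duplicates and uneven coverage, all of which lower the effective depth in practice.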
A survey and evaluations of histogram-based statistics in alignment-free sequence comparison.

PubMed

Luczak, Brian B; James, Benjamin T; Girgis, Hani Z

2017-12-06

Since the dawn of the bioinformatics field, sequence alignment scores have been the main method for comparing sequences. However, alignment algorithms are quadratic, requiring long execution time. As alternatives, scientists have developed tens of alignment-free statistics for measuring the similarity between two sequences. We surveyed tens of alignment-free k-mer statistics. Additionally, we evaluated 33 statistics and multiplicative combinations between the statistics and/or their squares. These statistics are calculated on two k-mer histograms representing two sequences. Our evaluations using global alignment scores revealed that the majority of the statistics are sensitive and capable of finding similar sequences to a query sequence. Therefore, any of these statistics can filter out dissimilar sequences quickly. Further, we observed that multiplicative combinations of the statistics are highly correlated with the identity score. Furthermore, combinations involving sequence length difference or Earth Mover's distance, which takes the length difference into account, are always among the highest correlated paired statistics with identity scores. Similarly, paired statistics including length difference or Earth Mover's distance are among the best performers in finding the K-closest sequences. Interestingly, similar performance can be obtained using histograms of shorter words, resulting in reducing the memory requirement and increasing the speed remarkably. Moreover, we found that simple single statistics are sufficient for processing next-generation sequencing reads and for applications relying on local alignment. Finally, we measured the time requirement of each statistic. The survey and the evaluations will help scientists with identifying efficient alternatives to the costly alignment algorithm, saving thousands of computational hours. The source code of the benchmarking tool is available as Supplementary Materials. © The Author 2017. 
Published by Oxford University Press.
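A k-mer histogram and a statistic computed on a pair of histograms can be sketched as follows. The Manhattan distance shown here is just one simple alignment-free statistic chosen for illustration; it is not claimed to be one of the best performers in the survey, and the function names are hypothetical.

```python
from collections import Counter


def kmer_histogram(seq, k):
    """Count every overlapping word of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def manhattan_distance(hist_a, hist_b):
    """Sum of absolute count differences over all k-mers seen in either histogram."""
    return sum(abs(hist_a[w] - hist_b[w]) for w in set(hist_a) | set(hist_b))


a = kmer_histogram("ACGTAC", 2)  # AC:2, CG:1, GT:1, TA:1
b = kmer_histogram("ACGTTT", 2)  # AC:1, CG:1, GT:1, TT:2
print(manhattan_distance(a, b))  # 4
```

Building each histogram takes time linear in the sequence length, which is the source of the speed advantage over quadratic alignment; smaller k means fewer possible words and therefore less memory.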
Dual Priming Oligonucleotides for Broad-Range Amplification of the Bacterial 16S rRNA Gene Directly from Human Clinical Specimens

PubMed Central

Simmon, Keith; Karaca, Dilek; Langeland, Nina; Wiker, Harald G.

2012-01-01

Broad-range amplification and sequencing of the bacterial 16S rRNA gene directly from clinical specimens are offered as a diagnostic service in many laboratories. One major pitfall is primer cross-reactivity with human DNA which will result in mixed chromatograms. Mixed chromatograms will complicate subsequent sequence analysis and impede identification. In SYBR green real-time PCR assays, it can also affect crossing threshold values and consequently the status of a specimen as positive or negative. We evaluated two conventional primer pairs in common use and a new primer pair based on the dual priming oligonucleotide (DPO) principle. Cross-reactivity was observed when both conventional primer pairs were used, resulting in interpretation difficulties. No cross-reactivity was observed using the DPOs even in specimens with a high ratio of human to bacterial DNA. In addition to reducing cross-reactivity, the DPO principle also offers a high degree of flexibility in the design of primers and should be considered for any PCR assay intended for detection and identification of pathogens directly from human clinical specimens. PMID:22278843
Unit-length line-1 transcripts in human teratocarcinoma cells.

PubMed Central

Skowronski, J; Fanning, T G; Singer, M F

1988-01-01

We have characterized the approximately 6.5-kilobase cytoplasmic poly(A)+ Line-1 (L1) RNA present in a human teratocarcinoma cell line, NTera2D1, by primer extension and by analysis of cloned cDNAs. The bulk of the RNA begins (5' end) at the residue previously identified as the 5' terminus of the longest known primate genomic L1 elements, presumed to represent "unit" length. Several of the cDNA clones are close to 6 kilobase pairs, that is, close to full length. The partial sequences of 18 cDNA clones and full sequence of one (5,975 base pairs) indicate that many different genomic L1 elements contribute transcripts to the 6.5-kilobase cytoplasmic poly(A)+ RNA in NTera2D1 cells because no 2 of the 19 cDNAs analyzed had identical sequences. The transcribed elements appear to represent a subset of the total genomic L1s, a subset that has a characteristic consensus sequence in the 3' noncoding region and a high degree of sequence conservation throughout. Two open reading frames (ORFs) of 1,122 (ORF1) and 3,852 (ORF2) bases, flanked by about 800 and 200 bases of sequence at the 5' and 3' ends, respectively, can be identified in the cDNAs. Both ORFs are in the same frame, and they are separated by 33 bases bracketed by two conserved in-frame stop codons. ORF2 is interrupted by at least one randomly positioned stop codon in the majority of the cDNAs. The data support proposals suggesting that the human L1 family includes one or more functional genes as well as an extraordinarily large number of pseudogenes whose ORFs are broken by stop codons. The cDNA structures suggest that both genes and pseudogenes are transcribed. At least one of the cDNAs (cD11), which was sequenced in its entirety, could, in principle, represent an mRNA for production of the ORF1 polypeptide. The similarity of mammalian L1s to several recently described invertebrate movable elements defines a new widely distributed class of elements which we term class II retrotransposons. PMID:2454389
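The statement that both ORFs are in the same frame follows from the lengths given: ORF2 starts 1,122 + 33 = 1,155 bases after the start of ORF1, and 1,155 is a multiple of 3. The sketch below checks this and sums the segments; the flank lengths are the approximate values from the abstract, the names are hypothetical, and the abstract does not specify whether the stop codons are counted within the ORF lengths or within the 33-base spacer.

```python
ORF1_LEN = 1122    # bases, from the abstract
SPACER_LEN = 33    # bases separating the two ORFs
ORF2_LEN = 3852    # bases, from the abstract
FLANK_5_LEN = 800  # approximate 5' flanking sequence
FLANK_3_LEN = 200  # approximate 3' flanking sequence


def same_frame(first_orf_len, spacer_len):
    """True if an ORF starting after the spacer shares the first ORF's frame."""
    return (first_orf_len + spacer_len) % 3 == 0


total = FLANK_5_LEN + ORF1_LEN + SPACER_LEN + ORF2_LEN + FLANK_3_LEN

print(same_frame(ORF1_LEN, SPACER_LEN))  # True
print(ORF1_LEN // 3, ORF2_LEN // 3)      # 374 1284 (codons)
print(total)                             # 6007
```

The approximate total of about 6 kilobases is in line with the fully sequenced 5,975 base pair cDNA mentioned in the abstract.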
Targeting the Mevalonate Pathway to Reduce Mortality from Ovarian Cancer

DTIC Science & Technology

2017-12-01

at cis-regulatory elements such as enhancers to facilitate gene transcription. CRISPR/Cas9-mediated ablation of a putative Meis1 enhancer carrying...Tables S4 and S5...the CRISPR/Cas9-based genomic editing technology. Cas9 and a pair of single guide RNAs (sgRNA... CRISPR/Cas9-mediated deletio sgMeis1, a pair of sgRNAs that target the DMR boundaries. (N) Sequencing of the genomic PCR products from F2/R2 primers shows
ProtPhylo: identification of protein-phenotype and protein-protein functional associations via phylogenetic profiling.

PubMed

Cheng, Yiming; Perocchi, Fabiana

2015-07-01

ProtPhylo is a web-based tool to identify proteins that are functionally linked to either a phenotype or a protein of interest based on co-evolution. ProtPhylo infers functional associations by comparing protein phylogenetic profiles (co-occurrence patterns of orthology relationships) for more than 9.7 million non-redundant protein sequences from all three domains of life. Users can query any of 2048 fully sequenced organisms, including 1678 bacteria, 255 eukaryotes and 115 archaea. In addition, they can tailor ProtPhylo to a particular kind of biological question by choosing among four main orthology inference methods based either on pair-wise sequence comparisons (One-way Best Hits and Best Reciprocal Hits) or clustering of orthologous proteins across multiple species (OrthoMCL and eggNOG). Next, ProtPhylo ranks phylogenetic neighbors of query proteins or phenotypic properties using the Hamming distance as a measure of similarity between pairs of phylogenetic profiles. Candidate hits can be easily and flexibly prioritized by complementary clues on subcellular localization, known protein-protein interactions, membrane spanning regions and protein domains. The resulting protein list can be quickly exported into a csv text file for further analyses. ProtPhylo is freely available at http://www.protphylo.org. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
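Ranking by Hamming distance between phylogenetic profiles can be sketched as follows. Each profile is a presence/absence vector over the same ordered set of organisms, and the distance is the number of organisms at which two profiles disagree. The protein names and profiles below are invented for illustration, and the functions are hypothetical, not ProtPhylo's code.

```python
def hamming(profile_a, profile_b):
    """Number of positions at which two equal-length profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same organisms")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def rank_neighbors(query, profiles):
    """Return profile names ordered from most to least similar to the query."""
    return sorted(profiles, key=lambda name: (hamming(query, profiles[name]), name))


# 1 = ortholog present in that organism, 0 = absent (made-up data).
query = (1, 0, 1, 1, 0)
profiles = {
    "protA": (1, 0, 1, 1, 0),
    "protB": (1, 1, 1, 0, 0),
    "protC": (0, 1, 0, 0, 1),
}
print(rank_neighbors(query, profiles))  # ['protA', 'protB', 'protC']
```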
Promoters, toll like receptors and microRNAs: a strange association.

PubMed

Korla, Kalyani; Arrigo, Patrizio; Mitra, Chanchal K

2013-06-01

Toll-like receptors (TLRs) are proteins that play a key role in the innate immune system. In the present study, the 1000 base pairs upstream of the transcription start site were taken for each of the 10 known human TLR genes. About 40 microRNAs were identified that share 12-19 nucleotides of sequence similarity with the promoter regions of the 10 TLRs. It is proposed that these microRNAs play a potential role in the identification of promoter sequences and the initiation of transcription.
Structure, replication efficiency and fragility of yeast ARS elements.

PubMed

Dhar, Manoj K; Sehgal, Shelly; Kaul, Sanjana

2012-05-01

DNA replication in eukaryotes initiates at specific sites known as origins of replication, or replicators. These replication origins occur throughout the genome, though the propensity of their occurrence depends on the type of organism. In eukaryotes, zones of initiation of replication spanning from about 100 to 50,000 base pairs have been reported. The characteristics of eukaryotic replication origins are best understood in the budding yeast Saccharomyces cerevisiae, where some autonomously replicating sequences, or ARS elements, confer origin activity. ARS elements are short DNA sequences of a few hundred base pairs, identified by their efficiency at initiating a replication event when cloned in a plasmid. ARS elements, although structurally diverse, maintain a basic structure composed of three domains, A, B and C. Domain A is comprised of a consensus sequence designated ACS (ARS consensus sequence), while the B domain has the DNA unwinding element and the C domain is important for DNA-protein interactions. Although there are ∼400 ARS elements in the yeast genome, not all of them are active origins of replication. Different groups within the genus Saccharomyces have ARS elements as components of replication origin. The present paper provides a comprehensive review of various aspects of ARSs, starting from their structural conservation to sequence thermodynamics. All significant and conserved functional sequence motifs within different types of ARS elements have been extensively described. Issues like silencing at ARSs, their inherent fragility and factors governing their replication efficiency have also been addressed. Progress in understanding crucial components associated with the replication machinery and timing at these ARS elements is discussed in the section entitled "The replicon revisited". Copyright © 2012 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
QueTAL: a suite of tools to classify and compare TAL effectors functionally and phylogenetically

PubMed Central

Pérez-Quintero, Alvaro L.; Lamy, Léo; Gordon, Jonathan L.; Escalon, Aline; Cunnac, Sébastien; Szurek, Boris; Gagnevin, Lionel

2015-01-01

Transcription Activator-Like (TAL) effectors from Xanthomonas plant pathogenic bacteria can bind to the promoter region of plant genes and induce their expression. DNA-binding specificity is governed by a central domain made of nearly identical repeats, each determining the recognition of one base pair via two amino acid residues (a.k.a. Repeat Variable Di-residue, or RVD). Knowing how TAL effectors differ from each other within and between strains would be useful to infer functional and evolutionary relationships, but their repetitive nature precludes reliable use of traditional alignment methods. The suite QueTAL was therefore developed to offer tailored tools for comparison of TAL effector genes. The program DisTAL considers each repeat as a unit, transforms a TAL effector sequence into a sequence of coded repeats and makes pair-wise alignments between these coded sequences to construct trees. The program FuncTAL is aimed at finding TAL effectors with similar DNA-binding capabilities. It calculates correlations between position weight matrices of potential target DNA sequence predicted from the RVD sequence, and builds trees based on these correlations. The programs accurately represented phylogenetic and functional relationships between TAL effectors using either simulated or literature-curated data. When using the programs on a large set of TAL effector sequences, the DisTAL tree largely reflected the expected species phylogeny. In contrast, FuncTAL showed that TAL effectors with similar binding capabilities can be found between phylogenetically distant taxa. This suite will help users to rapidly analyse any TAL effector genes of interest and compare them to other available TAL genes and should improve our understanding of TAL effectors evolution. It is available at http://bioinfo-web.mpl.ird.fr/cgi-bin2/quetal/quetal.cgi. PMID:26284082
SAAS-CNV: A Joint Segmentation Approach on Aggregated and Allele Specific Signals for the Identification of Somatic Copy Number Alterations with Next-Generation Sequencing Data.

PubMed

Zhang, Zhongyang; Hao, Ke

2015-11-01

Cancer genomes exhibit profound somatic copy number alterations (SCNAs). Studying tumor SCNAs using massively parallel sequencing provides unprecedented resolution and meanwhile gives rise to new challenges in data analysis, complicated by tumor aneuploidy and heterogeneity as well as normal cell contamination. While the majority of read depth based methods utilize total sequencing depth alone for SCNA inference, the allele specific signals are undervalued. We proposed a joint segmentation and inference approach using both signals to meet some of the challenges. Our method consists of four major steps: 1) extracting read depth supporting reference and alternative alleles at each SNP/Indel locus and comparing the total read depth and alternative allele proportion between tumor and matched normal sample; 2) performing joint segmentation on the two signal dimensions; 3) correcting the copy number baseline from which the SCNA state is determined; 4) calling SCNA state for each segment based on both signal dimensions. The method is applicable to whole exome/genome sequencing (WES/WGS) as well as SNP array data in a tumor-control study. We applied the method to a dataset containing no SCNAs to test the specificity, created by pairing sequencing replicates of a single HapMap sample as normal/tumor pairs, as well as a large-scale WGS dataset consisting of 88 liver tumors along with adjacent normal tissues. Compared with representative methods, our method demonstrated improved accuracy, scalability to large cancer studies, capability in handling both sequencing and SNP array data, and the potential to improve the estimation of tumor ploidy and purity.

SAAS-CNV: A Joint Segmentation Approach on Aggregated and Allele Specific Signals for the Identification of Somatic Copy Number Alterations with Next-Generation Sequencing Data

PubMed Central

Zhang, Zhongyang; Hao, Ke

2015-01-01

Cancer genomes exhibit profound somatic copy number alterations (SCNAs). Studying tumor SCNAs using massively parallel sequencing provides unprecedented resolution and meanwhile gives rise to new challenges in data analysis, complicated by tumor aneuploidy and heterogeneity as well as normal cell contamination. While the majority of read depth based methods utilize total sequencing depth alone for SCNA inference, the allele specific signals are undervalued. We proposed a joint segmentation and inference approach using both signals to meet some of the challenges. Our method consists of four major steps: 1) extracting read depth supporting reference and alternative alleles at each SNP/Indel locus and comparing the total read depth and alternative allele proportion between tumor and matched normal sample; 2) performing joint segmentation on the two signal dimensions; 3) correcting the copy number baseline from which the SCNA state is determined; 4) calling SCNA state for each segment based on both signal dimensions. The method is applicable to whole exome/genome sequencing (WES/WGS) as well as SNP array data in a tumor-control study. We applied the method to a dataset containing no SCNAs to test the specificity, created by pairing sequencing replicates of a single HapMap sample as normal/tumor pairs, as well as a large-scale WGS dataset consisting of 88 liver tumors along with adjacent normal tissues. Compared with representative methods, our method demonstrated improved accuracy, scalability to large cancer studies, capability in handling both sequencing and SNP array data, and the potential to improve the estimation of tumor ploidy and purity. PMID:26583378
Geomfinder: a multi-feature identifier of similar three-dimensional protein patterns: a ligand-independent approach.

PubMed

Núñez-Vivanco, Gabriel; Valdés-Jiménez, Alejandro; Besoaín, Felipe; Reyes-Parada, Miguel

2016-01-01

Since the structure of proteins is more conserved than the sequence, the identification of conserved three-dimensional (3D) patterns among a set of proteins can be important for protein function prediction, protein clustering, drug discovery and the establishment of evolutionary relationships. Thus, several computational applications to identify, describe and compare 3D patterns (or motifs) have been developed. Often, these tools consider a 3D pattern as that described by the residues surrounding co-crystallized/docked ligands available from X-ray crystal structures or homology models. Nevertheless, many of the protein structures stored in public databases do not provide information about the location and characteristics of ligand binding sites and/or other important 3D patterns such as allosteric sites, enzyme-cofactor interaction motifs, etc. This necessitates the development of new ligand-independent methods to search and compare 3D patterns in all available protein structures. Here we introduce Geomfinder, an intuitive, flexible, alignment-free and ligand-independent web server for detailed estimation of similarities between all pairs of 3D patterns detected in any two given protein structures. We used around 1100 protein structures to form pairs of proteins which were assessed with Geomfinder. In these analyses each protein was considered in only one pair (e.g. in a subset of 100 different proteins, 50 pairs of proteins can be defined). Thus: (a) Geomfinder detected identical pairs of 3D patterns in a series of monoamine oxidase-B structures, which corresponded to the effectively similar ligand binding sites at these proteins; (b) we identified structural similarities among pairs of protein structures which are targets of compounds such as acarbose, benzamidine, adenosine triphosphate and pyridoxal phosphate; these similar 3D patterns are not detected using sequence-based methods; (c) the detailed evaluation of three specific cases showed the versatility of Geomfinder, which was able to discriminate between similar and different 3D patterns related to binding sites of common substrates in a range of diverse proteins. Geomfinder allows detecting similar 3D patterns between any pair of protein structures, regardless of the divergence among their amino acid sequences. Although the software is not intended for simultaneous multiple comparisons in a large number of proteins, it can be particularly useful in cases such as the structure-based design of multitarget drugs, where a detailed analysis of 3D pattern similarities between a few selected protein targets is essential.
Clonal origins and parallel evolution of regionally synchronous colorectal adenoma and carcinoma.

PubMed

Kim, Tae-Min; An, Chang Hyeok; Rhee, Je-Keun; Jung, Seung-Hyun; Lee, Sung Hak; Baek, In-Pyo; Kim, Min Sung; Lee, Sug Hyung; Chung, Yeun-Jun

2015-09-29

Although the colorectal adenoma-to-carcinoma sequence represents a classical cancer progression model, the evolution of the mutational landscape underlying this model is not fully understood. In this study, we analyzed eight synchronous pairs of colorectal high-grade adenomas and carcinomas, four microsatellite-unstable (MSU) and four microsatellite-stable (MSS) pairs, using whole-exome sequencing. In the MSU adenoma-carcinoma pairs, we observed no subclonal mutations in adenomas that became fixed in paired carcinomas, suggesting a 'parallel' evolution of synchronous adenoma-to-carcinoma, rather than a 'stepwise' evolution. The abundance of indel (in MSU and MSS pairs) and microsatellite instability (in MSU pairs) was noted in the later adenoma- or carcinoma-specific mutations, indicating that the mutational processes and functional constraints operative in early and late colorectal carcinogenesis are different. All MSU cases exhibited clonal, truncating mutations in ACVR2A, TGFBR2, and DNA mismatch repair genes, but none were present in APC or KRAS. In three MSS pairs, both APC and KRAS mutations were identified as both early and clonal events, often accompanying clonal copy number changes. An MSS case uniquely exhibited clonal ERBB2 amplification, followed by APC and TP53 mutations as carcinoma-specific events. Along with the previously unrecognized clonal origins of synchronous colorectal adenoma-carcinoma pairs, our study revealed that the preferred sequence of mutational events during colorectal carcinogenesis can be context-dependent.
A Simple and Efficient Method for Assembling TALE Protein Based on Plasmid Library

PubMed Central

Xu, Huarong; Xin, Ying; Zhang, Tingting; Ma, Lixia; Wang, Xin; Chen, Zhilong; Zhang, Zhiying

2013-01-01

The DNA-binding domain of the transcription activator-like effectors (TALEs) from Xanthomonas sp. consists of tandem repeats that can be rearranged according to a simple cipher to target new DNA sequences with high DNA-binding specificity. This technology has been successfully applied in a variety of species for genome engineering. However, assembling long TALE tandem repeats remains a big challenge precluding wide use of this technology. Although several new methodologies for efficiently assembling TALE repeats have been recently reported, all of them require either sophisticated facilities or skilled technicians to carry them out. Here, we describe a simple and efficient method for generating customized TALE nucleases (TALENs) and TALE transcription factors (TALE-TFs) based on a TALE repeat tetramer library. A tetramer library consisting of 256 tetramers covers all possible combinations of 4 base pairs. A set of unique primers was designed for amplification of these tetramers. PCR products were assembled by one step of digestion/ligation reaction. 12 TALE constructs including 4 TALEN pairs targeted to mouse Gt(ROSA)26Sor gene and mouse Mstn gene sequences as well as 4 TALE-TF constructs targeted to mouse Oct4, c-Myc, Klf4 and Sox2 gene promoter sequences were generated by using our method. The construction routines took 3 days and parallel constructions were available. The rate of positive clones during colony PCR verification was 64% on average. Sequencing results suggested that all TALE constructs were assembled with a high success rate. This is a rapid and cost-efficient method using the most common enzymes and facilities with a high success rate. PMID:23840477
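The library size quoted in the abstract above follows from simple counting: four possible bases at each of four positions gives 4^4 = 256 targets. A minimal Python sketch of that enumeration is given here; the names `BASES` and `all_tetramers` are illustrative assumptions and do not come from the paper.

```python
from itertools import product

# Hypothetical constant: the four DNA bases a TALE repeat can be chosen to recognise.
BASES = "ACGT"


def all_tetramers(bases: str = BASES) -> list[str]:
    """Return every ordered combination of four bases (4**4 = 256 for DNA)."""
    return ["".join(combo) for combo in product(bases, repeat=4)]


tetramers = all_tetramers()
print(len(tetramers))  # number of distinct 4-bp targets the library must cover
```

Each string in the list stands for one 4-bp target; the sketch only counts targets and says nothing about how the repeat modules themselves are cloned.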
A simple and efficient method for assembling TALE protein based on plasmid library.

PubMed

Zhang, Zhiqiang; Li, Duo; Xu, Huarong; Xin, Ying; Zhang, Tingting; Ma, Lixia; Wang, Xin; Chen, Zhilong; Zhang, Zhiying

2013-01-01

The DNA-binding domain of the transcription activator-like effectors (TALEs) from Xanthomonas sp. consists of tandem repeats that can be rearranged according to a simple cipher to target new DNA sequences with high DNA-binding specificity. This technology has been successfully applied in a variety of species for genome engineering. However, assembling long TALE tandem repeats remains a big challenge precluding wide use of this technology. Although several new methodologies for efficiently assembling TALE repeats have been recently reported, all of them require either sophisticated facilities or skilled technicians to carry them out. Here, we describe a simple and efficient method for generating customized TALE nucleases (TALENs) and TALE transcription factors (TALE-TFs) based on a TALE repeat tetramer library. A tetramer library consisting of 256 tetramers covers all possible combinations of 4 base pairs. A set of unique primers was designed for amplification of these tetramers. PCR products were assembled by one step of digestion/ligation reaction. 12 TALE constructs including 4 TALEN pairs targeted to mouse Gt(ROSA)26Sor gene and mouse Mstn gene sequences as well as 4 TALE-TF constructs targeted to mouse Oct4, c-Myc, Klf4 and Sox2 gene promoter sequences were generated by using our method. The construction routines took 3 days and parallel constructions were available. The rate of positive clones during colony PCR verification was 64% on average. Sequencing results suggested that all TALE constructs were assembled with a high success rate. This is a rapid and cost-efficient method using the most common enzymes and facilities with a high success rate.
Roles of the amino group of purine bases in the thermodynamic stability of DNA base pairing.

PubMed

Nakano, Shu-ichi; Sugimoto, Naoki

2014-08-05

The energetic aspects of hydrogen-bonded base-pair interactions are important for the design of functional nucleotide analogs and for practical applications of oligonucleotides. The present study investigated the contribution of the 2-amino group of DNA purine bases to the thermodynamic stability of oligonucleotide duplexes under different salt and solvent conditions, using 2'-deoxyriboinosine (I) and 2'-deoxyribo-2,6-diaminopurine (D) as non-canonical nucleotides. The stability of DNA duplexes was changed by substitution of a single base pair in the following order: G • C > D • T ≈ I • C > A • T > G • T > I • T. The apparent stabilization energy due to the presence of the 2-amino group of G and D varied depending on the salt concentration, and decreased in the water-ethanol mixed solvent. The effects of salt concentration on the thermodynamics of DNA duplexes were found to be partially sequence-dependent, and the 2-amino group of the purine bases might have an influence on the binding of ions to DNA through the formation of a stable base-paired structure. Our results also showed that physiological salt conditions were energetically favorable for complementary base recognition, and conversely, low salt concentration media and ethanol-containing solvents were effective for low stringency oligonucleotide hybridization, in the context of conditions employed in this study.
Information Entropy of Influenza A Segment 7

NASA Astrophysics Data System (ADS)

Thompson, William A.; Fan, Shaohua; Weltman, Joel K.

2008-12-01

Information entropy (H) is a measure of uncertainty at each position within a sequence of nucleotides. H was used to characterize a set of influenza A segment 7 nucleotide sequences. Nucleotide locations of high entropy were identified near the 5’ start of all of the sequences and the sequences were assigned to subsets according to synonymous nucleotide variants at those positions: either uracil at position six (U6), cytosine at position six (C6), adenine at position 12 (A12), guanine at position 12 (G12), adenine at position 15 (A15) or cytosine at position 15 (C15). H values were found to be correlated/corresponding (Kendall tau) along the lengths of the nucleotide segments of the subset pairs at each position. However, the H values of each subset of sequences were statistically distinguishable from those of the other member of the pair (Kolmogorov-Smirnov test). The joint probability of uncorrelated distributions of U6 and C6 sequences to viral subtypes and to viral host species was 34 times greater than for the A12:G12 subset pair and 214 times greater than for the A15:C15 pair. This result indicates that the high entropy position six of segment 7 is either a reporter or a sentinel location. The fact that not one of the H5N1 sequences in the dataset was a member of the C6 subset, but all 125 H5N1 sequences are members of the U6 subset suggests a non-random sentinel function.
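The per-position entropy used in the abstract above is the standard Shannon entropy, H = -Σ p_i log2 p_i, taken over the nucleotide frequencies p_i observed at one alignment column. The sketch below assumes the sequences are already aligned to equal length; the function name `positional_entropy` and the toy input are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def positional_entropy(sequences: list[str]) -> list[float]:
    """Shannon entropy H (in bits) at each column of equal-length aligned sequences."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    entropies = []
    for column in zip(*sequences):  # iterate over alignment columns
        total = len(column)
        counts = Counter(column)
        # Sum -p * log2(p) over the symbols actually observed in this column.
        h = sum(-(n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies


# Toy alignment (invented for illustration): columns 2 and 4 vary, columns 1 and 3 do not.
toy = ["AUGC", "AUGU", "ACGC", "ACGU"]
print(positional_entropy(toy))
```

A column with a single nucleotide has H = 0, while a column split evenly between two nucleotides has H = 1 bit, so high-entropy positions such as those reported near the 5’ start stand out directly in the returned list.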
Whole-genome sequencing of Salmonella enterica subsp. enterica serovar Cubana strains isolated from agricultural sources

USDA-ARS's Scientific Manuscript database

We report draft genomes of Salmonella enterica subsp. enterica Serovar Cubana strain CVM42234 isolated from chick feed in 2012 and Salmonella Cubana strain 76814 isolated from swine in 2004. The genome sizes are 4,975,046 and 4,936,251 base pairs, respectively....
Pangenome and taxonomic analysis of Salmonella enterica subspecies enterica

USDA-ARS's Scientific Manuscript database

Salmonella enterica subspecies enterica (S. enterica ssp. I) contains almost all the major pathogens in this genus. We sequenced 354 new S. enterica ssp. I genomes using paired end 100 base reads to ~80-fold coverage. These genomes were chosen to maximize genetic diversity, representing at least 100...
Characterization of 14 microsatellite markers for genetic analysis and cultivar identification of walnut

USDA-ARS's Scientific Manuscript database

One hundred and forty-seven primer pairs originally designed to amplify microsatellites, also known as simple sequence repeats (SSR), in black walnut (Juglans nigra L.) were screened for utility in Persian walnut (J. regia L.). Based on scorability and number of informative polymorphisms, the best 1...
Draft genome sequence of ramie, Boehmeria nivea (L.) Gaudich.

PubMed

Luan, Ming-Bao; Jian, Jian-Bo; Chen, Ping; Chen, Jun-Hui; Chen, Jian-Hua; Gao, Qiang; Gao, Gang; Zhou, Ju-Hong; Chen, Kun-Mei; Guang, Xuan-Min; Chen, Ji-Kang; Zhang, Qian-Qian; Wang, Xiao-Fei; Fang, Long; Sun, Zhi-Min; Bai, Ming-Zhou; Fang, Xiao-Dong; Zhao, Shan-Cen; Xiong, He-Ping; Yu, Chun-Ming; Zhu, Ai-Guo

2018-05-01

Ramie, Boehmeria nivea (L.) Gaudich, family Urticaceae, is a plant native to eastern Asia, and one of the world's oldest fibre crops. It is also used as animal feed and for the phytoremediation of heavy metal-contaminated farmlands. Thus, the genome sequence of ramie was determined to explore the molecular basis of its fibre quality, protein content and phytoremediation. To further understand the ramie genome, different paired-end and mate-pair libraries were combined to generate 134.31 Gb of raw DNA sequences using the Illumina whole-genome shotgun sequencing approach. The highly heterozygous B. nivea genome was assembled using the Platanus Genome Assembler, which is an effective tool for the assembly of highly heterozygous genome sequences. The final length of the draft genome of this species was approximately 341.9 Mb (contig N50 = 22.62 kb, scaffold N50 = 1,126.36 kb). Based on ramie genome annotations, 30,237 protein-coding genes were predicted, and the repetitive element content was 46.3%. The completeness of the final assembly was evaluated by benchmarking universal single-copy orthologous genes (BUSCO); 90.5% of the 1,440 expected embryophytic genes were identified as complete, and 4.9% were identified as fragmented. Phylogenetic analysis based on single-copy gene families and one-to-one orthologous genes placed ramie with mulberry and cannabis, within the clade of urticalean rosids. Genome information of ramie will be a valuable resource for the conservation of endangered Boehmeria species and for future studies on the biogeography and characteristic evolution of members of Urticaceae. © 2018 John Wiley & Sons Ltd.
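The contig and scaffold N50 values quoted in the abstract above follow the usual definition: the length L such that sequences of length at least L together contain at least half of the assembled bases. A minimal Python sketch of that calculation is given here; the function name `n50` and the toy lengths are illustrative assumptions and are unrelated to the ramie assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the largest length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down, accumulating bases until half is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    return 0  # not reached for non-empty input; kept so every path returns a value


# Toy contig lengths (invented for illustration).
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 depends only on the length distribution, it measures contiguity and not correctness, which is why the abstract reports a BUSCO completeness assessment alongside it.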
Swellix: a computational tool to explore RNA conformational space.

PubMed

Sloat, Nathan; Liu, Jui-Wen; Schroeder, Susan J

2017-11-21

The sequence of nucleotides in an RNA determines the possible base pairs for an RNA fold and thus also determines the overall shape and function of an RNA. The Swellix program presented here combines a helix abstraction with a combinatorial approach to the RNA folding problem in order to compute all possible non-pseudoknotted RNA structures for RNA sequences. The Swellix program builds on the Crumple program and can include experimental constraints on global RNA structures such as the minimum number and lengths of helices from crystallography, cryoelectron microscopy, or in vivo crosslinking and chemical probing methods. The conceptual advance in Swellix is to count helices and generate all possible combinations of helices rather than counting and combining base pairs. Swellix bundles similar helices and includes improvements in memory use and efficient parallelization. Biological applications of Swellix are demonstrated by computing the reduction in conformational space and entropy due to naturally modified nucleotides in tRNA sequences and by motif searches in Human Endogenous Retroviral (HERV) RNA sequences. The Swellix motif search reveals occurrences of protein and drug binding motifs in the HERV RNA ensemble that do not occur in minimum free energy or centroid predicted structures. Swellix presents significant improvements over Crumple in terms of efficiency and memory use. The efficient parallelization of Swellix enables the computation of sequences as long as 418 nucleotides with sufficient experimental constraints. Thus, Swellix provides a practical alternative to free energy minimization tools when multiple structures, kinetically determined structures, or complex RNA-RNA and RNA-protein interactions are present in an RNA folding problem.
rRNA Gene Internal Transcribed Spacer 1 and 2 Sequences of Asexual, Anthropophilic Dermatophytes Related to Trichophyton rubrum

PubMed Central

Summerbell, R. C.; Haugland, R. A.; Li, A.; Gupta, A. K.

1999-01-01

The ribosomal region spanning the two internal transcribed spacer (ITS) regions and the 5.8S ribosomal DNA region was sequenced for asexual, anthropophilic dermatophyte species with morphological similarity to Trichophyton rubrum, as well as for members of the three previously delineated, related major clades in the T. mentagrophytes complex. Representative isolates of T. raubitschekii, T. fischeri, and T. kanei were found to have ITS sequences identical to that of T. rubrum. The ITS sequences of T. soudanense and T. megninii differed from that of T. rubrum by only a small number of base pairs. Their continued status as species, however, appears to meet criteria outlined in the population genetics-based cohesion species concept of A. R. Templeton. The ITS sequence of T. tonsurans differed from that of the biologically distinct T. equinum by only 1 bp, while the ITS sequence of the recently described species T. krajdenii had a sequence identical to that of T. mentagrophytes isolates related to the teleomorph Arthroderma vanbreuseghemii. PMID:10565922
Riboswitch-based sensor in low optical background

NASA Astrophysics Data System (ADS)

Harbaugh, Svetlana V.; Davidson, Molly E.; Chushak, Yaroslav G.; Kelley-Loughnane, Nancy; Stone, Morley O.

2008-08-01

Riboswitches are a type of natural genetic control element that use untranslated sequence in the RNA to recognize and bind to small molecules that regulate expression of that gene. Creation of synthetic riboswitches to novel ligands depends on the ability to screen for analyte binding sensitivity and specificity. In our work, we have coupled a synthetic riboswitch to an optical reporter assay based on fluorescence resonance energy transfer (FRET) between two genetically-coded fluorescent proteins. Specifically, a theophylline-sensitive riboswitch was placed upstream of the Tobacco Etch Virus (TEV) protease coding sequence, and a FRET-based construct, BFP-eGFP or eGFP-REACh, was linked by a peptide encoding the recognition sequence for TEV protease. Cells expressing the riboswitch showed a marked optical difference in fluorescence emission in the presence of theophylline. However, the BFP-eGFP FRET pair possesses significant optical background that reduces the sensitivity of a FRET-based assay. To improve the optical assay, we designed a nonfluorescent yellow fluorescent protein (YFP) mutant called REACh (for Resonance Energy-Accepting Chromoprotein) as the FRET acceptor for eGFP. The advantage of using an eGFP-REACh pair is the elimination of acceptor fluorescence which leads to an improved detection of FRET via better signal-to-noise ratio. The eGFP-REACh fusion protein was constructed with the TEV protease cleavage site; thus upon TEV translation, cleavage occurs diminishing REACh quenching and increasing eGFP emission resulting in a 4.5-fold improvement in assay sensitivity.
Large-scale sequence and structural comparisons of human naive and antigen-experienced antibody repertoires.

PubMed

DeKosky, Brandon J; Lungu, Oana I; Park, Daechan; Johnson, Erik L; Charab, Wissam; Chrysostomou, Constantine; Kuroda, Daisuke; Ellington, Andrew D; Ippolito, Gregory C; Gray, Jeffrey J; Georgiou, George

2016-05-10

Elucidating how antigen exposure and selection shape the human antibody repertoire is fundamental to our understanding of B-cell immunity. We sequenced the paired heavy- and light-chain variable regions (VH and VL, respectively) from large populations of single B cells combined with computational modeling of antibody structures to evaluate sequence and structural features of human antibody repertoires at unprecedented depth. Analysis of a dataset comprising 55,000 antibody clusters from CD19(+)CD20(+)CD27(-) IgM-naive B cells, >120,000 antibody clusters from CD19(+)CD20(+)CD27(+) antigen-experienced B cells, and >2,000 RosettaAntibody-predicted structural models across three healthy donors led to a number of key findings: (i) VH and VL gene sequences pair in a combinatorial fashion without detectable pairing restrictions at the population level; (ii) certain VH:VL gene pairs were significantly enriched or depleted in the antigen-experienced repertoire relative to the naive repertoire; (iii) antigen selection increased antibody paratope net charge and solvent-accessible surface area; and (iv) public heavy-chain third complementarity-determining region (CDR-H3) antibodies in the antigen-experienced repertoire showed signs of convergent paired light-chain genetic signatures, including shared light-chain third complementarity-determining region (CDR-L3) amino acid sequences and/or Vκ,λ-Jκ,λ genes. The data reported here address several longstanding questions regarding antibody repertoire selection and development and provide a benchmark for future repertoire-scale analyses of antibody responses to vaccination and disease.
RUCS: rapid identification of PCR primers for unique core sequences.

PubMed

Thomsen, Martin Christen Frølund; Hasman, Henrik; Westh, Henrik; Kaya, Hülya; Lund, Ole

2017-12-15

Designing PCR primers to target a specific selection of whole genome sequenced strains can be a long, arduous and sometimes impractical task. Such tasks would benefit greatly from an automated tool to both identify unique targets, and to validate the vast number of potential primer pairs for the targets in silico. Here we present RUCS, a program that will find PCR primer pairs and probes for the unique core sequences of a positive genome dataset complement to a negative genome dataset. In addition to simple selection, the resulting primer pairs and probes are validated through a complex in silico PCR simulation. We compared our method, which identifies the unique core sequences, against an existing tool called ssGeneFinder, and found that our method was 6.5-20 times more sensitive. We used RUCS to design primer pairs that would target a set of genomes known to contain the mcr-1 colistin resistance gene. Three of the predicted pairs were chosen for experimental validation using PCR and gel electrophoresis. All three pairs successfully produced an amplicon with the target length for the samples containing mcr-1 and no amplification products were produced for the negative samples. The novel methods presented in this manuscript can reduce the time needed to identify target sequences, and provide a quick virtual PCR validation to eliminate time wasted on ambiguously binding primers. Source code is freely available on https://bitbucket.org/genomicepidemiology/rucs. Web service is freely available on https://cge.cbs.dtu.dk/services/RUCS. Contact: mcft@cbs.dtu.dk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
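The idea of a "unique core sequence" in the abstract above (present in every genome of the positive set, absent from every genome of the negative set) can be illustrated with plain k-mer set operations. The Python sketch below is only a simplified illustration and is not the RUCS implementation: the function names, the choice of k, and the toy sequences are all assumptions, and reverse complements, mismatches, and primer design itself are ignored.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """All overlapping substrings of length k in a sequence (forward strand only)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def unique_core_kmers(positives: list[str], negatives: list[str], k: int) -> set[str]:
    """k-mers shared by every positive genome and found in no negative genome."""
    if not positives:
        raise ValueError("at least one positive genome is required")
    # Core: k-mers present in all positive genomes.
    core = set.intersection(*(kmers(genome, k) for genome in positives))
    # Unique: drop anything that also occurs in a negative genome.
    for genome in negatives:
        core -= kmers(genome, k)
    return core


# Toy sequences (invented for illustration).
positives = ["ACGTACGG", "TTACGGAA"]
negatives = ["GGTACGTT"]
print(unique_core_kmers(positives, negatives, k=4))
```

In this toy case the k-mers TACG and ACGG are shared by both positives, but TACG also occurs in the negative, so only ACGG survives as a candidate target region.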
An Efficient Approach for the Development of Locus Specific Primers in Bread Wheat (Triticum aestivum L.) and Its Application to Re-Sequencing of Genes Involved in Frost Tolerance

PubMed Central

Babben, Steve; Perovic, Dragan; Koch, Michael; Ordon, Frank

2015-01-01

Recent declines in costs accelerated sequencing of many species with large genomes, including hexaploid wheat (Triticum aestivum L.). Although the draft sequence of bread wheat is known, it is still one of the major challenges to develop locus specific primers suitable to be used in marker assisted selection procedures, due to the high homology of the three genomes. In this study we describe an efficient approach for the development of locus specific primers comprising four steps, i.e. (i) identification of genomic and coding sequences (CDS) of candidate genes, (ii) intron- and exon-structure reconstruction, (iii) identification of wheat A, B and D sub-genome sequences and primer development based on sequence differences between the three sub-genomes, and (iv) testing of primers for functionality, correct size and localisation. This approach was applied to single, low and high copy genes involved in frost tolerance in wheat. In summary, for 27 of these genes for which sequences were derived from Triticum aestivum, Triticum monococcum and Hordeum vulgare, a set of 119 primer pairs was developed and after testing on Nulli-tetrasomic (NT) lines, a set of 65 primer pairs (54.6%), corresponding to 19 candidate genes, turned out to be specific. Out of these a set of 35 fragments was selected for validation via Sanger's amplicon re-sequencing. All fragments, with the exception of one, could be assigned to the original reference sequence. The approach presented here showed a much higher specificity in primer development in comparison to techniques used so far in bread wheat and can be applied to other polyploid species with a known draft sequence. PMID:26565976
Energy Landscape and Pathways for Transitions between Watson-Crick and Hoogsteen Base Pairing in DNA.

PubMed

Chakraborty, Debayan; Wales, David J

2018-01-04

The recent discovery that Hoogsteen (HG) base pairs are widespread in DNA across diverse sequences and positional contexts could have important implications for understanding DNA replication and DNA-protein recognition. While evidence is emerging that the Hoogsteen conformation could be a thermodynamically accessible conformation of the DNA duplex and provide a means to expand its functionality, relatively little is known about the molecular mechanism underlying the Watson-Crick (WC) to HG transition. In this Perspective, we describe pathways and kinetics for this transition at an atomic level of detail, using the energy landscape perspective. We show that competition between the duplex conformations results in a double funnel landscape, which explains some recent experimental observations. The interconversion pathways feature a number of intermediates, with a variable number of WC and HG base pairs. The relatively slow kinetics, with possible deviations from two-state behavior, suggest that this conformational switch is likely to be a challenging target for both simulation and experiment.
Statistical use of argonaute expression and RISC assembly in microRNA target identification.

PubMed

Stanhope, Stephen A; Sengupta, Srikumar; den Boon, Johan; Ahlquist, Paul; Newton, Michael A

2009-09-01

MicroRNAs (miRNAs) posttranscriptionally regulate targeted messenger RNAs (mRNAs) by inducing cleavage or otherwise repressing their translation. We address the problem of detecting m/miRNA targeting relationships in Homo sapiens from microarray data by developing statistical models that are motivated by the biological mechanisms used by miRNAs. The focus of our modeling is the construction, activity, and mediation of RNA-induced silencing complexes (RISCs) competent for targeted mRNA cleavage. We demonstrate that regression models accommodating RISC abundance and controlling for other mediating factors fit the expression profiles of known target pairs substantially better than models based on m/miRNA expressions alone, and lead to verifications of computational target pair predictions that are more sensitive than those based on marginal expression levels. Because our models are fully independent of exogenous results from sequence-based computational methods, they are appropriate for use as either a primary or secondary source of information regarding m/miRNA target pair relationships, especially in conjunction with high-throughput expression studies.
Optimization of single-base-pair mismatch discrimination in oligonucleotide microarrays

NASA Technical Reports Server (NTRS)

Urakawa, Hidetoshi; El Fantroussi, Said; Smidt, Hauke; Smoot, James C.; Tribou, Erik H.; Kelly, John J.; Noble, Peter A.; Stahl, David A.

2003-01-01

The discrimination between perfect-match and single-base-pair-mismatched nucleic acid duplexes was investigated by using oligonucleotide DNA microarrays and nonequilibrium dissociation rates (melting profiles). DNA and RNA versions of two synthetic targets corresponding to the 16S rRNA sequences of Staphylococcus epidermidis (38 nucleotides) and Nitrosomonas eutropha (39 nucleotides) were hybridized to perfect-match probes (18-mer and 19-mer) and to a set of probes having all possible single-base-pair mismatches. The melting profiles of all probe-target duplexes were determined in parallel by using an imposed temperature step gradient. We derived an optimum wash temperature for each probe and target by using a simple formula to calculate a discrimination index for each temperature of the step gradient. This optimum corresponded to the output of an independent analysis using a customized neural network program. These results together provide an experimental and analytical framework for optimizing mismatch discrimination among all probes on a DNA microarray.
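The probe set described above contains every single-base-pair mismatch of each perfect-match probe. As a rough sketch of how such a set can be enumerated, assuming probes are plain strings over A, C, G and T (the 18-mer below is a hypothetical placeholder, not a probe from the study):

```python
def single_mismatch_probes(perfect_match):
    """Yield every probe that differs from `perfect_match` at exactly one position."""
    for i, original in enumerate(perfect_match):
        for base in "ACGT":
            if base != original:
                yield perfect_match[:i] + base + perfect_match[i + 1:]


# Hypothetical 18-mer used only to show the count
probe = "ACGTACGTACGTACGTAC"
mismatches = list(single_mismatch_probes(probe))
n_probes = len(mismatches)  # 3 alternative bases at each of 18 positions
```

For a probe of length n this gives 3n mismatch probes, so an 18-mer yields 54 and a 19-mer yields 57.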

Design factors that influence PCR amplification success of cross-species primers among 1147 mammalian primer pairs

PubMed Central

Housley, Donna JE; Zalewski, Zachary A; Beckett, Stephanie E; Venta, Patrick J

2006-01-01

Background: Cross-species primers have been used with moderate success to address a variety of questions concerning genome structure, evolution, and gene function. However, the factors affecting their success have never been adequately addressed, particularly with respect to producing a consistent method to achieve high throughput. Using 1,147 mammalian cross-species primer pairs (1089 not previously reported), we tested several factors to determine their influence on the probability that a given target will amplify in a given species under a single amplification condition. These factors included: number of mismatches between the two species (the index species) used to identify conserved regions to which the primers were designed, GC-content of the gene and amplified region, CpG dinucleotides in the primer region, degree of encoded protein conservation, length of the primers, and the degree of evolutionary distance between the target species and the two index species. Results: The amplification success rate for the cross-species primers was significantly influenced by the number of mismatches between the two index species (6–8% decrease per mismatch in a primer pair), the GC-content within the amplified region (for the dog, GC ≥ 50%, 56.9% amplified; GC < 50%, 74.2% amplified), the degree of protein conservation (R² = 0.14) and the relatedness of the target species to the index species. For the dog, 598 products of 930 primer pairs (64.3%) (excluding primers in which dog was an index species) were sequenced and shown to be the expected product, with an additional three percent producing the incorrect sequence. When hamster DNA was used with the single amplification condition in a microtiter plate-based format, 510 of 1087 primer pairs (46.9%) produced amplified products. The primer pairs are spaced at an average distance of 2.3 Mb in the human genome and may be used to produce up to several hundred thousand bp of species-specific sequence. Conclusion: The most important factors influencing the proportion of successful amplifications are the number of index species mismatches, GC-richness of the target amplimer, and the relatedness of the target species to the index species, at least under the single PCR condition used. The 1147 cross-species primer pairs can be used in a high throughput manner to generate data for studies on the genetics and genomics of non-sequenced mammalian genomes. PMID:17029642
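The GC-content factor reported above splits amplified regions at 50% GC. A small sketch of that binning, assuming the amplified region is available as a plain DNA string (the function names and example sequences are illustrative, not from the paper):

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_class(seq, threshold=0.5):
    """Assign a region to the GC bin used in the abstract (split at 50%)."""
    return "GC >= 50%" if gc_fraction(seq) >= threshold else "GC < 50%"


rich = gc_class("GGCCAT")  # 4 of 6 bases are G or C
poor = gc_class("ATATGC")  # 2 of 6 bases are G or C
```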
Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

PubMed

Borrayo, Ernesto; Mendizabal-Ruiz, E Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P; Morales, J Alejandro

2014-01-01

Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.
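The abstract notes that mapping doublets instead of single nucleotides increases the number of possible amplitude values (16 doublets versus 4 bases). The sketch below illustrates that general idea only. The numeric assignment is an arbitrary assumption made here and should not be read as the mapping GAFD actually uses.

```python
from itertools import product

# Hypothetical assignment: the 16 doublets in alphabetical order map to 0..15.
DOUBLET_VALUE = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=2))}


def doublet_signal(seq):
    """Map a DNA string to a numeric signal, one value per overlapping doublet."""
    seq = seq.upper()
    return [DOUBLET_VALUE[seq[i:i + 2]] for i in range(len(seq) - 1)]


signal = doublet_signal("ACGT")  # doublets AC, CG, GT
```

Once two sequences are expressed as numeric signals, any signal-distance measure can be applied to them without aligning the sequences first.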
DNA Nucleotide Sequence Restricted by the RI Endonuclease

PubMed Central

Hedgpeth, Joe; Goodman, Howard M.; Boyer, Herbert W.

1972-01-01

The sequence of DNA base pairs adjacent to the phosphodiester bonds cleaved by the RI restriction endonuclease in unmodified DNA from coliphage λ has been determined. The 5′-terminal nucleotide labeled with 32P and oligonucleotides up to the heptamer were analyzed from a pancreatic DNase digest. The following sequence of nucleotides adjacent to the RI break made in λ DNA was deduced from these data and from the 3′-dinucleotide sequence and nearest-neighbor analysis obtained from repair synthesis with the DNA polymerase of Rous sarcoma virus [Formula: see text] The RI endonuclease cleavage of the phosphodiester bonds (indicated by arrows) generates 5′-phosphoryls and short cohesive termini of four nucleotides, pApApTpT. The most striking feature of the sequence is its symmetry. PMID:4343974
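The symmetry mentioned above can be checked for the cohesive terminus quoted in the abstract (pApApTpT): a site is symmetric in this sense when it reads the same as its own reverse complement. A minimal sketch, with helper names chosen here for illustration:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_symmetric(seq):
    """True when a DNA string equals its own reverse complement."""
    return seq == seq.translate(COMPLEMENT)[::-1]


cohesive_end_is_symmetric = is_symmetric("AATT")
```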
Genomic Signal Processing Methods for Computation of Alignment-Free Distances from DNA Sequences

PubMed Central

Borrayo, Ernesto; Mendizabal-Ruiz, E. Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P.; Morales, J. Alejandro

2014-01-01

Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments. PMID:25393409
Crystallization and preliminary X-ray diffraction analysis of a self-complementary DNA heptacosamer with a 20-base-pair duplex flanked by seven-nucleotide overhangs at the 3'-terminus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yeo, Hyun Koo; Lee, Jae Young

2012-04-18

The self-complementary DNA heptacosamer (a 27-mer oligonucleotide) with sequence d(CGAGCACTGCGCAGTGCTCGTTGTTAT) forms a 20-base-pair duplex flanked by seven-nucleotide overhangs at the 3'-terminus. Crystals of the oligonucleotide were obtained by sitting-drop vapor diffusion and diffracted to 2.8 Å resolution. The oligonucleotide was crystallized at 277 K using polyethylene glycol as a precipitant in the presence of magnesium chloride. The crystals belonged to the triclinic space group, with unit-cell parameters a = 48.74, b = 64.23, c = 79.34 Å, α = 91.37, β = 93.21, γ = 92.35°.
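The 20-base-pair duplex plus seven-nucleotide overhang can be read directly off the quoted sequence. The sketch below finds the longest prefix that is its own reverse complement and treats the remainder as the 3' overhang; the function names are chosen here for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_self_complementary_prefix(seq):
    """Length of the longest prefix that equals its own reverse complement."""
    for length in range(len(seq), 0, -1):
        prefix = seq[:length]
        if prefix == reverse_complement(prefix):
            return length
    return 0


heptacosamer = "CGAGCACTGCGCAGTGCTCGTTGTTAT"  # sequence quoted in the abstract
duplex_len = longest_self_complementary_prefix(heptacosamer)
overhang = heptacosamer[duplex_len:]
```

Two copies of the strand can therefore pair over the first 20 nucleotides, leaving the last seven unpaired at each 3' end.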
Crystallization and preliminary X-ray diffraction analysis of a self-complementary DNA heptacosamer with a 20-base-pair duplex flanked by seven-nucleotide overhangs at the 3'-terminus.

PubMed

Yeo, Hyun Koo; Lee, Jae Young

2010-05-01

The self-complementary DNA heptacosamer (a 27-mer oligonucleotide) with sequence d(CGAGCACTGCGCAGTGCTCGTTGTTAT) forms a 20-base-pair duplex flanked by seven-nucleotide overhangs at the 3'-terminus. Crystals of the oligonucleotide were obtained by sitting-drop vapour diffusion and diffracted to 2.8 Å resolution. The oligonucleotide was crystallized at 277 K using polyethylene glycol as a precipitant in the presence of magnesium chloride. The crystals belonged to the triclinic space group, with unit-cell parameters a = 48.74, b = 64.23, c = 79.34 Å, alpha = 91.37, beta = 93.21, gamma = 92.35 degrees.
Novel techniques for data decomposition and load balancing for parallel processing of vision systems: Implementation and evaluation using a motion estimation system

NASA Technical Reports Server (NTRS)

Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

1989-01-01

Computer vision systems employ a sequence of vision algorithms in which the output of an algorithm is the input of the next algorithm in the sequence. Algorithms that constitute such systems exhibit vastly different computational characteristics, and therefore, require different data decomposition techniques and efficient load balancing techniques for parallel implementation. However, since the input data for a task is produced as the output data of the previous task, this information can be exploited to perform knowledge based data decomposition and load balancing. Presented here are algorithms for a motion estimation system. The motion estimation is based on the point correspondence between the involved images which are a sequence of stereo image pairs. Researchers propose algorithms to obtain point correspondences by matching feature points among stereo image pairs at any two consecutive time instants. Furthermore, the proposed algorithms employ non-iterative procedures, which results in saving considerable amounts of computation time. The system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from consecutive time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters.
Development of SCoT-Based SCAR Marker for Rapid Authentication of Taxus Media.

PubMed

Hao, Juan; Jiao, Kaili; Yu, Chenliang; Guo, Hong; Zhu, Yujia; Yang, Xiao; Zhang, Siyang; Zhang, Lei; Feng, Shangguo; Song, Yaobin; Dong, Ming; Wang, Huizhong; Shen, Chenjia

2018-06-01

Taxus media is an important species in the family Taxaceae with high medicinal and commercial value. Overexploitation and illegal trade have led T. media to a severe threat of extinction. In addition, T. media and other Taxus species have similar morphological traits and are easily misidentified, particularly during the seedling stage. The purpose of this study is to develop a species-specific marker for T. media. Through a screening of 36 start codon targeted (SCoT) polymorphism primers, among 15 individuals of 4 Taxus species (T. media, T. chinensis, T. cuspidata and T. fuana), a clear species-specific DNA fragment (amplified by primer SCoT3) for T. media was identified. After isolation and sequencing, a DNA sequence with 530 bp was obtained. Based on this DNA fragment, a primer pair for the sequence-characterized amplified region marker was designed and named MHSF/MHSR. PCR analysis with primer pair MHSF/MHSR revealed a clear amplified band for all individuals of T. media but not for T. chinensis, T. cuspidata and T. fuana. Therefore, this marker can be used as a quick, efficient and reliable tool to identify T. media among other related Taxus species. The results of this study will lay an important foundation for the protection and management of T. media as a natural resource.
Mapping the zebrafish brain methylome using reduced representation bisulfite sequencing

PubMed Central

Chatterjee, Aniruddha; Ozaki, Yuichi; Stockwell, Peter A; Horsfield, Julia A; Morison, Ian M; Nakagawa, Shinichi

2013-01-01

Reduced representation bisulfite sequencing (RRBS) has been used to profile DNA methylation patterns in mammalian genomes such as human, mouse and rat. The methylome of the zebrafish, an important animal model, has not yet been characterized at base-pair resolution using RRBS. Therefore, we evaluated the technique of RRBS in this model organism by generating four single-nucleotide resolution DNA methylomes of adult zebrafish brain. We performed several simulations to show the distribution of fragments and enrichment of CpGs in different in silico reduced representation genomes of zebrafish. Four RRBS brain libraries generated 98 million sequenced reads and had higher frequencies of multiple mapping than equivalent human RRBS libraries. The zebrafish methylome indicates there is higher global DNA methylation in the zebrafish genome compared with its equivalent human methylome. This observation was confirmed by RRBS of zebrafish liver. High coverage CpG dinucleotides are enriched in CpG island shores more than in the CpG island core. We found that 45% of the mapped CpGs reside in gene bodies, and 7% in gene promoters. This analysis provides a roadmap for generating reproducible base-pair level methylomes for zebrafish using RRBS and our results provide the first evidence that RRBS is a suitable technique for global methylation analysis in zebrafish. PMID:23975027
Molecular cloning of an inducible serine esterase gene from human cytotoxic lymphocytes.

PubMed Central

Trapani, J A; Klein, J L; White, P C; Dupont, B

1988-01-01

A cDNA clone encoding a human serine esterase gene was isolated from a library constructed from poly(A)+ RNA of allogeneically stimulated, interleukin 2-expanded peripheral blood mononuclear cells. The clone, designated HSE26.1, represents a full-length copy of a 0.9-kilobase mRNA present in human cytotoxic cells but absent from a wide variety of noncytotoxic cell lines. Clone HSE26.1 contains an 892-base-pair sequence, including a single 741-base-pair open reading frame encoding a putative 247-residue polypeptide. The first 20 amino acids of the polypeptide form a leader sequence. The mature protein is predicted to have an unglycosylated Mr of approximately 26,000 and contains a single potential site for N-linked glycosylation. The nucleotide and predicted amino acid sequences of clone HSE26.1 are homologous with all murine and human serine esterases cloned thus far but are most similar to mouse granzyme B (70% nucleotide and 68% amino acid identity). HSE26.1 protein is expressed weakly in unstimulated peripheral blood mononuclear cells but is strongly induced within 6-hr incubation in medium containing phytohemagglutinin. The data suggest that the protein encoded by HSE26.1 plays a role in cell-mediated cytotoxicity. PMID:3261871
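The figures in this abstract are related by simple codon arithmetic, sketched below. The 110 Da average residue mass is a common rule of thumb introduced here as an assumption; it is not a value from the abstract.

```python
orf_bp = 741          # open reading frame length, from the abstract
leader_residues = 20  # leader sequence length, from the abstract
AVG_RESIDUE_DA = 110  # assumed rule-of-thumb average mass per amino acid

codons, remainder = divmod(orf_bp, 3)       # three bases per codon
mature_residues = codons - leader_residues  # residues left after leader removal
approx_mr = mature_residues * AVG_RESIDUE_DA
```

The 741-bp frame divides evenly into 247 codons, matching the 247-residue polypeptide, and the 227-residue mature protein gives a rough mass of about 25,000, in the same range as the reported Mr of approximately 26,000.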
Identification of immunity-related genes in the larvae of Protaetia brevitarsis seulensis (Coleoptera: Cetoniidae) by a next-generation sequencing-based transcriptome analysis.

PubMed

Bang, Kyeongrin; Hwang, Sejung; Lee, Jiae; Cho, Saeyoull

2015-01-01

To identify immune-related genes in the larvae of white-spotted flower chafers, next-generation sequencing was conducted with an Illumina HiSeq2000, resulting in 100 million cDNA reads with sequence information from over 10 billion base pairs (bp) and >50× transcriptome coverage. A subset of 77,336 contigs was created, and ∼35,532 sequences matched entries against the NCBI nonredundant database (cutoff, e < 10⁻⁵). Statistical analysis was performed on the 35,532 contigs. For profiling of the immune response, samples were analyzed by aligning 42 base sequence tags to the de novo reference assembly, comparing levels in immunized larvae to control levels of expression. Of the differentially expressed genes, 3,440 transcripts were upregulated and 3,590 transcripts were downregulated. Many of these genes were confirmed as immune-related genes such as pattern recognition proteins, immune-related signal transduction proteins, antimicrobial peptides, and cellular response proteins, by comparison to published data. © The Author 2015. Published by Oxford University Press on behalf of the Entomological Society of America.
Analysis of a native whitefly transcriptome and its sequence divergence with two invasive whitefly species.

PubMed

Wang, Xiao-Wei; Zhao, Qiong-Yi; Luan, Jun-Bo; Wang, Yu-Jun; Yan, Gen-Hong; Liu, Shu-Sheng

2012-10-04

Genomic divergence between invasive and native species may provide insight into the molecular basis underlying specific characteristics that drive the invasion and displacement of closely related species. In this study, we sequenced the transcriptome of an indigenous species, Asia II 3, of the Bemisia tabaci complex and compared its genetic divergence with the transcriptomes of two invasive whitefly species, Middle East Asia Minor 1 (MEAM1) and Mediterranean (MED), respectively. More than 16 million reads of 74 base pairs in length were obtained for the Asia II 3 species using the Illumina sequencing platform. These reads were assembled into 52,535 distinct sequences (mean size: 466 bp) and 16,596 sequences were annotated with an E-value above 10⁻⁵. Protein family comparisons revealed obvious diversification among the transcriptomes of these species suggesting species-specific adaptations during whitefly evolution. On the contrary, substantial conservation of the whitefly transcriptomes was also evident, despite their differences. The overall divergence of coding sequences between the orthologous gene pairs of Asia II 3 and MEAM1 is 1.73%, which is comparable to the average divergence of Asia II 3 and MED transcriptomes (1.84%) and much higher than that of MEAM1 and MED (0.83%). This is consistent with the previous phylogenetic analyses and crossing experiments suggesting these are distinct species. We also identified hundreds of highly diverged genes and compiled sequence identity data into gene functional groups and found the most divergent gene classes are Cytochrome P450, Glutathione metabolism and Oxidative phosphorylation. These results strongly suggest that the divergence of genes related to metabolism might be the driving force of the MEAM1 and Asia II 3 differentiation. We also analyzed single nucleotide polymorphisms within the orthologous gene pairs of indigenous and invasive whiteflies which are helpful for the investigation of association between alleles and phenotypes. Our data present the most comprehensive sequences for the indigenous whitefly species Asia II 3. The extensive comparisons of Asia II 3, MEAM1 and MED transcriptomes will serve as an invaluable resource for revealing the genetic basis of whitefly invasion and the molecular mechanisms underlying their biological differences.
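The divergence figures above are percentages of differing positions between orthologous coding sequences. A minimal sketch of one way such a percentage can be computed for a single aligned pair follows. The gap handling and the toy sequences are assumptions made here, and the study's own pipeline may differ.

```python
def percent_divergence(seq_a, seq_b):
    """Percentage of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatched = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            mismatched += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatched / compared


# Toy ortholog pair (hypothetical): 1 difference over 12 aligned positions
divergence = percent_divergence("ATGGCTAAAGGT", "ATGGCAAAAGGT")
```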
Analysis of a native whitefly transcriptome and its sequence divergence with two invasive whitefly species

PubMed Central

2012-01-01

Background: Genomic divergence between invasive and native species may provide insight into the molecular basis underlying specific characteristics that drive the invasion and displacement of closely related species. In this study, we sequenced the transcriptome of an indigenous species, Asia II 3, of the Bemisia tabaci complex and compared its genetic divergence with the transcriptomes of two invasive whitefly species, Middle East Asia Minor 1 (MEAM1) and Mediterranean (MED), respectively. Results: More than 16 million reads of 74 base pairs in length were obtained for the Asia II 3 species using the Illumina sequencing platform. These reads were assembled into 52,535 distinct sequences (mean size: 466 bp) and 16,596 sequences were annotated with an E-value above 10⁻⁵. Protein family comparisons revealed obvious diversification among the transcriptomes of these species suggesting species-specific adaptations during whitefly evolution. On the contrary, substantial conservation of the whitefly transcriptomes was also evident, despite their differences. The overall divergence of coding sequences between the orthologous gene pairs of Asia II 3 and MEAM1 is 1.73%, which is comparable to the average divergence of Asia II 3 and MED transcriptomes (1.84%) and much higher than that of MEAM1 and MED (0.83%). This is consistent with the previous phylogenetic analyses and crossing experiments suggesting these are distinct species. We also identified hundreds of highly diverged genes and compiled sequence identity data into gene functional groups and found the most divergent gene classes are Cytochrome P450, Glutathione metabolism and Oxidative phosphorylation. These results strongly suggest that the divergence of genes related to metabolism might be the driving force of the MEAM1 and Asia II 3 differentiation. We also analyzed single nucleotide polymorphisms within the orthologous gene pairs of indigenous and invasive whiteflies which are helpful for the investigation of association between alleles and phenotypes. Conclusions: Our data present the most comprehensive sequences for the indigenous whitefly species Asia II 3. The extensive comparisons of Asia II 3, MEAM1 and MED transcriptomes will serve as an invaluable resource for revealing the genetic basis of whitefly invasion and the molecular mechanisms underlying their biological differences. PMID:23036081
Cyanine-based probe\tag-peptide pair fluorescence protein imaging and fluorescence protein imaging methods

DOEpatents

Mayer-Cumblidge, M. Uljana; Cao, Haishi

2013-01-15

A molecular probe comprises two arsenic atoms and at least one cyanine based moiety. A method of producing a molecular probe includes providing a molecule having a first formula, treating the molecule with HgOAc, and subsequently transmetallizing with AsCl3. The As is liganded to ethanedithiol to produce a probe having a second formula. A method of labeling a peptide includes providing a peptide comprising a tag sequence and contacting the peptide with a biarsenical molecular probe. A complex is formed comprising the tag sequence and the molecular probe. A method of studying a peptide includes providing a mixture containing a peptide comprising a peptide tag sequence, adding a biarsenical probe to the mixture, and monitoring the fluorescence of the mixture.
Cyanine-based probe\tag-peptide pair for fluorescence protein imaging and fluorescence protein imaging methods

DOEpatents

Mayer-Cumblidge, M Uljana [Richland, WA]; Cao, Haishi [Richland, WA]

2010-08-17

A molecular probe comprises two arsenic atoms and at least one cyanine based moiety. A method of producing a molecular probe includes providing a molecule having a first formula, treating the molecule with HgOAc, and subsequently transmetallizing with AsCl3. The As is liganded to ethanedithiol to produce a probe having a second formula. A method of labeling a peptide includes providing a peptide comprising a tag sequence and contacting the peptide with a biarsenical molecular probe. A complex is formed comprising the tag sequence and the molecular probe. A method of studying a peptide includes providing a mixture containing a peptide comprising a peptide tag sequence, adding a biarsenical probe to the mixture, and monitoring the fluorescence of the mixture.
Toward rules relating zinc finger protein sequences and DNA binding site preferences.

PubMed

Desjarlais, J R; Berg, J M

1992-08-15

Zinc finger proteins of the Cys2-His2 type consist of tandem arrays of domains, where each domain appears to contact three adjacent base pairs of DNA through three key residues. We have designed and prepared a series of variants of the central zinc finger within the DNA binding domain of Sp1 by using information from an analysis of a large data base of zinc finger protein sequences. Through systematic variations at two of the three contact positions (underlined), relatively specific recognition of sequences of the form 5'-GGGGN(G or T)GGG-3' has been achieved. These results provide the basis for rules that may develop into a code that will allow the design of zinc finger proteins with preselected DNA site specificity.
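The site form 5'-GGGGN(G or T)GGG-3' is nine base pairs long, which fits three fingers each contacting three adjacent base pairs. A short sketch that expresses the form as a regular expression and enumerates the sequences it covers (variable names are illustrative):

```python
import re
from itertools import product

# N is any base; the following position is restricted to G or T.
SITE = re.compile(r"GGGG[ACGT][GT]GGG")


def matches_site(seq):
    """True when `seq` is exactly one of the 9-bp sites of the stated form."""
    return SITE.fullmatch(seq.upper()) is not None


all_sites = ["GGGG" + n + k + "GGG" for n, k in product("ACGT", "GT")]
```

The form covers 4 x 2 = 8 distinct sites, which is the set over which the variants' specificity was explored.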
Characterization of GM events by insert knowledge adapted re-sequencing approaches

PubMed Central

Yang, Litao; Wang, Congmao; Holst-Jensen, Arne; Morisset, Dany; Lin, Yongjun; Zhang, Dabing

2013-01-01

Detection methods and data from molecular characterization of genetically modified (GM) events are needed by stakeholders of public risk assessors and regulators. Generally, the molecular characteristics of GM events are incomprehensively revealed by current approaches and biased towards detecting transformation vector derived sequences. GM events are classified based on available knowledge of the sequences of vectors and inserts (insert knowledge). Herein we present three insert knowledge-adapted approaches for characterization of GM events (TT51-1 and T1c-19 rice as examples) based on paired-end re-sequencing with the advantages of comprehensiveness, accuracy, and automation. The comprehensive molecular characteristics of two rice events were revealed with additional unintended insertions compared with the results from PCR and Southern blotting. Comprehensive transgene characterization of TT51-1 and T1c-19 is shown to be independent of a priori knowledge of the insert and vector sequences employing the developed approaches. This provides an opportunity to identify and characterize also unknown GM events. PMID:24088728
Characterization of GM events by insert knowledge adapted re-sequencing approaches.

PubMed

Yang, Litao; Wang, Congmao; Holst-Jensen, Arne; Morisset, Dany; Lin, Yongjun; Zhang, Dabing

2013-10-03

Detection methods and data from molecular characterization of genetically modified (GM) events are needed by stakeholders of public risk assessors and regulators. Generally, the molecular characteristics of GM events are incomprehensively revealed by current approaches and biased towards detecting transformation vector derived sequences. GM events are classified based on available knowledge of the sequences of vectors and inserts (insert knowledge). Herein we present three insert knowledge-adapted approaches for characterization of GM events (TT51-1 and T1c-19 rice as examples) based on paired-end re-sequencing with the advantages of comprehensiveness, accuracy, and automation. The comprehensive molecular characteristics of two rice events were revealed with additional unintended insertions compared with the results from PCR and Southern blotting. Comprehensive transgene characterization of TT51-1 and T1c-19 is shown to be independent of a priori knowledge of the insert and vector sequences employing the developed approaches. This provides an opportunity to identify and characterize also unknown GM events.
Structural energetics of the adenine tract from an intrinsic transcription terminator.

PubMed

Huang, Yuegao; Weng, Xiaoli; Russu, Irina M

2010-04-02

Intrinsic transcription termination sites generally contain a tract of adenines in the DNA template that yields a tract of uracils at the 3' end of the nascent RNA. To understand how this base sequence contributes to termination of transcription, we have investigated two nucleic acid structures. The first is the RNA-DNA hybrid that contains the uracil tract 5'-rUUUUUAU-3' from the tR2 intrinsic terminator of bacteriophage lambda. The second is the homologous DNA-DNA duplex that contains the adenine tract 5'-dATAAAAA-3'. This duplex is present at the tR2 site when the DNA is not transcribed. The opening and the stability of each rU-dA/dT-dA base pair in the two structures are characterized by imino proton exchange and nuclear magnetic resonance spectroscopy. The results reveal concerted opening of the central rU-dA base pairs in the RNA-DNA hybrid. Furthermore, the stability profile of the adenine tract in the RNA-DNA hybrid is very different from that of the tract in the template DNA-DNA duplex. In the RNA-DNA hybrid, the stabilities of rU-dA base pairs range from 4.3 to 6.5 kcal/mol (at 10 degrees C). The sites of lowest stability are identified at the central positions of the tract. In the template DNA-DNA duplex, the dT-dA base pairs are more stable than the corresponding rU-dA base pairs in the hybrid by 0.9 to 4.6 kcal/mol and, in contrast to the RNA-DNA hybrid, the central base pairs have the highest stability. These results suggest that the central rU-dA/dT-dA base pairs in the adenine tract make the largest energetic contributions to transcription termination by promoting both the dissociation of the RNA transcript and the closing of the transcription bubble. The results also suggest that the high stability of dT-dA base pairs in the DNA provides a signal for the pausing of RNA polymerase at the termination site. Copyright 2010 Elsevier Ltd. All rights reserved.
Hepatozoon parasites (Apicomplexa: Adeleorina) in bats.

PubMed

Pinto, C Miguel; Helgen, Kristofer M; Fleischer, Robert C; Perkins, Susan L

2013-08-01

We provide the first evidence of Hepatozoon parasites infecting bats. We sequenced a short fragment of the 18S rRNA gene (~600 base pairs) of Hepatozoon parasites from 3 Hipposideros cervinus bats from Borneo. Phylogenies inferred by model-based methods place these Hepatozoon within a clade formed by parasites of reptiles, rodents, and marsupials. We discuss the scenario that bats might be common hosts of Hepatozoon.

Autofluorescence microscopy for paired-matched morphological and molecular identification of individual chigger mites (Acari: Trombiculidae), the vectors of scrub typhus.

PubMed

Kumlert, Rawadee; Chaisiri, Kittipong; Anantatat, Tippawan; Stekolnikov, Alexandr A; Morand, Serge; Prasartvit, Anchana; Makepeace, Benjamin L; Sungvornyothin, Sungsit; Paris, Daniel H

2018-01-01

Conventional gold standard characterization of chigger mites involves chemical preparation procedures (i.e. specimen clearing) for visualization of morphological features, which however contributes to destruction of the arthropod host DNA and any endosymbiont or pathogen DNA harbored within the specimen. In this study, a novel work flow based on autofluorescence microscopy was developed to enable identification of trombiculid mites to the species level on the basis of morphological traits without any special preparation, while preserving the mite DNA for subsequent genotyping. A panel of 16 specifically selected fluorescence microscopy images of mite features from available identification keys served for complete chigger morphological identification to the species level, and was paired with corresponding genotype data. We evaluated and validated this method for paired chigger morphological and genotypic ID using the mitochondrial cytochrome c oxidase subunit I gene (coi) in 113 chigger specimens representing 12 species and 7 genera (Leptotrombidium, Ascoschoengastia, Gahrliepia, Walchia, Blankaartia, Schoengastia and Schoutedenichia) from the Lao People's Democratic Republic (Lao PDR) to the species level (complete characterization), and 153 chiggers from 5 genera (Leptotrombidium, Ascoschoengastia, Helenicula, Schoengastiella and Walchia) from Thailand, Cambodia and Lao PDR to the genus level. A phylogenetic tree constructed from 77 coi gene sequences (approximately 640 bp length, n = 52 new coi sequences and n = 25 downloaded from GenBank), demonstrated clear grouping of assigned morphotypes at the genus levels, although evidence of both genetic polymorphism and morphological plasticity was found. With this new methodology, we provided the largest collection of characterized coi gene sequences for trombiculid mites to date, and almost doubled the number of available characterized coi gene sequences with a single study. 
The ability to provide paired phenotypic-genotypic data is of central importance for future characterization of mites and dissecting the molecular epidemiology of mites transmitting diseases like scrub typhus.
Autofluorescence microscopy for paired-matched morphological and molecular identification of individual chigger mites (Acari: Trombiculidae), the vectors of scrub typhus

PubMed Central

Chaisiri, Kittipong; Anantatat, Tippawan; Stekolnikov, Alexandr A.; Morand, Serge; Prasartvit, Anchana; Makepeace, Benjamin L.; Sungvornyothin, Sungsit; Paris, Daniel H.

2018-01-01

Background Conventional gold standard characterization of chigger mites involves chemical preparation procedures (i.e. specimen clearing) for visualization of morphological features, which however contributes to destruction of the arthropod host DNA and any endosymbiont or pathogen DNA harbored within the specimen. Methodology/Principal findings In this study, a novel work flow based on autofluorescence microscopy was developed to enable identification of trombiculid mites to the species level on the basis of morphological traits without any special preparation, while preserving the mite DNA for subsequent genotyping. A panel of 16 specifically selected fluorescence microscopy images of mite features from available identification keys served for complete chigger morphological identification to the species level, and was paired with corresponding genotype data. We evaluated and validated this method for paired chigger morphological and genotypic ID using the mitochondrial cytochrome c oxidase subunit I gene (coi) in 113 chigger specimens representing 12 species and 7 genera (Leptotrombidium, Ascoschoengastia, Gahrliepia, Walchia, Blankaartia, Schoengastia and Schoutedenichia) from the Lao People’s Democratic Republic (Lao PDR) to the species level (complete characterization), and 153 chiggers from 5 genera (Leptotrombidium, Ascoschoengastia, Helenicula, Schoengastiella and Walchia) from Thailand, Cambodia and Lao PDR to the genus level. A phylogenetic tree constructed from 77 coi gene sequences (approximately 640 bp length, n = 52 new coi sequences and n = 25 downloaded from GenBank), demonstrated clear grouping of assigned morphotypes at the genus levels, although evidence of both genetic polymorphism and morphological plasticity was found. 
Conclusions/Significance With this new methodology, we provided the largest collection of characterized coi gene sequences for trombiculid mites to date, and almost doubled the number of available characterized coi gene sequences with a single study. The ability to provide paired phenotypic-genotypic data is of central importance for future characterization of mites and dissecting the molecular epidemiology of mites transmitting diseases like scrub typhus. PMID:29494599
Development of chloroplast simple sequence repeats (cpSSRs) for the intraspecific study of Gracilaria tenuistipitata (Gracilariales, Rhodophyta) from different populations

PubMed Central

2014-01-01

Background Gracilaria tenuistipitata is an agarophyte with substantial economic potential because of its high growth rate and tolerance to a wide range of environment factors. This red seaweed is intensively cultured in China for the production of agar and fodder for abalone. Microsatellite markers were developed from the chloroplast genome of G. tenuistipitata var. liui to differentiate G. tenuistipitata obtained from six different localities: four from Peninsular Malaysia, one from Thailand and one from Vietnam. Eighty G. tenuistipitata specimens were analyzed using eight simple sequence repeat (SSR) primer-pairs that we developed for polymerase chain reaction (PCR) amplification. Findings Five mononucleotide primer-pairs and one trinucleotide primer-pair exhibited monomorphic alleles, whereas the other two primer-pairs separated the G. tenuistipitata specimens into two main clades. G. tenuistipitata from Thailand and Vietnam were grouped into one clade, and the populations from Batu Laut, Middle Banks and Kuah (Malaysia) were grouped into another clade. The combined dataset of these two primer-pairs separated G. tenuistipitata obtained from Kelantan, Malaysia from that obtained from other localities. Conclusions Based on the variations in repeated nucleotides of microsatellite markers, our results suggested that the populations of G. tenuistipitata were distributed into two main geographical regions: (i) populations in the west coast of Peninsular Malaysia and (ii) populations facing the South China Sea. The correct identification of G. tenuistipitata strains with traits of high economic potential will be advantageous for the mass cultivation of seaweeds. PMID:24490797
PlantFuncSSR: Integrating First and Next Generation Transcriptomics for Mining of SSR-Functional Domains Markers

PubMed Central

Sablok, Gaurav; Pérez-Pulido, Antonio J.; Do, Thac; Seong, Tan Y.; Casimiro-Soriguer, Carlos S.; La Porta, Nicola; Ralph, Peter J.; Squartini, Andrea; Muñoz-Merida, Antonio; Harikrishna, Jennifer A.

2016-01-01

Analysis of repetitive DNA sequence content and divergence among the repetitive functional classes is a well-accepted approach for estimation of inter- and intra-generic differences in plant genomes. Among these elements, microsatellites, or Simple Sequence Repeats (SSRs), have been widely demonstrated as powerful genetic markers for species and varieties discrimination. We present PlantFuncSSRs platform having more than 364 plant species with more than 2 million functional SSRs. They are provided with detailed annotations for easy functional browsing of SSRs and with information on primer pairs and associated functional domains. PlantFuncSSRs can be leveraged to identify functional-based genic variability among the species of interest, which might be of particular interest in developing functional markers in plants. This comprehensive on-line portal unifies mining of SSRs from first and next generation sequencing datasets, corresponding primer pairs and associated in-depth functional annotation such as gene ontology annotation, gene interactions and its identification from reference protein databases. PlantFuncSSRs is freely accessible at: http://www.bioinfocabd.upo.es/plantssr. PMID:27446111
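As a rough illustration of what mining SSRs from sequence data involves, the sketch below scans a nucleotide string for perfect mono-, di- and trinucleotide repeats with a regular expression. It is not the PlantFuncSSRs pipeline: the function name `find_ssrs` and the minimum repeat counts are assumptions chosen only for the example.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs in seq.

    min_repeats maps motif length -> minimum number of tandem copies.
    The defaults are hypothetical thresholds, not values from the abstract.
    """
    if min_repeats is None:
        min_repeats = {1: 10, 2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs such as "AA" or "AAA": they are mononucleotide runs.
            if motif_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits
```

Start positions are 0-based. Imperfect or compound repeats, which dedicated SSR tools usually handle, are deliberately left out of this sketch.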
Homology-based Modeling of Rhodopsin-like Family Members in the Inactive State: Structural Analysis and Deduction of Tips for Modeling and Optimization.

PubMed

Pappalardo, Matteo; Rayan, Mahmoud; Abu-Lafi, Saleh; Leonardi, Martha E; Milardi, Danilo; Guccione, Salvatore; Rayan, Anwar

2017-08-01

Modeling G-Protein Coupled Receptors (GPCRs) is an emergent field of research, since the utility of high-quality models in receptor structure-based strategies might facilitate the discovery of interesting drug candidates. The findings from a quantitative analysis of eighteen resolved structures of rhodopsin family "A" receptors crystallized with antagonists and 153 pairs of structures are described. A strategy termed endeca-amino acids fragmentation was used to analyze the structural models in order to detect the relationship between sequence identity and Root Mean Square Deviation (RMSD) at each trans-membrane domain. Moreover, we have applied the leave-one-out strategy to study the shiftiness likelihood of the helices. The type of correlation between sequence identity and RMSD was studied using the aforementioned set of receptors as representatives of membrane proteins and 98 serine proteases with 4753 pairs of structures as representatives of globular proteins. Data analysis using the fragmentation strategy revealed that there is some extent of correlation between sequence identity and global RMSD of 11AA width windows. However, spatial conservation is not always close to the endoplasmic side as was reported before. A comparative study with globular proteins shows that GPCRs have higher standard deviation and higher slope in the graph with correlation between sequence identity and RMSD. The extracted information disclosed in this paper could be incorporated in the modeling protocols when using techniques for model optimization and refinement. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Len Gen: The international lentil genome sequencing project

USDA-ARS's Scientific Manuscript database

We have been sequencing CDC Redberry using NGS of paired-end and mate-pair libraries over a wide range of sizes and technologies. The most recent draft (v0.7) of approximately 150x coverage produced scaffolds covering over half the genome (2.7 Gb of the expected 4.3 Gb). Long reads from PacBio sequ...
Large scale DNA microsequencing device

DOEpatents

Foote, R.S.

1997-08-26

A microminiature sequencing apparatus and method provide a means for simultaneously obtaining sequences of plural polynucleotide strands. The apparatus consists of a microchip into which plural channels have been etched using standard lithographic procedures and chemical wet etching. The channels include a reaction well and a separating section. Enclosing the channels is accomplished by bonding a transparent cover plate over the apparatus. A first oligonucleotide strand is chemically affixed to the apparatus through an alkyl chain. Subsequent nucleotides are selected by complementary base pair bonding. A target nucleotide strand is used to produce a family of labelled sequencing strands in each channel which are separated in the separating section. During or following separation the sequences are determined using appropriate detection means. 17 figs.
Phylogenetics of the phlebotomine sand fly group Verrucarum (Diptera: Psychodidae: Lutzomyia).

PubMed

Cohnstaedt, Lee W; Beati, Lorenza; Caceres, Abraham G; Ferro, Cristina; Munstermann, Leonard E

2011-06-01

Within the sand fly genus Lutzomyia, the Verrucarum species group contains several of the principal vectors of American cutaneous leishmaniasis and human bartonellosis in the Andean region of South America. The group encompasses 40 species for which the taxonomic status, phylogenetic relationships, and role of each species in disease transmission remain unresolved. Mitochondrial cytochrome c oxidase I (COI) phylogenetic analysis of a 667-bp fragment supported the morphological classification of the Verrucarum group into series. Genetic sequences from seven species were grouped in well-supported monophyletic lineages. Four species, however, clustered in two paraphyletic lineages that indicate conspecificity--the Lutzomyia longiflocosa-Lutzomyia sauroida pair and the Lutzomyia quasitownsendi-Lutzomyia torvida pair. COI sequences were also evaluated as a taxonomic tool based on interspecific genetic variability within the Verrucarum group and the intraspecific variability of one of its members, Lutzomyia verrucarum, across its known distribution.
Cytochrome b sequences in black-crowned night-herons (Nycticorax nycticorax) from heronries exposed to genotoxic contaminants

USGS Publications Warehouse

Dahl, Christopher R.; Bickham, John W.; Wickliffe, Jeffery K.; Custer, Thomas W.

2001-01-01

DNA sequence analysis of a 215 base-pair region of the mitochondrial cytochrome b gene was used to examine genetic variation and search for evidence of an increased mutation rate in black-crowned night-herons. We examined five populations exposed to environmental contamination (primarily PAHs and PCBs) and one reference population from the eastern U.S. There was no evidence of a high mutation rate even within populations previously shown to exhibit increased variation in DNA content among somatic cells as a result of petroleum exposure. Three haplotypes were observed among 99 individuals. The low level of variability could be evidence for a genetic bottleneck, or that cytochrome b is too conservative for use in population genetic studies of this species. With the exception of one population from Louisiana, pair-wise Phist estimates were very low, indicative of little population structure and potentially high rates of effective migration among populations.
Phylogenetics of the Phlebotomine Sand Fly Group Verrucarum (Diptera: Psychodidae: Lutzomyia)

PubMed Central

Cohnstaedt, Lee W.; Beati, Lorenza; Caceres, Abraham G.; Ferro, Cristina; Munstermann, Leonard E.

2011-01-01

Within the sand fly genus Lutzomyia, the Verrucarum species group contains several of the principal vectors of American cutaneous leishmaniasis and human bartonellosis in the Andean region of South America. The group encompasses 40 species for which the taxonomic status, phylogenetic relationships, and role of each species in disease transmission remain unresolved. Mitochondrial cytochrome c oxidase I (COI) phylogenetic analysis of a 667-bp fragment supported the morphological classification of the Verrucarum group into series. Genetic sequences from seven species were grouped in well-supported monophyletic lineages. Four species, however, clustered in two paraphyletic lineages that indicate conspecificity—the Lutzomyia longiflocosa–Lutzomyia sauroida pair and the Lutzomyia quasitownsendi–Lutzomyia torvida pair. COI sequences were also evaluated as a taxonomic tool based on interspecific genetic variability within the Verrucarum group and the intraspecific variability of one of its members, Lutzomyia verrucarum, across its known distribution. PMID:21633028
Uncoupling of sgRNAs from their associated barcodes during PCR amplification of combinatorial CRISPR screens

PubMed Central

2018-01-01

Many implementations of pooled screens in mammalian cells rely on linking an element of interest to a barcode, with the latter subsequently quantitated by next generation sequencing. However, substantial uncoupling between these paired elements during lentiviral production has been reported, especially as the distance between elements increases. We detail that PCR amplification is another major source of uncoupling, and becomes more pronounced with increased amounts of DNA template molecules and PCR cycles. To lessen uncoupling in systems that use paired elements for detection, we recommend minimizing the distance between elements, using low and equal template DNA inputs for plasmid and genomic DNA during PCR, and minimizing the number of PCR cycles. We also present a vector design for conducting combinatorial CRISPR screens that enables accurate barcode-based detection with a single short sequencing read and minimal uncoupling. PMID:29799876
DNA Barcodes for Species Identification in the Hyperdiverse Ant Genus Pheidole (Formicidae: Myrmicinae)

PubMed Central

Ng'endo, R.N.; Osiemo, Z.B.; Brandl, R.

2013-01-01

DNA sequencing is increasingly being used to assist in species identification in order to overcome the taxonomic impediment. However, few studies attempt to compare the results of these molecular studies with a more traditional species delineation approach based on morphological characters. A 636 base pair fragment of the mitochondrial cytochrome oxidase subunit 1 (CO1) gene was sequenced from 47 ants of the genus Pheidole (Formicidae: Myrmicinae) collected in the Brazilian Atlantic Forest to test whether the morphology-based assignment of individuals into species is supported by DNA-based species delimitation. Twenty morphospecies were identified, whereas the barcoding analysis identified 19 Molecular Operational Taxonomic Units (MOTUs). Fifteen out of the 19 DNA-based clusters allocated, using sequence divergence thresholds of 2% and 3%, matched with morphospecies. Both thresholds yielded the same number of MOTUs. Only one MOTU was successfully identified to species level using the CO1 sequences of Pheidole species already in GenBank. The average pairwise sequence divergence for all 47 sequences was 19%, ranging from 0 to 25%. In some cases, however, morphology and molecular based methods differed in their assignment of individuals to morphospecies or MOTUs. The occurrence of distinct mitochondrial lineages within morphological species highlights groups for further detailed genetic and morphological studies, and therefore a pluralistic approach using several methods to understand the taxonomy of difficult lineages is advocated. PMID:23902257
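The threshold-based grouping into MOTUs can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it uses the uncorrected p-distance (the abstract does not say which distance model was used) and single-linkage clustering, and the names `p_distance` and `cluster_motus` are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    return diffs / compared if compared else 0.0

def cluster_motus(seqs, threshold=0.02):
    """Single-linkage clustering of {name: aligned_sequence} at a divergence threshold."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

Running `cluster_motus` once with `threshold=0.02` and once with `threshold=0.03` and comparing the number of clusters mirrors the comparison of the two thresholds reported in the abstract.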
Efficient moving target analysis for inverse synthetic aperture radar images via joint speeded-up robust features and regular moment

NASA Astrophysics Data System (ADS)

Yang, Hongxin; Su, Fulin

2018-01-01

We propose a moving target analysis algorithm using speeded-up robust features (SURF) and regular moment in inverse synthetic aperture radar (ISAR) image sequences. In our study, we first extract interest points from ISAR image sequences by SURF. Different from traditional feature point extraction methods, SURF-based feature points are invariant to scattering intensity, target rotation, and image size. Then, we employ a bilateral feature registering model to match these feature points. The feature registering scheme can not only search the isotropic feature points to link the image sequences but also reduce the error matching pairs. After that, the target centroid is detected by regular moment. Consequently, a cost function based on correlation coefficient is adopted to analyze the motion information. Experimental results based on simulated and real data validate the effectiveness and practicability of the proposed method.
A Comparison of Two Measures of HIV Diversity in Multi-Assay Algorithms for HIV Incidence Estimation

PubMed Central

Cousins, Matthew M.; Konikoff, Jacob; Sabin, Devin; Khaki, Leila; Longosz, Andrew F.; Laeyendecker, Oliver; Celum, Connie; Buchbinder, Susan P.; Seage, George R.; Kirk, Gregory D.; Moore, Richard D.; Mehta, Shruti H.; Margolick, Joseph B.; Brown, Joelle; Mayer, Kenneth H.; Kobin, Beryl A.; Wheeler, Darrell; Justman, Jessica E.; Hodder, Sally L.; Quinn, Thomas C.; Brookmeyer, Ron; Eshleman, Susan H.

2014-01-01

Background Multi-assay algorithms (MAAs) can be used to estimate HIV incidence in cross-sectional surveys. We compared the performance of two MAAs that use HIV diversity as one of four biomarkers for analysis of HIV incidence. Methods Both MAAs included two serologic assays (LAg-Avidity assay and BioRad-Avidity assay), HIV viral load, and an HIV diversity assay. HIV diversity was quantified using either a high resolution melting (HRM) diversity assay that does not require HIV sequencing (HRM score for a 239 base pair env region) or sequence ambiguity (the percentage of ambiguous bases in a 1,302 base pair pol region). Samples were classified as MAA positive (likely from individuals with recent HIV infection) if they met the criteria for all of the assays in the MAA. The following performance characteristics were assessed: (1) the proportion of samples classified as MAA positive as a function of duration of infection, (2) the mean window period, (3) the shadow (the time period before sample collection that is being assessed by the MAA), and (4) the accuracy of cross-sectional incidence estimates for three cohort studies. Results The proportion of samples classified as MAA positive as a function of duration of infection was nearly identical for the two MAAs. The mean window period was 141 days for the HRM-based MAA and 131 days for the sequence ambiguity-based MAA. The shadows for both MAAs were <1 year. Both MAAs provided cross-sectional HIV incidence estimates that were very similar to longitudinal incidence estimates based on HIV seroconversion. Conclusions MAAs that include the LAg-Avidity assay, the BioRad-Avidity assay, HIV viral load, and HIV diversity can provide accurate HIV incidence estimates. Sequence ambiguity measures obtained using a commercially-available HIV genotyping system can be used as an alternative to HRM scores in MAAs for cross-sectional HIV incidence estimation. PMID:24968135
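The sequence ambiguity measure, the percentage of ambiguous bases in a sequenced region, amounts to a simple count. The sketch below assumes that every IUPAC code other than A, C, G, T and U counts as ambiguous, including N; the abstract does not state how the genotyping system treats N or gaps, so that choice and the function name `percent_ambiguous` are assumptions.

```python
# IUPAC nucleotide codes that stand for more than one base (assumed to include N).
AMBIGUOUS = set("RYSWKMBDHVN")

def percent_ambiguous(seq):
    """Percentage of positions in seq that carry an ambiguous IUPAC code.

    Non-letter characters such as gaps ("-") are excluded from the denominator.
    """
    bases = [c for c in seq.upper() if c.isalpha()]
    if not bases:
        return 0.0
    n_ambiguous = sum(c in AMBIGUOUS for c in bases)
    return 100.0 * n_ambiguous / len(bases)
```

For a 1,302 base pair pol region, a single ambiguous call corresponds to about 0.08%.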
Phylogenomic Analyses and Reclassification of Species within the Genus Tsukamurella: Insights to Species Definition in the Post-genomic Era.

PubMed

Teng, Jade L L; Tang, Ying; Huang, Yi; Guo, Feng-Biao; Wei, Wen; Chen, Jonathan H K; Wong, Samson S Y; Lau, Susanna K P; Woo, Patrick C Y

2016-01-01

Owing to the highly similar phenotypic profiles, protein spectra and 16S rRNA gene sequences observed between three pairs of Tsukamurella species (Tsukamurella pulmonis/Tsukamurella spongiae, Tsukamurella tyrosinosolvens/Tsukamurella carboxydivorans, and Tsukamurella pseudospumae/Tsukamurella sunchonensis), we hypothesize that the six Tsukamurella species may have been misclassified and that there may only be three Tsukamurella species. In this study, we characterized the type strains of these six Tsukamurella species by traditional DNA-DNA hybridization (DDH) and "digital DDH" after genome sequencing to determine their exact taxonomic positions. Traditional DDH showed 81.2 ± 0.6% to 99.7 ± 1.0% DNA-DNA relatedness between the two Tsukamurella species in each of the three pairs, which was above the threshold for same species designation. "Digital DDH" based on Genome-To-Genome Distance Calculator and Average Nucleotide Identity for the three pairs also showed similarity results in the range of 82.3-92.9 and 98.1-99.1%, respectively, in line with results of traditional DDH. Based on this evidence and according to Rules 23a and 42 of the Bacteriological Code, we propose that T. spongiae Olson et al. 2007 should be reclassified as a later heterotypic synonym of T. pulmonis Yassin et al. 1996, T. carboxydivorans Park et al. 2009 as a later heterotypic synonym of T. tyrosinosolvens Yassin et al. 1997, and T. sunchonensis Seong et al. 2008 as a later heterotypic synonym of T. pseudospumae Nam et al. 2004. With the advancement of genome sequencing technologies, classification of bacterial species can be achieved more readily by "digital DDH" than by traditional DDH.
biobambam: tools for read pair collation based algorithms on BAM files

PubMed Central

2014-01-01

Background Sequence alignment data is often ordered by coordinate (id of the reference sequence plus position on the sequence where the fragment was mapped) when stored in BAM files, as this simplifies the extraction of variants between the mapped data and the reference or of variants within the mapped data. In this order paired reads are usually separated in the file, which complicates some other applications, like duplicate marking or conversion to the FastQ format, that require access to the full information of the pairs. Results In this paper we introduce biobambam, a set of tools based on the efficient collation of alignments in BAM files by read name. The employed collation algorithm avoids time- and space-consuming sorting of alignments by read name where this is possible without using more than a specified amount of main memory. Using this algorithm, tasks like duplicate marking in BAM files and conversion of BAM files to the FastQ format can be performed very efficiently with limited resources. We also make the collation algorithm available in the form of an API for other projects. This API is part of the libmaus package. Conclusions In comparison with previous approaches to problems involving the collation of alignments by read name, like the BAM to FastQ or duplicate marking utilities, our approach can often perform an equivalent task more efficiently in terms of the required main memory and run-time. Our BAM to FastQ conversion is faster than all widely known alternatives including Picard and bamUtil. Our duplicate marking is about as fast as the closest competitor bamUtil for small data sets and faster than all known alternatives on large and complex data sets.
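The idea of collating mates by read name without first sorting the whole file can be sketched with a hash table keyed by read name. This is a simplified illustration, not biobambam's implementation or the libmaus API: the record layout `(name, mate_no, payload)` and the function name `collate_pairs` are assumptions, and the memory bound described in the abstract is omitted, so the table here can grow without limit.

```python
def collate_pairs(records):
    """Pair up mates from records given in arbitrary (e.g. coordinate) order.

    records: iterable of (read_name, mate_number, payload) tuples.
    Returns (pairs, orphans): pairs is a list of (read_name, mate1, mate2)
    in the order each pair was completed; orphans maps read names whose mate
    never appeared to their (mate_number, payload).
    """
    pending = {}  # read_name -> (mate_number, payload) waiting for its mate
    pairs = []
    for name, mate_no, payload in records:
        other = pending.pop(name, None)
        if other is None:
            # First mate seen: park it until the second one shows up.
            pending[name] = (mate_no, payload)
        else:
            # Second mate seen: emit the pair with mate 1 first.
            first, second = sorted([other, (mate_no, payload)], key=lambda t: t[0])
            pairs.append((name, first[1], second[1]))
    return pairs, pending
```

Because each pair is emitted as soon as its second mate is read, only reads whose mates have not yet appeared are held in memory.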
A simple approach to the generation of heterologous competitive internal controls for real-time PCR assays on the LightCycler.

PubMed

Stöcher, Markus; Leb, Victoria; Hölzl, Gabriele; Berg, Jörg

2002-12-01

The real-time PCR technology allows convenient detection and quantification of virus derived DNA. This approach is used in many PCR based assays in clinical laboratories. Detection and quantification of virus derived DNA is usually performed against external controls or external standards. Thus, adequacy within a clinical sample is not monitored. This can be achieved using internal controls that are co-amplified with the specific target within the same reaction vessel. We describe a convenient way to prepare heterologous internal controls as competitors for real-time PCR based assays. The internal controls were devised as competitors in real-time PCR, e.g. LightCycler-PCR. The bacterial neomycin phosphotransferase gene (neo) was used as source for heterologous DNA. Within the neo gene a box was chosen containing sequences for four differently spaced forward primers, one reverse primer, and a pair of neo specific hybridization probes. Pairs of primers were constructed to consist of virus-specific primer sequences and neo box specific primer sequences. Using those composite primers in conventional preparative PCR, four types of internal controls were amplified from the neo box and subsequently cloned. A panel of the four differently sized internal controls was generated and tested by LightCycler PCR using their virus-specific primers. All four different PCR products were detected with the single pair of neo specific FRET-hybridization probes. The presented approach to generate competitive internal controls for use in LightCycler PCR assays proved convenient and rapid. The obtained internal controls match most PCR product sizes used in clinical routine molecular assays and will assist in discriminating true from false negative results.
Automated extraction and classification of RNA tertiary structure cyclic motifs

PubMed Central

Lemieux, Sébastien; Major, François

2006-01-01

A minimum cycle basis of the tertiary structure of a large ribosomal subunit (LSU) X-ray crystal structure was analyzed. Most cycles are small, as they are composed of 3- to 5 nt, and repeated across the LSU tertiary structure. We used hierarchical clustering to quantify and classify the 4 nt cycles. One class is defined by the GNRA tetraloop motif. The inspection of the GNRA class revealed peculiar instances in sequence. First is the presence of UA, CA, UC and CC base pairs that substitute the usual sheared GA base pair. Second is the revelation of GNR(Xn)A tetraloops, where Xn is bulged out of the classical GNRA structure, and of GN/RA formed by the two strands of interior-loops. We were able to unambiguously characterize the cycle classes using base stacking and base pairing annotations. The cycles identified correspond to small and cyclic motifs that compose most of the LSU RNA tertiary structure and contribute to its thermodynamic stability. Consequently, the RNA minimum cycles could well be used as the basic elements of RNA tertiary structure prediction methods. PMID:16679452
3DNALandscapes: a database for exploring the conformational features of DNA.

PubMed

Zheng, Guohui; Colasanti, Andrew V; Lu, Xiang-Jun; Olson, Wilma K

2010-01-01

3DNALandscapes, located at: http://3DNAscapes.rutgers.edu, is a new database for exploring the conformational features of DNA. In contrast to most structural databases, which archive the Cartesian coordinates and/or derived parameters and images for individual structures, 3DNALandscapes enables searches of conformational information across multiple structures. The database contains a wide variety of structural parameters and molecular images, computed with the 3DNA software package and known to be useful for characterizing and understanding the sequence-dependent spatial arrangements of the DNA sugar-phosphate backbone, sugar-base side groups, base pairs, base-pair steps, groove structure, etc. The data comprise all DNA-containing structures--both free and bound to proteins, drugs and other ligands--currently available in the Protein Data Bank. The web interface allows the user to link, report, plot and analyze this information from numerous perspectives and thereby gain insight into DNA conformation, deformability and interactions in different sequence and structural contexts. The data accumulated from known, well-resolved DNA structures can serve as useful benchmarks for the analysis and simulation of new structures. The collective data can also help to understand how DNA deforms in response to proteins and other molecules and undergoes conformational rearrangements.
Draft genome of the mountain pine beetle, Dendroctonus ponderosae Hopkins, a major forest pest

PubMed Central

2013-01-01

Background The mountain pine beetle, Dendroctonus ponderosae Hopkins, is the most serious insect pest of western North American pine forests. A recent outbreak destroyed more than 15 million hectares of pine forests, with major environmental effects on forest health, and economic effects on the forest industry. The outbreak has in part been driven by climate change, and will contribute to increased carbon emissions through decaying forests. Results We developed a genome sequence resource for the mountain pine beetle to better understand the unique aspects of this insect's biology. A draft de novo genome sequence was assembled from paired-end, short-read sequences from an individual field-collected male pupa, and scaffolded using mate-paired, short-read genomic sequences from pooled field-collected pupae, paired-end short-insert whole-transcriptome shotgun sequencing reads of mRNA from adult beetle tissues, and paired-end Sanger EST sequences from various life stages. We describe the cytochrome P450, glutathione S-transferase, and plant cell wall-degrading enzyme gene families important to the survival of the mountain pine beetle in its harsh and nutrient-poor host environment, and examine genome-wide single-nucleotide polymorphism variation. A horizontally transferred bacterial sucrose-6-phosphate hydrolase was evident in the genome, and its tissue-specific transcription suggests a functional role for this beetle. Conclusions Despite Coleoptera being the largest insect order with over 400,000 described species, including many agricultural and forest pest species, this is only the second genome sequence reported in Coleoptera, and will provide an important resource for the Curculionoidea and other insects. PMID:23537049
